[jira] [Commented] (SPARK-20703) Add an operator for writing data out

2017-07-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085559#comment-16085559 ] Steve Loughran commented on SPARK-20703: Regarding a patch for this, what do people suggest as a

[jira] [Commented] (SPARK-21374) Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled

2017-07-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085645#comment-16085645 ] Steve Loughran commented on SPARK-21374: This is possibly a sign that your new configuration

[jira] [Commented] (SPARK-19790) OutputCommitCoordinator should not allow another task to commit after an ExecutorFailure

2017-07-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088630#comment-16088630 ] Steve Loughran commented on SPARK-19790: I've now summarised the FileOutputCommitter v1 and v2

[jira] [Commented] (SPARK-20703) Add an operator for writing data out

2017-07-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088632#comment-16088632 ] Steve Loughran commented on SPARK-20703: ..got a patch for this, but want to see if I can create

[jira] [Commented] (SPARK-20107) Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md

2017-07-04 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073951#comment-16073951 ] Steve Loughran commented on SPARK-20107: If you are curious, I've just written out the v1 and v2

[jira] [Commented] (SPARK-12868) ADD JAR via sparkSQL JDBC will fail when using a HDFS URL

2017-06-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068301#comment-16068301 ] Steve Loughran commented on SPARK-12868: It's actually not the cause of that, merely the

[jira] [Commented] (SPARK-21137) Spark reads many small files slowly

2017-06-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065423#comment-16065423 ] Steve Loughran commented on SPARK-21137: Looking at this. Something is trying to get the

[jira] [Commented] (SPARK-21137) Spark reads many small files slowly

2017-06-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065445#comment-16065445 ] Steve Loughran commented on SPARK-21137: Filed HADOOP-14600. Looks like a v. old codepath that's

[jira] [Commented] (SPARK-21137) Spark reads many small files slowly

2017-06-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065428#comment-16065428 ] Steve Loughran commented on SPARK-21137: ps, for now, do it in parallel:

[jira] [Commented] (SPARK-12868) ADD JAR via sparkSQL JDBC will fail when using a HDFS URL

2017-06-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065448#comment-16065448 ] Steve Loughran commented on SPARK-12868: I think this is the case of HADOOP-14598: once the FS

[jira] [Commented] (SPARK-21137) Spark reads many small files slowly

2017-06-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065515#comment-16065515 ] Steve Loughran commented on SPARK-21137: bq. so it is something that could be optimized in the

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-04-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984771#comment-15984771 ] Steve Loughran commented on SPARK-7481: --- I think we ended up going in circles on that PR. Sean has

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-04-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985040#comment-15985040 ] Steve Loughran commented on SPARK-7481: --- (This is a fairly long comment, but it tries to summarise

[jira] [Commented] (SPARK-17159) Improve FileInputDStream.findNewFiles list performance

2017-04-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981081#comment-15981081 ] Steve Loughran commented on SPARK-17159: pulled out documentation into separate JIRA,

[jira] [Created] (SPARK-20448) Document how FileInputDStream works with object storage

2017-04-24 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-20448: -- Summary: Document how FileInputDStream works with object storage Key: SPARK-20448 URL: https://issues.apache.org/jira/browse/SPARK-20448 Project: Spark

[jira] [Commented] (SPARK-21374) Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled

2017-08-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108897#comment-16108897 ] Steve Loughran commented on SPARK-21374: I understand... the patch shows the issue. It's only

[jira] [Commented] (SPARK-21514) Hive has updated with new support for S3 and InsertIntoHiveTable.scala should update also

2017-08-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108882#comment-16108882 ] Steve Loughran commented on SPARK-21514: Can you link this JIRA to the specific HIVE work? >

[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri

2017-08-04 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114379#comment-16114379 ] Steve Loughran commented on SPARK-21618: yes, and that 2.9+ feature breaks things, because when

[jira] [Created] (SPARK-21762) FileFormatWriter metrics collection fails if a newly close()d file isn't yet visible

2017-08-17 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-21762: -- Summary: FileFormatWriter metrics collection fails if a newly close()d file isn't yet visible Key: SPARK-21762 URL: https://issues.apache.org/jira/browse/SPARK-21762

[jira] [Commented] (SPARK-20370) create external table on read only location fails

2017-05-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993405#comment-15993405 ] Steve Loughran commented on SPARK-20370: Is this the bit under the PR tagged "!! HACK ALERT !!"

[jira] [Commented] (SPARK-20560) Review Spark's handling of filesystems returning "localhost" in getFileBlockLocations

2017-05-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993323#comment-15993323 ] Steve Loughran commented on SPARK-20560: To follow this up, I've now got a test which verifies

[jira] [Commented] (SPARK-20608) Standby namenodes should be allowed to included in yarn.spark.access.namenodes to support HDFS HA

2017-05-11 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006454#comment-16006454 ] Steve Loughran commented on SPARK-20608: One thing to consider here is starting with a test to

[jira] [Commented] (SPARK-21074) Parquet files are read fully even though only count() is requested

2017-06-20 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056309#comment-16056309 ] Steve Loughran commented on SPARK-21074: Given this is an s3 URL, it may be amplifying the

[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large

2017-06-20 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056516#comment-16056516 ] Steve Loughran commented on SPARK-19111: Followup: [~drcrallen]; Hadoop 2.8 is out the door with

[jira] [Commented] (SPARK-11373) Add metrics to the History Server and providers

2017-06-22 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059141#comment-16059141 ] Steve Loughran commented on SPARK-11373: metrics might help with understanding the s3 load issues

[jira] [Commented] (SPARK-20799) Unable to infer schema for ORC on reading ORC from S3

2017-05-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017258#comment-16017258 ] Steve Loughran commented on SPARK-20799: bq. Spark does output the S3xLoginHelper:90 - The

[jira] [Commented] (SPARK-20799) Unable to infer schema for ORC on reading ORC from S3

2017-05-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022807#comment-16022807 ] Steve Loughran commented on SPARK-20799: If what I think is happening is, then it's the security

[jira] [Updated] (SPARK-20799) Unable to infer schema for ORC on S3N when secrets are in the URL

2017-05-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-20799: --- Summary: Unable to infer schema for ORC on S3N when secrets are in the URL (was: Unable to

[jira] [Commented] (SPARK-19669) Open up visibility for sharedState, sessionState, and a few other functions

2017-05-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026219#comment-16026219 ] Steve Loughran commented on SPARK-19669: thanks for this, very nice to have Logging usable

[jira] [Commented] (SPARK-8578) Should ignore user defined output committer when appending data

2017-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024526#comment-16024526 ] Steve Loughran commented on SPARK-8578: --- Given SPARK-10063 has pulled the

[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null

2017-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024886#comment-16024886 ] Steve Loughran commented on SPARK-20886: Stack trace: after {code} 2017-05-25 16:22:10,807

[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null

2017-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024885#comment-16024885 ] Steve Loughran commented on SPARK-20886: Stack trace: before {code} Driver stacktrace: at

[jira] [Created] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null

2017-05-25 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-20886: -- Summary: HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null Key: SPARK-20886 URL: https://issues.apache.org/jira/browse/SPARK-20886

[jira] [Updated] (SPARK-20799) Unable to infer schema for ORC on S3N when secrets are in the URL

2017-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-20799: --- Priority: Minor (was: Major) > Unable to infer schema for ORC on S3N when secrets are in

[jira] [Updated] (SPARK-20799) Unable to infer schema for ORC/Parquet on S3N when secrets are in the URL

2017-06-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-20799: --- Summary: Unable to infer schema for ORC/Parquet on S3N when secrets are in the URL (was:

[jira] [Updated] (SPARK-20799) Unable to infer schema for ORC/Parquet on S3N when secrets are in the URL

2017-06-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-20799: --- Environment: Hadoop 2.8.0 binaries > Unable to infer schema for ORC/Parquet on S3N when

[jira] [Commented] (SPARK-21077) Cannot access public files over S3 protocol

2017-06-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051680#comment-16051680 ] Steve Loughran commented on SPARK-21077: like people say, this is inevitably a config problem.

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-05-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000502#comment-16000502 ] Steve Loughran commented on SPARK-7481: --- thank you! > Add spark-hadoop-cloud module to pull in

[jira] [Commented] (SPARK-20608) Standby namenodes should be allowed to included in yarn.spark.access.namenodes to support HDFS HA

2017-05-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998150#comment-15998150 ] Steve Loughran commented on SPARK-20608: Probably good to pull in someone who understands HDFS

[jira] [Created] (SPARK-20560) Review Spark's handling of filesystems returning "localhost" in getFileBlockLocations

2017-05-02 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-20560: -- Summary: Review Spark's handling of filesystems returning "localhost" in getFileBlockLocations Key: SPARK-20560 URL: https://issues.apache.org/jira/browse/SPARK-20560

[jira] [Commented] (SPARK-20560) Review Spark's handling of filesystems returning "localhost" in getFileBlockLocations

2017-05-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993008#comment-15993008 ] Steve Loughran commented on SPARK-20560: {{FileSystem.getFileBlockLocations(path)}} is only

[jira] [Commented] (SPARK-19582) DataFrameReader conceptually inadequate

2017-05-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992985#comment-15992985 ] Steve Loughran commented on SPARK-19582: All Spark is doing is taking a URL to data, mapping that

[jira] [Resolved] (SPARK-20560) Review Spark's handling of filesystems returning "localhost" in getFileBlockLocations

2017-05-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-20560. Resolution: Invalid "localhost" is filtered, been done in

[jira] [Updated] (SPARK-21137) Spark reads many small files slowly off local filesystem

2017-06-28 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-21137: --- Summary: Spark reads many small files slowly off local filesystem (was: Spark reads many

[jira] [Commented] (SPARK-20886) HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null

2017-09-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171386#comment-16171386 ] Steve Loughran commented on SPARK-20886: Not, but related. This is handling the situation where

[jira] [Commented] (SPARK-21549) Spark fails to complete job correctly in case of OutputFormat which do not write into hdfs

2017-09-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171393#comment-16171393 ] Steve Loughran commented on SPARK-21549: Linking to SPARK-20045, which highlights the commit

[jira] [Commented] (SPARK-21549) Spark fails to complete job correctly in case of OutputFormat which do not write into hdfs

2017-09-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171420#comment-16171420 ] Steve Loughran commented on SPARK-21549: # you can't rely on the committers having output and

[jira] [Commented] (SPARK-21549) Spark fails to complete job correctly in case of OutputFormat which do not write into hdfs

2017-09-20 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173449#comment-16173449 ] Steve Loughran commented on SPARK-21549: The {{newTaskTempFileAbsPath()}} method is an

[jira] [Resolved] (SPARK-17159) Improve FileInputDStream.findNewFiles list performance

2017-09-20 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-17159. Resolution: Won't Fix Based on the feedback of https://github.com/apache/spark/pull/14731

[jira] [Updated] (SPARK-22163) Design Issue of Spark Streaming that Causes Random Run-time Exception

2017-10-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-22163: --- Priority: Major (was: Critical) > Design Issue of Spark Streaming that Causes Random

[jira] [Updated] (SPARK-22217) ParquetFileFormat to support arbitrary OutputCommitters

2017-10-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-22217: --- Priority: Minor (was: Major) > ParquetFileFormat to support arbitrary OutputCommitters >

[jira] [Created] (SPARK-22217) ParquetFileFormat to support arbitrary OutputCommitters if parquet.enable.summary-metadata is false

2017-10-06 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-22217: -- Summary: ParquetFileFormat to support arbitrary OutputCommitters if parquet.enable.summary-metadata is false Key: SPARK-22217 URL:

[jira] [Updated] (SPARK-22217) ParquetFileFormat to support arbitrary OutputCommitters

2017-10-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-22217: --- Summary: ParquetFileFormat to support arbitrary OutputCommitters (was: ParquetFileFormat to

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203495#comment-16203495 ] Steve Loughran commented on SPARK-22240: We've got a test in HADOOP-14943 which looks @ part

[jira] [Comment Edited] (SPARK-21999) ConcurrentModificationException - Spark Streaming

2017-10-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192732#comment-16192732 ] Steve Loughran edited comment on SPARK-21999 at 10/5/17 1:39 PM: - Apache

[jira] [Commented] (SPARK-21999) ConcurrentModificationException - Spark Streaming

2017-10-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192732#comment-16192732 ] Steve Loughran commented on SPARK-21999: Apache projects are all open source, with an open

[jira] [Commented] (SPARK-21999) ConcurrentModificationException - Spark Streaming

2017-10-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194332#comment-16194332 ] Steve Loughran commented on SPARK-21999: Telling a project "their design is wrong" and expecting

[jira] [Resolved] (SPARK-21999) ConcurrentModificationException - Spark Streaming

2017-10-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-21999. Resolution: Won't Fix > ConcurrentModificationException - Spark Streaming >

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2017-10-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207573#comment-16207573 ] Steve Loughran commented on SPARK-2984: --- bq. multiple batches writing to same location

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203709#comment-16203709 ] Steve Loughran commented on SPARK-22240: [~hyukjin.kwon]: we now see that on s3a, you only ever

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201863#comment-16201863 ] Steve Loughran commented on SPARK-22240: thanks. Now for a question which is probably obvious to

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-10-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201917#comment-16201917 ] Steve Loughran commented on SPARK-21797: Update, in HADOOP-14874 I've noted we could use the

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203966#comment-16203966 ] Steve Loughran commented on SPARK-22240: Point me at a simple test suite for the multiline & I'll

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146193#comment-16146193 ] Steve Loughran commented on SPARK-21797: No. That's a shame. I only came across the option when I

[jira] [Commented] (SPARK-20448) Document how FileInputDStream works with object storage

2017-09-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178236#comment-16178236 ] Steve Loughran commented on SPARK-20448: thanks! > Document how FileInputDStream works with

[jira] [Resolved] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2017-09-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-2356. --- Resolution: Duplicate > Exception: Could not locate executable null\bin\winutils.exe in the

[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2017-09-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185744#comment-16185744 ] Steve Loughran commented on SPARK-2356: --- [~Vasilina], that probably means you're running with Hadoop

[jira] [Commented] (SPARK-21817) Pass FSPermissions to LocatedFileStatus from InMemoryFileIndex

2017-08-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138683#comment-16138683 ] Steve Loughran commented on SPARK-21817: I think it's a regression in HDFS-6984; the superclass

[jira] [Commented] (SPARK-21817) Pass FSPermissions to LocatedFileStatus from InMemoryFileIndex

2017-08-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138664#comment-16138664 ] Steve Loughran commented on SPARK-21817: Is this a regression in HDFS? > Pass FSPermissions to

[jira] [Commented] (SPARK-21817) Pass FSPermissions to LocatedFileStatus from InMemoryFileIndex

2017-08-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138685#comment-16138685 ] Steve Loughran commented on SPARK-21817: API is tagged as stable/evolving; it's clearly in use

[jira] [Commented] (SPARK-21702) Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when PartitionBy Used

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139898#comment-16139898 ] Steve Loughran commented on SPARK-21702: If this is just "directories", then there are no

[jira] [Resolved] (SPARK-21702) Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when PartitionBy Used

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-21702. Resolution: Invalid > Structured Streaming S3A SSE Encryption Not Visible through AWS S3

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140138#comment-16140138 ] Steve Loughran commented on SPARK-21797: I was talking about the cost and time of getting data

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140147#comment-16140147 ] Steve Loughran commented on SPARK-21797: Note that if it is just during spark partition

[jira] [Commented] (SPARK-21817) Pass FSPermissions to LocatedFileStatus from InMemoryFileIndex

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140161#comment-16140161 ] Steve Loughran commented on SPARK-21817: FYI, this is now fixed in hadoop trunk/3.0-beta-1 >

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139919#comment-16139919 ] Steve Loughran commented on SPARK-21797: If you are using s3:// URLs then it's the AWS team's

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140408#comment-16140408 ] Steve Loughran commented on SPARK-21797: This is happening deep in the Amazon EMR team's closed

[jira] [Updated] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-21797: --- Environment: Amazon EMR > spark cannot read partitioned data in S3 that are partly in

[jira] [Comment Edited] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140408#comment-16140408 ] Steve Loughran edited comment on SPARK-21797 at 8/24/17 5:56 PM: - This is

[jira] [Commented] (SPARK-21797) spark cannot read partitioned data in S3 that are partly in glacier

2017-08-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141483#comment-16141483 ] Steve Loughran commented on SPARK-21797: bq. According to our test, it is 20% slower maximum to

[jira] [Updated] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-08-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-21762: --- Summary: FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file

[jira] [Comment Edited] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-08-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131190#comment-16131190 ] Steve Loughran edited comment on SPARK-21762 at 8/17/17 7:41 PM: -

[jira] [Commented] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-08-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131190#comment-16131190 ] Steve Loughran commented on SPARK-21762: SPARK-20703 simplifies this, especially testing, as it's

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-11 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200367#comment-16200367 ] Steve Loughran commented on SPARK-22240: Amazon EMR is amazon's own fork of Spark & Hadoop, with

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-11 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200485#comment-16200485 ] Steve Loughran commented on SPARK-22240: What's the link to the multiline JIRA? As that could

[jira] [Commented] (SPARK-9103) Tracking spark's memory usage

2017-09-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182372#comment-16182372 ] Steve Loughran commented on SPARK-9103: --- If it helps, most uses of ByteBuffer in hadoop core & HDFS

[jira] [Commented] (SPARK-22587) Spark job fails if fs.defaultFS and application jar are different url

2017-11-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266697#comment-16266697 ] Steve Loughran commented on SPARK-22587: See also

[jira] [Commented] (SPARK-22587) Spark job fails if fs.defaultFS and application jar are different url

2017-11-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266674#comment-16266674 ] Steve Loughran commented on SPARK-22587: Jerry had already pulled me in for this; it's one of

[jira] [Commented] (SPARK-22526) Document closing of PortableDataInputStream in binaryFiles

2017-11-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267015#comment-16267015 ] Steve Loughran commented on SPARK-22526: HADOOP-15071 updates the s3a troubleshooting with what

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265167#comment-16265167 ] Steve Loughran commented on SPARK-22526: it says it in the javadocs for

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264095#comment-16264095 ] Steve Loughran commented on SPARK-22526: # Fix the code you invoke #. wrap the code you invoke

[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264095#comment-16264095 ] Steve Loughran edited comment on SPARK-22526 at 11/23/17 3:47 PM: -- # Fix

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264496#comment-16264496 ] Steve Loughran commented on SPARK-22526: I'm not giving a permanent fix. It's a bug in your code

[jira] [Commented] (SPARK-22657) Hadoop fs implementation classes are not loaded if they are part of the app jar or other jar when --packages flag is used

2017-11-30 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272974#comment-16272974 ] Steve Loughran commented on SPARK-22657: Hadoop FileSystem service introspection for FS binding

[jira] [Commented] (SPARK-22657) Hadoop fs implementation classes are not loaded if they are part of the app jar or other jar when --packages flag is used

2017-11-30 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273191#comment-16273191 ] Steve Loughran commented on SPARK-22657: if you look at HADOOP-14138 you can see why we cut s3a

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-22 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263021#comment-16263021 ] Steve Loughran commented on SPARK-22526: If the input stream doesn't get closed, there probably

[jira] [Commented] (SPARK-18294) Implement commit protocol to support `mapred` package's committer

2017-12-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292595#comment-16292595 ] Steve Loughran commented on SPARK-18294: Following up on this, one question: Why support the

[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2017-11-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256789#comment-16256789 ] Steve Loughran commented on SPARK-14959: Came across a reference to this while scanning for

[jira] [Commented] (SPARK-16996) Hive ACID delta files not seen

2017-11-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255329#comment-16255329 ] Steve Loughran commented on SPARK-16996: [~maver1ck] Spark hive is custom as it was modified to

[jira] [Commented] (SPARK-17593) list files on s3 very slow

2017-11-10 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247886#comment-16247886 ] Steve Loughran commented on SPARK-17593: Hey nick, yes, need to move to FileSystem.list(path,
