[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19497 I guess one aspect of `saveAsNewAPIHadoopFile` is that it calls ` jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)`, and `Configuration.set(String key, St

[GitHub] spark pull request #19269: [SPARK-22026][SQL] data source v2 write path

2017-10-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r145096772 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144948292 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 thanks for the review everyone! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144823664 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144823298 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144822800 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144822829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144821527 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144821389 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/SimpleWritableDataSource.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487 The more I see of the committer internals, the less confident I am about understanding any of it. If your committer isn't writing stuff out, it doesn't need to have any value

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 done. Not writing 0-byte files will offer significant speedup against object stores, where the cost of a call to getFileStatus() can take hundreds of millis. I look forward

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 > But, if I were working on a Spark distribution at a vendor, this is something I would definitely include because it's such a useful feature. I con

[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487 "" can come in via configuration files; I'd treat that the same as null. Things which aren't valid URIs though, that's something you want
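The suggestion to treat `""` from a configuration file the same as null can be sketched as a small normalisation step. This is a hypothetical illustration, not Spark's code: `OutputDirs` and `normalize` are invented names, and `java.net.URI` stands in for Hadoop's `Path`. The comment is cut off before saying what to do with invalid URIs, so rejecting them with an `IllegalArgumentException` is purely an assumption here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

/** Hypothetical helper: normalise an output-directory value read from configuration. */
final class OutputDirs {
    private OutputDirs() {}

    /**
     * null, "" and whitespace-only values all mean "no output directory".
     * Assumption: any other value must parse as a URI, otherwise it is rejected.
     */
    static Optional<URI> normalize(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new URI(raw.trim()));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid output directory: " + raw, e);
        }
    }
}
```

Collapsing null and empty into one `Optional.empty()` means callers only need a single "no output path" branch.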

[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487 Looking a bit more at this. I see it handles """ as well as empty, and also other forms of invalid URI which Path can't handle today ("multiple colons except with fil

[GitHub] spark pull request #19487: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19487#discussion_r144545827 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -60,15 +71,6 @@ class

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 The latest PR update pulls in @dongjoon-hyun's new test; to avoid merge conflict in the Insert suite I've rebased against master. 1. Everything handles missing files on output 2

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 PS, for people who are interested in dynamic committers, [MAPREDUCE-6823](https://issues.apache.org/jira/browse/MAPREDUCE-6823) is something to look at. It allows you to switch committers

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 Thanks for reviewing this/getting it in. Personally, I had it in the "improvement" category rather than bug fix. If it wasn't for that line in the docs, there'd be no ambiguity abo

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Noted :) @dongjoon-hyun : is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding that issue with sometimes you get 0-byte ORC files & th

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-10-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r144505454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487 LGTM. I'm going to stick out today a slight roll of my PathOutputCommitter class which is one layer above FileOutputCommitter: lets people write committers without output & work paths,

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144381367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144375059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,11 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144239543 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-12 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144238941 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,152

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065074 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065041 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,149

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @viirya : the new data writer API will allow for a broader set of stats to be propagated back from workers. When you are working with the object stores, a useful stat to get back is throttle

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992362 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,149

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992319 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992018 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address it if they recognise the use case & lie about va

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-09 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19269 +1 for the ability to return statistics: the remote stores have lots of information which committers may return

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-09 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r143530841 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 + @rdblue

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-06 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/19448 [SPARK-22217] [SQL] ParquetFileFormat to support arbitrary OutputCommitters ## What changes were proposed in this pull request? `ParquetFileFormat` to relax its requirement of output

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19294 @szhem that null path support in `FileOutputCommitter` came with the App Master recovery work of [MAPREDUCE-3711](https://issues.apache.org/jira/browse/MAPREDUCE-3711); its, trying

[GitHub] spark issue #19368: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-10-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19368 Looking @ this, things would be a lot less brittle if there wasn't a round trip Path -> String -> Path. I'm thinking of Windows paths here in particular. Other than tests, whic

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19269 One other thing that would be good now and invaluable in future is for the `DataWriter.commit()` call to return a `Map[String,Long]` of statistics alongside the message sent to the committer
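The proposal for `DataWriter.commit()` to return a `Map[String,Long]` of statistics next to the commit message can be sketched as below. Everything here is hypothetical: `TaskCommitResult` and `StatsAggregator` are invented names and not part of Spark's data source v2 API. The sketch only shows the shape of the idea: each task returns named counters, and the driver sums them per key.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical task-commit result: a message for the job committer plus named statistics. */
final class TaskCommitResult {
    final String message;           // stand-in for whatever the job committer needs
    final Map<String, Long> stats;  // e.g. bytes written, files written, throttle events

    TaskCommitResult(String message, Map<String, Long> stats) {
        this.message = message;
        this.stats = Collections.unmodifiableMap(new HashMap<>(stats));
    }
}

/** Hypothetical driver-side aggregation: sum each named statistic across all tasks. */
final class StatsAggregator {
    private StatsAggregator() {}

    static Map<String, Long> merge(List<TaskCommitResult> results) {
        Map<String, Long> totals = new HashMap<>();
        for (TaskCommitResult r : results) {
            // Keys missing from some tasks are simply absent there, not zero-filled.
            r.stats.forEach((key, value) -> totals.merge(key, value, Long::sum));
        }
        return totals;
    }
}
```

Using string keys keeps the contract open-ended: a store-specific writer can report extra counters without any API change.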

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142005126 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,113

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142005072 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,113

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004971 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004889 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004831 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004814 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaSimpleWritableDataSource.java --- @@ -0,0 +1,297

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004778 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,113

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19269 People may know that I'm busy with some S3 committers which work with Hadoop MapReduce & Spark, with an import of Ryan's committer into the Hadoop codebase. This includes changes to

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r142004644 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request #17745: [SPARK-17159][Streaming] optimise check for new f...

2017-09-28 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/17745

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-24 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140658582 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -130,17 +135,21 @@ class

[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-09-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17743 People don't realise how much object stores aren't file systems until they discover all their assumptions are broken. Once you know how they work, you can set up a workflow which

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-09-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19294 As I play with commit logic all the way through the stack, I can't help thinking everyone's lives would be better if we tagged the MRv1 commit APIs as deprecated in Hadoop 3 and uses

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140188088 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -130,17 +135,21 @@ class

[GitHub] spark issue #17745: [SPARK-17159][Streaming] optimise check for new files in...

2017-09-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17745 Due to lack of support/interest, moved to https://github.com/hortonworks-spark/cloud-integration

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-09-21 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/17747

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140008216 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140008084 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends

[GitHub] spark issue #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to han...

2017-08-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18111 thx --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to han...

2017-08-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18111 I believe this patch implements the original design goal: if a committer doesn't have a working path supplied by `getWorkingPath()` then it downgrades. It might be worthwhile doing

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Related to this, updated spec on [Hadoop output stream, Syncable and StreamCapabilities](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r134460176 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-08-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17342 @Chopinxb no worries; the hard part is thinking how to fix this. I don't see it being possible to do reliably except through an explicit download. Hadoop 2.8+ has moved off commons-logging so

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @adrian-ionescu wrote > is there a need for calling getFinalStats() more than once? No. As long as everyone is aware of it, it won't be an issue.

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 > To mimic S3-like behavior, you can overwrite the file system spark.hadoop.fs.$scheme.impl" @gatorsmile: you will be able to do something better soon, as S3A i

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-08-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Currently *nobody should be using s3a:// as the temp file destination*, which is the same as saying "nobody should be using s3a:// as the direct destination of work", n

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r133919035 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r133918269 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r133913173 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-17 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/18979 [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible ## What changes were proposed in this pull request
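The failure named in this PR title, where stats collection breaks because a just-written file is not yet visible, can be illustrated with a tracker that tolerates a missing file. This is a hypothetical sketch, not the actual `BasicWriteTaskStatsTracker` change: `TolerantStatsTracker` and its methods are invented, and local `java.nio.file` calls stand in for Hadoop's `FileSystem.getFileStatus()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/**
 * Hypothetical per-task write statistics tracker. A file that cannot be found when
 * its size is queried (for example, not yet visible in an object store) is counted
 * as missing instead of failing the task.
 */
final class TolerantStatsTracker {
    private long numFiles;      // files whose size could be read
    private long numBytes;      // total size of those files
    private long missingFiles;  // files reported closed but not found

    /** Called after a writer closes {@code file}. */
    void fileClosed(Path file) throws IOException {
        try {
            numBytes += Files.size(file);  // stand-in for getFileStatus(path).getLen()
            numFiles++;
        } catch (NoSuchFileException e) {
            missingFiles++;                // record it; do not propagate
        }
    }

    long numFiles() { return numFiles; }
    long numBytes() { return numBytes; }
    long missingFiles() { return missingFiles; }
}
```

Only "file not found" is swallowed; any other `IOException` still propagates, so genuine I/O failures are not hidden.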

[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...

2017-08-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18111#discussion_r133751724 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -73,7 +73,10 @@ class

[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-08-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17743 Just reread this; still looks correct. Review comments welcome

[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-08-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17342 Created: [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697) with the stack trace attached

[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-08-09 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17342 I'm going to recommend you file a SPARK bug on issues.apache.org & an HDFS one linked to it: "NPE in BlockReaderFactory log init". It looks like the creation of the LOG f

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-08-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 I know this hasn't been updated, but it is still important. I can take it on if all it needs is a test case

[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18628 Thanks for making sure this is consistent with other uses of Configuration.get(); consistency is critical here

[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131095350 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala --- @@ -50,6 +50,7 @@ private[hive

[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131094720 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but a common

[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131093892 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but a common

[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131093320 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but a common

[GitHub] spark pull request #18628: [SPARK-18061][ThriftServer] Add spnego auth suppo...

2017-08-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18628#discussion_r130598803 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala --- @@ -57,6 +59,19 @@ private[hive

[GitHub] spark pull request #18628: [SPARK-18061][ThriftServer] Add spnego auth suppo...

2017-08-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18628#discussion_r130598230 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala --- @@ -57,6 +59,19 @@ private[hive

[GitHub] spark issue #17747: [SPARK-11373] [CORE] Add metrics to the FsHistoryProvide...

2017-07-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17747 I know, I just have too many open JIRAs to try and manage --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17747: [SPARK-11373] [CORE] Add metrics to the FsHistoryProvide...

2017-07-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17747 Pushing up a new patch, rebased to work with master. It's getting boring all round for this patch: me having to do a merge, retest, repush. How about finalising the review so we can

[GitHub] spark issue #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fai...

2017-07-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18111 Is there anything else which needs to be done here, or is it a matter of finding the right reviewer?

[GitHub] spark issue #17747: [SPARK-11373] [CORE] Add metrics to the FsHistoryProvide...

2017-06-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17747 MiMa test failure was about a new method in the history server ``` [info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11

[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-06-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9518 BTW, here are some ongoing Hadoop JIRAs related to its shipping statsd: [HADOOP-12360](https://issues.apache.org/jira/browse/HADOOP-12360?focusedCommentId=16034826

[GitHub] spark pull request #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-06-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9518#discussion_r123483576 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/StatsdReporter.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software
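For context on the StatsD sink under review, here is a rough sketch of the plain-text StatsD line format. Each line has the form `name:value|type` and is sent as a UDP datagram, conventionally to port 8125. `StatsdSketch`, its `format` and `send` helpers, and the metric names are illustrative assumptions. They are not the `StatsdReporter` code in the diff.

```scala
import java.net.{DatagramPacket, DatagramSocket, InetAddress}
import java.nio.charset.StandardCharsets

// Hypothetical sketch of StatsD line formatting; not Spark's StatsdReporter.
object StatsdSketch {
  sealed trait MetricType { def code: String }
  case object Counter extends MetricType { val code = "c" }
  case object Gauge extends MetricType { val code = "g" }
  case object Timer extends MetricType { val code = "ms" }

  // Build one "prefix.name:value|type" line; omit the dot when prefix is empty.
  def format(prefix: String, name: String, value: String, kind: MetricType): String = {
    val fullName = if (prefix.isEmpty) name else s"$prefix.$name"
    s"$fullName:$value|${kind.code}"
  }

  // Send several lines as one newline-separated UDP datagram (fire and forget).
  def send(lines: Seq[String], host: String = "localhost", port: Int = 8125): Unit = {
    val payload = lines.mkString("\n").getBytes(StandardCharsets.UTF_8)
    val socket = new DatagramSocket()
    try socket.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(host), port))
    finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      format("spark", "driver.jvm.heap.used", "1024", Gauge),
      format("spark", "driver.tasks.completed", "1", Counter))
    lines.foreach(println)
    // send(lines) would emit them to a local StatsD daemon; not called here.
  }
}
```

Because StatsD runs over UDP, a reporter built this way never blocks on or learns about a missing daemon. That design choice keeps metrics out of the critical path, at the cost of silently lost datagrams.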

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-06-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 Testing should not be too hard. Here's my *untested* attempt ```scala val sconf = new SparkConf(false) sconf.set("fs.example.value", "true")

[GitHub] spark issue #17747: [SPARK-11373] [CORE] Add metrics to the FsHistoryProvide...

2017-06-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17747 I'm going to go with your suggestion and go via the metricServer to get at the state of counters and gauges; this is actually better in that it will verify that all metrics are making

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-06-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17747#discussion_r122295075 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -164,6 +169,16 @@ private[history] class

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-06-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17747#discussion_r122294995 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -129,6 +131,9 @@ private[history] class

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-06-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17747#discussion_r122279400 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -110,6 +117,14 @@ class HistoryServer

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-06-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17747#discussion_r122256050 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryMetricSource.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18247: [SPARK-13933][BUILD] Update hadoop-2.7 profile's curator...

2017-06-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18247 ..just caught this. No, no issues with it. A retrospective non-binding +1

[GitHub] spark issue #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fai...

2017-05-26 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18111 Not really. I thought about how I could do it, but essentially you do need to do things underneath the commit protocol, either in the Hadoop codebase (me) or in a test which somehow

[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...

2017-05-25 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18111#discussion_r118530138 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -73,7 +73,10 @@ class

[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...

2017-05-25 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/18111 [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fail meaningfully if FileOutputCommitter.getWorkPath==null ## What changes were proposed in this pull request? Handles
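The PR's goal can be sketched as a simple guard. When `FileOutputCommitter.getWorkPath` returns null, the guard fails with a meaningful error instead of passing the null further down. `WorkPathCheck.requireWorkPath` is hypothetical. It uses a `String` in place of Hadoop's `Path` so that it stays self-contained. The actual change in the PR may differ.

```scala
// Hypothetical guard; `workPath` stands in for the (possibly null) value
// returned by FileOutputCommitter.getWorkPath.
object WorkPathCheck {
  def requireWorkPath(workPath: String, committerName: String): String =
    Option(workPath).getOrElse {
      // Fail fast with a message naming the committer instead of a later NPE.
      throw new IllegalArgumentException(
        s"Committer $committerName returned a null work path; " +
          "cannot determine where task output should be written")
    }

  def main(args: Array[String]): Unit = {
    println(requireWorkPath("/tmp/job/_temporary/0", "FileOutputCommitter"))
    try requireWorkPath(null, "CustomCommitter")
    catch { case e: IllegalArgumentException => println(s"Rejected: ${e.getMessage}") }
  }
}
```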

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2017-05-25 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/9571
