[GitHub] spark pull request: Minimize the scope of runAsUser to support Had...

2014-03-27 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/257 Minimize the scope of runAsUser to support Hadoop token passing To support accessing a secure HDFS installation from Mesos executed Spark jobs, the resources attached to the `SparkContext

[GitHub] spark pull request: Minimize the scope of runAsUser to support Had...

2014-03-27 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/257#issuecomment-38820877 Yes, that is what I'm doing. It's the only way I've gotten jobs that access our secure HDFS cluster to run under Mesos. YARN sets this environmen

[GitHub] spark pull request: Minimize the scope of runAsUser to support Had...

2014-03-27 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/257#issuecomment-38830103 No, this was the bare minimum I could do to get a demo hacked together to run under Mesos. I haven't tested it extensively or tried standalone

[GitHub] spark pull request: Minimize the scope of runAsUser to support Had...

2014-08-26 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/257#issuecomment-53469729 Indeed, we are using Spark 1.0.x without this patch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: Minimize the scope of runAsUser to support Had...

2014-08-26 Thread NathanHowell
Github user NathanHowell closed the pull request at: https://github.com/apache/spark/pull/257

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Rebased again to pick up the build break hotfix in c618ccdbe9ac103dfa3182346e2a14a1e7fca91a

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100219153 --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala --- @@ -194,5 +195,8 @@ class PortableDataStream

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-09 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100344282 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,102 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-09 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100344879 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,102 @@ class JacksonParser

[GitHub] spark issue #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is...

2017-02-09 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16199 @HyukjinKwon Good idea, I'll take another stab and try to revive the original pull request.

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-09 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100474809 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,102 @@ class JacksonParser

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Jenkins, retest this please.

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100610662 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,98 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100640620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,98 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100646497 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,98 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100648524 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -394,36 +447,32 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100649635 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -31,10 +31,17 @@ import

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100649836 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -48,69 +47,110 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100650450 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -394,36 +447,32 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100651706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -79,7 +80,7 @@ private[sql] object

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100651910 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100651990 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100652192 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100652259 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100652445 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100653580 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100653757 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100653835 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100653879 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,123 @@ class JsonSuite extends

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-10 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 @cloud-fan I just pushed a few more changes to address some of your comments. I'll be back later next week to continue work.

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-16 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 @cloud-fan When implementing tests for the other modes I've uncovered an existing bug in schema inference in `DROPMALFORMED` mode: https://issues.apache.org/jira/browse/SPARK-19641. Sin

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-16 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r101671453 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,117 @@ class JsonSuite extends

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102662258 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102662330 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102662637 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102663016 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -43,23 +37,26 @@ class CSVFileFormat

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102665619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102665872 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102667812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-23 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16976#discussion_r102668816 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-22 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/16386 [SPARK-18352][SQL] Support parsing multiline json files ## What changes were proposed in this pull request? If a new option `wholeFile` is set to `true` the JSON reader will parse

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-22 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Hello recent JacksonGenerator.scala committers, please take a look. cc/ @rxin @hvanhovell @clockfly @hyukjinkwon @cloud-fan

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-23 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 @srowen It is functionally the same as what you're suggesting. The question is how (or if) it should be first class in the `DataFrameReader` api. If we agree that it should be ex

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-27 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93969732 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2016-12-27 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r93970059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -36,29 +31,31 @@ import

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-27 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 @HyukjinKwon I agree that overloading the corrupt record column is undesirable and `F.input_file_name` is a better way to fetch the filename. It would be nice to extend this concept further

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-27 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 The tests failed for an unrelated reason, looks to be running out of heap space in SBT somewhere.

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 @HyukjinKwon I just pushed a change that makes the corrupt record handling consistent: if a corrupt record column is defined it will always get the json text for failed records. If `wholeFile

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-01-10 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Can someone kick off the tests again? The last failure was in another module (Kafka).

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-01-10 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Jenkins, retest this please

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-12 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105563532 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -23,24 +23,25 @@ import

[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation ...

2017-03-12 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/17255 Would there be any additional benefit of replacing more (or all?) of the uses of `RDD` with the equivalent `Dataset` operations?

[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-14 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/17255#discussion_r105942833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -40,18 +40,11 @@ private[sql] object

[GitHub] spark issue #15813: [SPARK-18362][SQL] Use TextFileFormat in JsonFileFormat ...

2016-11-22 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/15813 Any thoughts on modifying `JsonToStruct` to support arrays (and options), then parsing could be something like: ``` dataset.select( Column(Inline( JsonToValue

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-01-23 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 Any other comments?

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100097749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100098008 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,125 @@ class JsonSuite extends

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100099791 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -31,10 +31,17 @@ import

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100100641 --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala --- @@ -194,5 +195,8 @@ class PortableDataStream

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100101464 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -160,7 +164,17 @@ public void writeTo(OutputStream out

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100103739 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -227,66 +267,71 @@ class JacksonParser

[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100104738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -298,22 +312,22 @@ class JacksonParser

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16386 I rebased to master and hopefully addressed all of your comments @cloud-fan, please have another look.

[GitHub] spark pull request #16084: [SPARK-18654][SQL] Remove unreachable patterns in...

2016-11-30 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/16084 [SPARK-18654][SQL] Remove unreachable patterns in makeRootConverter ## What changes were proposed in this pull request? `makeRootConverter` is only called with a `StructType` value

[GitHub] spark issue #16084: [SPARK-18654][SQL] Remove unreachable patterns in makeRo...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16084 cc/ @HyukjinKwon

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-11-30 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/16089 [SPARK-18658][SQL] Write text records directly to a FileOutputStream ## What changes were proposed in this pull request? This replaces uses of `TextOutputFormat` with an `OutputStream

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 This touches a fair number of components. I also haven't done any performance testing to see what the impact of this is. Curious what your thoughts are? cc/ @marmbrus @rxin @Josh

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 Yep. It uses the Hadoop `FileSystem` class to open files, just like `TextOutputFormat` does.

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90380594 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala --- @@ -132,39 +128,17 @@ class

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 Doh, forgot to run the Hive tests. Should be fixed now.

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-11-30 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90385252 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed to the

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90468563 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed to the

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90468858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala --- @@ -132,39 +128,17 @@ class

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90488454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala --- @@ -132,39 +128,17 @@ class

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90501927 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -194,4 +194,8 @@ private[sql] class

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90502343 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -147,6 +147,17 @@ public void writeTo(ByteBuffer buffer

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90503024 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the

[GitHub] spark issue #16089: [SPARK-18658][SQL] Write text records directly to a File...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16089 @steveloughran Spark is handling the output committing somewhere further up the stack. The path being passed in to `OutputWriterFactory.newInstance` is to a temporary file, such as `/private

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90509459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -194,4 +194,8 @@ private[sql] class

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90521381 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -245,24 +230,12 @@ private[csv] class

[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

2016-12-01 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16089#discussion_r90566162 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -147,6 +147,17 @@ public void writeTo(ByteBuffer buffer

[GitHub] spark issue #16107: SPARK-18677: Fix parsing ['key'] in JSON path expression...

2016-12-02 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16107 I wrote the buggy version, doh... but this LGTM. Thanks for the fix.

[GitHub] spark pull request #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in ...

2016-12-07 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/16199 [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is inconsistent ## What changes were proposed in this pull request? This relaxes the parsing of `Float` and `Double` columns to

[GitHub] spark issue #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is...

2016-12-07 Thread NathanHowell
Github user NathanHowell commented on the issue: https://github.com/apache/spark/pull/16199 Hello @HyukjinKwon, can you take a look at this one? I am unsure if we should be accepting lowercased values like `nan` (versus strictly testing for `NaN`) but I think this PR matches the

[GitHub] spark pull request #16375: [SPARK-18963] o.a.s.unsafe.types.UTF8StringSuite....

2016-12-21 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16375#discussion_r93460507 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -591,7 +591,11 @@ public void

[GitHub] spark pull request: [SPARK-3858][SQL] Pass the generator alias int...

2014-10-08 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/2721 [SPARK-3858][SQL] Pass the generator alias into logical plan node The alias parameter is being ignored, which makes it more difficult to specify a qualifier for Generator expressions. You can

[GitHub] spark pull request: [SPARK-3858][SQL] Pass the generator alias int...

2014-10-08 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/2721#issuecomment-58452145 It works properly from Hive... `HiveQl.withLateralView` creates `Generate` instances directly and doesn't go through the `SchemaRDD.generate` helper functio

[GitHub] spark pull request: [SPARK-3858][SQL] Pass the generator alias int...

2014-10-08 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/2721#issuecomment-58460036 Alright, I've added a test that fails on master and is fixed by this pull request.

[GitHub] spark pull request: SPARK-4228 SchemaRDD to JSON

2014-11-14 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/3213#issuecomment-63151280 Another approach is to use a `JsonGenerator` instead of an `ObjectMapper`. This is the implementation I've been using for a while: https://gist.githu

[GitHub] spark pull request: [SPARK-5938][SQL] Improve JsonRDD performance

2015-04-30 Thread NathanHowell
GitHub user NathanHowell opened a pull request: https://github.com/apache/spark/pull/5801 [SPARK-5938][SQL] Improve JsonRDD performance This patch comprises a few related pieces of work: * Schema inference is performed directly on the JSON token stream * `String

[GitHub] spark pull request: [SPARK-5938][SQL] Improve JsonRDD performance

2015-04-30 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-97699564 Looks like it may also resolve [SPARK-5443](https://issues.apache.org/jira/browse/SPARK-5443).

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-04-30 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-97959395 Benchmarked a small-ish real dataset... Runs are with 5 executors (for 5 input splits) with data in HDFS: step | before | after

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-01 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-98081628 I think it's in a decent state now; if this qualifies for the 1.4.0 merge window, I'll make time to work through any remaining issues.

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-01 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-98251841 @yhuai Fine with me, I'm reworking the patch set now.

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-01 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-98258602 @yhuai The updated patches do not test the old code. Do you have an opinion on the best way to address this? I can duplicate the entire JsonSuite or try to do

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-01 Thread NathanHowell
Github user NathanHowell commented on the pull request: https://github.com/apache/spark/pull/5801#issuecomment-98260541 @marmbrus sounds good, I'll leave it as is.

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-03 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/5801#discussion_r29565276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD2.scala --- @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-03 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/5801#discussion_r29566704 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD2.scala --- @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-03 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/5801#discussion_r29566701 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD2.scala --- @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5938][SPARK-5443][SQL] Improve JsonRDD ...

2015-05-03 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/5801#discussion_r29566889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala --- @@ -101,32 +103,83 @@ private[sql] class DefaultSource
