[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-02 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 univocity-parsers-2.7.3 released. Thanks!

[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 Thanks @MaxGekk I've fixed the error and also made the parser generally run faster than before when processing fields that were not selected. Can you please retest with the latest SNAPSHOT
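The speed-up for unselected fields comes from not materializing their contents: the parser still scans past them but never copies them out. A minimal, hypothetical Java sketch of the idea (illustrative only, not univocity's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class FieldSelectionSketch {
    // Parse one comma-separated line, creating String objects only for
    // the selected column indexes; other fields are scanned but never copied.
    static List<String> parseSelected(String line, boolean[] selected) {
        List<String> out = new ArrayList<>();
        int col = 0, start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == ',') {
                if (col < selected.length && selected[col]) {
                    out.add(line.substring(start, i)); // materialize only if selected
                }
                start = i + 1;
                col++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] selected = {true, false, true};
        System.out.println(parseSelected("a,b,c", selected)); // [a, c]
    }
}
```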

[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-07-31 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 Has anyone had a chance to test with the 2.7.3-SNAPSHOT build I released to see if the performance issue has been addressed? If it has, let me know and I'll release the final 2.7.3 build

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-14 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/17177 2.4.0 released, thank you guys!

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/17177 Doesn't seem correct to me. All test cases are using broken CSV and trigger the parser's handling of unescaped quotes, where it tries to rescue the data and produce something sensible. See my test case
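One way such a rescue can work is to treat a quote as closing only when it is immediately followed by the delimiter or the end of input; any other quote is kept as literal content. A hypothetical Java sketch of that strategy (names and logic are illustrative, not univocity's code):

```java
public class UnescapedQuoteSketch {
    // Read a quoted field starting at the opening quote at index `from`.
    // A quote ends the field only when followed by ',' or end of input;
    // interior unescaped quotes are kept as part of the value.
    static String readQuotedField(String input, int from) {
        StringBuilder sb = new StringBuilder();
        for (int i = from + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"' && (i + 1 == input.length() || input.charAt(i + 1) == ',')) {
                return sb.toString(); // proper closing quote
            }
            sb.append(c); // unescaped quote inside the value: keep it
        }
        return sb.toString(); // unterminated quote: rescue what we have
    }

    public static void main(String[] args) {
        // The inner quotes around "quoted" are not escaped; the sketch
        // still recovers a sensible value: a "quoted" word
        System.out.println(readQuotedField("\"a \"quoted\" word\",next", 0));
    }
}
```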

[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/13267#issuecomment-221454486 @rxin In your case I think it's better to have this turned on by default. Regarding your other questions: 1 - There's no timeline. 2.2.x will come out when new

[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/13267#issuecomment-221408197 It's disabled by default because earlier versions were slower when writing CSV and it helped a little bit. Also because parsing unquoted values is faster
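The write-side optimization mentioned above amounts to quoting a value only when it actually needs it. A minimal Java sketch of that rule (illustrative only, not the univocity writer):

```java
public class QuoteOnDemandSketch {
    // Quote a value only when it contains the delimiter, a quote, or a
    // line ending; unquoted output is cheaper to write and to re-parse.
    static String writeValue(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Escape embedded quotes by doubling them, then wrap in quotes.
        return '"' + value.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println(writeValue("plain"));      // plain
        System.out.println(writeValue("a,b"));        // "a,b"
        System.out.println(writeValue("say \"hi\"")); // "say ""hi"""
    }
}
```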

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216770729 By the way, may I suggest you upgrade to version 2.1.0, as it comes with substantial performance improvements for parsing and writing CSV.

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216747285 Foo and bar are part of the same value, they just happen to have a line ending in between. And yes, `setLineSeparator()` is related to the values themselves when

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216743260 What happens if you do this:

```
scala> "foo\r\nbar\r\n".stripLineEnd
```

Shouldn't the result be this?

```
res0: String = fo
```
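For reference, a small Java sketch that mimics what Scala's `stripLineEnd` is documented to do: remove a single trailing line ending while leaving any embedded line endings alone (my reading of the behavior, worth checking against the Scala docs):

```java
public class StripLineEndSketch {
    // Remove exactly one trailing line ending (\r\n, \n, or \r);
    // line endings in the middle of the string are untouched.
    static String stripLineEnd(String s) {
        if (s.endsWith("\r\n")) return s.substring(0, s.length() - 2);
        if (s.endsWith("\n") || s.endsWith("\r")) return s.substring(0, s.length() - 1);
        return s;
    }

    public static void main(String[] args) {
        // Only the final \r\n is removed; "foo" and "bar" stay on
        // separate lines inside the value.
        System.out.println(stripLineEnd("foo\r\nbar\r\n").equals("foo\r\nbar")); // true
    }
}
```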

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216738292 I just read the rest of this ticket. Be careful with `setLineSeparator()`. It uses the default OS line separator, but that's not always desired. By default
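To avoid depending on the OS default, the separator can be passed explicitly when assembling output. A minimal Java sketch of the difference (illustrative only, not the univocity API):

```java
public class LineSeparatorSketch {
    // Join rows with an explicit separator instead of the OS default.
    // System.lineSeparator() is "\r\n" on Windows and "\n" elsewhere,
    // so relying on it makes the output platform-dependent.
    static String joinRows(String[] rows, String lineSeparator) {
        return String.join(lineSeparator, rows) + lineSeparator;
    }

    public static void main(String[] args) {
        String[] rows = {"a,b", "c,d"};
        System.out.println(joinRows(rows, "\n").equals("a,b\nc,d\n"));       // true
        System.out.println(joinRows(rows, "\r\n").equals("a,b\r\nc,d\r\n")); // true
    }
}
```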

[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request: https://github.com/apache/spark/pull/12818#issuecomment-216737346 Confirmed. It is only used if you call `CsvWriter.commentRow()` or `CsvWriter.commentRowToString()` to write comments to the output.