Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
univocity-parsers-2.7.3 released. Thanks!
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
Thanks @MaxGekk, I've fixed the error and also made the parser run faster
than before when processing fields that were not selected.
Can you please retest with the latest SNAPSHOT?
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
Has anyone had a chance to test with the 2.7.3-SNAPSHOT build I released to
see if the performance issue has been addressed? If it has, let me know
and I'll release the final 2.7.3 build
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/17177
2.4.0 released, thank you guys!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/17177
This doesn't seem correct to me. All the test cases use broken CSV and trigger
the parser's handling of unescaped quotes, where it tries to rescue the data and
produce something sensible. See my test case
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/13267#issuecomment-221454486
@rxin In your case I think it's better to have this turned on by default.
Regarding your other questions:
1 - There's no timeline. 2.2.x will come out when new
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/13267#issuecomment-221408197
It's disabled by default because earlier versions were slower when writing
CSV and it helped a little bit. Also because parsing unquoted values is
faster
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216770729
By the way, may I suggest you upgrade to version 2.1.0, as it comes
with substantial performance improvements for parsing and writing CSV.
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216747285
Foo and bar are part of the same value; they just happen to have a line
ending in between. And yes, `setLineSeparator()` is related to the values
themselves when
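To illustrate the point above for readers following along: a CSV value may itself contain a line ending, in which case it must be quoted so the embedded `\r\n` is not mistaken for a record boundary. This is a minimal, stdlib-only Java sketch with hand-rolled RFC 4180-style quoting for demonstration, not univocity-parsers' actual implementation:

```java
public class MultiLineValueDemo {
    // Quote a CSV field, escaping embedded quotes by doubling them (RFC 4180 style).
    public static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // "foo" and "bar" are one value with a line ending in between;
        // quoting keeps the embedded "\r\n" inside the field.
        String value = "foo\r\nbar";
        String record = quote(value) + ",other";
        // The record spans two physical lines but is a single CSV row.
        System.out.println(record);
    }
}
```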
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216743260
What happens if you do this:
```
scala> "foo\r\nbar\r\n".stripLineEnd
```
Shouldn't the result be this?
```
res0: String = fo
```
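The behaviour under discussion (stripping only the trailing line ending, leaving any embedded ones intact) can be sketched in plain Java for those without a Scala REPL handy; `stripTrailingLineEnd` is a hypothetical helper written for illustration, not an exact port of Scala's `stripLineEnd`:

```java
public class StripLineEndDemo {
    // Remove a single trailing line ending ("\r\n", "\n", or "\r"), if present.
    public static String stripTrailingLineEnd(String s) {
        if (s.endsWith("\r\n")) return s.substring(0, s.length() - 2);
        if (s.endsWith("\n") || s.endsWith("\r")) return s.substring(0, s.length() - 1);
        return s;
    }

    public static void main(String[] args) {
        // Only the trailing "\r\n" is removed; the embedded one stays,
        // because it is part of the value, not a record terminator.
        String stripped = stripTrailingLineEnd("foo\r\nbar\r\n");
        System.out.println(stripped.equals("foo\r\nbar")); // prints "true"
    }
}
```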
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216738292
I just read the rest of this ticket. Be careful with
`setLineSeparator()`: it uses the default OS line separator, but that's not
always desired.
By default
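The OS-dependence being cautioned about can be seen with the JDK alone: `System.lineSeparator()` returns `"\r\n"` on Windows and `"\n"` on Unix-like systems, so output built with the platform default differs byte-for-byte across machines. A minimal sketch, independent of any CSV library:

```java
public class LineSeparatorDemo {
    public static void main(String[] args) {
        // Platform default: "\r\n" on Windows, "\n" on Linux/macOS.
        String sep = System.lineSeparator();

        // Rows joined with the platform default vs. a fixed separator; the
        // two agree only when the OS default happens to be "\n", which is
        // why relying on the OS default is not always desired.
        String platformCsv = String.join(sep, "a,b", "c,d");
        String fixedCsv = String.join("\n", "a,b", "c,d");
        System.out.println(platformCsv.equals(fixedCsv));
    }
}
```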
Github user jbax commented on the pull request:
https://github.com/apache/spark/pull/12818#issuecomment-216737346
Confirmed. It is only used if you call `CsvWriter.commentRow()` or
`CsvWriter.commentRowToString()` to write comments to the output.