Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/13007
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218097561
@WeichenXu123 Currently it is possible to support multiple lines because
the lines read from `LineRecordReader` become a `Reader` (a byte stream) by
`StringIteratorR
Github user WeichenXu123 commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218094762
@HyukjinKwon
Well, the current CSV load code uses Hadoop's `LineRecordReader`, which does
not allow a row to be split across multiple lines, so I think the code should disable csv multi-l
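The disagreement in the two comments above is whether a quoted field that contains a line break can survive a line-oriented reader. The following is a minimal sketch of that difference, not Spark code: it uses Python's standard `csv` module as a stand-in parser, and the sample data and both function names are hypothetical. It assumes the only difference between the two paths is whether the parser sees each physical line in isolation or the lines as one continuous stream.

```python
import csv
import io

# Hypothetical sample: a header row, then one record whose quoted
# second field contains an embedded newline.
RAW = 'id,comment\n1,"first line\nsecond line"\n'


def parse_line_by_line(text):
    """Parse every physical line on its own, as if each line were
    handed to the CSV parser as a complete, independent record."""
    rows = []
    for line in text.splitlines():
        rows.extend(csv.reader([line]))
    return rows


def parse_as_stream(text):
    """Feed the lines to the parser as one continuous stream, so the
    parser can follow a quoted field across a line break."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(len(parse_line_by_line(RAW)), "rows when parsed line by line")
    print(len(parse_as_stream(RAW)), "rows when parsed as a stream")
```

Parsed line by line, the quoted record is broken into two rows, so the sample yields three rows instead of two. Parsed as a stream, the embedded newline stays inside the second field of a single record.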
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218040423
@WeichenXu123 Also, I realised the test in `CSVSuite` only covers data. If
we had a test for the header, it would fail. I mean some might want a header
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218039838
@WeichenXu123 [External CSV data
source](https://github.com/databricks/spark-csv) supports this but has an issue
for parsing unescaped quotes, here,
https://issues
Github user WeichenXu123 commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218037385
@HyukjinKwon I ran the existing tests against this patch and they all pass. If
needed, I will add a new test in `CSVSuite`.
And I think the only cause of the bug is reading
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218018326
In addition, I think a test is needed for this as well in `CSVSuite`. Also,
it would be nicer not to comment out the original code. I believe the current
change breaks t
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-218014048
I think if we really need to solve this problem, we need an option for
handling unescaped quotes.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13007#discussion_r62587043
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
---
@@ -63,7 +63,9 @@ class DefaultSource extends F
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13007#discussion_r62586730
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
---
@@ -63,7 +63,9 @@ class DefaultSource extends F
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13007#issuecomment-217872808
Can one of the admins verify this patch?