[GitHub] spark issue #15138: [SPARK-17583][SQL] Remove uesless rowSeparator variable ...

HyukjinKwon Sun, 18 Sep 2016 17:27:06 -0700

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/15138
  
    Yes, you are right, and yes, the purpose of the setting is to prevent 
OOM 
([documentation](https://github.com/uniVocity/univocity-parsers/blob/1ebd7e290b3f3d71a7ac9b8c5e3d1cfc220c8c96/src/main/java/com/univocity/parsers/common/CommonSettings.java#L35-L37)).
 I believe this limit was initially set by @falaki, and I remember getting a 
positive answer when I tried to increase this value.
    
    In the general case, it would make sense to set an explicit limit, because 
a parser could otherwise try to read a whole file as a single value within a column. 
However, I believe we are already reading and parsing line by line via 
`LineRecordReader` and via 
[CsvReader.parseLine(line: String)](https://github.com/apache/spark/blob/511f52f8423e151b0d0133baf040d34a0af3d422/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L59).
 Therefore, a column value can't exceed the length of its line, so I think the 
current limit is okay as a default value.
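
    To illustrate the reasoning above, here is a minimal Python sketch. It is 
not Spark or univocity code: `MAX_CHARS_PER_COLUMN`, `parse_line` and 
`parse_lines` are hypothetical names, the limit of 20 is made up, and the check 
runs after parsing rather than while buffering as a real parser would. It only 
shows that when input is split into lines first, no column value can be longer 
than the line it came from.

```python
import csv

# Hypothetical limit for illustration only; not the library's actual default.
MAX_CHARS_PER_COLUMN = 20


def parse_line(line, max_chars=MAX_CHARS_PER_COLUMN):
    """Parse a single CSV line and reject any value longer than max_chars."""
    values = next(csv.reader([line]), [])
    for value in values:
        if len(value) > max_chars:
            raise ValueError(
                f"column value of {len(value)} chars exceeds limit {max_chars}"
            )
    return values


def parse_lines(text, max_chars=MAX_CHARS_PER_COLUMN):
    """Split the input into lines first, then parse each line on its own."""
    for line in text.splitlines():
        values = parse_line(line, max_chars)
        # Each value is taken from this line, so it cannot be longer than it.
        assert all(len(v) <= len(line) for v in values)
        yield values
```

    For example, `list(parse_lines("a,b\nc,d"))` yields one list of values per 
line, and a line holding a single 21-character value raises `ValueError` under 
the assumed limit of 20.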
    
    BTW, Apache Commons CSV does not have this limit, IIRC.



