[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317178#comment-15317178 ] Apache Spark commented on SPARK-15654: User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-06-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310508#comment-15310508 ] Apache Spark commented on SPARK-15654: User 'maropu' has created a pull request for this issue:

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-31 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307675#comment-15307675 ] Takeshi Yamamuro commented on SPARK-15654: Oh, my bad. Fixed.

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-31 Thread Jurriaan Pruis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307425#comment-15307425 ] Jurriaan Pruis commented on SPARK-15654: You need to override maxSplitBytes, not

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-30 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307043#comment-15307043 ] Takeshi Yamamuro commented on SPARK-15654: Seems a root cause is that LineRecordReader cannot

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-30 Thread Jurriaan Pruis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306866#comment-15306866 ] Jurriaan Pruis commented on SPARK-15654: Sorry, not sure about other formats. So this is due to

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306831#comment-15306831 ] Michael Armbrust commented on SPARK-15654: Thanks for pointing this out! Looks like we need to at

[jira] [Commented] (SPARK-15654) Reading gzipped files results in duplicate rows

2016-05-30 Thread Jurriaan Pruis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306731#comment-15306731 ] Jurriaan Pruis commented on SPARK-15654: cc [~davies] [~marmbrus] I saw you guys worked on code