[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-28 Thread Ding Fei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529252#comment-15529252 ] Ding Fei commented on SPARK-17633: I think the count problem could be viewed as a bug issue. HadoopRDD

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513182#comment-15513182 ] Sean Owen commented on SPARK-17633: The issue is more at the HDFS API level, which Spark uses to read

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Anshul (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513168#comment-15513168 ] Anshul commented on SPARK-17633: What could be the possible reason for this? As spark's transformations

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513125#comment-15513125 ] Sean Owen commented on SPARK-17633: Yeah I can reproduce that. It is weird behavior, but, it's due to

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Anshul (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513116#comment-15513116 ] Anshul commented on SPARK-17633: data.csv 1,"a" 2,"b" val x=sc.textFile("data.csv") x.count is 2 If I

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Anshul (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513110#comment-15513110 ] Anshul commented on SPARK-17633: RDD is not cached, in this scenario. > texFile() and wholeTextFiles()

[jira] [Commented] (SPARK-17633) texFile() and wholeTextFiles() count difference

2016-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513013#comment-15513013 ] Sean Owen commented on SPARK-17633: It's not clear what you're reporting. textFiles and wholeTextFiles