[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-24 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-159527637 OK, I will close this, since some partition implementations rely on their position in the partition array, so this implementation may fail in some cases.

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-24 Thread jerryshao
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/9597 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-16 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-157285732 So far I have not found any usage scenario in the RDD implementations that needs the position in the partition array. Let me dig more into the Spark code. I will add a unit test

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156794866 Hi @koeninger, how about this change? It still keeps the mapping relation, so the offset range can be retrieved through partitionId; it just filters out empty partitions.

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156822484 Are you 100% sure that all uses of the partition array only use the index associated with the individual Partition, and not its position in the array? At
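
The index-versus-position question raised here can be illustrated with a minimal sketch. It is plain Python, not Spark code: `Partition`, its `count` field, and the helper functions are hypothetical stand-ins introduced only for illustration. It assumes a filtered partition array in which each surviving partition keeps its original `index`.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Spark partition; not the real Spark class.
@dataclass(frozen=True)
class Partition:
    index: int  # identifier carried by the partition itself
    count: int  # number of messages (assumed field, for illustration only)


all_partitions = [Partition(0, 50), Partition(1, 0), Partition(2, 20)]

# Drop empty partitions but keep each survivor's original index.
filtered = [p for p in all_partitions if p.count > 0]

# After filtering, position in the array no longer equals the index.
position_matches_index = [pos == p.index for pos, p in enumerate(filtered)]


def get_by_position(parts, index):
    """Caller that treats the index as an array position."""
    return parts[index] if index < len(parts) else None


def get_by_index(parts, index):
    """Caller that looks up the partition by its own index field."""
    return {p.index: p for p in parts}.get(index)


missing = get_by_position(filtered, 2)  # position 2 no longer exists
found = get_by_index(filtered, 2)       # lookup by index still works
```

In this sketch a caller that only reads `p.index` is unaffected by the filtering, while a caller that uses the index as an array position either misses the partition or reads a different one.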

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156806763 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156806762 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156803004 **[Test build #45949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)** for PR 9597 at commit

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156806725 **[Test build #45949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)** for PR 9597 at commit

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156457252 Well, the implementation of HasOffsetRanges.offsetRanges in KafkaRDD is just the val offsetRanges provided on creation, so you're talking about a fair amount of

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-13 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156602165 I think I will not change `offsetRanges`; I will make a simple change so you can see if it is reasonable. Yes, some partitions are empty while others have

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156290949 https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#hasoffsetranges The 1:1 correspondence is also mentioned in the spark docs Kafka

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156291634 OK, got it. How about this: still keep the mapping relation from offset range to RDD partition, but filter out empty partitions, so `offset(i)` still maps to

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156282327 @koeninger, can you please point out in which scenarios people rely on this 1:1 guarantee, and how they use it? Actually I saw some users are

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156167582 This is going to break the invariant that offset ranges are 1:1 with spark partitions, which will definitely break some people's jobs in a non-obvious manner.
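
A minimal sketch of how the 1:1 invariant can break silently. It is plain Python, not Spark code: `OffsetRange` here is a hypothetical named tuple, the topic name and offsets are made up, and the sketch assumes that filtering drops empty ranges and renumbers the remaining partitions from zero. That assumption is for illustration and is not taken from the pull request's diff.

```python
from collections import namedtuple

# Hypothetical stand-in for an offset range; not Spark's actual class.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)

# The offset ranges supplied when the RDD is created (made-up values).
offset_ranges = [
    OffsetRange("events", 0, 100, 150),
    OffsetRange("events", 1, 200, 200),  # empty: no new messages
    OffsetRange("events", 2, 300, 320),
]


def is_empty(r):
    return r.from_offset == r.until_offset


# Assumed filtering: drop empty ranges and renumber partitions from 0.
non_empty = [r for r in offset_ranges if not is_empty(r)]
compacted = list(enumerate(non_empty))  # (partition_id, range actually read)


def lookup(partition_id):
    """User code that relies on offset_ranges[i] describing partition i."""
    return offset_ranges[partition_id]


# Partitions whose looked-up range differs from the range actually read.
mismatches = [
    (pid, actual, lookup(pid))
    for pid, actual in compacted
    if lookup(pid) != actual
]
```

Under these assumptions, partition 1 reads offsets 300 to 320 of Kafka partition 2, but the lookup reports the empty range of Kafka partition 1. Nothing raises an error, which is the non-obvious failure described in the comment above.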

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155428344 **[Test build #45524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45524/consoleFull)** for PR 9597 at commit

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155428515 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155428516 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/9597 [SPARK-11632][Streaming] Filter out empty partition in KafkaRDD Currently empty partitions or an empty `KafkaRDD` will still submit and run tasks remotely; this is unnecessary since no data is

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155420328 **[Test build #45524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45524/consoleFull)** for PR 9597 at commit

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155419891 Merged build triggered.

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-155419904 Merged build started.