Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-159527637
OK, I will close this, since some partition implementations rely on the
position in the partition array, so this implementation may fail in some
cases.
---
Github user jerryshao closed the pull request at:
https://github.com/apache/spark/pull/9597
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-157285732
Currently I can't find a usage scenario that needs the position in the
partition array in the RDD implementation. Let me dig more into the Spark
code. I will add a unit test
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156794866
Hi @koeninger , how about this change? It still keeps the mapping relation,
so the offset range can be retrieved through the partitionId, and just
filters out empty partitions.
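A minimal sketch of the kind of change being discussed (all names here are hypothetical stand-ins, not the actual PR code): keep the full `offsetRanges` array so lookup by the original index still works, and build Spark partitions only for the non-empty ranges, with each partition remembering which range it came from.

```scala
// Hypothetical stand-ins for illustration; not the actual PR code.
case class OffsetRange(topic: String, partition: Int,
                       fromOffset: Long, untilOffset: Long)

// A stub "Spark partition" that remembers the index of its offset range
// in the original (unfiltered) array.
case class KafkaPartitionStub(index: Int, rangeIndex: Int)

object FilterEmptyPartitions {
  def makePartitions(offsetRanges: Array[OffsetRange]): Array[KafkaPartitionStub] =
    offsetRanges.zipWithIndex
      .filter { case (r, _) => r.untilOffset > r.fromOffset } // drop empty ranges
      .zipWithIndex                                           // renumber contiguously
      .map { case ((_, origIdx), newIdx) => KafkaPartitionStub(newIdx, origIdx) }

  def main(args: Array[String]): Unit = {
    val ranges = Array(
      OffsetRange("t", 0, 0L, 10L),
      OffsetRange("t", 1, 5L, 5L),   // empty: filtered out
      OffsetRange("t", 2, 7L, 9L))
    val parts = makePartitions(ranges)
    assert(parts.length == 2)
    // The offset range is still retrievable through the stored index.
    assert(ranges(parts(1).rangeIndex).partition == 2)
    println("filter demo ok")
  }
}
```

The catch, raised later in the thread, is that anything relying on the position of a partition within the array (rather than the stored index) would still break.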
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156822484
Are you 100% sure that all uses of the partition array only use the index
associated with the individual Partition, and not its position in the array?
At
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156806763
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156806762
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156803004
**[Test build #45949 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)**
for PR 9597 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156806725
**[Test build #45949 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)**
for PR 9597 at commit
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156457252
Well, the implementation of HasOffsetRanges.offsetRanges in KafkaRDD is
just the val offsetRanges provided on creation, so you're talking about a
fair amount of
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156602165
I think I will not change `offsetRanges`; I will make a simple change so
you can see if it is reasonable.
Yes, some partitions are empty while others have
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156290949
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#hasoffsetranges
The 1:1 correspondence is also mentioned in the spark docs Kafka
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156291634
OK, got it. How about this: still keep the mapping relation from offset
range to RDD partition, but filter out empty partitions, so `offset(i)` still
maps to
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156282327
@koeninger , can you please point out the scenarios in which people use
this 1:1 guarantee, and how people rely on this 1:1 restriction?
Actually I saw some users are
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-156167582
This is going to break the invariant that offset ranges are 1:1 with Spark
partitions, which will definitely break some people's jobs in a non-obvious
manner.
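The invariant in question can be illustrated with a small sketch (a simplified `OffsetRange` stand-in with no Spark dependency; values are illustrative only): user code commonly looks up the offset range for a task by its partition id, which is only correct while `offsetRanges(i)` describes Spark partition `i`.

```scala
// Simplified stand-in for Spark's OffsetRange; field names mirror the
// real class, but this sketch has no Spark dependency.
case class OffsetRange(topic: String, partition: Int,
                       fromOffset: Long, untilOffset: Long)

object OffsetRangeInvariant {
  // Typical user code: inside a task, look up "my" offset range by
  // partition id. This relies on offsetRanges(i) being 1:1 with
  // Spark partition i.
  def rangeForTask(offsetRanges: Array[OffsetRange], partitionId: Int): OffsetRange =
    offsetRanges(partitionId)

  def main(args: Array[String]): Unit = {
    val ranges = Array(
      OffsetRange("t", 0, 0L, 100L),
      OffsetRange("t", 1, 0L, 0L),    // empty Kafka partition
      OffsetRange("t", 2, 50L, 60L))

    // With the 1:1 invariant intact, task 2 sees Kafka partition 2.
    assert(rangeForTask(ranges, 2).partition == 2)

    // If empty ranges were silently filtered out, the same lookup would
    // point task 1 at Kafka partition 2's offsets instead.
    val filtered = ranges.filter(r => r.untilOffset > r.fromOffset)
    assert(rangeForTask(filtered, 1).partition == 2)
    println("invariant demo ok")
  }
}
```

This index-based lookup is why dropping partitions without updating every consumer of `offsetRanges` fails silently rather than loudly.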
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155428344
**[Test build #45524 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45524/consoleFull)**
for PR 9597 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155428515
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155428516
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/9597
[SPARK-11632][Streaming] Filter out empty partition in KafkaRDD
Currently, empty partitions or an empty `KafkaRDD` will still submit and run
tasks remotely; this is unnecessary since no data is
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155420328
**[Test build #45524 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45524/consoleFull)**
for PR 9597 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155419891
Merged build triggered.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9597#issuecomment-155419904
Merged build started.
---