GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/9346
[SPARK-11393][SQL] CoGroupedIterator should respect the fact that
GroupedIterator.hasNext is not idempotent
When we cogroup two `GroupedIterator`s in `CoGroupedIterator` and the right
side's group key is smaller, we consume the right group and leave the left
group untouched. The next call to `hasNext` then calls `left.hasNext`, which
makes the left `GroupedIterator` generate an extra group, because the previous
one has not been consumed yet.
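
To illustrate the failure mode, here is a minimal toy model in Python. It is
not Spark's code: `NaiveGroupedIterator`, `IdempotentGroupedIterator`, and the
`has_next`/`next` methods are hypothetical names, and the guard shown is only
one possible way to avoid the problem, not necessarily what this patch does.
The sketch assumes the input is already sorted by key.

```python
class NaiveGroupedIterator:
    """Toy model (hypothetical): has_next() fetches a new group on every call."""

    def __init__(self, sorted_pairs):
        self._pairs = list(sorted_pairs)  # (key, value) pairs, sorted by key
        self._pos = 0
        self._current = None

    def has_next(self):
        # Side effect: each call advances to the next group and overwrites
        # any group that was fetched earlier but never returned by next().
        if self._pos >= len(self._pairs):
            self._current = None
            return False
        key = self._pairs[self._pos][0]
        values = []
        while self._pos < len(self._pairs) and self._pairs[self._pos][0] == key:
            values.append(self._pairs[self._pos][1])
            self._pos += 1
        self._current = (key, values)
        return True

    def next(self):
        return self._current


class IdempotentGroupedIterator(NaiveGroupedIterator):
    """Same toy model, but a pending group is kept until next() consumes it."""

    def has_next(self):
        # Only fetch a new group when no unconsumed group is pending.
        if self._current is not None:
            return True
        return super().has_next()

    def next(self):
        group, self._current = self._current, None
        return group


pairs = [("a", 1), ("a", 2), ("b", 3)]

naive = NaiveGroupedIterator(pairs)
naive.has_next()
naive.has_next()             # second call silently skips group "a"
naive_first = naive.next()   # ("b", [3])

safe = IdempotentGroupedIterator(pairs)
safe.has_next()
safe.has_next()              # no effect: group "a" is still pending
safe_first = safe.next()     # ("a", [1, 2])
```

In the naive version the second `has_next()` call discards the unconsumed
group, which mirrors the extra group described above. A caller can equally
avoid the problem by not calling `has_next()` again while a group is pending.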
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark cogroup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9346.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9346
----
commit 9be67c8ae302c6596aaf34c68aa12ed8c56d058f
Author: Wenchen Fan <[email protected]>
Date: 2015-10-29T04:06:07Z
SPARK-11393
----