Re: Iterating filtered Grouped elements on SparkRunner

Jan Lukavský Thu, 24 Oct 2019 04:20:18 -0700

Hi Noam,

we are working towards fixing this bug so that your code will work, but that will not be sooner than version 2.18.0 (and I cannot promise even that :-)). In the meantime, you have several options:

a) use merging windowing - that will unfortunately mean some performance penalty and is more of a dirty hack than anything else, but it might work

b) split the logic into two parts - one part to calculate the cardinality of each group (you can use Count.perKey [1]), and then join this on the keys of the GBK result (e.g. [2] or [3]), filter there using the calculated cardinality and calculate your result

Option (b) should actually be more efficient if you have large groups, because the calculation of the cardinality can be parallelised.
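To make the data flow of option (b) concrete, here is a minimal sketch in plain Java. It is not Beam code: ordinary collections stand in for PCollections, countPerKey plays the role of Count.perKey, groupByKey the role of the GBK, and the final loop the role of the join on keys. The class name, the method names, the minCount threshold and the per-group sum are all hypothetical, since the attached code is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of option (b); names are illustrative, not Beam API.
final class TwoPassFilter {

    // Part 1: cardinality per key, computed from the raw elements
    // without touching the grouped values (the role of Count.perKey).
    public static Map<String, Long> countPerKey(List<Map.Entry<String, Integer>> input) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : input) {
            counts.merge(e.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    // Part 2: group the values by key (the role of the GBK).
    public static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> input) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : input) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    // Join both parts on the key, filter on the precomputed count,
    // and iterate each surviving group exactly once.
    public static Map<String, Integer> sumLargeGroups(
            List<Map.Entry<String, Integer>> input, long minCount) {
        Map<String, Long> counts = countPerKey(input);
        Map<String, List<Integer>> groups = groupByKey(input);
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            long cardinality = counts.getOrDefault(group.getKey(), 0L);
            if (cardinality < minCount) {
                continue; // filtered out without reading the group's values
            }
            int sum = 0;
            for (int v : group.getValue()) { // the single pass over the values
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 5), Map.entry("a", 3));
        System.out.println(sumLargeGroups(input, 2)); // prints {a=6}
    }
}
```

The point of the split is that the filter decision comes from the separately computed count, so the grouped values are only iterated once, in the final calculation.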


Hope this helps, please feel free to ask any more questions.

Jan

[1] https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/transforms/Count.html


[2] https://beam.apache.org/documentation/sdks/java/euphoria/#join

[3] https://beam.apache.org/documentation/sdks/java-extensions/#join-library


On 10/24/19 11:46 AM, Gershi, Noam wrote:

Hi,

I would like to:

1. Group elements

2. Then filter out some groups

3. Then iterate over the remaining groups and calculate a result

Under Spark execution environments, I get an exception:
Caused by: java.lang.IllegalStateException: ValueIterator can't be iterated more than once,otherwise there could be data lost
                at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions$GroupByKeyIterator$ValueIterator.iterator(GroupNonMergingWindowsFunctions.java:221)
                at java.lang.Iterable.spliterator(Iterable.java:101)
                at com.company.Main$2.processElement(Main.java:65)

Code attached

*Noam Gershi*

Software Developer

*T*:+972 (3) 7405718
