For each element in a dataset, do something with another dataset

Pieter Hameete Tue, 29 Sep 2015 08:01:44 -0700

Good day everyone,

I am looking for a good way to do the following:


I have dataset A and dataset B, and for each element in dataset A I would
like to filter dataset B and obtain the size of the result. In short:

*for each element a in A -> B.filter( _ < a.propertyx).count*
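
As a minimal sketch of the intended semantics, assuming plain in-memory Scala collections rather than distributed datasets: the `Elem` type, its `id` field, the `Int` types, and the name `countsPerElement` are assumptions made for illustration only.

```scala
object PerElementCount {
  // Hypothetical element type; the field names and Int types are assumptions.
  final case class Elem(id: Int, propertyx: Int)

  // For each a in A, count the elements of B that are smaller than a.propertyx.
  // The result is keyed by a.id, so ids are assumed to be unique.
  def countsPerElement(as: Seq[Elem], bs: Seq[Int]): Map[Int, Long] =
    as.map(a => a.id -> bs.count(_ < a.propertyx).toLong).toMap
}
```

For example, with A = (id 1, propertyx 5), (id 2, propertyx 10) and B = 1, 4, 7, 12, the counts would be 2 for id 1 and 3 for id 2.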

Currently I compute the cross of dataset A and dataset B, producing tuples,
then filter for the tuples where field2 < field1.propertyx, group by
field1.id, and take the size of each group. However, this is not working
out in practice. When the datasets get larger, some Tasks hang on the CHAIN
Cross -> Filter, probably because there is insufficient memory for the cross
to complete?
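
The cross, filter, and group steps described above can be sketched on plain in-memory Scala collections. This is not the actual job, only an illustration of its shape: the `Elem` type, the `Int` types, and the name `countsViaCross` are assumptions.

```scala
object CrossSketch {
  // Hypothetical element type; the field names and Int types are assumptions.
  final case class Elem(id: Int, propertyx: Int)

  // Cross -> filter -> group by id -> group size, on in-memory collections.
  def countsViaCross(as: Seq[Elem], bs: Seq[Int]): Map[Int, Int] = {
    // Cross: materializes all |A| * |B| pairs before anything is discarded.
    val crossed = for (a <- as; b <- bs) yield (a, b)
    // Filter: keep only the pairs where the B value is below the A property.
    val kept = crossed.filter { case (a, b) => b < a.propertyx }
    // Group by the A element's id and take the size of each group.
    kept.groupBy { case (a, _) => a.id }.map { case (id, pairs) => id -> pairs.size }
  }
}
```

The intermediate result grows with the product of the two dataset sizes, which fits the suspicion that the cross is the expensive step. In this sketch, an element of A that matches nothing in B does not appear in the result at all, rather than appearing with a count of 0.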

Does anyone have a suggestion on how I could make this work, especially
with datasets that are larger than the memory available to a single Task?

Thank you in advance for your time :-)

Kind regards,

Pieter Hameete
