Hi all,

I am working on a pipeline that needs to join two Spark streams. The input
is a stream of integers. And the output is the number of integer's
appearance divided by the total number of unique integers. Suppose the
input is:

1
2
3
1
2
2

There are 3 unique integers and 1 appears twice. Therefore, the output for
the integer 1 will be:
1 0.67

Since the input is from a stream, it seems we need to first join the
appearance of the integers and the total number of unique integers and then
do a calculation using map. I am thinking of adding a dummy key to both
streams and use join. However, a Cartesian product matches the application
here better. How to do this effectively? Thanks!

Bill

Reply via email to