Hi all, I am working on a pipeline that needs to join two Spark streams. The input is a stream of integers, and the output, for each integer, is the number of times it appears divided by the total number of unique integers. Suppose the input is:
1 2 3 1 2 2

There are 3 unique integers, and 1 appears twice, so the output for the integer 1 should be:

1 0.67

Since the input comes from a stream, it seems we need to first join the per-integer counts with the total number of unique integers and then compute the ratio with a map. I am thinking of adding a dummy key to both streams and using join, although a Cartesian product seems to match the application better. What is the most effective way to do this? Thanks! Bill
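
P.S. Here is a rough sketch of the dummy-key approach I have in mind, assuming the DStream API with a socket source on localhost:9999 and one integer per line (the host, port, batch interval, and dummy key "all" are just placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object FractionOfUnique {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("FractionOfUnique").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        // One integer per line from the socket source.
        val ints = ssc.socketTextStream("localhost", 9999).map(_.trim.toInt)

        // Stream 1: how many times each integer appears in the batch,
        // e.g. (1, 2), (2, 3), (3, 1) for the example input.
        val counts = ints.countByValue()

        // Stream 2: number of unique integers in the batch (a single value per batch).
        val uniqueTotal = ints.transform(_.distinct()).count()

        // Add the same dummy key to both streams so they can be joined.
        val keyedCounts = counts.map { case (i, c) => ("all", (i, c)) }
        val keyedTotal  = uniqueTotal.map(n => ("all", n))

        // Join on the dummy key and compute count / number of unique integers.
        val fractions = keyedCounts.join(keyedTotal).map {
          case (_, ((i, c), total)) => (i, c.toDouble / total)
        }
        fractions.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Since the unique-count stream holds exactly one value per batch, the dummy-key join is effectively the Cartesian product; I suppose the same thing could be done with transformWith and RDD.cartesian, but I am not sure which is more efficient.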