Since you are doing a "self join", you don't actually need to use Trident's join, or the multi-reducer on which it is based. You could group the stream on your join key, then write an aggregator that collects all the tuples in each group and emits their cross product at the end of each batch (or works in a streaming fashion, where each incoming tuple emits the cross of that tuple with all the tuples already received). Finally, apply a filter downstream of the aggregator's output.

Your aggregator will have to take care of the "*" in your SQL example itself, since aggregators typically keep only the join key plus the "aggregated" value.
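
To make the three steps concrete, here is a rough sketch of the logic in plain Java. It deliberately does not use the Trident API: tuples are modelled as lists of strings, and the class name, method names, and field positions are all made up for illustration. In a real topology the grouping would presumably be done by groupBy on the join key, the cross product by a custom aggregator, and the last step by a filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names come from Trident.
public class SelfJoinSketch {
    static final int FIELD1 = 0; // assumed position of the join key
    static final int FIELD2 = 1; // assumed position of the filtered field

    // Steps 1 and 2: group one batch by join key, then emit the cross
    // product of each group. Every field of both tuples is kept, which is
    // what the "*" in the SQL asks for.
    static List<List<String>> crossByKey(List<List<String>> batch) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> tuple : batch) {
            groups.computeIfAbsent(tuple.get(FIELD1), k -> new ArrayList<>()).add(tuple);
        }
        List<List<String>> joined = new ArrayList<>();
        for (List<List<String>> group : groups.values()) {
            for (List<String> t1 : group) {
                for (List<String> t2 : group) {
                    List<String> row = new ArrayList<>(t1); // t1's fields first
                    row.addAll(t2);                         // then t2's fields
                    joined.add(row);
                }
            }
        }
        return joined;
    }

    // Step 3: the downstream filter, keeping rows where t1.field2 matches.
    static List<List<String>> keepKnown(List<List<String>> joined, String known) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> row : joined) {
            if (known.equals(row.get(FIELD2))) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> batch = Arrays.asList(
            Arrays.asList("a", "known value"),
            Arrays.asList("a", "other"),
            Arrays.asList("b", "known value"));
        for (List<String> row : keepKnown(crossByKey(batch), "known value")) {
            System.out.println(row);
        }
    }
}
```

With the three-tuple batch in main, the cross product has five rows (four for key "a", one for key "b") and the filter keeps three of them. Note that the collected group has to fit in memory for the duration of a batch, which is the main cost of doing the join this way.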

On Thu, Apr 24, 2014 at 5:46 PM, Charles LeDoux <[email protected]> wrote:

> Is it possible to join a trident stream with itself?
>
> My particular use case is that I want to take the cross product of all the
> incoming tuples for a batch and then only keep the joined tuples containing
> a known value.
>
> I believe the SQL for what I am trying to accomplish is:
>
> SELECT * FROM table AS t1 JOIN table AS t2 ON field1 WHERE t1.field2 =
> "known value";
>
> My intention was to do a self join on my stream and then run the now
> joined stream through a filter.
>
> Thanks,
> Charles
>
> --
> PhD Candidate; University Fellow
> University of Louisiana at Lafayette
> Center for Advanced Computer Studies
> http://charlesledoux.com
