Maybe something like PCollection<T>#join(PCollection<T>, JoinType) : PCollection<Pair<T, T>>?
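
To make the semantics I have in mind concrete, here is a rough sketch. It is not Crunch code: plain java.util.List stands in for PCollection, and the JoinType and Pair types below are local, hypothetical stand-ins rather than the Crunch classes. It assumes elements are matched on equals(), takes the enum route for the join type, and throws on null elements instead of silently misbehaving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class CollectionJoinSketch {

    // Local stand-in for a join-type enum (hypothetical, not the Crunch class).
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Local stand-in for a pair type; one side is null for unmatched outer-join rows.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
        @Override public String toString() { return "[" + first + "," + second + "]"; }
    }

    // Joins two collections on the elements themselves (matched via equals/hashCode).
    static <T> List<Pair<T, T>> join(List<T> left, List<T> right, JoinType type) {
        // Index the right side by element, failing fast on nulls.
        Map<T, List<T>> rightByValue = new HashMap<>();
        for (T r : right) {
            Objects.requireNonNull(r, "null element in right collection");
            rightByValue.computeIfAbsent(r, k -> new ArrayList<>()).add(r);
        }

        boolean keepLeft = type == JoinType.LEFT_OUTER || type == JoinType.FULL_OUTER;
        boolean keepRight = type == JoinType.RIGHT_OUTER || type == JoinType.FULL_OUTER;

        Set<T> matched = new HashSet<>();
        List<Pair<T, T>> out = new ArrayList<>();
        for (T l : left) {
            Objects.requireNonNull(l, "null element in left collection");
            List<T> matches = rightByValue.get(l);
            if (matches != null) {
                matched.add(l);
                for (T r : matches) {
                    out.add(new Pair<>(l, r));
                }
            } else if (keepLeft) {
                // Unmatched left element: keep it with an empty right side.
                out.add(new Pair<>(l, null));
            }
        }
        if (keepRight) {
            // Unmatched right elements: keep them with an empty left side.
            for (T r : right) {
                if (!matched.contains(r)) {
                    out.add(new Pair<>(null, r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        List<String> right = Arrays.asList("b", "c");
        for (JoinType type : JoinType.values()) {
            System.out.println(type + " -> " + join(left, right, type));
        }
    }
}
```

A real version would presumably key each collection by its own elements and delegate to the existing table joins; the sketch only pins down what the return value would look like for each join type.
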
You could add separate methods for the different join strategies, or perhaps take an enum?

On Wed Feb 18 2015 at 3:58:38 PM Josh Wills <[email protected]> wrote:

> Hey Bryan,
>
> I like the idea of throwing exceptions when there are null values in one
> of the collections in a join. Not sure if there are any other implications
> of that I should think through first.
>
> On the convenience methods for PCollection joins, what do you have in mind?
>
> J
>
>
> On Wed, Feb 18, 2015 at 12:35 PM, Bryan Baugher <[email protected]> wrote:
>
>> Hi everyone,
>>
>> The other day I ran into the issue mentioned here[1] about joining data
>> with null values. This took a while to figure out until I broke down and
>> went to look at the docs to see if I was doing something obviously wrong. I
>> used null values because I'm basically wanting to join two PCollections.
>>
>> Can Crunch either throw an exception or log errors if I do something like
>> this? Similarly, would it be possible to get convenience methods for doing
>> joins on PCollections?
>>
>> [1] - http://crunch.apache.org/user-guide.html#joins
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
