Oh, I'm dumb. You mean you want something like a left join, where you can find all the values in collection A that aren't in collection B, and so on?
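
If that's the idea, here is a rough sketch of the semantics in plain Java, with in-memory lists standing in for PCollections. The names (`LeftJoinSketch`, `leftJoin`, `onlyInLeft`) are made up for illustration and are not part of the Crunch API. It assumes values are matched by `equals`, represents a missing match as `Optional.empty()` rather than null, and rejects null inputs along the lines of the exception idea from earlier in the thread.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Crunch: plain lists stand in for PCollections.
final class LeftJoinSketch {

  // Pairs every value of `left` with its match in `right`, if any.
  // Simplification: duplicates in `right` collapse to a single match.
  static <T> List<Map.Entry<T, Optional<T>>> leftJoin(List<T> left, List<T> right) {
    Set<T> rightValues = new HashSet<>(right);
    List<Map.Entry<T, Optional<T>>> out = new ArrayList<>();
    for (T value : left) {
      // Fail fast on nulls instead of letting them silently distort the join.
      Objects.requireNonNull(value, "null values are not supported in a join");
      Optional<T> match =
          rightValues.contains(value) ? Optional.of(value) : Optional.empty();
      out.add(new SimpleEntry<>(value, match));
    }
    return out;
  }

  // Values of `left` that have no match in `right`.
  static <T> List<T> onlyInLeft(List<T> left, List<T> right) {
    List<T> out = new ArrayList<>();
    for (Map.Entry<T, Optional<T>> pair : leftJoin(left, right)) {
      if (!pair.getValue().isPresent()) {
        out.add(pair.getKey());
      }
    }
    return out;
  }
}
```

With A = [x, y, z] and B = [y, z, w], `onlyInLeft(A, B)` gives [x]. A real version would work on PCollections and handle duplicates on the right side properly; this only pins down what "in A but not in B" would mean.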

J

On Wed, Feb 18, 2015 at 2:43 PM, Josh Wills <[email protected]> wrote:

> Different from o.a.c.lib.Cartesian.cross(PCollection<U> left,
> PCollection<T> right, int parallelism) in some way?
>
> J
>
> On Wed, Feb 18, 2015 at 2:41 PM, Bryan Baugher <[email protected]> wrote:
>
>> Maybe,
>>
>> PCollection<T>#join(PCollection<T>, JoinType) : PCollection<Pair<T, T>>
>>
>> You could make additional methods for the different join strategies or
>> maybe an enum perhaps?
>>
>> On Wed Feb 18 2015 at 3:58:38 PM Josh Wills <[email protected]> wrote:
>>
>>> Hey Bryan,
>>>
>>> I like the idea of throwing exceptions when there are null values in one
>>> of the collections in a join. Not sure if there are any other implications
>>> of that I should think through first.
>>>
>>> On the convenience methods for PCollection joins, what do you have in
>>> mind?
>>>
>>> J
>>>
>>> On Wed, Feb 18, 2015 at 12:35 PM, Bryan Baugher <[email protected]>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> The other day I ran into the issue mentioned here[1] about joining data
>>>> with null values. This took awhile to figure out until I broke down and
>>>> went to look at the docs to see if I was doing something obviously wrong. I
>>>> used null values because I'm basically wanting to join two pcollections.
>>>>
>>>> Can crunch either throw an exception or log errors if I do something
>>>> like this? Similarly would it be possible to get convenience methods for
>>>> doing joins on PCollections?
>>>>
>>>> [1] - http://crunch.apache.org/user-guide.html#joins
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
