Is that different in some way from o.a.c.lib.Cartesian.cross(PCollection<U> left, PCollection<T> right, int parallelism)?
J

On Wed, Feb 18, 2015 at 2:41 PM, Bryan Baugher <[email protected]> wrote:

> Maybe,
>
> PCollection<T>#join(PCollection<T>, JoinType) : PCollection<Pair<T, T>>
>
> You could make additional methods for the different join strategies or
> maybe an enum perhaps?
>
> On Wed Feb 18 2015 at 3:58:38 PM Josh Wills <[email protected]> wrote:
>
>> Hey Bryan,
>>
>> I like the idea of throwing exceptions when there are null values in one
>> of the collections in a join. Not sure if there are any other implications
>> of that I should think through first.
>>
>> On the convenience methods for PCollection joins, what do you have in
>> mind?
>>
>> J
>>
>>
>> On Wed, Feb 18, 2015 at 12:35 PM, Bryan Baugher <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> The other day I ran into the issue mentioned here[1] about joining data
>>> with null values. This took awhile to figure out until I broke down and
>>> went to look at the docs to see if I was doing something obviously wrong. I
>>> used null values because I'm basically wanting to join two pcollections.
>>>
>>> Can crunch either throw an exception or log errors if I do something
>>> like this? Similarly would it be possible to get convenience methods for
>>> doing joins on PCollections?
>>>
>>> [1] - http://crunch.apache.org/user-guide.html#joins
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
