That would be very helpful... Thanks for the support, Flavio
On Fri, Apr 17, 2015 at 10:04 AM, Till Rohrmann <till.rohrm...@gmail.com> wrote: > No its not, but at the moment there is afaik no other way around it. There > is an issue for proper outer join support [1] > > [1] https://issues.apache.org/jira/browse/FLINK-687 > > On Fri, Apr 17, 2015 at 10:01 AM, Flavio Pompermaier <pomperma...@okkam.it > > wrote: > >> Could resolve the problem but the fact to accumulate stuff in a local >> variable is it safe if datasets are huge..? >> >> On Fri, Apr 17, 2015 at 9:54 AM, Till Rohrmann <till.rohrm...@gmail.com> >> wrote: >> >>> If it's fine when you have null string values in the cases where >>> D1.f1!="a1" or D1.f2!="a2" then a possible solution could look like (with >>> Scala API): >>> >>> val ds1: DataSet[(String, String, String)] = getDS1 >>> val ds2: DataSet[(String, String, String)] = getDS2 >>> >>> ds1.coGroup(ds2).where(2).equalTo(0) { >>> (left, right, collector: Collector[(String, String, String, String)]) >>> => { >>> if(right.isEmpty) { >>> left foreach { >>> element => { >>> val value1 = if(element._2 == "a1") element._3 else null >>> val value2 = if(element._2 == "a2") element._3 else null >>> collector.collect((element._1, null, value1, value2)) >>> } >>> } >>> } else { >>> val array = right.toArray >>> for(leftElement <- left) { >>> val value1 = if(leftElement._2 == "a1") leftElement._3 else null >>> val value2 = if(leftElement._2 == "a2") leftElement._3 else null >>> >>> for(rightElement <- array) { >>> collector.collect(leftElement._1, rightElement._1, value1, value2)) >>> } >>> } >>> } >>> } >>> } >>> >>> Does this solve your problem? >>> >>> On Fri, Apr 17, 2015 at 9:30 AM, Flavio Pompermaier < >>> pomperma...@okkam.it> wrote: >>> >>>> Hi Till, >>>> thanks for the reply. >>>> What I'd like to do is to merge D1 and D2 if there's a ref from D1 to >>>> D2 (D1.f2==D2.f0). >>>> If this condition is true, I would like to produce a set of tuples with >>>> the matching elements >>>> at the first to places (D1.*f2*, D2.*f0*) and the other two values (if >>>> present) of the matching tuple >>>> in D1 when D1.f1==*"a1"* and D1.f2=*"a2"* (string values) respectively. >>>> (PS: For each value of D1.f0 you can have at most one value of a1 and >>>> a2) >>>> >>>> Is it more clear? >>>> >>>> On Fri, Apr 17, 2015 at 9:03 AM, Till Rohrmann <till.rohrm...@gmail.com >>>> > wrote: >>>> >>>>> Hi Flavio, >>>>> >>>>> I don't really understand what you try to do. What does >>>>> D1.f2(D1.f1==p1) mean? What does happen if the condition in D1.f2(if >>>>> D1.f1==p2) is false? >>>>> >>>>> Where does the values a1 and a2 in (A, X, a1, a2) come from when you >>>>> join [(A, p3, X), (X, s, V)] and [(A, p3, X), (X, r, 2)]? Maybe you can >>>>> elaborate a bit more on your example. >>>>> >>>>> Cheers, >>>>> >>>>> Till >>>>> >>>>> On Thu, Apr 16, 2015 at 10:09 PM, Flavio Pompermaier < >>>>> pomperma...@okkam.it> wrote: >>>>> >>>>>> I cannot find a solution to my use case :( >>>>>> I have 2 datasets D1 and D2 like: >>>>>> >>>>>> D1: >>>>>> A,p1,a1 >>>>>> A,p2,a2 >>>>>> A,p3,X >>>>>> B,p3,Y >>>>>> B,p1,b1 >>>>>> >>>>>> D2: >>>>>> X,s,V >>>>>> X,r,2 >>>>>> Y,j,k >>>>>> >>>>>> I'd like to have a unique dataset D3(Tuple4) like >>>>>> >>>>>> A,X,a1,a2 >>>>>> B,Y,b1,null >>>>>> >>>>>> Basically filling with <D1.f0,D2.f0,D1.f2(D1.f1==p1),D1.f2(if >>>>>> D1.f1==p2)> when D1.f2==D2.f0. >>>>>> Is that possible and how? >>>>>> Could you show me a simple snippet? >>>>>> >>>>>> Thanks in advance, >>>>>> Flavio >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Apr 16, 2015 at 9:48 PM, Till Rohrmann <trohrm...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> You can materialize the input of the right input by creating an >>>>>>> array out of it, for example. Then you can reiterate over it. >>>>>>> >>>>>>> Cheers, >>>>>>> Till >>>>>>> On Apr 16, 2015 7:37 PM, "Flavio Pompermaier" <pomperma...@okkam.it> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Maximilian, >>>>>>>> I tried your solution but it doesn't work because the rightElements >>>>>>>> iterator cannot be used more than once: >>>>>>>> >>>>>>>> Caused by: org.apache.flink.util.TraversableOnceException: The >>>>>>>> Iterable can be iterated over only once. Only the first call to >>>>>>>> 'iterator()' will succeed. >>>>>>>> >>>>>>>> On Wed, Apr 15, 2015 at 12:59 PM, Maximilian Michels < >>>>>>>> m...@apache.org> wrote: >>>>>>>> >>>>>>>>> Hi Flavio, >>>>>>>>> >>>>>>>>> Here's an simple example of a Left Outer Join: >>>>>>>>> https://gist.github.com/mxm/c2e9c459a9d82c18d789 >>>>>>>>> >>>>>>>>> As Stephan pointed out, this can be very easily modified to >>>>>>>>> construct a Right Outer Join (just exchange leftElements and >>>>>>>>> rightElements >>>>>>>>> in the two loops). >>>>>>>>> >>>>>>>>> Here's an excerpt with the most important part, the coGroup >>>>>>>>> function: >>>>>>>>> >>>>>>>>> public static class LeftOuterJoin implements >>>>>>>>> CoGroupFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, >>>>>>>>> Tuple2<Integer, Integer>> { >>>>>>>>> >>>>>>>>> @Override >>>>>>>>> public void coGroup(Iterable<Tuple2<Integer, String>> leftElements, >>>>>>>>> Iterable<Tuple2<Integer, String>> >>>>>>>>> rightElements, >>>>>>>>> Collector<Tuple2<Integer, Integer>> out) >>>>>>>>> throws Exception { >>>>>>>>> >>>>>>>>> final int NULL_ELEMENT = -1; >>>>>>>>> >>>>>>>>> for (Tuple2<Integer, String> leftElem : leftElements) { >>>>>>>>> boolean hadElements = false; >>>>>>>>> for (Tuple2<Integer, String> rightElem : rightElements) { >>>>>>>>> out.collect(new Tuple2<Integer, Integer>(leftElem.f0, >>>>>>>>> rightElem.f0)); >>>>>>>>> hadElements = true; >>>>>>>>> } >>>>>>>>> if (!hadElements) { >>>>>>>>> out.collect(new Tuple2<Integer, Integer>(leftElem.f0, >>>>>>>>> NULL_ELEMENT)); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Apr 15, 2015 at 11:01 AM, Stephan Ewen <se...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I think this may be a great example to add as a utility function. >>>>>>>>>> >>>>>>>>>> Or actually add as an function to the DataSet, internally >>>>>>>>>> realized as a special case of coGroup. >>>>>>>>>> >>>>>>>>>> We do not have a ready example of that, but it should be >>>>>>>>>> straightforward to realize. Similar as for the join, coGroup on the >>>>>>>>>> join >>>>>>>>>> keys. Inside the coGroup function, emit the combination of all >>>>>>>>>> values from >>>>>>>>>> the two iterators. If one of them is empty (the one that is not >>>>>>>>>> outer) then >>>>>>>>>> emit all values from the outer side. >>>>>>>>>> >>>>>>>>>> Greetings, >>>>>>>>>> Stephan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 15, 2015 at 10:36 AM, Flavio Pompermaier < >>>>>>>>>> pomperma...@okkam.it> wrote: >>>>>>>>>> >>>>>>>>>>> Do you have an already working example of it? :) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 15, 2015 at 10:32 AM, Ufuk Celebi <u...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 15 Apr 2015, at 10:30, Flavio Pompermaier < >>>>>>>>>>>> pomperma...@okkam.it> wrote: >>>>>>>>>>>> >>>>>>>>>>>> > >>>>>>>>>>>> > Hi to all, >>>>>>>>>>>> > I have to join two datasets but I'd like to keep all data in >>>>>>>>>>>> the left also if there' no right dataset. >>>>>>>>>>>> > How can you achieve that in Flink? maybe I should use coGroup? >>>>>>>>>>>> >>>>>>>>>>>> Yes, currently you have to implement this manually with a >>>>>>>>>>>> coGroup >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>>> >>>> >>> >> >> >