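Hi,

you could apply a filter operation after the cross operation that discards every combination whose items are not in ascending order. That way only one ordering of each itemset survives, e.g. (1, 2, 3, 4) is kept and (2, 1, 4, 3) is dropped.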
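To illustrate the idea, here is a rough sketch in plain Java collections rather than Flink operators, so the predicate can be checked on its own. The class and method names are made up for this example, and it assumes the cross simply concatenates two itemsets. In a Flink job the same check would presumably live inside a filter function applied to the result of the cross.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AscendingFilterSketch {

    // True if every item is strictly larger than the one before it.
    public static boolean isStrictlyAscending(List<Integer> items) {
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i - 1) >= items.get(i)) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for "cross, then filter": concatenate every ordered pair of
    // itemsets and keep only the concatenations that are strictly ascending.
    // (Assumption: the cross just appends the right itemset to the left one.)
    public static List<List<Integer>> crossAndFilter(List<List<Integer>> itemSets) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> left : itemSets) {
            for (List<Integer> right : itemSets) {
                List<Integer> combined = new ArrayList<>(left);
                combined.addAll(right);
                if (isStrictlyAscending(combined)) {
                    result.add(combined);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> frequent = Arrays.asList(
                Arrays.asList(1), Arrays.asList(2), Arrays.asList(3));
        // Prints [[1, 2], [1, 3], [2, 3]]
        System.out.println(crossAndFilter(frequent));
    }
}
```

Because the comparison is strict, combinations with a repeated item such as (1, 1) are dropped by the same filter, so no separate de-duplication step and no HashSet is needed for this case.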
Cheers,
Till

On Sun, Feb 8, 2015 at 12:38 PM, tanguy racinet <[email protected]> wrote:

> Hi,
>
> Thank you for your reply. It helped us solve the looping problems in a
> nicer way.
>
> We are struggling with some aspects of the cross function.
> Still trying to implement the Apriori algorithm, we need to create
> combinations of frequent itemSets.
> Our problem is that the crossing gives us duplicates. For instance, (1, 2,
> 3, 4) and (2, 1, 4, 3) are equivalent for us, so we are trying to find a way
> to remove that kind of duplicate from our DataSet.
>
> We already removed duplicates inside our combinations: (1, 1, 2) => (1, 2).
>
> We were thinking about using HashSet, but they are not serializable and we
> cannot use them inside the workflow, only inside functions.
>
> Can you think of any way to remove those duplicates?
>
> Thank you,
>
> <http://eitictlabs-rennes.fr/>
>
> *Racinet Tanguy*
> *EIT ICT Labs Master School Student*
> *Distributed Systems and Services*
>
> Tel : +33 6 63 20 89 16 / +49 176 3749 8854
> Mail : [email protected]
>
> On Thu, Feb 5, 2015 at 8:51 PM, Vasiliki Kalavri <[email protected]> wrote:
>
>> Hi,
>>
>> I'm not familiar with the particular algorithm, but you can most probably
>> use one of the two iterate operators in Flink.
>>
>> You can read a description and see some examples in the documentation:
>>
>> http://flink.apache.org/docs/0.8/programming_guide.html#iteration-operators
>>
>> Let us know if you have any questions!
>>
>> Cheers,
>> V.
>>
>> On 5 February 2015 at 20:37, tanguy racinet <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are trying to develop the Apriori algorithm with Flink for our
>>> data mining project.
>>> In our understanding, Flink can handle loops within the workflow.
>>> However, our knowledge is limited and we cannot find a nice way to do it.
>>>
>>> Here is the flow of my algorithm:
>>> GenerateCandidates ----> CalculateFrequentItemSet
>>>       mapper       ---->         reducer
>>>
>>> We would like to use the reducer result as the mapper's input for a
>>> predefined number of times (loop x times).
>>>
>>> Is there any smart way to do that with Flink, or should we just copy-paste
>>> the loop x times?
>>>
>>> Thank you,
