Hi Yuhao,

I have tried numPartitions values of (numExecutors * numExecutorCores), 1000, 2000, and 10000, but did not see much improvement.
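Repartitioning changes how the mining work is spread across executors, but it cannot reduce how many itemsets satisfy a given minSupport. One reason that count grows so quickly at low support is the downward-closure property: every non-empty subset of a frequent itemset is itself frequent, so a single frequent itemset of k items brings 2^k - 1 frequent itemsets with it. The plain-Python sketch below (no Spark involved; the function names are hypothetical) only does that arithmetic.

```python
def implied_frequent_subsets(k: int) -> int:
    """Count of non-empty subsets of a k-item itemset.

    Downward closure: if a k-item itemset is frequent, every one of its
    non-empty subsets is frequent too, so a miner must emit all of them.
    """
    return 2 ** k - 1


def smallest_size_reaching(target: int) -> int:
    """Smallest itemset size k whose implied subsets number at least `target`."""
    k = 0
    while implied_frequent_subsets(k) < target:
        k += 1
    return k


if __name__ == "__main__":
    for k in (5, 10, 20, 28):
        print(f"one frequent {k}-itemset implies {implied_frequent_subsets(k):,} frequent itemsets")
    # Itemset size at which one frequent itemset alone reaches ~260 million subsets
    print(smallest_size_reaching(260_000_000))
```

For scale, one frequent 28-item set already implies 2^28 - 1 = 268,435,455 frequent itemsets, which is the same order as the count reported in this thread. This is only an illustration: the thread does not say how long the mined itemsets actually were.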

Having more partitions solved some performance issues, but I did not see any improvement when I use a lower minSupport. With a lower min-support value, it generates 260 million frequent itemsets from 63K transactions and 200K items in total.

On Tue, Mar 14, 2017 at 3:30 PM, Yuhao Yang <hhb...@gmail.com> wrote:
> Hi Raju,
>
> Have you tried setNumPartitions with a larger number?
>
> 2017-03-07 0:30 GMT-08:00 Eli Super <eli.su...@gmail.com>:
>
>> Hi
>>
>> It's an area of knowledge; you will need to spend several hours reading
>> about it online.
>>
>> What is your programming language?
>>
>> Try searching online for "machine learning binning %my_programming_language%"
>> and "machine learning feature engineering %my_programming_language%".
>>
>> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote:
>>
>>> @Eli, thanks for the suggestion. If you do not mind, can you please
>>> elaborate on the approaches?
>>>
>>> On Mon, Mar 6, 2017 at 7:29 PM, Eli Super <eli.su...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Try to implement binning and/or feature engineering (smart feature
>>>> selection, for example).
>>>>
>>>> Good luck
>>>>
>>>> On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I am new to Spark MLlib. I am using the FPGrowth model to find
>>>>> related items.
>>>>>
>>>>> The number of transactions is 63K and the total number of items in all
>>>>> transactions is 200K.
>>>>>
>>>>> I am running the FPGrowth model to generate frequent itemsets, and it
>>>>> takes a huge amount of time. *I am setting the min-support value such
>>>>> that each item appears in at least ~(number of items)/(number of
>>>>> transactions).*
>>>>>
>>>>> It takes a lot of time if I say an item only needs to appear at least
>>>>> once in the database.
>>>>>
>>>>> If I give min-support a higher value, the output is very small.
>>>>>
>>>>> Could anyone please guide me on how to reduce the execution time for
>>>>> generating frequent itemsets?
>>>>>
>>>>> ------
>>>>> Thanks,
>>>>> Raju Bairishetti,
>>>>> www.lazada.com
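Spark's `FPGrowth.setMinSupport` takes the minimum support as a fraction of the number of transactions rather than an absolute count, so a rule like "an item must appear at least N times" has to be converted first. The plain-Python sketch below (no Spark required; all function names are hypothetical) does that conversion and counts how many single items clear a threshold, in the spirit of the feature-selection suggestion in the thread. The round-up in `min_count_for_support` is an assumption for illustration, not a statement about Spark internals.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def min_support_for_count(min_count: int, num_transactions: int) -> float:
    """Fraction to pass as minSupport so an itemset must occur in at
    least `min_count` of `num_transactions` transactions."""
    return min_count / num_transactions


def min_count_for_support(min_support: float, num_transactions: int) -> int:
    """Absolute transaction count implied by a fractional minSupport.

    Rounding up is an assumption about how the threshold is applied.
    """
    return math.ceil(min_support * num_transactions)


def frequent_single_items(transactions: Iterable[Set[str]], min_count: int) -> List[str]:
    """Items that occur in at least `min_count` transactions.

    Only these items can be part of any frequent itemset, so this cheap
    count shows how large the mining problem is before running FP-growth.
    """
    # set(t) so an item repeated inside one transaction is counted once
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(item for item, c in counts.items() if c >= min_count)


if __name__ == "__main__":
    n = 63_000  # transaction count from the original question
    print(min_support_for_count(1, n))    # "at least once": about 1.6e-05
    print(min_support_for_count(630, n))  # at least 630 transactions: about 0.01
    toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
    print(frequent_single_items(toy, 2))  # ['a', 'b']
```

With a minimum count of 1, every itemset that occurs in any transaction counts as frequent, which effectively asks FP-growth to enumerate everything. Running a single-item count like this for a few candidate thresholds is a cheap way to choose a minSupport before launching a long job.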