Hi Yuhao,
    I have tried several values for numPartitions: (numExecutors *
numExecutorCores), 1000, 2000 and 10000. I did not see much improvement.

Having more partitions solved some perf issues, but I did not see any
improvement when I lower the min support.

With the lower min support value, it generates 260 million frequent itemsets
from 63K transactions and 200K items in total.
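
To illustrate why lowering min support has this effect, here is a minimal
sketch in plain Python (not Spark, and not how FPGrowth is implemented). It
brute-force counts itemsets on made-up toy data; the function names and the
transactions are hypothetical. With a minimum count of 1, every subset of
every transaction is frequent, so a single transaction with k items
contributes up to 2^k - 1 itemsets, and the partitioning setting does not
change how many itemsets have to be produced.

```python
from collections import Counter
from itertools import combinations


def min_support_fraction(min_count, num_transactions):
    """Convert an absolute occurrence count into a support fraction."""
    return min_count / num_transactions


def frequent_itemsets(transactions, min_count):
    """Brute force: count every subset of every transaction, then filter."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


# Hypothetical toy data, not the dataset discussed in this thread.
transactions = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["a", "b", "e"],
    ["f", "g", "h", "i", "j"],
]

at_least_once = frequent_itemsets(transactions, min_count=1)
at_least_twice = frequent_itemsets(transactions, min_count=2)

print(len(at_least_once))   # 50: every distinct subset is "frequent"
print(len(at_least_twice))  # 7: only subsets shared by two or more transactions

# "At least once" over 63K transactions as a support fraction (about 1.6e-5).
print(min_support_fraction(1, 63_000))

# Upper bound contributed by one transaction with 20 items.
print(2 ** 20 - 1)  # 1048575
```

On this toy data the output drops from 50 itemsets to 7 when the minimum
count goes from 1 to 2. That would be consistent with most of the 260
million itemsets coming from long transactions combined with a very low
threshold.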

On Tue, Mar 14, 2017 at 3:30 PM, Yuhao Yang <hhb...@gmail.com> wrote:

> Hi Raju,
>
> Have you tried setNumPartitions with a larger number?
>
> 2017-03-07 0:30 GMT-08:00 Eli Super <eli.su...@gmail.com>:
>
>> Hi
>>
>> This is a whole area of knowledge; you will need to spend several hours
>> reading about it online.
>>
>> What programming language are you using?
>>
>> Try searching online for "machine learning binning
>> %my_programming_language%" and
>> "machine learning feature engineering %my_programming_language%"
>>
>> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote:
>>
>>> @Eli, Thanks for the suggestion. If you do not mind, could you please
>>> elaborate on the approaches?
>>>
>>> On Mon, Mar 6, 2017 at 7:29 PM, Eli Super <eli.su...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Try to implement binning and/or feature engineering (smart feature
>>>> selection for example)
>>>>
>>>> Good luck
>>>>
>>>> On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>   I am new to Spark MLlib. I am using the FPGrowth model to find
>>>>> related items.
>>>>>
>>>>> The number of transactions is 63K and the total number of items across
>>>>> all transactions is 200K.
>>>>>
>>>>> I am running the FPGrowth model to generate frequent itemsets. It is
>>>>> taking a huge amount of time to generate them. *I am setting the
>>>>> min-support value such that each item appears at least ~(number of
>>>>> items)/(number of transactions) times.*
>>>>>
>>>>> It takes a lot of time when I set it so that an item only needs to
>>>>> appear once in the database.
>>>>>
>>>>> If I give a higher value for min-support, the output is much smaller.
>>>>>
>>>>> Could anyone please guide me on how to reduce the execution time for
>>>>> generating frequent itemsets?
>>>>>
>>>>> ------
>>>>> Thanks,
>>>>> Raju Bairishetti,
>>>>> www.lazada.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> ------
>>> Thanks,
>>> Raju Bairishetti,
>>> www.lazada.com
>>>
>>
>>
>


-- 

------
Thanks,
Raju Bairishetti,
www.lazada.com
