Re: frequent itemsets

2016-01-02 Thread Yanbo Liang
Hi Roberto,

Could you share your code snippet so that others can help diagnose your
problem?



2016-01-02 7:51 GMT+08:00 Roberto Pagliari <roberto.pagli...@asos.com>:

> When using the frequent itemsets APIs, I'm running into a stack overflow
> exception whenever there are too many combinations to deal with and/or too
> many transactions and/or too many items.
>
> Does anyone know how many transactions/items these APIs can deal with?
>
> Thank you,


RE: frequent itemsets

2016-01-02 Thread LinChen
Hi Roberto,

What minimum support threshold did you set? Could you check at which stage
you ran into the stack overflow exception?
Thanks.


Re: frequent itemsets

2016-01-02 Thread Roberto Pagliari
Hi Yanbo,
Unfortunately, I cannot share the data. I am using the code from the tutorial:

https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html

Did you ever try running it when there are hundreds of millions of co-purchases
of at least two products?
I suspect the association rules (AR) step does not handle that very well.

Thank you,




Re: frequent itemsets

2016-01-02 Thread Roberto Pagliari
Hi Lin,
With a minimum support of 1e-5 or below, it crashes for me. I also developed my
own program in C++ (single machine, no Spark) and was able to compute all
itemsets, that is, with support = 0.

The stack overflow definitely occurs while computing the frequent itemsets,
before association rule generation even starts. If you want, I can try to
generate an artificial dataset to share. Did you ever try with hundreds of
millions of frequent itemsets?

With small datasets it works, but it looks like there might be issues when the
number of combinations grows.

Thanks,
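
To illustrate how the number of combinations grows as the support threshold
drops, here is a minimal brute-force sketch in Python. It is not FP-growth and
not Spark's implementation; the `frequent_itemsets` helper and the toy
`baskets` are hypothetical, and converting the relative threshold into a count
with a ceiling is an assumption made for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset counting (illustration only, not FP-growth)."""
    # Assumption: the relative threshold becomes an absolute transaction count.
    min_count = math.ceil(min_support * len(transactions))
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Enumerate every non-empty subset of this transaction.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    # Keep only the itemsets that appear in enough transactions.
    return {s: c for s, c in counts.items() if c >= min_count}


# Hypothetical toy baskets, not data from this thread.
baskets = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
]

for support in (0.75, 0.5, 0.25):
    print(support, len(frequent_itemsets(baskets, support)))
```

On these four baskets the number of frequent itemsets goes from 3 to 9 to 15
as the threshold drops from 0.75 to 0.25. At the lowest threshold every subset
of the longest basket is counted.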


RE: frequent itemsets

2016-01-02 Thread LinChen
Hi Roberto,

I once ran some experiments on a dataset with 3196 transactions and 289154813
frequent itemsets. FPGrowth finished the computation within 10 minutes. I can
give it a try if you share the artificial dataset.
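
A count of 289154813 frequent itemsets from only 3196 transactions is plausible
because every non-empty subset of a frequent itemset is itself frequent. The
short Python sketch below only shows the scale of that effect; the
`implied_frequent_itemsets` helper is hypothetical, and nothing here is a claim
about the actual contents of the dataset mentioned above.

```python
def implied_frequent_itemsets(k):
    """Number of non-empty subsets of a single frequent itemset of size k."""
    return 2 ** k - 1


# Each additional item in a frequent itemset roughly doubles the count.
for k in (10, 20, 28, 29):
    print(k, implied_frequent_itemsets(k))
```

A single frequent itemset of 28 items already implies 268435455 frequent
itemsets, the same order of magnitude as the figure quoted above.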


frequent itemsets

2016-01-01 Thread Roberto Pagliari
When using the frequent itemsets APIs, I'm running into a stack overflow
exception whenever there are too many combinations to deal with and/or too many
transactions and/or too many items.

Does anyone know how many transactions/items these APIs can deal with?

Thank you,
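
The thread does not establish where the overflow originates. As a generic
illustration only, the Python sketch below shows how a recursive walk over a
deep prefix tree can exhaust the call stack while an explicit-stack walk does
not. The `Node` class and the helper functions are hypothetical and are not
taken from Spark.

```python
class Node:
    """One node of a prefix tree (hypothetical structure, for illustration)."""

    def __init__(self, item):
        self.item = item
        self.children = []


def count_nodes_recursive(node):
    # One call frame per tree level: very deep trees can exhaust the call stack.
    total = 1
    for child in node.children:
        total += count_nodes_recursive(child)
    return total


def count_nodes_iterative(root):
    # An explicit list used as a stack replaces the call stack.
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children)
    return total


def build_chain(depth):
    """Build a degenerate tree that is a single path of `depth` nodes."""
    root = Node(0)
    node = root
    for i in range(1, depth):
        child = Node(i)
        node.children.append(child)
        node = child
    return root


deep = build_chain(10_000)
print(count_nodes_iterative(deep))
try:
    count_nodes_recursive(deep)
except RecursionError:
    print("recursive traversal exceeded the interpreter's recursion limit")
```

In Python the failure surfaces as a `RecursionError` rather than a JVM
`StackOverflowError`, but the underlying limit on call depth is the same kind
of constraint.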