Re: frequent itemsets
From: Yanbo Liang <yblia...@gmail.com>
Date: Saturday, 2 January 2016 09:03

Hi Roberto,

Could you share a code snippet so that others can help diagnose your problem?

2016-01-02 7:51 GMT+08:00 Roberto Pagliari <roberto.pagli...@asos.com>:
> When using the frequent itemsets APIs, I'm running into a StackOverflowError
> whenever there are too many combinations, too many transactions, and/or too
> many items. Does anyone know how many transactions/items these APIs can
> handle?
> Thank you,
RE: frequent itemsets
From: LinChen <m2linc...@outlook.com>
Date: Saturday, 2 January 2016 14:48

Hi Roberto,

What is the minimum support threshold you set? And could you check in which stage you ran into the StackOverflowError?

Thanks.
Re: frequent itemsets
From: Roberto Pagliari <roberto.pagli...@asos.com>
Date: Sat, 2 Jan 2016 12:01:31

Hi Yanbo,

Unfortunately, I cannot share the data. I am using the code from the tutorial: https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html

Did you ever try running it when there are hundreds of millions of co-purchases of at least two products? I suspect the association rules step does not handle that very well.

Thank you,
Re: frequent itemsets
From: Roberto Pagliari <roberto.pagli...@asos.com>
Date: Sun, 3 Jan 2016 01:20:07

Hi Lin,

From 1e-5 and below it crashes for me. I also developed my own program in C++ (single machine, no Spark) and was able to compute all itemsets, that is, support = 0. The stack overflow definitely occurs while computing the frequent itemsets, before the association-rule step even starts. If you want, I can try to generate an artificial dataset to share.

Did you ever try with hundreds of millions of frequent itemsets? With small datasets it works, but it looks like there might be issues when the number of combinations grows.

Thanks,
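Roberto's support = 0 observation points at the underlying combinatorics: as minSupport approaches zero, every non-empty subset of every transaction qualifies as a frequent itemset, so the candidate count grows exponentially in transaction length. A back-of-the-envelope check (illustrative numbers only):

```python
# With a support threshold near 0, a single transaction of length k can
# contribute up to 2^k - 1 non-empty subsets as frequent itemsets.
def max_itemsets(transaction_len):
    return 2 ** transaction_len - 1

for k in (10, 20, 30):
    print(k, max_itemsets(k))
# One 30-item transaction alone already yields over a billion subsets,
# consistent with runs producing hundreds of millions of itemsets.
```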
RE: frequent itemsets
From: LinChen <m2linc...@outlook.com>

Hi Roberto,

I have done some experiments on a dataset with 3196 transactions and 289,154,813 frequent itemsets; FPGrowth finished the computation within 10 minutes. I can give it a try if you share the artificial dataset.
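If Roberto does produce an artificial dataset to share, a generator along these lines would do (a hypothetical sketch, pure Python, no Spark; all names and parameters are illustrative). Each output row uses the space-separated transaction layout of the tutorial's sample file:

```python
import random

def make_transactions(num_transactions=1000, num_items=100,
                      max_len=20, seed=42):
    # Each row is one transaction: distinct items, space-separated,
    # matching the layout of the tutorial's sample_fpgrowth.txt.
    rng = random.Random(seed)
    rows = []
    for _ in range(num_transactions):
        k = rng.randint(1, max_len)
        items = rng.sample(range(num_items), k)
        rows.append(" ".join(f"i{n}" for n in items))
    return rows

for row in make_transactions(num_transactions=5, max_len=6):
    print(row)
```

A fixed seed keeps the dataset reproducible, so both sides of the thread would be testing against identical input.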
frequent itemsets
When using the frequent itemsets APIs, I'm running into a StackOverflowError whenever there are too many combinations, too many transactions, and/or too many items.

Does anyone know how many transactions/items these APIs can handle?

Thank you,