Re: MLlib NNLS implementation is buggy, returning wrong solutions

2014-07-28 Thread Shuo Xiang
It is possible that the answer (the final solution vector x) given by two different algorithms (such as the ones in MLlib and in R) differs, as the problem may not be strictly convex and multiple global optima may exist. However, these answers should yield the same objective value. Can you
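The point about non-unique minimizers can be checked on a toy problem. The following is a minimal sketch, not MLlib's or R's NNLS code: the matrix `A`, the vector `b`, and the two candidate solutions are made-up values chosen so that `A` has two identical columns, which makes the least-squares objective convex but not strictly convex.

```python
# Minimal sketch (hypothetical data): two different nonnegative solutions
# of a rank-deficient least-squares problem with identical objective values.

def objective(A, x, b):
    """Return 0.5 * ||A x - b||^2 for a matrix given as a list of rows."""
    residuals = [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
        for row, b_i in zip(A, b)
    ]
    return 0.5 * sum(r * r for r in residuals)

# Two identical columns: any x >= 0 with x[0] + x[1] == 1 fits b exactly.
A = [[1.0, 1.0],
     [2.0, 2.0],
     [3.0, 3.0]]
b = [1.0, 2.0, 3.0]

# Stand-ins for the answers two different solvers might return.
x_solver_one = [1.0, 0.0]
x_solver_two = [0.25, 0.75]

obj_one = objective(A, x_solver_one, b)
obj_two = objective(A, x_solver_two, b)

# Compare objective values, not solution vectors.
same_objective = abs(obj_one - obj_two) <= 1e-9 * max(1.0, abs(obj_one))
```

When comparing two solvers in practice, the same check applies: differing solution vectors alone do not indicate a bug, whereas a clearly different objective value at a feasible point does.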

Re: Large scale ranked recommendation

2014-07-17 Thread Shuo Xiang
Hi, Are you suggesting that taking simple vector dot products or applying a sigmoid function on 10K * 1M data takes 5 hrs? On Thu, Jul 17, 2014 at 3:59 PM, m3.sharma sharm...@umn.edu wrote: We are using the RegressionModels that come with the *mllib* package in Spark.

Re: spark ui on yarn

2014-07-12 Thread Shuo Xiang
Hi Koert, Just curious, did you find any information like CANNOT FIND ADDRESS after clicking into some stage? I've seen similar problems due to loss of executors. Best, On Fri, Jul 11, 2014 at 4:42 PM, Koert Kuipers ko...@tresata.com wrote: I just tested a long-lived application (that we

Set the number/memory of workers under mesos

2014-06-20 Thread Shuo Xiang
Hi, just wondering whether anybody knows how to set the number of workers (and the amount of memory) in Mesos when launching spark-shell? I was trying to edit conf/spark-env.sh, and it looks like the environment variables there are for YARN or standalone. Thanks!

Re: MLLib inside Storm : silly or not ?

2014-06-19 Thread Shuo Xiang
If I'm understanding correctly, you want to use MLlib for offline training and then deploy the learned model to Storm? In this case I don't think there is any problem. However, if you are looking for online model update/training, this can be complicated and I guess quite a few algorithms in MLlib
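The offline-train, online-score split described above can be sketched as follows. This is an illustration only, assuming a linear model with a sigmoid link: the helper names `export_model` and `load_scorer`, the JSON payload layout, and the weight values are all hypothetical and are not part of MLlib or Storm. The idea is that the training job writes out plain numbers and the streaming side only needs a dot product.

```python
import json
import math

def export_model(weights, intercept):
    """Serialize learned parameters (hypothetical format) after offline training."""
    return json.dumps({"weights": list(weights), "intercept": intercept})

def load_scorer(payload):
    """Build a scoring function from the serialized parameters.

    The returned function has no dependency on the training framework,
    so it could be called from inside a stream-processing component.
    """
    model = json.loads(payload)
    weights = model["weights"]
    intercept = model["intercept"]

    def score(features):
        # Dot product plus intercept, then a sigmoid to map to (0, 1).
        margin = sum(w * x for w, x in zip(weights, features)) + intercept
        return 1.0 / (1.0 + math.exp(-margin))

    return score

# Made-up parameters standing in for the output of an offline training job.
payload = export_model([0.5, -0.25], 0.0)
score = load_scorer(payload)
probability = score([2.0, 4.0])
```

Keeping the scorer free of the training framework avoids shipping Spark into the streaming topology; online model updates would need a different design, as the message notes.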

Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
commonly, the result of the stage may be used in a later calculation, and has to be recalculated. This happens if some of the results were evicted from cache. On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang shuoxiang...@gmail.com wrote: Hi, Came up with some confusion regarding

Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
replication but still seeing this. On Wednesday, June 11, 2014, Shuo Xiang shuoxiang...@gmail.com wrote: Daniel, Thanks for the explanation. On Wed, Jun 11, 2014 at 8:57 AM, Daniel Darabos daniel.dara...@lynxanalytics.com wrote: About more succeeded tasks than total tasks

Re: Not fully cached when there is enough memory

2014-06-11 Thread Shuo Xiang
Xiangrui, clicking into the RDD link gives the same message, saying only 96 of 100 partitions are cached. The disk/memory usage is the same, and it is far below the limit. Is this what you wanted to check, or is it some other issue? On Wed, Jun 11, 2014 at 4:38 PM, Xiangrui Meng men...@gmail.com wrote: