Re: spark-itemsimilarity slower than itemsimilarity

2016-09-30 Thread Arnau Sanchez
e
> much difference in the run times.
>
> -s
>
> On 29.09.2016 22:17, Arnau Sanchez wrote:
> > --input ratings --output spark-itemsimilarity --maxSimilaritiesPerItem 10

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-29 Thread Arnau Sanchez
ritiesPerItem 10 --master yarn-client |& tee spark-itemsimilarity.out
Thanks!
On Thu, 29 Sep 2016 19:46:03 +0200 Arnau Sanchez <pyar...@gmail.com> wrote:
> Hi Sebastian,
>
> That's weird, it works here. Anyway, a Dropbox link:
>
> https://www.dropbox.com/sh/ex0d74sc

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-29 Thread Arnau Sanchez
nd create a model every week and need 4
> r3.8xlarge to do it in 1 hour you only pay 1/168th of what you would for a
> permanent cluster. This brings the cost to a quite reasonable range. You are
> very unlikely to need machines that large anyway but you could afford it if
> you only

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-26 Thread Arnau Sanchez
On Sun, 25 Sep 2016 09:01:43 -0700 Pat Ferrel wrote:
> AWS EMR is usually not very well suited for Spark.
What infrastructure would you recommend? Some EC2 instances provide lots of memory (though maybe not at the most competitive price: r3.8xlarge, 244 GB RAM). My

spark-itemsimilarity slower than itemsimilarity

2016-09-22 Thread Arnau Sanchez
I've been using the Mahout itemsimilarity job for a while, with good results. I read that the new spark-itemsimilarity job is typically faster, by a factor of 10, so I wanted to give it a try. I must be doing something wrong because, with the same EMR infrastructure, the spark job is slower