Hi Burak,

That’s interesting. I’ll give it a go.

Eilidh

On 14 Nov 2015, at 04:19, Burak Yavuz <brk...@gmail.com> wrote:

> Hi,
>
> The BlockMatrix multiplication should be much more efficient on the current
> master (and will be available with Spark 1.6). Could you please give that a
> try if you have the chance?
>
> Thanks,
> Burak
>
> On Fri, Nov 13, 2015 at 10:11 AM, Sabarish Sasidharan
> <sabarish.sasidha...@manthan.com> wrote:
>
> Hi Eilidh
>
> Because you are multiplying with the transpose, you don’t necessarily have
> to build the right side of the matrix. You can broadcast blocks of the
> indexed row matrix to itself and achieve the multiplication.
>
> But for similarity computation you might want to use an approach like
> locality-sensitive hashing first, to identify a set of similar customers,
> and then apply cosine similarity on that narrowed-down list. That would
> scale much better than full matrix multiplication. You could try the
> following packages:
>
> https://github.com/soundcloud/cosine-lsh-join-spark
> http://spark-packages.org/package/tdebatty/spark-knn-graphs
> https://github.com/marufaytekin/lsh-spark
>
> Regards
> Sab
>
> Hi Sab,
>
> Thanks for your response. We’re thinking of trying a bigger cluster, because
> we just started with 2 nodes. What we really want to know is whether the
> code will scale up with larger matrices and more nodes. I’d be interested
> to hear how large a matrix multiplication you managed to do.
>
> Is there an alternative you’d recommend for calculating similarity over a
> large dataset?
>
> Thanks,
> Eilidh
>
> On 13 Nov 2015, at 09:55, Sabarish Sasidharan
> <sabarish.sasidha...@manthan.com> wrote:
>
>> We have done this by blocking, but without using BlockMatrix. We used our
>> own blocking mechanism because BlockMatrix didn’t exist in Spark 1.2. What
>> is the size of your block? How much memory are you giving to the executors?
>> I assume you are running on YARN; if so, you would want to make sure your
>> YARN executor memory overhead is set to a higher value than the default.
>>
>> Just curious, could you also explain why you need matrix multiplication
>> with the transpose? Smells like similarity computation.
>>
>> Regards
>> Sab
>>
>> On Thu, Nov 12, 2015 at 7:27 PM, Eilidh Troup <e.tr...@epcc.ed.ac.uk> wrote:
>>
>> Hi,
>>
>> I’m trying to multiply a large squarish matrix with its transpose.
>> Eventually I’d like to work with matrices of size 200,000 by 500,000, but
>> I started off with 100 by 100, which was fine, and then with 10,000 by
>> 10,000, which failed with an out-of-memory exception.
>>
>> I used MLlib’s BlockMatrix, tried various block sizes, and also tried
>> switching disk serialisation on.
>>
>> We are running on a small cluster, using a CSV file in HDFS as the input
>> data.
>>
>> Would anyone with experience of multiplying large, dense matrices in Spark
>> be able to comment on what to try to make this work?
>>
>> Thanks,
>> Eilidh
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>> --
>> Architect - Big Data
>> Ph: +91 99805 99458
>>
>> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan
>> India ICT)
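For reference, the block-wise A·Aᵀ computation discussed in this thread (what BlockMatrix does internally, and what Sab describes building by hand on Spark 1.2) can be sketched on a single machine in plain Python. This is only an illustration of the arithmetic, not the MLlib implementation; the function names and the block size `bs` are invented for the sketch. Note that A·Aᵀ is symmetric, so only the upper triangle of result blocks needs computing, which roughly halves the work.

```python
# Single-machine sketch of block-wise A * A^T.
# Plain Python lists, no Spark -- illustrates the arithmetic only.

def split_blocks(A, bs):
    """Split matrix A (a list of rows) into a grid of at-most-bs-by-bs blocks."""
    n, m = len(A), len(A[0])
    return [[[row[c:c + bs] for row in A[r:r + bs]]
             for c in range(0, m, bs)]
            for r in range(0, n, bs)]

def mat_mul_T(X, Y):
    """Multiply block X by the transpose of block Y (X and Y share a width)."""
    return [[sum(x * y for x, y in zip(xrow, yrow)) for yrow in Y]
            for xrow in X]

def mat_add(X, Y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(X, Y)]

def blocked_aat(A, bs):
    """Compute A * A^T block by block, exploiting symmetry of the result."""
    blocks = split_blocks(A, bs)
    nb = len(blocks)       # block rows in A (and in the result)
    kb = len(blocks[0])    # block columns in A (the contraction dimension)
    C = [[None] * nb for _ in range(nb)]
    for i in range(nb):
        for j in range(i, nb):  # upper triangle only: C[j][i] = C[i][j]^T
            acc = None
            for k in range(kb):
                prod = mat_mul_T(blocks[i][k], blocks[j][k])
                acc = prod if acc is None else mat_add(acc, prod)
            C[i][j] = acc
            C[j][i] = [list(col) for col in zip(*acc)]  # transpose of acc
    # Stitch the grid of result blocks back into one full matrix.
    out = []
    for i in range(nb):
        for r in range(len(C[i][0])):
            out.append([v for j in range(nb) for v in C[i][j][r]])
    return out
```

The memory point the thread raises is visible even in this toy: for a 200,000-row input, the result is a dense 200,000 × 200,000 matrix regardless of blocking, which is why Sab steers the discussion toward narrowing candidates before computing similarities.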
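Sab’s suggestion — use locality-sensitive hashing to narrow down candidate pairs, then compute exact cosine similarity only within buckets — can also be sketched on a single machine. This is a toy illustration of the pattern using random-hyperplane signatures; the linked packages implement it at scale on Spark, and all names and parameters below are invented for the sketch.

```python
# Toy sketch of LSH-then-cosine: bucket vectors by a random-hyperplane
# bit signature, then run exact cosine similarity only within buckets.
import random
from collections import defaultdict
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def signature(v, planes):
    """One bit per hyperplane: the sign of the dot product with v."""
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)

def lsh_candidate_pairs(vectors, num_planes=8, seed=42):
    """Bucket by signature; only pairs sharing a bucket become candidates."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
    buckets = defaultdict(list)
    for key, v in vectors.items():
        buckets[signature(v, planes)].append(key)
    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add(tuple(sorted((members[i], members[j]))))
    return pairs

def similar_pairs(vectors, threshold=0.9, **kw):
    """Exact cosine, but only over the narrowed-down candidate list."""
    return {p: cosine(vectors[p[0]], vectors[p[1]])
            for p in lsh_candidate_pairs(vectors, **kw)
            if cosine(vectors[p[0]], vectors[p[1]]) >= threshold}
```

The scaling win is that the expensive exact comparison runs only on pairs that collide in a bucket, rather than on all n² pairs — which is why this kind of narrowing tends to scale better than the full A·Aᵀ product for similarity workloads.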