Hi Burak,

That’s interesting. I’ll give it a go.

Eilidh

On 14 Nov 2015, at 04:19, Burak Yavuz <brk...@gmail.com> wrote:

> Hi,
> 
> The BlockMatrix multiplication should be much more efficient on the current 
> master (and will be available with Spark 1.6). Could you please give that a 
> try if you have the chance?
> 
> Thanks,
> Burak
> 
> On Fri, Nov 13, 2015 at 10:11 AM, Sabarish Sasidharan 
> <sabarish.sasidha...@manthan.com> wrote:
> Hi Eilidh
> 
> Because you are multiplying with the transpose, you don't necessarily have 
> to build the right-hand side of the multiplication. You can broadcast blocks 
> of the IndexedRowMatrix to the executors and compute the products against 
> the rows held locally, along the lines of the sketch below.
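> 
> Something along these lines (untested sketch in Scala; chunk size and 
> output handling are placeholders):
> 
>     import org.apache.spark.SparkContext
>     import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
> 
>     // Untested sketch: compute A * A^T one horizontal slice at a time,
>     // without materialising the transpose. Each pass broadcasts a slice
>     // of rows; every partition dots its local rows against that slice.
>     def multiplyByTranspose(sc: SparkContext, mat: IndexedRowMatrix,
>                             chunkSize: Long) = {
>       val rows = mat.rows.map(r => (r.index, r.vector.toArray)).cache()
>       val n = mat.numRows()
>       (0L until n by chunkSize).map { start =>
>         val slice = rows.filter { case (i, _) =>
>           i >= start && i < start + chunkSize
>         }.collect()
>         val bc = sc.broadcast(slice)
>         // one RDD of ((i, j), dot(A(i,:), A(j,:))) per pass; save or union
>         rows.flatMap { case (i, vi) =>
>           bc.value.map { case (j, vj) =>
>             ((i, j), vi.zip(vj).map(p => p._1 * p._2).sum)
>           }
>         }
>       }
>     }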
> 
> But for similarity computation you might want to use an approach like 
> locality-sensitive hashing (LSH) first to identify candidate groups of 
> similar customers, and then apply cosine similarity only within that 
> narrowed-down list. That would scale much better than full matrix 
> multiplication. You could try one of the following packages (a bare-bones 
> sketch of the hashing step follows the links):
> 
> https://github.com/soundcloud/cosine-lsh-join-spark
> http://spark-packages.org/package/tdebatty/spark-knn-graphs
> https://github.com/marufaytekin/lsh-spark
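> 
> These all rely on the same basic trick; a minimal, untested sketch of 
> random-hyperplane hashing for cosine similarity (dimension and bit count 
> are placeholders):
> 
>     import scala.util.Random
>     import org.apache.spark.rdd.RDD
> 
>     // Rows with the same bit signature land in one bucket; exact cosine
>     // similarity is then computed only within buckets.
>     val dim = 100       // vector dimensionality (placeholder)
>     val numBits = 16    // more bits => smaller, purer buckets
>     val rng = new Random(42)
>     val planes = Array.fill(numBits)(Array.fill(dim)(rng.nextGaussian()))
> 
>     def signature(v: Array[Double]): String =
>       planes.map { p =>
>         if (v.zip(p).map(q => q._1 * q._2).sum >= 0) '1' else '0'
>       }.mkString
> 
>     // rows: RDD[(Long, Array[Double])] of (customerId, vector)
>     def buckets(rows: RDD[(Long, Array[Double])]) =
>       rows.map { case (id, v) => (signature(v), (id, v)) }.groupByKey()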
> 
> Regards
> Sab
> 
> Hi Sab,
> 
> Thanks for your response. We’re thinking of trying a bigger cluster, because 
> we just started with 2 nodes. What we really want to know is whether the code 
> will scale up with larger matrices and more nodes. I’d be interested to hear 
> how large a matrix multiplication you managed to do.
> 
> Is there an alternative you’d recommend for calculating similarity over a 
> large dataset?
> 
> Thanks,
> Eilidh
> 
> On 13 Nov 2015, at 09:55, Sabarish Sasidharan 
> <sabarish.sasidha...@manthan.com> wrote:
> 
>> We have done this by blocking, but without using BlockMatrix. We used our own 
>> blocking mechanism because BlockMatrix didn't exist in Spark 1.2. What is 
>> the size of your blocks? How much memory are you giving to the executors? I 
>> assume you are running on YARN; if so, you will want to make sure the YARN 
>> executor memory overhead (spark.yarn.executor.memoryOverhead) is set higher 
>> than the default.
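>> 
>> For example (the numbers here are just starting points to tune from; the 
>> overhead value is in MB):
>> 
>>     spark-submit --master yarn \
>>       --num-executors 8 \
>>       --executor-memory 8g \
>>       --conf spark.yarn.executor.memoryOverhead=2048 \
>>       your-app.jar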
>> 
>> Just curious, could you also explain why you need matrix multiplication with 
>> transpose? Smells like similarity computation.
>> 
>> Regards
>> Sab
>> 
>> On Thu, Nov 12, 2015 at 7:27 PM, Eilidh Troup <e.tr...@epcc.ed.ac.uk> wrote:
>> Hi,
>> 
>> I’m trying to multiply a large squarish matrix by its transpose. Eventually 
>> I’d like to work with matrices of size 200,000 by 500,000, but I started off 
>> with 100 by 100, which was fine, and then 10,000 by 10,000, which failed with 
>> an out-of-memory exception. (For scale, a dense 200,000 by 500,000 matrix of 
>> doubles is roughly 800 GB, and its product with the transpose, at 200,000 by 
>> 200,000, roughly another 320 GB.)
>> 
>> I used MLlib’s BlockMatrix and tried various block sizes, and also tried 
>> switching disk serialisation on (roughly as in the sketch below).
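>> 
>> In case it helps, the shape of the code, simplified (the CSV is assumed to 
>> hold i,j,value triples; the path and block size are placeholders):
>> 
>>     import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
>>     import org.apache.spark.storage.StorageLevel
>> 
>>     // sc is the SparkContext; CSV lines assumed to be "i,j,value"
>>     val entries = sc.textFile("hdfs:///data/matrix.csv").map { line =>
>>       val a = line.split(",")
>>       MatrixEntry(a(0).toLong, a(1).toLong, a(2).toDouble)
>>     }
>>     val block = new CoordinateMatrix(entries)
>>       .toBlockMatrix(1024, 1024)                  // rowsPerBlock, colsPerBlock
>>       .persist(StorageLevel.MEMORY_AND_DISK_SER)  // the "disk serialisation"
>>     val product = block.multiply(block.transpose)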
>> 
>> We are running on a small cluster, using a CSV file in HDFS as the input 
>> data.
>> 
>> Would anyone with experience of multiplying large, dense matrices in Spark 
>> be able to comment on what to try to make this work?
>> 
>> Thanks,
>> Eilidh
>> 
>> 
> 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
