Re: Benchmarking col vs row similarities

2015-04-10 Thread Debasish Das
I will increase memory for the job... that will also fix it, right? On Apr 10, 2015 12:43 PM, Reza Zadeh r...@databricks.com wrote: You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM,

Re: Benchmarking col vs row similarities

2015-04-10 Thread Burak Yavuz
Depends... The heartbeat error you received happens due to GC pressure (probably due to a Full GC). If you increase the memory too much, GCs may be less frequent, but the Full GCs may take longer. Try increasing the following confs: spark.executor.heartbeatInterval
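
A minimal sketch of raising those settings programmatically; the numbers and the app name below are placeholders I've assumed for illustration, not values from the thread (spark.executor.heartbeatInterval takes milliseconds on the Spark 1.x releases discussed here):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: bump executor memory and the heartbeat interval.
    // The values are placeholders, not tuned recommendations.
    val conf = new SparkConf()
      .setAppName("RowVsColSimilarityBenchmark")          // hypothetical app name
      .set("spark.executor.memory", "8g")
      .set("spark.executor.heartbeatInterval", "60000")   // milliseconds on Spark 1.x
    val sc = new SparkContext(conf)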

Re: Benchmarking col vs row similarities

2015-04-10 Thread Reza Zadeh
You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am benchmarking row vs col similarity flow on 60M x 10M matrices... Details are in

Re: Row similarities

2015-01-18 Thread Pat Ferrel
...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair

Re: Row similarities

2015-01-17 Thread Reza Zadeh
To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second thought. Is there a pattern

Re: Row similarities

2015-01-17 Thread Andrew Musselman
MapReduce impl and the Spark DSL impl per your preference. From: Andrew Musselman andrew.mussel...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think

Re: Row similarities

2015-01-17 Thread Pat Ferrel
...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I

Re: Row similarities

2015-01-17 Thread Pat Ferrel
@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second thought. Is there a pattern for that? On Jan 16, 2015, at 9

Re: Row similarities

2015-01-17 Thread Andrew Musselman
impl per your preference. From: Andrew Musselman andrew.mussel...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want

Re: Row similarities

2015-01-17 Thread Reza Zadeh
...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second thought. Is there a pattern for that? On Jan 16, 2015, at 9:53 PM

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second thought. Is there a pattern for that? On Jan 16, 2015, at 9:53 PM, Reza Zadeh r...@databricks.com wrote: You can use K-means with a suitably large k. Each cluster should correspond
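
The thread doesn't spell out a pairwise-distance pattern, so here is one possible sketch: a cartesian join over an indexed RDD[Vector], keeping each unordered pair once. It is quadratic in the number of rows, so it is only practical for small matrices; the function and variable names are assumptions for the example.

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Sketch: all-pairs squared Euclidean distances between row vectors.
    // O(n^2) pairs, so only reasonable for modest row counts.
    def pairwiseDistances(rows: RDD[Vector]): RDD[((Long, Long), Double)] = {
      val indexed = rows.zipWithIndex().map { case (v, i) => (i, v) }
      indexed.cartesian(indexed)
        .filter { case ((i, _), (j, _)) => i < j }   // keep each unordered pair once
        .map { case ((i, vi), (j, vj)) => ((i, j), Vectors.sqdist(vi, vj)) }
    }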

Re: Row similarities

2015-01-17 Thread Suneel Marthi
Musselman andrew.mussel...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second

Re: Row similarities

2015-01-17 Thread Pat Ferrel
Musselman andrew.mussel...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting

Re: Row similarities

2015-01-17 Thread Andrew Musselman
: Andrew Musselman andrew.mussel...@gmail.com To: Reza Zadeh r...@databricks.com Cc: user user@spark.apache.org Sent: Saturday, January 17, 2015 11:29 AM Subject: Re: Row similarities Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance

Row similarities

2015-01-16 Thread Andrew Musselman
What's a good way to calculate similarities between all vector-rows in a matrix or RDD[Vector]? I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm going down a good path to transpose a matrix in order to run that.
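
One way the transpose-then-columnSimilarities path could be wired up, sketched here rather than taken from the thread: turn the rows into matrix entries, swap the indices to transpose, rebuild a RowMatrix, and call columnSimilarities. The transpose is done by hand so the sketch doesn't depend on any particular Spark version's helpers; it also drops zero entries, which assumes a sparse-friendly matrix.

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
    import org.apache.spark.rdd.RDD

    // Sketch: similarities between rows by transposing and reusing columnSimilarities.
    def rowSimilarities(rows: RDD[Vector]): CoordinateMatrix = {
      val transposedEntries = rows.zipWithIndex().flatMap { case (v, i) =>
        v.toArray.zipWithIndex.collect {
          case (value, j) if value != 0.0 => MatrixEntry(j.toLong, i, value)  // (i, j) of A becomes (j, i)
        }
      }
      val transposed = new CoordinateMatrix(transposedEntries).toRowMatrix()
      transposed.columnSimilarities()  // cosine similarity of the original rows, upper triangle only
    }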

Re: Row similarities

2015-01-16 Thread Reza Zadeh
You can use K-means https://spark.apache.org/docs/latest/mllib-clustering.html with a suitably large k. Each cluster should correspond to rows that are similar to one another. On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: What's a good way to calculate
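
A minimal sketch of that K-means suggestion with MLlib, assuming an RDD[Vector] of rows; k and the iteration count are placeholders to be tuned for the data.

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Sketch: bucket similar rows together by clustering with a large k.
    def clusterRows(rows: RDD[Vector], k: Int = 1000, numIterations: Int = 20): RDD[(Int, Vector)] = {
      val model = KMeans.train(rows, k, numIterations)
      rows.map(v => (model.predict(v), v))  // rows sharing a cluster id are treated as similar
    }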