[mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi all, 

It seems that there is a method to multiply a RowMatrix by a (local)
Matrix.
However, there is no method in Spark to multiply one large-scale matrix by
another.
It would be helpful. Does anyone plan to add multiplication of large-scale
matrices?
Or should we not support it in Spark?

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread RJ Nowling
I think it would be interesting to have a variety of matrix operations
(multiplication, addition / subtraction, powers, scalar multiply, etc.)
available in Spark.

Diagonalization may be more difficult but iterative approximation
approaches may be quite amenable.
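
As a rough illustration of the iterative approach mentioned above, here is a minimal power-iteration sketch in plain Python. It is not Spark or MLlib code; the function names `mat_vec` and `power_iteration` are made up for this example. The point it tries to show is that the only matrix operation needed is a matrix-vector product, which splits naturally across rows.

```python
import math

def mat_vec(rows, v):
    """Multiply a matrix, stored as a list of rows, by a local vector."""
    # Each row's dot product is independent of the others, so in a
    # distributed setting this step could run as a map over the rows.
    return [sum(a * b for a, b in zip(row, v)) for row in rows]

def power_iteration(rows, num_iters=100):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(rows)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    for _ in range(num_iters):
        w = mat_vec(rows, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T A v estimates the eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(rows, v)))
    return eigenvalue, v

eigenvalue, eigenvector = power_iteration([[2.0, 0.0], [0.0, 1.0]])
```

This only finds the largest eigenvalue in magnitude, and it assumes the starting vector is not orthogonal to the dominant eigenvector; a full diagonalization would need more than this.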



-- 
em rnowl...@gmail.com
c 954.496.2314


Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi RJ,

Thank you for your comment. I am interested in having other matrix
operations too.
I will create a JIRA issue first.

thanks,






Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
Hmm, looks like at least some builds
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull
are working now, though this last one was from ~5 hours ago.


On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote:

 yep.  that's exactly the behavior i saw earlier, and will be figuring out
 first thing tomorrow morning.  i bet it's an environment issue on the
 slaves.


 On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Looks like during the last build
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console
 Jenkins was unable to execute a git fetch?


 On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote:

 i'm going to restart jenkins and see if that fixes things.


 On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote:

 looking


 On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It appears that our main man is having trouble
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  hearing new requests
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.

 Do we need some smelling salts?


 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote:

 i'd ping the Jenkinsmench...  the master was completely offline, so any new
 jobs wouldn't have reached it.  any jobs that were queued when power was
 lost probably started up, but jobs that were running would fail.


 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Woohoo! Thanks Shane.

 Do you know if queued PR builds will automatically be picked up? Or do we
 have to ping the Jenkinmensch manually from each PR?

 Nick


 On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote:

 AND WE'RE UP!

 sorry that this took so long...  i'll send out a more detailed explanation
 of what happened soon.

 now, off to back up jenkins.

 shane


 On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote:

 it's a faulty power switch on the firewall, which has been swapped out.
 we're about to reboot and be good to go.


 On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote:

 looks like some hardware failed, and we're swapping in a replacement.  i
 don't have more specific information yet -- including *what* failed, as our
 sysadmin is super busy ATM.  the root cause was an incorrect circuit being
 switched off during building maintenance.

 on a side note, this incident will be accelerating our plan to move the
 entire jenkins infrastructure in to a managed datacenter environment.  this
 will be our major push over the next couple of weeks.  more details about
 this, also, as soon as i get them.

 i'm very sorry about the downtime, we'll get everything up and running
 ASAP.


 On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

 looks like a power outage in soda hall.  more updates as they happen.


 On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

 i am trying to get things up and running, but it looks like either the
 firewall gateway or jenkins server itself is down.  i'll update as soon as
 i know more.


 --
 You received this message because you are subscribed to the Google Groups
 amp-infra group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to amp-infra+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.
 









Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Evan R. Sparks
There's some work on this going on in the AMP Lab. Create a ticket and we
can update with our progress so that we don't duplicate effort.




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi Evan, 

That sounds interesting.

Here is the ticket I created.
https://issues.apache.org/jira/browse/SPARK-3416

thanks,






Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Patrick Wendell
Hey There,

I believe this is on the roadmap for the next release, 1.2, but
Xiangrui can comment on this.

- Patrick




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Shivaram Venkataraman
FWIW matrix multiplication is extremely communication intensive when
you have two row-partitioned matrices, and there are often other ways
to solve problems. Regardless, it would be good to have a more
complete matrix library and it would be good to contribute some of the
stuff we have done in the AMPLab to MLlib.
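
To make the communication cost concrete, here is a toy model in plain Python (not Spark code; `multiply_row_partitioned` and the partition layout are assumptions made for this sketch). Both matrices are split into blocks of rows, and the function also reports how many rows of B each partition of A had to read.

```python
def multiply_row_partitioned(a_parts, b_parts):
    """Compute A * B where both matrices are split into blocks of rows.

    a_parts and b_parts are lists of partitions; each partition is a list
    of rows. Returns the product in A's partitioning, plus the number of
    rows of B that each partition of A had to read.
    """
    # Row i of the product is sum_k A[i][k] * (row k of B), so a partition
    # of A generally needs every row of B, wherever that row is stored.
    b_rows = [row for part in b_parts for row in part]  # gather all of B
    n_cols = len(b_rows[0])
    result, rows_read = [], []
    for part in a_parts:
        out = []
        for a_row in part:
            c_row = [0.0] * n_cols
            for k, a_ik in enumerate(a_row):
                for j in range(n_cols):
                    c_row[j] += a_ik * b_rows[k][j]
            out.append(c_row)
        result.append(out)
        rows_read.append(len(b_rows))
    return result, rows_read

a_parts = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2x2 matrix, one row per partition
b_parts = [[[5.0, 6.0]], [[7.0, 8.0]]]
product, rows_read = multiply_row_partitioned(a_parts, b_parts)
```

In this model every partition of A reads all of B, so the data moved grows with the number of partitions times the size of B. A real implementation would likely use a different layout or a smarter shuffle; this is only meant to show where the cost comes from.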

Shivaram




Re: How to kill a Spark job running in local mode programmatically ?

2014-09-05 Thread Marcelo Vanzin
I don't think that's possible at the moment, mainly because
SparkSubmit expects to be run from the command line, and not
programmatically, so it doesn't return anything that can be used to
control what's going on. You may try to interrupt the thread calling
into SparkSubmit, but that might not work - especially if the app
doesn't handle it correctly.

Another thing to consider is that Spark itself doesn't play well with
multiple contexts running in the same JVM, so that would have to be
fixed before having SparkSubmit support that kind of use case.

Have you thought about spawning a child process to run SparkSubmit?
Then you can kill the underlying process if you need to.
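
A minimal sketch of the child-process idea, in Python for brevity (the same pattern applies from Java). The helper names `launch_job` and `kill_job` are made up, and the command below is only a stand-in for a real spark-submit command line, which this thread does not spell out.

```python
import subprocess
import sys

def launch_job(command):
    """Start the job as a child process and return a handle to it."""
    return subprocess.Popen(command)

def kill_job(process, timeout=10.0):
    """Ask the child to exit, force-kill it if needed, return its exit code."""
    if process.poll() is None:  # child is still running
        process.terminate()
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()  # did not exit in time, so force it
            process.wait()
    return process.returncode

# Stand-in for a real spark-submit command line: a child that just sleeps.
placeholder_command = [sys.executable, "-c", "import time; time.sleep(60)"]
job = launch_job(placeholder_command)
exit_code = kill_job(job)
```

Note that this only stops the local child process. Whether that is enough to stop a job submitted in yarn-cluster mode is a separate question that this sketch does not address.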


On Thu, Sep 4, 2014 at 2:17 PM, randomuser54 talktorohi...@gmail.com wrote:
 I have a java class which calls SparkSubmit.scala with all the arguments to
 run a spark job in a thread. I am running them in local mode for now but
 also want to run them in yarn-cluster mode later.

 Now, I want to kill the running spark job (which can be in local or
 yarn-cluster mode) programmatically.

 I know that SparkContext has a stop() method, but I don’t have access to
 it from the thread in which I am calling SparkSubmit. Can someone suggest
 how to do this properly?

 Thanks.








-- 
Marcelo




Re: amplab jenkins is down

2014-09-05 Thread shane knapp
it's looking like everything except the pull request builders is working.
 i'm going to be working on getting this resolved today.



Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Jeremy Freeman
Hey all, 

Definitely agreed this would be nice! In our own work we've done element-wise 
addition, subtraction, and scalar multiplication of similarly partitioned 
matrices very efficiently with zipping. We've also done matrix-matrix 
multiplication with zipping, but that only works in certain circumstances, and 
it's otherwise very communication intensive (as Shivaram says). Another tricky 
thing with addition / subtraction is how to handle sparse vs. dense arrays.

Would be happy to contribute anything we did, but definitely first worth 
knowing what progress has been made from the AMPLab.
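
The zipping approach described above can be sketched in plain Python as follows. This is not the code referred to in the message; `add_partitioned` and `scale_partitioned` are hypothetical names, and the matrices are modelled as lists of partitions, each holding dense rows.

```python
def add_partitioned(a_parts, b_parts):
    """Element-wise sum of two matrices with identical row partitioning."""
    if len(a_parts) != len(b_parts):
        raise ValueError("matrices must have the same number of partitions")
    result = []
    for a_part, b_part in zip(a_parts, b_parts):
        if len(a_part) != len(b_part):
            raise ValueError("partitions must hold the same number of rows")
        # Matching partitions line up row by row, so no data has to move
        # between partitions.
        result.append([[x + y for x, y in zip(a_row, b_row)]
                       for a_row, b_row in zip(a_part, b_part)])
    return result

def scale_partitioned(parts, scalar):
    """Multiply every element by a scalar, one partition at a time."""
    return [[[scalar * x for x in row] for row in part] for part in parts]

a = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2x2 matrix, one row per partition
b = [[[10.0, 20.0]], [[30.0, 40.0]]]
total = add_partitioned(a, b)
doubled = scale_partitioned(a, 2.0)
```

The sketch only covers dense rows. Handling sparse rows, or mixing sparse and dense, is the tricky part mentioned above and is left out here.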

-- Jeremy

-
jeremy freeman, phd
neuroscientist
@thefreemanlab




Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
How's it going?

It looks like during the last build
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console
from about 30 min ago Jenkins was still having trouble fetching from
GitHub. It also looks like not all requests for testing are triggering
builds.



Re: Dependency hell in Spark applications

2014-09-05 Thread Tathagata Das
If the httpclient dependency is coming from Hive, you could build Spark
without Hive. Alternatively, have you tried excluding httpclient from the
spark-streaming dependency in your sbt/maven project?

TD



On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote:

 custom spark builds should not be the answer. at least not if spark ever
 wants to have a vibrant community for spark apps.

 spark does support a user-classpath-first option, which would deal with
 some of these issues, but I don't think it works.
 On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote:

 Hi,
 I ran into the same issue and, apart from the ideas Aniket mentioned, I
 could only find a nasty workaround: adding my custom
 PoolingClientConnectionManager to my classpath.

 http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
 
 
 
  On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:
 
   Dumb question -- are you using a Spark build that includes the Kinesis
   dependency? that build would have resolved conflicts like this for
   you. Your app would need to use the same version of the Kinesis client
   SDK, ideally.
  
   All of these ideas are well-known, yes. In cases of super-common
   dependencies like Guava, they are already shaded. This is a
   less-common source of conflicts so I don't think http-client is
   shaded, especially since it is not used directly by Spark. I think
   this is a case of your app conflicting with a third-party dependency?
  
   I think OSGi is deemed too over the top for things like this.
  
 On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
 aniket.bhatna...@gmail.com wrote:

 I am trying to use Kinesis as a source for Spark Streaming and have run
 into a dependency issue that can't be resolved without making my own
 custom Spark build. The issue is that Spark is transitively dependent
 on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
 libfb303 coming from hbase and hive-serde) whereas the AWS SDK is dependent
 on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run
 the Spark Streaming application, I get the following:

 Caused by: java.lang.NoSuchMethodError:
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
     at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
     at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
     at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
     at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
     at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
     at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
     at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

 I can create a custom Spark build with
 org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
 was wondering if this is something Spark devs have noticed and are looking
 to resolve in upcoming releases. Here are my thoughts on this issue:

 Containers that allow running custom user code often have to resolve
 dependency issues in case of conflicts between the framework's and the user
 code's dependencies. Here is how I have seen some frameworks resolve the
 issue:
 1. Provide a child-first class loader: Some JEE containers provided a
 child-first class loader that allowed for loading classes from user code
 first. I don't think this approach completely solves the problem, as the
 framework is then susceptible to class mismatch errors.
 2. Fold in all dependencies in a sub-package: This approach involves
 folding all dependencies into a project-specific sub-package (like
 spark.dependencies). This approach is tedious because it involves building
 custom versions of all dependencies (and their transitive dependencies).
 3. Use something like OSGi: Some frameworks have successfully used OSGi to
 manage dependencies between the modules. The challenge in this approach is
 to

Re: Dependency hell in Spark applications

2014-09-05 Thread Ted Yu
From the output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
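
When the tree is long, a small script can help spot which versions of an artifact appear in it. The following is a rough sketch in Python; `versions_by_artifact` is a made-up helper, and it assumes each dependency line ends in the five-part form groupId:artifactId:packaging:version:scope shown above (lines with a classifier are not handled).

```python
import re
from collections import defaultdict

# groupId:artifactId:packaging:version:scope at the end of a tree line.
# Assumption: no classifier segment is present.
COORD = re.compile(r"([\w.\-]+):([\w.\-]+):([\w.\-]+):([\w.\-]+):(\w+)\s*$")

def versions_by_artifact(tree_output):
    """Map 'groupId:artifactId' to the set of versions seen in the output."""
    versions = defaultdict(set)
    for line in tree_output.splitlines():
        match = COORD.search(line)
        if match:
            group, artifact, _packaging, version, _scope = match.groups()
            versions[f"{group}:{artifact}"].add(version)
    return dict(versions)

sample = """\
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
"""
found = versions_by_artifact(sample)
```

Running it over the trees of both Spark and the application and comparing the results would show any artifact that resolves to more than one version.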

bq. excluding httpclient from spark-streaming dependency in your sbt/maven
project

This should work.



Re: amplab jenkins is down

2014-09-05 Thread Josh Rosen
We have successfully purged Jenkins’ build queue.  If you want a PR to be 
re-tested, please ask Jenkins again.

On September 5, 2014 at 5:36:30 PM, shane knapp (skn...@berkeley.edu) wrote:

yeah, it was a problem w/the PRB's OAuth key. josh rosen added a new key,  
and magique!  

we're about to clear the queue of all builds as most aren't wanted/needed.  


On Fri, Sep 5, 2014 at 5:33 PM, Nicholas Chammas nicholas.cham...@gmail.com  
 wrote:  

 Looks like Jenkins is back!  
  
 lol The poor guy has like a million builds to catch up on:
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
  
  
 On Fri, Sep 5, 2014 at 4:15 PM, Nicholas Chammas   
 nicholas.cham...@gmail.com wrote:  
  
 How's it going?  
  
 It looks like during the last build, from about 30 min ago, Jenkins was still
 having trouble fetching from GitHub:
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console
 It also looks like not all requests for testing are triggering builds.
  
  
 On Fri, Sep 5, 2014 at 1:23 PM, shane knapp skn...@berkeley.edu wrote:  
  
 it's looking like everything except the pull request builders are  
 working. i'm going to be working on getting this resolved today.  
  
  
 On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas   
 nicholas.cham...@gmail.com wrote:  
  
 Hmm, looks like at least some builds are working now, though this last one
 was from ~5 hours ago:
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull
  
  
 On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu  
 wrote:  
  
 yep. that's exactly the behavior i saw earlier, and will be figuring it
 out first thing tomorrow morning. i bet it's an environment issue on the
 slaves.
  
  
 On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas   
 nicholas.cham...@gmail.com wrote:  
  
 Looks like during the last build Jenkins was unable to execute a git fetch?
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console
  
  
 On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu  
 wrote:  
  
 i'm going to restart jenkins and see if that fixes things.  
  
  
 On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu  
 wrote:  
  
 looking  
  
  
 On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas   
 nicholas.cham...@gmail.com wrote:  
  
 It appears that our main man is having trouble  
 https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
   
 hearing new requests  
 https://github.com/apache/spark/pull/2277#issuecomment-54549106.  
  
 Do we need some smelling salts?  
  
  
 On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote:

 i'd ping the Jenkinsmench... the master was completely offline, so any new
 jobs wouldn't have reached it. any jobs that were queued when power was
 lost probably started up, but jobs that were running would fail.
  
  
 On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

  Woohoo! Thanks Shane.

  Do you know if queued PR builds will automatically be picked up? Or do we
  have to ping the Jenkinmensch manually from each PR?
   
  Nick  
   
   
  On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote:

  AND WE'RE UP!

  sorry that this took so long... i'll send out a more detailed explanation
  of what happened soon.
   
  now, off to back up jenkins.  
   
  shane  
   
   
  On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote:

   it's a faulty power switch on the firewall, which has been swapped out.
   we're about to reboot and be good to go.


   On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote:

   looks like some hardware failed, and we're swapping in a replacement. i
   don't have more specific information yet -- including *what* failed, as our
   sysadmin is super busy ATM. the root cause was an incorrect circuit being
   switched off during building maintenance.

   on a side note, this incident will be accelerating our plan to move the
   entire jenkins infrastructure into a managed datacenter environment. this
   will be our major push over the next couple of weeks. more details about
   this, also, as soon as i get them.

   i'm very sorry about the downtime, we'll get everything up and running
   ASAP.


   On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

   looks like a power outage in soda hall. more updates as they happen.


   On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

   i am trying to get things 

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread 顾荣
My last email missed the dev list, so I am resending it. Please ignore the
duplicate.

2014-09-06 11:22 GMT+08:00 顾荣 gurongwal...@gmail.com:

 Hi All,

 This is Rong Gu from PasaLab at Nanjing University, China. We have been
 working on a distributed matrix operations library on Spark this summer. It
 is a Summer Code project hosted by CSDN and Intel Lab
 (http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase was
 hosted on CSDN's code platform
 (https://code.csdn.net/u014252240/sparkmatrixlib), and we have been writing
 weekly reports on the blog (http://blog.csdn.net/u014252240).

 The project has now come to an end, and I recently moved it to GitHub:
 https://github.com/PasaLab/saury
 We named the project Saury and provide documentation to help people get to
 know it better.

 Technically, we implement matrix manipulation on Spark with block matrix
 parallel algorithms that distribute large-scale matrix computation among
 cluster nodes. We also take advantage of native linear algebra libraries
 (e.g., BLAS) on each worker node to accelerate the computation. That really
 makes a difference! See the preliminary performance evaluation report at
 https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
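
 To make the block-matrix idea concrete, here is a minimal single-machine
 sketch in plain Python. It is not Saury's code: the function names
 (split_into_blocks, multiply_local, block_multiply, assemble) and the
 dictionary-of-blocks layout are assumptions made only for illustration. Each
 matrix is cut into blocks keyed by (block row, block column), matching
 blocks are multiplied locally (the step where a worker would call BLAS), and
 the partial products are summed per output block.

```python
from collections import defaultdict


def split_into_blocks(matrix, block_size):
    """Cut a dense matrix (list of rows) into blocks keyed by (block_row, block_col)."""
    blocks = {}
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r0 in range(0, n_rows, block_size):
        for c0 in range(0, n_cols, block_size):
            blocks[(r0 // block_size, c0 // block_size)] = [
                row[c0:c0 + block_size] for row in matrix[r0:r0 + block_size]
            ]
    return blocks


def multiply_local(a, b):
    """Local product of two small blocks; a real worker would hand this to BLAS."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_local(a, b):
    """Element-wise sum of two blocks of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Compute C[i, j] = sum over k of A[i, k] * B[k, j], block by block."""
    partial = defaultdict(list)
    # "Join" blocks of A and B on the shared inner index k.
    for (i, k), a in a_blocks.items():
        for (k2, j), b in b_blocks.items():
            if k == k2:
                partial[(i, j)].append(multiply_local(a, b))
    # "Reduce" the partial products that belong to the same output block.
    result = {}
    for key, products in partial.items():
        total = products[0]
        for p in products[1:]:
            total = add_local(total, p)
        result[key] = total
    return result


def assemble(blocks):
    """Stitch a dictionary of blocks back into one list-of-rows matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

 In a cluster setting the join on k and the per-block sum would run as
 distributed operations, and the join is where most of the data movement
 happens.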

 Currently, we are working on adding more advanced matrix manipulation
 algorithms to Saury, such as matrix factorization and diagonalization
 algorithms. In fact, Saury already contains an alpha version of a distributed
 LU factorization. We are also trying to use Tachyon to hold and share the
 matrix data across the cluster more quickly.

 Best,
 Rong

 --
 --
 Rong Gu
 Department of Computer Science and Technology
 State Key Laboratory for Novel Software Technology
 Nanjing University
 Email: gurongwal...@gmail.com
 Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/


 2014-09-06 1:29 GMT+08:00 Jeremy Freeman freeman.jer...@gmail.com:

 Hey all,

 Definitely agreed this would be nice! In our own work we've done
 element-wise addition, subtraction, and scalar multiplication of similarly
 partitioned matrices very efficiently with zipping. We've also done
 matrix-matrix multiplication with zipping, but that only works in certain
 circumstances, and it's otherwise very communication intensive (as Shivaram
 says). Another tricky thing with addition / subtraction is how to handle
 sparse vs. dense arrays.
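
 As a rough illustration of the zipping approach for element-wise operations,
 here is a small sketch in plain Python. It is not the code from our project:
 the names add_partitioned and scale_partitioned and the
 list-of-partitions layout are assumptions for illustration only. It mirrors
 the constraint that zipping needs both matrices to have the same number of
 partitions and the same number of rows in each partition.

```python
def add_partitioned(left, right):
    """Element-wise sum of two matrices stored as lists of partitions.

    Each partition is a list of rows; both matrices must be partitioned
    identically, which is what makes a plain zip sufficient.
    """
    if len(left) != len(right):
        raise ValueError("matrices must have the same number of partitions")
    result = []
    for part_l, part_r in zip(left, right):
        if len(part_l) != len(part_r):
            raise ValueError("partitions must hold the same number of rows")
        # No data moves between partitions: rows are paired up in place.
        result.append(
            [[x + y for x, y in zip(row_l, row_r)]
             for row_l, row_r in zip(part_l, part_r)]
        )
    return result


def scale_partitioned(matrix, scalar):
    """Scalar multiplication touches each partition independently."""
    return [[[scalar * x for x in row] for row in part] for part in matrix]
```

 Subtraction follows the same pattern with x - y in place of x + y. Because
 rows are only combined with their counterparts in the matching partition,
 nothing has to be shuffled, which is why this is cheap compared with a
 general matrix-matrix product.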

 Would be happy to contribute anything we did, but it's definitely worth first
 knowing what progress has been made at the AMPLab.

 -- Jeremy

 -
 jeremy freeman, phd
 neuroscientist
 @thefreemanlab

 On Sep 5, 2014, at 12:23 PM, Patrick Wendell pwend...@gmail.com wrote:

  Hey There,
 
  I believe this is on the roadmap for the next release, 1.2. But
  Xiangrui can comment on this.
 
  - Patrick
 
  On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa
  yuu.ishikawa+sp...@gmail.com wrote:
  Hi Evan,
 
  That sounds interesting.
 
  Here is the ticket which I created.
  https://issues.apache.org/jira/browse/SPARK-3416
 
  thanks,
 
 
 








-- 
--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/