date:20140905

[mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa

Hi all, 

It seems that there is a method to multiply a RowMatrix by a (local)
Matrix.
However, Spark has no method to multiply one large-scale matrix by
another.
It would be helpful. Does anyone plan to add multiplication of large-scale
matrices?
Or should we not support it in Spark?

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread RJ Nowling

I think it would be interesting to have a variety of matrix operations
(multiplication, addition / subtraction, powers, scalar multiply, etc.)
available in Spark.

Diagonalization may be more difficult but iterative approximation
approaches may be quite amenable.
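
As a rough illustration of the iterative approach mentioned above: power
iteration estimates the dominant eigenvalue using only matrix-vector
products, which is why such methods tend to suit matrices distributed by
row. The sketch below is plain Python on a small in-memory matrix, not Spark
code. The name `power_iteration` and its parameters are made up for
illustration, and it assumes a square matrix with a single dominant
eigenvalue.

```python
def power_iteration(rows, iterations=100):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    `rows` is a list of rows (lists of floats). Hypothetical helper,
    for illustration only.
    """
    n = len(rows)
    v = [1.0] * n
    for _ in range(iterations):
        # Matrix-vector product: one dot product per row, so rows can be
        # processed independently of each other.
        w = [sum(a * x for a, x in zip(row, v)) for row in rows]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit length at this point).
    av = [sum(a * x for a, x in zip(row, v)) for row in rows]
    eigenvalue = sum(x * y for x, y in zip(v, av))
    return eigenvalue, v
```

Only the product inside the loop touches the matrix, so in a distributed
setting the vector would be the only thing that has to travel between
iterations.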






-- 
em rnowl...@gmail.com
c 954.496.2314

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa

Hi RJ,

Thank you for your comment. I am interested in having other matrix
operations too.
I will create a JIRA issue first.

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8293.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas

Hmm, looks like at least some builds
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull
are working now, though this last one was from ~5 hours ago.

On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote:

yep. that's exactly the behavior i saw earlier, and will be figuring out
first thing tomorrow morning. i bet it's an environment issue on the
slaves.

On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

Looks like during the last build
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console
Jenkins was unable to execute a git fetch?

On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote:

i'm going to restart jenkins and see if that fixes things.

On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote:

looking

On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

It appears that our main man is having trouble
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
hearing new requests
https://github.com/apache/spark/pull/2277#issuecomment-54549106.

Do we need some smelling salts?

On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote:

i'd ping the Jenkinsmench... the master was completely offline, so any new
jobs wouldn't have reached it. any jobs that were queued when power was
lost probably started up, but jobs that were running would fail.

On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Woohoo! Thanks Shane.

Do you know if queued PR builds will automatically be picked up? Or do we
have to ping the Jenkinmensch manually from each PR?

Nick

On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote:

AND WE'RE UP!

sorry that this took so long... i'll send out a more detailed explanation
of what happened soon.

now, off to back up jenkins.

shane

On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote:

it's a faulty power switch on the firewall, which has been swapped out.
we're about to reboot and be good to go.

On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote:

looks like some hardware failed, and we're swapping in a replacement. i
don't have more specific information yet -- including *what* failed, as our
sysadmin is super busy ATM. the root cause was an incorrect circuit being
switched off during building maintenance.

on a side note, this incident will be accelerating our plan to move the
entire jenkins infrastructure in to a managed datacenter environment. this
will be our major push over the next couple of weeks. more details about
this, also, as soon as i get them.

i'm very sorry about the downtime, we'll get everything up and running
ASAP.

On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote:

looks like a power outage in soda hall. more updates as they happen.

On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote:

i am trying to get things up and running, but it looks like either the
firewall gateway or jenkins server itself is down. i'll update as soon as
i know more.

--
You received this message because you are subscribed to the Google
Groups
amp-infra group.
To unsubscribe from this group and stop receiving emails from it,
send an
email to amp-infra+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Evan R. Sparks

There's some work on this going on in the AMP Lab. Create a ticket and we
can update with our progress so that we don't duplicate effort.



Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa

Hi Evan, 

That sounds interesting.

Here is the ticket which I created.
https://issues.apache.org/jira/browse/SPARK-3416

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8296.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Patrick Wendell

Hey There,

I believe this is on the roadmap for the next release, 1.2. But
Xiangrui can comment on this.

- Patrick


Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Shivaram Venkataraman

FWIW matrix multiplication is extremely communication intensive when
you have two row partitioned matrices and there are often other ways
to solve problems. Regardless, it would be good to have a more
complete matrix library and it would be good to contribute some of the
stuff we have done in the AMPLab to MLLib.

Shivaram
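
To make the communication cost concrete: in C = A * B, every row of C
depends on all of B, so when both matrices are partitioned by row, each
partition of A needs to see every partition of B. Below is a small
plain-Python sketch (not Spark code) that simulates this and counts the rows
of B that would have to be shipped. The function name
`multiply_row_partitioned` and the list-of-partitions layout are assumptions
made for illustration.

```python
def multiply_row_partitioned(a_parts, b_parts):
    """Multiply two row-partitioned matrices and count rows of B shipped.

    Each argument is a list of partitions; a partition is a list of rows.
    Hypothetical helper, for illustration only.
    """
    # Gather all of B: every output row needs every row of B.
    b_full = [row for part in b_parts for row in part]
    n_inner = len(b_full)
    n_cols = len(b_full[0])
    rows_shipped = 0
    c_parts = []
    for a_part in a_parts:
        # In a cluster, this partition would receive a full copy of B.
        rows_shipped += n_inner
        c_part = []
        for a_row in a_part:
            c_row = [
                sum(a_row[k] * b_full[k][j] for k in range(n_inner))
                for j in range(n_cols)
            ]
            c_part.append(c_row)
        c_parts.append(c_part)
    return c_parts, rows_shipped
```

The shipped-row count grows with (number of A partitions) x (rows of B),
which is the all-to-all traffic that makes this layout expensive.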


Re: How to kill a Spark job running in local mode programmatically ?

2014-09-05 Thread Marcelo Vanzin

I don't think that's possible at the moment, mainly because
SparkSubmit expects to be run from the command line, and not
programmatically, so it doesn't return anything that can be used to
control what's going on. You may try to interrupt the thread calling
into SparkSubmit, but that might not work - especially if the app
doesn't handle it correctly.

Another thing to consider is that Spark itself doesn't play well with
multiple contexts running in the same JVM, so that would have to be
fixed before having SparkSubmit support that kind of use case.

Have you thought about spawning a child process to run SparkSubmit?
Then you can kill the underlying process if you need to.
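
A minimal sketch of the child-process idea, written in Python for brevity
(the same pattern applies from Java via ProcessBuilder and
Process.destroy()). The helper names `launch` and `stop` are made up, and
the spark-submit arguments in the comment (class name, jar) are
placeholders, not taken from this thread.

```python
import subprocess


def launch(cmd):
    """Start `cmd` (a list of strings) as a child process and return its handle."""
    return subprocess.Popen(cmd)


def stop(proc, grace_seconds=5.0):
    """Ask the child to exit; force-kill it if it ignores the request."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The child did not exit in time, so kill it outright.
        proc.kill()
        proc.wait()
    return proc.returncode


# Hypothetical usage (class and jar names are placeholders):
# job = launch(["spark-submit", "--class", "com.example.MyJob",
#               "--master", "local[*]", "my-job.jar"])
# ...
# stop(job)
```

This should be enough for local mode, where the job runs inside the child
process. In yarn-cluster mode the driver runs on the cluster, so killing the
local process probably does not stop the application; something like
`yarn application -kill` with the application id would likely be needed as
well.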


On Thu, Sep 4, 2014 at 2:17 PM, randomuser54 talktorohi...@gmail.com wrote:
 I have a java class which calls SparkSubmit.scala with all the arguments to
 run a spark job in a thread. I am running them in local mode for now but
 also want to run them in yarn-cluster mode later.

 Now, I want to kill the running spark job (which can be in local or
 yarn-cluster mode) programmatically.

 I know that SparkContext has a stop() method, but I don’t have access to it
 from the thread that calls SparkSubmit. Can someone suggest
 how to do this properly?

 Thanks.




 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-kill-a-Spark-job-running-in-local-mode-programmatically-tp8279.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.





-- 
Marcelo


Re: amplab jenkins is down

2014-09-05 Thread shane knapp

it's looking like everything except the pull request builders is working.
i'm going to be working on getting this resolved today.


Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Jeremy Freeman

Hey all,

Definitely agreed this would be nice! In our own work we've done element-wise
addition, subtraction, and scalar multiplication of similarly partitioned
matrices very efficiently with zipping. We've also done matrix-matrix
multiplication with zipping, but that only works in certain circumstances, and
it's otherwise very communication intensive (as Shivaram says). Another tricky
thing with addition / subtraction is how to handle sparse vs. dense arrays.
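
A small sketch of what the zipping approach might look like, in plain Python
rather than Spark. In Spark, `RDD.zip` requires both RDDs to have the same
number of partitions and the same number of elements in each partition,
which is what "similarly partitioned" provides. The second function shows
one product that needs nothing beyond zipping, A^T * B computed as a sum of
per-row outer products; this is possibly, though not necessarily, the kind
of circumstance meant above. Both function names and the
list-of-partitions layout are assumptions made for illustration.

```python
def add_zipped(a_parts, b_parts):
    """Element-wise A + B for two identically partitioned matrices."""
    if [len(p) for p in a_parts] != [len(p) for p in b_parts]:
        raise ValueError("matrices are not partitioned the same way")
    return [
        [[x + y for x, y in zip(a_row, b_row)]
         for a_row, b_row in zip(a_part, b_part)]
        for a_part, b_part in zip(a_parts, b_parts)
    ]


def transpose_times_zipped(a_parts, b_parts):
    """A^T * B as a sum of outer products of zipped rows."""
    if [len(p) for p in a_parts] != [len(p) for p in b_parts]:
        raise ValueError("matrices are not partitioned the same way")
    n_cols_a = len(a_parts[0][0])
    n_cols_b = len(b_parts[0][0])
    result = [[0] * n_cols_b for _ in range(n_cols_a)]
    for a_part, b_part in zip(a_parts, b_parts):
        for a_row, b_row in zip(a_part, b_part):
            # Row i of A and row i of B sit in the same place, so their
            # outer product can be accumulated without moving any rows.
            for p, a_val in enumerate(a_row):
                for q, b_val in enumerate(b_row):
                    result[p][q] += a_val * b_val
    return result
```

Neither function moves rows between partitions; only the small
(columns of A) x (columns of B) result of the second one would need to be
combined across partitions.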

Would be happy to contribute anything we did, but definitely first worth
knowing what progress has been made from the AMPLab.

-- Jeremy

-
jeremy freeman, phd
neuroscientist
@thefreemanlab


Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas

How's it going?

It looks like during the last build
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console
from about 30 min ago Jenkins was still having trouble fetching from
GitHub. It also looks like not all requests for testing are triggering
builds.


Re: Dependency hell in Spark applications

2014-09-05 Thread Tathagata Das

If httpClient dependency is coming from Hive, you could build Spark without
Hive. Alternatively, have you tried excluding httpclient from
spark-streaming dependency in your sbt/maven project?

TD



On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote:

 custom spark builds should not be the answer. at least not if spark ever
 wants to have a vibrant community for spark apps.

 spark does support a user-classpath-first option, which would deal with
 some of these issues, but I don't think it works.
 On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote:

  Hi,
  I ran into the same issue and, apart from the ideas Aniket mentioned, I could
  only find a nasty workaround: adding my custom PoolingClientConnectionManager
  to my classpath.

  http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
 
 
 
  On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:
 
   Dumb question -- are you using a Spark build that includes the Kinesis
   dependency? that build would have resolved conflicts like this for
   you. Your app would need to use the same version of the Kinesis client
   SDK, ideally.
  
   All of these ideas are well-known, yes. In cases of super-common
   dependencies like Guava, they are already shaded. This is a
   less-common source of conflicts so I don't think http-client is
   shaded, especially since it is not used directly by Spark. I think
   this is a case of your app conflicting with a third-party dependency?
  
   I think OSGi is deemed too over the top for things like this.
  
   On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
   aniket.bhatna...@gmail.com wrote:
I am trying to use Kinesis as a source for Spark Streaming and have run into a
dependency issue that can't be resolved without making my own custom Spark
build. The issue is that Spark is transitively dependent on
org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of libfb303
coming from hbase and hive-serde) whereas the AWS SDK is dependent on
org.apache.httpcomponents:httpclient:jar:4.2. When I package and run the
Spark Streaming application, I get the following:
   
Caused by: java.lang.NoSuchMethodError:
org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
   
I can create a custom Spark build with
org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly but I
was wondering if this is something Spark devs have noticed and are looking
to resolve in near releases. Here are my thoughts on this issue:

Containers that allow running custom user code often have to resolve
dependency issues in case of conflicts between the framework's and the user
code's dependencies. Here is how I have seen some frameworks resolve the
issue:
1. Provide a child-first class loader: Some JEE containers provided a
child-first class loader that allowed for loading classes from user code
first. I don't think this approach completely solves the problem as the
framework is then susceptible to class mismatch errors.
2. Fold in all dependencies in a sub-package: This approach involves
folding all dependencies in a project specific sub-package (like
spark.dependencies). This approach is tedious because it involves building
custom versions of all dependencies (and their transitive dependencies)
3. Use something like OSGi: Some frameworks have successfully used OSGi to
manage dependencies between the modules. The challenge in this approach is
to

Re: Dependency hell in Spark applications

2014-09-05 Thread Ted Yu

From output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

bq. excluding httpclient from spark-streaming dependency in your sbt/maven
project

This should work.
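
To find which dependency drags in a given artifact, the tree output above
can be walked mechanically. Here is a minimal Python sketch, assuming the
usual `mvn dependency:tree` text layout (three characters of indentation per
level, then `+- ` or `\- ` before each coordinate). `paths_to` is a made-up
helper, not part of Maven.

```python
import re

# One tree node: "[INFO] ", zero or more 3-character indents, a branch marker,
# then the artifact coordinate.
NODE = re.compile(r"^\[INFO\] ((?:[| ] {2})*)[+\\]- (\S+)")
# The root line: a bare coordinate with at least four colon-separated fields.
ROOT = re.compile(r"^\[INFO\] ([^\s:]+(?::[^\s:]+){3,})$")


def paths_to(tree_lines, needle):
    """Return the ancestor chain of every node whose coordinate contains `needle`."""
    stack = []
    hits = []
    for line in tree_lines:
        root = ROOT.match(line)
        node = NODE.match(line)
        if root:
            stack = [root.group(1)]
        elif node:
            depth = len(node.group(1)) // 3 + 1
            # Drop everything at this depth or deeper, then push this node.
            del stack[depth:]
            stack.append(node.group(2))
            if needle in node.group(2):
                hits.append(list(stack))
    return hits
```

Run on the excerpt above with "httpclient" as the needle, this reports
jets3t (reached through spark-core) as the parent of httpclient 4.1.2, so
that is where an exclusion would have to take effect.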



Re: amplab jenkins is down

2014-09-05 Thread Josh Rosen

We have successfully purged Jenkins’ build queue. If you want a PR to be
re-tested, please ask Jenkins again.

On September 5, 2014 at 5:36:30 PM, shane knapp (skn...@berkeley.edu) wrote:

yeah, it was a problem w/the PRB's OAuth key. josh rosen added a new key,
and magique!

we're about to clear the queue of all builds as most aren't wanted/needed.

On Fri, Sep 5, 2014 at 5:33 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:

Looks like Jenkins is back!

lol The poor guy has like a million builds
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/
to catch up on.

On Fri, Sep 5, 2014 at 4:15 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

How's it going?

It looks like during the last build
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console

from about 30 min ago Jenkins was still having trouble fetching from
GitHub. It also looks like not all requests for testing are triggering
builds.

On Fri, Sep 5, 2014 at 1:23 PM, shane knapp skn...@berkeley.edu wrote:

it's looking like everything except the pull request builders are
working. i'm going to be working on getting this resolved today.

On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

Hmm, looks like at least some builds
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull

are working now, though this last one was from ~5 hours ago.

On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu
wrote:

yep. that's exactly the behavior i saw earlier, and will be figuring
out first thing tomorrow morning. i bet it's an environment issues on the

slaves.

On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

Looks like during the last build
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console

Jenkins was unable to execute a git fetch?

On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu
wrote:

i'm going to restart jenkins and see if that fixes things.

On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu
wrote:

looking

On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

It appears that our main man is having trouble
https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/

hearing new requests
https://github.com/apache/spark/pull/2277#issuecomment-54549106.

Do we need some smelling salts?

On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu
wrote:

i'd ping the Jenkinsmench... the master was completely offline,
so any new
jobs wouldn't have reached it. any jobs that were queued when
power was
lost probably started up, but jobs that were running would fail.

On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas
nicholas.cham...@gmail.com
wrote:

Woohoo! Thanks Shane.

Do you know if queued PR builds will automatically be picked
up? Or do we
have to ping the Jenkinmensch manually from each PR?

Nick

On Thu, Sep 4, 2014 at 5:37 PM, shane knapp
skn...@berkeley.edu wrote:

AND WE'RE UP!

sorry that this took so long... i'll send out a more detailed
explanation
of what happened soon.

now, off to back up jenkins.

shane

On Thu, Sep 4, 2014 at 1:27 PM, shane knapp
skn...@berkeley.edu wrote:

it's a faulty power switch on the firewall, which has been swapped out.
we're about to reboot and be good to go.

On Thu, Sep 4, 2014 at 1:19 PM, shane knapp
skn...@berkeley.edu
wrote:

looks like some hardware failed, and we're swapping in a replacement. i
don't have more specific information yet -- including *what* failed, as
our sysadmin is super busy ATM. the root cause was an incorrect circuit
being switched off during building maintenance.

on a side note, this incident will be accelerating our plan to move the
entire jenkins infrastructure in to a managed datacenter environment.
this will be our major push over the next couple of weeks. more details
about this, also, as soon as i get them.

i'm very sorry about the downtime, we'll get everything up and running
ASAP.

On Thu, Sep 4, 2014 at 12:27 PM, shane knapp
skn...@berkeley.edu
wrote:

looks like a power outage in soda hall. more updates as they happen.

On Thu, Sep 4, 2014 at 12:25 PM, shane knapp
skn...@berkeley.edu
wrote:

i am trying to get things

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread 顾荣

I missed the dev list in my last email, so I am resending it. Please
ignore the duplicate.

2014-09-06 11:22 GMT+08:00 顾荣 gurongwal...@gmail.com:

Hi All,

This is Rong Gu from PasaLab at Nanjing University, China. Actually, we
have been working on a distributed matrix operations library on Spark
this summer. It is a Summer Code project hosted by CSDN and Intel Lab (
http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase of
the project was hosted on CSDN's code platform (
https://code.csdn.net/u014252240/sparkmatrixlib), and we have been writing
weekly reports on the blog (http://blog.csdn.net/u014252240).

The project has now come to an end, and I have recently moved it to
GitHub. *Please see the link here *https://github.com/PasaLab/saury .
We named the project Saury and provide documentation to help people get
to know it better.

Technically, we implement matrix manipulation on Spark with block-matrix
parallel algorithms that distribute large-scale matrix computation among
the cluster nodes. We also take advantage of the native linear algebra
library (e.g., BLAS) on each worker node to accelerate the computation.
That really makes a difference! See the preliminary performance evaluation
report at
https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
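
For readers unfamiliar with the block approach mentioned above, here is a
minimal sketch of block matrix multiplication in plain Python. It is not
Saury's code and does not use Spark; all function names are hypothetical,
and the pure-Python local_multiply only stands in for the place where a
native BLAS routine would be called on each worker.

```python
# Illustrative sketch only: plain Python, no Spark, hypothetical names.

def split_blocks(matrix, block_size):
    """Split a dense matrix (list of rows) into {(block_row, block_col): block}."""
    blocks = {}
    for bi, r in enumerate(range(0, len(matrix), block_size)):
        for bj, c in enumerate(range(0, len(matrix[0]), block_size)):
            blocks[(bi, bj)] = [row[c:c + block_size]
                                for row in matrix[r:r + block_size]]
    return blocks


def local_multiply(a, b):
    """Multiply two small dense blocks (stand-in for a native BLAS call)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def local_add(a, b):
    """Add two blocks of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Compute C[i, j] = sum over k of A[i, k] * B[k, j], block by block."""
    result = {}
    for (i, k), a in a_blocks.items():
        for (k2, j), b in b_blocks.items():
            if k != k2:
                continue  # only blocks sharing the inner index are multiplied
            product = local_multiply(a, b)
            if (i, j) in result:
                result[(i, j)] = local_add(result[(i, j)], product)
            else:
                result[(i, j)] = product
    return result


def assemble(blocks):
    """Stitch a block dictionary back into a single list-of-rows matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

In a distributed setting each (i, k) / (k, j) pair would be shipped to a
worker and the partial products summed per (i, j) key; the matching on the
inner index k is where the communication cost comes from.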

Currently, we are working on adding more advanced matrix manipulation
algorithms to Saury, such as matrix factorization and diagonalization
algorithms. In fact, Saury already contains an alpha version of a
distributed LU factorization. We are also trying to use Tachyon to hold
and share the matrix data across the cluster more quickly.

Best,
Rong

--
--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/

2014-09-06 1:29 GMT+08:00 Jeremy Freeman freeman.jer...@gmail.com:

Hey all,

Definitely agreed this would be nice! In our own work we've done
element-wise addition, subtraction, and scalar multiplication of similarly
partitioned matrices very efficiently with zipping. We've also done
matrix-matrix multiplication with zipping, but that only works in certain
circumstances, and it's otherwise very communication intensive (as Shivaram
says). Another tricky thing with addition / subtraction is how to handle
sparse vs. dense arrays.
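
A minimal sketch of the zipping idea described above, in plain Python
rather than Spark. It is not the code from Jeremy's project: each matrix is
modeled as a list of partitions, each partition as a list of dense rows,
and every name is hypothetical. The strict checks mirror the requirement of
Spark's RDD.zip that both sides have the same number of partitions and the
same number of elements in each partition. Sparse rows are not handled.

```python
# Illustrative sketch only: plain Python, no Spark, hypothetical names.

def zip_partitions(left, right):
    """Pair up the rows of two identically partitioned matrices."""
    if len(left) != len(right):
        raise ValueError("matrices must have the same number of partitions")
    zipped = []
    for left_part, right_part in zip(left, right):
        if len(left_part) != len(right_part):
            raise ValueError("matching partitions must hold the same number of rows")
        zipped.append(list(zip(left_part, right_part)))
    return zipped


def elementwise(left, right, op):
    """Apply a binary operation entry by entry; no data moves between partitions."""
    return [[[op(x, y) for x, y in zip(left_row, right_row)]
             for left_row, right_row in part]
            for part in zip_partitions(left, right)]


def scale(matrix, factor):
    """Scalar multiplication needs no zipping: each row is mapped independently."""
    return [[[factor * x for x in row] for row in part] for part in matrix]
```

Because every output row depends only on the two input rows at the same
position, these operations need no shuffle, which is why they are cheap
compared with general matrix-matrix multiplication.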

Would be happy to contribute anything we did, but definitely first worth
knowing what progress has been made from the AMPLab.

-- Jeremy

-
jeremy freeman, phd
neuroscientist
@thefreemanlab

On Sep 5, 2014, at 12:23 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey There,

I believe this is on the roadmap for the next release, 1.2. But
Xiangrui can comment on this.

- Patrick

On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa
yuu.ishikawa+sp...@gmail.com wrote:
Hi Evan,

That sounds interesting.

Here is the ticket which I created.
https://issues.apache.org/jira/browse/SPARK-3416

thanks,


--
--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
