[mllib] Add multiplying large scale matrices
Hi all, It seems that there is a method to multiply a RowMatrix and a (local) Matrix. However, there is not a method to multiply a large scale matrix and another one in Spark. It would be helpful. Does anyone have a plan to add multiplying large scale matrices? Or shouldn't we support it in Spark? thanks, -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [mllib] Add multiplying large scale matrices
I think it would be interesting to have a variety of matrix operations (multiplication, addition / subtraction, powers, scalar multiply, etc.) available in Spark. Diagonalization may be more difficult, but it may be quite amenable to iterative approximation approaches. On Fri, Sep 5, 2014 at 5:26 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Hi all, It seems that there is a method to multiply a RowMatrix and a (local) Matrix. However, there is not a method to multiply a large scale matrix and another one in Spark. It would be helpful. Does anyone have a plan to add multiplying large scale matrices? Or shouldn't we support it in Spark? thanks, -- em rnowl...@gmail.com c 954.496.2314
Re: [mllib] Add multiplying large scale matrices
Hi RJ, Thank you for your comment. I am interested in having other matrix operations too. I will create a JIRA issue first. thanks, -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8293.html
Re: amplab jenkins is down
Hmm, looks like at least some builds https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull are working now, though this last one was from ~5 hours ago. On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote: yep. that's exactly the behavior i saw earlier, and will be figuring out first thing tomorrow morning. i bet it's an environment issues on the slaves. On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Looks like during the last build https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console Jenkins was unable to execute a git fetch? On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote: i'm going to restart jenkins and see if that fixes things. On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote: looking On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It appears that our main man is having trouble https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/ hearing new requests https://github.com/apache/spark/pull/2277#issuecomment-54549106. Do we need some smelling salts? On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote: i'd ping the Jenkinsmench... the master was completely offline, so any new jobs wouldn't have reached it. any jobs that were queued when power was lost probably started up, but jobs that were running would fail. On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Woohoo! Thanks Shane. Do you know if queued PR builds will automatically be picked up? Or do we have to ping the Jenkinmensch manually from each PR? Nick On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote: AND WE'RE UP! sorry that this took so long... i'll send out a more detailed explanation of what happened soon. now, off to back up jenkins. 
shane On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote: it's a faulty power switch on the firewall, which has been swapped out. we're about to reboot and be good to go. On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote: looks like some hardware failed, and we're swapping in a replacement. i don't have more specific information yet -- including *what* failed, as our sysadmin is super busy ATM. the root cause was an incorrect circuit being switched off during building maintenance. on a side note, this incident will be accelerating our plan to move the entire jenkins infrastructure in to a managed datacenter environment. this will be our major push over the next couple of weeks. more details about this, also, as soon as i get them. i'm very sorry about the downtime, we'll get everything up and running ASAP. On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote: looks like a power outage in soda hall. more updates as they happen. On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote: i am trying to get things up and running, but it looks like either the firewall gateway or jenkins server itself is down. i'll update as soon as i know more. -- You received this message because you are subscribed to the Google Groups amp-infra group. To unsubscribe from this group and stop receiving emails from it, send an email to amp-infra+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [mllib] Add multiplying large scale matrices
There's some work on this going on in the AMP Lab. Create a ticket and we can update with our progress so that we don't duplicate effort. On Fri, Sep 5, 2014 at 8:18 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Hi RJ, Thank you for your comment. I am interested in having other matrix operations too. I will create a JIRA issue first. thanks,
Re: [mllib] Add multiplying large scale matrices
Hi Evan, That sounds interesting. Here is the ticket which I created. https://issues.apache.org/jira/browse/SPARK-3416 thanks, -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8296.html
Re: [mllib] Add multiplying large scale matrices
Hey There, I believe this is on the roadmap for the next release, 1.2. But Xiangrui can comment on this. - Patrick On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Hi Evan, That sounds interesting. Here is the ticket which I created. https://issues.apache.org/jira/browse/SPARK-3416 thanks,
Re: [mllib] Add multiplying large scale matrices
FWIW, matrix multiplication is extremely communication-intensive when you have two row-partitioned matrices, and there are often other ways to solve problems. Regardless, it would be good to have a more complete matrix library, and it would be good to contribute some of the stuff we have done in the AMPLab to MLlib. Shivaram On Fri, Sep 5, 2014 at 9:12 AM, Evan R. Sparks evan.spa...@gmail.com wrote: There's some work on this going on in the AMP Lab. Create a ticket and we can update with our progress so that we don't duplicate effort.
Re: How to kill a Spark job running in local mode programmatically ?
I don't think that's possible at the moment, mainly because SparkSubmit expects to be run from the command line, not programmatically, so it doesn't return anything that can be used to control what's going on. You may try to interrupt the thread calling into SparkSubmit, but that might not work, especially if the app doesn't handle the interrupt correctly. Another thing to consider is that Spark itself doesn't play well with multiple contexts running in the same JVM, so that would have to be fixed before SparkSubmit could support that kind of use case. Have you thought about spawning a child process to run SparkSubmit? Then you can kill the underlying process if you need to. On Thu, Sep 4, 2014 at 2:17 PM, randomuser54 talktorohi...@gmail.com wrote: I have a Java class which calls SparkSubmit.scala with all the arguments to run a Spark job in a thread. I am running the jobs in local mode for now but also want to run them in yarn-cluster mode later. Now, I want to kill the running Spark job (which can be in local or yarn-cluster mode) programmatically. I know that SparkContext has a stop() method, but I don't have access to it from the thread from which I am calling SparkSubmit. Can someone suggest how to do this properly? Thanks. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-kill-a-Spark-job-running-in-local-mode-programmatically-tp8279.html -- Marcelo
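The child-process suggestion above could be sketched roughly as follows. This is only a minimal illustration in Python, not something from the thread: the class name com.example.MyJob and the jar my-job.jar are hypothetical placeholders, and the helper names launch, stop and standin_command are made up for this sketch. It assumes spark-submit is on the PATH when the real command is used.

```python
import subprocess
import sys

# Hypothetical command line: the class and jar names are placeholders,
# not taken from the thread. Adjust to your own job.
SPARK_SUBMIT_CMD = [
    "spark-submit",
    "--master", "local[2]",
    "--class", "com.example.MyJob",
    "my-job.jar",
]


def launch(cmd):
    """Start the command as a child process and return its handle."""
    return subprocess.Popen(cmd)


def stop(proc, grace_seconds=5.0):
    """Ask the child to exit; force-kill it if it ignores the request.

    Returns the child's exit code.
    """
    if proc.poll() is not None:
        # The child already finished on its own.
        return proc.returncode
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


def standin_command(seconds):
    """A harmless long-running command, useful for trying the helpers
    without a Spark installation."""
    return [sys.executable, "-c", "import time; time.sleep(%d)" % seconds]
```

With a real installation one would call launch(SPARK_SUBMIT_CMD) and later stop(handle). As far as I know, this only covers local mode cleanly: in yarn-cluster mode the driver runs inside the cluster, so killing the local client process may leave the application running, and it would have to be killed through YARN instead (for example with yarn application -kill).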
Re: amplab jenkins is down
it's looking like everything except the pull request builders are working. i'm going to be working on getting this resolved today. On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Hmm, looks like at least some builds https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull are working now, though this last one was from ~5 hours ago.
Re: [mllib] Add multiplying large scale matrices
Hey all, Definitely agreed this would be nice! In our own work we've done element-wise addition, subtraction, and scalar multiplication of similarly partitioned matrices very efficiently with zipping. We've also done matrix-matrix multiplication with zipping, but that only works in certain circumstances, and it's otherwise very communication-intensive (as Shivaram says). Another tricky thing with addition / subtraction is how to handle sparse vs. dense arrays. Would be happy to contribute anything we did, but it's definitely worth first knowing what progress has been made at the AMPLab. -- Jeremy - jeremy freeman, phd neuroscientist @thefreemanlab On Sep 5, 2014, at 12:23 PM, Patrick Wendell pwend...@gmail.com wrote: Hey There, I believe this is on the roadmap for the next release, 1.2. But Xiangrui can comment on this. - Patrick
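To make the zipping idea concrete, here is a small pure-Python sketch. It is not Jeremy's code and does not use Spark at all: a "distributed" matrix is modeled as a list of partitions, each holding a list of rows, and the function names (zip_partitions, add, subtract, scale) are invented for illustration. The point it tries to show is that when two matrices are partitioned identically, element-wise operations pair up partitions one to one and need no data movement between them. In Spark the analogous primitive would be something like RDD.zip, which likewise requires matching partition counts and matching element counts per partition.

```python
def zip_partitions(a, b, op):
    """Combine two identically partitioned matrices element by element.

    a and b are lists of partitions; a partition is a list of rows;
    a row is a list of numbers. op is applied to each pair of elements.
    """
    if len(a) != len(b):
        raise ValueError("matrices have different numbers of partitions")
    result = []
    for part_a, part_b in zip(a, b):
        if len(part_a) != len(part_b):
            raise ValueError("partitions hold different numbers of rows")
        out_part = []
        for row_a, row_b in zip(part_a, part_b):
            if len(row_a) != len(row_b):
                raise ValueError("rows have different lengths")
            out_part.append([op(x, y) for x, y in zip(row_a, row_b)])
        result.append(out_part)
    return result


def add(a, b):
    """Element-wise sum of two identically partitioned matrices."""
    return zip_partitions(a, b, lambda x, y: x + y)


def subtract(a, b):
    """Element-wise difference of two identically partitioned matrices."""
    return zip_partitions(a, b, lambda x, y: x - y)


def scale(a, c):
    """Scalar multiplication; needs no partner matrix and no zipping."""
    return [[[c * x for x in row] for row in part] for part in a]
```

The sketch only handles dense rows. The sparse vs. dense question raised above is exactly what it sidesteps.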
Re: amplab jenkins is down
How's it going? It looks like during the last build https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console from about 30 min ago Jenkins was still having trouble fetching from GitHub. It also looks like not all requests for testing are triggering builds. On Fri, Sep 5, 2014 at 1:23 PM, shane knapp skn...@berkeley.edu wrote: it's looking like everything except the pull request builders are working. i'm going to be working on getting this resolved today.
Re: Dependency hell in Spark applications
If httpClient dependency is coming from Hive, you could build Spark without Hive. Alternatively, have you tried excluding httpclient from spark-streaming dependency in your sbt/maven project? TD On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote: custom spark builds should not be the answer. at least not if spark ever wants to have a vibrant community for spark apps. spark does support a user-classpath-first option, which would deal with some of these issues, but I don't think it works. On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote: Hi, I run into the same issue and apart from the ideas Aniket said, I only could find a nasty workaround. Add my custom PoolingClientConnectionManager to my classpath. http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955 On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote: Dumb question -- are you using a Spark build that includes the Kinesis dependency? that build would have resolved conflicts like this for you. Your app would need to use the same version of the Kinesis client SDK, ideally. All of these ideas are well-known, yes. In cases of super-common dependencies like Guava, they are already shaded. This is a less-common source of conflicts so I don't think http-client is shaded, especially since it is not used directly by Spark. I think this is a case of your app conflicting with a third-party dependency? I think OSGi is deemed too over the top for things like this. On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: I am trying to use Kinesis as source to Spark Streaming and have run into a dependency issue that can't be resolved without making my own custom Spark build. 
The issue is that Spark is transitively dependent on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of libfb303 coming from hbase and hive-serde) whereas AWS SDK is dependent on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run Spark Streaming application, I get the following:

Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
    at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
    at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
    at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
    at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
    at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
    at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
    at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
    at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

I can create a custom Spark build with org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly but I was wondering if this is something Spark devs have noticed and are looking to resolve in near releases. Here are my thoughts on this issue: Containers that allow running custom user code have to often resolve dependency issues in case of conflicts between framework's and user code's dependency.
Here is how I have seen some frameworks resolve the issue:

1. Provide a child-first class loader: Some JEE containers provided a child-first class loader that allowed for loading classes from user code first. I don't think this approach completely solves the problem, as the framework is then susceptible to class mismatch errors.
2. Fold in all dependencies in a sub-package: This approach involves folding all dependencies into a project-specific sub-package (like spark.dependencies). This approach is tedious because it involves building custom versions of all dependencies (and their transitive dependencies).
3. Use something like OSGi: Some frameworks have successfully used OSGi to manage dependencies between the modules. The challenge in this approach is to
Re: Dependency hell in Spark applications
From the output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

Regarding "excluding httpclient from spark-streaming dependency in your sbt/maven project": this should work. On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das tathagata.das1...@gmail.com wrote: If httpClient dependency is coming from Hive, you could build Spark without Hive. Alternatively, have you tried excluding httpclient from spark-streaming dependency in your sbt/maven project? TD
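As a rough aid for spotting this kind of mismatch before it turns into a NoSuchMethodError at runtime, the dependency:tree output can be scanned for the versions it resolves to and compared with what another library needs. The Python sketch below is only an illustration: the helper names (versions_by_artifact, conflicts) and the shape of the "required" mapping are invented here, and it assumes the usual group:artifact:packaging:version:scope coordinate format that appears in the tree output above. It reports mismatches only; the actual fix still has to be made in the build file, for example with an exclusion.

```python
import re
from collections import defaultdict

# Matches coordinates of the form group:artifact:packaging:version:scope,
# e.g. org.apache.httpcomponents:httpclient:jar:4.1.2:compile
COORDINATE = re.compile(r"([\w.\-]+):([\w.\-]+):(\w+):([\w.\-]+):(\w+)")


def versions_by_artifact(tree_text):
    """Map "group:artifact" to the set of versions seen in the tree text."""
    found = defaultdict(set)
    for line in tree_text.splitlines():
        match = COORDINATE.search(line)
        if match:
            group, artifact, _packaging, version, _scope = match.groups()
            found[group + ":" + artifact].add(version)
    return found


def conflicts(tree_text, required):
    """Return the artifacts whose versions in the tree differ from 'required'.

    required maps "group:artifact" to the version another library needs.
    Artifacts that do not appear in the tree at all are not reported.
    """
    found = versions_by_artifact(tree_text)
    return {
        key: sorted(found[key])
        for key, wanted in required.items()
        if key in found and found[key] != {wanted}
    }
```

For the tree quoted above, asking about org.apache.httpcomponents:httpclient with required version 4.2 (the version the AWS SDK is said to depend on in this thread) would report 4.1.2 as the version Spark's tree brings in.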
Re: amplab jenkins is down
We have successfully purged Jenkins’ build queue. If you want a PR to be re-tested, please ask Jenkins again. On September 5, 2014 at 5:36:30 PM, shane knapp (skn...@berkeley.edu) wrote: yeah, it was a problem w/the PRB's OAuth key. josh rosen added a new key, and magique! we're about to clear the queue of all builds as most aren't wanted/needed. On Fri, Sep 5, 2014 at 5:33 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Looks like Jenkins is back! lol The poor guy has like a million builds https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/ to catch up on. On Fri, Sep 5, 2014 at 4:15 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: How's it going? It looks like during the last build https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/lastBuild/console from about 30 min ago Jenkins was still having trouble fetching from GitHub. It also looks like not all requests for testing are triggering builds. On Fri, Sep 5, 2014 at 1:23 PM, shane knapp skn...@berkeley.edu wrote: it's looking like everything except the pull request builders are working. i'm going to be working on getting this resolved today. On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Hmm, looks like at least some builds https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19804/consoleFull are working now, though this last one was from ~5 hours ago. On Fri, Sep 5, 2014 at 1:02 AM, shane knapp skn...@berkeley.edu wrote: yep. that's exactly the behavior i saw earlier, and will be figuring out first thing tomorrow morning. i bet it's an environment issues on the slaves. 
On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Looks like during the last build https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19797/console Jenkins was unable to execute a git fetch? On Thu, Sep 4, 2014 at 7:58 PM, shane knapp skn...@berkeley.edu wrote: i'm going to restart jenkins and see if that fixes things. On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote: looking On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It appears that our main man is having trouble https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/ hearing new requests https://github.com/apache/spark/pull/2277#issuecomment-54549106. Do we need some smelling salts? On Thu, Sep 4, 2014 at 5:49 PM, shane knapp skn...@berkeley.edu wrote: i'd ping the Jenkinsmench... the master was completely offline, so any new jobs wouldn't have reached it. any jobs that were queued when power was lost probably started up, but jobs that were running would fail. On Thu, Sep 4, 2014 at 2:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Woohoo! Thanks Shane. Do you know if queued PR builds will automatically be picked up? Or do we have to ping the Jenkinmensch manually from each PR? Nick On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote: AND WE'RE UP! sorry that this took so long... i'll send out a more detailed explanation of what happened soon. now, off to back up jenkins. shane On Thu, Sep 4, 2014 at 1:27 PM, shane knapp skn...@berkeley.edu wrote: it's a faulty power switch on the firewall, which has been swapped out. we're about to reboot and be good to go. On Thu, Sep 4, 2014 at 1:19 PM, shane knapp skn...@berkeley.edu wrote: looks like some hardware failed, and we're swapping in a replacement. i don't have more specific information yet -- including *what* failed, as our sysadmin is super busy ATM. 
the root cause was an incorrect circuit being switched off during building maintenance. on a side note, this incident will be accelerating our plan to move the entire jenkins infrastructure into a managed datacenter environment. this will be our major push over the next couple of weeks. more details about this, also, as soon as i get them. i'm very sorry about the downtime, we'll get everything up and running ASAP. On Thu, Sep 4, 2014 at 12:27 PM, shane knapp skn...@berkeley.edu wrote: looks like a power outage in soda hall. more updates as they happen. On Thu, Sep 4, 2014 at 12:25 PM, shane knapp skn...@berkeley.edu wrote: i am trying to get things
Re: [mllib] Add multiplying large scale matrices
My last email missed the dev list, so I am resending it. Please ignore the duplicate. 2014-09-06 11:22 GMT+08:00 顾荣 gurongwal...@gmail.com: Hi All, This is Rong Gu from PasaLab at Nanjing University, China. We have been working on a distributed matrix operations library on Spark this summer. It is a Summer Code project hosted by CSDN and Intel Lab (http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase was hosted on CSDN's code platform (https://code.csdn.net/u014252240/sparkmatrixlib), and we have been writing weekly reports on the blog (http://blog.csdn.net/u014252240). The project is now coming to an end, and I have recently moved it to GitHub. Please see the link here: https://github.com/PasaLab/saury . We named the project Saury and provide documentation to help people get to know it better. Technically, we implement matrix manipulation on Spark with block matrix parallel algorithms that distribute large scale matrix computation among cluster nodes. We also take advantage of a native linear algebra library (e.g., BLAS) on each worker node to accelerate the computation. That really makes a difference! See the preliminary performance evaluation report at https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply Currently, we are working on adding more advanced matrix manipulation algorithms to Saury, such as matrix factorization and diagonalization algorithms. In fact, Saury already contains an alpha version of a distributed LU factorization implementation. We are also trying to use Tachyon to hold and share the matrix data across the cluster more quickly. Best, Rong -- -- Rong Gu Department of Computer Science and Technology State Key Laboratory for Novel Software Technology Nanjing University Email: gurongwal...@gmail.com Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/ 2014-09-06 1:29 GMT+08:00 Jeremy Freeman freeman.jer...@gmail.com: Hey all, Definitely agreed this would be nice!
In our own work we've done element-wise addition, subtraction, and scalar multiplication of similarly partitioned matrices very efficiently with zipping. We've also done matrix-matrix multiplication with zipping, but that only works in certain circumstances, and it's otherwise very communication intensive (as Shivaram says). Another tricky thing with addition / subtraction is how to handle sparse vs. dense arrays. We would be happy to contribute anything we did, but it is definitely worth first knowing what progress has been made at the AMPLab. -- Jeremy - jeremy freeman, phd neuroscientist @thefreemanlab On Sep 5, 2014, at 12:23 PM, Patrick Wendell pwend...@gmail.com wrote: Hey There, I believe this is on the roadmap for the next release, 1.2. But Xiangrui can comment on this. - Patrick On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Hi Evan, That sounds interesting. Here is the ticket which I created. https://issues.apache.org/jira/browse/SPARK-3416 thanks, -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8296.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org -- -- Rong Gu Department of Computer Science and Technology State Key Laboratory for Novel Software Technology Nanjing University Phone: +86 15850682791 Email: gurongwal...@gmail.com Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
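For readers following the thread, the two ideas mentioned above (element-wise operations by pairing identically partitioned pieces, as with zipping, and block matrix multiplication with a local kernel per block) can be sketched in a few lines. This is only an illustration in plain Python with no Spark involved; it is not code from Saury, MLlib, or Jeremy's project. The block layout (a dict keyed by block row and block column) and all function names are assumptions made up for this example, and the pure-Python `local_multiply` stands in for the native BLAS call a real implementation would presumably use on each worker.

```python
# Hypothetical sketch: block layout and helper names are illustrative only,
# not taken from Saury or MLlib.
from collections import defaultdict


def to_blocks(matrix, block_size):
    """Split a dense list-of-lists matrix into a dict {(bi, bj): block}."""
    blocks = {}
    n_rows, n_cols = len(matrix), len(matrix[0])
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def local_multiply(a, b):
    """Multiply two small dense blocks; a native BLAS gemm would go here."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def local_add(a, b):
    """Add two blocks of identical shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_add(a_blocks, b_blocks):
    """Element-wise addition: identically keyed blocks are paired up,
    which is the role zipping plays for similarly partitioned matrices."""
    return {key: local_add(a_blocks[key], b_blocks[key]) for key in a_blocks}


def block_multiply(a_blocks, b_blocks):
    """C[i, j] = sum over k of A[i, k] * B[k, j], computed block by block."""
    # Group B's blocks by block-row index k, the key they share with A's columns.
    b_by_row = defaultdict(list)
    for (k, j), block in b_blocks.items():
        b_by_row[k].append((j, block))
    result = {}
    for (i, k), a_block in a_blocks.items():
        for j, b_block in b_by_row[k]:
            product = local_multiply(a_block, b_block)
            if (i, j) in result:
                result[(i, j)] = local_add(result[(i, j)], product)
            else:
                result[(i, j)] = product
    return result


def from_blocks(blocks):
    """Reassemble a block dict into a dense list-of-lists matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for bi in range(n_block_rows):
        for r in range(len(blocks[(bi, 0)])):
            row = []
            for bj in range(n_block_cols):
                row.extend(blocks[(bi, bj)][r])
            rows.append(row)
    return rows
```

In this sketch, addition never needs to move data between blocks, while multiplication has to bring every A[i, k] together with every B[k, j] that shares the index k. In a distributed setting that pairing is where the communication cost mentioned in the thread would come from.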