Re: [discuss] dropping Python 2.6 support

2016-01-04 Thread Kushal Datta
+1 Dr. Kushal Datta Senior Research Scientist Big Data Research & Pathfinding Intel Corporation, USA. On Mon, Jan 4, 2016 at 11:52 PM, Jean-Baptiste Onofré wrote: > +1 > > no problem for me to remove Python 2.6 in 2.0. > > Thanks > Regards > JB > > > On 01/05/2016 08:17 AM, Reynold Xin wro

Re: [discuss] dropping Python 2.6 support

2016-01-04 Thread Jean-Baptiste Onofré
+1 no problem for me to remove Python 2.6 in 2.0. Thanks Regards JB On 01/05/2016 08:17 AM, Reynold Xin wrote: Does anybody here care about us dropping support for Python 2.6 in Spark 2.0? Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compared with Python

GraphX does not unpersist RDDs

2016-01-04 Thread Alexander Pivovarov
// open spark-shell 1.5.2
// run
import org.apache.spark.graphx._
val vert = sc.parallelize(List((1L, 1), (2L, 2), (3L, 3)), 1)
val edges = sc.parallelize(List(Edge[Long](1L, 2L), Edge[Long](1L, 3L)), 1)
val g0 = Graph(vert, edges)
val g = g0.partitionBy(PartitionStrategy.EdgePartition2D, 2)
val

[discuss] dropping Python 2.6 support

2016-01-04 Thread Reynold Xin
Does anybody here care about us dropping support for Python 2.6 in Spark 2.0? Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compared with Python 2.7. Some libraries that Spark depends on have stopped supporting 2.6. We can still convince the library maintainers to su

Re: How to execute non-hadoop command ?

2016-01-04 Thread Jeff Zhang
Sorry, wrong list. On Tue, Jan 5, 2016 at 12:36 PM, Jeff Zhang wrote: > I want to create a service check for Spark, but Spark doesn't use the hadoop > script as its launch script. I found that other components use ExecuteHadoop to > launch a hadoop job to verify the service. I am wondering whether there is > any ap

Re: running lda in spark throws exception

2016-01-04 Thread Li Li
Could anyone help? The problem is very easy to reproduce. What's wrong? On Wed, Dec 30, 2015 at 8:59 PM, Li Li wrote: > I used a small data set and reproduced the problem. > But I don't know whether my code is correct because I am not familiar > with Spark. > So I will first post my code here. If it's cor

How to execute non-hadoop command ?

2016-01-04 Thread Jeff Zhang
I want to create a service check for Spark, but Spark doesn't use the hadoop script as its launch script. I found that other components use ExecuteHadoop to launch a hadoop job to verify the service. I am wondering whether there is any API for non-hadoop commands? BTW, I checked the source code of execute_hadoop.py but

Re: SparkML algos limitations question.

2016-01-04 Thread Yanbo Liang
Hi Alexander, That's cool! Thanks for the clarification. Yanbo 2016-01-05 5:06 GMT+08:00 Ulanov, Alexander : > Hi Yanbo, > > > > As long as two models fit into memory of a single machine, there should be > no problems, so even 16GB machines can handle large models. (master should > have more me

RE: Data and Model Parallelism in MLPC

2016-01-04 Thread Ulanov, Alexander
Hi Disha, Data is stacked into matrices to perform matrix-matrix multiplication (instead of matrix-vector), which is handled by native BLAS and can give a speed-up. You can refer here for benchmarks https://github.com/fommil/netlib-java With regard to your second question, data parallelism is

RE: Support off-loading computations to a GPU

2016-01-04 Thread Ulanov, Alexander
Hi Kazuaki, Sounds very interesting! Could you elaborate on your benchmark with regards to logistic regression (LR)? Did you compare your implementation with the current implementation of LR in Spark? Best regards, Alexander From: Kazuaki Ishizaki [mailto:ishiz...@jp.ibm.com] Sent: Sunday, Jan

Re: Spark Streaming Application is Stuck Under Heavy Load Due to DeadLock

2016-01-04 Thread Shixiong Zhu
Hi Rachana, could you provide the full jstack outputs? Maybe it's the same as https://issues.apache.org/jira/browse/SPARK-11104 Best Regards, Shixiong Zhu 2016-01-04 12:56 GMT-08:00 Rachana Srivastava < rachana.srivast...@markmonitor.com>: > Hello All, > > > > I am running my application on Spark c

RE: SparkML algos limitations question.

2016-01-04 Thread Ulanov, Alexander
Hi Yanbo, As long as two models fit into memory of a single machine, there should be no problems, so even 16GB machines can handle large models. (The master should have more memory because it runs LBFGS.) In my experiments, I’ve trained models with 12M and 32M parameters without issues. Best regards

Spark Streaming Application is Stuck Under Heavy Load Due to DeadLock

2016-01-04 Thread Rachana Srivastava
Hello All, I am running my application on a Spark cluster, but under heavy load the system hangs due to a deadlock. I found similar issues resolved here https://datastax-oss.atlassian.net/browse/JAVA-555 in Spark version 2.1.3. But I am running on Spark 1.3 and still getting the same issue. Here i

[ANNOUNCE] Announcing Spark 1.6.0

2016-01-04 Thread Michael Armbrust
Hi All, Spark 1.6.0 is the seventh release on the 1.x line. This release includes patches from 248+ contributors! To download Spark 1.6.0 visit the downloads page. (It may take a while for all mirrors to update.) A huge thanks goes to all of the individuals and organizations involved in developmen

Re: Support off-loading computations to a GPU

2016-01-04 Thread Kazuaki Ishizaki
I created a new JIRA entry https://issues.apache.org/jira/browse/SPARK-12620 for this instead of reopening the existing JIRA based on the suggestion. Best Regards, Kazuaki Ishizaki From: Kazuaki Ishizaki/Japan/IBM@IBMJP To: dev@spark.apache.org Date: 2016/01/04 12:54 Subject:S