Kryo serialization issues

2014-08-14 Thread Debasish Das
Hi, Is there a JIRA for this bug? I have seen it multiple times during our ALS runs now...some runs don't show it while others fail with the error described at https://github.com/GrahamDennis/spark-kryo-serialisation/blob/master/README.md One way to circumvent this is to not use Kryo, but then I am
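
For reference, a minimal sketch of how Kryo and a custom registrator are typically wired up in Spark 1.x; the rating class and registrator names below are hypothetical stand-ins for whatever ALS-related classes a job registers.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical class that an ALS-style job would register with Kryo.
    case class MyRating(user: Int, product: Int, rating: Double)

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[MyRating])
      }
    }

    object KryoExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kryo-example")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryo.registrator", "MyRegistrator") // use the fully qualified class name
        val sc = new SparkContext(conf)
        // ... run the ALS job ...
        sc.stop()
      }
    }

The failure discussed in this thread shows up when the executors cannot load the registrator class named in spark.kryo.registrator.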

Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-08-14 Thread Jeremy Freeman
@Ignacio, happy to share, here's a link to a library we've been developing (https://github.com/freeman-lab/thunder). As just a couple examples, we have pipelines that use Fourier transforms and other signal processing from scipy, and others that do massively parallel model fitting via Scikit

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-14 Thread Graham Dennis
Hi Deb, The only alternative serialiser is the JavaSerialiser (the default). Theoretically Spark supports custom serialisers, but due to a related issue, custom serialisers currently can't live in application jars and must be available to all executors at launch. My PR fixes this issue as well,
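
As a hedged illustration of the constraint Graham describes (the registrator must be visible to every executor at launch rather than shipped inside the application jar), one workaround is to put the registrator's jar on the executors' launch classpath; the jar path and class name below are assumptions.

    // Sketch of the workaround: the jar containing the registrator lives at a
    // fixed path on every worker node, not inside the application jar.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.MyRegistrator")           // hypothetical class
      .set("spark.executor.extraClassPath", "/opt/jars/my-registrator.jar") // hypothetical path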

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-14 Thread Reynold Xin
Graham, Thanks for working on this. This is an important bug to fix. I don't have the whole context and obviously I haven't spent nearly as much time on this as you have, but I'm wondering what if we always pass the executor's ClassLoader to the Kryo serializer? Will that solve this problem?
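
For context on what passing the executor's ClassLoader to the Kryo serializer would mean mechanically, Kryo lets a serializer instance resolve classes through an explicit loader; a minimal sketch, assuming the thread context class loader is the one that can see the downloaded application jars:

    import com.esotericsoftware.kryo.Kryo

    // Sketch: make this Kryo instance resolve classes (including the custom
    // registrator) through a specific loader rather than the default one.
    val kryo = new Kryo()
    val executorLoader = Thread.currentThread.getContextClassLoader // assumption: this loader sees the app jars
    kryo.setClassLoader(executorLoader)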

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-14 Thread Graham Dennis
Hi Reynold, That would solve this specific issue, but you'd need to be careful that you never created a serialiser instance before the first task is received. Currently in Executor.TaskRunner.run a closure serialiser instance is created before any application jars are downloaded, but that could

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-14 Thread Graham Dennis
In part, my assertion was based on a comment by sryza on my PR ( https://github.com/apache/spark/pull/1890#issuecomment-51805750), however I thought I had also seen it in the YARN code base. However, now that I look for it, I can't find where this happens, so perhaps I was imagining the YARN

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-14 Thread Reynold Xin
Graham, SparkEnv only creates a KryoSerializer, but as I understand it, that serializer doesn't actually initialize the registrator, since the registrator is only invoked when newKryo() is called, which happens when a KryoSerializerInstance is initialized. Basically I'm thinking a quick fix for 1.2: 1. Add a classLoader field
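
A rough sketch of the quick fix being floated here (not the actual patch): keep a class-loader field on the serializer and defer resolving the registrator until newKryo() runs, so an executor can swap in the loader that contains the downloaded application jars. Class and field names are illustrative only.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    // Illustrative sketch only, not Spark's actual KryoSerializer.
    class SketchKryoSerializer(registratorClassName: Option[String]) {
      // Executors would set this to the loader that has the application jars.
      @volatile var classLoader: ClassLoader = Thread.currentThread.getContextClassLoader

      def newKryo(): Kryo = {
        val kryo = new Kryo()
        kryo.setClassLoader(classLoader)
        // Resolve and run the registrator lazily, through the current loader.
        registratorClassName.foreach { name =>
          val registrator = Class.forName(name, true, classLoader)
            .newInstance().asInstanceOf[KryoRegistrator]
          registrator.registerClasses(kryo)
        }
        kryo
      }
    }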

Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-08-14 Thread Ignacio Zendejas
Thanks, Jeremy! That's awesome. There's a group at Facebook that is considering using Spark, so to have more projects to refer to is great. And Matei, I completely agree. MLlib is very exciting. I respect how well you guys are managing the project for quality. This will set the Spark ecosystem

[SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Mingyu Kim
I ran a really simple program that runs with the Spark 1.0.2 jar and connects to a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I filed the bug at https://issues.apache.org/jira/browse/SPARK-3050. I assumed that minor and patch releases shouldn't break compatibility. Is that

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Gary Malouf
To be clear, is it 'compiled' against 1.0.2 or is it packaged with it? On Thu, Aug 14, 2014 at 6:39 PM, Mingyu Kim m...@palantir.com wrote: I ran a really simple program that runs with the Spark 1.0.2 jar and connects to a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I filed

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Patrick Wendell
I commented on the bug. For driver mode, you'll need to get the corresponding version of spark-submit for Spark 1.0.2. On Thu, Aug 14, 2014 at 3:43 PM, Gary Malouf malouf.g...@gmail.com wrote: To be clear, is it 'compiled' against 1.0.2 or is it packaged with it? On Thu, Aug 14, 2014 at 6:39
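
A hedged sketch of the version-alignment point running through this thread: keep the application's Spark dependency, the spark-submit used to launch it, and the cluster on a single Spark version. The sbt coordinates below are an assumption about the application's build setup, pinning to the cluster's 1.0.1 as one example.

    // build.sbt sketch: pin spark-core to the version the cluster and
    // spark-submit are running, and mark it provided, so both sides agree on
    // the serialized class versions behind the InvalidClassException.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.1" % "provided"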

mvn test error

2014-08-14 Thread scwf
env: ubuntu 14.04 + spark master branch mvn -Pyarn -Phive -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package mvn -Pyarn -Phadoop-2.4 -Phive test test error: DriverSuite: Spark assembly has been built with Hive, including Datanucleus jars on classpath - driver should exit after

RE: [sql]enable spark sql cli support spark sql

2014-08-14 Thread Cheng, Hao
Actually the SQL Parser (another SQL dialect in SparkSQL) is quite weak and only supports some basic queries; I'm not sure what the plan is for its enhancement. -Original Message- From: scwf [mailto:wangf...@huawei.com] Sent: Friday, August 15, 2014 11:22 AM To: dev@spark.apache.org Subject:

Re: [sql]enable spark sql cli support spark sql

2014-08-14 Thread Cheng Lian
In the long run, as Michael suggested in his Spark Summit 14 talk, we’d like to implement SQL-92, maybe with the help of Optiq. On Aug 15, 2014, at 1:13 PM, Cheng, Hao hao.ch...@intel.com wrote: Actually the SQL Parser (another SQL dialect in SparkSQL) is quite weak and only supports some
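
For context on the two dialects being contrasted in this thread, a minimal sketch assuming an existing SparkContext named sc and tables that have already been registered elsewhere; the table names and queries are made up.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    // The basic SQLContext parser handles simple queries...
    val sqlContext = new SQLContext(sc)
    sqlContext.sql("SELECT name, age FROM people WHERE age > 21")

    // ...while richer HiveQL constructs currently go through HiveContext.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SELECT dept, count(*) FROM employees GROUP BY dept")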