ASF JIRA is down for maintenance

2014-08-01 Thread Patrick Wendell
Please don't let this prevent you from merging patches, just keep a list and we can update the JIRA later. - Patrick

Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Cheng Lian
It’s also useful to set hive.exec.mode.local.auto to true to accelerate the test. On Sat, Aug 2, 2014 at 1:36 AM, Michael Armbrust wrote: > > It seems that the HiveCompatibilitySuite needs a Hadoop and Hive > > environment, am I right? > > > > "Relative path in absolute URI: > > file:$%7Bs

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Andrew Ash
The original version numbers I reported were indeed what we had, so let me clarify the situation. Our application had Guava 14 because that's what Spark depends on. But we had added an in-house library to the Hadoop cluster and also the Spark cluster to add a new FileSystem (think hdfs://, s3n://

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Patrick Wendell
Andrew - I think Spark is using Guava 14... are you using Guava 16 in your user app (i.e. you inverted the versions in your earlier e-mail)? - Patrick On Fri, Aug 1, 2014 at 4:15 PM, Colin McCabe wrote: > On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash wrote: > > After several days of debugging, w

SparkContext.hadoopConfiguration vs. SparkHadoopUtil.newConfiguration()

2014-08-01 Thread Marcelo Vanzin
Hi all, While working on some seemingly unrelated code, I ran into this issue where "spark.hadoop.*" configs were not making it to the Configuration objects in some parts of the code. I was trying to do that to avoid having to do dirty tricks with the classpath while running tests, but that's a lit

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Colin McCabe
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash wrote: > After several days of debugging, we think the issue is that we have > conflicting versions of Guava. Our application was running with Guava 14 > and the Spark services (Master, Workers, Executors) had Guava 16. We had > custom Kryo serializers

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
Actually for click stream, the users space wouldn't be a continuum, unless the order of users is important or the fact that they are coming in a kind of order can be used by the algo. The purpose of the break or binning function is to package things in a cluster for which we know the properties, bu

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread Mayur Rustagi
Interesting, clickstream data would have its own window concept based on the user's session. I can imagine windows would change across streams, but wouldn't they largely be domain specific in nature? Mayur Rustagi

Re: Compiling Spark master (284771ef) with sbt/sbt assembly fails on EC2

2014-08-01 Thread Shivaram Venkataraman
Thanks Patrick -- It does look like some maven misconfiguration as wget http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.10.2/scala-library-2.10.2.pom works for me. Shivaram On Fri, Aug 1, 2014 at 3:27 PM, Patrick Wendell wrote: > This is a Scala bug - I filed something upstream

Re: Compiling Spark master (284771ef) with sbt/sbt assembly fails on EC2

2014-08-01 Thread Patrick Wendell
This is a Scala bug - I filed something upstream, hopefully they can fix it soon and/or we can provide a work around: https://issues.scala-lang.org/browse/SI-8772 - Patrick On Fri, Aug 1, 2014 at 3:15 PM, Holden Karau wrote: > Currently scala 2.10.2 can't be pulled in from maven central it se

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Andrew Ash
After several days of debugging, we think the issue is that we have conflicting versions of Guava. Our application was running with Guava 14 and the Spark services (Master, Workers, Executors) had Guava 16. We had custom Kryo serializers for Guava's ImmutableLists, and commenting out those regist
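
A version mismatch like the Guava 14 vs. Guava 16 conflict described above can sometimes be spotted by comparing the jar names on the application side with those on the cluster side. The following is only a minimal sketch under stated assumptions: the helper name `find_version_conflicts`, the jar lists, and the exact version strings are hypothetical and not taken from the thread, and it assumes jars follow the usual `artifact-version.jar` naming.

```python
import re
from collections import defaultdict

# Matches conventional jar names, e.g. "guava-14.0.1.jar" -> ("guava", "14.0.1").
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_version_conflicts(jar_names):
    """Return {artifact: sorted versions} for artifacts seen in more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:  # names that do not follow the convention are skipped
            versions[match.group("artifact")].add(match.group("version"))
    return {artifact: sorted(found)
            for artifact, found in versions.items()
            if len(found) > 1}


# Hypothetical jar listings for the application and the cluster (illustrative only).
app_jars = ["guava-14.0.1.jar", "kryo-2.21.jar"]
cluster_jars = ["guava-16.0.1.jar", "kryo-2.21.jar"]

conflicts = find_version_conflicts(app_jars + cluster_jars)
print(conflicts)
```

A check like this only compares file names; it would not catch a library that has been bundled or shaded inside another jar.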

My Spark application had huge performance regression after Spark git commit: 0441515f221146756800dc583b225bdec8a6c075

2014-08-01 Thread Jin, Zhonghui
I found a huge performance regression (to 1/20 of the original) in my application after Spark git commit: 0441515f221146756800dc583b225bdec8a6c075. Applying the following patch fixes my issue: diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/s

Interested in contributing to GraphX in Python

2014-08-01 Thread Rajiv Abraham
Hi, I just saw Ankur's GraphX presentation and it looks very exciting! I would like to contribute to a Python version of GraphX. I checked out JIRA and GitHub but I did not find much info. - Are there currently limitations to porting GraphX to Python? (e.g. Maybe the Python Spark RDD API is incomplet

Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Michael Armbrust
> > It seems that the HiveCompatibilitySuite needs a Hadoop and Hive > environment, am I right? > > "Relative path in absolute URI: > file:$%7Bsystem:test.tmp.dir%7D/tmp_showcrt1" > You should only need Hadoop and Hive if you are creating new tests that we need to compute the answers for. Existing

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
Heya, Dunno if these ideas are still in the air or felt in the warp ^^. However there is a paper on avocado that mentions a way of working with their data (sequence's reads) in a windowed manner without n

Re: Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Jeremy Freeman
With Maven you can run a particular test suite like this: mvn -DwildcardSuites=org.apache.spark.sql.SQLQuerySuite test. See the note here (under "Spark Tests in Maven"): http://spark.apache.org/docs/latest/building-with-maven.html

Re: How to run specific sparkSQL test with maven

2014-08-01 Thread witgo
You can try these commands: ./sbt/sbt assembly, then ./sbt/sbt "test-only *.HiveCompatibilitySuite" -Phive -- Original -- From: "田毅"; Date: Fri, Aug 1, 2014 05:00 PM To: "dev"; Subject: How to run specific sparkSQL test with maven Hi everyone! Could any

How to run specific sparkSQL test with maven

2014-08-01 Thread 田毅
Hi everyone! Could anyone tell me how to run a specific sparkSQL test with Maven? For example, I want to test HiveCompatibilitySuite. I ran "mvn test -Dtest=HiveCompatibilitySuite". It did not work. BTW, is there any information about how to build a test environment for sparkSQL? I got this er