Re: buildSupportsSnappy exception when reading the snappy file in Spark

2015-09-08 Thread dong.yajun
Hi Akhil, I just set the environment variable LD_LIBRARY_PATH in conf/spark-env.sh instead of SPARK_LIBRARY_PATH, pointing it to the native library directory, and it works. Thanks. On Tue, Sep 8, 2015 at 6:14 PM, Akhil Das wrote: > Looks like you are having different versions of snappy library. Here's a > similar disc
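
The same fix can also be expressed through Spark's standard configuration properties rather than spark-env variables; a minimal sketch, assuming the native libraries live under /opt/hadoop/lib/native (an example path only):

    import org.apache.spark.SparkConf

    // Point both the driver and the executors at the directory holding libsnappy/libhadoop.
    // The path below is an assumed example; use your own native-library location.
    val conf = new SparkConf()
      .setAppName("snappy-native-example")
      .set("spark.driver.extraLibraryPath", "/opt/hadoop/lib/native")
      .set("spark.executor.extraLibraryPath", "/opt/hadoop/lib/native")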

1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
Hi All, I'm trying to build a distribution off of the latest in master and I keep getting errors on MQTT and the build fails. I'm running the build on an m1.large, which has 7.5 GB of RAM, and no other major processes are running. MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=5

Re: How to read files from S3 from Spark local when there is a http proxy

2015-09-08 Thread tariq
Hi svelusamy, Were you able to make it work? I am facing the exact same problem. Getting connection timed out when trying to access S3. Thank you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-files-from-S3-from-Spark-local-when-there-is-a-http-p
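
One way this is commonly handled with the s3a filesystem is to put the proxy settings on the Hadoop configuration; a hedged sketch, assuming hadoop-aws is on the classpath and that the proxy host, port, bucket, and path below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3-proxy-example"))

    // fs.s3a.proxy.host / fs.s3a.proxy.port are hadoop-aws settings; adjust to your proxy.
    sc.hadoopConfiguration.set("fs.s3a.proxy.host", "proxy.example.com")
    sc.hadoopConfiguration.set("fs.s3a.proxy.port", "8080")

    // Assumed example bucket and key.
    val lines = sc.textFile("s3a://my-bucket/some/prefix/data.txt")
    println(lines.count())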

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Nicholas R. Peterson
Thanks, Igor; I've got it running again right now, and can attach the stack trace when it finishes. In the meantime, I've noticed something interesting: in the Spark UI, the application jar that I submit is not being included on the classpath. It has been successfully uploaded to the nodes -- in
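
For reference, a minimal sketch of the Kryo side of this kind of failure: registering the classes that ship only in the application jar, and optionally preferring the submitted jar on the executor classpath (MyRecord is an assumed example class, not the poster's):

    import org.apache.spark.SparkConf

    // Assumed example of a class that exists only in the submitted fat jar.
    case class MyRecord(id: Long, name: String)

    val conf = new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Prefer classes from the submitted jar over older copies on the cluster classpath.
      .set("spark.executor.userClassPathFirst", "true")
      // Register the classes Kryo must be able to resolve on the executors.
      .registerKryoClasses(Array(classOf[MyRecord]))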

Re: Sending yarn application logs to web socket

2015-09-08 Thread Jeetendra Gangele
1. In order to change log4j.properties at the name node, you can change /home/hadoop/log4j.properties. 2. In order to change log4j.properties for the container logs, you need to change it in the YARN containers jar, since they hard-code loading the file directly from project resources. 2.1 ssh to the

Re: Split content into multiple Parquet files

2015-09-08 Thread Cheng Lian
In Spark 1.4 and 1.5, you can do something like this: df.write.partitionBy("key").parquet("/datasink/output-parquets") BTW, I'm curious how you did it without partitionBy, using saveAsHadoopFile? Cheng On 9/8/15 2:34 PM, Adrien Mogenet wrote: Hi there, We've spent several hours to
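
As a usage note on the snippet above (reusing its df and output path, and assuming the usual sqlContext is in scope): partitionBy writes one sub-directory per distinct key value, and partition discovery restores the key column when the tree is read back:

    // Produces /datasink/output-parquets/key=<value>/part-*.parquet
    df.write.partitionBy("key").parquet("/datasink/output-parquets")

    // Partition discovery adds "key" back as a column when reading the directory tree.
    val restored = sqlContext.read.parquet("/datasink/output-parquets")
    restored.printSchema()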

Re: Parquet Array Support Broken?

2015-09-08 Thread Cheng Lian
Yeah, this is a typical Parquet interoperability issue due to unfortunate historical reasons. Hive (actually parquet-hive) gives the following schema for array: message m0 { optional group f (LIST) { repeated group bag { optional int32 array_element; } } } while Spark SQL gives me

Re: Can not allocate executor when running spark on mesos

2015-09-08 Thread Akhil Das
Here's the mesos master log > > I0908 15:08:16.515960 301916160 master.cpp:1767] Received registration > request for framework 'Spark shell' at > scheduler-1ea1c85b-68bd-40b4-8c7c-ddccfd56f82b@192.168.3.3:57133 > I0908 15:08:16.520545 301916160 master.cpp:1834] Regist

Re: Exception when restoring spark streaming with batch RDD from checkpoint.

2015-09-08 Thread Akhil Das
Try to add a filter to remove/replace the null elements within/before the map operation. Thanks Best Regards On Mon, Sep 7, 2015 at 3:34 PM, ZhengHanbin wrote: > Hi, > > I am using spark streaming to join every RDD of a DStream to a standalone > RDD to generate a new DStream as follows: > > *
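
A minimal sketch of the suggested filter-before-map pattern, assuming an existing SparkContext sc and made-up data:

    val rdd = sc.parallelize(Seq[String]("a", null, "b"))

    // Drop null elements first so the subsequent map never dereferences them.
    val safe = rdd.filter(_ != null).map(_.toUpperCase)
    safe.collect().foreach(println)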

Re: buildSupportsSnappy exception when reading the snappy file in Spark

2015-09-08 Thread Akhil Das
Looks like you are having different versions of snappy library. Here's a similar discussion if you haven't seen it already http://stackoverflow.com/questions/22150417/hadoop-mapreduce-java-lang-unsatisfiedlinkerror-org-apache-hadoop-util-nativec Thanks Best Regards On Mon, Sep 7, 2015 at 7:41 AM,

Applying transformations on a JavaRDD using reflection

2015-09-08 Thread Nirmal Fernando
Hi All, I'd like to apply a chain of Spark transformations (map/filter) on a given JavaRDD. I'll have the set of Spark transformations as Function<T, A>, and even though I can determine the classes of T and A at runtime, due to type erasure I cannot call JavaRDD's transformations as they expect
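
One hedged way around the erasure problem in Scala is to keep the chain untyped and fold it over the RDD, casting only inside each function; everything below is an assumed sketch rather than the poster's code:

    import org.apache.spark.rdd.RDD

    // The chain is held as Any => Any so it can be assembled reflectively at runtime.
    val stages: Seq[Any => Any] = Seq(
      (x: Any) => x.asInstanceOf[String].trim,
      (x: Any) => x.asInstanceOf[String].length
    )

    // Apply the whole chain by folding map calls over the input RDD.
    def applyChain(input: RDD[Any], fs: Seq[Any => Any]): RDD[Any] =
      fs.foldLeft(input)((rdd, f) => rdd.map(f))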

about mr-style merge sort

2015-09-08 Thread 周千昊
Hi, community I have an application that I am trying to migrate from MR to Spark. It will do some calculations from Hive and output to HFiles, which will be bulk loaded into an HBase table; details as follows: Rdd input = getSourceInputFromHive() Rdd> mapSideResult = input.glom().mapPartition
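
For the sorted output that HFile bulk loads need, the closest Spark analogue to the MR shuffle/merge-sort is repartitionAndSortWithinPartitions; a hedged sketch with assumed key/value types, using HashPartitioner only as a stand-in for a region-aligned partitioner:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // Assumed shape: HBase row key -> serialized cell value.
    def sortForBulkLoad(kv: RDD[(String, Array[Byte])]): RDD[(String, Array[Byte])] =
      // A single shuffle that partitions by key and sorts keys within each partition,
      // which is the ordering HFileOutputFormat expects from each writer.
      kv.repartitionAndSortWithinPartitions(new HashPartitioner(64))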

Re: Spark SQL - UDF for scoring a model - take $"*"

2015-09-08 Thread Night Wolf
Haha ok, it's one of those days, Array isn't valid. RTFM and it says a Catalyst array maps to a Scala Seq, which makes sense. So it works! Two follow-up questions: 1 - Is this the best approach? 2 - What if I want my expression to return multiple rows? - my binary classification model gives me a arra
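
On the second question, one hedged option in 1.4/1.5 is DataFrame.explode, which maps each input row to zero or more output rows; a sketch assuming an existing sc and sqlContext and an invented probabilities column:

    import sqlContext.implicits._

    case class Scored(id: Long, probabilities: Seq[Double])
    val df = sc.parallelize(Seq(Scored(1L, Seq(0.2, 0.8)))).toDF()

    // One output row per element of the probabilities array.
    val exploded = df.explode("probabilities", "probability") { ps: Seq[Double] => ps }
    exploded.show()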

Re: Spark SQL - UDF for scoring a model - take $"*"

2015-09-08 Thread Night Wolf
Sorry for the spam - I had some success; case class ScoringDF(function: Row => Double) extends Expression { val dataType = DataTypes.DoubleType override type EvaluatedType = Double override def eval(input: Row): EvaluatedType = { function(input) } override def nullable: Boolean =

Re: Spark on Yarn: Kryo throws ClassNotFoundException for class included in fat jar

2015-09-08 Thread Igor Berman
As a starting point, attach your stacktrace... PS: look for duplicates in your classpath; maybe you include another jar with the same class. On 8 September 2015 at 06:38, Nicholas R. Peterson wrote: > I'm trying to run a Spark 1.4.1 job on my CDH5.4 cluster, through Yarn. > Serialization is set to us

Re: Spark SQL - UDF for scoring a model - take $"*"

2015-09-08 Thread Night Wolf
So basically I need something like df.withColumn("score", new Column(new Expression { ... def eval(input: Row = null): EvaluatedType = myModel.score(input) ... })) But I can't do this, so how can I make a UDF or something like it, that can take in a Row and pass back a double value or some str

Re: Spark SQL - UDF for scoring a model - take $"*"

2015-09-08 Thread Night Wolf
Not sure how that would work. Really I want to tack on an extra column onto the DF with a UDF that can take a Row object. On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke wrote: > Can you use a map or list with different properties as one parameter? > Alternatively a string where parameters are Comma
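
A hedged sketch of one workaround often used for exactly this: pack every column into a struct and give the UDF that struct, which it receives as a Row. Versions differ in how strictly they inspect the UDF's input type, myScore below is only a stand-in for the real model, and df is assumed to be the DataFrame from the thread above:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{col, struct, udf}

    // Stand-in for the model: score a whole Row and return a Double.
    val myScore = (r: Row) => r.length.toDouble
    val scoreUdf = udf(myScore)

    // struct(all columns) is passed to the UDF as a single Row argument.
    val scored = df.withColumn("score", scoreUdf(struct(df.columns.map(col): _*)))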

Can not allocate executor when running spark on mesos

2015-09-08 Thread canan chen
I0908 15:08:16.515960 301916160 master.cpp:1767] Received registration request for framework 'Spark shell' at scheduler-1ea1c85b-68bd-40b4-8c7c-ddccfd56f82b@192.168.3.3:57133 I0908 15:08:16.520545 301916160 master.cpp:1834] Registering framework 20150908-143320-16777343-5050-41965-0

Re: Problems with Tungsten in Spark 1.5.0-rc2

2015-09-08 Thread Anders Arpteg
Ok, thanks Reynold. When I tested dynamic allocation with Spark 1.4, it complained, saying that it was not Tungsten compliant. Let's hope it works with 1.5 then! On Tue, Sep 8, 2015 at 5:49 AM Reynold Xin wrote: > > On Wed, Sep 2, 2015 at 12:03 AM, Anders Arpteg wrote: > >> >> BTW, is it possible

Re: Partitions with zero records & variable task times

2015-09-08 Thread Akhil Das
Try using a custom partitioner for the keys so that they will get evenly distributed across tasks Thanks Best Regards On Fri, Sep 4, 2015 at 7:19 PM, mark wrote: > I am trying to tune a Spark job and have noticed some strange behavior - > tasks in a stage vary in execution time, ranging from 2
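
A minimal sketch of such a custom Partitioner; the hashing rule below is only an assumed example of spreading skewed keys, and would normally be replaced with something tailored to the hot keys (e.g. salting):

    import org.apache.spark.Partitioner

    class EvenPartitioner(partitions: Int) extends Partitioner {
      require(partitions > 0, "need at least one partition")
      override def numPartitions: Int = partitions
      override def getPartition(key: Any): Int = {
        if (key == null) 0
        else {
          // Non-negative modulo, as HashPartitioner does.
          val mod = key.hashCode % numPartitions
          if (mod < 0) mod + numPartitions else mod
        }
      }
    }

    // Usage on a pair RDD (assumed): pairRdd.partitionBy(new EvenPartitioner(200))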

Re: Spark 1.4 RDD to DF fails with toDF()

2015-09-08 Thread Gheorghe Postelnicu
Compiling from source with Scala 2.11 support fixed this issue. Thanks again for the help! On Tue, Sep 8, 2015 at 7:33 AM, Gheorghe Postelnicu < gheorghe.posteln...@gmail.com> wrote: > Good point. It is a pre-compiled Spark version. Based on the text on the > downloads page, the answer to your
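
For anyone else hitting toDF problems, the usual minimal form (independent of the Scala-version mismatch resolved above) is a case class plus the sqlContext implicits; a sketch assuming an existing sc and sqlContext:

    import sqlContext.implicits._

    case class Person(name: String, age: Int)

    val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25))).toDF()
    people.printSchema()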
