Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
We are also using --driver-java-options and spark.executor.extraClassPath, which leads to exceptions when running our apps in standalone cluster mode. So what is the best way to add jars to the master and worker classpath? Thank you

Re: Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
No, not to driver and executor, but to the master and worker instances of the Spark standalone cluster. On 2 March 2016 at 17:05, Igor Berman wrote: > spark.driver.extraClassPath > spark.executor.extraClassPath
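For the application side, Igor's two properties go into spark-defaults.conf; for the standalone master and worker daemons themselves, the commonly suggested (if deprecated) route at the time was SPARK_CLASSPATH in conf/spark-env.sh. A sketch with a placeholder jar path:

    # spark-defaults.conf -- driver/executor (application) classpath
    spark.driver.extraClassPath    /opt/extra-libs/my-lib.jar
    spark.executor.extraClassPath  /opt/extra-libs/my-lib.jar

    # conf/spark-env.sh -- also picked up by the standalone daemons via spark-class
    export SPARK_CLASSPATH=/opt/extra-libs/my-lib.jar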

Re: Add Jars to Master/Worker classpath

2016-03-03 Thread Matthias Niehoff
Sumedh Wale wrote: > On Wednesday 02 March 2016 09:39 PM, Matthias Niehoff wrote: > > no, not to driver and executor but to the master and worker instances of the spark standalone cluster > Why exactly does adding jars to driver/executor extraClassPath not work?

Re: Streaming job delays

2016-03-09 Thread Matthias Niehoff

Re: Streaming job delays

2016-03-10 Thread Matthias Niehoff
> Scheduling delay is between 0-5 ms and processing time is about 2.8 min, which is lower than my batch interval. > I also noticed that enabling dynamic allocation and the external shuffle service had a high impact on CPU usage. > Thanks, Juan

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Matthias Niehoff

DataFrames UDAF with array and struct

2016-03-23 Thread Matthias Niehoff
the type of the List? Or in other words: what is the mapping of StructType with StructFields into Scala collection/data types? Thanks!
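For reference, a minimal sketch (my own illustration, not from the thread) of how these Catalyst types surface in a UDAF buffer: an ArrayType field reads back as a Seq, and each StructType element as a Row, so nested fields are accessed positionally:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // hypothetical buffer schema: array<struct<name:string,count:long>>
    val bufferSchema = StructType(Seq(
      StructField("entries", ArrayType(StructType(Seq(
        StructField("name", StringType),
        StructField("count", LongType)))))))

    def readBuffer(buffer: Row): Seq[(String, Long)] =
      // ArrayType maps to Seq[_], StructType maps to Row
      buffer.getAs[Seq[Row]](0).map(r => (r.getString(0), r.getLong(1)))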

Do not wrap result of a UDAF in an Struct

2016-03-29 Thread Matthias Niehoff
0) ++ buffer2.getSeq[String](0) } def evaluate(buffer: Row): Any = { buffer } }
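A complete version of what such a UDAF presumably looks like (my reconstruction, assuming the goal is collecting strings): returning the buffer Row from evaluate() is what wraps the result in a struct, while returning the Seq directly avoids it:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class CollectStrings extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", StringType) :: Nil)
      def bufferSchema: StructType = StructType(StructField("values", ArrayType(StringType)) :: Nil)
      def dataType: DataType = ArrayType(StringType)
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = Seq.empty[String]
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        buffer(0) = buffer.getSeq[String](0) :+ input.getString(0)
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0)
      // returning `buffer` yields struct<values:array<string>>;
      // returning the Seq yields a plain array<string>
      def evaluate(buffer: Row): Any = buffer.getSeq[String](0)
    }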

Re: Please assist: Spark 1.5.2 / cannot find StateSpec / State

2016-04-13 Thread Matthias Niehoff
; %% "spark-streaming-flume" % "1.3.0" % "provided" > ... > But compilation fails, mentioning that the classes StateSpec and State are not found. > Could someone please point me to the right packages to refer to if I want to use StateSpec?
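StateSpec and State only exist since Spark 1.6.0 (they came in with mapWithState), so dependencies pinned to 1.3.0 or 1.5.2 cannot resolve them. A sketch of the fix in build.sbt, with the version number as the obvious assumption:

    // StateSpec / State / mapWithState require Spark 1.6.0+
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "1.6.1" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided"
    )

The classes then live in org.apache.spark.streaming.{State, StateSpec}.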

Sliding Average over Window in Spark Streaming

2016-05-06 Thread Matthias Niehoff
keep all the values in the window to compute the average. One way would be to add every new value to a list in the reduce method and then do the avg computation in a separate map, but this seems kind of ugly. Do you have an idea how to solve this? Thanks!
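One way to avoid keeping every value (a sketch of the usual trick, not quoted from the thread, assuming a DStream[(String, Double)] named values): reduce (sum, count) pairs over the window and divide at the end; the inverse function keeps the window update incremental:

    import org.apache.spark.streaming.Seconds

    val sumCounts = values.mapValues(v => (v, 1L))
    // note: the inverse-reduce variant requires checkpointing to be enabled
    val windowed = sumCounts.reduceByKeyAndWindow(
      (a: (Double, Long), b: (Double, Long)) => (a._1 + b._1, a._2 + b._2), // values entering the window
      (a: (Double, Long), b: (Double, Long)) => (a._1 - b._1, a._2 - b._2), // values leaving the window
      Seconds(60), Seconds(10))
    val averages = windowed.mapValues { case (sum, count) => sum / count }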

Re: about an exception when receiving data from kafka topic using Direct mode of Spark Streaming

2016-05-25 Thread Matthias Niehoff
println("amazon product with rating sent to kafka cluster..." + amazonRating.toString) > System.exit(0) > } > } > } > I have written a Stack Overflow post > <http://stackoverflow.com/questions/37303202/about-an-error-accessing-a

Substract two DStreams

2016-06-15 Thread Matthias Niehoff
between two DStreams? Thank you
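DStream has no subtract of its own; a sketch of the usual workaround via transformWith, assuming two DStream[String]s called stream1 and stream2:

    import org.apache.spark.rdd.RDD

    // subtract stream2's elements from stream1, batch by batch
    val difference = stream1.transformWith(stream2,
      (rdd1: RDD[String], rdd2: RDD[String]) => rdd1.subtract(rdd2))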

Structured Streaming and Microbatches

2016-07-13 Thread Matthias Niehoff
Hi everybody, as far as I understand, with the new Structured Streaming API the output will no longer be processed every x seconds. Instead, the data will be processed as soon as it arrives, though there might be a delay due to processing time. A small example: data comes in and the pr
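For comparison, Spark 2.0 lets you pin microbatches to a fixed interval with a trigger instead of the as-soon-as-possible default; a sketch assuming a streaming DataFrame df and made-up sink paths:

    import org.apache.spark.sql.streaming.ProcessingTime

    // process whatever has arrived every 10 seconds
    val query = df.writeStream
      .format("parquet")
      .option("path", "/tmp/out")                   // hypothetical sink
      .option("checkpointLocation", "/tmp/ckpt")    // hypothetical checkpoint dir
      .trigger(ProcessingTime("10 seconds"))
      .start()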

Re: How spark decides whether to do BroadcastHashJoin or SortMergeJoin

2016-07-22 Thread Matthias Niehoff
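The decision comes down to spark.sql.autoBroadcastJoinThreshold: if one side's estimated size is below it (10 MB by default), a BroadcastHashJoin is planned, otherwise a SortMergeJoin. A sketch, assuming a SparkSession named spark and illustrative DataFrames largeDf and smallDf:

    import org.apache.spark.sql.functions.broadcast

    // raise the size cutoff for automatic broadcasting (bytes)
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024)

    // or hint explicitly, regardless of size estimates
    val joined = largeDf.join(broadcast(smallDf), "id")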

Using jar bundled log4j.xml on worker nodes

2016-02-04 Thread Matthias Niehoff
We are using Spark 1.5.2. Thank you!

Re: Using jar bundled log4j.xml on worker nodes

2016-02-05 Thread Matthias Niehoff
2016 at 9:06 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hello everybody, >> we’ve bundled our log4j.xml into our jar (in the classpath root). >> I’ve added the log4j.xml to the spark-defaults.conf with >> spa
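The quoted config breaks off at "spa"; a plausible completion (my assumption, not the archived text) points both JVMs at the bundled file via extraJavaOptions, since log4j 1.x resolves a bare log4j.configuration value against the classpath:

    # spark-defaults.conf -- hypothetical completion
    spark.driver.extraJavaOptions    -Dlog4j.configuration=log4j.xml
    spark.executor.extraJavaOptions  -Dlog4j.configuration=log4j.xml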

Dynamic Resource Allocation with Spark Streaming (Standalone Cluster, Spark 1.5.1)

2015-10-26 Thread Matthias Niehoff
Hello everybody, I have a few (~15) Spark Streaming jobs which have load peaks as well as long periods of low load. So I thought the new Dynamic Resource Allocation for Standalone Clusters might be helpful (SPARK-4751). I have a test "cluster" with 1 worker consisting of 4 executors with 2 core
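A sketch of the settings this involves (standard configuration keys, not quoted from the thread); in standalone mode the external shuffle service must also run on each worker:

    # spark-defaults.conf
    spark.dynamicAllocation.enabled       true
    spark.shuffle.service.enabled         true
    spark.dynamicAllocation.minExecutors  1
    spark.dynamicAllocation.maxExecutors  4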

Logs of Custom Receiver

2015-11-30 Thread Matthias Niehoff
constructor, but no following statements in the start() method and other methods called from there (all logged at the same level). Where do I find these log statements? Thanks!

Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Matthias Niehoff
reset" -> "largest", "enable.auto.commit" -> "false", "max.poll.records" -> "1000"
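For reference, a full parameter map for the 0.10 integration might look like this (a sketch with placeholder broker and group values; note the new consumer expects "latest"/"earliest" where the old API used "largest"/"smallest"):

    import org.apache.kafka.common.serialization.StringDeserializer

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-streaming-job",          // placeholder
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean),
      "max.poll.records"   -> (1000: java.lang.Integer)
    )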

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Matthias Niehoff
2016-09-27 18:55 GMT+02:00 Cody Koeninger: > What's the actual stacktrace / exception you're getting related to commit failure? > On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff wrote: > > Hi everybody, > > I am using the new Kafka Receiver fo

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-04 Thread Matthias Niehoff
> Well, I'd start at the first thing suggested by the error, namely that the group has rebalanced. > Is that stream using a unique group id? > On Wed, Sep 28, 2016 at 5:17 AM, Matthias Niehoff wrote: >

Re: When will the next version of spark be released?

2016-10-04 Thread Matthias Niehoff
was in July, so maybe that is wrong. When will the next version > be released, or is it more on an ad-hoc basis? > Asking as there are some fixes in Spark that we would like to use, already done with fix version Spark 2.0.1.

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-10 Thread Matthias Niehoff
Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff wrote: > > Hi, > > sorry for the late reply, a public holiday in Germany. > > Yes, it's using a unique group id which no other job or consumer group is using. I have increased the session.timeout to 1 minute and

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
avroSchema: String, recordMapper: GenericRecord => EventType): DStream[EventType] = { serializedAdRequestStream.mapPartitions { iteratorOfMessages => val schema: Schema = new Schema.Parser().parse(avroSchema) val recordInjection = GenericAvroCodecs.toBinary(schema) iterator
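A completed version of that helper as I would reconstruct it (the method name and signature details are guesses): it decodes each Avro payload with a bijection-avro Injection built once per partition, not per record:

    import scala.reflect.ClassTag
    import com.twitter.bijection.Injection
    import com.twitter.bijection.avro.GenericAvroCodecs
    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord
    import org.apache.spark.streaming.dstream.DStream

    def deserialize[EventType: ClassTag](serializedAdRequestStream: DStream[Array[Byte]],
                                         avroSchema: String,
                                         recordMapper: GenericRecord => EventType): DStream[EventType] =
      serializedAdRequestStream.mapPartitions { iteratorOfMessages =>
        // parse the schema and build the injection once per partition
        val schema: Schema = new Schema.Parser().parse(avroSchema)
        val recordInjection: Injection[GenericRecord, Array[Byte]] = GenericAvroCodecs.toBinary(schema)
        iteratorOfMessages.flatMap(bytes => recordInjection.invert(bytes).toOption.map(recordMapper))
      }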

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
JobScheduler: Added jobs for time 1476100946000 ms
16/10/10 14:03:26 INFO MapPartitionsRDD: Removing RDD 889 from persistence list
16/10/10 14:03:26 INFO JobScheduler: Starting job streaming job 1476100946000 ms.0 from job set of time 1476100946000 ms

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
multiple times, until about one minute has passed. I think this class is responsible for the endless loop scheduling the microbatches, but I do not know exactly what it does and why it has a problem with multiple Kafka direct streams. 2016-10-11 11:46 GMT+02:00 Matthias Niehoff: > I stripped down

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
Cody Koeninger: > Just out of curiosity, have you tried using separate group ids for the separate streams? > On Oct 11, 2016 4:46 AM, "Matthias Niehoff" wrote: >> I stripped down the job to just consume the stream and print it, without avro deseri

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-17 Thread Matthias Niehoff
figure out a way to make this more obvious. >> On Oct 11, 2016 8:19 AM, "Matthias Niehoff" wrote: >> good point, I changed the group id to be unique for the separate streams and now it works. Thanks!
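The fix as a sketch (0.10 integration API; baseKafkaParams, ssc, topicsA and topicsB are assumed to exist): give each direct stream its own group.id so the underlying consumers don't rebalance against each other:

    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // one parameter map per stream, differing only in group.id
    def paramsFor(groupId: String): Map[String, Object] =
      baseKafkaParams + ("group.id" -> groupId)

    val streamA = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topicsA, paramsFor("job-x-stream-a")))
    val streamB = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topicsB, paramsFor("job-x-stream-b")))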

Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
and therefore takes too long. I tried to unpersist the RDD retrieved by stateSnapshot ( stateSnapshots().transform(_.unpersist()) ), but this did not work as expected. Thank you, Matthias
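For reference, the per-key mechanics under discussion (a sketch; `invalidateAll` is a hypothetical flag, and as far as I know mapWithState offers no bulk-clear API): State#remove() drops a single key, and a StateSpec timeout at least expires idle keys:

    import org.apache.spark.streaming.{Minutes, State, StateSpec}

    val invalidateAll = false  // hypothetical trigger for a full reset

    def trackState(key: String, value: Option[Long], state: State[Long]): Option[(String, Long)] =
      if (invalidateAll && state.exists()) {
        state.remove()                     // drops only this key's state
        None
      } else {
        val sum = state.getOption().getOrElse(0L) + value.getOrElse(0L)
        state.update(sum)
        Some(key -> sum)
      }

    val spec = StateSpec.function(trackState _).timeout(Minutes(30))  // idle keys expire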

Re: Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
> On Mon, 17 Apr 2017 at 10:13 pm, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hi everybody, >> is there a way to completely invalidate or remove the state used by mapWithState, not only for a given key using State#remove()?