Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-03-08 Thread Muhammad Haseeb Javed
…afka client
> dependency, it shouldn't have compiled at all to begin with.
>
> On Wed, Feb 22, 2017 at 12:11 PM, Muhammad Haseeb Javed
> <11besemja...@seecs.edu.pk> wrote:
> > I just noticed that Spark version that I am using (2.0.2) is built with
> > Scala 2.11.
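Below is a minimal `build.sbt` sketch that keeps the application's Scala binary version in line with the cluster. It assumes the cluster runs the default Scala 2.11 build of Spark 2.0.2 and that the job uses the Kafka 0.8 direct-stream integration. The patch version `2.11.8` is an assumption. What has to match is the binary version (2.11), which sbt's `%%` operator appends to each artifact name.

```scala
// build.sbt (sketch). Assumes the cluster runs the default Scala 2.11 build
// of Spark 2.0.2; the patch version 2.11.8 is an assumption, 2.11 is what must match.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix, so this resolves spark-streaming_2.11.
  "org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
  // Kafka 0.8 direct-stream integration; it is not part of the Spark distribution,
  // so it is bundled with the application rather than marked "provided".
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.2"
)
```

Because of `%%`, changing `scalaVersion` alone switches every Spark artifact to the matching `_2.10` or `_2.11` suffix. A job compiled against one suffix and run on a Spark build with the other is the kind of binary mismatch this thread appears to point at.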

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Muhammad Haseeb Javed
…nting at all, right? Eliminate
> that as a possible source of problems.
>
> Probably unrelated, but this also isn't a very good way to benchmark.
> Kafka producers are threadsafe, there's no reason to create one for
> each partition.
>
> On Mon, Feb 20, 2017 at 4:43…
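The advice about not creating one producer per partition can be sketched as follows. This is a minimal, self-contained illustration, not HiBench or Kafka code. `StubProducer` is a hypothetical stand-in that only records sends in a `ConcurrentLinkedQueue`. In a real benchmark its place would be taken by a single `org.apache.kafka.clients.producer.KafkaProducer`, which the Kafka documentation describes as thread safe. The names `SharedProducerSketch`, `benchmark`, and the `record-$i` payloads are also illustrative.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}

// Hypothetical stand-in for a thread-safe client such as KafkaProducer.
// It only records what was "sent"; no network I/O happens here.
final class StubProducer {
  private val sent = new ConcurrentLinkedQueue[(Int, String)]()
  def send(partition: Int, value: String): Unit = { sent.add((partition, value)); () }
  def sentCount: Int = sent.size
  def close(): Unit = ()
}

object SharedProducerSketch {
  // Creates ONE producer and shares it across all per-partition writer threads.
  def benchmark(numPartitions: Int, recordsPerPartition: Int): Int = {
    val producer = new StubProducer
    val pool = Executors.newFixedThreadPool(numPartitions)
    try {
      for (p <- 0 until numPartitions) {
        pool.execute(new Runnable {
          def run(): Unit =
            for (i <- 0 until recordsPerPartition) producer.send(p, s"record-$i")
        })
      }
    } finally {
      pool.shutdown()                             // accept no new tasks
      pool.awaitTermination(30, TimeUnit.SECONDS) // wait for writers to finish
    }
    try producer.sentCount finally producer.close() // close once, after all writers
  }
}
```

The point is structural. The producer is created once, outside the per-partition loop, and closed once after every writer has finished. The alternative, building and tearing down one producer per partition, is what this sketch avoids.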

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
…ic that just
> does foreach println or similar, with no checkpointing at all, and get
> that working first.
>
> On Mon, Feb 20, 2017 at 12:10 PM, Muhammad Haseeb Javed
> <11besemja...@seecs.edu.pk> wrote:
> > Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala…

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10

On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed
<11besemja...@seecs.edu.pk> wrote:
> I am a PhD student at Ohio State working on a study to evaluate streaming
> frameworks (Spark Streaming, Storm, Flink) using t…

Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
I am a PhD student at Ohio State working on a study to evaluate streaming frameworks (Spark Streaming, Storm, Flink) using the Intel HiBench benchmarks, but I think I am having a problem with Spark. I have a Spark Streaming application which I am trying to run on a 5-node cluster (including master…

Wrap an RDD with a ShuffledRDD

2015-11-08 Thread Muhammad Haseeb Javed
I am working on a modified Spark core and have a Broadcast variable which I deserialize to obtain an RDD along with its set of dependencies, as is done in ShuffleMapTask, as follows:

    val taskBinary: Broadcast[Array[Byte]]
    var (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](…

What is the abstraction for a Worker process in Spark code

2015-10-12 Thread Muhammad Haseeb Javed
I understand that each executor processing a Spark job is represented in Spark code by the Executor class in Executor.scala, and that CoarseGrainedExecutorBackend is the abstraction which facilitates communication between an Executor and the Driver. But what is the abstraction for a Worker process i…

Building spark-examples takes too much time using Maven

2015-08-26 Thread Muhammad Haseeb Javed
I checked out the master branch and started playing around with the examples. I want to build a jar of the examples, as I wish to run them using the modified Spark jar that I have. However, packaging spark-examples takes too much time, as Maven tries to download the jar dependencies rather than use the…

Re: Difference between Sort based and Hash based shuffle

2015-08-19 Thread Muhammad Haseeb Javed
…tinually spills the contents of the
> buffer to disk, then finally merges all the spilled files together to form
> one final output file. This places much less stress on the file system and
> requires much fewer I/O operations especially on the read side.
>
> -Andrew
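The spill-and-merge behavior described above can be modeled in plain Scala. This is a toy, in-memory sketch, not Spark's actual implementation. The "spill files" are `Vector`s, the hash partitioner is an assumption, and `SortShuffleSketch`, `MapOutput`, and `bufferLimit` are hypothetical names. The sketch follows one map task through three steps:

1. It buffers records and sorts the buffer by reduce-partition id each time the buffer fills.
2. It merges the sorted spills into a single output.
3. It records an offset index, so that each reducer reads one contiguous segment.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy, in-memory model of one map task's sort-based shuffle write.
// "Spill files" are Vectors held in memory; this is NOT Spark's actual code.
object SortShuffleSketch {
  type Record = (String, Int)

  // Assumed partitioner: non-negative key hash modulo the reducer count.
  def partitionOf(key: String, numPartitions: Int): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }

  // data: the single merged output "file", grouped by partition id.
  // offsets: the "index file"; partition p occupies data(offsets(p) until offsets(p + 1)).
  final case class MapOutput(data: Vector[Record], offsets: Vector[Int], numSpills: Int)

  def write(records: Seq[Record], numPartitions: Int, bufferLimit: Int): MapOutput = {
    val spills = ArrayBuffer.empty[Vector[(Int, Record)]]
    val buffer = ArrayBuffer.empty[(Int, Record)]

    // Sort the in-memory buffer by partition id, "spill" it, then clear it.
    def spill(): Unit =
      if (buffer.nonEmpty) {
        spills += buffer.sortBy(_._1).toVector
        buffer.clear()
      }

    for (rec <- records) {
      buffer += ((partitionOf(rec._1, numPartitions), rec))
      if (buffer.size >= bufferLimit) spill()
    }
    spill() // flush whatever is left

    // Merge: walk partitions in order and concatenate each spill's segment
    // for that partition, yielding one output grouped by partition id.
    val perPartition: Vector[Vector[Record]] = Vector.tabulate(numPartitions) { p =>
      spills.toVector.flatMap(_.collect { case (`p`, rec) => rec })
    }
    val offsets = perPartition.scanLeft(0)(_ + _.size)
    MapOutput(perPartition.flatten, offsets, spills.size)
  }

  // A reducer reads only its own contiguous segment of the map output.
  def readPartition(out: MapOutput, p: Int): Vector[Record] =
    out.data.slice(out.offsets(p), out.offsets(p + 1))
}
```

A real shuffle writes spills to disk and streams the merge instead of materializing everything in memory. The sketch keeps only the shape of the algorithm: many small sorted spills go in, and one data output plus an index comes out per map task.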

Re: Difference between Sort based and Hash based shuffle

2015-08-16 Thread Muhammad Haseeb Javed
…wrote:
> Have a look at this presentation.
> http://www.slideshare.net/colorant/spark-shuffle-introduction . Can be of
> help to you.
>
> On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed
> <11besemja...@seecs.edu.pk> wrote:
>
>> What are the major difference…

Difference between Sort based and Hash based shuffle

2015-08-15 Thread Muhammad Haseeb Javed
What are the major differences between how sort-based and hash-based shuffle operate, and what causes sort-based shuffle to perform better than hash-based shuffle? Are there any talks that discuss both shuffles in detail, including how they are implemented and the performance gains?
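A rough count of the files one shuffle stage creates helps explain the file-system pressure mentioned in this thread. The count makes three assumptions:

- There are M map tasks and R reduce partitions.
- Hash-based shuffle runs without file consolidation, so each map task writes one file per reduce partition.
- Sort-based shuffle writes one data file plus one index file per map task.

$$
F_{\text{hash}} = M \cdot R, \qquad F_{\text{sort}} = 2M
$$

With M = R = 1000, that is 1,000,000 files for hash-based shuffle versus 2,000 for sort-based shuffle.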