Hi Bernardo,
Glad that our suggestions helped, and a bigger thanks for sharing your solution
with us. That was a tricky problem to track down and solve!
Regards,
Prajod
From: Bernardo Vecchia Stein [mailto:bernardovst...@gmail.com]
Sent: 26 October 2015 23:41
To: Prajod S Vettiyattil (WT01 - BAS)
Forwarding to the group, in case someone else has the same error. Just found
out that I did not reply to the group in my original reply.
From: Prajod S Vettiyattil (WT01 - BAS)
Sent: 15 October 2015 11:45
To: 'Bernardo Vecchia Stein'
Subject: RE: Running in cluster mode causes native library lin
Hi,
Another point: in the receiver-based approach, all the data from Kafka
first goes to the worker where the receiver runs.
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
Also, if you create one stream (which is the normal case) and you have many
worker instances,
Spark Streaming needs at least two threads on the worker/slave side. I have
seen this issue when (to test the behavior) I set the thread count for Spark
Streaming to 1. It should be at least 2: one for the receiver adapter (Kafka,
Flume, etc.) and the second for processing the data.
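A minimal configuration sketch of this point, assuming local mode (the app name and batch interval are made up): with master `local[2]`, one thread feeds the receiver and one runs batch processing; with `local[1]` the receiver occupies the only thread and no batches ever run.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local[2]": at least two threads — one for the receiver, one for processing.
// "local[1]" would starve the processing side entirely.
val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingTest")
val ssc = new StreamingContext(conf, Seconds(5))
```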
But I tested th
Hi,
Any {fan-out -> process in parallel -> fan-in -> aggregate} pattern of data
flow can conceptually be seen as Map-Reduce (MR, as it is done in Hadoop).
Apart from the bigger list of map, reduce, sort, filter, pipe, join,
combine, ... functions, which are many times more efficient and productive for
de
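As a plain-Scala sketch of that {fan-out -> map -> fan-in -> aggregate} shape, here is word count, the canonical Map-Reduce example, using only the standard collections (in Spark the same shape would be `rdd.map(w => (w, 1)).reduceByKey(_ + _)`):

```scala
// Sample input; in a real job this would be a distributed dataset.
val words = Seq("spark", "kafka", "spark", "streaming", "kafka")

// Fan-out / "map" phase: one (word, 1) pair per occurrence.
val mapped = words.map(w => (w, 1))

// Fan-in / "reduce" phase: group by key and aggregate the counts.
val counts = mapped.groupBy(_._1)
                   .map { case (w, ps) => (w, ps.map(_._2).sum) }
```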
This is how it works, I think:
sc.parallelize(..) takes the variable inside the (..) and returns a
“distributable equivalent” of that variable. That is, an RDD is returned.
This RDD can be worked on by multiple worker threads in _parallel_. The
parallelize(..) has to be done on the driver runn
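A stdlib-only sketch of what parallelize(..) does conceptually: the driver splits a local collection into partitions, each "worker" processes its partition independently, and the driver aggregates the results. (In Spark itself this would be `sc.parallelize(1 to 8).map(_ * 2).sum`.)

```scala
val data = 1 to 8

// Driver: split the local collection into "partitions" of two elements each.
val partitions = data.grouped(2).toList

// Each "worker": map over its partition and compute a partial result.
val partials = partitions.map(p => p.map(_ * 2).sum)

// Driver: aggregate the partial results into the final answer.
val total = partials.sum
```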
Multiple maven profiles may be the ideal way. You can also do this with:
1. The default build command “mvn compile”, for local builds (use this to
build with Eclipse’s “Run As->Maven build” option when you right-click on the
pom.xml file)
2. Add maven build options to the same build co
Hi,
When running inside the Eclipse IDE, I use another Maven target to build: the
default Maven target. For building the uber jar, I use the assembly jar
target.
So use two Maven build targets in the same pom file to solve this issue.
In Maven you can have multiple build targets, and each
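A hedged sketch of what the two-builds-in-one-pom setup might look like, assuming the maven-assembly-plugin is used for the uber jar (the profile id is illustrative): `mvn compile` stays the default local/Eclipse build, while `mvn -Puber package` produces the assembly jar.

```xml
<!-- Illustrative profile: "mvn compile" for local builds,
     "mvn -Puber package" for the uber (assembly) jar. -->
<profiles>
  <profile>
    <id>uber</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```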
> but when I run the application locally, it complains that spark related stuff
> is missing
I use the uber jar option. What do you mean by “locally”? In the Spark Scala
shell? In the
From: bit1...@163.com [mailto:bit1...@163.com]
Sent: 19 June 2015 08:11
To: user
Subject: Build spark applica
More details on the Direct API of Spark 1.3 is at the databricks blog:
https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
Note the use of checkpoints to persist the Kafka offsets in Spark Streaming
itself, and not in zookeeper.
Also note this statement:
“>>not being able to read from Kafka using multiple nodes
> Kafka is plenty capable of doing this..”
I faced the same issue before Spark 1.3 was released.
The issue was not with Kafka, but with Spark Streaming’s Kafka connector.
Before the Spark 1.3.0 release, one Spark worker would get all the stream
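As a hedged sketch of the Spark 1.3 direct (receiver-less) Kafka API with checkpoint-based offset recovery (the broker address, topic, checkpoint path, and app name below are all made-up placeholders):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///tmp/stream-checkpoint" // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("DirectKafkaApp")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir) // offsets are recovered from here on restart

  // Direct stream: no receiver; offsets are tracked by Spark Streaming itself,
  // not by ZooKeeper, and partitions map 1:1 to Kafka partitions.
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("mytopic"))

  stream.map(_._2).count().print()
  ssc
}

// Recover from the checkpoint if one exists, otherwise build a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```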
Hi Raj,
What you need seems to be an event-based initiation of a DStream. I have not
seen one yet. There are many types of DStreams that Spark implements. You can
also implement your own. InputDStream is a close match for your requirement.
See this for the available options with InputDStream:
htt
Hi Ningjun,
This is probably a configuration difference between WIN01 and WIN02.
Execute ipconfig /all on the Windows command line on both machines and compare
the output.
Also, if you have a localhost entry in the hosts file, it should not be in the
wrong sequence: see the first answer in this link:
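For illustration, a hosts file where the machine's own name resolves to 127.0.0.1 ahead of its real address can make services bind to loopback only; the entries below are hypothetical:

```
# C:\Windows\System32\drivers\etc\hosts  (hypothetical entries)
127.0.0.1     localhost
192.168.1.10  WIN01    # the machine name should map to its real LAN IP,
                       # not to 127.0.0.1
```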
Hi Steve,
Your Spark master is not running if you have not started it. On Windows, it is
missing some scripts and/or the correct installation instructions.
I was able to start the master with
C:\> spark-class.cmd org.apache.spark.deploy.master.Master
Then in the browser at localhost:port you get