RE: Running in cluster mode causes native library linking to fail

2015-10-26 Thread prajod.vettiyattil
Hi Bernardo, Glad that our suggestions helped. A bigger thanks for sharing your solution with us. That was a tricky and difficult problem to track down and solve! Regards, Prajod From: Bernardo Vecchia Stein [mailto:bernardovst...@gmail.com] Sent: 26 October 2015 23:41 To: Prajod S Vettiyattil

RE: Running in cluster mode causes native library linking to fail

2015-10-15 Thread prajod.vettiyattil
Forwarding to the group, in case someone else has the same error. Just found out that I did not reply to the group in my original reply. From: Prajod S Vettiyattil (WT01 - BAS) Sent: 15 October 2015 11:45 To: 'Bernardo Vecchia Stein' Subject: RE: Running in cluster

RE: Node afinity for Kafka-Direct Stream

2015-10-14 Thread prajod.vettiyattil
Hi, Another point: in the receiver-based approach, all the data from Kafka first goes to the Worker where the receiver runs: https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md Also, if you create one stream (which is the normal case) and you have many worker
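
A minimal sketch of the multiple-receiver pattern this reply refers to, assuming the receiver-based KafkaUtils.createStream API from the spark-streaming-kafka 1.x package; the ZooKeeper address, consumer group and topic below are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Instead of one stream pinned to a single receiver worker, create several
    // receiver-based streams and union them so more workers share the input load.
    // "zk-host:2181", "my-group" and "my-topic" are placeholder values.
    val conf = new SparkConf().setAppName("multi-receiver-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val numReceivers = 3
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))
    }
    val unified = ssc.union(streams)  // downstream processing sees one DStream

    unified.count().print()
    ssc.start()
    ssc.awaitTermination()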

RE: Spark streaming on standalone cluster

2015-07-01 Thread prajod.vettiyattil
Spark streaming needs at least two threads on the worker/slave side. I have seen this issue when (to test the behavior) I set the thread count for Spark streaming to 1. It should be at least 2: one for the receiver adapter (Kafka, Flume, etc.) and the second for processing the data. But I tested
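
A minimal sketch of the thread-count point, assuming local mode; the application name is a placeholder:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // "local[1]" starves processing: the only thread is taken by the receiver.
    // "local[2]" (or more) leaves at least one thread for the batch computation.
    val conf = new SparkConf()
      .setAppName("streaming-thread-count-sketch")  // placeholder name
      .setMaster("local[2]")                        // at least 2 threads
    val ssc = new StreamingContext(conf, Seconds(5))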

RE: What does Spark is not just MapReduce mean? Isn't every Spark job a form of MapReduce?

2015-06-29 Thread prajod.vettiyattil
Hi, Any {fan-out - process in parallel - fan-in - aggregate} pattern of data flow can conceptually be Map-Reduce (MR, as it is done in Hadoop). Apart from the bigger list of map, reduce, sort, filter, pipe, join, combine, ... functions, which are many times more efficient and productive for
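
As a rough illustration (not from the original mail), the fan-out / aggregate pattern expressed with a few of Spark's richer operators; the input path is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Classic MR-style word count written with Spark operators; filter() is one
    // of the extra functions beyond plain map/reduce. "input.txt" is a placeholder.
    val sc = new SparkContext(new SparkConf().setAppName("mr-pattern-sketch"))
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))   // fan-out: one record per word
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)         // fan-in: aggregate per key
    counts.take(10).foreach(println)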

RE: Performing sc.paralleize (..) in workers not in the driver program

2015-06-26 Thread prajod.vettiyattil
This is how it works, I think: sc.parallelize(..) takes the variable inside the (..) and returns a “distributable equivalent” of that variable. That is, an RDD is returned. This RDD can be worked on by multiple worker threads in _parallel_. The parallelize(..) has to be done on the driver
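
A small sketch of the point above; the collection and partition count are arbitrary:

    import org.apache.spark.{SparkConf, SparkContext}

    // parallelize() is called on the driver; the returned RDD's partitions are
    // then processed by the worker threads in parallel.
    val sc = new SparkContext(new SparkConf().setAppName("parallelize-sketch"))
    val localData = 1 to 1000                            // plain driver-side collection
    val rdd = sc.parallelize(localData, numSlices = 4)   // its distributed equivalent
    val sum = rdd.map(_ * 2).reduce(_ + _)               // executed on the workers
    println(sum)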

RE: Build spark application into uber jar

2015-06-19 Thread prajod.vettiyattil
but when I run the application locally, it complains that spark related stuff is missing I use the uber jar option. What do you mean by “locally”? In the Spark scala shell? In the From: bit1...@163.com [mailto:bit1...@163.com] Sent: 19 June 2015 08:11 To: user Subject: Build spark

RE: Re: Build spark application into uber jar

2015-06-19 Thread prajod.vettiyattil
Hi, When running inside the Eclipse IDE, I use another maven target to build, that is, the default maven target. For building the uber jar, I use the assembly jar target. So use two maven build targets in the same pom file to solve this issue. In maven you can have multiple build targets, and each

RE: RE: Build spark application into uber jar

2015-06-19 Thread prajod.vettiyattil
Multiple maven profiles may be the ideal way. You can also do this with: 1. The default build command “mvn compile”, for local builds (use this to build with Eclipse’s “Run As - Maven build” option when you right-click on the pom.xml file) 2. Add maven build options to the same build

RE: RE: Spark or Storm

2015-06-18 Thread prajod.vettiyattil
More details on the Direct API of Spark 1.3 are at the Databricks blog: https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html Note the use of checkpoints to persist the Kafka offsets in Spark Streaming itself, and not in ZooKeeper. Also this
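
A hedged sketch of the direct stream with checkpointing, assuming the Spark 1.3 spark-streaming-kafka API; the broker list, topic and checkpoint directory are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Receiver-less direct stream; with checkpointing enabled the Kafka offsets
    // are recovered from Spark's checkpoint, not from ZooKeeper.
    // "broker1:9092", "my-topic" and the checkpoint path are placeholders.
    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("direct-kafka-sketch")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint("hdfs:///tmp/direct-kafka-checkpoint")

      val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("my-topic"))

      stream.map(_._2).count().print()
      ssc
    }

    // On restart, rebuild the context (and offsets) from the checkpoint.
    val ssc = StreamingContext.getOrCreate("hdfs:///tmp/direct-kafka-checkpoint", createContext _)
    ssc.start()
    ssc.awaitTermination()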

RE: Spark or Storm

2015-06-18 Thread prajod.vettiyattil
not being able to read from Kafka using multiple nodes Kafka is plenty capable of doing this. I faced the same issue before Spark 1.3 was released. The issue was not with Kafka, but with Spark Streaming’s Kafka connector. Before the Spark 1.3.0 release, one Spark worker would get all the streamed

RE: Could Spark batch processing live within Spark Streaming?

2015-06-12 Thread prajod.vettiyattil
Hi Raj, What you need seems to be an event-based initiation of a DStream. I have not seen one yet. There are many types of DStreams that Spark implements. You can also implement your own. InputDStream is a close match for your requirement. See this for the available options with InputDStream:
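
A minimal sketch of a custom InputDStream along those lines; the trigger check is a placeholder:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{StreamingContext, Time}
    import org.apache.spark.streaming.dstream.InputDStream

    // A driver-side InputDStream: compute() is asked for an RDD once per batch
    // interval, so it can return Some(rdd) only when an external event has fired.
    class EventTriggeredDStream(streamingContext: StreamingContext)
      extends InputDStream[String](streamingContext) {

      override def start(): Unit = {}   // set up the event source here
      override def stop(): Unit = {}    // tear it down here

      override def compute(validTime: Time): Option[RDD[String]] = {
        if (eventHasFired())            // placeholder trigger check
          Some(streamingContext.sparkContext.parallelize(Seq("event at " + validTime)))
        else
          None
      }

      private def eventHasFired(): Boolean = false  // placeholder
    }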

RE: How to set spark master URL to contain domain name?

2015-06-11 Thread prajod.vettiyattil
Hi Ningjun, This is probably a configuration difference between WIN01 and WIN02. Execute ipconfig /all on the Windows command line on both machines and compare the output. Also, if you have a localhost entry in the hosts file, it should not be in the wrong sequence: see the first answer in this
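
For illustration only (not from the original mail), a master URL carrying a fully qualified hostname once both machines resolve it the same way; the hostname is a placeholder:

    import org.apache.spark.SparkConf

    // Once WIN01/WIN02 resolve the name consistently (check with ipconfig /all
    // and the hosts file), the master URL can carry the domain name.
    // "win01.example.com" is a placeholder hostname.
    val conf = new SparkConf()
      .setAppName("master-url-sketch")
      .setMaster("spark://win01.example.com:7077")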

RE: I am struggling to run Spark Examples on my local machine

2014-08-21 Thread prajod.vettiyattil
Hi Steve, Your spark master is not running if you have not started it. On Windows it is missing some scripts and/or the correct installation instructions. I was able to start the master with: C:\> spark-class.cmd org.apache.spark.deploy.master.Master Then in the browser, with localhost:port, you get