How do I link JavaEsSpark.saveToEs() to a sparkConf?

2015-12-14 Thread Spark Enthusiast
Folks, I have the following program:

SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer").set("spark.driver.maxResultSize", "2g");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.write.operation", "index");
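The usual answer is that the link is indirect: the SparkConf is passed to the JavaSparkContext, and JavaEsSpark.saveToEs(rdd, resource) then picks up the es.* settings from that context's configuration. Below is a pure-Python sketch of that wiring only (the class and function names are hypothetical stand-ins, not the real Spark or elasticsearch-hadoop API):

```python
# Pure-Python sketch of the wiring -- NOT the real Spark / elasticsearch-hadoop
# API. In the actual Java code the link is made by constructing a
# JavaSparkContext from the SparkConf; JavaEsSpark.saveToEs(rdd, resource)
# then reads the es.* settings from that context's configuration.

class SparkConfSketch:
    """Hypothetical stand-in for org.apache.spark.SparkConf."""
    def __init__(self):
        self._settings = {}
    def set(self, key, value):
        self._settings[key] = value
        return self                      # chainable, like the real SparkConf
    def get(self, key):
        return self._settings[key]

class ContextSketch:
    """Stand-in for JavaSparkContext: it captures the conf at construction."""
    def __init__(self, conf):
        self.conf = conf

def save_to_es_sketch(ctx, resource):
    """Stand-in for JavaEsSpark.saveToEs: it finds es.* via the context."""
    return {"resource": resource,
            "nodes": ctx.conf.get("es.nodes"),
            "port": ctx.conf.get("es.port")}

conf = SparkConfSketch().set("es.nodes", "localhost").set("es.port", "9200")
ctx = ContextSketch(conf)                # this construction IS the link
target = save_to_es_sketch(ctx, "indexer/docs")
print(target["nodes"], target["port"])
```

In other words, there is nothing to pass to saveToEs() itself: configure the conf, build the context from it, and the write picks the settings up from there.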

Getting an error when trying to read a GZIPPED file

2015-09-02 Thread Spark Enthusiast
Folks, I have an input file which is gzipped. I see the following problem when I use sc.textFile("foo.gz"). Can someone help me fix this?

15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/03 10:05:32 INFO CodecPool: Got brand-new decompress
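One common cause worth ruling out first, independent of Spark: a file that was renamed to .gz without actually being gzip-compressed. A quick stdlib round-trip (no Spark needed) confirms what a valid gzip text file looks like to a codec; note also that Spark reads a .gz file as a single, non-splittable partition:

```python
# Sanity check with the Python stdlib: write a real gzip file and read it back
# the way Spark's codec would -- a truncated or mis-named "foo.gz" fails here
# just as it does inside sc.textFile("foo.gz").
import gzip, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.gz")
with gzip.open(path, "wt") as f:
    f.write("line1\nline2\n")

with gzip.open(path, "rt") as f:      # decompress and split into lines
    lines = f.read().splitlines()
print(lines)
```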

Spark

2015-08-24 Thread Spark Enthusiast
I was running a Spark job to crunch a 9GB Apache log file when I saw the following error:

15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0 (TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal): ExecutorLostFailure (executor 29 lost)
15/08/25 04:25:16 INFO s

How to parse multiple event types using Kafka

2015-08-23 Thread Spark Enthusiast
Folks, I use the following Streaming API from KafkaUtils:

public JavaPairInputDStream inputDStream() {
    HashSet topicsSet = new HashSet(Arrays.asList(topics.split(",")));
    HashMap kafkaParams = new HashMap();
    kafkaParams.put(Tokens.KAFKA_BROKER_LIST_TOKEN.getRealTokenName(), brokers);
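With a single direct stream over several topics, each record carries its topic, so one common pattern is to dispatch to a per-type parser by topic name. A pure-Python sketch of that dispatch (the topic names, payload shapes, and parser table are hypothetical, and this is not the Spark/Kafka API itself):

```python
# Pure-Python sketch: route each (topic, payload) message to a parser chosen
# by topic name, so one stream can carry multiple event types.
import json

parsers = {
    "clicks": lambda raw: {"type": "click", **json.loads(raw)},
    "orders": lambda raw: {"type": "order", **json.loads(raw)},
}

messages = [                          # stand-ins for Kafka records
    ("clicks", '{"user": "a"}'),
    ("orders", '{"user": "b", "amount": 3}'),
]

events = [parsers[topic](payload) for topic, payload in messages]
print([e["type"] for e in events])
```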

Re: How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Spark Enthusiast
mode only)   --supervise If given, restarts the driver on failure. At 2015-08-19 14:55:39, "Spark Enthusiast" wrote: Folks, As I see, the Driver program is a single point of failure. Now, I have seen ways as to how to make it recover from failures on a rest
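For reference, the --supervise flag quoted above is passed to spark-submit in cluster deploy mode. A sketch of the invocation (master URL, class name, and jar path are placeholders, not from the thread):

```shell
# Illustrative only -- all names and paths below are placeholders:
spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyStreamingDriver \
  my-driver.jar
```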

How to automatically relaunch a Driver program after crashes?

2015-08-18 Thread Spark Enthusiast
Folks, as I see it, the Driver program is a single point of failure. I have seen ways to make it recover from failures on a restart (using checkpointing), but I have not seen anything on how to restart it automatically if it crashes. Will running the Driver as a Hadoop Yarn Applica

Re: Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
Forgot to mention. Here is how I run the program:

./bin/spark-submit --conf "spark.app.master"="local[1]" ~/workspace/spark-python/ApacheLogWebServerAnalysis.py

On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast wrote: I wrote a small python program

Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
I wrote a small python program:

def parseLogs(self):
    """ Read and parse log file """
    self._logger.debug("Parselogs() start")
    self.parsed_logs = (self._sc
        .textFile(self._logFile)
        .map(self._parseApacheLogLine)
        .cac
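The _parseApacheLogLine helper is not shown in the thread; a minimal sketch of what such a function might look like, parsing one Common Log Format line with a regex (field names and the returned dict shape are assumptions):

```python
# Hypothetical sketch of an Apache Common Log Format line parser -- the
# original _parseApacheLogLine is not shown in the thread.
import re

LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+)'
)

def parse_apache_log_line(line):
    m = LOG_PATTERN.match(line)
    if m is None:
        return None                   # malformed line
    return {"host": m.group(1), "method": m.group(5),
            "path": m.group(6), "status": int(m.group(8))}

sample = '127.0.0.1 - - [01/Aug/2015:00:00:01 -0700] "GET /index.html HTTP/1.1" 200 2326'
print(parse_apache_log_line(sample))
```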

How do I Process Streams that span multiple lines?

2015-08-03 Thread Spark Enthusiast
All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example). How do I process streams that span multiple lines? Are there examples that I can use?
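The usual workaround is to treat the stream as records separated by a known delimiter rather than as single lines, reassembling each multi-line record before tokenising it. A pure-Python sketch of that regrouping (a blank line as delimiter is an assumption; the real choice depends on the data):

```python
# Pure-Python sketch: reassemble multi-line records from a line stream,
# using a blank line as the record delimiter (an assumed convention).
def records_from_lines(lines, delimiter=""):
    record, out = [], []
    for line in lines:
        if line == delimiter:
            if record:
                out.append("\n".join(record))
                record = []
        else:
            record.append(line)
    if record:                        # flush the trailing record
        out.append("\n".join(record))
    return out

stream = ["event A line 1", "event A line 2", "", "event B line 1"]
print(records_from_lines(stream))
```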

Can a Spark Driver Program be a REST Service by itself?

2015-07-01 Thread Spark Enthusiast
Folks, My Use case is as follows: My Driver program will be aggregating a bunch of Event Streams and acting on it. The Action on the aggregated events is configurable and can change dynamically. One way I can think of is to run the Spark Driver as a Service where a config push can be caught via
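The pattern described can be sketched without any web framework: the long-running driver holds a mutable action that a config push (for example, a REST handler) swaps at runtime, and each batch consults it. A pure-Python sketch with the HTTP layer omitted (all names here are hypothetical):

```python
# Pure-Python sketch of a dynamically reconfigurable action -- the REST
# endpoint that would call update() is omitted; a lock guards the swap.
import threading

class DynamicAction:
    def __init__(self, fn):
        self._fn = fn
        self._lock = threading.Lock()
    def update(self, fn):             # called by the config-push handler
        with self._lock:
            self._fn = fn
    def apply(self, batch):           # called once per micro-batch
        with self._lock:
            return [self._fn(x) for x in batch]

action = DynamicAction(lambda x: x * 2)
first = action.apply([1, 2, 3])
action.update(lambda x: x + 10)       # simulated config push at runtime
second = action.apply([1, 2, 3])
print(first, second)
```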

Can I do Joins across Event Streams ?

2015-07-01 Thread Spark Enthusiast
Hi, I have to build a system that reacts to a set of events. Each of these events is a separate stream by itself, consumed from a different Kafka Topic, and hence will have a different InputDStream. Questions: Will I be able to do joins across multiple InputDStreams and collate the outp
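The shape of such a join can be shown without Spark: pair each stream's events with a key, then match by key across the two batches. A pure-Python sketch (in Spark Streaming the analogous operation is a per-batch join of two key/value DStreams; the keys and event names here are hypothetical):

```python
# Pure-Python sketch of a keyed join across two event batches.
def join_by_key(left, right):
    right_index = {}
    for key, value in right:          # index one side by key
        right_index.setdefault(key, []).append(value)
    return [(key, (lv, rv))           # emit every matching pair
            for key, lv in left
            for rv in right_index.get(key, [])]

clicks = [("user1", "click"), ("user2", "click")]
orders = [("user1", "order")]
print(join_by_key(clicks, orders))
```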

Serialization Exception

2015-06-29 Thread Spark Enthusiast
For prototyping purposes, I created a test program injecting dependencies using Spring. Nothing fancy. This is just a re-write of KafkaDirectWordCount. When I run this, I get the following exception:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.
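The mechanics behind "Task not serializable" can be shown with plain serialization: Spark must serialize the closure it ships to executors, and one non-serializable field (an injected Spring bean, a connection, a lock) makes the whole enclosing object fail. A pure-Python analogue using pickle, with the usual fix of capturing only the plain data the task needs (the Handler class and its fields are hypothetical):

```python
# Pure-Python analogue of Spark's "Task not serializable": an object holding
# a non-serializable member (a lock here, a Spring bean in the thread) cannot
# be pickled whole, but the plain data it carries can.
import pickle, threading

class Handler:
    def __init__(self):
        self.lock = threading.Lock()  # not serializable
        self.prefix = "word:"         # plain data, serializable

handler = Handler()

try:
    pickle.dumps(handler)             # fails because of the lock
    whole_object_ok = True
except Exception:
    whole_object_ok = False

prefix = handler.prefix               # capture only what the task needs
data_only_ok = isinstance(pickle.loads(pickle.dumps(prefix)), str)
print(whole_object_ok, data_only_ok)
```

The same extraction (copy the needed field into a local variable before the closure) is the standard fix on the Spark side.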

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
of options…. spark streaming, storm, samza, akka and others…  Storm is probably the easiest to pick up,  spark streaming / akka may give you more flexibility and akka would work for CEP.  Just my $0.02 On Jun 16, 2015, at 9:40 PM, Spark Enthusiast wrote: I have a use-case where a stream of

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
stored to the database and this transformed data should then be sent to a new pipeline for further processing How can this be achieved using Spark? On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast wrote: I have a use-case where a stream of Incoming events have to be aggregated and joi

Re: Spark or Storm

2015-06-16 Thread Spark Enthusiast
I have a use-case where a stream of incoming events has to be aggregated and joined to create complex events. The aggregation will have to happen at an interval of 1 minute (or less). The pipeline is: send events -> enrich
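The 1-minute aggregation step can be sketched as a tumbling window over timestamped events; in Spark Streaming the analogous tools are window/reduceByKeyAndWindow over a keyed DStream. A pure-Python sketch (event fields and the 60-second width are assumptions):

```python
# Pure-Python sketch of a 1-minute tumbling-window count over timestamped
# events -- each event falls into the window starting at ts rounded down
# to a multiple of the window width.
from collections import defaultdict

events = [
    {"ts": 5,  "key": "login", "n": 1},
    {"ts": 42, "key": "login", "n": 1},
    {"ts": 65, "key": "login", "n": 1},   # lands in the next 60s window
]

def tumbling_counts(events, width=60):
    counts = defaultdict(int)
    for e in events:
        window_start = (e["ts"] // width) * width
        counts[(window_start, e["key"])] += e["n"]
    return dict(counts)

print(tumbling_counts(events))
```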