Folks,
I have the following program :
SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer").set("spark.driver.maxResultSize", "2g");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.write.operation", "index");
Folks,
I have an input file which is gzipped. When I use sc.textFile("foo.gz") I see the following problem. Can someone help me fix this?
15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/03 10:05:32 INFO CodecPool: Got brand-new decompress
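(For reference, sc.textFile reads the .gz the same way as plain text; the Hadoop codec decompresses it transparently, but a gzip file is not splittable, so it arrives as a single partition. A minimal sketch with a placeholder path and partition count:)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf gzConf = new SparkConf().setMaster("local").setAppName("GzipRead");
JavaSparkContext sc = new JavaSparkContext(gzConf);

// The .gz is decompressed transparently but lands in one partition.
JavaRDD<String> lines = sc.textFile("foo.gz");

// Spread the data out before any heavy processing (8 is a placeholder).
JavaRDD<String> spread = lines.repartition(8);
System.out.println("lines: " + spread.count());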
I was running a Spark Job to crunch a 9GB apache log file when I saw the following error:
15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0 (TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal): ExecutorLostFailure (executor 29 lost)
15/08/25 04:25:16 INFO s
Folks,
I use the following Streaming API from KafkaUtils :
public JavaPairInputDStream<String, String> inputDStream() {
    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put(Tokens.KAFKA_BROKER_LIST_TOKEN.getRealTokenName(), brokers);
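(The preview cuts off here; with the Kafka direct approach the method would typically finish with a createDirectStream call along these lines. This is a sketch assuming a JavaStreamingContext field named jssc and String keys/values decoded with kafka.serializer.StringDecoder, as in the stock KafkaDirectWordCount example:)

    return KafkaUtils.createDirectStream(
            jssc,                    // assumed JavaStreamingContext field
            String.class,            // key class
            String.class,            // value class
            StringDecoder.class,     // key decoder
            StringDecoder.class,     // value decoder
            kafkaParams,
            topicsSet);
}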
mode only)
--supervise If given, restarts the driver on failure.
At 2015-08-19 14:55:39, "Spark Enthusiast" wrote:
Folks,
As I see it, the Driver program is a single point of failure. Now, I have seen ways to make it recover from failures on a rest
Folks,
As I see it, the Driver program is a single point of failure. Now, I have seen ways to make it recover from failures on a restart (using Checkpointing), but I have not seen anything about how to restart it automatically if it crashes.
Will running the Driver as a Hadoop Yarn Applica
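(For reference, with the standalone or Mesos cluster deploy mode the --supervise flag quoted in the reply above makes the cluster manager restart the driver if it dies; on YARN in cluster mode the application master, and with it the driver, is re-attempted up to spark.yarn.maxAppAttempts times. A sketch of a supervised submission, with placeholder master URL, class and jar:)

./bin/spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.StreamingDriver \
  /path/to/streaming-driver.jar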
Forgot to mention. Here is how I run the program :
./bin/spark-submit --conf "spark.app.master"="local[1]"
~/workspace/spark-python/ApacheLogWebServerAnalysis.py
On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast wrote:
I wrote a small python program
I wrote a small python program :
def parseLogs(self):
    """ Read and parse log file """
    self._logger.debug("Parselogs() start")
    self.parsed_logs = (self._sc
                        .textFile(self._logFile)
                        .map(self._parseApacheLogLine)
                        .cache())
All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example).
How do I process Streams that span multiple lines? Are there examples that I
can use?
Folks,
My Use case is as follows:
My Driver program will be aggregating a bunch of Event Streams and acting on
it. The Action on the aggregated events is configurable and can change
dynamically.
One way I can think of is to run the Spark Driver as a Service where a config
push can be caught via
Hi,
I have to build a system that reacts to a set of events. Each of these events is a separate stream by itself, consumed from a different Kafka Topic, and hence will have its own InputDStream.
Questions:
Will I be able to do joins across multiple InputDStreams and collate the outp
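(For reference, joins do work across DStreams built from different topics as long as both sides are keyed pair streams on the same StreamingContext. A rough sketch; the topic names, broker list and the assumption that the correlation id is the Kafka message key are all placeholders:)

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

SparkConf joinConf = new SparkConf().setAppName("MultiTopicJoin");
JavaStreamingContext jssc = new JavaStreamingContext(joinConf, Durations.seconds(10));

Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092");   // placeholder brokers

// One direct stream per topic; both are assumed to carry the shared id as the message key.
JavaPairInputDStream<String, String> orders = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, new HashSet<String>(Arrays.asList("orders")));
JavaPairInputDStream<String, String> payments = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, new HashSet<String>(Arrays.asList("payments")));

// Batch-by-batch join on the key; only ids that appear in both streams within the
// same interval match (window() on either side widens that scope).
JavaPairDStream<String, Tuple2<String, String>> joined = orders.join(payments);
joined.print();

jssc.start();
jssc.awaitTermination();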
For prototyping purposes, I created a test program injecting dependencies using Spring.
Nothing fancy. This is just a re-write of KafkaDirectWordCount. When I run
this, I get the following exception:
Exception in thread "main" org.apache.spark.SparkException: Task not
serializable
at org.
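(For context, "Task not serializable" usually means a closure passed to a transformation captured a non-serializable object; with Spring that is typically an injected bean, or the enclosing component that holds it. A common pattern is to mark such fields transient and copy whatever the executors actually need into a local variable before the closure. A sketch with hypothetical names (MessageSink, EventCountJob):)

import java.io.Serializable;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// Hypothetical non-serializable collaborator (e.g. a Spring-managed client).
interface MessageSink {
    String prefix();
}

public class EventCountJob implements Serializable {

    // transient keeps the injected bean out of any serialized closure.
    private transient MessageSink sink;

    public JavaPairDStream<String, Integer> count(JavaPairDStream<String, String> events) {
        // Copy what the executors need into a serializable local first; the lambdas
        // below then capture only this local, not `this` and not the bean.
        final String prefix = (sink == null) ? "" : sink.prefix();

        return events
                .mapToPair(t -> new Tuple2<String, Integer>(prefix + t._2(), 1))
                .reduceByKey((a, b) -> a + b);
    }
}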
of options…. spark streaming, storm, samza, akka and others…
Storm is probably the easiest to pick up, spark streaming / akka may give you
more flexibility and akka would work for CEP.
Just my $0.02
On Jun 16, 2015, at 9:40 PM, Spark Enthusiast wrote:
I have a use-case where a stream of
stored to the database and this
transformed data should then be sent to a new pipeline for further processing
How can this be achieved using Spark?
On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast wrote:
I have a use-case where a stream of Incoming events have to be aggregated and
joi
I have a use-case where a stream of incoming events has to be aggregated and joined to create Complex events. The aggregation will have to happen at an interval of 1 minute (or less).
The pipeline is : send events
enrich
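(For reference, a fixed 1-minute aggregation over a keyed stream is usually expressed with reduceByKeyAndWindow; the window and slide durations must be multiples of the StreamingContext batch interval. A minimal sketch; the (eventId, count) pair layout and the reduce function are placeholder assumptions:)

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// `events` would come from the enrichment stage of the pipeline.
static JavaPairDStream<String, Long> aggregatePerMinute(JavaPairDStream<String, Long> events) {
    return events.reduceByKeyAndWindow(
            (a, b) -> a + b,          // combine counts for the same key
            Durations.minutes(1),     // window length
            Durations.minutes(1));    // slide: emit one aggregate per minute
}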