Guys,
I am trying hard to make a DStream API Spark streaming job work on EMR. I've
gotten it to run for a few hours, but it eventually fails, at which point I
start seeing out-of-memory exceptions in the aggregated "yarn logs" output.
I am doing a JSON map and extracting some fields.
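For context, the extraction step is roughly of this shape (a sketch, not the actual job code; the field names "id" and "timestamp" and the `parseJson` hook are placeholders for whatever JSON library is on the classpath):

```scala
// Sketch of per-partition JSON field extraction. `parseJson` stands in
// for the real JSON parser; field names are placeholders.
def extractFields(
    lines: Iterator[String],
    parseJson: String => Map[String, String]): Iterator[(String, String)] =
  lines.flatMap { line =>
    val obj = parseJson(line)
    // Keep only records that carry both fields
    for (id <- obj.get("id"); ts <- obj.get("timestamp")) yield (id, ts)
  }
```

Running this inside mapPartitions with one parser instance per partition (rather than one per record) keeps allocation pressure down, which matters for long-running streaming jobs.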
I have log4j json layout jars added via spark-submit on EMR
/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn \
  --jars /home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar \
  --driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" \
  --class com.mlbam
On Wed, Aug 9, 2017 at 2:52 PM, Mikhailau, Alex wrote:
> I have log4j json layout jars added via spark-submit on EMR
>
>
>
> /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars
>
/home/hadoop/lib/jsonevent-layout-1.7.jar,/home/had
Does anyone have a working solution for logging the YARN application ID, YARN
container hostname, executor ID, and YARN attempt for jobs running on Spark EMR
5.7.0 in log statements? Are there specific environment variables available, or
some other workflow, for doing that?
Thank you
Alex
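For anyone with the same question: YARN exports each container's ID into its environment as CONTAINER_ID, Spark exposes the executor ID via SparkEnv.get.executorId, and the application ID via SparkContext.applicationId. The application ID and attempt can also be recovered from the container ID string itself; a sketch, assuming YARN's standard container_<clusterTimestamp>_<appId>_<attempt>_<seq> format:

```scala
// Sketch: recover application ID and attempt from a YARN container ID of
// the form container_<clusterTimestamp>_<appId>_<attempt>_<seq>.
// (Newer YARN releases may insert an "e<epoch>" segment; not handled here.)
def parseContainerId(containerId: String): Map[String, String] = {
  val parts = containerId.split("_")
  Map(
    "applicationId" -> s"application_${parts(1)}_${parts(2)}",
    "attempt"       -> parts(3),
    "containerId"   -> containerId
  )
}

// On an executor, the input would come from the environment:
// parseContainerId(sys.env("CONTAINER_ID"))
```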
Is there an MDC-based way to achieve this with Spark, or some other approach?
Alex
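The MDC pattern does work for this. The sketch below uses a minimal ThreadLocal map standing in for org.apache.log4j.MDC, so it runs without the log4j jar; on a real cluster, call MDC.put instead and reference the keys from the log4j layout. Note that MDC is per-thread, so the fields have to be set on each executor thread that logs, e.g. inside foreachPartition:

```scala
// Minimal stand-in for org.apache.log4j.MDC (swap in MDC.put/MDC.get on
// the cluster). MDC context is thread-local by design.
object Mdc {
  private val ctx = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def put(key: String, value: String): Unit = ctx.set(ctx.get + (key -> value))
  def get(key: String): Option[String] = ctx.get.get(key)
}

// Set the per-container fields once per executor thread:
Mdc.put("containerId", sys.env.getOrElse("CONTAINER_ID", "unknown"))
Mdc.put("hostname", java.net.InetAddress.getLocalHost.getHostName)
```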
From: Vadim Semenov
Date: Monday, August 28, 2017 at 5:18 PM
To: "Mikhailau, Alex"
Cc: "user@spark.apache.org"
Subject: Re: Referencing YARN application id, YARN container hostname, Executor
ID and YARN atte
Would I use something like this to get to those VM arguments?
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

val runtimeMxBean = ManagementFactory.getRuntimeMXBean
val args = runtimeMxBean.getInputArguments.asScala // java.util.List -> Scala Seq
val conf = Conf(args) // Conf being our own parser of the JVM arguments
etc.
From: Vadim Semenov
Date: Tuesday, August 29, 2017 at 11:49 AM
To: "Mikhailau, Alex"
I am getting the following in the logs:
The sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be
instantiated; it fails with a ClassNotFoundException for CloudwatchSink. I am
running this on EMR 5.7.0.
Does anyone have experience adding this sink to an EMR cluster?
Thanks,
Alex
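For anyone hitting the same ClassNotFoundException: metric sinks are loaded by class name on both the driver and the executors, so the jar containing the sink class has to be on both classpaths. A sketch of the wiring (the jar path and file names are placeholders for whatever Cloudwatch sink implementation is used):

```
# metrics.properties
*.sink.cloudwatch.class=org.apache.spark.metrics.sink.CloudwatchSink

# spark-submit additions: ship the jar and the metrics config with the app
--jars /home/hadoop/lib/cloudwatch-sink.jar \
--files /home/hadoop/conf/metrics.properties \
--conf spark.metrics.conf=metrics.properties
```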
Guys,
I have a Spark 2.1.1 job with Kinesis that is failing to launch all 50 active
receivers, even on an oversized EMR YARN cluster. It sometimes registers 16,
sometimes 32, other times 48 receivers, but never all 50. Any help would be
greatly appreciated.
Kinesis stream shards = 500
YARN EMR CLus
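One general DStream constraint worth ruling out (not EMR-specific): each receiver permanently occupies one executor core, so the cluster needs strictly more total executor cores than receivers, plus headroom for batch processing; otherwise some receivers never get scheduled, which would match the 16/32/48 pattern. A quick sanity check, with the headroom figure being an arbitrary placeholder:

```scala
// Each receiver pins one executor core for the job's lifetime, so the
// cluster must provide receivers + processing cores in total.
def minExecutorCores(receivers: Int, processingHeadroom: Int): Int =
  receivers + processingHeadroom

val needed = minExecutorCores(50, 8) // cores required before any receiver starves
```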
Hi guys,
When I set up my EMR cluster with Spark I add
"*.sink.graphite.prefix": "$env.$namespace.$team.$app" to metrics.properties
The cluster comes up with correct metrics.properties
Then I simply add-step to EMR with spark-submit without any metrics namespace
parameter. In my Graphite, Spar
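If anyone else hits this: a step submitted with spark-submit does not necessarily pick up the cluster's metrics.properties unless it is told where the file is. One way to keep the prefix per step (paths are placeholders, and the `$env.$namespace.$team.$app` values are whatever your naming scheme uses) is to ship the file with the application:

```
spark-submit \
  --files /home/hadoop/conf/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  ...
```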
How do I create a JIRA issue and associate it with a PR that I created for a
bug in master?
https://github.com/apache/spark/pull/19210
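For the archives: the Spark convention is to file the issue in the ASF JIRA under the SPARK project, then put the issue key at the front of the pull request title; the project's tooling links the two automatically. The key and component below are placeholders, not a real issue:

```
[SPARK-XXXXX][COMPONENT] One-line summary of the fix
```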
Has anyone seen the following warnings in the log after a Kinesis stream has
been re-sharded?
com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask
WARN Cannot get the shard for this ProcessTask, so duplicate KPL user records
in the event of resharding will not be dropped during d
With 2.2.0
-Alex
From: "Mikhailau, Alex"
Date: Wednesday, September 13, 2017 at 4:16 PM
To: "user@spark.apache.org"
Subject: Re-sharded kinesis stream starts generating warnings after kinesis
shard numbers were doubled
Has anyone seen the following warnings in the
Filed SPARK-22200
From: "Mikhailau, Alex"
Date: Wednesday, October 4, 2017 at 10:43 AM
To: "user@spark.apache.org"
Subject: Re: Re-sharded kinesis stream starts generating warnings after kinesis
shard numbers were doubled
Just found the same exact issues in one of our l
Does the Kinesis connector for Structured Streaming auto-scale receivers if a
cluster is using dynamic allocation and auto-scaling?