Re: Spark 1.3.0: how to let Spark history load old records?

2015-06-02 Thread Otis Gospodnetic
I think Spark doesn't keep historical metrics. You can use something like
SPM for that -
http://blog.sematext.com/2014/01/30/announcement-apache-storm-monitoring-in-spm/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
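
One thing that may also help with the question quoted below: the standalone master (and the history server) can rebuild the UI of completed applications, but only if event logging was enabled before those applications ran. A minimal sketch of the relevant settings (the application name and log directory are placeholders; the directory must already exist):

    import org.apache.spark.{SparkConf, SparkContext}

    // Write event logs so finished applications can be replayed in the UI.
    val conf = new SparkConf()
      .setAppName("my-app")                              // placeholder name
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-events") // placeholder path
    val sc = new SparkContext(conf)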


On Mon, Jun 1, 2015 at 11:36 PM, Haopu Wang  wrote:

> When I start the Spark master process, the old records are not shown in
> the monitoring UI.
>
> How can I show the old records? Thank you very much!

>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
I think you can use SPM - http://sematext.com/spm - it will give you all
Spark and all Kafka metrics, including offsets broken down by topic, etc.
out of the box.  I see more and more people using it to monitor various
components in data processing pipelines, a la
http://blog.sematext.com/2015/04/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/

Otis
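
A hedged sketch, relating to the ZooKeeper question quoted below: with the direct (receiverless) API you can grab the offset ranges per batch and write them to the standard consumer-group path in ZooKeeper yourself, which is what ZooKeeper-based tools such as KafkaOffsetMonitor read. Broker/ZooKeeper hosts, topic and group names are placeholders, ssc is an existing StreamingContext, and the ZkClient/ZkUtils calls assume the Kafka 0.8.x client classes are on the classpath:

    import kafka.serializer.StringDecoder
    import kafka.utils.{ZKStringSerializer, ZkUtils}
    import org.I0Itec.zkclient.ZkClient
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // The direct API does not track consumer groups itself; "my-group" below is
    // only a label used to build the ZooKeeper path.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // foreachRDD runs on the driver, so the ZooKeeper client lives there.
      val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
      try {
        offsetRanges.foreach { or =>
          // Standard consumer-group layout that ZooKeeper-based monitors read.
          val path = s"/consumers/my-group/offsets/${or.topic}/${or.partition}"
          ZkUtils.updatePersistentPath(zkClient, path, or.untilOffset.toString)
        }
      } finally {
        zkClient.close()
      }
    }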

On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg 
wrote:

> Hi,
>
> What are some of the good/adopted approaches to monitoring Spark Streaming
> from Kafka?  I see that there are things like
> http://quantifind.github.io/KafkaOffsetMonitor, for example.  Do they all
> assume that Receiver-based streaming is used?
>
> Then "Note that one disadvantage of this approach (Receiverless Approach,
> #2) is that it does not update offsets in Zookeeper, hence Zookeeper-based
> Kafka monitoring tools will not show progress. However, you can access the
> offsets processed by this approach in each batch and update Zookeeper
> yourself".
>
> The code sample, however, seems sparse. What do you need to do here? -
>  directKafkaStream.foreachRDD(
>  new Function<JavaPairRDD<String, String>, Void>() {
>  @Override
>  public Void call(JavaPairRDD<String, String> rdd) throws
> IOException {
>  OffsetRange[] offsetRanges =
> ((HasOffsetRanges) rdd).offsetRanges();
>  // offsetRanges.length = # of Kafka partitions being consumed
>  ...
>  return null;
>  }
>  }
>  );
>
> and if these are updated, will KafkaOffsetMonitor work?
>
> Monitoring seems to center around the notion of a consumer group.  But in
> the receiverless approach, code on the Spark consumer side doesn't seem to
> expose a consumer group parameter.  Where does it go?  Can I/should I just
> pass in group.id as part of the kafkaParams HashMap?
>
> Thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-monitor-Spark-Streaming-from-Kafka-tp23103.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: RE: ElasticSearch for Spark times out

2015-04-22 Thread Otis Gospodnetic
Hi,

If you get the ES response back in 1-5 seconds, that's pretty slow.  Are these
ES aggregation queries?  Costin may be right about GC possibly causing
timeouts.  SPM can give you all Spark and all
key Elasticsearch metrics, including various JVM metrics.  If the problem
is GC, you'll see it.  If you monitor both Spark side and ES side, you
should be able to find some correlation with SPM.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
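
A hedged sketch of the connector settings Costin mentions below (the default read timeout of 1m and the scroll size of 50 per task), set on the Spark configuration; the key names come from the es-hadoop configuration page, while the host and values are only illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    val conf = new SparkConf()
      .setAppName("es-read")
      .set("es.nodes", "es-host:9200")  // placeholder host
      .set("es.http.timeout", "5m")     // allow slower responses before the read times out
      .set("es.scroll.size", "500")     // documents fetched per scroll request, per task
    val sc = new SparkContext(conf)

    // Read an index into an RDD via the elasticsearch-spark connector.
    val rdd = sc.esRDD("myindex/mytype", "?q=*")
    println(rdd.count())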


On Wed, Apr 22, 2015 at 5:43 PM, Costin Leau  wrote:

> Hi,
>
> First off, for Elasticsearch questions it is worth pinging the Elastic
> mailing list, as that is more closely monitored than this one.
>
> Back to your question, Jeetendra is right that the exception indicates
> no data is flowing back to the es-connector and
> Spark.
> The default is 1m [1], which should be more than enough for a typical
> scenario. As a side note, the scroll size is 50 per
> task
> (so 150 suggests 3 tasks).
>
> Once the query is made, scrolling the documents is fast - likely there's
> something else at hand that causes the
> connection to time out.
> In such cases, you can enable logging on the REST package and see what
> type of data transfer occurs between ES and Spark.
>
> Do note that if a GC occurs, that can freeze Elastic (or Spark) which
> might trigger the timeout. Consider monitoring
> Elasticsearch during
> the query and see whether anything jumps - in particular the memory
> pressure.
>
> Hope this helps,
>
> [1]
> http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network
>
> On 4/22/15 10:44 PM, Adrian Mocanu wrote:
>
>> Hi
>>
>> Thanks for the help. My ES is up.
>>
>> Out of curiosity, do you know what the timeout value is? There are
>> probably other things happening to cause the timeout;
>> I don’t think my ES is that slow but it’s possible that ES is taking too
>> long to find the data. What I see happening is
>> that it uses scroll to get the data from ES; about 150 items at a
>> time. The usual delay when I perform the same query from a
>> browser plugin ranges from 1-5 sec.
>>
>> Thanks
>>
>> *From:*Jeetendra Gangele [mailto:gangele...@gmail.com]
>> *Sent:* April 22, 2015 3:09 PM
>> *To:* Adrian Mocanu
>> *Cc:* u...@spark.incubator.apache.org
>> *Subject:* Re: ElasticSearch for Spark times out
>>
>> Basically, a read timeout means that no data arrived within the specified
>> receive timeout period.
>>
>> Few thing I would suggest
>>
>> 1. Is your ES cluster up and running?
>>
>> 2. If 1 is yes, then reduce the size of the index (make it a few KB) and
>> then test.
>>
>> On 23 April 2015 at 00:19, Adrian Mocanu wrote:
>>
>> Hi
>>
>> I use the ElasticSearch package for Spark and very often it times out
>> reading data from ES into an RDD.
>>
>> How can I keep the connection alive (why doesn’t it? Bug?)
>>
>> Here’s the exception I get:
>>
>> org.elasticsearch.hadoop.serialization.EsHadoopSerializationException:
>> java.net.SocketTimeoutException: Read timed out
>>
>>  at
>> org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>  at
>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> ~[scala-library.jar:na]
>>
>>  at
>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> ~[scala-library.jar:na]
>>
>>  at
>> scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>> ~[scala-library.jar:na]
>>
>>  at
>> scala.collecti

Re: Spark @ EC2: Futures timed out & Ask timed out

2015-03-17 Thread Otis Gospodnetic
Hi Akhil,

Thanks!  I think that was it.  Had to open a bunch of ports (didn't use
spark-ec2, so it didn't do that for me) and the app works fine now.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
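
For anyone hitting the same thing without spark-ec2: besides 7077 and the UI ports, the driver, fileserver, broadcast, block manager and executor ports are picked at random by default, so they cannot be whitelisted in a security group unless you pin them. A rough sketch (the keys are the standard Spark 1.x networking settings; the port numbers are arbitrary placeholders):

    import org.apache.spark.SparkConf

    // Fix the normally-ephemeral ports so the EC2 security group can open exactly these.
    val conf = new SparkConf()
      .set("spark.driver.port",       "7001")
      .set("spark.fileserver.port",   "7002")
      .set("spark.broadcast.port",    "7003")
      .set("spark.blockManager.port", "7005")
      .set("spark.executor.port",     "7006")

With these fixed, the matching inbound rules can be added to the master, worker, and driver security groups.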


On Tue, Mar 17, 2015 at 3:26 AM, Akhil Das 
wrote:

> Did you launch the cluster using the spark-ec2 script? Just make sure all
> ports are open in the master and slave instances' security groups. From the
> error, it seems it's not able to connect to the driver program (port 58360).
>
> Thanks
> Best Regards
>
> On Tue, Mar 17, 2015 at 3:26 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been trying to run a simple SparkWordCount app on EC2, but it looks
>> like my apps are not succeeding/completing.  I'm suspecting some sort of
>> communication issue.  I used the SparkWordCount app from
>> http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
>>
>>
>> Digging through logs I found this:
>>
>>  15/03/16 21:28:20 INFO Utils: Successfully started service
>> 'driverPropsFetcher' on port 58123.
>>
>>
>>  Exception in thread "main"
>> java.lang.reflect.UndeclaredThrowableException
>>
>>
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563)
>>
>>
>>  at
>> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
>>
>>
>>  at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
>>
>>
>>  at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>>
>>
>>  at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>
>>
>> * Caused by: java.util.concurrent.TimeoutException: Futures timed out
>> after [30 seconds] *
>>
>>
>>  at
>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>
>>
>>  at
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>
>>
>>  at
>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>
>>
>>  at
>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>
>>
>>  at scala.concurrent.Await$.result(package.scala:107)
>>
>>
>>
>>  at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
>>
>>
>>  at
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>>
>>
>>  at
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>>
>>
>>  at java.security.AccessController.doPrivileged(Native Method)
>>
>>
>>
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>
>>
>>
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>
>>
>>  ... 4 more
>>
>>
>> Or exceptions like:
>>
>> *Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>> [ActorSelection[Anchor(akka.tcp://sparkDriver@ip-10-111-222-111.ec2.internal:58360/),
>> Path(/user/CoarseGrainedScheduler)]] after [3 ms]  *
>>
>>  at
>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>>
>>
>>  at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>
>>
>>
>>  at
>> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>>
>>
>>  at
>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>>
>>
>>  at
>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>>
>>
>>  at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>>
>>
>>  at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>>
>>
>>  at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>>
>>
>>  at java.la

Spark @ EC2: Futures timed out & Ask timed out

2015-03-16 Thread Otis Gospodnetic
Hi,

I've been trying to run a simple SparkWordCount app on EC2, but it looks
like my apps are not succeeding/completing.  I'm suspecting some sort of
communication issue.  I used the SparkWordCount app from
http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/


Digging through logs I found this:

 15/03/16 21:28:20 INFO Utils: Successfully started service
'driverPropsFetcher' on port 58123.


 Exception in thread "main" java.lang.reflect.UndeclaredThrowableException



 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563)


 at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)


 at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)


 at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)


 at
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)


* Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[30 seconds] *


 at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)


 at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)


 at
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)


 at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)


 at scala.concurrent.Await$.result(package.scala:107)



 at
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)


 at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)


 at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)


 at java.security.AccessController.doPrivileged(Native Method)



 at javax.security.auth.Subject.doAs(Subject.java:415)



 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)


 ... 4 more


Or exceptions like:

*Caused by: akka.pattern.AskTimeoutException: Ask timed out on
[ActorSelection[Anchor(akka.tcp://sparkDriver@ip-10-111-222-111.ec2.internal:58360/),
Path(/user/CoarseGrainedScheduler)]] after [3 ms]  *

 at
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)


 at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)



 at
scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)


 at
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)


 at
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)


 at
akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)


 at
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)


 at
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)


 at java.lang.Thread.run(Thread.java:745)


This is in EC2 and I have ports 22, 7077, 8080, and 8081 open to any source.
But maybe I need to open more ports, too?

I do see that the Master sees the Workers and that the Workers connect to the Master.

I did run this in spark-shell, and it runs without problems:
scala> val something = sc.parallelize(1 to 1000).collect().filter(_ < 1000)

This is how I submitted the job (on the Master machine):

$ spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class
com.cloudera.sparkwordcount.SparkWordCount --executor-memory 256m --master
spark://ip-10-171-32-62:7077
wc-spark/target/sparkwordcount-0.0.1-SNAPSHOT.jar /usr/share/dict/words 0

Any help would be greatly appreciated.

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: throughput in the web console?

2015-02-25 Thread Otis Gospodnetic
Hi Josh,

SPM will show you this info. I see you use Kafka, too, whose numerous metrics 
you can also see in SPM side by side with your Spark metrics.  Sounds like 
trends are what you are after, so I hope this helps.  See http://sematext.com/spm

Otis
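
If an external tool is not an option, a rough alternative is to hang a StreamingListener off the StreamingContext and log per-batch timing; divide your per-batch record count by the processing time to get a throughput figure (newer Spark releases also expose a numRecords field on BatchInfo). This is only a sketch; ssc is assumed to be an existing StreamingContext:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    class ThroughputListener extends StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        val info = batch.batchInfo
        // processingDelay and schedulingDelay are Option[Long] in milliseconds.
        println(s"batch=${info.batchTime} processingMs=${info.processingDelay.getOrElse(-1L)} " +
                s"schedulingMs=${info.schedulingDelay.getOrElse(-1L)}")
      }
    }

    ssc.addStreamingListener(new ThroughputListener)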

 

> On Feb 24, 2015, at 11:59, Josh J  wrote:
> 
> Hi,
> 
> I plan to run a parameter search varying the number of cores, epoch, and 
> parallelism. The web console provides a way to archive the previous runs, 
> though is there a way to view in the console the throughput? Rather than 
> logging the throughput separately to the log files and correlating the log 
> files to the web console processing times?
> 
> Thanks,
> Josh

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark job for demoing Spark metrics monitoring?

2015-01-21 Thread Otis Gospodnetic
Hi,

I'll be showing our Spark monitoring at the
upcoming Spark Summit in NYC.  I'd like to run some/any Spark job that
really exercises Spark and makes it emit all its various metrics (so the
metrics charts are full of data and not blank or flat and boring).

Since we don't use Spark at Sematext yet, I was wondering if anyone could
recommend some Spark app/job that's easy to run, just to get some Spark job
to start emitting various Spark metrics?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
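
In case it helps anyone looking for the same thing, a small self-contained job that just churns through random data with caching and shuffles, and therefore produces a steady stream of task, shuffle and storage metrics. All names, sizes and sleep intervals are arbitrary placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.Random

    object MetricsExerciser {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("metrics-exerciser"))
        // Random "words", cached so storage metrics show up too.
        val words = sc.parallelize(1 to 10000000, 100)
          .map(_ => Random.nextInt(1000).toString)
          .cache()
        for (_ <- 1 to 20) {
          // reduceByKey + repartition force shuffles; count forces execution.
          words.map(w => (w, 1)).reduceByKey(_ + _).repartition(50).count()
          Thread.sleep(5000) // spread the work out so the charts show activity over time
        }
        sc.stop()
      }
    }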


Re: monitoring for spark standalone

2014-12-11 Thread Otis Gospodnetic
Hi Judy,

SPM monitors Spark.  Here are some screenshots:
http://blog.sematext.com/2014/10/07/apache-spark-monitoring/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
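
Besides SPM, the standalone master also serves a JSON view of cluster state that can be polled programmatically; a rough sketch, assuming the default web UI port (8080) and the /json path that recent 1.x standalone masters expose (the host name is a placeholder, and the exact fields vary by Spark version):

    import scala.io.Source

    // Returns workers (with state, cores, memory) and running/completed applications.
    val status = Source.fromURL("http://spark-master:8080/json").mkString
    println(status)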


On Mon, Dec 8, 2014 at 2:35 AM, Judy Nash 
wrote:

>  Hello,
>
>
>
> Are there ways we can programmatically get health status of master & slave
> nodes, similar to Hadoop Ambari?
>
>
>
> The wiki seems to suggest there is only the web UI or instrumentation (
> http://spark.apache.org/docs/latest/monitoring.html).
>
>
>
> Thanks,
> Judy
>
>
>


Re: Monitoring Spark

2014-12-02 Thread Otis Gospodnetic
Hi Isca,

I think SPM can do that for you:
http://blog.sematext.com/2014/10/07/apache-spark-monitoring/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
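
One rough way to check this from inside the program at specific points, without any external tool, is to ask the SparkContext how many executors have registered. A hedged sketch (getExecutorMemoryStatus typically also counts the driver's block manager, hence the -1):

    // sc is the running SparkContext.
    val executors = sc.getExecutorMemoryStatus.size - 1
    println(s"active executors: $executors, default parallelism: ${sc.defaultParallelism}")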


On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz  wrote:

> Hello,
>
> I'm running Spark on a cluster and I want to monitor how many nodes/cores
> are active at different (specific) points of the program.
>
> Is there any way to do this?
>
> thanks,
>   Isca
>


[ANN] Spark resources searchable

2014-11-04 Thread Otis Gospodnetic
Hi everyone,

We've recently added indexing of all Spark resources to
http://search-hadoop.com/spark .

Everything is nicely searchable:
* user & dev mailing lists
* JIRA issues
* web site
* wiki
* source code
* javadoc.

Maybe it's worth adding to http://spark.apache.org/community.html ?

Enjoy!

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Measuring Performance in Spark

2014-10-31 Thread Otis Gospodnetic
Hi Mahsa,

Use SPM.  See
http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ .

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Oct 31, 2014 at 1:00 PM, mahsa  wrote:

> Are there any tools like Ganglia that I can use to get performance data on
> Spark, or do I need to do it myself?
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Measuring-Performance-in-Spark-tp17376p17836.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark Monitoring with Ganglia

2014-10-08 Thread Otis Gospodnetic
Hi,

If using Ganglia is not an absolute requirement, check out SPM for Spark --
http://blog.sematext.com/2014/10/07/apache-spark-monitoring/

It monitors all Spark metrics (i.e. you don't need to figure out what you
need to monitor, how to get it, how to graph it, etc.) and has alerts and
anomaly detection built in.  If you use Spark with Hadoop, Kafka,
Cassandra, HBase, or Elasticsearch, SPM monitors them, too, so you can have
visibility into all your tech in one place.

You can send Spark event logs to Logsene,
too, if you want, and then you can have your performance and log graphs
side by side.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On Wed, Oct 1, 2014 at 4:30 PM, danilopds  wrote:

> Hi,
> I need to monitor some aspects of my cluster, like network and resources.
>
> Ganglia looks like a good option for what I need.
> Then, I found out that Spark has support for Ganglia.
>
> On the Spark monitoring webpage there is this information:
> "To install the GangliaSink you’ll need to perform a custom build of
> Spark."
>
> I found the directory "/extras/spark-ganglia-lgpl" in my Spark
> distribution, but I don't
> know how to install it.
>
> How can I install Ganglia to monitor the Spark cluster?
> How do I do this custom build?
>
> Thanks!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Monitoring-with-Ganglia-tp15538.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Larger heap leads to perf degradation due to GC

2014-10-06 Thread Otis Gospodnetic
Hi,

The other option to consider is using G1 GC, which should behave better
with large heaps.  But pointers are not compressed in heaps > 32 GB in
size, so you may be better off staying under 32 GB.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
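
A hedged sketch of what that can look like in executor settings; the memory size and GC flags are illustrative only, not a tuned recommendation:

    import org.apache.spark.SparkConf

    // Stay under 32 GB so compressed oops remain enabled, and try G1 with GC
    // logging so pauses become visible in the executor logs.
    val conf = new SparkConf()
      .set("spark.executor.memory", "30g")
      .set("spark.executor.extraJavaOptions",
           "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")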


On Mon, Oct 6, 2014 at 8:08 PM, Mingyu Kim  wrote:

> Ok, cool. This seems to be a general issue in the JVM with very large heaps. I
> agree that the best workaround would be to keep the heap size below 32GB.
> Thanks guys!
>
> Mingyu
>
> From: Arun Ahuja 
> Date: Monday, October 6, 2014 at 7:50 AM
> To: Andrew Ash 
> Cc: Mingyu Kim , "user@spark.apache.org" <
> user@spark.apache.org>, Dennis Lawler 
> Subject: Re: Larger heap leads to perf degradation due to GC
>
> We have used the strategy that you suggested, Andrew - using many workers
> per machine and keeping the heaps small (< 20gb).
>
> Using a large heap resulted in workers hanging or not responding (leading
> to timeouts).  The same dataset/job for us will fail (most often due to
> akka disassociation or fetch failure errors) with 10 cores / 100 executors,
> 60 gb per executor, while it succeeds with 1 core / 1000 executors / 6gb per
> executor.
>
> When the job does succeed with more cores per executor and a larger heap, it
> is usually much slower than with the smaller executors (the same 8-10 min job
> taking 15-20 min to complete).
>
> The unfortunate downside of this has been that we have had some large
> broadcast variables which may not fit into memory (and are unnecessarily
> duplicated) when using the smaller executors.
>
> Most of this is anecdotal but for the most part we have had more success
> and consistency with more executors with smaller memory requirements.
>
> On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash  wrote:
>
>> Hi Mingyu,
>>
>> Maybe we should be limiting our heaps to 32GB max and running multiple
>> workers per machine to avoid large GC issues.
>>
>> For a 128GB memory, 32 core machine, this could look like:
>>
>> SPARK_WORKER_INSTANCES=4
>> SPARK_WORKER_MEMORY=32
>> SPARK_WORKER_CORES=8
>>
>> Are people running with large (32GB+) executor heaps in production?  I'd
>> be curious to hear if so.
>>
>> Cheers!
>> Andrew
>>
>> On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim  wrote:
>>
>>> This issue definitely needs more investigation, but I just wanted to
>>> quickly check if anyone has run into this problem or has general guidance
>>> around it. We’ve seen a performance degradation with a large heap on a
>>> simple map task (i.e. no shuffle). We’ve seen the slowness starting around
>>> a 50GB heap (i.e. spark.executor.memory=50g). And, when we checked the
>>> CPU usage, there were just a lot of GCs going on.
>>>
>>> Has anyone seen a similar problem?
>>>
>>> Thanks,
>>> Mingyu
>>>
>>
>>
>


Re: JMXSink for YARN deployment

2014-09-13 Thread Otis Gospodnetic
Hi,

Jerry said "I'm guessing", so maybe the thing to try is to check if his
guess is correct.

What about running sudo lsof | grep metrics.properties?  I imagine you
should be able to see it if the file was found and read.  If Jerry is
right, then I think you will NOT see it.

Next, how about trying some bogus value in metrics.properties, like
*.sink.jmx.class=org.apache.spark.metrics.sink.BogusSink?  If the file is being
read, then specifying such a bogus value should make something log an error or
throw an exception at startup, I assume.  If you don't see this, then maybe this
file is not being read at all.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On Thu, Sep 11, 2014 at 9:18 AM, Shao, Saisai  wrote:

>  Hi,
>
>
>
> I’m guessing the problem is that the driver or executor cannot get the
> metrics.properties configuration file in the YARN container, so the metrics
> system cannot load the right sinks.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
> *Sent:* Thursday, September 11, 2014 7:30 PM
> *To:* user@spark.apache.org
> *Subject:* JMXSink for YARN deployment
>
>
>
> Hello, we at Sematext (https://apps.sematext.com/) are writing a
> monitoring tool for Spark and we came across one question:
>
>
>
> How to enable JMX metrics for YARN deployment?
>
>
>
> We put "*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink"
>
> to file $SPARK_HOME/conf/metrics.properties but it doesn't work.
>
>
>
> Everything works in Standalone mode, but not in YARN mode.
>
>
>
> Can somebody help?
>
>
>
> Thx!
>
>
>
> PS: I've also found
> https://stackoverflow.com/questions/23529404/spark-on-yarn-how-to-send-metrics-to-graphite-sink/25786112
> which has no answer.
>


Deployment model popularity - Standard vs. YARN vs. Mesos vs. SIMR

2014-09-07 Thread Otis Gospodnetic
Hi,

I'm trying to determine which Spark deployment models are the most popular
- Standalone, YARN, Mesos, or SIMR.  Does anyone know?

I thought I'd use search-hadoop.com to help me figure this out, and this is
what I found:


1) Standalone
http://search-hadoop.com/?q=standalone&fc_project=Spark&fc_type=mail+_hash_+user
(seems the most popular?)

2) YARN
 http://search-hadoop.com/?q=yarn&fc_project=Spark&fc_type=mail+_hash_+user
(almost as popular as standalone?)

3) Mesos
http://search-hadoop.com/?q=mesos&fc_project=Spark&fc_type=mail+_hash_+user
(less popular than yarn or standalone)

4) SIMR
http://search-hadoop.com/?q=simr&fc_project=Spark&fc_type=mail+_hash_+user
(no mentions?)

This is obviously not very accurate, but is the order right?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/