Re: Spark Streaming Metrics

2014-11-21 Thread Gerard Maas
Looks like metrics are not a hot topic to discuss, yet they are so important for sleeping well when jobs are running in production. I've created SPARK-4537 https://issues.apache.org/jira/browse/SPARK-4537 to track this issue. -kr, Gerard. On Thu, Nov 20, 2014 at 9:25 PM, Gerard Maas gerard.m...@gmail.com
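Until SPARK-4537 lands, one way to surface batch-level streaming metrics yourself is a listener. A minimal sketch, assuming a running StreamingContext named `ssc` (Spark 1.x streaming API; the log format is made up):

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Register a listener that reports per-batch delays after each batch finishes.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    // processingDelay and schedulingDelay are Option[Long] in milliseconds
    println(s"batch=${info.batchTime} processing=${info.processingDelay} scheduling=${info.schedulingDelay}")
  }
})
```

From here the numbers can be pushed to whatever metrics sink you already trust in production.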

Re: sbt publish-local fails, missing spark-network-common

2014-11-21 Thread PierreB
Hi Pedro, Exact same issue here! Have you found a workaround? Thanks P. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/sbt-publish-local-fails-missing-spark-network-common-tp9471p9475.html Sent from the Apache Spark Developers List mailing list

Why Executor Deserialize Time takes more than 300ms?

2014-11-21 Thread Xuelin Cao
In our experimental cluster (1 driver, 5 workers), we tried the simplest example: sc.parallelize(Range(0, 100), 2).count In the event log, we found the executor spends too much time on deserialization, about 300-500 ms, while the execution time is only 1 ms. Our servers have 2.3 GHz CPUs * 24
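One plausible contributor is that the first Java deserialization of a class pays classloading and JIT warm-up costs that later deserializations do not. A self-contained sketch (plain JVM, no Spark needed; the payload and timing harness are illustrative assumptions, not Spark's actual task path):

```scala
import java.io._

// Serialize a value, then time only the readObject() step on the way back.
def roundTrip[T <: Serializable](value: T): (T, Long) = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  val start = System.nanoTime()
  val result = in.readObject().asInstanceOf[T]
  val micros = (System.nanoTime() - start) / 1000
  (result, micros)
}

val payload = Range(0, 100).toVector
val (first, t1) = roundTrip(payload)   // cold: includes class loading
val (second, t2) = roundTrip(payload)  // warm: usually much faster
println(s"first deserialize: ${t1}us, second: ${t2}us")
```

If the warm number is far below the cold one, a one-off 300-500 ms on a trivial job is less surprising: for a 1 ms task, fixed JVM start-up costs dominate.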

Re: Spark Streaming Metrics

2014-11-21 Thread andy petrella
Yo, I've discussed with some guys from Cloudera who are working (only oO) on spark-core and streaming. The streaming one was telling me the same thing about the scheduling part. Do you have some nice screenshots and info about stages running, task time, Akka health and things like these -- I said

Re: sbt publish-local fails, missing spark-network-common

2014-11-21 Thread pedrorodriguez
Haven't found one yet, but I work in the AMPLab / at AMP Camp, so I will see if I can find someone who would know more about this (maybe Reynold, since he rolled out the networking improvements for the PB sort). Good to have confirmation that at least one other person is having problems with this rather than

Automated github closing of issues is not working

2014-11-21 Thread Patrick Wendell
After we merge pull requests in Spark, they are closed via a special message we put in each commit description (Closes #XXX). This feature stopped working around 21 hours ago, causing already-merged pull requests to display as open. I've contacted GitHub support about the issue. No word from them

Troubleshooting JVM OOM during Spark Unit Tests

2014-11-21 Thread Nicholas Chammas
Howdy folks, I’m trying to understand why I’m getting “insufficient memory” errors when trying to run the Spark unit tests within a CentOS Docker container. I’m building Spark and running the tests as follows: # build sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive
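A JVM-level "insufficient memory" error usually means the OS (here, the container's memory limit) refused the heap reservation the JVM asked for. One common mitigation is to cap sbt's JVM memory explicitly before running the build; the sizes below are assumptions to tune against the container limit, not recommended values:

```shell
# Cap sbt's heap so it fits inside the container's memory limit
# (-XX:MaxPermSize applies to pre-Java-8 JVMs, as used by Spark 1.x builds)
export SBT_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=256m"
sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive test
```

If forked test JVMs are involved, their heap settings are configured separately in the build, so capping sbt alone may not be enough.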

How spark and hive integrate in long term?

2014-11-21 Thread Zhan Zhang
Spark and Hive integration is now a very nice feature, but I am wondering what the long-term roadmap is for Spark's integration with Hive. Both of these projects are undergoing fast improvement and change. Currently, my understanding is that the Spark Hive SQL part relies on the Hive metastore and
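For context, the integration point being discussed is HiveContext, which routes SQL through the Hive metastore. A minimal sketch, assuming an existing SparkContext `sc` and a Hive table (the table name and query are made-up examples):

```scala
import org.apache.spark.sql.hive.HiveContext

// HiveContext reads table metadata from the Hive metastore configured
// via hive-site.xml on the classpath.
val hiveContext = new HiveContext(sc)
val rows = hiveContext.sql("SELECT key, value FROM src LIMIT 10")
rows.collect().foreach(println)
```

The coupling to the metastore and to Hive's parser is exactly why version bumps on the Hive side ripple into Spark, as the follow-up message notes.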

Re: How spark and hive integrate in long term?

2014-11-21 Thread Zhan Zhang
Thanks, Dean, for the information. Hive-on-Spark is nice. Spark SQL has the advantage of taking full advantage of Spark, and it allows users to manipulate tables as RDDs through native Spark support. When I tried to upgrade the current hive-0.13.1 support to hive-0.14.0, I found the Hive parser

Re: How spark and hive integrate in long term?

2014-11-21 Thread Ted Yu
bq. spark-0.12 also has some nice feature added Minor correction: you meant Spark 1.2.0, I guess. Cheers On Fri, Nov 21, 2014 at 3:45 PM, Zhan Zhang zzh...@hortonworks.com wrote: Thanks Dean, for the information. Hive-on-spark is nice. Spark sql has the advantage to take the full advantage