Re: Troubleshooting JVM OOM during Spark Unit Tests

2014-11-22 Thread Reynold Xin
What does /tmp/jvm-21940/hs_error.log tell you? It might give hints as to which threads are allocating the extra off-heap memory. On Fri, Nov 21, 2014 at 1:50 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Howdy folks, I’m trying to understand why I’m getting “insufficient memory”

Why Executor Deserialize Time takes more than 300ms?

2014-11-22 Thread Xuelin Cao
In our experimental cluster (1 driver, 5 workers), we tried the simplest example: sc.parallelize(Range(0, 100), 2).count In the event log, we found that the executor spends too much time on deserialization, about 300 ~ 500ms, while the execution time is only 1ms. Our servers are with 2.3 GHz CPU

Re: Why Executor Deserialize Time takes more than 300ms?

2014-11-22 Thread Imran Rashid
Hi Xuelin, this type of question is probably better asked on the spark-user mailing list, u...@spark.apache.org http://apache-spark-user-list.1001560.n3.nabble.com Do you mean the very first set of tasks takes 300 - 500 ms to deserialize? That is most likely because of the time taken to ship the

Re: Why Executor Deserialize Time takes more than 300ms?

2014-11-22 Thread Xuelin Cao
Thanks Imran, The problem is, *every time* I run the same task, the deserialization time is around 300~500ms. I don't know if this is a normal case.

java.lang.OutOfMemoryError at simple local test

2014-11-22 Thread rzykov
Dear all, Unfortunately I've not got any response in the users forum. That's why I decided to publish this question here. We encountered problems with failed jobs on huge amounts of data. For example, an application works perfectly with relatively small data, but when it grows 2 times this

Re: How spark and hive integrate in long term?

2014-11-22 Thread Cheng Lian
Hey Zhan, This is a great question. We are also seeking a stable API/protocol that works with multiple Hive versions (esp. 0.12+). SPARK-4114 https://issues.apache.org/jira/browse/SPARK-4114 was opened for this. Did some research into HCatalog recently, but I must confess that I’m not an

Re: How spark and hive integrate in long term?

2014-11-22 Thread Cheng Lian
Should emphasize that this is still a quick and rough conclusion; will investigate this in more detail after the 1.2.0 release. Anyway, we would really like to make Hive support in Spark SQL as smooth and clean as possible for both developers and end users. On 11/22/14 11:05 PM, Cheng Lian wrote:

Re: Troubleshooting JVM OOM during Spark Unit Tests

2014-11-22 Thread Nicholas Chammas
Here’s that log file https://gist.github.com/nchammas/08d3a3a02486cf602ceb from a different run of the unit tests that also failed. I’m not sure what to look for. If it matters any, I also changed JAVA_OPTS as follows for this run: export JAVA_OPTS=-Xms512m -Xmx1024m -XX:PermSize=64m

Re: How spark and hive integrate in long term?

2014-11-22 Thread Zhan Zhang
Thanks Cheng for the insights. Regarding HCatalog, I did some initial investigation too and agree with you. As of now, it does not seem to be a good solution. I will try to talk to the Hive people to see whether there is such a guarantee of downward compatibility for the Thrift protocol. By the way, I tried

Re: sbt publish-local fails, missing spark-network-common

2014-11-22 Thread Prashant Sharma
Can you update to the latest master and see if this issue still exists? On Nov 21, 2014 10:58 PM, pedrorodriguez ski.rodrig...@gmail.com wrote: Haven't found one yet, but I work in AMPlab/at ampcamp so I will see if I can find someone who would know more about this (maybe reynold since he rolled out

Re: How spark and hive integrate in long term?

2014-11-22 Thread Patrick Wendell
There are two distinct topics when it comes to Hive integration. Part of the 1.3 roadmap will likely be better defining the plan for Hive integration as Hive adds future versions. 1. Ability to interact with Hive metastores from different versions: i.e., if a user has a metastore, can Spark SQL

Re: Apache infra github sync down

2014-11-22 Thread Patrick Wendell
Hi All, Unfortunately this went back down again. I've opened a new JIRA to track it: https://issues.apache.org/jira/browse/INFRA-8688 - Patrick On Tue, Nov 18, 2014 at 10:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, The Apache--github mirroring is not working right now and