Hi spark-user,
I am using Spark 1.6 to build a reverse index for one month of Twitter data
(~50GB). The HDFS block size is 1GB, so by default sc.textFile creates
50 partitions. I'd like to increase the parallelism by increasing the number
of input partitions, so I use textFile(..., 200) to
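For context on why the second argument changes the partition count: it is passed down as a minimum-splits hint to Hadoop's old FileInputFormat, whose split size is roughly max(minSize, min(totalSize / numSplits, blockSize)). The Python sketch below is a simplified model of that arithmetic (the function name and defaults are mine, not Spark or Hadoop API), showing why asking for 200 partitions on 50GB of 1GB blocks yields ~256MB splits:

```python
def hadoop_split_size(total_size, requested_splits, block_size, min_size=1):
    # Simplified model of FileInputFormat.computeSplitSize:
    #   max(minSize, min(goalSize, blockSize))
    # where goalSize = totalSize / requested number of splits.
    goal_size = total_size // max(requested_splits, 1)
    return max(min_size, min(goal_size, block_size))

GB = 1024 ** 3
total = 50 * GB   # ~one month of data
block = 1 * GB    # HDFS block size

# Default case: textFile's min-partitions hint is small (min(defaultParallelism, 2)),
# so the goal size exceeds the block size and splits fall back to 1GB -> 50 partitions.
assert hadoop_split_size(total, 2, block) == block

# textFile(path, 200): goal size = 50GB / 200 = 256MB < 1GB block -> 200 splits.
split = hadoop_split_size(total, 200, block)
print(split // (1024 ** 2), "MB per split")   # 256 MB per split
print(total // split, "partitions")           # 200 partitions
```

Note this only subdivides below the block size; it never merges splits, so requesting fewer partitions than there are blocks has no effect.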
Hi Yuming - I was running into the same issue with larger worker nodes a few
weeks ago.
The way I managed to get around the high GC time, per the suggestion of
some others, was to break each worker node up into individual workers of
around 10GB in size, dividing the node's cores accordingly.
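In standalone mode that split can be expressed in spark-env.sh on each node; the specific numbers below are illustrative, not a recommendation, so scale them to your hardware:

```shell
# spark-env.sh (Spark standalone): run several modest workers per machine
# instead of one huge JVM, so each heap stays small enough for short GC pauses.
SPARK_WORKER_INSTANCES=4   # e.g. 4 workers per node (illustrative)
SPARK_WORKER_MEMORY=10g    # ~10GB heap per worker
SPARK_WORKER_CORES=4       # divide the node's cores across the workers
```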
The other
The official guide may help:
http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning
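A useful first step from that guide is turning on GC logging in the executors to see where the time actually goes. A sketch (the application class and jar names are placeholders for your own job):

```shell
# Print GC details into each executor's stdout (Spark 1.x-era JVM flags).
# com.example.MyApp and my-app.jar below are hypothetical placeholders.
./bin/spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class com.example.MyApp \
  my-app.jar
```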
-Xiangrui
On Tue, Mar 17, 2015 at 8:27 AM, jatinpreet jatinpr...@gmail.com wrote:
Hi,
I am getting very high GC time in my jobs. For smaller/real-time load, this
becomes a real problem.
Below are the details of a task I just ran. What could be the cause of such
skewed GC times?
36  26010  SUCCESS  PROCESS_LOCAL  2 / Slave1  2015/03/17 11:18:44  20 s
I used Spark 1.1.
On Wed, Jan 14, 2015 at 2:24 PM, Aaron Davidson ilike...@gmail.com wrote:
What version are you running? I think spark.shuffle.use.netty was a
valid option only in Spark 1.1, where the Netty stuff was strictly
experimental. Spark 1.2 contains an officially supported and
Hi,
I just tested the groupByKey method on 100GB of data; the cluster is 20
machines, each with 125GB RAM.
At first I set conf.set("spark.shuffle.use.netty", "false") and ran
the experiment, and then I set conf.set("spark.shuffle.use.netty", "true")
again to re-run the experiment, but at the latter
To confirm, lihu, are you using Spark version 1.2.0?
On Tue, Jan 13, 2015 at 9:26 PM, lihu lihu...@gmail.com wrote:
Hi,
I just tested the groupByKey method on 100GB of data; the cluster is 20
machines, each with 125GB RAM.
At first I set conf.set("spark.shuffle.use.netty", "false") and ran
What version are you running? I think spark.shuffle.use.netty was a valid
option only in Spark 1.1, where the Netty stuff was strictly experimental.
Spark 1.2 contains an officially supported and much more thoroughly tested
version under the property spark.shuffle.blockTransferService, which is