BTW, you can see the number of parallel tasks in the application UI 
(http://localhost:4040) or in the log messages (e.g. when it says "progress: 
17/20", that means there are 20 tasks in total and 17 are done). In local mode 
Spark will try to use at least one task per core, so there may be more of them 
here; but if your file is big, there will also be at least one task per 32 MB 
block of the file.

Matei
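
A minimal Scala sketch of the above (an illustration, not from the thread; the 
master string, file path, and partition counts are placeholder assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    // local[4]: run Spark in a single JVM with 4 worker threads.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("task-count-demo"))

    // The second argument is a *minimum* partition count; a big file still
    // gets at least one partition (and hence one task) per input block.
    val lines = sc.textFile("/path/to/input.txt", 100)

    // Each stage over this RDD runs one task per partition; this is the
    // "20 tasks total" figure shown in the UI and the progress log lines.
    println(lines.partitions.length)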

On Jul 14, 2014, at 6:39 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> I see, so this might be the problem. With more cores there's less memory 
> available per core, and now many of your threads are doing external hashing 
> (spilling data to disk), as evidenced by the calls to 
> ExternalAppendOnlyMap.spill. Maybe with 10 threads there was enough memory 
> per task to do all of its hashing in memory. That said, these threads do 
> appear to be CPU-bound, largely due to Java serialization; you could get this 
> to run quite a bit faster using Kryo. However, that won't eliminate the 
> spilling issue here.
> 
> Matei
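
A minimal sketch of the Kryo suggestion above (an illustration, not from the 
thread; the master string and app name are placeholder assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[48]")
      .setAppName("kryo-demo")
      // Swap the default Java serializer for Kryo; this speeds up the
      // serialization work done during the shuffle and external hashing.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext(conf)

Registering your own classes with Kryo (via spark.kryo.registrator) typically 
helps further, since full class names then no longer need to be written out.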
> 
> On Jul 14, 2014, at 1:02 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:
> 
>> I am only playing with 'N' in local[N]. I thought that by increasing N, Spark
>> would automatically use more parallel tasks. Isn't that the case? Can you
>> please tell me how I can modify the number of parallel tasks?
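
One hedged sketch of how to do this, reusing the lines RDD from the first 
sketch above (the split pattern and the count of 96 are placeholder 
assumptions): shuffle operators such as reduceByKey accept an explicit 
partition count, which sets the number of parallel tasks for that stage.

    // The second argument sets how many parallel reduce tasks the shuffle uses.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 96)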
>> 
>> For me, there are hardly any threads in the BLOCKED state in the jstack
>> output. In 'top' I see my application consuming all 48 cores all the time
>> with N=48.
>> 
>> I am attaching two jstack outputs that I took while the application was
>> running.
>> 
>> 
>> Lokesh
>> 
>> lessoutput3.lessoutput3
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput3.lessoutput3>
>> lessoutput4.lessoutput4
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput4.lessoutput4>
> 
