Sorry for my late reply; Gmail did not notify me.

It was my fault that caused this problem: I took the config parameter
*spark.cores.max* as the maximum number of cores per machine, but it is in
fact the total for the whole application.

Thanks very much, Andrew and Mayur; your answers helped me understand the
Spark system better.
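To illustrate the misunderstanding, here is a minimal sketch (plain Python,
not actual Spark scheduler code; the function name and round-robin model are
my own simplification of the standalone scheduler's default "spread out"
behavior). It shows why a cluster-wide cap of 4 cores leaves only about 4 of
20 machines busy, rather than using 4 cores on every machine:

```python
# Illustrative sketch only: mimic a spark.cores.max-style TOTAL core cap
# being spread round-robin across workers (hypothetical helper, not Spark API).

def spread_cores(total_core_cap, machines, cores_per_machine):
    """Assign at most total_core_cap cores, one at a time, across machines."""
    assigned = [0] * machines
    remaining = total_core_cap
    while remaining > 0:
        progressed = False
        for i in range(machines):
            if remaining == 0:
                break
            if assigned[i] < cores_per_machine:
                assigned[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every machine is already saturated
    return assigned

# Reading the cap as "4 per machine" you might expect 80 busy cores on a
# 20-machine cluster, but as a cluster-wide total only 4 cores are granted:
usage = spread_cores(total_core_cap=4, machines=20, cores_per_machine=4)
print(sum(1 for c in usage if c > 0))  # number of machines doing any work: 4
```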



On Fri, Jan 3, 2014 at 2:28 AM, Mayur Rustagi <[email protected]> wrote:

> Andrew, that's a good point. I have done that for handling a large number
> of queries. Typically, to get good response times on a large number of
> queries in parallel, you would want the data replicated on a lot of systems.
> Regards
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Thu, Jan 2, 2014 at 11:22 PM, Andrew Ash <[email protected]> wrote:
>
>> That sounds right Mayur.
>>
>> Also in 0.8.1 I hear there's a new repartition method that you might be
>> able to use to further distribute the data.  But if your data is so small
>> that it fits in just a couple blocks, why are you using 20 machines just to
>> process a quarter GB of data?  Is the computation on each bit extremely
>> intensive?
>>
>>
>> On Thu, Jan 2, 2014 at 12:39 PM, Mayur Rustagi
>> <[email protected]> wrote:
>>
>>> I have experienced a similar issue. The easiest fix I found was to
>>> increase the replication of the data being used to the number of workers
>>> you want to use for processing. The RDD's partitions seem to be created
>>> on all the machines where the blocks are replicated. Please correct me if
>>> I am wrong.
>>>
>>> Regards
>>> Mayur
>>>
>>> Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>> https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Thu, Jan 2, 2014 at 10:46 PM, Andrew Ash <[email protected]> wrote:
>>>
>>>> Hi lihu,
>>>>
>>>> Maybe the data you're accessing is in HDFS and only resides on 4 of
>>>> your 20 machines because it's only about 4 blocks (at the default 64 MB
>>>> per block, that's around a quarter GB). Where is your source data
>>>> located, and how is it stored?
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Thu, Jan 2, 2014 at 7:53 AM, lihu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>    I run Spark on a cluster with 20 machines, but when I start an
>>>>> application using the spark-shell, only 4 machines are working; the
>>>>> others are just idle, with no memory or CPU used. I observed this
>>>>> through the web UI.
>>>>>
>>>>>    I wondered whether the other machines might be busy, so I checked
>>>>> them using the "top" and "free" commands, but they are not.
>>>>>
>>>>>   *So I wonder: why doesn't Spark assign work to all 20 machines? This
>>>>> is not good resource usage.*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
*Best Wishes!*

*Li Hu(李浒) | Graduate Student*

*Institute for Interdisciplinary Information Sciences (IIIS,
http://iiis.tsinghua.edu.cn/)*
*Tsinghua University, China*

*Email: [email protected]*
*Tel  : +86 15120081920*
*Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/*
