Yeah, it will be better!
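For anyone else who trips over this, here is a minimal sketch of how I understand the setting now, in the 0.8-style driver setup (the master URL and the core counts are made up):

    // spark.cores.max caps the TOTAL number of cores the application may
    // take across the whole cluster, NOT the number of cores per machine.
    // e.g. to use all of a 20-machine x 4-core cluster, ask for 80:
    System.setProperty("spark.cores.max", "80")

    import org.apache.spark.SparkContext
    val sc = new SparkContext("spark://master:7077", "MyApp")

Setting it to a small number because you think it is "per machine" quietly confines the whole job to that many cores in total, which is presumably why only a few of my machines were doing anything.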
On Tue, Jan 7, 2014 at 1:04 AM, Andrew Ash <[email protected]> wrote:

> Hi Li,
>
> I've also found this setting confusing in the past. Take a look at this
> change -- do you think it makes the setting clearer?
>
> https://github.com/apache/incubator-spark/pull/341/files
>
> Andrew
>
> On Mon, Jan 6, 2014 at 8:19 AM, lihu <[email protected]> wrote:
>
>> Sorry for my late reply; Gmail did not notify me.
>>
>> This problem was my own fault: I took the config parameter
>> spark.cores.max to be the maximum number of cores on each machine, but
>> it is in fact the total number for the whole cluster.
>>
>> Thank you very much, Andrew and Mayur; your answers helped me
>> understand the Spark system better.
>>
>> On Fri, Jan 3, 2014 at 2:28 AM, Mayur Rustagi <[email protected]> wrote:
>>
>>> Andrew, that's a good point. I have done that when handling a large
>>> number of queries: to get good response times on many queries in
>>> parallel, you typically want the data replicated onto a lot of
>>> machines.
>>>
>>> Regards,
>>> Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>> https://twitter.com/mayur_rustagi
>>>
>>> On Thu, Jan 2, 2014 at 11:22 PM, Andrew Ash <[email protected]> wrote:
>>>
>>>> That sounds right, Mayur.
>>>>
>>>> Also, in 0.8.1 I hear there's a new repartition method that you
>>>> might be able to use to further distribute the data. But if your
>>>> data is so small that it fits in just a couple of blocks, why are
>>>> you using 20 machines just to process a quarter GB of data? Is the
>>>> computation on each bit extremely intensive?
>>>>
>>>> On Thu, Jan 2, 2014 at 12:39 PM, Mayur Rustagi <[email protected]> wrote:
>>>>
>>>>> I have experienced a similar issue. The easiest fix I found was to
>>>>> increase the replication of the data being used by the workers to
>>>>> the number of workers you want to use for processing. The RDD seems
>>>>> to be created on all the machines where the blocks are replicated.
>>>>> Please correct me if I am wrong.
>>>>>
>>>>> Regards,
>>>>> Mayur Rustagi
>>>>>
>>>>> On Thu, Jan 2, 2014 at 10:46 PM, Andrew Ash <[email protected]> wrote:
>>>>>
>>>>>> Hi lihu,
>>>>>>
>>>>>> Maybe the data you're accessing is in HDFS and only resides on 4
>>>>>> of your 20 machines because it's only about 4 blocks (at the
>>>>>> default 64 MB per block, that's around a quarter GB). Where is
>>>>>> your source data located, and how is it stored?
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Thu, Jan 2, 2014 at 7:53 AM, lihu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I run Spark on a cluster with 20 machines, but when I start an
>>>>>>> application using the spark-shell, only 4 machines do any work;
>>>>>>> the others just sit idle, with no memory or CPU used. I watched
>>>>>>> this through the web UI.
>>>>>>>
>>>>>>> I wondered whether the other machines might be busy, so I checked
>>>>>>> them with the "top" and "free" commands, but they are not.
>>>>>>>
>>>>>>> So I just wonder: why does Spark not assign work to all 20
>>>>>>> machines? This is not good use of the resources.
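P.S. For the archives, Andrew's suggestions translate roughly into the following in a 0.8.1 spark-shell (the HDFS path and the partition counts are made up):

    // A ~256 MB file is only ~4 HDFS blocks at 64 MB/block, so textFile
    // yields ~4 partitions and only ~4 machines receive tasks.
    // Either ask for more input splits up front...
    val data = sc.textFile("hdfs:///path/to/input", 80)  // minSplits hint

    // ...or spread an existing RDD with the repartition method added in
    // 0.8.1 (this shuffles the data across the cluster):
    val spread = data.repartition(80)
    spread.count()  // now up to 80 tasks, enough to occupy all 20 machines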
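Mayur's replication idea can also be sketched at the RDD level rather than in HDFS; the predefined replicated storage levels keep two copies of each cached partition (heavier replication would have to be done in HDFS itself, e.g. by raising the file's replication factor):

    import org.apache.spark.storage.StorageLevel

    // Reusing the `data` RDD from the sketch above: cache each partition
    // on two nodes instead of one, so more nodes can run data-local tasks.
    val cached = data.persist(StorageLevel.MEMORY_ONLY_2)
    cached.count()  // the first action materializes the replicated cache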
