No, you cannot force an RDD's partitions onto a particular node.

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
On Fri, Feb 21, 2014 at 8:30 AM, dachuan <hdc1...@gmail.com> wrote:
> Mayur, is there any way to make each of an RDD's partitions reside on a
> specific node?
>
> The input data is usually stored in HDFS and has its own preferred
> locations. I am just curious whether we can force the RDD's partitions to
> be stored on nodes of our choosing, regardless of where the data is
> stored now.
>
> Thanks.
>
>
> On Fri, Feb 21, 2014 at 11:00 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>
>> You can find that in the Storage tab of the Spark web UI.
>> Compression will certainly help save space, at the cost of some extra
>> CPU time.
>>
>>
>> On Fri, Feb 21, 2014 at 12:09 AM, vinay Bajaj <vbajaj2...@gmail.com> wrote:
>>
>>> Hi Mayur,
>>>
>>> Thanks a lot for the very quick reply.
>>>
>>> I have a few questions regarding RDDs:
>>> 1) How do I know RDD placement per machine, i.e. which RDD's data is
>>> cached at which location?
>>> 2) How do I know the total space taken by each RDD created by my
>>> program/module?
>>> 3) Does enabling compression on RDDs help?
>>>
>>> Thanks,
>>> Vinay
>>>
>>>
>>> On Thu, Feb 20, 2014 at 11:44 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>
>>>> It's highly likely that locality will not become a bottleneck, as
>>>> Spark tries to schedule tasks where the data is cached. Two things
>>>> might help:
>>>> 1. Make sure you have enough memory to cache the whole dataset as an
>>>> RDD. Keep in mind that the cached RDD may be larger than the raw text,
>>>> because Java objects carry overhead.
>>>> 2. You can try increasing the replication factor of the data, so that
>>>> the data is available on all workers and is therefore faster to cache
>>>> on workers that don't already have it (i.e. in the non-local cases).
>>>>
>>>> Regards,
>>>> Mayur
>>>>
>>>>
>>>> On Thu, Feb 20, 2014 at 12:29 AM, vinay Bajaj <vbajaj2...@gmail.com> wrote:
>>>>
>>>>> Hi Mayur,
>>>>>
>>>>> I am trying to analyse Apache logs that contain traffic details.
>>>>> Basically, I am trying to compute statistics on data points such as
>>>>> total views from each country and unique URLs. I have one cluster
>>>>> running with 4 workers and one master (240 GB total space and 96
>>>>> cores). I was trying a few things to make it faster and got stuck on
>>>>> these locality levels.
>>>>>
>>>>> Regards,
>>>>> Vinay Bajaj
>>>>>
>>>>>
>>>>> On Wed, Feb 19, 2014 at 11:34 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>>>
>>>>>> Process-local means the data is cached in the same JVM as the task;
>>>>>> node-local means it is on the same machine but not in the same JVM
>>>>>> (for example, in another executor on that node, or in an HDFS block
>>>>>> on its local disk). Tuning the wait is a trial-and-error process that
>>>>>> depends on your system configuration (memory vs. disk vs. network).
>>>>>> Frankly, I have never had to modify it. Can you share the use case
>>>>>> that requires you to do that?
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 19, 2014 at 1:59 AM, vinay Bajaj <vbajaj2...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It would be very helpful if anyone could elaborate on
>>>>>>> spark.locality.wait and the multiple locality levels
>>>>>>> (process-local, node-local, rack-local and then any): what is the
>>>>>>> best configuration I can achieve by modifying this wait, and what is
>>>>>>> the difference between process-local and node-local?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vinay Bajaj
>
>
> --
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
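For readers following the spark.locality.wait question in this thread, here is a minimal Python sketch of the delay-scheduling idea behind it. A set of tasks starts at the most local level. Each time it goes a full wait period without launching a task at the current level, it relaxes to the next level. This is a simplified, hypothetical model and not Spark's scheduler code. The names `allowed_level` and `uniform_waits` are made up for illustration. The model omits Spark's NO_PREF level and the reset that happens when a more-local task launches. The 3000 ms value only mirrors the 3-second default of spark.locality.wait. Spark also has per-level settings such as `spark.locality.wait.node`, which the `waits_ms` dict loosely stands in for.

```python
from typing import Dict, List

# Locality levels from most to least local (Spark's real scheduler also has
# a NO_PREF level, which this simplified model leaves out).
LEVELS: List[str] = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def uniform_waits(wait_ms: int) -> Dict[str, int]:
    """Same wait for every level, as when only spark.locality.wait is set."""
    # ANY is the last resort, so it needs no wait of its own.
    return {level: wait_ms for level in LEVELS[:-1]}


def allowed_level(ms_without_launch: int, waits_ms: Dict[str, int]) -> str:
    """Least-local level a task may use after ms_without_launch ms with no launch.

    Hypothetical model of delay scheduling: stay at a level until a full wait
    for that level passes without a launch, then relax to the next level.
    The waits therefore add up as the levels are walked.
    """
    remaining = ms_without_launch
    for level in LEVELS[:-1]:
        if remaining < waits_ms[level]:
            return level
        remaining -= waits_ms[level]
    return LEVELS[-1]


if __name__ == "__main__":
    waits = uniform_waits(3000)  # mirrors the 3 s default of spark.locality.wait
    for elapsed in (0, 2999, 3000, 6500, 9000):
        print(elapsed, allowed_level(elapsed, waits))
```

In this model, a longer wait keeps tasks queued for process-local slots. That pays off when data is cached and executors free up quickly. A wait of 0 lets every task launch immediately at ANY.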
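Mayur's point that a cached RDD can be larger than the raw text can be made concrete with a rule of thumb from the Spark tuning guide. On the JVMs of that time, a java.lang.String carries about 40 bytes of overhead and stores each character in 2 bytes (UTF-16), so a 10-character string takes about 60 bytes. Java 9 and later can store Latin-1 strings in 1 byte per character, which shrinks this. The sketch below applies the rule to sample log lines. The helper names are hypothetical. The sketch also ignores the objects that hold the strings, so treat its result as a rough lower bound, not a measurement. The Storage tab of the Spark web UI reports the actual size once the RDD is cached.

```python
from typing import List, Tuple

# Rule of thumb from the Spark tuning guide for pre-Java 9 JVMs:
# ~40 bytes of overhead per java.lang.String, plus 2 bytes per char (UTF-16).
STRING_OVERHEAD_BYTES = 40
BYTES_PER_CHAR = 2


def estimated_string_bytes(line: str) -> int:
    """Approximate heap footprint of one line cached as a Java String.

    For ASCII text, len() matches the Java char count.
    """
    return STRING_OVERHEAD_BYTES + BYTES_PER_CHAR * len(line)


def estimate_cache_blowup(lines: List[str]) -> Tuple[int, int, float]:
    """Return (raw_bytes, estimated_heap_bytes, ratio) for a sample of lines.

    raw_bytes counts the UTF-8 text plus one newline per line, as in a log
    file. The heap estimate ignores the arrays/objects holding the strings,
    so it is a rough lower bound.
    """
    if not lines:
        raise ValueError("need at least one sample line")
    raw = sum(len(line.encode("utf-8")) + 1 for line in lines)
    heap = sum(estimated_string_bytes(line) for line in lines)
    return raw, heap, heap / raw


if __name__ == "__main__":
    # Illustrative, made-up Apache access-log line.
    sample = ['1.2.3.4 - - [20/Feb/2014:00:29:00 +0000] "GET /index.html HTTP/1.1" 200 512']
    print(estimate_cache_blowup(sample))
```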