Could you please elaborate on how to fetch the records from a particular partition (a node, in our case)? For example, my RDD is distributed across 10 nodes and I want to fetch the data of one particular node/partition, i.e. the partition/node with index "5". How can I do this? I have tried the mapPartitionsWithIndex and partitions.foreach functions. However, these are expensive. Does anybody know a more efficient way?
Thanks in anticipation.

On Thu, Apr 16, 2015 at 5:49 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> Well, you can have a two-level index structure, still without any need for
> physical cluster node awareness.
>
> Level 1 Index is the previously described partitioned [K,V] RDD; this
> gets you to the value (RDD element) you need on the respective cluster node.
>
> Level 2 Index: it will be built and reside within the Value of each [K,V]
> RDD element, so after you retrieve the appropriate element from the
> appropriate cluster node based on the Level 1 Index, you then query the
> Value in that element based on the Level 2 Index.
>
> *From:* MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com]
> *Sent:* Thursday, April 16, 2015 4:32 PM
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: Data partitioning and node tracking in Spark-GraphX
>
> Thanks a lot for the reply. Indeed it is useful, but to be more precise, I
> have 3D data and want to index it using an octree. Thus I aim to build a
> two-level indexing mechanism: first, at the global level, I want to
> partition the data and send it to the nodes; then, at the node level, I
> again want to use an octree to index my data locally.
>
> Could you please elaborate on the solution in this context?
>
> On Thu, Apr 16, 2015 at 5:23 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
> Well, you can use a [Key, Value] RDD and partition it based on a hash
> function on the Key, and even into a specific number of partitions (and
> hence cluster nodes). This will a) index the data, b) divide it and send
> it to multiple nodes.
>
> Re your last requirement: in a cluster programming environment/framework,
> your app code should not need to care about exactly which physical node a
> partition resides on.
>
> Regards
> Evo Eftimov
>
> *From:* MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com]
> *Sent:* Thursday, April 16, 2015 4:20 PM
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: Data partitioning and node tracking in Spark-GraphX
>
> I want to use Spark functions/APIs to do this task. My basic purpose is to
> index the data, divide it, and send it to multiple nodes. Then, at access
> time, I want to reach the right node and data partition. I don't have any
> clue how to do this.
>
> Thanks,
>
> On Thu, Apr 16, 2015 at 5:13 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
> How do you intend to "fetch the required data": from within Spark, or
> using an app / code / module outside Spark?
>
> -----Original Message-----
> From: mas [mailto:mas.ha...@gmail.com]
> Sent: Thursday, April 16, 2015 4:08 PM
> To: user@spark.apache.org
> Subject: Data partitioning and node tracking in Spark-GraphX
>
> I have a big data file and I aim to create an index on the data. I want to
> partition the data based on a user-defined function in Spark-GraphX (Scala).
> Further, I want to keep track of the node to which a particular data
> partition is sent and on which it is processed, so I can fetch the required
> data by accessing the right node and data partition.
> How can I achieve this?
> Any help in this regard will be highly appreciated.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Data-partitioning-and-node-tracking-in-Spark-GraphX-tp22527.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
Regards,
Muhammad Aamir

*CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify me and do not copy, use, or disclose its content.*
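
For reference, here is a minimal sketch of one way the question at the top of this message (reading only the partition with index 5) might be approached. It relies on two Spark calls: `SparkContext.runJob`, which accepts an explicit list of partition indices so that tasks are launched only for those partitions, and `lookup` on a pair RDD, which searches only the partition a key maps to when the RDD has a known partitioner. The object name `SinglePartitionFetch`, the helper `fetchPartition`, the `local[2]` master and the toy data are all hypothetical and chosen only for illustration. Older 1.x releases may also expect an extra `allowLocal` argument to `runJob`. A partition index identifies a partition, not a machine: Spark decides which executor computes it.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper object; not part of Spark.
object SinglePartitionFetch {

  // Collects the elements of exactly one partition. Passing Seq(index) to
  // runJob means tasks are launched only for that partition.
  def fetchPartition[T: ClassTag](sc: SparkContext, rdd: RDD[T], index: Int): Array[T] = {
    require(index >= 0 && index < rdd.partitions.length, s"no partition $index")
    sc.runJob(rdd, (it: Iterator[T]) => it.toArray, Seq(index)).head
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and toy data, for illustration only.
    val conf = new SparkConf().setAppName("single-partition-fetch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val points = sc.parallelize(0 until 100).map(i => (i, s"point-$i"))

      // Level 1 index: hash-partition by key into 10 partitions and cache,
      // so later jobs do not repeat the shuffle.
      val indexed = points.partitionBy(new HashPartitioner(10)).cache()

      // Read partition 5 only. With HashPartitioner(10) it holds the keys
      // whose non-negative hashCode modulo 10 equals 5.
      val part5 = fetchPartition(sc, indexed, 5)
      println(part5.map(_._1).sorted.mkString(","))

      // Key-based access: because the RDD has a partitioner, lookup scans
      // only the partition that key 25 maps to.
      println(indexed.lookup(25))
    } finally {
      sc.stop()
    }
  }
}
```

Caching the partitioned RDD matters here: without it, each single-partition job may recompute the shuffle that `partitionBy` introduces. A second-level structure such as an octree could then be built per partition (for example with `mapPartitions`) and queried the same way.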