Re: Data partitioning and node tracking in Spark-GraphX

MUHAMMAD AAMIR Sun, 17 May 2015 10:56:59 -0700

Could you please explain how to fetch the records from a particular
partition (a node, in our case)? For example, my RDD is distributed across
10 nodes and I want to fetch the data of one particular node/partition, i.e.
the partition/node with index 5.
How can I do this?
I have tried mapPartitionsWithIndex as well as partitions.foreach. However,
these are expensive. Does anybody know a more efficient way?
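The sketch below shows why the filtering approach is costly, assuming the usual Spark task model. Filtering inside `mapPartitionsWithIndex` still launches one task per partition. Recent Spark releases also provide an overload `SparkContext.runJob(rdd, func, partitions)` that computes only the listed partition indices, e.g. `sc.runJob(rdd, (it: Iterator[Int]) => it.toArray, Seq(5))`. So that it runs without a cluster, the model is plain Scala. `ToyRDD`, `scanAll`, `runOn` and the `tasksLaunched` counter are hypothetical names; they only count how many partition tasks each strategy would run.

```scala
// Plain-Scala model (no Spark dependency); all names here are hypothetical.
// Each call to `compute` stands for one task that Spark would schedule.
final class ToyRDD[T](val partitions: Vector[Vector[T]]) {
  var tasksLaunched: Int = 0

  private def compute(index: Int): Iterator[T] = {
    tasksLaunched += 1
    partitions(index).iterator
  }

  // Like rdd.mapPartitionsWithIndex((i, it) => if (i == wanted) it else Iterator.empty).collect():
  // a task runs for every partition, even though only one returns data.
  def scanAll(wanted: Int): Vector[T] =
    partitions.indices.toVector.flatMap { i =>
      val it = compute(i)
      if (i == wanted) it.toVector else Vector.empty[T]
    }

  // Like sc.runJob(rdd, (it: Iterator[T]) => it.toVector, wanted):
  // only the requested partition indices are computed.
  def runOn(wanted: Seq[Int]): Vector[Vector[T]] =
    wanted.toVector.map(i => compute(i).toVector)
}
```

With ten partitions, `scanAll(5)` runs ten tasks to return partition 5, while `runOn(Seq(5))` runs one. Partition 5 is still only a partition index, not a fixed node. The scheduler picks the executor that computes it.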


Thanks in anticipation.


On Thu, Apr 16, 2015 at 5:49 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> Well, you can have a two-level index structure, still without any need for
> awareness of the physical cluster nodes.
>
> Level 1 Index is the previously described partitioned [K,V] RDD. It gets
> you to the value (RDD element) you need on the respective cluster node.
>
> Level 2 Index is built within, and resides in, the Value of each [K,V]
> RDD element. After you retrieve the appropriate element from the
> appropriate cluster node using the Level 1 Index, you query the Value in
> that element using the Level 2 Index.
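The two-level scheme above can be sketched in plain Scala, with no Spark dependency, under two assumptions:

- Level 1 places each [K,V] element by hashing K. It uses the same non-negative `hashCode` modulo rule as Spark's `HashPartitioner`, with null keys mapped to partition 0.
- Level 2 is a sorted `TreeMap` stored inside each V.

`TwoLevelIndex`, `build` and `query` are hypothetical names. The records are (key, sub-key, payload) triples made up for illustration.

```scala
import scala.collection.immutable.TreeMap

// Plain-Scala model of the two-level index; all names are hypothetical.
object TwoLevelIndex {
  // Level-2 index kept inside each value: sub-key -> payload, sorted by sub-key.
  type Local = TreeMap[Int, String]

  // Level-1 placement, mirroring Spark's HashPartitioner rule.
  def partitionOf(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // records: (level-1 key, level-2 sub-key, payload).
  // Result: partitions(p) maps each key placed in p to its level-2 index.
  def build(records: Seq[(String, Int, String)], numPartitions: Int): Vector[Map[String, Local]] = {
    val byKey: Map[String, Local] =
      records.groupBy(_._1).map { case (k, rs) =>
        k -> (TreeMap.empty[Int, String] ++ rs.map(r => r._2 -> r._3))
      }
    Vector.tabulate(numPartitions) { p =>
      byKey.filter { case (k, _) => partitionOf(k, numPartitions) == p }
    }
  }

  // Level 1 selects one partition and element; level 2 answers a sub-key
  // range query [from, until) on that element only.
  def query(parts: Vector[Map[String, Local]], key: String, from: Int, until: Int): Seq[String] =
    parts(partitionOf(key, parts.size)).get(key) match {
      case Some(local) => local.range(from, until).values.toSeq
      case None        => Seq.empty
    }
}
```

A query first computes the one partition that can hold the key. It then does a range scan on that element's local index, so no other partition or element is touched.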
>
>
>
> *From:* MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com]
> *Sent:* Thursday, April 16, 2015 4:32 PM
>
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: Data partitioning and node tracking in Spark-GraphX
>
>
>
> Thanks a lot for the reply. It is indeed useful, but to be more precise, I
> have 3D data and want to index it using an octree. Thus I aim to build a
> two-level indexing mechanism. First, at the global level, I want to
> partition the data and send it to the nodes. Then, at the node level, I
> again want to use an octree to index my data locally.
>
> Could you please elaborate on the solution in this context?
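For the node-level (Level 2) index, a point octree can sit inside each V. The plain-Scala sketch below assumes three things:

- All points of a partition lie within a known root cube.
- A leaf splits once it holds more than `capacity` points.
- Splitting stops at `maxDepth`, so duplicate points cannot recurse forever.

`Point`, `Box` and `Octree` are illustrative names, not part of Spark or GraphX. So is `Octree.cellKey`, a hypothetical coarse-grid key that the global (Level 1) partitioning could hash on.

```scala
// Plain-Scala octree sketch; `Point`, `Box`, `Octree` are hypothetical names.
final case class Point(x: Double, y: Double, z: Double)

final case class Box(minX: Double, minY: Double, minZ: Double,
                     maxX: Double, maxY: Double, maxZ: Double) {
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY &&
      p.z >= minZ && p.z <= maxZ

  def intersects(o: Box): Boolean =
    minX <= o.maxX && maxX >= o.minX && minY <= o.maxY && maxY >= o.minY &&
      minZ <= o.maxZ && maxZ >= o.minZ
}

final class Octree(val bounds: Box, capacity: Int = 4, depth: Int = 0, maxDepth: Int = 16) {
  private var points = Vector.empty[Point]
  private var children: Option[Vector[Octree]] = None

  private val midX = (bounds.minX + bounds.maxX) / 2
  private val midY = (bounds.minY + bounds.maxY) / 2
  private val midZ = (bounds.minZ + bounds.maxZ) / 2

  // Child index 0..7: bit 0 = upper x half, bit 1 = upper y half, bit 2 = upper z half.
  private def octant(p: Point): Int =
    (if (p.x >= midX) 1 else 0) | (if (p.y >= midY) 2 else 0) | (if (p.z >= midZ) 4 else 0)

  // Create the eight child cubes and push this leaf's points down into them.
  private def split(): Vector[Octree] = {
    val kids = Vector.tabulate(8) { i =>
      val (lx, hx) = if ((i & 1) != 0) (midX, bounds.maxX) else (bounds.minX, midX)
      val (ly, hy) = if ((i & 2) != 0) (midY, bounds.maxY) else (bounds.minY, midY)
      val (lz, hz) = if ((i & 4) != 0) (midZ, bounds.maxZ) else (bounds.minZ, midZ)
      new Octree(Box(lx, ly, lz, hx, hy, hz), capacity, depth + 1, maxDepth)
    }
    points.foreach(p => kids(octant(p)).insert(p))
    points = Vector.empty
    kids
  }

  def insert(p: Point): Unit = {
    require(bounds.contains(p), s"$p lies outside $bounds")
    children match {
      case Some(kids) => kids(octant(p)).insert(p)
      case None =>
        points :+= p
        // Split an over-full leaf unless the depth limit has been reached.
        if (points.size > capacity && depth < maxDepth) children = Some(split())
    }
  }

  // Range query: descend only into cubes that overlap the query box.
  def query(q: Box): Vector[Point] =
    if (!bounds.intersects(q)) Vector.empty
    else children match {
      case Some(kids) => kids.flatMap(_.query(q))
      case None       => points.filter(q.contains)
    }
}

object Octree {
  // Hypothetical Level-1 key: the coarse grid cell containing p.
  def cellKey(p: Point, cellSize: Double): (Int, Int, Int) =
    (math.floor(p.x / cellSize).toInt, math.floor(p.y / cellSize).toInt,
      math.floor(p.z / cellSize).toInt)
}
```

One way to use this in a Spark job would be to key each point by `cellKey`, partition by that key, and build one `Octree` per cell inside each partition. This is an assumption, not something GraphX provides.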
>
>
>
> On Thu, Apr 16, 2015 at 5:23 PM, Evo Eftimov <evo.efti...@isecc.com>
> wrote:
>
> Well, you can use a [Key, Value] RDD and partition it with a hash function
> on the Key, optionally into a specific number of partitions (and hence
> cluster nodes). This will a) index the data and b) divide it and send it
> to multiple nodes. Re your last requirement: in a cluster programming
> environment/framework, your app code should not be concerned with exactly
> which physical node a partition resides on.
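Here is a minimal plain-Scala model of the hash-partitioned [Key, Value] RDD described above. In Spark itself this would be `pairRdd.partitionBy(new HashPartitioner(10))`. Calling `lookup(key)` on an RDD with a known partitioner only searches the partition that the key maps to. The class below mirrors that placement rule so it runs without a cluster. `HashPartitioned` and `elementsScanned` are hypothetical names.

```scala
// Plain-Scala model of partitionBy(new HashPartitioner(n)) + lookup(key).
// `HashPartitioned` and `elementsScanned` are hypothetical names.
final class HashPartitioned[K, V](pairs: Seq[(K, V)], val numPartitions: Int) {
  require(numPartitions > 0, "numPartitions must be positive")

  // Placement rule of Spark's HashPartitioner: null keys go to partition 0,
  // other keys to the non-negative remainder of hashCode modulo numPartitions.
  def getPartition(key: Any): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // "partitionBy": every pair goes to the partition its key hashes to.
  val partitions: Vector[Vector[(K, V)]] = {
    val grouped = pairs.groupBy { case (k, _) => getPartition(k) }
    Vector.tabulate(numPartitions)(p => grouped.getOrElse(p, Seq.empty).toVector)
  }

  // How many pairs lookups have had to examine so far.
  var elementsScanned: Int = 0

  // "lookup": with a known partitioner only the key's own partition is searched.
  def lookup(key: K): Seq[V] = {
    val candidates = partitions(getPartition(key))
    elementsScanned += candidates.size
    candidates.collect { case (k, v) if k == key => v }
  }
}
```

With 100 keys in 10 partitions, `lookup(42)` reads only the 10 pairs in partition 2. The partition index is computed from the key alone, which is why the application never needs to know the physical node.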
>
>
>
> Regards
>
> Evo Eftimov
>
>
>
> *From:* MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com]
> *Sent:* Thursday, April 16, 2015 4:20 PM
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: Data partitioning and node tracking in Spark-GraphX
>
>
>
> I want to use Spark functions/APIs to do this task. My basic purpose is to
> index the data, divide it and send it to multiple nodes. Then, at access
> time, I want to reach the right node and data partition. I don't have any
> clue how to do this.
>
> Thanks,
>
>
>
> On Thu, Apr 16, 2015 at 5:13 PM, Evo Eftimov <evo.efti...@isecc.com>
> wrote:
>
> How do you intend to "fetch the required data": from within Spark, or using
> an app/code/module outside Spark?
>
> -----Original Message-----
> From: mas [mailto:mas.ha...@gmail.com]
> Sent: Thursday, April 16, 2015 4:08 PM
> To: user@spark.apache.org
> Subject: Data partitioning and node tracking in Spark-GraphX
>
> I have a big data file and I aim to create an index on the data. I want to
> partition the data based on a user-defined function in Spark-GraphX
> (Scala). Further, I want to keep track of the node to which each data
> partition is sent and on which it is processed, so that I can fetch the
> required data by accessing the right node and data partition.
> How can I achieve this?
> Any help in this regard will be highly appreciated.
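Partitioning by a user-defined function is what a custom partitioner is for. In Spark you extend `org.apache.spark.Partitioner`, override `numPartitions` and `getPartition(key: Any): Int`, and pass an instance to `partitionBy`. The sketch below reproduces that two-member contract in plain Scala so it runs without Spark. `PartitionerLike`, `BoundsPartitioner` and the range rule are hypothetical illustrations. Because `getPartition` is a pure function of the key, the same object can be used at query time to find the partition that holds a key. The physical node is still chosen by Spark's scheduler.

```scala
// Plain-Scala stand-in for the contract of org.apache.spark.Partitioner,
// which declares exactly these two abstract members. Names are hypothetical.
trait PartitionerLike extends java.io.Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// A user-defined rule: a Double key goes to the first range whose upper bound
// is >= the key; keys above the last bound go to the final partition.
final class BoundsPartitioner(val upperBounds: Vector[Double]) extends PartitionerLike {
  require(upperBounds.zip(upperBounds.drop(1)).forall { case (a, b) => a < b },
    "upper bounds must be strictly increasing")

  override val numPartitions: Int = upperBounds.size + 1

  override def getPartition(key: Any): Int = key match {
    case d: Double =>
      val i = upperBounds.indexWhere(bound => d <= bound)
      if (i < 0) upperBounds.size else i
    case other =>
      throw new IllegalArgumentException(s"unsupported key: $other")
  }

  // Spark compares partitioners with equals, e.g. to skip a shuffle when an
  // RDD is already partitioned the same way, so equals/hashCode matter.
  override def equals(other: Any): Boolean = other match {
    case b: BoundsPartitioner => b.upperBounds == upperBounds
    case _                    => false
  }

  override def hashCode(): Int = upperBounds.hashCode()
}
```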
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Data-partitioning-and-node-tracking-in-Spark-GraphX-tp22527.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>
>
>
>
> --
>
> Regards,
> Muhammad Aamir
>
>
> *CONFIDENTIALITY:This email is intended solely for the person(s) named and
> may be confidential and/or privileged.If you are not the intended
> recipient,please delete it,notify me and do not copy,use,or disclose its
> content.*
>
>
>
>
>
> --
>
> Regards,
> Muhammad Aamir
>
>
>



-- 
Regards,
Muhammad Aamir


