Re: Accessing the reduce key

2014-03-20 Thread Surendranauth Hiraman
to occur together since the fact that we are looping through the Seq is out of Spark's control.
-Suren
On Thu

Re: Accessing the reduce key

2014-03-20 Thread Mayur Rustagi
-Suren
On Thu, Mar 20, 2014 at 9:48 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
Hi,

Re: Accessing the reduce key

2014-03-20 Thread Surendranauth Hiraman
wrote:
Hi,
My team is trying to replicate an existing Map/Reduce process in Spark.
Basically, we are creating Bloom Filters for quick set membership tests within our processing pipeline.
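For readers unfamiliar with the data structure, here is a minimal Bloom filter sketch in plain Python. It is not the team's implementation, which the thread does not show; the class name `BloomFilter`, its parameters, and the salted SHA-256 hashing scheme are illustrative assumptions. A query may return a false positive but never a false negative, which is what makes the structure useful for quick set membership tests.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical, not from the thread): k bit positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, initially all zeros

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter()
for member in ["a1", "b2", "c3"]:
    bf.add(member)
print(bf.might_contain("a1"))  # True: added items are never reported absent
```

Larger `num_bits` lowers the false-positive rate at the cost of memory.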

Re: Accessing the reduce key

2014-03-20 Thread Surendranauth Hiraman
set membership tests within our processing pipeline. We have a single column (call it group_id) that we use to partition into sets. As you would expect, in the map phase, we emit the group_id as the key and

Re: Accessing the reduce key

2014-03-20 Thread Mayur Rustagi
As you would expect, in the map phase, we emit the group_id as the key and in the reduce phase, we instantiate the Bloom Filter for a given key in the setup() method and persist that Bloom Filter in the cleanup() method. In Spark, we can do something similar

Re: Accessing the reduce key

2014-03-20 Thread Surendranauth Hiraman
reduce phase, we instantiate the Bloom Filter for a given key in the setup() method and persist that Bloom Filter in the cleanup() method. In Spark, we can do something similar with map() and reduceByKey(), but we have the following questions.
1. Accessing the reduce key
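The Map/Reduce flow described above can be modelled in plain Python, with no Hadoop or Spark involved. This is a hypothetical sketch: `map_phase`, `shuffle`, `FilterReducer`, `run_job`, and the record fields are invented for illustration, and a Python `set` stands in for the Bloom Filter. It also simplifies the lifecycle by running setup()/cleanup() once per key; in Hadoop those hooks run once per reduce task, which may process several keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def map_phase(records: Iterable[dict]) -> List[Tuple[str, str]]:
    # Emit group_id as the key (hypothetical record fields "group_id" and "member").
    return [(rec["group_id"], rec["member"]) for rec in records]


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Group values by key, as the framework does between map and reduce.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


class FilterReducer:
    """Hypothetical reducer; a plain set stands in for the Bloom Filter."""

    def __init__(self, store: Dict[str, Set[str]]) -> None:
        self.store = store  # hypothetical persistence target
        self.key = None
        self.filter: Set[str] = set()

    def setup(self, key: str) -> None:
        # Instantiate a fresh filter for this key.
        self.key = key
        self.filter = set()

    def reduce(self, key: str, values: Iterable[str]) -> None:
        # Add every member of the group to the filter.
        for value in values:
            self.filter.add(value)

    def cleanup(self) -> None:
        # Persist the finished filter under its key.
        self.store[self.key] = self.filter


def run_job(records: Iterable[dict]) -> Dict[str, Set[str]]:
    store: Dict[str, Set[str]] = {}
    for key, values in shuffle(map_phase(records)).items():
        reducer = FilterReducer(store)  # simplification: one lifecycle per key
        reducer.setup(key)
        reducer.reduce(key, values)
        reducer.cleanup()
    return store


result = run_job([
    {"group_id": "g1", "member": "a"},
    {"group_id": "g2", "member": "b"},
    {"group_id": "g1", "member": "c"},
])
print(sorted(result["g1"]))  # ['a', 'c']
```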

Accessing the reduce key

2014-03-20 Thread Surendranauth Hiraman
1. Accessing the reduce key
In reduceByKey(), how do we get access to the specific key within the reduce function?
2. Equivalent of setup/cleanup
Where should we instantiate and persist each Bloom Filter by key? In the driver and then pass in the references to the reduce function? But if so, how does the
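As background to question 1: Spark's reduceByKey takes a function that combines two values, (V, V) => V, so the key is never passed to it, whereas groupByKey yields each key together with all of its values, which a following map() can inspect. The plain-Python functions below are hypothetical stand-ins that only mimic those two signatures; they are not Spark's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    # Mimics reduceByKey: func only ever sees two values, never the key.
    acc: Dict[K, V] = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    # Mimics groupByKey: each key arrives together with all of its values.
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


pairs = [("g1", 1), ("g2", 5), ("g1", 2)]
print(reduce_by_key(pairs, lambda a, b: a + b))  # {'g1': 3, 'g2': 5}

# With the grouped form, per-key logic can use the key directly.
labels = {key: f"{key}:{len(vals)}" for key, vals in group_by_key(pairs).items()}
print(labels)  # {'g1': 'g1:2', 'g2': 'g2:1'}
```

The trade-off in Spark is that groupByKey sends every value for a key through the shuffle, whereas reduceByKey can combine values on the map side first.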