Hi,
I have two RDDs with CSV data as below:
RDD-1
101970_5854301840,fbcf5485-e696-4100-9468-a17ec7c5bb43,19229261643
101970_5854301839,fbaf5485-e696-4100-9468-a17ec7c5bb39,9229261645
101970_5854301839,fbbf5485-e696-4100-9468-a17ec7c5bb39,9229261647
101970_17038953,546853f9-cf07-4700-b202-00f21
> do a join (or a variant like cogroup,
> leftOuterJoin, subtractByKey etc. found in PairRDDFunctions)
>
> the downside is this requires a shuffle of both your RDDs
>
> On Thu, Feb 19, 2015 at 3:36 PM, Himanish Kushary
> wrote:
>
>> Hi,
>>
>> I have two RDDs
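The join suggested above could be illustrated like this. This is a plain-Python sketch of the semantics, not actual Spark code; in Spark it would simply be `rdd1.join(rdd2)` on two pair RDDs, at the cost of shuffling both RDDs by key. The sample keys and values below are hypothetical.

```python
from collections import defaultdict

def inner_join(left, right):
    """Inner join two lists of (key, value) pairs, like PairRDD join."""
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    # Emit one (key, (left_value, right_value)) tuple per matching pair of values
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

left = [("101970_5854301840", "19229261643"),
        ("101970_17038953", "9229261645")]
right = [("101970_5854301840", "fbcf5485-e696-4100-9468-a17ec7c5bb43")]
print(inner_join(left, right))
# [('101970_5854301840', ('19229261643', 'fbcf5485-e696-4100-9468-a17ec7c5bb43'))]
```

Keys missing from one side simply drop out, which is why a variant like leftOuterJoin or subtractByKey is the right choice when you need to keep the unmatched rows.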
d the settings for the parameters spark.akka.frameSize (= 500),
spark.akka.timeout, spark.akka.askTimeout and
spark.core.connection.ack.wait.timeout
to get rid of any insufficient frame size and timeout errors.
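For reference, those settings could be applied like this in PySpark. This is a hedged sketch: only the frameSize value (500) comes from the message above, the timeout values here are placeholders, and the spark.akka.* keys applied to Spark 1.x (later versions removed Akka).

```python
# Requires a Spark installation; pyspark is not in the standard library.
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.akka.frameSize", "500")                    # MB; value from the thread
        .set("spark.akka.timeout", "300")                      # placeholder, seconds
        .set("spark.akka.askTimeout", "300")                   # placeholder, seconds
        .set("spark.core.connection.ack.wait.timeout", "300"))  # placeholder, seconds
```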
Thanks
Himanish
On Thu, Feb 26, 2015 at 5:00 PM, Himanish Kushary
wrote:
> Hi,
We are running our Spark jobs on Amazon AWS and are using AWS Data Pipeline
for orchestration of the different Spark jobs. AWS Data Pipeline provides
automatic EMR cluster provisioning, retry on failure, SNS notification etc.
out of the box and works well for us.
On Sun, Mar 1, 2015 at 7:02 PM, F
Hi,
I have a RDD of pairs of strings like below :
(A,B)
(B,C)
(C,D)
(A,D)
(E,F)
(B,F)
I need to transform/filter this into an RDD of pairs that does not repeat a
string once it has been used. So something like:
(A,B)
(C,D)
(E,F)
(B,C) is out because B has already been used in (A,B); likewise (A,D), because A has already been used.
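A sequential sketch of that filter in plain Python (not a distributed solution; as the replies below point out, the result depends on the order in which pairs are seen):

```python
def disjoint_pairs(pairs):
    """Greedily keep a pair only if neither element has appeared before."""
    used = set()
    kept = []
    for a, b in pairs:
        if a not in used and b not in used:
            kept.append((a, b))
            used.update((a, b))
    return kept

pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("E", "F"), ("B", "F")]
print(disjoint_pairs(pairs))  # [('A', 'B'), ('C', 'D'), ('E', 'F')]
```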
PM, Nathan Kronenfeld <
nkronenfeld@uncharted.software> wrote:
> What would it do with the following dataset?
>
> (A, B)
> (A, C)
> (B, D)
>
>
> On Wed, Mar 25, 2015 at 1:02 PM, Himanish Kushary
> wrote:
>
>> Hi,
>>
>> I have a RDD of pair
scalable solution.
Thanks
On Wed, Mar 25, 2015 at 3:13 PM, Nathan Kronenfeld <
nkronenfeld@uncharted.software> wrote:
> You're generating all possible pairs?
>
> In that case, why not just generate the sequential pairs you want from the
> start?
>
> On Wed, Mar 25, 2015 a
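One way to read that suggestion, as a plain-Python sketch with a hypothetical items list: if the pairs are being generated from the elements of a single list, the disjoint pairs can be produced directly by stepping through the list two at a time, with no filtering step at all.

```python
items = ["A", "B", "C", "D", "E", "F"]

# Pair up even-indexed elements with their odd-indexed neighbours.
pairs = list(zip(items[::2], items[1::2]))
print(pairs)  # [('A', 'B'), ('C', 'D'), ('E', 'F')]
```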