Hey guys,

Some of you may care :-) but this is just to give you some background on where I am going with this. I have an iOS medical side effects app called MedicalSideFx. I built the entire underlying data aggregation layer using Hadoop, and currently the search is based on Lucene. I am re-architecting the data layer by replacing Hadoop with Spark and integrating FDA data, Canadian sidefx data and vaccines sidefx data.

@Kapil, sorry, but flatMapValues is being reported as undefined.

To give you a complete picture, here is the code (it's inside IntelliJ, but that's only for testing; the real code runs in the Spark shell on my cluster):
https://github.com/sanjaysubramanian/msfx_scala/blob/master/src/main/scala/org/medicalsidefx/common/utils/AersReacColumnExtractor.scala
If you assume the dataset is

025003,Delirium,8.10,Hypokinesia,8.10,Hypotonia,8.10,,,,
025005,Arthritis,8.10,Injection site oedema,8.10,Injection site reaction,8.10,,,,

then in the present version of the code the flatMap works, but it only gives me the values:

Delirium
Hypokinesia
Hypotonia
Arthritis
Injection site oedema
Injection site reaction

What I need is:

025003,Delirium
025003,Hypokinesia
025003,Hypotonia
025005,Arthritis
025005,Injection site oedema
025005,Injection site reaction

thanks
sanjay

From: Kapil Malik <kma...@adobe.com>
To: Sean Owen <so...@cloudera.com>; Sanjay Subramanian <sanjaysubraman...@yahoo.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Sent: Wednesday, December 31, 2014 9:35 AM
Subject: RE: FlatMapValues

Hi Sanjay,

Oh yes .. on flatMapValues, it's defined in PairRDDFunctions, and you need to import org.apache.spark.SparkContext._ to use it (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions )

@Sean, yes indeed flatMap / flatMapValues both can be used.

Regards,
Kapil

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: 31 December 2014 21:16
To: Sanjay Subramanian
Cc: user@spark.apache.org
Subject: Re: FlatMapValues

From the clarification below, the problem is that you are calling flatMapValues, which is only available on an RDD of key-value tuples. Your map function returns a tuple in one case but a String in the other, so your RDD is a bunch of Any, which is not at all what you want. You need to return a tuple in both cases, which is what Kapil pointed out.

However it's still not quite what you want. Your input is basically [key value1 value2 value3] so you want to flatMap that to (key,value1) (key,value2) (key,value3). flatMapValues does not come into play.
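For illustration, the flatMap approach described above could be sketched roughly as follows. This is only a sketch under a few assumptions: it uses plain Scala collections so that it runs without a cluster, the names FlatMapKeyValueSketch and explode are made up for this example, and the reaction names are assumed to sit in columns 1, 3, 5, 7 and 9 as in the code quoted below. It also passes a limit of -1 to split, because a plain split(',') drops trailing empty fields and the sample rows would then have fewer than 11 fields.

```scala
object FlatMapKeyValueSketch {

  // Hypothetical helper: turns one CSV line into zero or more "id,reaction" strings.
  // Header rows and rows with fewer than 11 fields produce an empty result,
  // so every branch returns the same type (no mix of tuple and String).
  def explode(line: String): Seq[String] = {
    // limit = -1 keeps trailing empty fields such as the ",,,," at the end of each row
    val fields = line.split(",", -1)
    if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
      Seq(1, 3, 5, 7, 9)              // assumed positions of the reaction names
        .map(i => fields(i))
        .filter(_.nonEmpty)           // skip unused reaction slots
        .map(name => s"${fields(0)},$name")
    } else {
      Seq.empty
    }
  }

  def main(args: Array[String]): Unit = {
    // Sample rows taken from the message above
    val lines = Seq(
      "025003,Delirium,8.10,Hypokinesia,8.10,Hypotonia,8.10,,,,",
      "025005,Arthritis,8.10,Injection site oedema,8.10,Injection site reaction,8.10,,,,"
    )
    // One input line becomes several output lines
    lines.flatMap(explode).foreach(println)
  }
}
```

On an RDD[String] the same function should be usable as reacRdd.flatMap(FlatMapKeyValueSketch.explode) followed by saveAsTextFile, since RDD.flatMap also accepts a function that returns a collection per input element. No pair RDD is involved, so flatMapValues and the SparkContext._ import are not needed for this variant.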
On Wed, Dec 31, 2014 at 3:25 PM, Sanjay Subramanian <sanjaysubraman...@yahoo.com> wrote:
> My understanding is as follows
>
> STEP 1 (This would create a pair RDD)
> =======
>
> reacRdd.map(line => line.split(',')).map(fields => {
>   if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
>     (fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))
>   }
>   else {
>     ""
>   }
> })
>
> STEP 2
> =======
> Since previous step created a pair RDD, I thought flatMapValues method
> will be applicable.
> But the code does not even compile saying that flatMapValues is not
> applicable to RDD :-(
>
> reacRdd.map(line => line.split(',')).map(fields => {
>   if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
>     (fields(0),(fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))
>   }
>   else {
>     ""
>   }
> }).flatMapValues(skus =>
>   skus.split('\t')).saveAsTextFile("/data/vaers/msfx/reac/" + outFile)
>
> SUMMARY
> =======
> when a dataset looks like the following
>
> 1,red,blue,green
> 2,yellow,violet,pink
>
> I want to output the following and I am asking how do I do that ?
> Perhaps my code is 100% wrong. Please correct me and educate me :-)
>
> 1,red
> 1,blue
> 1,green
> 2,yellow
> 2,violet
> 2,pink

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org