On Wed, Dec 31, 2014 at 3:46 PM, Sean Owen so...@cloudera.com wrote:
From the clarification below, the problem is that you are calling
flatMapValues, which is only available on an RDD of key-value tuples.
Your map function returns a tuple in one case but a String in the
other, so your RDD is a bunch of Serializable objects.
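[Editorial sketch, plain Scala, not from the thread: why mixed return types lose the pair type. The sample strings are made up.]

```scala
// Two branches with different result types are widened to their
// common supertype when the element type is inferred.
val mixed = Seq("a,b", "nope").map { line =>
  val fields = line.split(',')
  if (fields.length >= 2) (fields(0), fields(1)) // Tuple2[String, String]
  else ""                                        // String
}
// mixed is inferred as Seq[java.io.Serializable] (the common supertype
// of Tuple2 and String), so pair-only operations such as flatMapValues
// are simply not defined on it. The same widening happens on an RDD.
```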
cool let me adapt that. thanks a ton
regards
sanjay
From: Sean Owen so...@cloudera.com
To: Sanjay Subramanian sanjaysubraman...@yahoo.com
Cc: user@spark.apache.org user@spark.apache.org
Sent: Monday, January 5, 2015 3:19 AM
Subject: Re: FlatMapValues
For the record, the solution I
Subject: Re: FlatMapValues
thanks let me try that out
From: Hitesh Khamesra hiteshk...@gmail.com
To: Sanjay Subramanian sanjaysubraman...@yahoo.com
Cc: Kapil Malik kma...@adobe.com; Sean Owen so...@cloudera.com;
user@spark.apache.org user@spark.apache.org
Sent: Thursday, January
Canadian sidefx data and vaccines sidefx data.
@Kapil, sorry but flatMapValues is being reported as undefined.
To give u a complete picture of the code (it's inside IntelliJ but that's
only for testing; the real code runs on spark-shell on my cluster)
https://github.com/sanjaysubramanian
: FlatMapValues
How about this: apply flatMap per line, and in that function, parse each
line and return all the columns as per your need.
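[Editorial sketch of this suggestion, assuming the same CSV layout and field indices as the code quoted below in the thread: do the parse, the length/header check, and the pair construction in a single flatMap, emitting zero or one pair per line.]

```scala
// flatMap lets the bad branch emit nothing instead of a String,
// so every element of the result is a pair.
val pairs = reacRdd.flatMap { line =>
  val fields = line.split(',')
  if (fields.length >= 11 && !fields(0).contains("VAERS_ID"))
    Some((fields(0), Seq(1, 3, 5, 7, 9).map(i => fields(i)).mkString("\t")))
  else
    None
}
// pairs: RDD[(String, String)] — no Serializable widening, and no
// empty records that need filtering out afterwards.
```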
On Wed, Dec 31, 2014 at 10:16 AM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.invalid wrote:
hey guys
Some of u may care :-) but this is just to give u
is as follows but the flatMapValues does not work even after I have
created the pair RDD.

reacRdd.map(line => line.split(',')).map(fields => {
  if (fields.length >= 11 && !fields(0).contains("VAERS_ID")) {
    (fields(0), (fields(1)+"\t"+fields(3)+"\t"+fields(5)+"\t"+fields(7)+"\t"+fields(9)))
    // Returns a pair (String, String), good
  }
  else {
    ""
    // Returns a String, bad
  }
}) // RDD[Serializable] – PROBLEM

I was not even able to apply flatMapValues since the filtered RDD passed
to it is RDD[Serializable] and not a pair RDD. I am surprised how your code
compiled; my code DID NOT compile, saying that flatMapValues is not defined.
In fact when I used your snippet, the code still does not compile:
Error:(36, 57) value flatMapValues is not a member of
org.apache.spark.rdd.RDD[(String, String)]
}).filter(pair => pair._1.length() > 0).flatMapValues
Hi Sanjay,
Oh yes .. on flatMapValues, it's defined in PairRDDFunctions, and you need to
import org.apache.spark.SparkContext._ to use them
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
)
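[Editorial sketch of the fix for Spark 1.x: with the implicit conversions from the SparkContext companion object in scope, flatMapValues resolves on an RDD[(String, String)]. The tab-separated value layout matches the code quoted earlier in the thread; the helper name is made up.]

```scala
import org.apache.spark.SparkContext._ // implicit RDD[(K, V)] => PairRDDFunctions (Spark 1.x)
import org.apache.spark.rdd.RDD

// Given pairs whose value is a tab-separated field list, expand each
// pair into one (key, field) record per field. Without the import
// above, flatMapValues would not compile on a plain RDD[(String, String)].
def explode(pairs: RDD[(String, String)]): RDD[(String, String)] =
  pairs.flatMapValues(value => value.split("\t"))
```

Note that in Spark 1.3+ these conversions also come in automatically via `import org.apache.spark.SparkContext._`-free implicits on the RDD object, but the explicit import is the portable fix for the 1.x line discussed here.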
@Sean, yes indeed flatMap / flatMapValues both can
), [from, to, value])
ean and key are String
from and to are DateTime
value is a Double
JavaPairRDD<String, List<Serializable>> eanKeyTsParameters =
javaRDD.mapToPair( ... );
Then I try to do flatMapValues to apply the GenerateTimeSeries Function;
it takes the from, to and values to generate a List<Long, Double>.
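[Editorial sketch in plain Scala of the expansion this message seems to be after; the key format, step size, and tuple shapes are all assumptions, not the poster's GenerateTimeSeries code.]

```scala
// Each (key, (from, to, value)) entry expands into one
// (key, (timestamp, value)) record per step between from and to —
// exactly the shape flatMapValues produces on a pair RDD.
val eanKeyTs = Seq(("ean|key", (0L, 3L, 1.5)))
val points = eanKeyTs.flatMap { case (k, (from, to, v)) =>
  (from until to).map(t => (k, (t, v)))
}
// points: Seq[(String, (Long, Double))]
```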
Here is the error I get when compiling:
error