Re: How to maintain multiple JavaRDD created within another method like javaStreamRDD.forEachRDD

2015-07-14 Thread Jong Wook Kim
Your question is not very clear, but from what I understand, you want to end up
with a stream of MyTable objects holding the records parsed from your Kafka topics.

What you need is a JavaDStream<MyTable>, and you can use transform() to make one.

transform() takes a function that maps an RDD to another RDD, as opposed to
foreachRDD, whose function returns Void, as in your code.
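
For illustration, here is a minimal sketch of that approach. MyTable, Record, and
the comma-separated parsing are assumptions lifted from your snippet, so treat
them as placeholders:

    import java.util.Collections;
    import java.util.Iterator;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public class TableStreams {

        // Turn the stream of raw comma-separated lines into a stream of MyTable objects.
        public static JavaDStream<MyTable> toTables(JavaDStream<String> lines) {
            return lines.transform(new Function<JavaRDD<String>, JavaRDD<MyTable>>() {
                @Override
                public JavaRDD<MyTable> call(JavaRDD<String> rdd) {
                    // parse each partition into one MyTable holding that partition's records
                    return rdd.mapPartitions(new FlatMapFunction<Iterator<String>, MyTable>() {
                        @Override
                        public Iterable<MyTable> call(Iterator<String> partition) {
                            MyTable table = new MyTable();          // placeholder class
                            while (partition.hasNext()) {
                                String[] fields = partition.next().split(",");
                                table.append(new Record(fields));   // hypothetical Record(String[]) constructor
                            }
                            return Collections.singletonList(table);
                        }
                    });
                }
            });
        }
    }

The resulting JavaDStream<MyTable> can then be transformed further or written out
with the usual output operations, instead of holding on to per-batch RDDs inside
foreachRDD.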

PS. I wouldn't name a JavaDStream "javaDStreamRdd": first of all, it is not an
RDD, and the name should say something about what the stream contains.


Jong Wook.


> On Jul 15, 2015, at 00:41, unk1102  wrote:
> 
> I use Spark Streaming, where messages read from Kafka topics are stored in a
> JavaDStream<String>; this stream contains the actual data. After going through
> the documentation and other help, I found that we traverse a JavaDStream using
> foreachRDD:
> 
> javaDStreamRdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
>     public Void call(JavaRDD<String> rdd) {
>         // now I want to call mapPartitions on the above rdd and generate a new JavaRDD<MyTable>
>         JavaRDD<MyTable> rdd_records = rdd.mapPartitions(
>             new FlatMapFunction<Iterator<String>, MyTable>() {
>                 public Iterable<MyTable> call(Iterator<String> stringIterator)
>                         throws Exception {
>                     // create a List and execute the following in a while loop
>                     String[] fields = line.split(",");
>                     Record record = /* create a Record from the above fields */
>                     MyTable table = new MyTable();
>                     return table.append(record);
>                 }
>             });
>         return null;
>     }
> });
> 
> Now my question is: how does the above code work? I want to create a
> JavaRDD<MyTable> for each RDD of the JavaDStream. How do I make sure the above
> code works over all the data, so that the JavaRDD<MyTable> contains all of it
> and no earlier data is lost because the JavaRDD is a local variable?
> 
> It is like calling a lambda function within a lambda function. How do I make
> sure the local JavaRDD variable ends up covering all the RDDs?
> 
> 
> 



How to maintain multiple JavaRDD created within another method like javaStreamRDD.forEachRDD

2015-07-14 Thread unk1102
I use Spark Streaming, where messages read from Kafka topics are stored in a
JavaDStream<String>; this stream contains the actual data. After going through
the documentation and other help, I found that we traverse a JavaDStream using
foreachRDD:

javaDStreamRdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
    public Void call(JavaRDD<String> rdd) {
        // now I want to call mapPartitions on the above rdd and generate a new JavaRDD<MyTable>
        JavaRDD<MyTable> rdd_records = rdd.mapPartitions(
            new FlatMapFunction<Iterator<String>, MyTable>() {
                public Iterable<MyTable> call(Iterator<String> stringIterator)
                        throws Exception {
                    // create a List and execute the following in a while loop
                    String[] fields = line.split(",");
                    Record record = /* create a Record from the above fields */
                    MyTable table = new MyTable();
                    return table.append(record);
                }
            });
        return null;
    }
});
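
For context, the JavaDStream<String> above would typically come from the Kafka
integration along these lines; the broker address, topic name, and the
surrounding JavaStreamingContext setup are placeholders, and this sketch assumes
the Spark 1.x direct-stream API:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;

    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    import scala.Tuple2;

    public class KafkaSource {

        // Build the stream of raw message values from Kafka.
        public static JavaDStream<String> rawLines(JavaStreamingContext jssc) {
            Map<String, String> kafkaParams = new HashMap<String, String>();
            kafkaParams.put("metadata.broker.list", "broker1:9092");             // placeholder broker
            Set<String> topics = new HashSet<String>(Arrays.asList("my-topic")); // placeholder topic

            JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // keep only the message values; this is the JavaDStream<String> traversed above
            return messages.map(new Function<Tuple2<String, String>, String>() {
                @Override
                public String call(Tuple2<String, String> kv) {
                    return kv._2();
                }
            });
        }
    }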

Now my question is: how does the above code work? I want to create a
JavaRDD<MyTable> for each RDD of the JavaDStream. How do I make sure the above
code works over all the data, so that the JavaRDD<MyTable> contains all of it
and no earlier data is lost because the JavaRDD is a local variable?

It is like calling a lambda function within a lambda function. How do I make
sure the local JavaRDD variable ends up covering all the RDDs?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-maintain-multiple-JavaRDD-created-within-another-method-like-javaStreamRDD-forEachRDD-tp23832.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org