Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
Yes. You should be able to. Let's try to have future conversations through the u...@spark.incubator.apache.org mailing list :)
On Wed, Feb 5, 2014 at 2:33 PM, Eduardo Costa Alfaia wrote:
> So I could use reduceByKeyAndWindow like this
> val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWind

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Eduardo Costa Alfaia
So could I use reduceByKeyAndWindow like this?
val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
> The reduceByKeyAndWindow and other ***ByKey operations work only on
> DStreams of key-value pairs. "words" is a DStream[String], so it's not
> key-
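With a 1-second batch interval, reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) sums each word's counts over the last 30 batches and produces a new result every 10 batches (Spark Streaming requires both durations to be multiples of the batch interval). The plain-Java 9+ sketch below does not use Spark; the class and method names are hypothetical, and it only simulates that windowing arithmetic on precomputed per-batch counts. In this sketch, a window near the start of the stream simply covers fewer batches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java simulation (not the Spark API) of a sliding-window
// reduce: every `slide` batches, sum the per-key counts of the last `window` batches.
class SlidingWindowCounts {

    static List<Map<String, Integer>> windowedCounts(
            List<Map<String, Integer>> batches, int window, int slide) {
        List<Map<String, Integer>> results = new ArrayList<>();
        // Evaluate a window after every `slide` batches.
        for (int end = slide; end <= batches.size(); end += slide) {
            // The window covers the last `window` batches, or fewer at the start.
            int start = Math.max(0, end - window);
            Map<String, Integer> sum = new HashMap<>();
            for (Map<String, Integer> batch : batches.subList(start, end)) {
                // Per-key equivalent of the reduce function (_ + _).
                batch.forEach((word, count) -> sum.merge(word, count, Integer::sum));
            }
            results.add(sum);
        }
        return results;
    }

    public static void main(String[] args) {
        // 40 one-second batches, each containing the word "spark" once.
        List<Map<String, Integer>> batches = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            batches.add(Map.of("spark", 1));
        }
        // Window of 30 batches (30 s), sliding every 10 batches (10 s).
        for (Map<String, Integer> w : windowedCounts(batches, 30, 10)) {
            System.out.println(w);
        }
    }
}
```

Fed 40 batches that each contain "spark" once, the sketch yields windows with counts 10, 20, 30 and 30: the first two windows are still filling up, the last two each cover a full 30 batches.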

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
The reduceByKeyAndWindow and other ***ByKey operations work only on DStreams of key-value pairs. "words" is a DStream[String], so its elements are not key-value pairs. "words.map(x => (x, 1))" is a DStream[(String, Int)], which does contain key-value pairs, so you can call reduceByKeyAndWindow on it.
TD
On Wed, Feb 5, 20
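The same point can be shown outside Spark: a list of plain words has no key to group on, while a list of (word, 1) pairs does. The sketch below is plain Java 9+, not the Spark API; PairsThenReduce, toPairs and reduceByKey are hypothetical names chosen to mirror words.map(x => (x, 1)) followed by a per-key reduce with _ + _.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical plain-Java illustration (not Spark) of why *ByKey needs pairs.
class PairsThenReduce {

    // Equivalent of words.map(x => (x, 1)): give every word a key and a value.
    static List<Map.Entry<String, Integer>> toPairs(List<String> words) {
        return words.stream()
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // A *ByKey-style reduce: groups on the key and combines values with f.
    // It cannot be applied to a List<String>, because there is no key to group on.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> f) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            out.merge(p.getKey(), p.getValue(), f);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        Map<String, Integer> counts = reduceByKey(toPairs(words), Integer::sum);
        System.out.println(counts); // {to=2, be=2, or=1, not=1}
    }
}
```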

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Eduardo Costa Alfaia
Hi Tathagata, I am playing with NetworkWordCount.scala. I made some changes like this (in red):
// Create the context with a 1 second batch size
val ssc = new StreamingContext(args(0), "NetworkWordCount", Seconds(1),
  System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.ge

Re: Source code JavaNetworkWordcount

2014-02-05 Thread Tathagata Das
Seems good to me. BTW, it's fine to use MEMORY_ONLY (i.e. without replication) for testing, but you should turn on replication if you want fault tolerance.
TD
On Mon, Feb 3, 2014 at 3:19 PM, Eduardo Costa Alfaia wrote:
> Hi Tathagata,
>
> You were right when you have said for me to use scala agains

Re: Source code JavaNetworkWordcount

2014-02-03 Thread Eduardo Costa Alfaia
Hi Tathagata, you were right when you told me to use Scala instead of Java; Scala is very easy. I implemented the code you gave me (in bold), and I also implemented a union function (in red) because I am testing with two stream sources; my idea is to put three or more stream sources
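The union code itself is elided from this snippet. Assuming it combines the input streams with Spark Streaming's union (DStream.union for two streams, or StreamingContext.union for a sequence of them), each batch interval of the result holds every element received from every source in that interval. The plain-Java sketch below is not Spark code; UnionOfSources and countBatch are hypothetical names, and it simulates a single batch by concatenating the lines from each source before counting words, so adding a third or fourth source is just one more list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-batch simulation (not Spark) of counting words over a union of sources.
class UnionOfSources {

    static Map<String, Integer> countBatch(List<List<String>> linesPerSource) {
        // "Union": concatenate this batch's lines from every source.
        List<String> unioned = new ArrayList<>();
        for (List<String> source : linesPerSource) {
            unioned.addAll(source);
        }
        // Word count over the combined lines, splitting on spaces as NetworkWordCount does.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : unioned) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three hypothetical sources, e.g. three socket streams on different ports.
        List<List<String>> batch = List.of(
                List.of("hello spark"),
                List.of("hello streaming"),
                List.of("spark spark"));
        // Expected counts: hello=2, spark=3, streaming=1 (HashMap print order may vary).
        System.out.println(countBatch(batch));
    }
}
```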

Re: Source code JavaNetworkWordcount

2014-01-30 Thread Tathagata Das
Let me first ask for a few clarifications. 1. If you just want to count the words in a single text file like Don Quixote (that is, not for a stream of data), you should use only Spark. Then the program to count the frequency of words in a text file would look like this in Java. If you are not supe
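The Spark program for the single-file case is elided from this snippet. For comparison only, here is a single-machine plain-Java sketch (not Spark; FileWordCount and countWords are hypothetical names) of the same pipeline: read the lines, split them into words, and sum a 1 per word. In Spark these steps would run distributed across the cluster instead of in one JVM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single-machine word count (no Spark), sorted alphabetically.
class FileWordCount {

    static Map<String, Integer> countWords(Path file) throws IOException {
        // Files.lines streams the file lazily; try-with-resources closes it.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Stream.of(line.split("\\s+"))) // lines -> words
                    .filter(word -> !word.isEmpty())                // drop blanks from leading spaces
                    .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FileWordCount <path-to-text-file>");
            return;
        }
        countWords(Paths.get(args[0]))
                .forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```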