hi Diego,
I have the same problem.
// reduce by key in the first window
val w1 = one.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
w1.count().print()
// reduce by key in the second window, based on the results of the first window
val w2 = w1.reduceByKeyAndWindow(_ + _, Seconds(120))
I have tested the example code FlumeEventCount on a standalone cluster, and this
is still a problem in Spark 1.1.0, the latest version as of now. Have you
solved this issue on your side?
Yes, I agree that one can transform directly on the DStream even when no data
is injected in that batch duration.
My situation is this: Spark receives the Flume stream continuously, and I use
the updateStateByKey function to collect data for a key across several
batches; then I will handle the collected data after wai
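For reference, a minimal sketch of that accumulation pattern with updateStateByKey (pairDStream here is an assumed DStream[(String, Int)], and a checkpoint directory must be set on the StreamingContext):

// Merge this batch's values for a key into the running state for that key.
val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
  (newValues, state) => Some(newValues.sum + state.getOrElse(0))
val stateDStream = pairDStream.updateStateByKey(updateFunc)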
I'm sorry, there were some errors in my code; here is the updated version:

var count = -1L // a global variable in the main object
val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})
countDStream.foreachRDD(rdd => count = rdd.count())
Thanks, all.
Yes, I did use foreachRDD; the following is my code:

var count = -1L // a global variable in the main object
val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})
countDStream.foreachRDD(rdd => count = rdd.count())
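As an aside, a minimal sketch of counting each batch without the shared variable (the closure passed to map runs on the executors, so assigning count there is not visible on the driver; some_DStream is the same assumed source as above):

some_DStream.foreachRDD { rdd =>
  // foreachRDD runs on the driver, so a value computed here is usable locally
  val batchCount = rdd.count()
  println(s"records in this batch: $batchCount")
}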
Hi Jerry,
Thanks for your reply.
I use Spark Streaming to receive the Flume stream. Then, in each
batchDuration, I need to make a judgement: if the received stream has data, I
should do one thing; if there is no data, do the other. I thought count()
could give me that measure, but it returns a DStream[Long] rather than a number.
VisualVM is free and is sufficient in most situations.
I want to implement the following logic:

val stream = getFlumeStream() // a DStream
if (size_of_stream > 0)       // if the stream contains any data
  stream.someTransformation

stream.count() can count the records in each RDD of a DStream, but it returns
a DStream[Long] and can't be compared with a number.
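A minimal sketch of one way to express that branching (the per-batch actions are assumptions): the check has to happen per RDD inside foreachRDD, since a DStream itself has no size:

stream.foreachRDD { rdd =>
  if (rdd.take(1).nonEmpty) {
    // this batch contains data: apply the transformation / action here
  } else {
    // empty batch: do the other thing
  }
}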
When a MappedRDD is handled by the groupByKey transformation, tuples with the
same key that are distributed across different worker nodes will be collected
onto one worker node, i.e.,
(K, V1), (K, V2), ..., (K, Vn) -> (K, Seq(V1, V2, ..., Vn)).
I want to know whether the value Seq(V1, V2, ..., Vn) of a tuple i
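For illustration, a minimal sketch of that collection step (the sample data and sc, a SparkContext, are assumptions; in the RDD API the grouped value is an Iterable):

val pairs = sc.parallelize(Seq(("K", "V1"), ("K", "V2"), ("K", "V3")))
val grouped = pairs.groupByKey() // RDD[(String, Iterable[String])]
grouped.collect().foreach { case (k, vs) =>
  println(s"$k -> ${vs.mkString("Seq(", ", ", ")")}")
}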