Suppose a web server access log is to be analyzed, and the target of the computation is a set of CSV files per time period, e.g. one file per day containing the per-minute statistics and one file per month containing the per-hour statistics. The incoming statistics are computed as discretized streams using a Spark StreamingContext.
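
For concreteness, this is roughly the kind of setup I mean (the app name, batch interval and input directory are only placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AccessLogStats {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("access-log-stats")
    // one batch per minute, matching the finest statistics granularity
    val ssc = new StreamingContext(conf, Seconds(60))

    // one access-log line per record, picked up from the directory the
    // web server rotates its logs into (placeholder path)
    val lines = ssc.textFileStream("hdfs:///logs/incoming")

    // ... parse the lines, aggregate per minute/hour, write the CSVs ...

    ssc.start()
    ssc.awaitTermination()
  }
}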

Basically I have to create the CSV files, combine them with the discretized
stream, and then replace the old CSV with the combined one. To realize such a
computation, some kind of timestamp-based partitioning is required that
assigns the contents of the discretized stream to time slots. But there does
not seem to be any built-in support for this kind of processing; the sketch
below shows what I am attempting.
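
To illustrate: key each record by its minute slot, then in foreachRDD merge the batch counts into the existing per-day CSV and overwrite it. The Hit record type, the paths and the naive driver-side merge are only placeholders, and I doubt that overwriting files like this is the right approach:

import org.apache.spark.streaming.dstream.DStream
import java.io.{File, PrintWriter}
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// hypothetical record type parsed from an access-log line
case class Hit(timestampMillis: Long)

val minuteFmt =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm").withZone(ZoneOffset.UTC)

def writeDailyCsv(hits: DStream[Hit]): Unit = {
  // assign each record to its minute slot and count hits per slot
  val perMinute = hits
    .map(h => (minuteFmt.format(Instant.ofEpochMilli(h.timestampMillis)), 1L))
    .reduceByKey(_ + _)

  perMinute.foreachRDD { rdd =>
    // group the batch by day, merge with the existing per-day CSV on the
    // driver, then overwrite the file (placeholder path, not atomic)
    rdd.map { case (minute, count) => (minute.take(10), (minute, count)) }
       .groupByKey()
       .collect()
       .foreach { case (day, slots) =>
         val file = new File(s"/stats/daily/$day.csv")
         val old =
           if (file.exists) scala.io.Source.fromFile(file).getLines().toSeq
           else Seq.empty[String]
         // naive merge: append the new slot counts to the old lines
         val merged = old ++ slots.map { case (m, c) => s"$m,$c" }
         val out = new PrintWriter(file)
         try merged.foreach(out.println) finally out.close()
       }
  }
}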

Can you give me a hint on how to solve this computation? I am missing examples
that explain how to compute on top of existing time-based data. How do I
replace existing files? How should this be designed so that larger data sets
can be recomputed?

regards,
markus



