Not sure if it helps, but there is a ZeroMQ example among the Spark Streaming examples:
https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/ZeroMQWordCount.scala
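
In case it is useful, here is a rough sketch of what a ZeroMQ word count looks like. It is not a copy of the linked file. It assumes a Spark release that ships the spark-streaming-zeromq module with `ZeroMQUtils.createStream` (other releases expose the ZeroMQ receiver differently, so check the linked example for your version). The endpoint `tcp://127.0.0.1:1234`, the topic `foo`, and the object name are placeholders I made up.

```scala
import akka.util.ByteString
import akka.zeromq.Subscribe
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair DStream operations on older releases
import org.apache.spark.streaming.zeromq.ZeroMQUtils

// Hypothetical object name; the structure follows the usual streaming word count.
object ZeroMQWordCountSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint and topic (assumptions): point these at your own publisher.
    val url = "tcp://127.0.0.1:1234"
    val topic = "foo"

    // local[2]: one thread for the ZeroMQ receiver, one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ZeroMQWordCountSketch")
    val ssc = new StreamingContext(conf, Seconds(2)) // 2-second batches

    // Decode every frame of a received message as a UTF-8 string
    // (this includes a topic frame, if the publisher sends one).
    def bytesToStringIterator(frames: Seq[ByteString]): Iterator[String] =
      frames.map(_.utf8String).iterator

    // Subscribe to the topic on the publisher's endpoint.
    val lines = ZeroMQUtils.createStream(ssc, url, Subscribe(topic), bytesToStringIterator _)

    // Count words within each batch and print a sample of the counts.
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

A ZeroMQ publisher has to be running on that endpoint for anything to show up. Nothing here depends on Hadoop or HDFS.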

On Mon, Dec 30, 2013 at 9:41 PM, Ognen Duzlevski <[email protected]> wrote:

> Can anyone provide any code examples of connecting Spark to zeromq data
> producers for purposes of simple real-time analytics? Even the most basic
> example would be nice :)
>
> Thanks!
>
> On Mon, Dec 23, 2013 at 2:42 PM, Ognen Duzlevski <[email protected]> wrote:
>
>> Hello, I am new to Spark and have installed it, played with it a bit,
>> mostly I am reading through the "Fast data processing with Spark" book.
>>
>> One of the first things I realized is that I have to learn Scala, the
>> real-time data analytics part is not supported by the Python API, correct?
>> I don't mind, Scala seems to be a lovely language! :)
>>
>> Anyways, I would like to set up a data analysis pipeline where I have
>> already done the job of exposing a port on the internet (amazon elastic
>> load balancer) that feeds real-time data from tens-hundreds of thousands of
>> clients in real-time into a set of internal instances which are essentially
>> zeroMQ sockets (I do this via mongrel2 and associated handlers).
>>
>> These handlers can themselves create 0mq sockets to feed data into a
>> "pipeline" via a 0mq push/pull, pub/sub or whatever mechanism works best.
>>
>> One of the pipelines I am evaluating is Spark.
>>
>> There seems to be information on Spark but for some reason I find it to
>> be very Hadoop specific. HDFS is mentioned a lot, for example. What if I
>> don't use Hadoop/HDFS?
>>
>> What do people do when they want to inhale real-time information? Let's
>> say I want to use 0mq. Does Spark allow for that? How would I go about
>> doing this?
>>
>> What about "dumping" all the data into a persistent store? Can I dump
>> into DynamoDB or Mongo or...? How about Amazon S3? I suppose my 0mq
>> handlers can do that upon receipt of data before it "sees" the pipeline but
>> sometimes storing intermediate results helps too...
>>
>> Thanks!
>> OD
