Yes, this helps! I see it uses Akka. Thanks!
On Tue, Dec 31, 2013 at 12:34 AM, Aaron Davidson <[email protected]> wrote:

> Not sure if it helps, but there is a ZeroMQ Spark Streaming example:
> https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/ZeroMQWordCount.scala
>
> On Mon, Dec 30, 2013 at 9:41 PM, Ognen Duzlevski <[email protected]> wrote:
>
>> Can anyone provide any code examples of connecting Spark to ZeroMQ data
>> producers for purposes of simple real-time analytics? Even the most
>> basic example would be nice :)
>>
>> Thanks!
>>
>> On Mon, Dec 23, 2013 at 2:42 PM, Ognen Duzlevski <[email protected]> wrote:
>>
>>> Hello, I am new to Spark. I have installed it and played with it a
>>> bit; mostly I am reading through the "Fast Data Processing with Spark"
>>> book.
>>>
>>> One of the first things I realized is that I have to learn Scala; the
>>> real-time data analytics part is not supported by the Python API,
>>> correct? I don't mind, Scala seems to be a lovely language! :)
>>>
>>> Anyway, I would like to set up a data analysis pipeline. I have
>>> already done the job of exposing a port on the internet (via an Amazon
>>> Elastic Load Balancer) that feeds real-time data from tens to hundreds
>>> of thousands of clients into a set of internal instances, which are
>>> essentially ZeroMQ sockets (I do this via mongrel2 and its associated
>>> handlers).
>>>
>>> These handlers can themselves create 0mq sockets to feed data into a
>>> "pipeline" via a 0mq push/pull, pub/sub, or whatever mechanism works
>>> best.
>>>
>>> One of the pipelines I am evaluating is Spark.
>>>
>>> There is plenty of information on Spark, but for some reason I find it
>>> to be very Hadoop-specific. HDFS is mentioned a lot, for example. What
>>> if I don't use Hadoop/HDFS?
>>>
>>> What do people do when they want to ingest real-time information?
>>> Let's say I want to use 0mq. Does Spark allow for that? How would I go
>>> about doing it?
>>>
>>> What about "dumping" all the data into a persistent store? Can I dump
>>> into DynamoDB or Mongo or...? How about Amazon S3? I suppose my 0mq
>>> handlers can do that upon receipt of the data, before the pipeline
>>> "sees" it, but sometimes storing intermediate results helps too...
>>>
>>> Thanks!
>>> OD
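For anyone finding this thread later: below is a minimal sketch of the kind of receiver the linked ZeroMQWordCount example sets up. The publisher URL, topic, and app name are placeholders, and the helper names (ZeroMQUtils.createStream, akka.zeromq.Subscribe) follow the Spark 0.9-era streaming API, so check them against your Spark version:

    import akka.util.ByteString
    import akka.zeromq.Subscribe
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair DStream ops
    import org.apache.spark.streaming.zeromq.ZeroMQUtils

    object ZeroMQSketch {
      def main(args: Array[String]) {
        // Local two-thread context with 2-second batches.
        val ssc = new StreamingContext("local[2]", "ZeroMQSketch", Seconds(2))

        // Each ZeroMQ message arrives as a Seq[ByteString] (multipart
        // frames); decode every frame as a UTF-8 string.
        def bytesToStrings(frames: Seq[ByteString]): Iterator[String] =
          frames.map(_.utf8String).iterator

        // Subscribe to a publisher (placeholder URL and topic);
        // this yields a DStream[String].
        val lines = ZeroMQUtils.createStream(
          ssc, "tcp://127.0.0.1:1234", Subscribe("events"), bytesToStrings _)

        // Simple real-time analytics: word counts per 2-second batch.
        lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }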

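On the persistence question in the original message: Spark writes through the Hadoop filesystem layer even when no HDFS cluster is involved, so a DStream can be dumped straight to S3 with an s3n:// path (given AWS credentials in the Hadoop configuration), while stores like DynamoDB or Mongo are usually reached via foreachRDD with a client opened per partition. A hedged sketch, reusing the `lines` DStream from above; the bucket, path, and writer are hypothetical:

    // Write every batch as text files under an S3 prefix (placeholder
    // bucket; set fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey first).
    lines.saveAsTextFiles("s3n://my-bucket/spark-intermediate/events", "txt")

    // For stores without a Hadoop output format (DynamoDB, Mongo, ...):
    lines.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Hypothetical writer: open one client connection per partition,
        // write the partition's records, then close the connection.
        records.foreach(record => println(record)) // stand-in for writer.put(record)
      }
    }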