I'm evaluating whether Spark would be a good fit for my current streaming data
processing pipeline, and I'm a bit confused about the difference between Spark
and Spark Streaming.

Spark seems to have a mature Python API that I plan to try out, but Spark
Streaming appears NOT to have a Python API. What is the key difference here?
Does this mean that HDFS is the only possible data source when using Python?
Or is it possible to pull data from ZeroMQ and process it in Python? Going
even further, can you process data from other interesting data stores (for
example, a Solr index)?

Thanks in advance for any responses. I'm just trying to get a grasp on the
data source possibilities and how they affect language/technology choices.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Is-HDFS-the-only-possible-data-source-for-spark-with-python-tp1257.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
