I'm evaluating whether Spark would be a good fit for my current streaming data processing pipeline, and I'm a bit confused about the difference between Spark and Spark Streaming.
Spark seems to have a mature Python API that I plan on trying out, but Spark Streaming does NOT appear to have a Python API. What is the key differentiator here? Does this mean that the only possible data source when using Python is HDFS? Or is it possible to pull data from ZeroMQ and process it in Python? Going even further, can you process data from other interesting data stores (for example, a Solr index)?

Thanks in advance for any response. I'm just trying to get a grasp on the data source possibilities and how they affect language/technology choices.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-HDFS-the-only-possible-data-source-for-spark-with-python-tp1257.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
