Hello,
I have some questions about Spark and Apache Arrow. So far, Arrow is only
used in Spark for exchanging data between Python and the Spark executors,
instead of serializing it over sockets. I am currently looking at Dremio as
an interesting way to access multiple data sources, and as a potential
replacement for ETL tools, including Spark SQL.
It seems, if the promises actually hold, that Arrow and Dremio could be
game-changers for these two purposes (data source abstraction, ETL tasks),
leaving Spark with the two remaining goals, i.e. ML/DL and graph
processing. That could be a danger for Spark in the medium term, given the
rise of multiple frameworks in those areas.
My questions are then:
- Is there a way to use Arrow more broadly in Spark itself, and not only
for sharing data?
- What are the strengths and weaknesses of Spark with respect to Arrow, and
consequently Dremio?
- What, finally, is the difference between Databricks DBIO and Dremio/Arrow?
- How do you see the future of Spark under these assumptions?
Regards,



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
