Hi Xavier, Dremio is looking really interesting and has a nice UI. I think the idea of replacing SSIS or similar tools with Dremio is not bad, but what about complex scenarios with a lot of code and transformations? Is it possible to use Dremio via an API and define your own transformations and transformation workflows in Java or Scala? I am not sure whether that is supported at all. I think the Dremio team is looking to give users access to the Sabot API so that Dremio can be used the same way you use Spark, but I am not sure it is possible now. Have you also tried comparing performance with Spark? Are there any benchmarks?
Best, Michael

On Mon, May 14, 2018 at 6:53 AM, xmehaut <xavier.meh...@gmail.com> wrote:
> Hello,
> I have some questions about Spark and Apache Arrow. Up to now, Arrow is
> only used for sharing data between Python and Spark executors instead of
> transmitting it through sockets. I am currently studying Dremio as an
> interesting way to access multiple sources of data, and as a potential
> replacement for ETL tools, including Spark SQL.
> It seems, if the promises hold, that Arrow and Dremio may be game-changing
> for these two purposes (data source abstraction, ETL tasks), leaving Spark
> with the two remaining goals, i.e. ML/DL and graph processing, which could
> be a danger for Spark in the medium term given the rise of multiple
> frameworks in these areas.
> My questions are then:
> - Is there a means to use Arrow more broadly in Spark itself and not only
> for sharing data?
> - What are the strengths and weaknesses of Spark with respect to Arrow and
> consequently Dremio?
> - What, finally, is the difference between Databricks DBIO and
> Dremio/Arrow?
> - How do you see the future of Spark regarding these assumptions?
> Regards
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>