Spark based Data Warehouse

ashish rawat Sat, 11 Nov 2017 23:22:43 -0800

Hello Everyone,

I was trying to understand if anyone here has tried a data warehouse
solution using S3 and Spark SQL. Out of multiple possible options
(redshift, presto, hive etc), we were planning to go with Spark SQL, for
our aggregates and processing requirements.


If anyone has tried it out, would like to understand the following:

   1. Is Spark SQL and UDF, able to handle all the workloads?
   2. What user interface did you provide for data scientist, data
   engineers and analysts
   3. What are the challenges in running concurrent queries, by many users,
   over Spark SQL? Considering Spark still does not provide spill to disk, in
   many scenarios, are there frequent query failures when executing concurrent
   queries
   4. Are there any open source implementations, which provide something
   similar?


Regards,
Ashish

Spark based Data Warehouse

Reply via email to