Hello everyone,

Has anyone here built a data warehouse solution on S3 and Spark SQL? Of the options we considered (Redshift, Presto, Hive, etc.), we are planning to go with Spark SQL for our aggregation and processing requirements.
If anyone has tried it out, I would like to understand the following:

1. Are Spark SQL and UDFs able to handle all the workloads?
2. What user interface did you provide for data scientists, data engineers, and analysts?
3. What are the challenges in running concurrent queries from many users over Spark SQL? Considering that Spark still does not spill to disk in many scenarios, are there frequent query failures when executing concurrent queries?
4. Are there any open source implementations that provide something similar?

Regards,
Ashish