Re: [SPARK SQL] Difference between 'Hive on spark' and Spark SQL

2018-12-20 Thread Jörn Franke
If you already have a lot of queries, then it makes sense to look at Hive (in a recent version) + Tez + LLAP, with all tables in ORC format, partitioned and sorted on the filter columns. That would be the easiest way and can improve performance significantly. If you want to use Spark, e.g. because you

[SPARK SQL] Difference between 'Hive on spark' and Spark SQL

2018-12-19 Thread luby
Hi all, We are starting to migrate our data to the Hadoop platform, hoping to use 'Big Data' technologies to improve our business. We are new to the area and would like some help from you. Currently all our data is put into Hive, and some complicated SQL query statements are run daily. We