Re: Announce: MR3 0.3, and performance comparison with Hive-LLAP, Presto, Spark, Hive on Tez
The article can be found at:

https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/

--- Sungwoo Park

On Thu, Aug 16, 2018 at 10:53 PM, Sungwoo Park wrote:
> Hello Hive users,
>
> I am pleased to announce the release of MR3 0.3. A new feature of MR3 0.3
> is its support for Hive 3.0.0 on Hadoop 2.7/2.8/2.9. I have also published
> a blog article that uses the TPC-DS benchmark to compare the following six
> systems:
>
> 1) Hive-LLAP included in HDP 2.6.4
> 2) Presto 0.203e
> 3) Spark 2.2.0 included in HDP 2.6.4
> 4) Hive 3.0.0 on Tez
> 5) Hive 3.0.0 on MR3
> 6) Hive 2.3.3 on MR3
>
> You can download MR3 0.3 at:
>
> https://mr3.postech.ac.kr/download/home/
>
> Thank you for your interest!
>
> --- Sungwoo Park
Announce: MR3 0.3, and performance comparison with Hive-LLAP, Presto, Spark, Hive on Tez
Hello Hive users,

I am pleased to announce the release of MR3 0.3. A new feature of MR3 0.3 is its support for Hive 3.0.0 on Hadoop 2.7/2.8/2.9. I have also published a blog article that uses the TPC-DS benchmark to compare the following six systems:

1) Hive-LLAP included in HDP 2.6.4
2) Presto 0.203e
3) Spark 2.2.0 included in HDP 2.6.4
4) Hive 3.0.0 on Tez
5) Hive 3.0.0 on MR3
6) Hive 2.3.3 on MR3

You can download MR3 0.3 at:

https://mr3.postech.ac.kr/download/home/

Thank you for your interest!

--- Sungwoo Park
[Hive Metastore] Add a Configuration Item to Skip the HDFS Data Modification
As stated in HIVE-20398, we have the following use case when upgrading Hive: we want to sync operations between two metastore servers (A and B) through the Thrift API, but both servers are backed by the same HDFS. For operations that modify data in HDFS, such as drop_partitions, drop_table, insert_overwrite, and create_table, we want only Metastore Server A to perform the HDFS changes. Metastore Server B should only update its metadata and skip the corresponding HDFS file operations. We therefore need a switch to control this, e.g. a configuration property hive.metastore.skip.hdfs whose default value is false, preserving the current behavior. When its value is true, the metastore server will only perform the metadata modification and skip the HDFS data modification.
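If the proposal were adopted, Metastore Server B's hive-site.xml might carry the new switch roughly as follows. This is only a sketch of the property proposed in HIVE-20398; it does not exist in any released Hive version:

```xml
<!-- hive-site.xml on Metastore Server B (the metadata-only instance) -->
<property>
  <name>hive.metastore.skip.hdfs</name>
  <!-- Proposed in HIVE-20398; hypothetical, not part of released Hive. -->
  <!-- true: apply only metadata changes, leave HDFS files untouched.   -->
  <!-- false (default): current behavior, metadata + HDFS changes.      -->
  <value>true</value>
</property>
```

Server A would keep the default (false), so only A touches the shared HDFS data.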
Re: External Table Creation is slow/hangs
Hi,

I can't tell for sure where your problem is coming from, but from what you said, I guess that the Hive Metastore is performing some list or scan operation on the files and that operation is taking a very long time. Maybe setting *hive.stats.autogather* to false might help.

Also, beware that some configuration parameters that apply to the Metastore cannot be changed via a SET operation; they require you to change the configuration file of your Metastore service and restart it. Maybe that's why some of the conf changes you tried had no effect.

Also, don't hesitate to provide more details about what type of query you run (e.g. is your table partitioned?) and what configuration tweaks you have tried already.

Hope this helps,
Furcy

On Tue, 14 Aug 2018 at 21:39, Luong, Dickson wrote:
> I have a dataset up on S3 in partitioned folders. I'm trying to create an
> external Hive table pointing to the location of that data. The table schema
> is set up to have the column partitions matching how the folders are set up
> on S3.
>
> I've done this quite a few times successfully, but when the data is large
> the table creation query is either extremely slow or it hangs (we can't
> tell).
>
> I've followed some of the tips in
> https://hortonworks.github.io/hdp-aws/s3-hive/index.html#general-performance-tips
> by configuring some of the parameters involving file permission and file
> size checks to adjust for S3, but still no luck.
>
> We're using EMR 5.12.1, which contains Hive 2.3.2. The table creation query
> does not show up in the Tez UI, but it does show up in the HiveServer UI as
> running; we're not sure if it actually is running or just hung (most likely
> the latter).
>
> Our (very roundabout) solution so far is to copy all the files in that
> master folder to another directory, delete the files, create the external
> table while the directory is empty, and then transfer the files back.
> We need to keep the original directory name as other processes depend on it
> and can't simply just start in a fresh directory, so this whole method is
> obviously not ideal.
>
> Any tips / solutions to this problem we've been tackling would be greatly
> appreciated.
>
> Dickson
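For reference, the suggestion above can be sketched in HiveQL. The table name, columns, and S3 path here are made up for illustration; the idea is to disable stats gathering during DDL and to register partitions explicitly rather than triggering a full recursive listing of a large S3 prefix:

```sql
-- May need to be set in hive-site.xml for the Metastore as well,
-- since not all Metastore-side parameters honor a per-session SET.
SET hive.stats.autogather=false;

-- Hypothetical table, columns, and location, for illustration only.
CREATE EXTERNAL TABLE events (
  user_id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/';

-- Add partitions explicitly instead of running MSCK REPAIR TABLE,
-- which lists every object under the table location and can be very
-- slow on large S3 datasets.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt='2018-08-01') LOCATION 's3://my-bucket/events/dt=2018-08-01/';
```

Adding partitions one by one keeps the DDL from scanning the whole dataset, at the cost of having to register each partition as it arrives.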