Re: Best Client / IDE

2019-04-24 Thread Thai Bui
If you consider CLI clients, then I would say beeline is the best of breed. For a web UI, the best could be the Data Analytics Studio from Hortonworks (now Cloudera). You will need to be a

Re: Hive on Tez vs Impala

2019-04-22 Thread Thai Bui
I'm using Hive 3.1 on Tez/LLAP and I must say the experience was not good, but it was worth it. We built Hive from HDP's hive-release and added the Tez UI back, then combined that with Hue 4.3 (also built from Cloudera Hue). Now that the two companies have merged, I think things are going to get better (I'm

Re: just released: Docker image of a minimal Hive server

2019-02-21 Thread Thai Bui
Great work!! Just curious, is it possible to take it one step further and provide a standalone local Hive that requires no HDFS (local filesystem instead), with an embedded metastore and beeline? I would love to collaborate to make this happen, similar to how spark-shell works. On Thu, Feb 21, 2019 at

Re: Hive PR

2019-01-07 Thread Thai Bui
Most likely if you don't know the maintainer of the specific feature that you are changing, no one will look at your PR. You could take a look at the git change log to see who has implemented or worked on the feature that you are changing. Pick one of them and contact them politely via email for a

Re: Hive tables query failing for simple query with memory error.

2018-10-19 Thread Thai Bui
Your Tez container size is too small relative to your query and data size. Notice the log said *1.0 GB of 1 GB physical memory used*. That is because the default Tez container/task size for your cluster is 1024 MB. You can increase it to a higher number (such as 2048 or 4096) via the setting

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-09-02 Thread Thai Bui
g this? Since I do not have much idea > of this. > > On Thu, 30 Aug 2018 20:08 Thai Bui, wrote: > >> Another option is to implement a custom ParquetInputFormat extending the >> current Hive MR Parquet format and handle schema coercion at the input >> split/record read

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-30 Thread Thai Bui
Another option is to implement a custom ParquetInputFormat extending the current Hive MR Parquet format and handle schema coercion at the input split/record reader level. This would be more involved but is guaranteed to work if you can add auxiliary jars to your Hive cluster. On Wed, Aug 29, 2018

Re: Clustering and Large-scale analysis of Hive Queries

2018-07-26 Thread Thai Bui
I don’t see any project especially tuned for Hive doing what you described. I have encountered this problem recently as the number of users and queries grew exponentially in my company and I’m interested. We are currently collecting Hive internal metrics in order to do certain analysis (don’t

Re:

2018-06-13 Thread Thai Bui
> /dev/nvme1n1p1 5.0G 142M 4.9G 3% /emr > > /dev/nvme1n1p2 115G 2.2G 113G 2% /mnt > > > On Wed, Jun 13, 2018 at 10:28 AM, Thai Bui wrote: > >> That error occurred usually because of disks nearly out of space. In your >> EMR cluster, SSH into one of the n

Re:

2018-06-13 Thread Thai Bui
That error usually occurs because disks are nearly out of space. In your EMR cluster, SSH into one of the nodes and run `df -h` to check disk usage on all of your EBS volumes. HDFS is usually configured to be marked unhealthy when the disks it writes to are >90% utilized. Once that happens, the
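
The disk check described above can also be scripted. The following is only a minimal sketch, not part of the original message: the helper names `disk_utilization` and `over_threshold` are hypothetical, and the 0.90 default simply mirrors the ">90% utilized" figure mentioned in the reply.

```python
import shutil


def disk_utilization(path):
    """Return the fraction of the filesystem holding `path` that is used.

    Note: this is used / total, which can differ slightly from the Use%
    column of `df -h` (df excludes blocks reserved for root).
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_threshold(paths, threshold=0.90):
    """Return the paths whose filesystem utilization exceeds `threshold`.

    The 0.90 default is an assumption taken from the ">90%" figure in the
    message; the value configured on a real cluster may differ.
    """
    return [p for p in paths if disk_utilization(p) > threshold]


if __name__ == "__main__":
    # "." is a placeholder; on a cluster node you would pass the mount
    # points that the storage layer actually writes to.
    for mount in over_threshold(["."]):
        print(f"{mount} is more than 90% utilized")
```

On a node, the list passed to `over_threshold` would be the data mount points reported by `df -h`, for example the `/emr` and `/mnt` mounts shown in the follow-up message.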

Re: Hive External Table with Zero Bytes files

2018-04-28 Thread Thai Bui
Your external table is referencing the .../day=201803250 location, which is empty. Point your table to the uppercase .../DAY=201803250 and you should be able to read the data there. Also, it looks like you want an external partitioned table. You’ll need to create an external table with a partition

Re: Ways to reduce launching time of query in Hive 2.2.1

2018-04-16 Thread Thai Bui
The best approach would be to use daemonized containers such as Hive LLAP + Tez session pool or Spark on Hive. I’m not that familiar with Spark on Hive so I can’t comment on it, but Hive on LLAP has worked really well for me when coupled with a Tez session pool. You’ll have to specify how many Tez

Re: [Announce] Hive-MR3: Hive running on top of MR3

2018-04-04 Thread Thai Bui
It would be interesting to see how this compares to Hive LLAP on Tez. Since the LLAP daemons contain a queue of tasks that is shared among many Tez AMs, it could have similar characteristics to the way MR3 shares containers between the AMs. On Wed, Apr 4, 2018 at 10:06 AM Sungwoo Park