ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

2018-07-25 Thread Dmitry Goldenberg
Hi, I apologize for the wide distribution and if this is not the right mailing list for this. We write Avro files to Parquet and load them to HDFS so they can be accessed via an EXTERNAL Hive table. These records have two timestamp fields which are expressed in the Avro schema as type = long
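One possible reading of the error in the subject line, offered only as an assumption because the message is cut off here: the files store the two fields as plain longs, while the Hive table declares them as TIMESTAMP. A common workaround in that situation is to expose the column as a long and convert it explicitly. The sketch below shows that conversion in Python. It assumes the longs are milliseconds since the Unix epoch in UTC, which the message does not state, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(value: int) -> datetime:
    """Interpret a long as milliseconds since the Unix epoch (an assumption)."""
    seconds, millis = divmod(value, 1000)
    # Build the datetime from whole seconds, then attach the millisecond part.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)


def hive_timestamp_string(value: int) -> str:
    """Format the value as 'yyyy-MM-dd HH:mm:ss.SSS'."""
    ts = epoch_millis_to_utc(value)
    return ts.strftime("%Y-%m-%d %H:%M:%S.") + f"{ts.microsecond // 1000:03d}"


print(hive_timestamp_string(1532476800123))  # → 2018-07-25 00:00:00.123
```

If the longs are actually seconds or microseconds, the divisor in divmod has to change accordingly.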

Re: Using snappy compresscodec in hive

2018-07-25 Thread Zhefu Peng
Hi Gopal, Thanks for your reply! One more question: is the effect of using the pure-Java version the same as that of using SnappyCodec? In other words, is there any difference between the two methods in the compression result? Looking forward to your reply and help.

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-25 Thread Peter Vary
Happy to help! :) Proust (Feng Guizhou) [FDS Payment and Marketing] wrote (on Tue, Jul 24, 2018, 12:17): > Just FYI, I'm able to make a custom UDF to apply the thread-safe code > changes. > > Thanks a lot for your help > > > Guizhou > -- > *From:* Proust

Re: Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-25 Thread 何宝宁
Thank you, Gopal, for pointing out the root cause. After running the command alter table xxx compact 'major' to force a compaction, the total length is right! Is there any way to compact immediately after inserting values? Bob He Thanks On 25 Jul 2018, at 1:45 PM, Gopal Vijayaraghavan wrote: >

Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Zheng Shao
Hi, I am interested in working on a project that takes a large number of Hive queries (as well as their metadata, such as the amount of resources used) and finds common subqueries, expensive query groups, etc. Is there any existing work in this domain? Happy to collaborate as well if there

Re: Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Johannes Alberti
Did you guys already look at Dr Elephant? https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark Not sure if there is anything you might find useful, but I would be interested in hearing about the good and the bad of Dr Elephant w/ Hive.