Re: Hive on TEZ fails starting

2016-01-06 Thread Rajesh Balamohan
Is the job starting and getting stuck in the reducer, as you mentioned in the initial mail, or is the job itself not starting? ~Rajesh.B On Wed, Jan 6, 2016 at 1:47 PM, Mich Talebzadeh wrote: > Hi, > Thanks for your help. I downloaded and installed snappy libraries

RE: Hive on TEZ fails starting

2016-01-06 Thread Mich Talebzadeh
Not starting at all!

RE: Hive on TEZ fails starting

2016-01-06 Thread Mich Talebzadeh
Apologies, it starts OK and fails at the reducer! Map 1 RUNNING 1 0 1 0 3 0 Reducer 2 INITED 1 0 0 1 0 0 p:

Re: Indexes in Hive

2016-01-06 Thread Alan Gates
The issue with this is that HDFS lacks the ability to co-locate blocks. So if you break your columns into one file per column (the more traditional column route) you end up in a situation where 2/3 of the time only one of your columns is being locally read, which results in a significant

Re: Indexes in Hive

2016-01-06 Thread Jörn Franke
I am not sure how much performance one could gain in comparison to ORC or Parquet. They work pretty well once you know how to use them. However, there are still ways to optimize them. For instance, sorting of data is a key factor for these formats to be efficient. Nevertheless, if you have a lot of

RE: Hive on TEZ fails starting

2016-01-06 Thread Mich Talebzadeh
Hi, Thanks for your help. I downloaded and installed snappy libraries as they were missing. Setting Hive execution engine to tez and doing a simple query. Hive is stuck > set hive.execution.engine=tez; 16/01/06 08:20:22 [main]: DEBUG parse.VariableSubstitution: Substitution is

last_modified_time and transient_lastDdlTime - what is transient_lastDdlTime for.

2016-01-06 Thread Ophir Etzion
I want to know, for each of my tables, the last time it was modified. Some of my tables don't have last_modified_time in the table parameters, but all have transient_lastDdlTime. transient_lastDdlTime seems to be the same as last_modified_time in some of the tables I randomly checked. What is the

RE: last_modified_time and transient_lastDdlTime - what is transient_lastDdlTime for.

2016-01-06 Thread Mich Talebzadeh
When a table is created, it is the timestamp of the table's creation. After any DDL is done, it looks like it is the time of the last DDL. 0: jdbc:hive2://rhes564:10010/default> create table test (col1 int, col2 string); No rows affected (0.168 seconds) 0: jdbc:hive2://rhes564:10010/default> show
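For comparing the two parameters discussed above, a small sketch may help. It assumes both values are stored as Unix epoch seconds in string form, which is how they typically appear in the table parameters; the helper names `param_to_utc` and `last_change` and the sample dictionary are hypothetical and not part of Hive.

```python
from datetime import datetime, timezone
from typing import Optional


def param_to_utc(value: str) -> datetime:
    """Convert a table-parameter value (assumed epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def last_change(params: dict) -> Optional[datetime]:
    """Prefer last_modified_time; fall back to transient_lastDdlTime.

    `params` is a hypothetical dict of table parameters, e.g. copied by hand
    from the output of a describe command.
    """
    raw = params.get("last_modified_time") or params.get("transient_lastDdlTime")
    return param_to_utc(raw) if raw is not None else None


# Hypothetical sample values, for illustration only.
sample = {"transient_lastDdlTime": "1452067200"}
print(last_change(sample).isoformat())
```

The fallback order is a design choice for this sketch only; the thread does not establish which parameter is authoritative.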

RE: Indexes in Hive

2016-01-06 Thread Mich Talebzadeh
Thanks guys. A typical columnar database stores data by breaking the rows of a table into individual columns and storing the successive values in an indexed and compressed form in data blocks. The nth row of the table can be reconstituted by taking the nth element from each column heap. So
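The row reconstitution described above can be illustrated with a minimal sketch. It uses plain in-memory Python lists as stand-ins for column heaps and leaves out the indexing and compression the message mentions; the function names `to_columns` and `row_at` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple[Any, ...]], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Break row tuples into one list of successive values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def row_at(columns: Dict[str, List[Any]], n: int) -> Tuple[Any, ...]:
    """Rebuild the nth row by taking the nth element from each column."""
    return tuple(values[n] for values in columns.values())


# Hypothetical sample table with two columns.
rows = [(1, "a"), (2, "b"), (3, "c")]
columns = to_columns(rows, ["col1", "col2"])
print(columns)            # one list per column
print(row_at(columns, 1)) # the second row, reassembled
```

Reading a single column touches only that column's list, while rebuilding a full row has to visit every column, which is the trade-off the thread is weighing.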