Hive compaction didn't launch

2016-07-26 Thread Igor Kuzmenko
I'm using a Hive 1.2.1 transactional table and inserting data into it via the
Hive Streaming API. After some time I expected compaction to start, but it
didn't happen.

Here's part of the log, which shows that the compactor initiator thread
doesn't see any delta files:
2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
default.data_aaa.dt=20160726
2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils
(AcidUtils.java:getAcidState(432)) - in directory
hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726
base = null deltas = 0
2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator
(Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0
threshold: 0.1 will major compact: false

But that directory actually contains 23 entries (22 delta directories plus
the _orc_acid_version file):

hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
Found 23 items
-rw-r--r--   3 storm hdfs  4 2016-07-26 17:20
/apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:26
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:27
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:29
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:30
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:32
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:33
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71867356_71867455
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:34
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71891556_71891655
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:36
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71904856_71904955
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:37
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71907256_71907355
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:39
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71918756_71918855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:40
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71947556_71947655
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:41
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71960656_71960755
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:43
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71963156_71963255
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:44
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71964556_71964655
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:46
/apps/hive/warehouse/data_aaa/dt=20160726/delta_71987156_71987255
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:47
/apps/hive/warehouse/data_aaa/dt=20160726/delta_72015756_72015855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:48
/apps/hive/warehouse/data_aaa/dt=20160726/delta_72021356_72021455
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:50
/apps/hive/warehouse/data_aaa/dt=20160726/delta_72048756_72048855
drwxrwxrwx   - storm hdfs  0 2016-07-26 17:50
/apps/hive/warehouse/data_aaa/dt=20160726/delta_72070856_72070955


Full log here <http://pastebin.com/gHwvgRUV>.

What could go wrong?
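
A few checks that might help narrow this down, as a sketch only: the table and
partition names are taken from the log above, the partition column dt is
assumed to be a string, and an open transaction is just one possible cause,
not a confirmed one.

```sql
-- List the compactions the metastore currently knows about (queued or running).
SHOW COMPACTIONS;

-- List open and aborted transactions. A transaction that was opened by the
-- streaming client and never closed may keep newer deltas out of the
-- compactor's view (assumption, to be verified against this output).
SHOW TRANSACTIONS;

-- Request a major compaction by hand for the affected partition.
-- Assumes dt is a string partition column.
ALTER TABLE data_aaa PARTITION (dt = '20160726') COMPACT 'major';
```

If the manual request appears in SHOW COMPACTIONS but the partition still
shows "deltas = 0" in the log, the transaction list is the next place to look.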


Re: Re-arrange columns

2016-07-26 Thread Furcy Pin
Hi,

I think I had similar issues to yours.

Did you look in the Hive documentation at what the CASCADE keyword does on
ADD or CHANGE COLUMNS statements?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn

From what I understand, the behavior of Hive when adding or changing
columns is different from that of a standard RDBMS:
I think that adding or moving a column in Hive only changes the
metadata and not the underlying data.
To cope with that, the metastore keeps separate schema information for the
table itself and for each of its partitions.
This way, when you change the schema of your table, the old partitions can
still be read according to the old schema,
and new partitions will be created with the new schema (unless you use
CASCADE, which also updates the existing partitions).

There are two dangers with this, however:
- if you create a new column, it will not be accessible in the old
partitions.
- if you regenerate (overwrite) an existing partition, the new data will
correspond to the new schema, but the partition's metadata will not be
updated.
   (I believe this could be considered a bug, though there is a workaround:
   "ALTER TABLE table_name ADD COLUMNS ..." can also be applied to one
   specific partition at a time, as "ALTER TABLE table_name PARTITION
   partition_spec ADD COLUMNS ...")

If you use CASCADE, the change you apply to the table will be immediately
applied to its partitions as well.
But if you don't regenerate your existing partitions, I believe you will
have problems as well, since your partition's schema will not match the
underlying data.


So, I guess it mostly depends on whether (and when) you plan to regenerate
your partitions to add the new column to your existing data.
In any case, you can do one of the following:

A. Drop your table, recreate it with the new schema, and run MSCK REPAIR TABLE.

B. To reduce unavailability: create another table, populate it, and then
swap your tables.

C. Do your column change with CASCADE, and regenerate your partitions
immediately,
but they might not be correctly readable between the time you make your
change and regenerate them.

D. Do your column change without CASCADE: you can still query the old
partitions (without the new column, though),
and after regenerating a partition, change its schema with an "ALTER TABLE
table_name PARTITION partition_spec ADD COLUMNS ..." statement.

E. (I'm not sure this one works) Do your column change without CASCADE,
create a copy of the table, generate the partitions there, and then use
EXCHANGE PARTITION (
https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition) to
move the partitions from the new table to the old.
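
Options C and D could look roughly like this for the table from the quoted
question below. This is a sketch only: the partition value '2016-07-26' is
made up for illustration, the two options are alternatives and are not meant
to be run together, and the regeneration step depends on how the partition
data is produced.

```sql
-- Option C: change the table schema and push the change to all existing
-- partitions at once.
ALTER TABLE tbl ADD COLUMNS (col2 int) CASCADE;

-- Option D: change only the table-level schema; existing partitions keep
-- their old schema and stay readable (without col2).
ALTER TABLE tbl ADD COLUMNS (col2 int);

-- ... regenerate the data of one partition with the new layout here ...

-- Then bring that partition's metadata in line with its regenerated data.
-- The partition value is hypothetical.
ALTER TABLE tbl PARTITION (dt = '2016-07-26') ADD COLUMNS (col2 int);
```

Both variants append col2 at the end of the column list; moving it after col1
is the separate CHANGE ... AFTER step from the question.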

Hope this helps, let me know how it turns out,

Furcy

On Tue, Jul 26, 2016 at 2:34 AM, Binh Nguyen Van  wrote:

> Hi,
>
> I am writing an application that inserts data into Hive, and it depends on
> the order of the columns in the table, so I have to rearrange my columns to
> work around that, but I am having difficulty doing so. Could you please help?
>
> Here is my problem:
>
> I have table 'tbl' which is created by using this statement:
> CREATE EXTERNAL TABLE tbl (col1 int, col3 string) PARTITIONED BY (dt string) 
> STORED AS PARQUET LOCATION '/wh/db/tb'.
> Now I want to add a new column 'col2' of type int to that table, and I want
> to put it after 'col1' and before 'col3', so I use these two statements:
> ALTER TABLE tbl ADD COLUMNS (col2 int);
> ALTER TABLE tbl CHANGE col2 col2 int AFTER col1;
>
> With Hive 1.2.1, these two statements execute successfully, but I get a
> type cast exception when I query the data back.
> With Hive 2.x, the second ALTER statement fails with the exception: 'Unable to
> alter table. The following columns have types incompatible with the existing
> columns in their respective positions'
>
> I could work around this by dropping the old table and then creating a new
> table with the new schema, but this requires me to run MSCK to update the
> metadata for that table, and this process could be very slow when I have a
> lot of data and a lot of partitions, so I am looking for a better approach.
>
> Please help!
> Thanks
> -Binh
>


Re: ORC algorithm skeleton

2016-07-26 Thread praveenesh kumar
Also - http://orc.apache.org/docs/spec-intro.html

On Tue, Jul 26, 2016 at 1:49 PM, praveenesh kumar 
wrote:

> This might help  - https://issues.apache.org/jira/browse/HIVE-3874
>
> Regards
> Prav
>
> On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
> mario.amatu...@vodafone.com> wrote:
>
>> Hello everyone,
>>
>> Does anyone have any idea about the ORC algorithm skeleton? How does it work?
>>
>>
>


Re: ORC algorithm skeleton

2016-07-26 Thread praveenesh kumar
This might help  - https://issues.apache.org/jira/browse/HIVE-3874

Regards
Prav

On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
mario.amatu...@vodafone.com> wrote:

> Hello everyone,
>
> Does anyone have any idea about the ORC algorithm skeleton? How does it work?
>
>


ORC algorithm skeleton

2016-07-26 Thread Amatucci, Mario, Vodafone Group
Hello everyone,
 
Does anyone have any idea about the ORC algorithm skeleton? How does it work?
 


RE: Alternatives to self join

2016-07-26 Thread Markovitz, Dudu
Hi

Can you please send your original query and perhaps a small dataset sample?

Thanks

Dudu

From: Buntu Dev [mailto:buntu...@gmail.com]
Sent: Tuesday, July 26, 2016 10:46 AM
To: user@hive.apache.org
Subject: Alternatives to self join

I'm currently doing a self-join on a table 4 times on varying conditions. 
Although it works fine, I'm not sure if there are any alternatives that perform 
better. Please let me know.

Thanks!


Alternatives to self join

2016-07-26 Thread Buntu Dev
I'm currently doing a self-join on a table 4 times on varying conditions.
Although it works fine, I'm not sure if there are any alternatives that
perform better. Please let me know.

Thanks!