Hive compaction didn't launch
I'm using a Hive 1.2.1 transactional table and inserting data into it via the Hive Streaming API. After some time I expect compaction to start, but it doesn't happen.

Here's the part of the log which shows that the compactor initiator thread doesn't see any delta files:

2016-07-26 18:06:52,459 INFO [Thread-8]: compactor.Initiator (Initiator.java:run(89)) - Checking to see if we should compact default.data_aaa.dt=20160726
2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils (AcidUtils.java:getAcidState(432)) - in directory hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726 base = null deltas = 0
2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0 threshold: 0.1 will major compact: false

But that directory actually contains 23 items (one _orc_acid_version file and 22 delta directories):

hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
Found 23 items
-rw-r--r-- 3 storm hdfs 4 2016-07-26 17:20 /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
drwxrwxrwx - storm hdfs 0 2016-07-26 17:22 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
drwxrwxrwx - storm hdfs 0 2016-07-26 17:23 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
drwxrwxrwx - storm hdfs 0 2016-07-26 17:25 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:26 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:27 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
drwxrwxrwx - storm hdfs 0 2016-07-26 17:29 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
drwxrwxrwx - storm hdfs 0 2016-07-26 17:30 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
drwxrwxrwx - storm hdfs 0 2016-07-26 17:32 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:33 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71867356_71867455
drwxrwxrwx - storm hdfs 0 2016-07-26 17:34 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71891556_71891655
drwxrwxrwx - storm hdfs 0 2016-07-26 17:36 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71904856_71904955
drwxrwxrwx - storm hdfs 0 2016-07-26 17:37 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71907256_71907355
drwxrwxrwx - storm hdfs 0 2016-07-26 17:39 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71918756_71918855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:40 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71947556_71947655
drwxrwxrwx - storm hdfs 0 2016-07-26 17:41 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71960656_71960755
drwxrwxrwx - storm hdfs 0 2016-07-26 17:43 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71963156_71963255
drwxrwxrwx - storm hdfs 0 2016-07-26 17:44 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71964556_71964655
drwxrwxrwx - storm hdfs 0 2016-07-26 17:46 /apps/hive/warehouse/data_aaa/dt=20160726/delta_71987156_71987255
drwxrwxrwx - storm hdfs 0 2016-07-26 17:47 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72015756_72015855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:48 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72021356_72021455
drwxrwxrwx - storm hdfs 0 2016-07-26 17:50 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72048756_72048855
drwxrwxrwx - storm hdfs 0 2016-07-26 17:50 /apps/hive/warehouse/data_aaa/dt=20160726/delta_72070856_72070955

Full log here <http://pastebin.com/gHwvgRUV>.

What could go wrong?
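A few HiveQL statements that might help narrow this down. This is only a sketch and not a diagnosis: the table and partition names are taken from the log above, the quoting of the dt value assumes a string partition column, and the idea that still-open streaming transactions can affect which deltas the compactor considers is an assumption, not something the log confirms.

```sql
-- List compactions the metastore knows about (queued, running, finished).
SHOW COMPACTIONS;

-- List transactions and their state, to see whether any are still open.
SHOW TRANSACTIONS;

-- Request a compaction by hand for the partition from the log.
-- Assumes dt is a string partition column; adjust the literal otherwise.
ALTER TABLE data_aaa PARTITION (dt='20160726') COMPACT 'major';
```

If the manual request shows up in SHOW COMPACTIONS but never completes, that at least separates "the initiator does not schedule anything" from "the worker cannot do the work".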
Re: Re-arrange columns
Hi,

I think I had similar issues to yours.

Did you look in the Hive documentation at what the CASCADE keyword does on ADD or CHANGE COLUMNS statements?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn

From what I understand, Hive behaves differently from a standard RDBMS when adding or changing columns: I think that adding or moving a column in Hive only changes the metadata and not the underlying data. To cope with that, the metastore keeps separate schema information for the table itself and for each of its partitions. This way, when you change the schema of your table, the old partitions can still be read according to the old schema, and new partitions will be created with the new schema (unless you use CASCADE).

There are two dangers with this, however:
- if you create a new column, it will not be accessible in the old partitions.
- if you regenerate (overwrite) an existing partition, the new data will correspond to the new schema, but the partition's metadata will not be updated. (I believe this could be considered a bug, with a workaround, since "ALTER TABLE table_name ADD COLUMNS ..." can also be applied to one specific partition at a time: "ALTER TABLE table_name PARTITION partition_spec ADD COLUMNS ...")

If you use CASCADE, the change you apply to the table will be immediately applied to its partitions as well. But if you don't regenerate your existing partitions, I believe you will have problems as well, since your partitions' schema will not match the underlying data.

So I guess it mostly depends on whether (and when) you plan to regenerate your partitions to add the new column to your existing data. You can either:

A. Drop your table, recreate it with the new schema, and do a msck repair table.
B. To reduce unavailability: create another table, populate it, and then swap your tables.
C. Make your column change with CASCADE and regenerate your partitions immediately, but they might not be correctly readable between the time you make your change and the time you regenerate them.
D. Make your column change without CASCADE: you can still query the old partitions (without the new column, though), and after regenerating a partition, change its schema with "ALTER TABLE table_name PARTITION partition_spec ADD COLUMNS ..."
E. (I'm not sure this one works) Make your column change without CASCADE, create a copy of the table, generate the partitions there, and then use EXCHANGE PARTITION (https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition) to move the partitions from the new table to the old one.

Hope this helps, let me know how it turns out,

Furcy

On Tue, Jul 26, 2016 at 2:34 AM, Binh Nguyen Van wrote:

> Hi,
>
> I am writing an application that insert data into Hive and it depends on
> the order of columns in table so I have to rearrange my columns to make
> work around that but I have difficulty of doing that. Could you please help?
>
> Here is my problem:
>
> I have table 'tbl' which is created by using this statement:
> CREATE EXTERNAL TABLE tbl (col1 int, col3 string) PARTITIONED BY (dt string)
> STORED AS PARQUET LOCATION '/wh/db/tb'.
> Now I want to add new column 'col2' with type int to that table and I want to
> put it after 'col1' and before 'col3' so I use these two statements:
> ALTER TABLE tbl ADD COLUMNS (col2 int);
> ALTER TABLE tbl CHANGE col2 col2 int AFTER col1;
>
> With Hive 1.2.1, these two statements are executed successfully but I got
> type cast exception when I query data back.
> With Hive 2.x, The second alter statement failed with exception: 'Unable to
> alter table. The following columns have types incompatible with the existing
> columns in their respective positions'
>
> I could work around this by drop old table and then create a new table
> with new schema but this requires me to run MSCK to update metadata for
> that table and this process could be very slow when I have a lot of data
> and a lot of partition so I am looking for a better one.
>
> Please help!
> Thanks
> -Binh
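For reference, options C, D and E from the reply above could be sketched in HiveQL roughly as follows. This is an illustration only and has not been run against the table in question: tbl and col2 come from the quoted message, while the partition value '2016-07-26' and the copy table tbl_new are made-up placeholders. The sketch only appends col2 at the end of the column list; it does not attempt the AFTER col1 reordering that failed in the original report.

```sql
-- Option C: change the table schema and all existing partition schemas at once.
ALTER TABLE tbl ADD COLUMNS (col2 int) CASCADE;

-- Option D: change only the table-level schema ...
ALTER TABLE tbl ADD COLUMNS (col2 int);
-- ... then, after regenerating one partition, update that partition's schema.
-- '2016-07-26' is a hypothetical partition value.
ALTER TABLE tbl PARTITION (dt='2016-07-26') ADD COLUMNS (col2 int);

-- Option E: generate the partition in a copy of the table (tbl_new is a
-- hypothetical name), then move it into tbl. The destination table must not
-- already have this partition, and both tables need the same schema.
ALTER TABLE tbl EXCHANGE PARTITION (dt='2016-07-26') WITH TABLE tbl_new;
```

Options C and D are alternatives, so only one of the two ADD COLUMNS variants would be run on a given table.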
Re: ORC algorithm skeleton
Also - http://orc.apache.org/docs/spec-intro.html

On Tue, Jul 26, 2016 at 1:49 PM, praveenesh kumar wrote:

> This might help - https://issues.apache.org/jira/browse/HIVE-3874
>
> Regards
> Prav
>
> On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
> mario.amatu...@vodafone.com> wrote:
>
>> Hello everyone,
>>
>> Anyone got any idea about ORC algorithm skeleton? How it works?
Re: ORC algorithm skeleton
This might help - https://issues.apache.org/jira/browse/HIVE-3874

Regards
Prav

On Tue, Jul 26, 2016 at 12:58 PM, Amatucci, Mario, Vodafone Group <
mario.amatu...@vodafone.com> wrote:

> Hello everyone,
>
> Anyone got any idea about ORC algorithm skeleton? How it works?
ORC algorithm skeleton
Hello everyone,

Does anyone have any idea about the ORC algorithm skeleton? How does it work?
RE: Alternatives to self join
Hi

Can you please send your original query and perhaps a small dataset sample?

Thanks

Dudu

From: Buntu Dev [mailto:buntu...@gmail.com]
Sent: Tuesday, July 26, 2016 10:46 AM
To: user@hive.apache.org
Subject: Alternatives to self join

I'm currently doing a self-join on a table 4 times on varying conditions. Although it works fine, I'm not sure if there are any alternatives that perform better. Please let me know.

Thanks!
Alternatives to self join
I'm currently doing a self-join on a table 4 times on varying conditions. Although it works fine, I'm not sure if there are any alternatives that perform better. Please let me know. Thanks!