[jira] [Comment Edited] (HIVE-637) Add a simple way to create a blob table
[ https://issues.apache.org/jira/browse/HIVE-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505701#comment-16505701 ]

Charles Pritchard edited comment on HIVE-637 at 6/8/18 4:55 AM:
----------------------------------------------------------------

I just hit this ancient issue, with a bunch of very small files uploaded into a bucket -- all I'm looking to do is create external table derp (data blob) location '/bad/place', as I will subsequently be running a select INPUT__FILE__NAME, data command. It may be resolved by org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.

was (Author: downchuck):
I just hit this ancient issue, with a bunch of very small files uploaded into a bucket -- all I'm looking to do is create external table derp (data blob) location '/bad/place', as I will subsequently be running a select INPUT__FILE__NAME, data command. It may be resolved by org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.

> Add a simple way to create a blob table
> ----------------------------------------
>
>   Key: HIVE-637
>   URL: https://issues.apache.org/jira/browse/HIVE-637
>   Project: Hive
>   Issue Type: New Feature
>   Components: Serializers/Deserializers
>   Affects Versions: 0.3.0
>   Reporter: Zheng Shao
>   Priority: Major
>
> A blob table has a single column of type string. We put all data from the row into that column.
> At present we are able to create a blob table like this:
> {code}
> CREATE TABLE blobTable1 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS TEXTFILE;
>
> CREATE TABLE blobTable2 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS SEQUENCEFILE;
> {code}
> We should add a simpler way to create such a table, since it's pretty popular.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (HIVE-637) Add a simple way to create a blob table
[ https://issues.apache.org/jira/browse/HIVE-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505701#comment-16505701 ]

Charles Pritchard edited comment on HIVE-637 at 6/8/18 4:49 AM:
----------------------------------------------------------------

I just hit this ancient issue, with a bunch of very small files uploaded into a bucket -- all I'm looking to do is create external table derp (data blob) location '/bad/place', as I will subsequently be running a select INPUT__FILE__NAME, data command. It may be resolved by org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.

was (Author: downchuck):
I just hit this ancient issue, with a bunch of very small files uploaded into a bucket -- all I'm looking to do is create external table derp (data blob) location '/bad/place', as I will subsequently be running a select INPUT__FILE__NAME, data command.

> Add a simple way to create a blob table
> ----------------------------------------
>
>   Key: HIVE-637
>   URL: https://issues.apache.org/jira/browse/HIVE-637
>   Project: Hive
>   Issue Type: New Feature
>   Components: Serializers/Deserializers
>   Affects Versions: 0.3.0
>   Reporter: Zheng Shao
>   Priority: Major
>
> A blob table has a single column of type string. We put all data from the row into that column.
> At present we are able to create a blob table like this:
> {code}
> CREATE TABLE blobTable1 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS TEXTFILE;
>
> CREATE TABLE blobTable2 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS SEQUENCEFILE;
> {code}
> We should add a simpler way to create such a table, since it's pretty popular.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-637) Add a simple way to create a blob table
[ https://issues.apache.org/jira/browse/HIVE-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505701#comment-16505701 ]

Charles Pritchard commented on HIVE-637:
----------------------------------------

I just hit this ancient issue, with a bunch of very small files uploaded into a bucket -- all I'm looking to do is create external table derp (data blob) location '/bad/place', as I will subsequently be running a select INPUT__FILE__NAME, data command.

> Add a simple way to create a blob table
> ----------------------------------------
>
>   Key: HIVE-637
>   URL: https://issues.apache.org/jira/browse/HIVE-637
>   Project: Hive
>   Issue Type: New Feature
>   Components: Serializers/Deserializers
>   Affects Versions: 0.3.0
>   Reporter: Zheng Shao
>   Priority: Major
>
> A blob table has a single column of type string. We put all data from the row into that column.
> At present we are able to create a blob table like this:
> {code}
> CREATE TABLE blobTable1 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS TEXTFILE;
>
> CREATE TABLE blobTable2 (blob STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.last.column.takes.rest'='true'
> )
> STORED AS SEQUENCEFILE;
> {code}
> We should add a simpler way to create such a table, since it's pretty popular.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
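For reference, a minimal sketch of the single-column workaround discussed in the comments above; this is an illustration, not part of the issue. The table name derp and location '/bad/place' are taken from the comment, the SerDe property comes from the issue description, and INPUT__FILE__NAME is Hive's virtual column carrying each row's source file path.

{code}
-- Hedged sketch: one STRING column that absorbs each whole input line, laid over an
-- external directory of small files. Table name and location come from the comment above.
CREATE EXTERNAL TABLE derp (data STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.last.column.takes.rest'='true'
)
STORED AS TEXTFILE
LOCATION '/bad/place';

-- INPUT__FILE__NAME is Hive's built-in virtual column with the source file path per row.
SELECT INPUT__FILE__NAME, data FROM derp;
{code}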
[jira] [Commented] (HIVE-8434) Vectorization logic using wrong values for DATE and TIMESTAMP partitioning columns in vectorized row batches...
[ https://issues.apache.org/jira/browse/HIVE-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049759#comment-16049759 ]

Charles Pritchard commented on HIVE-8434:
------------------------------------------

Hit this in 1.2.1 when using MONTH(CAST(datestr_partitioncol as date)) in the select and group by -- it gives unstable results. Seeing a lot of 7 and 31.

> Vectorization logic using wrong values for DATE and TIMESTAMP partitioning columns in vectorized row batches...
> -----------------------------------------------------------------------------------------------------------------
>
>   Key: HIVE-8434
>   URL: https://issues.apache.org/jira/browse/HIVE-8434
>   Project: Hive
>   Issue Type: Bug
>   Components: Vectorization
>   Affects Versions: 0.14.0
>   Reporter: Matt McCline
>   Assignee: Matt McCline
>   Priority: Critical
>   Fix For: 0.14.0
>
>   Attachments: HIVE-8434.01.patch, HIVE-8434.02.patch
>
> VectorizedRowBatchCtx.addPartitionColsToBatch uses wrong values to populate DATE and TIMESTAMP data types.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
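A hedged sketch of the query shape described in the comment above; the table and column names (events, datestr) are illustrative and not taken from the report.

{code}
-- Illustrative only: a table partitioned by a string column holding a date like '2015-06-14'.
CREATE TABLE events (id BIGINT)
PARTITIONED BY (datestr STRING)
STORED AS ORC;

SET hive.vectorized.execution.enabled=true;

-- The comment reports that, under vectorization, the month derived from the partition
-- column comes back unstable (lots of 7 and 31) instead of the real month per partition.
SELECT MONTH(CAST(datestr AS DATE)) AS mth, COUNT(*) AS cnt
FROM events
GROUP BY MONTH(CAST(datestr AS DATE));
{code}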
[jira] [Commented] (HIVE-7847) query orc partitioned table fail when table column type change
[ https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135260#comment-15135260 ]

Charles Pritchard commented on HIVE-7847:
------------------------------------------

I think I'm seeing this on an INSERT OVERWRITE tied with a series of GROUP BY/SELECT and WITH clauses.

> query orc partitioned table fail when table column type change
> ----------------------------------------------------------------
>
>   Key: HIVE-7847
>   URL: https://issues.apache.org/jira/browse/HIVE-7847
>   Project: Hive
>   Issue Type: Bug
>   Components: File Formats
>   Affects Versions: 0.11.0, 0.12.0, 0.13.0
>   Reporter: Zhichun Wu
>   Assignee: Zhichun Wu
>
>   Attachments: HIVE-7847.1.patch, vector_alter_partition_change_col.q
>
> I use the following script to test orc column type change with a partitioned table on branch-0.13:
> {code}
> use test;
> DROP TABLE if exists orc_change_type_staging;
> DROP TABLE if exists orc_change_type;
> CREATE TABLE orc_change_type_staging (
>   id int
> );
> CREATE TABLE orc_change_type (
>   id int
> ) PARTITIONED BY (`dt` string)
> stored as orc;
> --- load staging table
> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
> --- populate orc hive table
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging limit 1;
> --- change column id from int to bigint
> ALTER TABLE orc_change_type CHANGE id id bigint;
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging limit 1;
> SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
> {code}
> It fails in the last query "SELECT id FROM orc_change_type where dt between '20140718' and '20140719';" with exception:
> {code}
> Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>     ... 11 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>     at
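A hedged sketch of the query shape the comment mentions (a WITH clause plus GROUP BY feeding an INSERT OVERWRITE), reusing the table from the repro script above; the destination table agg_target is hypothetical. Reading partitions written before the ALTER ... CHANGE would hit the same IntWritable/LongWritable cast error.

{code}
-- Illustrative sketch; `agg_target` is a made-up destination table, source names come
-- from the repro script in the description.
WITH agg AS (
  SELECT id, COUNT(*) AS cnt
  FROM orc_change_type
  WHERE dt BETWEEN '20140718' AND '20140719'
  GROUP BY id
)
INSERT OVERWRITE TABLE agg_target
SELECT id, cnt FROM agg;
{code}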
[jira] [Commented] (HIVE-7148) Use murmur hash to create bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133477#comment-15133477 ]

Charles Pritchard commented on HIVE-7148:
------------------------------------------

I could really use custom bucketing functions, as I want to use buckets instead of partitions based on a derived value.

> Use murmur hash to create bucketed tables
> -------------------------------------------
>
>   Key: HIVE-7148
>   URL: https://issues.apache.org/jira/browse/HIVE-7148
>   Project: Hive
>   Issue Type: Bug
>   Reporter: Gunther Hagleitner
>
> HIVE-7121 introduced murmur hashing for queries that don't insert into bucketed tables. This was done to achieve better distribution of the data. The same should be done for bucketed tables as well, but this involves making sure we don't break backwards compat. This probably means that we have to store the partitioning function used in the metadata and use that to determine if SMB and bucketed map-join optimizations apply.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
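A hedged sketch of the use case in the comment above: bucketing on a derived value by materializing it as a column first, since CLUSTERED BY must name a real column. All table and column names are illustrative; pmod and hash are built-in Hive functions.

{code}
-- Illustrative only: compute the derived key up front, then bucket on it.
CREATE TABLE events_bucketed (
  id BIGINT,
  payload STRING,
  derived_key INT          -- e.g. a hash or prefix computed from another field
)
CLUSTERED BY (derived_key) INTO 32 BUCKETS
STORED AS ORC;

SET hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE events_bucketed
SELECT id, payload, pmod(hash(payload), 32) AS derived_key
FROM events_raw;   -- events_raw is a hypothetical source table
{code}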
[jira] [Updated] (HIVE-12895) Bucket files not renamed with multiple insert overwrite table statements
[ https://issues.apache.org/jira/browse/HIVE-12895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Pritchard updated HIVE-12895:
--------------------------------------
    Description:
With two tables that have different CLUSTER BY columns, using the multiple INSERT OVERWRITE TABLE syntax results in the output files of one of the tables being named "_bucket_number_0", when they clearly should have been renamed to the usual "0_0" style. The temporary filename is not picked up by later selects, making this a more urgent issue.

This is with:
Tbl1: CLUSTERED BY (col1) SORTED BY (col1) INTO 1 BUCKETS;
Tbl2: CLUSTERED BY (col2) SORTED BY (col2) INTO 1 BUCKETS;

FROM statement
INSERT OVERWRITE TABLE tbl1 select...
INSERT OVERWRITE TABLE tbl2 select...;

  was:
With two tables that have different CLUSTER BY columns, using the multiple INSERT OVERWRITE TABLE syntax results in the output files of one of the tables being named "_bucket_number_0", which is not picked up by the analyzer/select later on.

This is with:
Tbl1: CLUSTERED BY (col1) SORTED BY (col1) INTO 1 BUCKETS;
Tbl2: CLUSTERED BY (col2) SORTED BY (col2) INTO 1 BUCKETS;

FROM statement
INSERT OVERWRITE TABLE tbl1 select...
INSERT OVERWRITE TABLE tbl2 select...;

> Bucket files not renamed with multiple insert overwrite table statements
> ---------------------------------------------------------------------------
>
>   Key: HIVE-12895
>   URL: https://issues.apache.org/jira/browse/HIVE-12895
>   Project: Hive
>   Issue Type: Bug
>   Affects Versions: 0.14.0
>   Reporter: Charles Pritchard
>
> With two tables that have different CLUSTER BY columns, using the multiple INSERT OVERWRITE TABLE syntax results in the output files of one of the tables being named "_bucket_number_0", when they clearly should have been renamed to the usual "0_0" style. The temporary filename is not picked up by later selects, making this a more urgent issue.
>
> This is with:
> Tbl1: CLUSTERED BY (col1) SORTED BY (col1) INTO 1 BUCKETS;
> Tbl2: CLUSTERED BY (col2) SORTED BY (col2) INTO 1 BUCKETS;
>
> FROM statement
> INSERT OVERWRITE TABLE tbl1 select...
> INSERT OVERWRITE TABLE tbl2 select...;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
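A hedged repro sketch assembled from the description above; the column types and the source table src are illustrative, not from the report.

{code}
-- Illustrative only: two bucketed tables clustered on different columns.
CREATE TABLE tbl1 (col1 STRING, col2 STRING)
CLUSTERED BY (col1) SORTED BY (col1) INTO 1 BUCKETS;
CREATE TABLE tbl2 (col1 STRING, col2 STRING)
CLUSTERED BY (col2) SORTED BY (col2) INTO 1 BUCKETS;

SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Multi-insert from a single scan; per the report, one table's output file keeps the
-- temporary "_bucket_number_0" name instead of the usual "0_0"-style name.
FROM src
INSERT OVERWRITE TABLE tbl1 SELECT col1, col2
INSERT OVERWRITE TABLE tbl2 SELECT col1, col2;
{code}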
[jira] [Commented] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable
[ https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106370#comment-15106370 ]

Charles Pritchard commented on HIVE-10888:
-------------------------------------------

I'm seeing a similar issue in Hive 0.14. I have a two-level partition -- partitioned by (date string, bucket string) -- and it seems that most queries do not include the default partition (for bucket) when run. While I can run create temp table as select * and get a fully functioning table, I cannot simply run select * with a where clause and get usable results from the default partition. This may be a regression introduced in HIVE-4878. I'll check through some support channels to see what I can find.

> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> ---------------------------------------------------------------------------
>
>   Key: HIVE-10888
>   URL: https://issues.apache.org/jira/browse/HIVE-10888
>   Project: Hive
>   Issue Type: Bug
>   Components: Hive, Query Processor
>   Reporter: Goden Yao
>
> This was reported by Pivotal.io (Noa Horn), and the latest HAWQ version should have this fixed in our queries.
>
> === Expected Behavior ===
> When dynamic partitioning is enabled and mode = nonstrict, the null values in the default partition should still be returned when the user asks for them with "...WHERE is Null".
>
> === Problem statement ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
>
> # Get default partition name:
> hive.exec.default.partition.name
> Default Value: __HIVE_DEFAULT_PARTITION__
> {code}
> Hive creates a default partition if the partition key value doesn't conform to the field type, for example if the partition key is NULL.
>
> *Hive Example*
> Add the following parameters to hive-site.xml:
> {code}
> <property>
>   <name>hive.exec.dynamic.partition</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.exec.dynamic.partition.mode</name>
>   <value>true</value>
> </property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
>
> Create a hive table and load the data into it. This table is used to load data into the partitioned table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1   1.0   1900-01-01
> 2   2.2   1994-04-14
> 3   3.3   2011-03-31
> 4   4.5   NULL
> 5   5.0   2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create a hive partitioned table and load data from the base table into it. The data will be dynamically partitioned.
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1   1.0   1900-01-01
> 2   2.2   1994-04-14
> 3   3.3   2011-03-31
> 5   5.0   2013-12-06
> 4   4.5   NULL
> {code}
> Check that the table has different partitions:
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with the default partition:
> Queries without a filter, or with a filter on a different field, return the default partition data:
> {code}
> hive> select * from sales;
> OK
> 1   1.0   1900-01-01
> 2   2.2   1994-04-14
> 3   3.3   2011-03-31
> 5   5.0   2013-12-06
> 4   4.5   NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1   1.0   1900-01-01
> 2   2.2   1994-04-14
> 3   3.3   2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1   1.0   1900-01-01
> 2   2.2   1994-04-14
> 3   3.3   2011-03-31
> 5   5.0   2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
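A hedged sketch of the two-level-partition case described in Charles Pritchard's comment above; all table and column names are illustrative. Rows whose second partition key is NULL land under bucket=__HIVE_DEFAULT_PARTITION__ within each dt partition.

{code}
-- Illustrative only: two-level partitioning, second level may be NULL.
CREATE TABLE logs (line STRING)
PARTITIONED BY (dt STRING, `bucket` STRING)
STORED AS ORC;

-- Per the comment, a filtered select misses rows in the default partition...
SELECT * FROM logs WHERE dt = '2016-01-01' AND `bucket` IS NULL;

-- ...while materializing the table first and filtering the copy returns usable results.
CREATE TEMPORARY TABLE logs_snapshot AS SELECT * FROM logs;
SELECT * FROM logs_snapshot WHERE dt = '2016-01-01' AND `bucket` IS NULL;
{code}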
[jira] [Commented] (HIVE-12759) msck repair table fails when using custom partition patterns
[ https://issues.apache.org/jira/browse/HIVE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074529#comment-15074529 ]

Charles Pritchard commented on HIVE-12759:
-------------------------------------------

This may just be a duplicate of https://issues.apache.org/jira/browse/HIVE-8053

> msck repair table fails when using custom partition patterns
> --------------------------------------------------------------
>
>   Key: HIVE-12759
>   URL: https://issues.apache.org/jira/browse/HIVE-12759
>   Project: Hive
>   Issue Type: Bug
>   Reporter: Charles Pritchard
>
> msck repair table will fail to add dynamic partitions when using a custom pattern.
>
> set hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}";
>
> CREATE EXTERNAL TABLE raw_line (line string)
> PARTITIONED BY (year STRING, month STRING, day STRING, hour STRING)
> STORED AS TEXTFILE
> LOCATION '/raw/data';
>
> org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: partition spec is invalid; field year does not exist or is empty
>     at org.apache.hadoop.hive.ql.metadata.Hive.createPartition(Hive.java:1628)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.msckAddPartitionsOneByOne(DDLTask.java:1659)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1726)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
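A hedged sketch restating the repro from the description with the failing command made explicit; the directory layout mentioned in the comments is an assumption about what the custom pattern produces, not something stated in the report.

{code}
-- Illustrative only: data written with a custom partition path pattern instead of the
-- key=value layout MSCK understands.
set hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}";

CREATE EXTERNAL TABLE raw_line (line STRING)
PARTITIONED BY (year STRING, month STRING, day STRING, hour STRING)
STORED AS TEXTFILE
LOCATION '/raw/data';

-- Likely fails because the data sits in paths such as /raw/data/2015/12/28/00 rather than
-- /raw/data/year=2015/month=12/day=28/hour=00, so no partition spec can be derived
-- ("partition spec is invalid; field year does not exist or is empty").
MSCK REPAIR TABLE raw_line;
{code}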