[jira] [Comment Edited] (HIVE-637) Add a simple way to create a blob table

2018-06-07 Thread Charles Pritchard (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505701#comment-16505701
 ] 

Charles Pritchard edited comment on HIVE-637 at 6/8/18 4:55 AM:


I just hit this ancient issue with a bunch of very small files uploaded into a 
bucket – all I'm looking to do is create external table derp (data blob) 
location '/bad/place', as I will subsequently be running a select 
INPUT__FILE__NAME, data query.

It may be resolved by org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
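For reference, a minimal sketch of the table and query described above, built from the DDL pattern in the issue body. The table name, column name, and location are the commenter's own examples; INPUT__FILE__NAME is Hive's virtual column for the source file path:

{code}
-- Sketch only: a single-column "blob" external table over small files,
-- using LazySimpleSerDe so each whole line lands in one STRING column.
CREATE EXTERNAL TABLE derp (data STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES (
    'serialization.last.column.takes.rest'='true'
  )
  STORED AS TEXTFILE
  LOCATION '/bad/place';

-- Pair each row with the file it came from via the virtual column.
SELECT INPUT__FILE__NAME, data FROM derp;
{code}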
  


was (Author: downchuck):
I just hit this ancient issue, with a bunch of very small files uploaded into a 
bucket – all I'm looking to do is create external table derp (data blob) 
location '/bad/place' as I will be running a select __INPUT_FILE_NAME_, data 
command subsequent.

It may be resolved by org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
 

> Add a simple way to create a blob table
> ---
>
> Key: HIVE-637
> URL: https://issues.apache.org/jira/browse/HIVE-637
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.3.0
>Reporter: Zheng Shao
>Priority: Major
>
> A blob table has a single column of type string. We put all data from the row 
> into that column.
> At present we are able to create blob table like this:
> {code}
> CREATE TABLE blobTable1 (blob STRING)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   WITH SERDEPROPERTIES (
> 'serialization.last.column.takes.rest'='true'
>   )
>   STORED AS TEXTFILE;
> CREATE TABLE blobTable2 (blob STRING)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   WITH SERDEPROPERTIES (
> 'serialization.last.column.takes.rest'='true'
>   )
>   STORED AS SEQUENCEFILE;
> {code}
> We should add a simpler way to create such a table, since it's pretty popular.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





[jira] [Commented] (HIVE-8434) Vectorization logic using wrong values for DATE and TIMESTAMP partitioning columns in vectorized row batches...

2017-06-14 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049759#comment-16049759
 ] 

Charles Pritchard commented on HIVE-8434:
-

Hit this in 1.2.1 when using MONTH(CAST(datestr_partitioncol as date)) in both 
the select list and the group by -- it gives unstable results; I'm seeing a lot 
of 7 and 31.
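A sketch of the query shape that triggered the unstable results described above. The table and column names are illustrative, not from the original report:

{code}
-- Hypothetical repro shape: a string partition column cast to DATE,
-- with MONTH() in both the select list and the GROUP BY.
SELECT MONTH(CAST(datestr_partitioncol AS DATE)) AS mon,
       COUNT(*) AS cnt
FROM some_partitioned_table
GROUP BY MONTH(CAST(datestr_partitioncol AS DATE));
-- With the vectorization bug, 'mon' can come back wrong for partition columns.
{code}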

> Vectorization logic using wrong values for DATE and TIMESTAMP partitioning 
> columns in vectorized row batches...
> ---
>
> Key: HIVE-8434
> URL: https://issues.apache.org/jira/browse/HIVE-8434
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8434.01.patch, HIVE-8434.02.patch
>
>
> VectorizedRowBatchCtx.addPartitionColsToBatch uses wrong values to populate 
> DATE and TIMESTAMP data types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-7847) query orc partitioned table fail when table column type change

2016-02-05 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135260#comment-15135260
 ] 

Charles Pritchard commented on HIVE-7847:
-

I think I'm seeing this on an INSERT OVERWRITE combined with a series of GROUP 
BY/SELECT and WITH clauses.
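A sketch of the statement shape the comment describes, reusing the table name from the repro script in the issue body below; the target table and aggregation are invented for illustration:

{code}
-- Illustrative shape only: an INSERT OVERWRITE fed by WITH/GROUP BY stages
-- over an ORC table whose column type was altered after data was written.
WITH agg AS (
  SELECT id, COUNT(*) AS cnt
  FROM orc_change_type
  GROUP BY id
)
INSERT OVERWRITE TABLE some_target
SELECT id, cnt FROM agg;
{code}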

> query orc partitioned table fail when table column type change
> --
>
> Key: HIVE-7847
> URL: https://issues.apache.org/jira/browse/HIVE-7847
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Zhichun Wu
>Assignee: Zhichun Wu
> Attachments: HIVE-7847.1.patch, vector_alter_partition_change_col.q
>
>
> I use the following script to test orc column type change with partitioned 
> table on branch-0.13:
> {code}
> use test;
> DROP TABLE if exists orc_change_type_staging;
> DROP TABLE if exists orc_change_type;
> CREATE TABLE orc_change_type_staging (
> id int
> );
> CREATE TABLE orc_change_type (
> id int
> ) PARTITIONED BY (`dt` string)
> stored as orc;
> --- load staging table
> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE 
> orc_change_type_staging;
> --- populate orc hive table
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM 
> orc_change_type_staging limit 1;
> --- change column id from int to bigint
> ALTER TABLE orc_change_type CHANGE id id bigint;
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM 
> orc_change_type_staging limit 1;
> SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
> {code}
> It fails in the last query "SELECT id FROM orc_change_type where dt between 
> '20140718' and '20140719';" with this exception:
> {code}
> Error: java.io.IOException: java.io.IOException: 
> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast 
> to org.apache.hadoop.io.LongWritable
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.io.IntWritable cannot be cast to 
> org.apache.hadoop.io.LongWritable
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
> ... 11 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable 
> cannot be cast to org.apache.hadoop.io.LongWritable
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
> at 
> 

[jira] [Commented] (HIVE-7148) Use murmur hash to create bucketed tables

2016-02-04 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133477#comment-15133477
 ] 

Charles Pritchard commented on HIVE-7148:
-

I could really use custom bucketing functions, as I want to use buckets instead 
of partitions based on a derived value.
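A sketch of what the comment is asking to avoid: since Hive buckets on a real column, a derived value must first be materialized as a column before it can appear in CLUSTERED BY. All names and the hash expression here are invented for illustration:

{code}
-- Illustrative workaround: materialize the derived key, then bucket on it.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  payload STRING,
  derived_key INT        -- precomputed value, e.g. from another field
)
CLUSTERED BY (derived_key) SORTED BY (derived_key) INTO 32 BUCKETS
STORED AS ORC;

INSERT OVERWRITE TABLE events_bucketed
SELECT user_id, payload, pmod(hash(user_id), 32) AS derived_key
FROM raw_events;
{code}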

> Use murmur hash to create bucketed tables
> -
>
> Key: HIVE-7148
> URL: https://issues.apache.org/jira/browse/HIVE-7148
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>
> HIVE-7121 introduced murmur hashing for queries that don't insert into 
> bucketed tables. This was done to achieve better distribution of the data. 
> The same should be done for bucketed tables as well, but this involves making 
> sure we don't break backwards compat. This probably means that we have to 
> store the partitioning function used in the metadata and use that to 
> determine if SMB and bucketed map-join optimizations apply.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12895) Bucket files not renamed with multiple insert overwrite table statements

2016-01-25 Thread Charles Pritchard (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Pritchard updated HIVE-12895:
-
Description: 
With two tables that have different CLUSTERED BY columns, using the multiple 
INSERT OVERWRITE TABLE syntax leaves the output files of one of the tables 
named "_bucket_number_0" instead of being renamed to the usual "0_0" style. 
That temporary filename is not picked up by later selects, making this a more 
urgent issue.

This is with:
Tbl1: CLUSTERED BY (col1) SORTED BY(col1) INTO 1 BUCKETS;
Tbl2: CLUSTERED BY (col2) SORTED BY(col2) INTO 1 BUCKETS;

FROM statement
INSERT OVERWRITE TABLE tbl1 select...
INSERT OVERWRITE TABLE tbl2 select...;

  was:
With two tables that have different cluster by columns, using multiple INSERT 
OVERWRITE TABLE syntax results in the output files of one of the tables being 
named "_bucket_number_0", which is not picked up by analyzer/select later on.


This is with:
Tbl1: CLUSTERED BY (col1) SORTED BY(col1) INTO 1 BUCKETS;
Tbl2: CLUSTERED BY (col2) SORTED BY(col2) INTO 1 BUCKETS;

FROM statement
INSERT OVERWRITE TABLE tbl1 select...
INSERT OVERWRITE TABLE tbl2 select...;


> Bucket files not renamed with multiple insert overwrite table statements
> 
>
> Key: HIVE-12895
> URL: https://issues.apache.org/jira/browse/HIVE-12895
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Charles Pritchard
>
> With two tables that have different CLUSTERED BY columns, using the multiple 
> INSERT OVERWRITE TABLE syntax leaves the output files of one of the tables 
> named "_bucket_number_0" instead of being renamed to the usual "0_0" style. 
> That temporary filename is not picked up by later selects, making this a more 
> urgent issue.
> This is with:
> Tbl1: CLUSTERED BY (col1) SORTED BY(col1) INTO 1 BUCKETS;
> Tbl2: CLUSTERED BY (col2) SORTED BY(col2) INTO 1 BUCKETS;
> FROM statement
> INSERT OVERWRITE TABLE tbl1 select...
> INSERT OVERWRITE TABLE tbl2 select...;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable

2016-01-18 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106370#comment-15106370
 ] 

Charles Pritchard commented on HIVE-10888:
--

I'm seeing a similar issue in Hive 0.14. I have a two-level partition -- 
partitioned by (date string, bucket string) -- and it seems that most queries do 
not include the default partition (for bucket) when run. While I can run create 
temporary table as select * and get a fully functioning table, I cannot simply 
run select * with a where clause and get usable results from the default 
partition.

This may be a regression introduced in HIVE-4878. I'll check through some 
support channels to see what I can find. 

> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -
>
> Key: HIVE-10888
> URL: https://issues.apache.org/jira/browse/HIVE-10888
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor
>Reporter: Goden Yao
>
> This is reported by Pivotal.io (Noa Horn)
> And HAWQ latest version should have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partition enabled and mode = nonstrict, the null value in the 
> default partition should still be returned when user specify that in 
> "...WHERE is Null".
> === Problem statment ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: __HIVE_DEFAULT_PARTITION__
> {code}
> Hive creates a default partition if the partition key value doesn’t conform 
> to the field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
> <property>
>   <name>hive.exec.dynamic.partition</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.exec.dynamic.partition.mode</name>
>   <value>nonstrict</value>
> </property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to 
> the partition table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1	1.0	1900-01-01
> 2	2.2	1994-04-14
> 3	3.3	2011-03-31
> 4	4.5	NULL
> 5	5.0	2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data 
> will be dynamically partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date 
> date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1	1.0	1900-01-01
> 2	2.2	1994-04-14
> 3	3.3	2011-03-31
> 5	5.0	2013-12-06
> 4	4.5	NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the 
> default partition data:
> {code}
> hive> select * from sales;
> OK
> 1	1.0	1900-01-01
> 2	2.2	1994-04-14
> 3	3.3	2011-03-31
> 5	5.0	2013-12-06
> 4	4.5	NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1	1.0	1900-01-01
> 2	2.2	1994-04-14
> 3	3.3	2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;  
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1	1.0	1900-01-01
> 2	2.2	1994-04-14
> 3	3.3	2011-03-31
> 5	5.0	2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12759) msck repair table fails when using custom partition patterns

2015-12-29 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074529#comment-15074529
 ] 

Charles Pritchard commented on HIVE-12759:
--

This may just be a duplicate of https://issues.apache.org/jira/browse/HIVE-8053

> msck repair table fails when using custom partition patterns
> 
>
> Key: HIVE-12759
> URL: https://issues.apache.org/jira/browse/HIVE-12759
> Project: Hive
>  Issue Type: Bug
>Reporter: Charles Pritchard
>
> msck repair table will fail to add dynamic partitions when using a custom 
> pattern.
> set 
> hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}";
> CREATE EXTERNAL TABLE raw_line (line string)
> PARTITIONED BY(year STRING, month STRING, day STRING, hour STRING)
> STORED AS TEXTFILE
> LOCATION '/raw/data';
> 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: partition spec is invalid; 
> field year does not exist or is empty
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createPartition(Hive.java:1628)
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.msckAddPartitionsOneByOne(DDLTask.java:1659)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1726)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)