[jira] [Commented] (HIVE-9303) Parquet files are written with incorrect definition levels

2015-01-16 Thread Skye Wanderman-Milne (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281010#comment-14281010 ]

Skye Wanderman-Milne commented on HIVE-9303:


That's my understanding. The same thing happens if you insert NULL into the 
Parquet table directly, so it's not just a problem with CTAS or reading from 
text.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9303) Parquet files are written with incorrect definition levels

2015-01-07 Thread Skye Wanderman-Milne (JIRA)
Skye Wanderman-Milne created HIVE-9303:
--

 Summary: Parquet files are written with incorrect definition levels
 Key: HIVE-9303
 URL: https://issues.apache.org/jira/browse/HIVE-9303
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Skye Wanderman-Milne


The definition level, which determines which level of nesting is NULL, appears 
to always be n or n-1, where n is the maximum definition level. This means that 
only the innermost level of nesting can be NULL. This is only relevant for 
Parquet files. For example:

{code:sql}
CREATE TABLE text_tbl (a STRUCT<b:STRUCT<c:INT>>)
STORED AS TEXTFILE;

INSERT OVERWRITE TABLE text_tbl
SELECT IF(false, named_struct("b", named_struct("c", 1)), NULL)
FROM tbl LIMIT 1;

CREATE TABLE parq_tbl
STORED AS PARQUET
AS SELECT * FROM text_tbl;

SELECT * FROM text_tbl;
=> NULL # right

SELECT * FROM parq_tbl;
=> {"b":{"c":null}} # wrong
{code}
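For reference, the definition levels the writer should produce for the column path a.b.c above, where every level is nullable, can be sketched as follows (an illustrative Python sketch of the Dremel-style encoding, not Hive's actual writer code; the helper name is mine):

```python
def def_level_abc(a):
    """Definition level for the nullable column path a.b.c: the number
    of fields along the path that are actually present (max level 3)."""
    if a is None:
        return 0          # the top-level struct itself is NULL
    b = a.get("b")
    if b is None:
        return 1          # a present, b NULL
    c = b.get("c")
    if c is None:
        return 2          # a and b present, c NULL
    return 3              # fully defined value

# A correct writer records level 0 for a NULL top-level struct. The buggy
# writer only ever emits n (3) or n-1 (2), so the NULL struct reads back
# as {"b": {"c": null}} (level 2) instead of NULL (level 0).
```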





Re: Review Request 20826: HIVE-5823: Support for DECIMAL primitive type in AvroSerDe

2014-04-30 Thread Skye Wanderman-Milne

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20826/#review41889
---



ql/src/test/queries/clientpositive/avro_decimal.q
<https://reviews.apache.org/r/20826/#comment75549>

Here and below the "precision" and "scale" attributes should be JSON 
integers, not strings, as specified in the spec.
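To illustrate the distinction (a minimal sketch; the field shape follows the AVRO-1402 decimal spec, but the helper name is my own):

```python
import json

def decimal_attrs_ok(schema: dict) -> bool:
    """Per the Avro decimal spec, "precision" and "scale" must be
    JSON integers, not quoted strings."""
    return (isinstance(schema.get("precision"), int)
            and isinstance(schema.get("scale"), int))

# Correct: precision and scale are bare JSON numbers.
good = json.loads(
    '{"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}')
# Incorrect: the same attributes written as strings.
bad = json.loads(
    '{"type": "bytes", "logicalType": "decimal", "precision": "4", "scale": "2"}')
```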


- Skye Wanderman-Milne


On April 29, 2014, 5:17 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20826/
> ---
> 
> (Updated April 29, 2014, 5:17 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5823
> https://issues.apache.org/jira/browse/HIVE-5823
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support decimal type in Avro storage. The implementation is based on 
> specifications detailed in AVRO-1402.
> 
> 
> Diffs
> -
> 
>   data/files/dec.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
> 52a22e5 
>   ql/src/test/queries/clientpositive/avro_decimal.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_schema_literal.q d77f310 
>   ql/src/test/results/clientpositive/avro_decimal.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_schema_literal.q.out ca945d5 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> a28861f 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
>  8beffd7 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 92799ed 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
> 9d58d13 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java 
> b2c58c7 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java 
> 251f04f 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 
> b3559ea 
>   
> serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
>  a0e5018 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java d5730fa 
> 
> Diff: https://reviews.apache.org/r/20826/diff/
> 
> 
> Testing
> ---
> 
> Unit tests are added. Test suite passed.
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



[jira] [Created] (HIVE-4195) Avro SerDe causes incorrect behavior in unrelated tables

2013-03-15 Thread Skye Wanderman-Milne (JIRA)
Skye Wanderman-Milne created HIVE-4195:
--

 Summary: Avro SerDe causes incorrect behavior in unrelated tables
 Key: HIVE-4195
 URL: https://issues.apache.org/jira/browse/HIVE-4195
 Project: Hive
  Issue Type: Bug
Reporter: Skye Wanderman-Milne


When I run a script that first creates an Avro table using the Avro SerDe and 
then immediately creates an LZO text table and inserts data into it, the 
resulting LZO table contains Avro data files. When I remove the Avro CREATE 
TABLE statement, the LZO table contains .lzo files as expected.

{noformat}
DROP TABLE IF EXISTS avro_table;
CREATE EXTERNAL TABLE avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal' = '{
"namespace": "testing.hive.avro.serde",
"name": "test_record",
"type": "record",
"fields": [
{"name":"int1", "type":"long"},
{"name":"string1", "type":"string"}
]
}');

DROP TABLE IF EXISTS lzo_table;
CREATE EXTERNAL TABLE lzo_table (
id int,
bool_col boolean,
tinyint_col tinyint,
smallint_col smallint,
int_col int,
bigint_col bigint,
float_col float,
double_col double,
date_string_col string,
string_col string,
timestamp_col timestamp)
STORED AS 
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
;

SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET mapred.max.split.size=25600;
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
insert overwrite table lzo_table SELECT id, bool_col, tinyint_col, 
smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, 
string_col, timestamp_col FROM src_table;
{noformat}
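One quick way to confirm which format actually landed in the table directory is to check the leading magic bytes of the data files (a hedged sketch; the function name is mine, and it only distinguishes the two formats involved here):

```python
# Avro object container files begin with the 4-byte magic "Obj" + 0x01;
# lzop-compressed files begin with 0x89 "LZO" 0x00 0x0D 0x0A 0x1A 0x0A.
AVRO_MAGIC = b"Obj\x01"
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"

def sniff_format(header: bytes) -> str:
    """Classify a data file by its leading bytes."""
    if header.startswith(AVRO_MAGIC):
        return "avro"
    if header.startswith(LZOP_MAGIC):
        return "lzo"
    return "unknown"
```

Running this over the first few bytes of each file under the lzo_table location would show "avro" for the misplaced files described above.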

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira