Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-13 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197464
---


Ship it!




Thanks for the changes Ferdinand. Patch looks good to me.

- Vihang Karajgaonkar


On Feb. 14, 2018, 1:51 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 14, 2018, 1:51 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/8/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-13 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 14, 2018, 9:51 a.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/8/

Changes: https://reviews.apache.org/r/65478/diff/7-8/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-13 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 14, 2018, 9:30 a.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/7/

Changes: https://reviews.apache.org/r/65478/diff/6-7/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 13, 2018, 3:43 p.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/6/

Changes: https://reviews.apache.org/r/65478/diff/5-6/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread cheng xu


> On Feb. 13, 2018, 8:50 a.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
> > Lines 253-254 (original), 255-256 (patched)
> > 
> >
> > before the patch, the code assumes that the ColumnReader's type matches 
> > with what the hive's type is. So these two lines made sense. But after the 
> > patch, it is possible that the underlying parquet type does not have 
> > decimalMetadata so it these lines will throw a NPE. Instead of throwing 
> > NPE, we should check if decimalMetadata is set and throw IOException here 
> > saying that underlying type cannot be converted to decimal. What do you 
> > think?
> > 
> > Also, may be we should add a TODO here to use default precision and 
> > scale if its not available for the future.
> 
> cheng xu wrote:
> Why do we need the default precision and scale here? We can hardly read 
> data as decimal if those values are missing.
> 
> Vihang Karajgaonkar wrote:
> hmm .. I am not a 100% sure but I thought HiveDecimal defines something 
> like USER_DEFAULT_PRECISION and USER_DEFAULT_SCALE which is used when user 
> doesn't provide scale and precision. This is not important for this change so 
> you can ignore the comment related to default precision and scale. But I 
> think we should protect against a NPE which will be thrown if the user tries 
> to change say an int column to a decimal(10,2) (I know it doesn't work 
> anyways, but I think it would be better to throw an exception rather than 
> failing with a NPE)

+1 for the checking part. Patch is updated addressing that.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197313
---


On Feb. 13, 2018, 3:34 p.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 13, 2018, 3:34 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/5/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 13, 2018, 3:34 p.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/package-info.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/5/

Changes: https://reviews.apache.org/r/65478/diff/4-5/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread Vihang Karajgaonkar via Review Board


> On Feb. 13, 2018, 12:50 a.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
> > Lines 253-254 (original), 255-256 (patched)
> > 
> >
> > before the patch, the code assumes that the ColumnReader's type matches 
> > with what the hive's type is. So these two lines made sense. But after the 
> > patch, it is possible that the underlying parquet type does not have 
> > decimalMetadata so it these lines will throw a NPE. Instead of throwing 
> > NPE, we should check if decimalMetadata is set and throw IOException here 
> > saying that underlying type cannot be converted to decimal. What do you 
> > think?
> > 
> > Also, may be we should add a TODO here to use default precision and 
> > scale if its not available for the future.
> 
> cheng xu wrote:
> Why do we need the default precision and scale here? We can hardly read 
> data as decimal if those values are missing.

hmm .. I am not a 100% sure but I thought HiveDecimal defines something like 
USER_DEFAULT_PRECISION and USER_DEFAULT_SCALE which is used when user doesn't 
provide scale and precision. This is not important for this change so you can 
ignore the comment related to default precision and scale. But I think we 
should protect against a NPE which will be thrown if the user tries to change 
say an int column to a decimal(10,2) (I know it doesn't work anyways, but I 
think it would be better to throw an exception rather than failing with a NPE)


- Vihang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197313
---


On Feb. 12, 2018, 8:25 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 12, 2018, 8:25 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/4/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread cheng xu


> On Feb. 13, 2018, 8:50 a.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
> > Lines 253-254 (original), 255-256 (patched)
> > 
> >
> > before the patch, the code assumes that the ColumnReader's type matches 
> > with what the hive's type is. So these two lines made sense. But after the 
> > patch, it is possible that the underlying parquet type does not have 
> > decimalMetadata so it these lines will throw a NPE. Instead of throwing 
> > NPE, we should check if decimalMetadata is set and throw IOException here 
> > saying that underlying type cannot be converted to decimal. What do you 
> > think?
> > 
> > Also, may be we should add a TODO here to use default precision and 
> > scale if its not available for the future.

Why do we need the default precision and scale here? We can hardly read data as 
decimal if those values are missing.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197313
---


On Feb. 12, 2018, 4:25 p.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 12, 2018, 4:25 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/4/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review197313
---


Fix it, then Ship it!




Overall the patch looks good to me. Some comments below.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
Line 65 (original), 69 (patched)


can we rename this member to something more easily understandable like 
dictionaryBasedReader? The name of variable suggests its Dictionary but in fact 
its a DataColumnReader



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
Lines 57 (patched)


nit, some of the methods are missing the @Override



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
Line 156 (original), 159 (patched)


why do we need to remove this check and return 1 or 0?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
Line 175 (original), 179 (patched)


this method has duplicate code, from 
VectorizedPrimitiveColumnReader#decodeDictionaryIds. Can we refactor it to 
remove duplicate code. I think its okay if we do it in a separate change.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
Lines 89 (patched)


previously, this reader didn't support INTERVAL_DAY_TIME. Does this mean 
that it supports this hive type? If yes, shouldn't this change in 
decodeDictionaryIds method too?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
Lines 253-254 (original), 255-256 (patched)


before the patch, the code assumes that the ColumnReader's type matches 
with what the hive's type is. So these two lines made sense. But after the 
patch, it is possible that the underlying parquet type does not have 
decimalMetadata so it these lines will throw a NPE. Instead of throwing NPE, we 
should check if decimalMetadata is set and throw IOException here saying that 
underlying type cannot be converted to decimal. What do you think?

Also, may be we should add a TODO here to use default precision and scale 
if its not available for the future.


- Vihang Karajgaonkar


On Feb. 12, 2018, 8:25 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 12, 2018, 8:25 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
>  c36640d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
>  39689f1 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/4/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-12 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 12, 2018, 4:25 p.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/4/

Changes: https://reviews.apache.org/r/65478/diff/3-4/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-09 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 9, 2018, 8:07 p.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
 c36640d 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
 39689f1 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
 3e5d831 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q
 PRE-CREATION 
  
ql/src/test/queries/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q
 PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_dictionary_encoding.q.out
 PRE-CREATION 
  
ql/src/test/results/clientpositive/schema_evol_par_vec_table_non_dictionary_encoding.q.out
 PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/3/

Changes: https://reviews.apache.org/r/65478/diff/2-3/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-06 Thread cheng xu


> On Feb. 6, 2018, 3:06 p.m., Vihang Karajgaonkar wrote:
> > ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
> > Lines 71 (patched)
> > 
> >
> > Can you please test using timestamps as well. Specifically, the 
> > following should work.
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts string);
> > select * from test_alter2;
> > 
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts varchar(19));
> > -- this should truncate the microseconds
> > select * from test_alter2;
> > 
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts char(25);
> > select * from test_alter2;
> 
> cheng xu wrote:
> I test non vectorization path and all of these test cases failed due to 
> not abled to type cast error. Any thoughts on this?

Never mind. It's passed in qtest. I will investigate on this.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196867
---


On Feb. 5, 2018, 4:46 p.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 4:46 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-06 Thread cheng xu


> On Feb. 6, 2018, 3:06 p.m., Vihang Karajgaonkar wrote:
> > ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
> > Lines 71 (patched)
> > 
> >
> > Can you please test using timestamps as well. Specifically, the 
> > following should work.
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts string);
> > select * from test_alter2;
> > 
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts varchar(19));
> > -- this should truncate the microseconds
> > select * from test_alter2;
> > 
> > 
> > drop table test_alter2;
> > create table test_alter2 (ts timestamp) stored as parquet;
> > insert into test_alter2 values ('2018-01-01 13:14:15.123456'), 
> > ('2018-01-02 14:15:16.123456'), ('2018-01-03 16:17:18.123456');
> > select * from test_alter2;
> > alter table test_alter2 replace columns (ts char(25);
> > select * from test_alter2;

I test non vectorization path and all of these test cases failed due to not 
abled to type cast error. Any thoughts on this?


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196867
---


On Feb. 5, 2018, 4:46 p.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 4:46 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-05 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196867
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
Lines 114 (patched)


nit, It should probably say "Implementation is consistent with .." Same 
with TypesFromInt64PageReader and TypesFromFloatPageReader below. Thanks



ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
Lines 9 (patched)


Does this test work when dictionary encoding is both enabled/disabled? You 
can change parquet dictionary encoding by setting table property 
"parquet.enable.dictionary"="true" or "false". Can you please modify the test 
to include that as well?



ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
Lines 49 (patched)


I think this comment can be removed now so that there is no confusion in 
the future.



ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
Lines 71 (patched)


Can you please test using timestamps as well. Specifically, the following 
should work.

drop table test_alter2;
create table test_alter2 (ts timestamp) stored as parquet;
insert into test_alter2 values ('2018-01-01 13:14:15.123456'), ('2018-01-02 
14:15:16.123456'), ('2018-01-03 16:17:18.123456');
select * from test_alter2;
alter table test_alter2 replace columns (ts string);
select * from test_alter2;

drop table test_alter2;
create table test_alter2 (ts timestamp) stored as parquet;
insert into test_alter2 values ('2018-01-01 13:14:15.123456'), ('2018-01-02 
14:15:16.123456'), ('2018-01-03 16:17:18.123456');
select * from test_alter2;
alter table test_alter2 replace columns (ts varchar(19));
-- this should truncate the microseconds
select * from test_alter2;

drop table test_alter2;
create table test_alter2 (ts timestamp) stored as parquet;
insert into test_alter2 values ('2018-01-01 13:14:15.123456'), ('2018-01-02 
14:15:16.123456'), ('2018-01-03 16:17:18.123456');
select * from test_alter2;
alter table test_alter2 replace columns (ts char(25);
select * from test_alter2;


- Vihang Karajgaonkar


On Feb. 5, 2018, 8:46 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 8:46 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-05 Thread cheng xu


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> >

Thanks Vihang for your review. Comments left below.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
> > Lines 24 (patched)
> > 
> >
> > Do we need to override the methods for other reads as well? What is the 
> > criteria to identify the methods which need to be overriden for this and 
> > TypesFromInt64PageReader?
> 
> Jerry Chen wrote:
> Current Ferdinand follows the same conversion principles of 
> ETypeConverter in Hive. The basic conversion implemented in ETypeConverter 
> is: low precision data type can convert to high precison data type. for 
> int32: it can be converted to int64, float, double. for int64: it can be 
> converted to float, double. For float: it can be converted to double.
> 
> Although other conversions can logically supported such as int64 to int32 
> or double to float. But that is just not always safe.

Yes, as Jerry mentioned, our current implementation stays the same rules as non 
vectorization path. To support other conversion, we may need to ensure both 
vectorization and non vectorization have the same behavior since vectorization 
is just a performance feature.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
> > Lines 9 (patched)
> > 
> >
> > can you please confirm that the result of this q file matches with 
> > non-vectorized execution?

Yes, I checked the output as well by disabling the vectorization. And they have 
the same output as vectorization enabled case.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out
> > Lines 140 (patched)
> > 
> >
> > this is interesting. Do you know why this row is returned first?

The order is the same as non vectorization read path. It may result from the 
order of generated files.


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196718
---


On Feb. 5, 2018, 4:46 p.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 4:46 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-05 Thread Jerry Chen


> On Feb. 5, 2018, 5:46 p.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
> > Lines 24 (patched)
> > 
> >
> > Do we need to override the methods for other reads as well? What is the 
> > criteria to identify the methods which need to be overriden for this and 
> > TypesFromInt64PageReader?

Current Ferdinand follows the same conversion principles of ETypeConverter in 
Hive. The basic conversion implemented in ETypeConverter is: low precision data 
type can convert to high precison data type. for int32: it can be converted to 
int64, float, double. for int64: it can be converted to float, double. For 
float: it can be converted to double.

Although other conversions can logically supported such as int64 to int32 or 
double to float. But that is just not always safe.


- Jerry


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196718
---


On Feb. 5, 2018, 8:46 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 8:46 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-05 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196718
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
Lines 24 (patched)


Do we need to override the methods for other reads as well? What is the 
criteria to identify the methods which need to be overriden for this and 
TypesFromInt64PageReader?



ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
Lines 9 (patched)


can you please confirm that the result of this q file matches with 
non-vectorized execution?



ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out
Lines 140 (patched)


this is interesting. Do you know why this row is returned first?


- Vihang Karajgaonkar


On Feb. 5, 2018, 8:46 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 5, 2018, 8:46 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-05 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/
---

(Updated Feb. 5, 2018, 4:46 p.m.)


Review request for hive.


Repository: hive-git


Description
---

VectorizedParquetReader throws an exception when trying to reading from a 
parquet table on which new columns are added.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
 907a9b8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
 08ac57b 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
 9e414dc 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
 5d3ebd6 
  ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
  ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/65478/diff/2/

Changes: https://reviews.apache.org/r/65478/diff/1-2/


Testing
---

Newly added UT passed and qtest passed locally.


Thanks,

cheng xu



Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

2018-02-04 Thread Jerry Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196793
---




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
Lines 92 (patched)


For the types that doesn't support any type conversion, we can simply 
return the realReader, instead of a wrapper but without do anything around it.


- Jerry Chen


On Feb. 2, 2018, 8:46 a.m., cheng xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> ---
> 
> (Updated Feb. 2, 2018, 8:46 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/DefaultParquetDataColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java
>  3e5d831 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/1/
> 
> 
> Testing
> ---
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>