[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646170=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646170
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 04:22
Start Date: 03/Sep/21 04:22
Worklog Time Spent: 10m 
  Work Description: ywskycn commented on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-912244652


   @sunchao help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646170)
Time Spent: 2h 10m  (was: 2h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
>                 Key: HIVE-25277
>                 URL: https://issues.apache.org/jira/browse/HIVE-25277
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: All Versions
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the 
> warehouse, for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (each of which ends in a ListFiles call). This fix sorts the 
> parents to delete by path length and always processes the longest one first 
> (e.g., a/b/c is always before a/b). As a result, each parent path needs to be 
> checked only once.
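
The ordering idea in the description can be sketched as follows. This is a minimal illustration and not the code from the pull request: CountingStore, deleteEmptyParents, and the wh/t paths are hypothetical names, and the in-memory store only stands in for an object store where isEmpty triggers an expensive listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ParentDirCleanup {

    /** Hypothetical in-memory stand-in for a store where listing is expensive. */
    static class CountingStore {
        final Set<String> entries = new HashSet<>(); // every existing file or directory
        int isEmptyCalls = 0;

        boolean isEmpty(String dir) {
            isEmptyCalls++; // stands in for the costly ListFiles round trip
            String prefix = dir + "/";
            for (String entry : entries) {
                if (entry.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }

        void delete(String dir) {
            entries.remove(dir);
        }
    }

    /** Longer paths sort first, so a/b/c is always handled before a/b. */
    static final Comparator<String> LONGEST_FIRST = (a, b) -> {
        int byLength = Integer.compare(b.length(), a.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    /**
     * Deletes empty directories below root, longest path first. The sorted
     * set drops duplicates, so isEmpty runs at most once per path.
     */
    static List<String> deleteEmptyParents(Collection<String> parents, String root,
                                           CountingStore store) {
        TreeSet<String> pending = new TreeSet<>(LONGEST_FIRST);
        pending.addAll(parents);
        List<String> deleted = new ArrayList<>();
        while (!pending.isEmpty()) {
            String dir = pending.pollFirst();
            if (!store.isEmpty(dir)) {
                continue;
            }
            store.delete(dir);
            deleted.add(dir);
            int idx = dir.lastIndexOf('/');
            if (idx > 0) {
                String parent = dir.substring(0, idx);
                if (parent.startsWith(root + "/")) {
                    pending.add(parent); // never climbs to the root itself
                }
            }
        }
        return deleted;
    }

    /** Builds a small hypothetical warehouse layout for the demo. */
    static CountingStore demoStore() {
        CountingStore store = new CountingStore();
        store.entries.addAll(Arrays.asList(
            "wh/t", "wh/t/y=2020", "wh/t/y=2020/m=12", "wh/t/y=2020/m=12/data",
            "wh/t/y=2021", "wh/t/y=2021/m=01", "wh/t/y=2021/m=02"));
        return store;
    }

    public static void main(String[] args) {
        CountingStore store = demoStore();
        List<String> deleted = deleteEmptyParents(Arrays.asList(
            "wh/t/y=2021/m=01", "wh/t/y=2021/m=02",
            "wh/t/y=2021/m=01", "wh/t/y=2020/m=12"), "wh/t", store);
        System.out.println(deleted + " after " + store.isEmptyCalls + " isEmpty calls");
    }
}
```

Because a parent path is always shorter than any of its descendants, every descendant is polled before the parent, and a path can never be re-queued after it has been checked.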



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=646129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646129
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 01:29
Start Date: 03/Sep/21 01:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r701523603



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -8142,9 +8146,11 @@ private void dropPartitionAllColumnGrantsNoTxn(
   query.declareParameters("java.lang.String t1");
   mSecurityDCList = (List) query.execute(dcName);
 }
+try (Query q = query) {

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 646129)
Time Spent: 9h 50m  (was: 9h 40m)

> Metastore some JDO query objects do not close properly
> --
>
>                 Key: HIVE-23633
>                 URL: https://issues.apache.org/jira/browse/HIVE-23633
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23633.01.patch
>
>          Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> After applying the patch for 
> [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895], the metastore 
> still has a memory leak on DB resources: many StatementImpls are left unclosed.
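
The diff quoted in this message wraps the query in try (Query q = query). The effect of that pattern can be sketched without a JDO dependency; TrackedQuery below is a hypothetical stand-in for a query object that holds a database statement, not the JDO Query interface itself.

```java
import java.util.ArrayList;
import java.util.List;

class QueryCloseSketch {

    /** Hypothetical stand-in for a query that holds a database statement. */
    static class TrackedQuery implements AutoCloseable {
        private final String name;
        private boolean closed = false;

        TrackedQuery(String name) {
            this.name = name;
        }

        List<String> execute(boolean fail) {
            if (fail) {
                throw new IllegalStateException("query failed: " + name);
            }
            List<String> rows = new ArrayList<>();
            rows.add(name + "-row");
            return rows;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true; // a real query would release its statement here
        }
    }

    /** Runs the query and copies the rows out before the query is closed. */
    static List<String> run(TrackedQuery query, boolean fail) {
        try (TrackedQuery q = query) {
            return new ArrayList<>(q.execute(fail));
        } // close() runs here on both the normal and the exceptional path
    }

    public static void main(String[] args) {
        TrackedQuery query = new TrackedQuery("grants");
        System.out.println(run(query, false) + " closed=" + query.isClosed());
    }
}
```

Copying the rows inside the try block matters in this sketch because results tied to a closed query may no longer be usable afterwards.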





[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=646127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646127
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 01:28
Start Date: 03/Sep/21 01:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r701523382



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -8142,9 +8146,11 @@ private void dropPartitionAllColumnGrantsNoTxn(
   query.declareParameters("java.lang.String t1");
   mSecurityDCList = (List) query.execute(dcName);
 }
+try (Query q = query) {
 pm.retrieveAll(mSecurityDCList);

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 646127)
Time Spent: 9.5h  (was: 9h 20m)

> Metastore some JDO query objects do not close properly
> --
>
>                 Key: HIVE-23633
>                 URL: https://issues.apache.org/jira/browse/HIVE-23633
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23633.01.patch
>
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> After applying the patch for 
> [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895], the metastore 
> still has a memory leak on DB resources: many StatementImpls are left unclosed.





[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=646128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646128
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 01:28
Start Date: 03/Sep/21 01:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r701523455



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -1454,12 +1455,14 @@ public ColumnStatistics getTableStats(final String catName, final String dbName,
   }
 };
 List list = Batchable.runBatched(batchSize, colNames, b);
+final ColumnStatistics result;
 if (list.isEmpty()) {
-  return null;
+  result = null;
+} else {
+  ColumnStatisticsDesc csd = new ColumnStatisticsDesc(true, dbName, tableName);
+  csd.setCatName(catName);
+  result = makeColumnStats(list, csd, 0, engine);
 }

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 646128)
Time Spent: 9h 40m  (was: 9.5h)

> Metastore some JDO query objects do not close properly
> --
>
>                 Key: HIVE-23633
>                 URL: https://issues.apache.org/jira/browse/HIVE-23633
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23633.01.patch
>
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> After applying the patch for 
> [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895], the metastore 
> still has a memory leak on DB resources: many StatementImpls are left unclosed.





[jira] [Work logged] (HIVE-25056) cast ('000-00-00 00:00:00' as timestamp/datetime) results in wrong conversion

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25056?focusedWorklogId=646099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646099
 ]

ASF GitHub Bot logged work on HIVE-25056:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 00:09
Start Date: 03/Sep/21 00:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2221:
URL: https://github.com/apache/hive/pull/2221


   




Issue Time Tracking
---

Worklog Id: (was: 646099)
Time Spent: 40m  (was: 0.5h)

> cast ('000-00-00 00:00:00' as timestamp/datetime) results in wrong conversion 
> --
>
>                 Key: HIVE-25056
>                 URL: https://issues.apache.org/jira/browse/HIVE-25056
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Anurag Shekhar
>            Assignee: Anurag Shekhar
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> select cast ('-00-00' as date) , cast ('000-00-00 00:00:00' as timestamp) 
> +-------------+------------------------+
> |     _c0     |          _c1           |
> +-------------+------------------------+
> | 0002-11-30  | 0002-11-30 00:00:00.0  |
> +-------------+------------------------+





[jira] [Updated] (HIVE-25497) Bump ORC to 1.6.10

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25497:
--
Labels: pull-request-available  (was: )

> Bump ORC to 1.6.10
> --
>
>                 Key: HIVE-25497
>                 URL: https://issues.apache.org/jira/browse/HIVE-25497
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: William Hyun
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25497) Bump ORC to 1.6.10

2021-09-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25497?focusedWorklogId=646097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646097
 ]

ASF GitHub Bot logged work on HIVE-25497:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 00:05
Start Date: 03/Sep/21 00:05
Worklog Time Spent: 10m 
  Work Description: williamhyun opened a new pull request #2615:
URL: https://github.com/apache/hive/pull/2615


   ### What changes were proposed in this pull request?
   This PR aims to bump ORC to version 1.6.10.
   
   ### Why are the changes needed?
   This will bring the latest bug fixes. 
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Pass the CIs. 




Issue Time Tracking
---

Worklog Id: (was: 646097)
Remaining Estimate: 0h
Time Spent: 10m

> Bump ORC to 1.6.10
> --
>
>                 Key: HIVE-25497
>                 URL: https://issues.apache.org/jira/browse/HIVE-25497
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: William Hyun
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2021-09-02 Thread GuangMing Lu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408892#comment-17408892
 ] 

GuangMing Lu commented on HIVE-20682:
-

Hive 3.1.0 does not match the current code; the code in question is not present in version 3.1.0.

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
>                 Key: HIVE-20682
>                 URL: https://issues.apache.org/jira/browse/HIVE-20682
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
> class when we open a new session for a client connection, and by default all 
> queries from this connection share the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referenced via the thread-local hiveDb) if 
> {{Hive.isCompatible}} returns false and sets a new Hive object in the 
> thread-local hiveDb, but it doesn't change the sessionHive object in the 
> session. In contrast, *asynchronous* query execution via async threads never 
> closes the sessionHive object; it just creates a new one if needed and sets it 
> as the thread-local hiveDb.
> So, the problem can occur when an *asynchronous* query being executed by async 
> threads refers to the sessionHive object and the master thread receives a 
> *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread-local hiveDb object with the 
> sessionHive object, which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared by multiple threads, so query 
> execution threads shouldn't be allowed to close it when they re-create the 
> Hive object due to changes in Hive configurations. But the Hive objects 
> created by query execution threads should be closed when the thread exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in the Hive 
> object, which should be set to *false* for *sessionHive*; sessionHive would be 
> forcefully closed when the session is closed or released.
> Also, when we replace the *sessionHive* object with a new one due to changes 
> in *sessionConf*, the old one should be closed when no async thread is 
> referring to it. This can be done using the "*finalize*" method of the Hive 
> object, where we can close the HMS connection when the Hive object is garbage 
> collected.
> cc [~pvary]
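
The proposed flag can be sketched roughly as follows. This is only an illustration of the idea in the description, not the committed patch: Client is a hypothetical stand-in for the Hive object, and the close(boolean force) signature is an assumption made for the example.

```java
class AllowCloseSketch {

    /** Hypothetical stand-in for the Hive object that owns a metastore connection. */
    static class Client {
        private boolean allowClose = true; // default from the proposal
        private boolean closed = false;

        synchronized void setAllowClose(boolean allowClose) {
            this.allowClose = allowClose;
        }

        synchronized boolean isClosed() {
            return closed;
        }

        /**
         * Query threads call close(false); the request is ignored for a shared
         * object. The session itself calls close(true) when it is released.
         */
        synchronized void close(boolean force) {
            if (force || allowClose) {
                closed = true; // a real client would drop its connection here
            }
        }
    }

    public static void main(String[] args) {
        Client sessionClient = new Client();
        sessionClient.setAllowClose(false); // shared across query threads

        sessionClient.close(false);         // a query thread tries to close it
        System.out.println("after query thread: closed=" + sessionClient.isClosed());

        sessionClient.close(true);          // session close or release
        System.out.println("after session close: closed=" + sessionClient.isClosed());
    }
}
```

An object created by a query thread keeps the default flag, so the same close(false) call does close it when that thread exits.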





[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2021-09-02 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-20682:

Affects Version/s: (was: 3.1.0)






[jira] [Commented] (HIVE-25494) Hive query fails with IndexOutOfBoundsException when a struct type column's field is missing in parquet file schema but present in table schema

2021-09-02 Thread Ganesha Shreedhara (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408779#comment-17408779
 ] 

Ganesha Shreedhara commented on HIVE-25494:
---

I see that [HIVE-15156|https://issues.apache.org/jira/browse/HIVE-15156] was 
opened to support nested column pruning in the vectorized Parquet reader.

[~Ferd] I have a question on 
[HIVE-13873|https://issues.apache.org/jira/browse/HIVE-13873]. Along with 
pruning the columns based on the selected fields, can we also prune the columns 
that don't exist in the Parquet file schema and read only the columns or nested 
fields that exist in the file? This happens when Parquet columns are accessed 
by index but doesn't happen when columns are accessed by name because of 
[this 
line|https://github.com/apache/hive/blame/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130].
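
The pruning asked about here could look roughly like the sketch below. It uses a hypothetical Field type rather than the Parquet schema classes, and the nested field names a, b and c are invented for the example; only the column names parent and toplevel come from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaPruneSketch {

    /** Hypothetical schema node: children == null marks a primitive field. */
    static class Field {
        final String name;
        final List<Field> children;

        Field(String name, List<Field> children) {
            this.name = name;
            this.children = children;
        }

        static Field primitive(String name) {
            return new Field(name, null);
        }

        static Field struct(String name, Field... children) {
            return new Field(name, Arrays.asList(children));
        }

        boolean isStruct() {
            return children != null;
        }
    }

    static Field find(List<Field> fields, String name) {
        for (Field f : fields) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        return null;
    }

    /** Keeps only requested fields that also exist, by name, in the file schema. */
    static List<Field> prune(List<Field> requested, List<Field> file) {
        List<Field> kept = new ArrayList<>();
        for (Field want : requested) {
            Field have = find(file, want.name);
            if (have == null || want.isStruct() != have.isStruct()) {
                continue; // absent from the file, or a different kind of field
            }
            if (!want.isStruct()) {
                kept.add(want);
                continue;
            }
            List<Field> kids = prune(want.children, have.children);
            if (!kids.isEmpty()) {
                kept.add(new Field(want.name, kids)); // never emit a struct with no children
            }
        }
        return kept;
    }

    /** Renders a schema as text, e.g. parent{a},toplevel. */
    static String describe(List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        for (Field f : fields) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(f.name);
            if (f.isStruct()) {
                sb.append('{').append(describe(f.children)).append('}');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Field> table = Arrays.asList(
            Field.struct("parent", Field.primitive("a"), Field.primitive("b")),
            Field.primitive("toplevel"));
        List<Field> file = Arrays.asList(
            Field.struct("parent", Field.primitive("a")),
            Field.primitive("toplevel"));
        System.out.println(describe(prune(table, file)));
    }
}
```

Dropping a struct whose children are all missing is the design choice that matters in this sketch, since a group with no children is what the issue reports as the trigger for the failure.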

> Hive query fails with IndexOutOfBoundsException when a struct type column's 
> field is missing in parquet file schema but present in table schema
> ---
>
>                 Key: HIVE-25494
>                 URL: https://issues.apache.org/jira/browse/HIVE-25494
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet
>    Affects Versions: 3.1.2
>            Reporter: Ganesha Shreedhara
>            Priority: Major
>              Labels: schema-evolution
>         Attachments: test-struct.parquet
>
>
> When a struct type column's field is missing in the Parquet file schema but 
> present in the table schema and columns are accessed by name, the 
> requestedSchema sent from Hive to the Parquet storage layer has a type even 
> for the missing field, since we always add a primitive type if a field is 
> missing in the file schema (Ref: 
> [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L130]).
>  On the Parquet side, this missing field gets pruned, and since this field 
> belongs to a struct type, it ends up creating a GroupColumnIO without any 
> children. This causes the query to fail with an IndexOutOfBoundsException; the 
> stack trace is given below.
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file test-struct.parquet
>  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
>  at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
>  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
>  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>  at java.util.ArrayList.get(ArrayList.java:433)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:102)
>  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:97)
>  at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:277)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:135)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:101)
>  at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
>  at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:101)
>  at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:140)
>  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
>  {code}
>  
> Steps to reproduce:
>  
> {code:java}
> CREATE TABLE parquet_struct_test(
> `parent` struct COMMENT '',
> `toplevel` string COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
>  
> -- Use the attached test-struct.parquet data