[jira] [Resolved] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-29 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-25400.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review, Panos!

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes, so it can call ensureValPreallocated multiple times.
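
For context, a minimal sketch of the calling pattern this enables, using the preallocation methods from hive-storage-api's BytesColumnVector (the sizes and the redact-mask trigger are illustrative):

{code:java}
BytesColumnVector vector = new BytesColumnVector();
vector.init();

// A redact mask may discover mid-value that it needs more room, so it
// must be able to reserve space more than once before committing.
vector.ensureValPreallocated(16);
vector.ensureValPreallocated(32);

byte[] buffer = vector.getValPreallocatedBytes();
int start = vector.getValPreallocatedStart();
// ... write up to 32 bytes at buffer[start] ...

// With this change, the offset bookkeeping happens here, once the
// final length of row 0 is known.
vector.setValPreallocated(0, 32);
{code}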



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-25400:



> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes, so it can call ensureValPreallocated multiple times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-07-26 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-25190:
-
Fix Version/s: storage-2.9.0
   storage-2.8.1
   storage-2.7.3

> BytesColumnVector fails when the aggregate size is > 1gb
> 
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb. 
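
For context, the failure is plain Java int overflow; a tiny illustration (not the exact Hive code):

{code:java}
int length = 1 << 30;        // smallBuffer.length approaching 1gb
int newLength = length * 2;  // doubling overflows a Java int...
assert newLength < 0;        // ...which is what the guard above detects
{code}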



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25386) hive-storage-api should not have guava compile dependency

2021-07-26 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-25386.
--
Fix Version/s: storage-2.9.0
   storage-2.8.1
   Resolution: Fixed

I committed this, thanks, Dongjoon!

> hive-storage-api should not have guava compile dependency
> -
>
> Key: HIVE-25386
> URL: https://issues.apache.org/jira/browse/HIVE-25386
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: storage-2.8.1, storage-2.9.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api/2.8.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-06-02 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-25190:



> BytesColumnVector fails when the aggregate size is > 1gb
> 
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24458) Allow access to SArgs without converting to disjunctive normal form

2020-11-30 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-24458:



> Allow access to SArgs without converting to disjunctive normal form
> ---
>
> Key: HIVE-24458
> URL: https://issues.apache.org/jira/browse/HIVE-24458
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> For some use cases, it is useful to have access to the SArg expression in a 
> non-normalized form. Currently, the SArg only provides the fully normalized 
> expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24455) Fix broken junit framework in storage-api

2020-11-30 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-24455:



> Fix broken junit framework in storage-api
> -
>
> Key: HIVE-24455
> URL: https://issues.apache.org/jira/browse/HIVE-24455
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> The use of junit is broken in storage-api. It results in no tests being found.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21174) hive.stats.ndv.error parameter documentation issue

2020-11-24 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-21174:
-
Fix Version/s: (was: 3.10)

> hive.stats.ndv.error parameter documentation issue
> --
>
> Key: HIVE-21174
> URL: https://issues.apache.org/jira/browse/HIVE-21174
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 3.0.0, 2.3.1, 
> 2.3.2, 3.1.0, 2.0.2, 2.1.2, 2.2.1, 2.3.3, 2.3.4, 2.4.0, 3.10, 3.0.1, 3.1.1, 
> 3.1.2, 3.2.0
>Reporter: Pablo Junge
>Assignee: Pablo Junge
>Priority: Major
> Fix For: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 3.0.0, 2.3.1, 
> 2.3.2, 3.1.0, 2.0.2, 2.1.2, 2.2.1, 2.3.3, 2.3.4, 2.4.0, 3.0.1, 3.1.1, 3.2.0
>
>
> Hive documentation for hive.stats.ndv.error does not specify that 
> hive.stats.ndv.error will only affect FM Sketch and not HLL.
>  
> https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-12679) Allow users to be able to specify an implementation of IMetaStoreClient via HiveConf

2020-06-23 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143114#comment-17143114
 ] 

Owen O'Malley commented on HIVE-12679:
--

Does the patch still apply to trunk?

> Allow users to be able to specify an implementation of IMetaStoreClient via 
> HiveConf
> 
>
> Key: HIVE-12679
> URL: https://issues.apache.org/jira/browse/HIVE-12679
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Metastore, Query Planning
>Reporter: Austin Lee
>Priority: Minor
>  Labels: metastore
> Attachments: HIVE-12679.1.patch, HIVE-12679.2.patch, 
> HIVE-12679.branch-1.2.patch, HIVE-12679.branch-2.3.patch, HIVE-12679.patch
>
>
> Hi,
> I would like to propose a change that would make it possible for users to 
> choose an implementation of IMetaStoreClient via HiveConf, i.e. 
> hive-site.xml.  Currently, in Hive the choice is hard coded to be 
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.  There 
> is no direct reference to SessionHiveMetaStoreClient other than the 
> hard coded class name in Hive.java, and the QL component operates only on the 
> IMetaStoreClient interface, so the change would be minimal and quite similar 
> to how an implementation of RawStore is specified and loaded in 
> hive-metastore.  One use case this change would serve is a user who wishes to 
> use an implementation of this interface without the dependency on the Thrift 
> server.
>   
> Thank you,
> Austin
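
For illustration, the loading this proposal describes could look like the sketch below; the property name and the constructor signature are assumptions of this sketch, not an existing Hive API:

{code:java}
static IMetaStoreClient loadClient(HiveConf conf) throws Exception {
  // "hive.metastore.client.class" is a hypothetical property name.
  String className = conf.get("hive.metastore.client.class",
      "org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient");
  // Instantiate reflectively, mirroring how RawStore implementations
  // are specified and loaded.
  Class<? extends IMetaStoreClient> cls =
      Class.forName(className).asSubclass(IMetaStoreClient.class);
  return cls.getDeclaredConstructor(HiveConf.class).newInstance(conf);
}
{code}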



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23215) Make FilterContext and MutableFilterContext interfaces

2020-04-15 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-23215:



> Make FilterContext and MutableFilterContext interfaces
> --
>
> Key: HIVE-23215
> URL: https://issues.apache.org/jira/browse/HIVE-23215
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> HIVE-22959 introduced FilterContext to support ORC-577. The duplication of 
> fields between the FilterContext and VectorizedRowBatch seems likely to cause 
> user confusion. This patch makes them interfaces that VectorizedRowBatch 
> implements.
> Thus, there is a single copy of the data and no need to copy them back and 
> forth. LLAP can make its own implementation of the interfaces if it doesn't 
> want to use VectorizedRowBatch.
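
A minimal sketch of that shape; the method names are assumed to mirror the selection fields, not taken from the final patch:

{code:java}
interface FilterContext {
  boolean isSelectedInUse();
  int[] getSelected();
  int getSelectedSize();
}

interface MutableFilterContext extends FilterContext {
  void setFilterContext(boolean selectedInUse, int[] selected, int size);
}

// A batch exposing its existing selection state through both views,
// so no data is copied back and forth (fields mirror VectorizedRowBatch).
class Batch implements MutableFilterContext {
  boolean selectedInUse;
  int[] selected = new int[1024];
  int size;

  public boolean isSelectedInUse() { return selectedInUse; }
  public int[] getSelected() { return selected; }
  public int getSelectedSize() { return size; }
  public void setFilterContext(boolean inUse, int[] sel, int n) {
    selectedInUse = inUse; selected = sel; size = n;
  }
}
{code}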



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22959) Extend storage-api to expose FilterContext

2020-03-30 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071287#comment-17071287
 ] 

Owen O'Malley commented on HIVE-22959:
--

Ok, I need to make some time tomorrow to go through ORC-577, but this feels 
like a bad direction. In particular, all of the public ORC and storage apis use 
VectorizedRowBatch. Making a second copy of the API will increase user 
confusion.

> Extend storage-api to expose FilterContext
> --
>
> Key: HIVE-22959
> URL: https://issues.apache.org/jira/browse/HIVE-22959
> Project: Hive
>  Issue Type: Sub-task
>  Components: storage-api
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, storage-2.7.2
>
> Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch, 
> HIVE-22959.3.patch, HIVE-22959.4.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level ORC-577, or as an extension 
> ProDecode MapJoin HIVE-22731 we need a common context class that will hold 
> all the needed information for the filter.
> I propose this class be part of the storage-api, similar to the 
> VectorizedRowBatch class, and hold the information below:
>  * A boolean variable showing if the filter is enabled
>  * An int array storing the row Ids that are actually selected (passing the 
> filter)
>  * An int variable storing the number of rows that passed the filter
>  
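
For illustration, a consumer honoring such a context would only touch the selected rows; a sketch assuming accessor-style names for the three members:

{code:java}
static long sum(FilterContext ctx, long[] column, int rowCount) {
  long total = 0;
  if (ctx.isSelectedInUse()) {
    int[] selected = ctx.getSelected();
    for (int i = 0; i < ctx.getSelectedSize(); i++) {
      total += column[selected[i]];   // only rows that passed the filter
    }
  } else {
    for (int r = 0; r < rowCount; r++) {
      total += column[r];             // filter disabled: every row
    }
  }
  return total;
}
{code}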



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22959) Extend storage-api to expose FilterContext

2020-03-30 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1707#comment-1707
 ] 

Owen O'Malley commented on HIVE-22959:
--

Sorry for coming in late, but I was looking at the proposed release 
(storage-2.7.2rc0) and I'm confused. Doesn't this all duplicate the information 
that is already in VectorizedRowBatch? How is this going to be used?

> Extend storage-api to expose FilterContext
> --
>
> Key: HIVE-22959
> URL: https://issues.apache.org/jira/browse/HIVE-22959
> Project: Hive
>  Issue Type: Sub-task
>  Components: storage-api
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, storage-2.7.2
>
> Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch, 
> HIVE-22959.3.patch, HIVE-22959.4.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level ORC-577, or as an extension 
> ProDecode MapJoin HIVE-22731 we need a common context class that will hold 
> all the needed information for the filter.
> I propose this class be part of the storage-api, similar to the 
> VectorizedRowBatch class, and hold the information below:
>  * A boolean variable showing if the filter is enabled
>  * An int array storing the row Ids that are actually selected (passing the 
> filter)
>  * An int variable storing the number of rows that passed the filter
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22555) Upgrade ORC version to 1.5.8

2019-11-27 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983843#comment-16983843
 ] 

Owen O'Malley commented on HIVE-22555:
--

The method for MemoryManager has moved from addedRow to checkMemory. Any test 
that implements a MemoryManager should be defining checkMemory instead of 
addedRow. The default implementation was added to prevent the worst of the 
breakages.

One thought: we should probably delete the TestOrcFile.testMemoryManagement* 
tests, since they don't add a lot of value now that the equivalent tests are 
down in ORC. The others need to be ported to the new API.

> Upgrade ORC version to 1.5.8
> 
>
> Key: HIVE-22555
> URL: https://issues.apache.org/jira/browse/HIVE-22555
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> Hive currently depends on ORC 1.5.6. We need 1.5.8 upgrade for 
> https://issues.apache.org/jira/browse/HIVE-22499
> ORC-1.5.7 includes https://issues.apache.org/jira/browse/ORC-361 . It causes 
> some tests overriding MemoryManager to fail. These need to be addressed while 
> upgrading.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-26 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-22405:
-
Fix Version/s: 3.1.3

> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0, 3.1.3, storage-2.7.1
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch, 
> HIVE-22405.04.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}
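
A short usage sketch of the proposed methods (DateColumnVector and the two-boolean Java signature are from hive-storage-api; the day value is illustrative):

{code:java}
DateColumnVector dates = new DateColumnVector(1024);
dates.vector[0] = -150000;     // days since 1970-01-01, well before 1582

// Reinterpret the stored days as proleptic and rewrite the data.
dates.changeCalendar(true, true);

assert dates.usingProlepticCalendar();
{code}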



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-26 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-22405:
-
Fix Version/s: storage-2.7.1

> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0, storage-2.7.1
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch, 
> HIVE-22405.04.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-22 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980565#comment-16980565
 ] 

Owen O'Malley commented on HIVE-22405:
--

Other than not using UTC, your patch looks fine.

> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch, 
> HIVE-22405.03.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-22 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980564#comment-16980564
 ] 

Owen O'Malley commented on HIVE-22405:
--

Your conversion routine needs to use UTC.

I've updated the patch at:  https://github.com/omalley/hive/tree/hive-22405

> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch, 
> HIVE-22405.03.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-22 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980553#comment-16980553
 ] 

Owen O'Malley commented on HIVE-22405:
--

Patch 03 is failing a test case for me:

{code}
Running org.apache.hadoop.hive.ql.exec.vector.TestDateColumnVector
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.09 sec <<< 
FAILURE! - in org.apache.hadoop.hive.ql.exec.vector.TestDateColumnVector
testProlepticCalendar(org.apache.hadoop.hive.ql.exec.vector.TestDateColumnVector)
  Time elapsed: 0.035 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<2015-11-2[9]> but was:<2015-11-2[8]>
at 
org.apache.hadoop.hive.ql.exec.vector.TestDateColumnVector.setDateAndVerifyProlepticUpdate(TestDateColumnVector.java:76)
at 
org.apache.hadoop.hive.ql.exec.vector.TestDateColumnVector.testProlepticCalendar(TestDateColumnVector.java:45)
{code}

> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch, 
> HIVE-22405.03.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-11-07 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969653#comment-16969653
 ] 

Owen O'Malley commented on HIVE-22405:
--

It looks like the DateColumnVector is interpreting the longs as millis, while I 
believe the correct interpretation is the days since 01-01-1970.

You can consider passing the number of values to update, because the number of 
values set may be significantly smaller than the length. On the other hand, 
this will be fine in the vast majority of cases.

I'd suggest that you calculate an upper bound on the date of divergence between 
the calendars (eg. 1 Nov 1582) as a static and use that as a first pass where 
no conversion is necessary. It would be great to have the vast majority of our 
users not be slowed down by this code.

You also need to add tests that convert from proleptic calendar to 
non-proleptic.
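
A sketch of the suggested fast path; the switchover constant is computed with java.time and the slow-path helper is hypothetical:

{code:java}
// First Gregorian day (1582-10-15) as days since the 1970-01-01 epoch.
static final long SWITCHOVER_DAYS =
    java.time.LocalDate.of(1582, 10, 15).toEpochDay();

static void toProleptic(long[] days, int size) {
  for (int i = 0; i < size; i++) {
    if (days[i] >= SWITCHOVER_DAYS) {
      continue;                     // vast majority of dates: no conversion
    }
    days[i] = slowConvert(days[i]); // hypothetical per-value conversion
  }
}

static long slowConvert(long day) {
  return day;                       // calendar math elided in this sketch
}
{code}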



> Add ColumnVector support for ProlepticCalendar
> --
>
> Key: HIVE-22405
> URL: https://issues.apache.org/jira/browse/HIVE-22405
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22405.01.patch, HIVE-22405.02.patch
>
>
> Hive recently moved its processing to the proleptic calendar, which has 
> created some issues for users who have dates before 1580 AD.
> I'd propose extending the column vectors for times & dates to encode which 
> calendar they are using.
> * create DateColumnVector that extends LongColumnVector
> * add a method to change calendars to both DateColumnVector and 
> TimestampColumnVector.
> {code}
>   /**
>* Change the calendar to or from proleptic. If the new and old values of 
> the flag are the
>* same, nothing is done.
>* useProleptic - set the flag for the proleptic calendar
>* updateData - change the data to match the new value of the flag.
>*/
>   void changeCalendar(useProleptic: boolean, updateData: boolean);
>   /**
>* Detect whether this data is using the proleptic calendar.
>*/
>   boolean usingProlepticCalendar();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22105) Update ORC to 1.5.6.

2019-08-13 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-22105:



> Update ORC to 1.5.6.
> 
>
> Key: HIVE-22105
> URL: https://issues.apache.org/jira/browse/HIVE-22105
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> ORC has had some important fixes in the 1.5 branch and they should be picked 
> up by Hive.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-21848) Table property name definition between ORC and Parquet encryption

2019-07-08 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880790#comment-16880790
 ] 

Owen O'Malley commented on HIVE-21848:
--

[~sha...@uber.com], I think the appropriate response is to throw an exception 
if there are conflicting directions about how to encrypt.

For now, I don't think we should add an exemption list of children to not 
encrypt, although if there are user-requests we can add it later.

> Table property name definition between ORC and Parquet encryption
> 
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption (e.g. which columns are sensitive, which master key 
> to use, the algorithm, etc.), table properties can be used. It is important 
> that Parquet and ORC use unified names.
> According to the slide 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several approaches; using table 
> properties is one of the options, but there is no detailed design of the 
> table property names yet.
> So it is a good time for the two communities to agree on a unified superset 
> of table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, 
> and ORC might support AES_CTR.
>  # Encryption footer - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted 
> footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should have enough 
> information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example ‘address.zipcode’. It is up to the KMS to define what key 
> metadata is. The metadata should have enough information for the KMS to 
> figure out the corresponding key.|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21848) Table property name definition between ORC and Parquet encryption

2019-06-24 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871852#comment-16871852
 ] 

Owen O'Malley commented on HIVE-21848:
--

You might also want to look at the ORC KeyProvider API since that is how ORC 
gets the key metadata and local keys. It currently is implemented by the 
Hadoop/Ranger KMS or an in memory implementation. It will also be implemented 
for the Amazon and Azure KMS.

https://github.com/apache/orc/blob/master/java/shims/src/java/org/apache/orc/impl/HadoopShims.java#L150

> Table property name definition between ORC and Parquet encryption
> 
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption (e.g. which columns are sensitive, which master key 
> to use, the algorithm, etc.), table properties can be used. It is important 
> that Parquet and ORC use unified names.
> According to the slide 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several approaches; using table 
> properties is one of the options, but there is no detailed design of the 
> table property names yet.
> So it is a good time for the two communities to agree on a unified superset 
> of table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, 
> and ORC might support AES_CTR.
>  # Encryption footer - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted 
> footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should have enough 
> information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example ‘address.zipcode’. It is up to the KMS to define what key 
> metadata is. The metadata should have enough information for the KMS to 
> figure out the corresponding key.|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21848) Table property name definition between ORC and Parquet encryption

2019-06-24 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871532#comment-16871532
 ] 

Owen O'Malley commented on HIVE-21848:
--

For ORC, given a structure like:

{code}
table Customers (
  name: struct<...>,
  address: struct<street: ..., ...>,
  credit_cards: list<struct<card_number: ..., ...>>
)
{code}

If you encrypt *address*, all of the subfields of *address* will be encrypted.

The column finding code in ORC also deals with subfields and so 
*address.street* will return the *street* subfield. The other complex types are 
also addressable as:

* list child = _elem
* map children = _key and _value
* union children = the tag number (e.g. 0, 1)

So to encrypt just the card numbers, it would be "encrypt.columns" = 
"credit:credit_cards._elem.card_number".

That said, I assumed that Parquet didn't have that kind of column handling, but 
that it wouldn't be hard to build. However, the vast majority of the column 
encryption is likely to be at the top level and thus if Parquet can start with 
that it would be great.


> Table property name definition between ORC and Parquet encryption
> 
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption (e.g. which columns are sensitive, which master key 
> to use, the algorithm, etc.), table properties can be used. It is important 
> that Parquet and ORC use unified names.
> According to the slide 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several approaches; using table 
> properties is one of the options, but there is no detailed design of the 
> table property names yet.
> So it is a good time for the two communities to agree on a unified superset 
> of table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, 
> and ORC might support AES_CTR.
>  # Encryption footer - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted 
> footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should have enough 
> information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example ‘address.zipcode’. It is up to the KMS to define what key 
> metadata is. The metadata should have enough information for the KMS to 
> figure out the corresponding key.|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21848) Table property name definition between ORC and Parquet encryption

2019-06-24 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871405#comment-16871405
 ] 

Owen O'Malley commented on HIVE-21848:
--

How about a variant of #2 that looks like:

"encrypt.columns" = "pii:col1,col2;credit:col3"

So:

root = <key-columns> | <root> ";" <key-columns>

key-columns = <key> ":" <column-list>

column-list = <column> | <column-list> "," <column>
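
For illustration, parsing that value takes only a few lines; a sketch, not committed code:

{code:java}
import java.util.*;

// parseEncryptColumns("pii:col1,col2;credit:col3")
//   => {pii=[col1, col2], credit=[col3]}
static Map<String, List<String>> parseEncryptColumns(String value) {
  Map<String, List<String>> columnsByKey = new LinkedHashMap<>();
  for (String group : value.split(";")) {   // <key-columns>
    String[] kv = group.split(":", 2);      // <key> ":" <column-list>
    columnsByKey.put(kv[0], Arrays.asList(kv[1].split(",")));
  }
  return columnsByKey;
}
{code}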

 

> Table property name definition between ORC and Parquet encryption
> 
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption (e.g. which columns are sensitive, which master key 
> to use, the algorithm, etc.), table properties can be used. It is important 
> that Parquet and ORC use unified names.
> According to the slide 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several approaches; using table 
> properties is one of the options, but there is no detailed design of the 
> table property names yet.
> So it is a good time for the two communities to agree on a unified superset 
> of table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, 
> and ORC might support AES_CTR.
>  # Encryption footer - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted 
> footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should have enough 
> information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example ‘address.zipcode’. It is up to the KMS to define what key 
> metadata is. The metadata should have enough information for the KMS to 
> figure out the corresponding key.|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21848) Table property name definition between ORC and Parquet encryption

2019-06-17 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866205#comment-16866205
 ] 

Owen O'Malley commented on HIVE-21848:
--

These are the user facing properties, so we should make them user friendly.

To encrypt columns col1 and col2 with the pii key, I'd propose: 
"*encrypt.with.pii*" = "*col1,col2*".

For ORC, all of the properties (algorithm, version, etc.) of the key come from 
the KMS. The properties for algorithm, footer plaintext, and footer key 
metadata are thus all parquet specific, so I'd propose using 
"*parquet.encrypt...*".

ORC also supports masking of the unencrypted data, which needs a mask name and 
an optional list of parameters. I'd propose making those standard also. Note 
that there is a default of nullify in case the user doesn't specify a mask.

For masking with redact and param1, I'd propose: "*mask.with.redact.param1*" = 
"*col1,col2*".

> Table property name definition between ORC and Parquet encryption
> 
>
> Key: HIVE-21848
> URL: https://issues.apache.org/jira/browse/HIVE-21848
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names 
> that can be used for both Parquet and ORC column encryption. There is no code 
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. To 
> configure the encryption (e.g. which columns are sensitive, which master key 
> to use, the algorithm, etc.), table properties can be used. It is important 
> that Parquet and ORC use unified names.
> According to the slide 
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
>  ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The 
> Parquet community is still discussing several approaches; using table 
> properties is one of the options, but there is no detailed design of the 
> table property names yet.
> So it is a good time for the two communities to agree on a unified superset 
> of table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a 
> table. Here is the list. This is the superset of Parquet and ORC. Some of 
> them might not apply to both.
>  # PII columns, including nested columns
>  # Column key metadata, master key metadata
>  # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, 
> and ORC might support AES_CTR.
>  # Encryption footer - Parquet allows the footer to be encrypted or plaintext
>  # Footer key metadata
> Here is the table properties proposal.  
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted 
> footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to 
> the KMS to define what key metadata is. The metadata should have enough 
> information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column 
> name, for example ‘address.zipcode’. It is up to the KMS to define what key 
> metadata is. The metadata should have enough information for the KMS to 
> figure out the corresponding key.|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21850) branch-3 metastore installation installs wrong version

2019-06-12 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862349#comment-16862349
 ] 

Owen O'Malley commented on HIVE-21850:
--

+1

> branch-3 metastore installation installs wrong version
> --
>
> Key: HIVE-21850
> URL: https://issues.apache.org/jira/browse/HIVE-21850
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.2.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-21850-branch-3.patch
>
>
> hive.version.shortname in standalone-metastore/pom.xml was not properly 
> updated in branch-3.  It is still set to 3.1.0, which causes the 
> MetastoreSchemaTool to install the wrong version.  Part of this Jira should 
> include updating the HowToRelease doc to include updating this value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-21585) Upgrade branch-2.3 to ORC 1.3.4

2019-04-16 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-21585.
--
   Resolution: Fixed
Fix Version/s: 2.3.5

Thanks for the review, Prasanth!

> Upgrade branch-2.3 to ORC 1.3.4
> ---
>
> Key: HIVE-21585
> URL: https://issues.apache.org/jira/browse/HIVE-21585
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive's branch-2.3 currently uses ORC 1.3.3.
> I'd like to upgrade it to use the bug fix release [ORC 
> 1.3.4|https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=project+%3D+ORC+AND+status+%3D+Closed+AND+fixVersion+%3D+%221.3.4%22=500].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21585) Upgrade branch-2.3 to ORC 1.3.4

2019-04-08 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812788#comment-16812788
 ] 

Owen O'Malley commented on HIVE-21585:
--

Upgrading Hive from ORC 1.3.x to 1.5.x isn't easy/safe. In particular, we'd 
need to upgrade to storage-api 2.6.1, which requires HIVE-16546.

> Upgrade branch-2.3 to ORC 1.3.4
> ---
>
> Key: HIVE-21585
> URL: https://issues.apache.org/jira/browse/HIVE-21585
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive's branch-2.3 currently uses ORC 1.3.3.
> I'd like to upgrade it to use the bug fix release [ORC 
> 1.3.4|https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=project+%3D+ORC+AND+status+%3D+Closed+AND+fixVersion+%3D+%221.3.4%22=500].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21585) Upgrade branch-2.3 to ORC 1.3.4

2019-04-05 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-21585:



> Upgrade branch-2.3 to ORC 1.3.4
> ---
>
> Key: HIVE-21585
> URL: https://issues.apache.org/jira/browse/HIVE-21585
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> Hive's branch-2.3 currently uses ORC 1.3.3.
> I'd like to upgrade it to use the bug fix release [ORC 
> 1.3.4|https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=project+%3D+ORC+AND+status+%3D+Closed+AND+fixVersion+%3D+%221.3.4%22=500].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19638) Configuration not passed to ORC Reader.Options

2019-03-26 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-19638:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

This was fixed in master. I also pushed it back to 2.3 and 3.1.

> Configuration not passed to ORC Reader.Options
> --
>
> Key: HIVE-19638
> URL: https://issues.apache.org/jira/browse/HIVE-19638
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, File Formats, ORC
>Affects Versions: 2.3.0, 2.3.1, 2.3.2
>Reporter: Rentao Wu
>Assignee: Rentao Wu
>Priority: Major
> Attachments: HIVE-19638.patch
>
>
> Configuration is not passed to ORC's Reader.Option in OrcFileInputFormat 
> which causes some [ORC 
> configurations|https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L170-L176]
>  to not be able to be picked up.
> Related issues:
> For example, the ORC upgrade in Hive 2.3.x changed schema evolution from 
> positional to column name matching. A backwards compatibility configuration 
> "orc.force.positional.evolution" could be set in ORC Reader.Options by 
> [ORC-120|https://issues.apache.org/jira/browse/ORC-120] however it could not 
> be picked up resulting in null values when querying ORC tables where the 
> column names do not match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20126) OrcInputFormat does not pass conf to orc reader options

2019-03-26 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802290#comment-16802290
 ] 

Owen O'Malley commented on HIVE-20126:
--

Also committed back to branch-3.1, branch-2, and branch-2.3.

> OrcInputFormat does not pass conf to orc reader options
> ---
>
> Key: HIVE-20126
> URL: https://issues.apache.org/jira/browse/HIVE-20126
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 2.4.0, 4.0.0, 3.2.0, 2.3.4, 3.1.2
>
> Attachments: HIVE-20126.1.patch
>
>
> VectorizedOrcInputFormat creates Orc reader options without passing in the 
> configuration object. Without it setting orc configurations will not have any 
> impact. 
> Example: 
> set orc.force.positional.evolution=true;
> does not work for positional schema evolution (will attach test case).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20126) OrcInputFormat does not pass conf to orc reader options

2019-03-26 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-20126:
-
Fix Version/s: 2.3.4
   3.1.2
   2.4.0

> OrcInputFormat does not pass conf to orc reader options
> ---
>
> Key: HIVE-20126
> URL: https://issues.apache.org/jira/browse/HIVE-20126
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 2.4.0, 4.0.0, 3.2.0, 2.3.4, 3.1.2
>
> Attachments: HIVE-20126.1.patch
>
>
> VectorizedOrcInputFormat creates Orc reader options without passing in the 
> configuration object. Without it setting orc configurations will not have any 
> impact. 
> Example: 
> set orc.force.positional.evolution=true;
> does not work for positional schema evolution (will attach test case).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21002) Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly

2019-01-18 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746753#comment-16746753
 ] 

Owen O'Malley commented on HIVE-21002:
--

The desired semantics for SQL, and therefore Hive, are that timestamp is local 
(i.e. timestamp without time zone). Parquet had non-standard semantics for 
timestamp and thus we need to minimize the pain to users while still making 
Hive's use of Parquet implement the standard semantics.

I suspect that most of the users read & write the data in the same time zone, 
which makes the problem less severe. I'd recommend adding an annotation to the 
Parquet file that indicates the writers time zone (eg. "America/Los_Angeles") 
and then using that information to readjust each timestamp. This would handle:

 * Reader: old, writer: new, time zone: same
 * Reader: old, writer: old, time zone: same
 * Reader: new, writer: new, time zone: same or different
 * Reader: new, writer: old, time zone: same

Clearly we should push the reader and writer patch back to each branch of Hive 
that we care about. It would be good to use the isAdjustedToUtc=true for the 
timestamp with local time zone in the Hive 3 code.
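
For illustration, the reader-side adjustment this implies is small; a sketch using java.time, where the writer-zone annotation name and storage form are assumptions:

{code:java}
import java.time.*;

// Recover the wall-clock value the old writer intended: render the stored
// UTC instant in the writer's zone, independent of the reader's zone.
static LocalDateTime readLocalTimestamp(long storedEpochMillis,
                                        String writerZoneAnnotation) {
  return Instant.ofEpochMilli(storedEpochMillis)
      .atZone(ZoneId.of(writerZoneAnnotation))
      .toLocalDateTime();
}

// e.g. readLocalTimestamp(1514793600000L, "America/Los_Angeles")
//   => 2018-01-01T00:00
{code}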

> Backwards incompatible change: Hive 3.1 reads back Avro and Parquet 
> timestamps written by Hive 2.x incorrectly
> --
>
> Key: HIVE-21002
> URL: https://issues.apache.org/jira/browse/HIVE-21002
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
> incorrectly. As an example session to demonstrate this problem, create a 
> dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values (*‘2018-01-01 00:00:00.000’*);
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different 
> storage formats gives the following results:
> |‹format›|Writer time zone (in Hive 2.x)|Reader time zone|Result in Hive 2.x 
> reader|Result in Hive 3.1 reader|
> |Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 
> *00*:00:00.0|2018-01-01 *08*:00:00.0|
> |Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 
> *09*:00:00.0|2018-01-01 *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 
> 00:00:00.0|2018-01-01 00:00:00.0|
> |Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 
> 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
> in Avro and Parquet formats.* Apache ORC behaviour has not changed because it 
> was modified to adjust timestamps to retain backwards compatibility. Textfile 
> behaviour has not changed, because its processing involves parsing and 
> formatting instead of proper serializing and deserializing, so they 
> inherently had LocalDateTime semantics even in Hive 2.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21002) Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly

2019-01-11 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740856#comment-16740856
 ] 

Owen O'Malley commented on HIVE-21002:
--

The behavior of Avro and Parquet is wrong both in 2.x and 3.1. The path forward 
should be to match the desired Hive semantics and return '00:00:00' for new 
files, regardless of format.

Iceberg uses Parquet's isAdjustedToUTC = true for timestamptz, which is the 
equivalent of Hive's timestamp with local time zone and isAdjustedToUTC = false 
for timestamp. It would be good to match those semantics in Hive. Can we detect 
the version of Hive that wrote the Parquet file to provide compatibility with 
old files?

> Backwards incompatible change: Hive 3.1 reads back Avro and Parquet 
> timestamps written by Hive 2.x incorrectly
> --
>
> Key: HIVE-21002
> URL: https://issues.apache.org/jira/browse/HIVE-21002
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
> incorrectly. As an example session to demonstrate this problem, create a 
> dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values (*‘2018-01-01 00:00:00.000’*);
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different 
> storage formats gives the following results:
> |‹format›|Time zone|Hive 2.x|Hive 3.1|
> |Avro and Parquet|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 
> *08*:00:00.0|
> |Avro and Parquet|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 
> *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 
> 00:00:00.0|
> |Textfile and ORC|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
> in Avro and Parquet formats.* Apache ORC behaviour has not changed because it 
> was modified to adjust timestamps to retain backwards compatibility. Textfile 
> behaviour has not changed, because its processing involves parsing and 
> formatting instead of proper serializing and deserializing, so they 
> inherently had LocalDateTime semantics even in Hive 2.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21002) Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly

2019-01-11 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740856#comment-16740856
 ] 

Owen O'Malley edited comment on HIVE-21002 at 1/11/19 10:53 PM:


The behavior of Avro and Parquet is wrong both in 2.x and 3.1. The path forward 
should be to match the desired Hive semantics and return '00:00:00' for new 
files, regardless of format.

Iceberg uses Parquet's isAdjustedToUTC = true for timestamptz, which is the 
equivalent of Hive's timestamp with local time zone and isAdjustedToUTC = false 
for timestamp. It would be good to match those semantics in Hive. Can we detect 
the version of Hive that wrote the Parquet file to provide compatibility with 
old files?


was (Author: owen.omalley):
The behavior of Avro and Parquet is wrong both in 2.x and 3.1. The path forward 
should be to match the desired Hive semantics and return '00:00:00' for new 
files, regardless of format.

Iceberg uses Parquet's isAdjustedToUTC = true for timestamptz, which is the 
equivalent of Hive's timestamp with local time zone and isAdjustedToUTC = false 
for timestamp. It would be good to match those semantics in Hive. Can we detect 
the version of Hive that wrote the Parquet file to provide compatibility with 
told files?

> Backwards incompatible change: Hive 3.1 reads back Avro and Parquet 
> timestamps written by Hive 2.x incorrectly
> --
>
> Key: HIVE-21002
> URL: https://issues.apache.org/jira/browse/HIVE-21002
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
> incorrectly. As an example session to demonstrate this problem, create a 
> dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values (*‘2018-01-01 00:00:00.000’*);
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different 
> storage formats gives the following results:
> |‹format›|Time zone|Hive 2.x|Hive 3.1|
> |Avro and Parquet|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 
> *08*:00:00.0|
> |Avro and Parquet|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 
> *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 
> 00:00:00.0|
> |Textfile and ORC|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
> in Avro and Parquet formats.* Apache ORC behaviour has not changed because it 
> was modified to adjust timestamps to retain backwards compatibility. Textfile 
> behaviour has not changed, because its processing involves parsing and 
> formatting instead of proper serializing and deserializing, so they 
> inherently had LocalDateTime semantics even in Hive 2.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC

2018-07-11 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540245#comment-16540245
 ] 

Owen O'Malley commented on HIVE-20135:
--

+1, thanks!

> Fix incompatible change in TimestampColumnVector to default to UTC
> --
>
> Key: HIVE-20135
> URL: https://issues.apache.org/jira/browse/HIVE-20135
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: storage-2.7.0
>
> Attachments: HIVE-20135.01.patch, HIVE-20135.patch
>
>
> HIVE-20007 changed the default for TimestampColumnVector to be to use UTC, 
> which breaks the API compatibility with storage-api 2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19902) Provide Metastore micro-benchmarks

2018-07-11 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540189#comment-16540189
 ] 

Owen O'Malley commented on HIVE-19902:
--

[~akolb], thanks for the answers.

JMH does provide the functionality for benchmark setup/teardown. It looks like:

{code}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)       // 2 iterations of 5 seconds each
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS) // 10 iterations of 5 seconds each
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
public class MetastoreBenchmarks {
  @State(Scope.Thread)
  public static class MyState {
    // ... variables and parameters that benchmarks need ...

    @Setup(Level.Trial)
    public void setup() {
      // ... unmeasured setup code ...
    }

    @TearDown(Level.Trial)
    public void teardown() {
      // ... unmeasured teardown code ...
    }
  }

  @Benchmark
  public void testMethod(MyState state) {
    // ... code to be benchmarked ...
  }
}
{code}
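
A benchmark class like the one above can then be launched programmatically; a minimal sketch using the standard jmh-core Runner API (the launcher class name is illustrative):

{code:java}
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkLauncher {
  public static void main(String[] args) throws Exception {
    Options opts = new OptionsBuilder()
        .include(MetastoreBenchmarks.class.getSimpleName()) // run only this benchmark class
        .build();
    new Runner(opts).run();
  }
}
{code}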

> Provide Metastore micro-benchmarks
> --
>
> Key: HIVE-19902
> URL: https://issues.apache.org/jira/browse/HIVE-19902
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-19902.01.patch, HIVE-19902.02.patch, 
> HIVE-19902.03.patch, HIVE-19902.04.patch
>
>
> It would be very useful to have metastore benchmarks to be able to track perf 
> issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19902) Provide Metastore micro-benchmarks

2018-07-10 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539369#comment-16539369
 ] 

Owen O'Malley commented on HIVE-19902:
--

I agree with [~pvary] that this should be in:

{code:java}standalone-metastore/benchmarks{code}

I see that you've built your own framework for the benchmarks, instead of using 
jmh, which seems excessive.

I guess my biggest concern is determining the goal of this jira. What are you 
trying to measure?

* the rpc costs? 
* the cost of the methods on the server?

Micro-benchmarks do best when they have a relatively limited scope or are 
comparing alternatives (thrift vs http rpc).

> Provide Metastore micro-benchmarks
> --
>
> Key: HIVE-19902
> URL: https://issues.apache.org/jira/browse/HIVE-19902
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-19902.01.patch, HIVE-19902.02.patch, 
> HIVE-19902.03.patch, HIVE-19902.04.patch
>
>
> It would be very useful to have metastore benchmarks to be able to track perf 
> issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19902) Provide Metastore micro-benchmarks

2018-07-10 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539326#comment-16539326
 ] 

Owen O'Malley commented on HIVE-19902:
--

JMH gives you a lot of control over what is measured and what isn't measured. I 
agree that it doesn't give you a lot of control over the formatting, but it 
does handle a lot of cases that you need to worry about. (Warm up, forks, 
blackholes, etc.)

You do need to put the benchmarks in their own module that isn't built by 
default, because of the licensing of jmh.
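
To illustrate the blackhole point, a minimal sketch (class and method names are illustrative) of how JMH's Blackhole keeps dead-code elimination from skewing a measurement:

{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class BlackholeExample {
  @Benchmark
  public void measureWithBlackhole(Blackhole bh) {
    // without consuming the result, the JIT could eliminate the whole computation
    bh.consume(compute());
  }

  private static long compute() {
    long sum = 0;
    for (int i = 0; i < 1000; i++) {
      sum += i * i;
    }
    return sum;
  }
}
{code}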

> Provide Metastore micro-benchmarks
> --
>
> Key: HIVE-19902
> URL: https://issues.apache.org/jira/browse/HIVE-19902
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-19902.01.patch, HIVE-19902.02.patch, 
> HIVE-19902.03.patch, HIVE-19902.04.patch
>
>
> It would be very useful to have metastore benchmarks to be able to track perf 
> issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC

2018-07-10 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-20135:
-
Fix Version/s: storage-2.7.0

> Fix incompatible change in TimestampColumnVector to default to UTC
> --
>
> Key: HIVE-20135
> URL: https://issues.apache.org/jira/browse/HIVE-20135
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: storage-2.7.0
>
>
> HIVE-20007 changed the default for TimestampColumnVector to be to use UTC, 
> which breaks the API compatibility with storage-api 2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC

2018-07-10 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538847#comment-16538847
 ] 

Owen O'Malley commented on HIVE-20135:
--

We need to have Hive always set the useUTC flag, but not change the default behavior.

> Fix incompatible change in TimestampColumnVector to default to UTC
> --
>
> Key: HIVE-20135
> URL: https://issues.apache.org/jira/browse/HIVE-20135
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: storage-2.7.0
>
>
> HIVE-20007 changed the default for TimestampColumnVector to be to use UTC, 
> which breaks the API compatibility with storage-api 2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC

2018-07-10 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-20135:



> Fix incompatible change in TimestampColumnVector to default to UTC
> --
>
> Key: HIVE-20135
> URL: https://issues.apache.org/jira/browse/HIVE-20135
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>
> HIVE-20007 changed the default for TimestampColumnVector to be to use UTC, 
> which breaks the API compatibility with storage-api 2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19465) Upgrade ORC to 1.5.0

2018-05-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484662#comment-16484662
 ] 

Owen O'Malley commented on HIVE-19465:
--

[~jcamachorodriguez], you should also change the version of ORC (orc.version) 
in the standalone metastore to 1.5.1 (once it is released).

> Upgrade ORC to 1.5.0
> 
>
> Key: HIVE-19465
> URL: https://issues.apache.org/jira/browse/HIVE-19465
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-19465.01.patch, HIVE-19465.02.patch, 
> HIVE-19465.03.patch, HIVE-19465.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-19013) Fix some minor build issues in storage-api

2018-03-22 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-19013.
--
   Resolution: Fixed
Fix Version/s: storage-2.5.0

> Fix some minor build issues in storage-api
> --
>
> Key: HIVE-19013
> URL: https://issues.apache.org/jira/browse/HIVE-19013
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.5.0
>
>
> Currently, the storage-api tests complain that there isn't a log4j2.xml and 
> the javadoc fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19013) Fix some minor build issues in storage-api

2018-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-19013:



> Fix some minor build issues in storage-api
> --
>
> Key: HIVE-19013
> URL: https://issues.apache.org/jira/browse/HIVE-19013
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> Currently, the storage-api tests complain that there isn't a log4j2.xml and 
> the javadoc fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-03-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388658#comment-16388658
 ] 

Owen O'Malley commented on HIVE-17580:
--

{quote}I don't see a particularly logical grouping of classes in storage-api 
(for example, HiveDecimal is in storage-api, but the other types are in serde). 
I think in the longer run we would need to reorganize this into more consistent 
modules anyway.
{quote}
I've tried, but not always succeeded. HiveDecimal is there because it is used 
by DecimalColumnVector. The other types aren't used by the other ColumnVectors. 
It would be better if DecimalColumnVector did not use HiveDecimal, but that 
would take a lot of work to fix.

> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch, 
> HIVE-17580.06-standalone-metastore.patch, 
> HIVE-17580.07-standalone-metastore.patch, 
> HIVE-17580.08-standalone-metastore.patch, 
> HIVE-17580.09-standalone-metastore.patch, 
> HIVE-17580.092-standalone-metastore.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} 
> class to access the field metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is that the Deserializer class is 
> defined in the hive-serde module, and in order to make the metastore independent of 
> Hive we will have to remove this dependency (at least we should change it to a 
> runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-03-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388655#comment-16388655
 ] 

Owen O'Malley commented on HIVE-17580:
--

{quote}The other option suggested of moving ObjectInspector to 
standalone-metastore sounds weird to me since it has got nothing to do with 
metastore.
{quote}
It has even less to do with storage-api, which is precisely why I don't want it 
there. Even worse, it moves really high up the release tree:
 # storage-api
 # ORC
 # metastore
 # hive

Moving things into storage-api that don't need to be there is a big cost. 
Moving it to the metastore, which actually does need the enum, makes at least some 
sense. Nothing in storage-api or ORC needs or wants that enum.

 

> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch, 
> HIVE-17580.06-standalone-metastore.patch, 
> HIVE-17580.07-standalone-metastore.patch, 
> HIVE-17580.08-standalone-metastore.patch, 
> HIVE-17580.09-standalone-metastore.patch, 
> HIVE-17580.092-standalone-metastore.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} 
> class to access the field metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is that the Deserializer class is 
> defined in the hive-serde module, and in order to make the metastore independent of 
> Hive we will have to remove this dependency (at least we should change it to a 
> runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-03-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386564#comment-16386564
 ] 

Owen O'Malley edited comment on HIVE-17580 at 3/5/18 7:10 PM:
--

The problem with putting ObjectInspector into storage-api is that 
ObjectInspector by itself doesn't do anything. You need the cloud of stuff 
around ObjectInspector to do anything. It is also ill-fitting because 
storage-api is the *vectorized* api. By design, it does not include the 
ObjectInspector and the associated slow legacy path for Hive.

Maybe a simpler fix is to move ObjectInspector into the standalone metastore and 
make serde depend on it. That would at least not pull ObjectInspector into the 
storage-api and would only put it where it is needed.

You could even combine the two with:
{code}
public enum MetastoreTypeCategory {...};
{code}

and have ObjectInspector.TypeCategory with:
{code}
public MetastoreTypeCategory toMetastore();
{code}
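
Fleshed out, the combination might look like the following sketch; both the enum and the mapping are hypothetical names for illustration, not existing Hive classes:

{code:java}
// hypothetical metastore-side copy of the category enum
public enum MetastoreTypeCategory {
  PRIMITIVE, LIST, MAP, STRUCT, UNION
}
{code}

{code:java}
// hypothetical serde-side mapping from ObjectInspector's category to the metastore's
public interface ObjectInspector {
  enum Category {
    PRIMITIVE, LIST, MAP, STRUCT, UNION;

    public MetastoreTypeCategory toMetastore() {
      // the two enums share constant names, so a name-based lookup suffices
      return MetastoreTypeCategory.valueOf(name());
    }
  }
}
{code}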



was (Author: owen.omalley):
The problem with putting ObjectInspector into storage-api is that 
ObjectInspector by itself doesn't do anything. You need the cloud of stuff 
around ObjectInspector to do anything. It is also ill-fitting because 
storage-api is the *vectorized* api. By design, it does not include the 
ObjectInspector and the associated slow legacy path for Hive.

Maybe a simpler fix is to move ObjectInspector into the standalone metastore and 
make serde depend on it. That would at least not pull ObjectInspector into the 
storage-api and would only put it where it is needed.

You could even combine the two with:
{{public enum MetastoreTypeCategory {...};}}

and have ObjectInspector.TypeCategory with:
{{public MetastoreTypeCategory toMetastore();}}


> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch, 
> HIVE-17580.06-standalone-metastore.patch, 
> HIVE-17580.07-standalone-metastore.patch, 
> HIVE-17580.08-standalone-metastore.patch, 
> HIVE-17580.09-standalone-metastore.patch, 
> HIVE-17580.092-standalone-metastore.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} 
> class to access the field metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is that the Deserializer class is 
> defined in the hive-serde module, and in order to make the metastore independent of 
> Hive we will have to remove this dependency (at least we should change it to a 
> runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-03-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386564#comment-16386564
 ] 

Owen O'Malley commented on HIVE-17580:
--

The problem with putting ObjectInspector into storage-api is that 
ObjectInspector by itself doesn't do anything. You need the cloud of stuff 
around ObjectInspector to do anything. It is also ill-fitting because 
storage-api is the *vectorized* api. By design, it does not include the 
ObjectInspector and the associated slow legacy path for Hive.

Maybe a simpler fix is to move ObjectInspector into the standalone metastore and 
make serde depend on it. That would at least not pull ObjectInspector into the 
storage-api and would only put it where it is needed.

You could even combine the two with:
{{public enum MetastoreTypeCategory {...};}}

and have ObjectInspector.TypeCategory with:
{{public MetastoreTypeCategory toMetastore();}}


> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch, 
> HIVE-17580.06-standalone-metastore.patch, 
> HIVE-17580.07-standalone-metastore.patch, 
> HIVE-17580.08-standalone-metastore.patch, 
> HIVE-17580.09-standalone-metastore.patch, 
> HIVE-17580.092-standalone-metastore.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} 
> class to access the field metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is that the Deserializer class is 
> defined in the hive-serde module, and in order to make the metastore independent of 
> Hive we will have to remove this dependency (at least we should change it to a 
> runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-02-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380792#comment-16380792
 ] 

Owen O'Malley commented on HIVE-17580:
--

I'm not happy about ObjectInspector moving into storage-api.

If you need the Category enum, I'd suggest you duplicate it and extend the 
ObjectInspector.Category with a mapping to the storage-api Category.

> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch, 
> HIVE-17580.06-standalone-metastore.patch, 
> HIVE-17580.07-standalone-metastore.patch, 
> HIVE-17580.08-standalone-metastore.patch, 
> HIVE-17580.09-standalone-metastore.patch, 
> HIVE-17580.092-standalone-metastore.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the {{Deserializer}} 
> class to access the field metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is that the Deserializer class is 
> defined in the hive-serde module, and in order to make the metastore independent of 
> Hive we will have to remove this dependency (at least we should change it to a 
> runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18810) Parquet Or ORC

2018-02-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379702#comment-16379702
 ] 

Owen O'Malley commented on HIVE-18810:
--

It isn't clear that a jira is the best way of documenting this. We should 
probably add a page to either the Hive wiki or the ORC website.

That said, you can see my presentation on the file format benchmarks: 

https://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet

There has also been a lot of work recently to improve the performance of ORC 
from Spark:

https://community.hortonworks.com/articles/148917/orc-improvements-for-apache-spark-22.html

Other comparisons:

ORC predicate pushdown happens at: file, stripe, and 10,000 rows.
Parquet predicate pushdown happens at: file and stripe.

ORC has optional bloom filters; Parquet doesn't. This is related to the 
previous point, because bloom filters only make sense at levels below the 
stripe level.

ORC's type system is much closer to Hive's than Parquet's.
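
To illustrate the bloom filter point, a sketch of enabling them on a single column at write time, assuming the orc-core writer API (path and schema are illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class BloomFilterExample {
  public static void main(String[] args) throws Exception {
    TypeDescription schema = TypeDescription.fromString("struct<id:bigint,user_id:string>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/events.orc"),
        OrcFile.writerOptions(new Configuration())
            .setSchema(schema)
            .bloomFilterColumns("user_id") // bloom filters are built per 10,000-row group
            .bloomFilterFpp(0.05));        // target false-positive probability
    writer.close();
  }
}
{code}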


> Parquet Or ORC
> --
>
> Key: HIVE-18810
> URL: https://issues.apache.org/jira/browse/HIVE-18810
> Project: Hive
>  Issue Type: Test
>  Components: Hive
>Affects Versions: 1.1.0
> Environment: Hadoop 1.2.1
> Hive 1.1
>Reporter: Suddhasatwa Bhaumik
>Priority: Major
>
> Hello Experts, 
> I would like to know for which data types (based on the size and complexity of 
> the data) one should be using Parquet or ORC tables in Hive. E.g., on Hadoop 
> 0.20.0 with Hive 0.13, the performance of ORC tables in Hive is very good 
> when accessed even by 3rd-party BI systems like SAP Business Objects or 
> Tableau; performing the same tests on Hadoop 1.2.1 with Hive 1.1 does not 
> yield such reliability in queries: although ETL and insert/update of tables 
> take nominal time, the read performance is not within acceptable limits. 
> In case of any queries, kindly advise. 
> Thanks
> [~suddhasatwa_bhaumik]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379551#comment-16379551
 ] 

Owen O'Malley commented on HIVE-18608:
--

I've just opened a jira and a pull request that is useful to this and other 
changes that need to specify column names.

https://issues.apache.org/jira/browse/ORC-308

It allows you to specify subfields by name, such as your example: 
myInfoArray._elem_.emailBody.



> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379551#comment-16379551
 ] 

Owen O'Malley edited comment on HIVE-18608 at 2/28/18 12:26 AM:


I've just opened a jira and a pull request that is useful to this and other 
changes that need to specify column names.

https://issues.apache.org/jira/browse/ORC-308

It allows you to specify subfields by name, such as your example: 
myInfoArray.\_elem.emailBody.




was (Author: owen.omalley):
I've just opened a jira and a pull request that is useful to this and other 
changes that need to specify column names.

https://issues.apache.org/jira/browse/ORC-308

It allows you to specify subfields by name, such as your example: 
myInfoArray._elem_.emailBody.



> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372101#comment-16372101
 ] 

Owen O'Malley commented on HIVE-18608:
--

I'd suggest making the property:
orc.column.encoding.direct=col10,col20
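
If the property lands as suggested, write-time usage might look like the following sketch; the property name is the proposal above and is hypothetical until implemented:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class DirectEncodingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hypothetical: skip dictionary sampling and force direct encoding on these columns
    conf.set("orc.column.encoding.direct", "col10,col20");
    TypeDescription schema = TypeDescription.fromString("struct<col10:string,col20:string>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/table.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    writer.close();
  }
}
{code}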

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18744) Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs correctly

2018-02-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369344#comment-16369344
 ] 

Owen O'Malley commented on HIVE-18744:
--

+1 for the fix; however, you should add new unit test cases that cover it.

> Vectorization: VectorHashKeyWrapperBatch doesn't check repeated NULLs 
> correctly
> ---
>
> Key: HIVE-18744
> URL: https://issues.apache.org/jira/browse/HIVE-18744
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-18744.01.patch
>
>
> Logic for checking selectedInUse isRepeating case for NULL is broken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16480) ORC file with empty array and array fails to read

2017-12-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304711#comment-16304711
 ] 

Owen O'Malley commented on HIVE-16480:
--

This patch applies to branch-2.1 and branch-2.2. In branch-2.3 and above Hive 
uses the ORC project artifacts, so we'll need to release from ORC. Once the 
patch goes in, we should start that process.

> ORC file with empty array and array fails to read
> 
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1, 2.2.0
>Reporter: David Capwell
>Assignee: Owen O'Malley
>  Labels: pull-request-available
>
> We have a schema that has a array in it.  We were unable to read this 
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work 
> with type float 
> java.io.IOException: Error reading file: 
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
>  ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
>  [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) 
> [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at 
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> 

[jira] [Updated] (HIVE-16480) ORC file with empty array and array fails to read

2017-12-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-16480:
-
Affects Version/s: 2.2.0

> ORC file with empty array and array fails to read
> 
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1, 2.2.0
>Reporter: David Capwell
>Assignee: Owen O'Malley
>  Labels: pull-request-available
>
> We have a schema that has a array in it.  We were unable to read this 
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work 
> with type float 
> java.io.IOException: Error reading file: 
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
>  ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
>  [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) 
> [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at 
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   

[jira] [Updated] (HIVE-16480) ORC file with empty array and array fails to read

2017-12-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-16480:
-
Status: Patch Available  (was: Open)

> ORC file with empty array and array fails to read
> 
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.1.1
>Reporter: David Capwell
>Assignee: Owen O'Malley
>  Labels: pull-request-available
>
> We have a schema that has a array in it.  We were unable to read this 
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work 
> with type float 
> java.io.IOException: Error reading file: 
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
>  ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
>  [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) 
> [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at 
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
>  

[jira] [Commented] (HIVE-16480) ORC file with empty array and array fails to read

2017-12-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304684#comment-16304684
 ] 

Owen O'Malley commented on HIVE-16480:
--

We can use this jira to backport the fix from ORC-285.

> ORC file with empty array and array fails to read
> 
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: David Capwell
>Assignee: Owen O'Malley
>
> We have a schema that has a array in it.  We were unable to read this 
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work 
> with type float 
> java.io.IOException: Error reading file: 
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
>  ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
>  [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) 
> [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at 
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
>  

[jira] [Assigned] (HIVE-16480) ORC file with empty array and array fails to read

2017-12-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-16480:


Assignee: Owen O'Malley

> ORC file with empty array and array fails to read
> 
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: David Capwell
>Assignee: Owen O'Malley
>
> We have a schema that has a array in it.  We were unable to read this 
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work 
> with type float 
> java.io.IOException: Error reading file: 
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_4gn/T/1492619355819-0/file-float.orc
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
>  ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
> [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  [junit-4.12.jar:4.12]
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) 
> [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) 
> [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>  [junit-rt.jar:na]
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
>  [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) 
> [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_121]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at 
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) 
> ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
>  ~[hive-orc-2.1.1.jar:2.1.1]
>   at 
> 

[jira] [Commented] (HIVE-18112) show create for view having special char in where clause is not showing properly

2017-12-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288759#comment-16288759
 ] 

Owen O'Malley commented on HIVE-18112:
--

This looks fine. In general, I prefer to use StandardCharsets.UTF_8 rather than 
the string "UTF-8", but the patch looks good.

> show create for view having special char in where clause is not showing 
> properly
> 
>
> Key: HIVE-18112
> URL: https://issues.apache.org/jira/browse/HIVE-18112
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-18112-branch-2.2.patch, 
> HIVE-18112.1-branch-2.2.patch
>
>
> e.g., 
> CREATE VIEW `v2` AS select `evil_byte1`.`a` from `default`.`EVIL_BYTE1` where 
> `evil_byte1`.`a` = 'abcÖdefÖgh';
> Output:
> ==
> 0: jdbc:hive2://172.26.122.227:1> show create table v2;
> +----------------------------------------------------------------------------------------------------------------+
> | createtab_stmt                                                                                                  |
> +----------------------------------------------------------------------------------------------------------------+
> | CREATE VIEW `v2` AS select `evil_byte1`.`a` from `default`.`EVIL_BYTE1` where `evil_byte1`.`a` = 'abc�def�gh'   |
> +----------------------------------------------------------------------------------------------------------------+
> Only the show create output contains invalid characters; the actual source 
> table content is displayed properly in the console.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17714) move custom SerDe schema considerations into metastore from QL

2017-11-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251811#comment-16251811
 ] 

Owen O'Malley commented on HIVE-17714:
--

[~alangates] Thanks for watching out for new dependencies being added to 
storage-api. Adding JSON and Avro as transitive dependencies of storage-api 
would be really painful. Minimizing the size of storage-api also means that 
fewer changes span both the hive and storage-api artifacts.

> move custom SerDe schema considerations into metastore from QL
> --
>
> Key: HIVE-17714
> URL: https://issues.apache.org/jira/browse/HIVE-17714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or for SerDes that allow an URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA, 
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in 
> QL handles this in Hive. So, for the most part metastore just returns 
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17714) move custom SerDe schema considerations into metastore from QL

2017-11-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251806#comment-16251806
 ] 

Owen O'Malley commented on HIVE-17714:
--

I'm also -1 to the metastore using the Serdes to recreate the table schema. The 
Avro serde is particularly bad in this regard because it can use an external 
file to store the schema. Thus, the schema of the table can change without 
notifying the metastore. That is pretty broken. Does anyone know what the 
original goal of 
that capability was?

I think the long-term goal should be for "load data" to determine whether the 
type is self-describing and to invoke an interface to determine the types of 
the loaded data.

For managed tables, the metastore needs to know the types of the tables. The 
goal should be to remove the functions that allow users to update the data 
directly without going through Hive. The metastore needs to know the types and 
have relevant statistics. That is the only way the optimizer has a chance of 
figuring out the proper plan.
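
To make the Avro case concrete, a hedged HiveQL sketch ({{avro.schema.url}} is a 
real Avro SerDe property; the table name and schema path are hypothetical):

{code:sql}
-- The table's effective schema lives in an external file, outside the metastore.
CREATE TABLE avro_events
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events.avsc');
-- Editing events.avsc changes this table's schema without the metastore
-- ever being notified.
{code}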

> move custom SerDe schema considerations into metastore from QL
> --
>
> Key: HIVE-17714
> URL: https://issues.apache.org/jira/browse/HIVE-17714
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or for SerDes that allow an URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA, 
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in 
> QL handles this in Hive. So, for the most part metastore just returns 
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-11-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244839#comment-16244839
 ] 

Owen O'Malley commented on HIVE-17794:
--

This generally looks good. Can you add a testcase?

> HCatLoader breaks when a member is added to a struct-column of a table
> --
>
> Key: HIVE-17794
> URL: https://issues.apache.org/jira/browse/HIVE-17794
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17794.1.patch
>
>
> When a table's schema evolves to add a new member to a struct column, Hive 
> queries work fine, but {{HCatLoader}} breaks with the following trace:
> {noformat}
> TaskAttempt 1 failed, info=
>  Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: kite_composites_with_segments: Local Rearrange
>  tuple
> {chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup: New For Each(false,false)
>  bag
> - scope-548 Operator Key: scope-548): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> ... 17 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing (Name: gup_filtered: Filter
>  bag
> - scope-522 Operator Key: scope-522): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
> at 
> 

[jira] [Commented] (HIVE-17600) Make OrcFile's "enforceBufferSize" user-settable.

2017-11-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244800#comment-16244800
 ] 

Owen O'Malley commented on HIVE-17600:
--

+1 for branch-2.2.

> Make OrcFile's "enforceBufferSize" user-settable.
> -
>
> Key: HIVE-17600
> URL: https://issues.apache.org/jira/browse/HIVE-17600
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17600.1-branch-2.2.patch
>
>
> This is a duplicate of ORC-238, but it applies to {{branch-2.2}}.
> Compression buffer-sizes in OrcFile are computed at runtime, except when 
> enforceBufferSize is set. The only snag here is that this flag can't be set 
> by the user.
> When runtime-computed buffer-sizes are not optimal (for some reason), the 
> user has no way to work around it by setting a custom value.
> I have a patch that we use at Yahoo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17924) Restore SerDe by reverting HIVE-15167 to unbreak API compatibility

2017-10-31 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227043#comment-16227043
 ] 

Owen O'Malley commented on HIVE-17924:
--

Which is the necessary method?

Hive is on Java 8 now, I believe, which would mean we can add default methods 
to interfaces.
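
As a minimal sketch of that approach (the interface and method here are 
hypothetical, not the actual SerDe API):

{code:java}
// Adding a default method keeps existing implementations source- and
// binary-compatible: old classes simply inherit the default body.
public interface ExampleSerDe {
  String serialize(Object row);

  // Added after the interface was published; implementors of the original
  // interface need not change.
  default boolean supportsStatistics() {
    return false;
  }
}
{code}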

> Restore SerDe by reverting HIVE-15167 to unbreak API compatibility
> --
>
> Key: HIVE-17924
> URL: https://issues.apache.org/jira/browse/HIVE-17924
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> HIVE-15167 broke compatibility badly for very little gain and caused a lot of 
> pain for our users. We should revert it and restore the SerDe interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17924) Restore SerDe by reverting HIVE-15167 to unbreak API compatibility

2017-10-30 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225726#comment-16225726
 ] 

Owen O'Malley commented on HIVE-17924:
--

The reason for keeping it is that removing it is painful for our users. For 
an API-breaking change there should be a very good reason. I believe the right 
fix is removing the deprecation.

> Restore SerDe by reverting HIVE-15167 to unbreak API compatibility
> --
>
> Key: HIVE-17924
> URL: https://issues.apache.org/jira/browse/HIVE-17924
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> HIVE-15167 broke compatibility badly for very little gain and caused a lot of 
> pain for our users. We should revert it and restore the SerDe interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17925) Fix TestHooks so that it avoids ClassNotFound on teardown

2017-10-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-17925:



> Fix TestHooks so that it avoids ClassNotFound on teardown
> -
>
> Key: HIVE-17925
> URL: https://issues.apache.org/jira/browse/HIVE-17925
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> TestHooks gets a ClassNotFound exception during teardown, which messes up 
> some following tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17924) Restore SerDe by reverting HIVE-15167 to unbreak API compatibility

2017-10-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-17924:



> Restore SerDe by reverting HIVE-15167 to unbreak API compatibility
> --
>
> Key: HIVE-17924
> URL: https://issues.apache.org/jira/browse/HIVE-17924
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> HIVE-15167 broke compatibility badly for very little gain and caused a lot of 
> pain for our users. We should revert it and restore the SerDe interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-03 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190068#comment-16190068
 ] 

Owen O'Malley commented on HIVE-17609:
--

+1

You'll need to update the wiki with the documentation for the new tool.

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched, nor be 
> cancelled. 
> The root-cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log messages were in the code pertaining to 
> token fetch/cancellation. We also found the need for a tool to query/list/purge 
> delegation tokens that might have expired already. This patch introduces such 
> a tool and improves the log messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17576) Improve progress-reporting in TezProcessor

2017-10-03 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190054#comment-16190054
 ] 

Owen O'Malley commented on HIVE-17576:
--

It should look like the HadoopShims and use reflection to find out if the class 
is there. You probably should put the test in HadoopShims to be consistent with 
the rest of the features.
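
A hedged sketch of that shims-style check ({{org.apache.tez.common.ProgressHelper}} 
is the Tez class in question; the surrounding method is illustrative):

{code:java}
public final class TezFeatureCheck {
  // True only when the optional Tez class is on the classpath, so callers
  // can fall back gracefully on older Tez versions.
  public static boolean hasProgressHelper() {
    try {
      Class.forName("org.apache.tez.common.ProgressHelper");
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
{code}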

> Improve progress-reporting in TezProcessor
> --
>
> Key: HIVE-17576
> URL: https://issues.apache.org/jira/browse/HIVE-17576
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17576.1.patch, HIVE-17576.2-branch-2.patch, 
> HIVE-17576.2.patch
>
>
> Another one on behalf of [~selinazh] and [~cdrome]. Following the example in 
> [Apache Tez's 
> {{MapProcessor}}|https://github.com/apache/tez/blob/247719d7314232f680f028f4e1a19370ffb7b1bb/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/map/MapProcessor.java#L88],
>  {{TezProcessor}} ought to use {{ProgressHelper}} to report progress for a 
> Tez task. As per [~kshukla]'s advice,
> {quote}
> Tez... provides {{getProgress()}} API for {{AbstractLogicalInput(s)}} which 
> will give the correct progress value for a given Input. The TezProcessor(s) 
> in Hive should use this to do something similar to what MapProcessor in Tez 
> does today, which is use/override ProgressHelper to get the input progress 
> and then set the progress on the processorContext.
> ...
> The default behavior of the ProgressHelper class sets the processor progress 
> to be the average of progress values from all inputs.
> {quote}
> This code is -whacked from- *inspired by* {{MapProcessor}}'s use of 
> {{ProgressHelper}}.
> (For my reference, YHIVE-978.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17463) ORC: include orc-shims in hive-exec.jar

2017-09-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155807#comment-16155807
 ] 

Owen O'Malley commented on HIVE-17463:
--

This is part of upgrading Hive trunk to use the upcoming ORC 1.5.0 release.

> ORC: include orc-shims in hive-exec.jar
> ---
>
> Key: HIVE-17463
> URL: https://issues.apache.org/jira/browse/HIVE-17463
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-17463.1.patch
>
>
> ORC-234 added a new shims module - this needs to be part of hive-exec shading 
> to use ORC-1.5.x branch in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization (Part 1)

2017-08-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118605#comment-16118605
 ] 

Owen O'Malley commented on HIVE-17235:
--

Ok, this patch makes sense because it doesn't break the API. You should add a 
test case for some special values (e.g. +/- 999,999,999,999,999,999; +/- 
0.999,999,999,999,999,999; and +/- 123,456,789,012,345,678).

I'll file my patch as a new jira.
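
For reference, those boundary values as scaled longs (Decimal64 carries at most 
18 decimal digits; the scale choices here are illustrative):

{code:java}
public class Decimal64Boundaries {
  public static void main(String[] args) {
    // +/- 999,999,999,999,999,999 at scale 0; the same long at scale 18
    // represents +/- 0.999,999,999,999,999,999.
    long allNines = 999_999_999_999_999_999L;
    // Exercises every digit position.
    long allDigits = 123_456_789_012_345_678L;
    System.out.println(allNines + " " + allDigits);
  }
}
{code}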

> Add ORC Decimal64 Serialization/Deserialization (Part 1)
> 
>
> Key: HIVE-17235
> URL: https://issues.apache.org/jira/browse/HIVE-17235
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17235.03.patch, HIVE-17235.04.patch, 
> HIVE-17235.05.patch, HIVE-17235.06.patch, HIVE-17235.07.patch, 
> HIVE-17235.patch
>
>
> The storage-api changes for ORC-209.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization

2017-08-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-17235:
-
Attachment: HIVE-17235.patch

This patch clones LongColumnVector and adds some testing.

> Add ORC Decimal64 Serialization/Deserialization
> ---
>
> Key: HIVE-17235
> URL: https://issues.apache.org/jira/browse/HIVE-17235
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17235.03.patch, HIVE-17235.04.patch, 
> HIVE-17235.05.patch, HIVE-17235.patch
>
>
> The storage-api changes for ORC-209.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114890#comment-16114890
 ] 

Owen O'Malley commented on HIVE-15686:
--

+1

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-08-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114767#comment-16114767
 ] 

Owen O'Malley commented on HIVE-17169:
--

+1

Although I note that in general encryption block size is not the same as the 
key length. I believe HDFS only currently supports AES128 and not AES256, so I 
don't think this is a big issue currently. Clearly Hadoop's CipherSuite should 
also include a method for key length. 

Block size: AES128 & AES256 = 128
Key size: AES128 = 128, AES256 = 256
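
A small sketch of that distinction, assuming Hadoop's {{CipherSuite}} enum as it 
exists in Hadoop 2.x (the printout is illustrative):

{code:java}
import org.apache.hadoop.crypto.CipherSuite;

public class BlockVsKey {
  public static void main(String[] args) {
    // CipherSuite reports the algorithm block size (16 bytes = 128 bits for
    // AES in CTR mode) regardless of key length, so the suite alone cannot
    // distinguish AES128 from AES256.
    int blockBits = CipherSuite.AES_CTR_NOPADDING.getAlgorithmBlockSize() * 8;
    System.out.println("AES/CTR block size: " + blockBits + " bits");
  }
}
{code}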


> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization

2017-08-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114568#comment-16114568
 ] 

Owen O'Malley commented on HIVE-17235:
--

I think we should make a new type that looks like:

{code}
class Decimal64ColumnVector extends ColumnVector {
  long[] vector;
  int precision;
  int scale;
}
{code}

It will be extremely fast and provide a fast conduit to ORC. 
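
A hedged usage sketch; the (size, precision, scale) constructor is an assumption 
on top of the fields shown above:

{code:java}
// Values are unscaled longs: with scale = 2, the long 1234 represents 12.34.
Decimal64ColumnVector col = new Decimal64ColumnVector(1024, 10, 2);
col.vector[0] = 1234L;   // 12.34
col.vector[1] = -50L;    // -0.50
{code}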

> Add ORC Decimal64 Serialization/Deserialization
> ---
>
> Key: HIVE-17235
> URL: https://issues.apache.org/jira/browse/HIVE-17235
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17235.03.patch, HIVE-17235.04.patch, 
> HIVE-17235.05.patch
>
>
> The storage-api changes for ORC-209.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17173) Add some convenience redirects to the Hive site

2017-07-31 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-17173.
--
Resolution: Fixed

> Add some convenience redirects to the Hive site
> ---
>
> Key: HIVE-17173
> URL: https://issues.apache.org/jira/browse/HIVE-17173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> I'd propose that we add the following redirects to our site's .htaccess:
> * http://hive.apache.org/bugs -> https://issues.apache.org/jira/browse/hive
> * http://hive.apache.org/downloads -> 
> https://www.apache.org/dyn/closer.cgi/hive/
> * http://hive.apache.org/releases -> 
> https://hive.apache.org/docs/downloads.html
> * http://hive.apache.org/src -> https://github.com/apache/hive
> * http://hive.apache.org/web-src -> 
> https://svn.apache.org/repos/asf/hive/cms/trunk
> Thoughts?
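
As a sketch, the proposed entries in Apache httpd's mod_alias syntax (the 
temporary-redirect status is a choice the proposal leaves open):

{code}
Redirect temp /bugs      https://issues.apache.org/jira/browse/hive
Redirect temp /downloads https://www.apache.org/dyn/closer.cgi/hive/
Redirect temp /releases  https://hive.apache.org/docs/downloads.html
Redirect temp /src       https://github.com/apache/hive
Redirect temp /web-src   https://svn.apache.org/repos/asf/hive/cms/trunk
{code}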



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-17173) Add some convenience redirects to the Hive site

2017-07-31 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-17173.


I committed this to subversion.

> Add some convenience redirects to the Hive site
> ---
>
> Key: HIVE-17173
> URL: https://issues.apache.org/jira/browse/HIVE-17173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> I'd propose that we add the following redirects to our site's .htaccess:
> * http://hive.apache.org/bugs -> https://issues.apache.org/jira/browse/hive
> * http://hive.apache.org/downloads -> 
> https://www.apache.org/dyn/closer.cgi/hive/
> * http://hive.apache.org/releases -> 
> https://hive.apache.org/docs/downloads.html
> * http://hive.apache.org/src -> https://github.com/apache/hive
> * http://hive.apache.org/web-src -> 
> https://svn.apache.org/repos/asf/hive/cms/trunk
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17171) Remove old javadoc versions

2017-07-31 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-17171.
--
Resolution: Fixed

Ok, I committed the change. I also updated the site to point to the archived 
javadocs for the older versions. I removed the links for the *really* old 
versions: 0.10, 0.11, and hcat-0.5.

> Remove old javadoc versions
> ---
>
> Key: HIVE-17171
> URL: https://issues.apache.org/jira/browse/HIVE-17171
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We currently have a lot of old javadoc versions. I'd propose that we keep the 
> following versions:
> * r1.2.2
> * r2.1.1
> * r2.2.0
> (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest 
> we remove:
> * hcat-r0.5.0
> * r0.10.0
> * r0.11.0
> * r0.12.0
> * r0.13.1
> * r1.0.1
> * r1.1.1
> * r2.0.1
> Any concerns?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-17171) Remove old javadoc versions

2017-07-31 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-17171.


> Remove old javadoc versions
> ---
>
> Key: HIVE-17171
> URL: https://issues.apache.org/jira/browse/HIVE-17171
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We currently have a lot of old javadoc versions. I'd propose that we keep the 
> following versions:
> * r1.2.2
> * r2.1.1
> * r2.2.0
> (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest 
> we remove:
> * hcat-r0.5.0
> * r0.10.0
> * r0.11.0
> * r0.12.0
> * r0.13.1
> * r1.0.1
> * r1.1.1
> * r2.0.1
> Any concerns?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17173) Add some convenience redirects to the Hive site

2017-07-27 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-17173:
-
Summary: Add some convenience redirects to the Hive site  (was: Add some 
connivence redirects to the Hive site)

> Add some convenience redirects to the Hive site
> ---
>
> Key: HIVE-17173
> URL: https://issues.apache.org/jira/browse/HIVE-17173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> I'd propose that we add the following redirects to our site's .htaccess:
> * http://hive.apache.org/bugs -> https://issues.apache.org/jira/browse/hive
> * http://hive.apache.org/downloads -> 
> https://www.apache.org/dyn/closer.cgi/hive/
> * http://hive.apache.org/releases -> 
> https://hive.apache.org/docs/downloads.html
> * http://hive.apache.org/src -> https://github.com/apache/hive
> * http://hive.apache.org/web-src -> 
> https://svn.apache.org/repos/asf/hive/cms/trunk
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17171) Remove old javadoc versions

2017-07-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103410#comment-16103410
 ] 

Owen O'Malley commented on HIVE-17171:
--

[~leftylev] All of the javadocs are checked into subversion, so every version 
is archived indefinitely.  For example, you can see

current: 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/

before 2.2.0 was added: 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/?p=1015623

In terms of what versions are still being used, that is a much harder question. 
Typically, Apache projects keep the last couple of releases available and let the 
archives serve the older stuff.

> Remove old javadoc versions
> ---
>
> Key: HIVE-17171
> URL: https://issues.apache.org/jira/browse/HIVE-17171
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We currently have a lot of old javadoc versions. I'd propose that we keep the 
> following versions:
> * r1.2.2
> * r2.1.1
> * r2.2.0
> (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest 
> we remove:
> * hcat-r0.5.0
> * r0.10.0
> * r0.11.0
> * r0.12.0
> * r0.13.1
> * r1.0.1
> * r1.1.1
> * r2.0.1
> Any concerns?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-17154) fix rat problems in branch-2.2

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-17154.


> fix rat problems in branch-2.2
> --
>
> Key: HIVE-17154
> URL: https://issues.apache.org/jira/browse/HIVE-17154
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> Fix rat problems in the branch-2.2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17154) fix rat problems in branch-2.2

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-17154.
--
Resolution: Fixed

This was committed and released in 2.2.0

> fix rat problems in branch-2.2
> --
>
> Key: HIVE-17154
> URL: https://issues.apache.org/jira/browse/HIVE-17154
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> Fix rat problems in the branch-2.2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15236) timestamp and date comparison should happen in timestamp

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15236.


> timestamp and date comparison should happen in timestamp
> 
>
> Key: HIVE-15236
> URL: https://issues.apache.org/jira/browse/HIVE-15236
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15236.patch
>
>
> Currently it happens in string, which results in incorrect results.
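
A hedged illustration of the failure mode (the literals are made up; the point is 
that the string forms of an equal timestamp and date compare unequal):

{code:sql}
-- As instants these are equal; compared as strings, '2016-03-01 00:00:00' is
-- greater than '2016-03-01' simply because it is the longer prefix-equal string.
select cast('2016-03-01 00:00:00' as timestamp) = cast('2016-03-01' as date);
{code}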



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15180) Extend JSONMessageFactory to store additional information about metadata objects on different table events

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15180.


> Extend JSONMessageFactory to store additional information about metadata 
> objects on different table events
> --
>
> Key: HIVE-15180
> URL: https://issues.apache.org/jira/browse/HIVE-15180
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15180.1.patch, HIVE-15180.2.patch, 
> HIVE-15180.3.patch, HIVE-15180.3.patch, HIVE-15180.4.patch, 
> HIVE-15180.5.patch, HIVE-15180.6.patch, HIVE-15180.6.patch, 
> HIVE-15180.7.patch, HIVE-15180.7.patch
>
>
> We want the {{NOTIFICATION_LOG}} table to capture additional information 
> about the metadata objects when {{DbNotificationListener}} captures different 
> events for a table (create/drop/alter) and a partition (create/alter/drop). 
> We'll use the messages field to add JSON objects for tables and partitions for 
> create and alter events. The drop event messages remain unchanged. These 
> messages can then be used to replay the events on the destination in the event 
> of replication, in a way that will put the destination in a state that's 
> consistent with one of the past states of the source.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15297) Hive should not split semicolon within quoted string literals

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15297.


> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch, HIVE-15297.04.patch, HIVE-15297.05.patch
>
>
> String literals in a query cannot contain reserved symbols. The same query 
> works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15036) Druid code recently included in Hive pulls in GPL jar

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15036.


> Druid code recently included in Hive pulls in GPL jar
> -
>
> Key: HIVE-15036
> URL: https://issues.apache.org/jira/browse/HIVE-15036
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Alan Gates
>Assignee: slim bouguerra
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15036.patch
>
>
> Druid pulls in the jar annotation-2.3.jar. According to its pom file, it is 
> licensed under the GPL. We cannot ship a binary distribution that includes this 
> jar.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15417) Glitches using ACID's row__id hidden column

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15417.


> Glitches using ACID's row__id hidden column
> ---
>
> Key: HIVE-15417
> URL: https://issues.apache.org/jira/browse/HIVE-15417
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
> Attachments: HIVE-15417.01.patch, HIVE-15417.02.patch, 
> HIVE-15417.patch
>
>
> This only works if you turn PPD off.
> {code:sql}
> drop table if exists hello_acid;
> create table hello_acid (key int, value int)
> partitioned by (load_date date)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into hello_acid partition (load_date='2016-03-01') values (1, 1);
> insert into hello_acid partition (load_date='2016-03-02') values (2, 2);
> insert into hello_acid partition (load_date='2016-03-03') values (3, 3);
> {code}
> {code}
> hive> set hive.optimize.ppd=true;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> FAILED: SemanticException MetaException(message:cannot find field row__id 
> from [0:load_date])
> hive> set hive.optimize.ppd=false;
> hive> select tid from (select row__id.transactionid as tid from hello_acid) 
> sub where tid = 15;
> OK
> tid
> 15
> Time taken: 0.075 seconds, Fetched: 1 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15478) Add file + checksum list for create table/partition during notification creation (whenever relevant)

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15478.


> Add file + checksum list for create table/partition during notification 
> creation (whenever relevant)
> 
>
> Key: HIVE-15478
> URL: https://issues.apache.org/jira/browse/HIVE-15478
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Daniel Dai
> Fix For: 2.2.0
>
> Attachments: HIVE-15478.1.patch, HIVE-15478.2.patch, 
> HIVE-15478.3.patch, HIVE-15478.fix-jdk1.7.patch
>
>
> Currently, the file list is generated during REPL DUMP, which can result in 
> inconsistent data being captured. This ticket covers event dumping; the 
> bootstrap dump checksum will be handled in a different Jira.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15439.


> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15439.3.patch, HIVE-15439.4.patch, 
> HIVE-15439.5.patch, HIVE-15439.6.patch, HIVE-15439.patch, HIVE-15439.patch, 
> HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch, HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> In order to add this support, we will need to add a new post-insert hook to update 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15472) JDBC: Standalone jar is missing ZK dependencies

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15472.


> JDBC: Standalone jar is missing ZK dependencies
> ---
>
> Key: HIVE-15472
> URL: https://issues.apache.org/jira/browse/HIVE-15472
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Tao Li
> Fix For: 2.2.0
>
> Attachments: HIVE-15472.1.patch, HIVE-15472.2.patch
>
>
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/curator/RetryPolicy
>   at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:514)
>   at org.apache.hive.jdbc.Utils.parseURL(Utils.java:434)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:132)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at JDBCExecutor.getConnection(JDBCExecutor.java:65)
>   at JDBCExecutor.executeStatement(JDBCExecutor.java:104)
>   at JDBCExecutor.executeSQLFile(JDBCExecutor.java:81)
>   at JDBCExecutor.main(JDBCExecutor.java:183)
> Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15487) LLAP: Improvements to random selection while scheduling

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15487.


> LLAP: Improvements to random selection while scheduling
> ---
>
> Key: HIVE-15487
> URL: https://issues.apache.org/jira/browse/HIVE-15487
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-15487.1.patch
>
>
> Currently the LLAP scheduler picks a random host when no locality information 
> is specified or when all requested hosts are busy serving other requests with 
> forced locality. In such cases, we can pick the next available node in a 
> consistent order to get better locality instead of random selection. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15192.


> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Fix For: 2.2.0
>
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into a SEMI-JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before generating the logical plan. These 
> transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations aren't able to handle a lot of subqueries; as a result, 
> Hive imposes various restrictions on the types of queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in the 
> above linked document.
> This patch is the first phase of getting rid of these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> Later phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (the LHS 
> in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-14922) Add perf logging for post job completion steps

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-14922.


> Add perf logging for post job completion steps 
> ---
>
> Key: HIVE-14922
> URL: https://issues.apache.org/jira/browse/HIVE-14922
> Project: Hive
>  Issue Type: Task
>  Components: Logging
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-14922.patch
>
>
> Mostly FS related operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15219) LLAP: Allow additional slider global parameters to be set while creating the LLAP package

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15219.


> LLAP: Allow additional slider global parameters to be set while creating the 
> LLAP package
> -
>
> Key: HIVE-15219
> URL: https://issues.apache.org/jira/browse/HIVE-15219
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-15219.02.patch, HIVE-15219.03.patch, 
> HIVE-15219.04.patch, HIVE-15219.04.patch, HIVE-15219.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15232) Add notification events for functions and indexes

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15232.


> Add notification events for functions and indexes
> -
>
> Key: HIVE-15232
> URL: https://issues.apache.org/jira/browse/HIVE-15232
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: 2.2.0
>
> Attachments: HIVE-15232.1.patch, HIVE-15232.2.patch, 
> HIVE-15232.2.patch, HIVE-15232.2.patch
>
>
> Create/Drop Function and Create/Drop/Alter Index should also generate 
> metastore notification events.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (HIVE-15296) AM may lose task failures and not reschedule when scheduling to LLAP

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15296.


> AM may lose task failures and not reschedule when scheduling to LLAP
> 
>
> Key: HIVE-15296
> URL: https://issues.apache.org/jira/browse/HIVE-15296
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15296.01.patch, HIVE-15296.patch, HIVE-15296.patch
>
>
> First attempt and failure detection:
> {noformat}
> 2016-11-18 20:20:01,980 [INFO] [TaskSchedulerEventHandlerThread] 
> |tezplugins.LlapTaskSchedulerService|: Received allocateRequest. 
> task=attempt_1478967587833_2622_1_06_31_0, priority=65, 
> capability=memory:4096, vCores:1, hosts=[3n01]
> 2016-11-18 20:20:01,982 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: Assigned task 
> TaskInfo{task=attempt_1478967587833_2622_1_06_31_0, priority=65, 
> startTime=0, containerId=null, assignedInstance=null, uniqueId=55, 
> localityDelayTimeout=9223372036854775807} to container 
> container_1_2622_01_56 on node=DynamicServiceInstance 
> [alive=true, host=3n01:15001 with resources=memory:59392, vCores:16, 
> shufflePort=15551, servicesAddress=http://3n01:15002, mgmtPort=15004]
> 2016-11-18 20:20:01,982 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: 
> TaskInfo{task=attempt_1478967587833_2622_1_06_31_0, priority=65, 
> startTime=10550817928, containerId=container_1_2622_01_56, 
> assignedInstance=DynamicServiceInstance [alive=true, host=3n01:15001 with 
> resources=memory:59392, vCores:16, shufflePort=15551, 
> servicesAddress=http://3n01:15002, mgmtPort=15004], uniqueId=55, 
> localityDelayTimeout=9223372036854775807} = SCHEDULED
> 2016-11-18 20:20:03,427 [INFO] [Dispatcher thread {Central}] 
> |impl.TaskAttemptImpl|: TaskAttempt: 
> [attempt_1478967587833_2622_1_06_31_0] started. Is using containerId: 
> [container_1_2622_01_56] on NM: [3n01:15001]
> 2016-11-18 20:20:03,427 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1478967587833_2622_1][Event:TASK_ATTEMPT_STARTED]: 
> vertexName=Map 1, taskAttemptId=attempt_1478967587833_2622_1_06_31_0, 
> startTime=1479500403427, containerId=container_1_2622_01_56, 
> nodeId=3n01:15001
> 2016-11-18 20:20:03,430 [INFO] [TaskCommunicator # 1] 
> |tezplugins.LlapTaskCommunicator|: Successfully launched task: 
> attempt_1478967587833_2622_1_06_31_0
> 2016-11-18 20:20:03,434 [INFO] [IPC Server handler 11 on 43092] 
> |impl.TaskImpl|: TaskAttempt:attempt_1478967587833_2622_1_06_31_0 sent 
> events: (0-1).
> 2016-11-18 20:20:03,434 [INFO] [IPC Server handler 11 on 43092] 
> |impl.VertexImpl|: Sending attempt_1478967587833_2622_1_06_31_0 24 events 
> [0,24) total 24 vertex_1478967587833_2622_1_06 [Map 1]
> 2016-11-18 20:25:43,249 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1478967587833_2622_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1478967587833_2622_1_06_31_0, 
> creationTime=1479500401929, allocationTime=1479500403426, 
> startTime=1479500403427, finishTime=1479500743249, timeTaken=339822, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=TASK_HEARTBEAT_ERROR, 
> diagnostics=AttemptID:attempt_1478967587833_2622_1_06_31_0 Timed out 
> after 300 secs, nodeHttpAddress=http://3n01:15002, counters=Counters: 1, 
> org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
> 2016-11-18 20:25:43,255 [INFO] [TaskSchedulerEventHandlerThread] 
> |tezplugins.LlapTaskSchedulerService|: Processing de-allocate request for 
> task=attempt_1478967587833_2622_1_06_31_0, state=ASSIGNED, endReason=OTHER
> 2016-11-18 20:25:43,259 [INFO] [Dispatcher thread {Central}] 
> |node.AMNodeImpl|: Attempt failed on node: 3n01:15001 TA: 
> attempt_1478967587833_2622_1_06_31_0 failed: true container: 
> container_1_2622_01_56 numFailedTAs: 7
> 2016-11-18 20:25:43,262 [INFO] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Source task attempt completed for vertex: 
> vertex_1478967587833_2622_1_07 [Reducer 2] attempt: 
> attempt_1478967587833_2622_1_06_31_0 with state: FAILED vertexState: 
> RUNNING
> {noformat}
> Second attempt:
> {noformat}
> 2016-11-18 20:25:43,267 [INFO] [TaskSchedulerEventHandlerThread] 
> |tezplugins.LlapTaskSchedulerService|: Received allocateRequest. 
> task=attempt_1478967587833_2622_1_06_31_1, priority=64, 
> capability=memory:4096, vCores:1, hosts=null
> 2016-11-18 20:25:43,297 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: 
> 

[jira] [Closed] (HIVE-15323) allow the user to turn off reduce-side SMB join

2017-07-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HIVE-15323.


> allow the user to turn off reduce-side SMB join
> 
>
> Key: HIVE-15323
> URL: https://issues.apache.org/jira/browse/HIVE-15323
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15323.01.patch, HIVE-15323.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

