[jira] [Created] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-25400:


 Summary: Move the offset updating in BytesColumnVector to 
setValPreallocated.
 Key: HIVE-25400
 URL: https://issues.apache.org/jira/browse/HIVE-25400
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0


HIVE-25190 changed the semantics of BytesColumnVector so that 
ensureValPreallocated reserved the room, which interacted badly with ORC's 
redact mask code. The redact mask code needs to be able to increase the 
allocation as it goes, so that it can call ensureValPreallocated multiple times.
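
One way to picture the intended contract is a toy model in which ensureValPreallocated only guarantees capacity and setValPreallocated is the single place where the write offset advances. The sketch below is illustrative Java: ToyBytesVector, its fields, and its growth policy are hypothetical and are not the storage-api implementation.

```java
import java.util.Arrays;

// Toy model only: class, fields, and growth policy are hypothetical.
class ToyBytesVector {
    private byte[] buffer = new byte[16];
    private int nextFree = 0;            // first unused byte in buffer
    final int[] start = new int[4];      // start offset of each value
    final int[] length = new int[4];     // length of each value

    /** Guarantee room for `size` bytes at nextFree without advancing nextFree. */
    void ensureValPreallocated(int size) {
        if (nextFree + size > buffer.length) {
            int grown = Math.max(buffer.length * 2, nextFree + size);
            buffer = Arrays.copyOf(buffer, grown);
        }
    }

    byte[] getValPreallocatedBytes() { return buffer; }

    int getValPreallocatedStart() { return nextFree; }

    /** Record the value and only now move the offset past it. */
    void setValPreallocated(int elementNum, int size) {
        start[elementNum] = nextFree;
        length[elementNum] = size;
        nextFree += size;
    }

    public static void main(String[] args) {
        ToyBytesVector v = new ToyBytesVector();
        v.ensureValPreallocated(4);      // first guess at the size
        v.ensureValPreallocated(40);     // caller discovers it needs more
        v.setValPreallocated(0, 40);     // the offset moves once, here
        System.out.println(v.start[0] + " " + v.length[0]);
    }
}
```

In this model a caller that grows its estimate as it goes never wastes buffer space, because repeated ensure calls all refer to the same starting offset.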



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-06-02 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-25190:


 Summary: BytesColumnVector fails when the aggregate size is > 1gb
 Key: HIVE-25190
 URL: https://issues.apache.org/jira/browse/HIVE-25190
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
but fail with:

{code:java}
new RuntimeException("Overflow of newLength. smallBuffer.length="
+ smallBuffer.length + ", nextElemLength=" + nextElemLength);
{code}

if the aggregate size of the buffer crosses over 1gb. 
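
The 1gb threshold is consistent with 32-bit overflow when a buffer grows by doubling: an int length of 2^30 doubles to 2^31, which does not fit in a Java int. The sketch below only illustrates that arithmetic. BufferGrowthSketch and both methods are hypothetical names, and the capped variant is one possible guard, not necessarily the fix applied in Hive.

```java
// Illustrative only: not code from BytesColumnVector.
class BufferGrowthSketch {
    /** Doubling in int arithmetic wraps negative once the length reaches 2^30. */
    static int naiveDoubledLength(int currentLength) {
        return currentLength * 2;
    }

    /** One possible guard: compute in long and cap at a safe array size. */
    static int cappedDoubledLength(int currentLength) {
        long doubled = 2L * currentLength;
        return (int) Math.min(doubled, Integer.MAX_VALUE - 8L);
    }

    public static void main(String[] args) {
        int oneGiB = 1 << 30;
        System.out.println(naiveDoubledLength(oneGiB));   // negative: the int overflowed
        System.out.println(cappedDoubledLength(oneGiB));  // positive, capped below Integer.MAX_VALUE
    }
}
```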





[jira] [Created] (HIVE-24458) Allow access to SArgs without converting to disjunctive normal form

2020-11-30 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-24458:


 Summary: Allow access to SArgs without converting to disjunctive 
normal form
 Key: HIVE-24458
 URL: https://issues.apache.org/jira/browse/HIVE-24458
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


For some use cases, it is useful to have access to the SArg expression in a 
non-normalized form. Currently, the SArg only provides the fully normalized 
expression.





[jira] [Created] (HIVE-24455) Fix broken junit framework in storage-api

2020-11-30 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-24455:


 Summary: Fix broken junit framework in storage-api
 Key: HIVE-24455
 URL: https://issues.apache.org/jira/browse/HIVE-24455
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The use of junit is broken in storage-api. It results in no tests being found.





[jira] [Created] (HIVE-23215) Make FilterContext and MutableFilterContext interfaces

2020-04-15 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-23215:


 Summary: Make FilterContext and MutableFilterContext interfaces
 Key: HIVE-23215
 URL: https://issues.apache.org/jira/browse/HIVE-23215
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: Owen O'Malley


HIVE-22959 introduced FilterContext to support ORC-577. The duplication of 
fields between the FilterContext and VectorizedRowBatch seems likely to cause 
user confusion. This patch makes them interfaces that VectorizedRowBatch 
implements.

Thus, there is a single copy of the data and no need to copy them back and 
forth. LLAP can make its own implementation of the interfaces if it doesn't 
want to use VectorizedRowBatch.





[jira] [Created] (HIVE-22405) Add ColumnVector support for ProlepticCalendar

2019-10-25 Thread Owen O'Malley (Jira)
Owen O'Malley created HIVE-22405:


 Summary: Add ColumnVector support for ProlepticCalendar
 Key: HIVE-22405
 URL: https://issues.apache.org/jira/browse/HIVE-22405
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


Hive recently moved its processing to the proleptic calendar, which has created 
some issues for users who have dates before 1580 AD.

I'd propose extending the column vectors for times & dates to encode which 
calendar they are using.

* create DateColumnVector that extends LongColumnVector
* add a method to change calendars to both DateColumnVector and 
TimestampColumnVector.

{code}
  /**
   * Change the calendar to or from proleptic. If the new and old values of the
   * flag are the same, nothing is done.
   * @param useProleptic set the flag for the proleptic calendar
   * @param updateData change the data to match the new value of the flag
   */
  void changeCalendar(boolean useProleptic, boolean updateData);

  /**
   * Detect whether this data is using the proleptic calendar.
   */
  boolean usingProlepticCalendar();
{code}
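
The practical effect of the calendar flag can be seen with the JDK alone: java.time.LocalDate uses the proleptic Gregorian calendar, while java.util.GregorianCalendar defaults to a hybrid Julian/Gregorian calendar with a cutover in October 1582. The sketch below is not Hive code, and CalendarSketch and its method names are hypothetical. It shows that the same year-month-day label maps to different day numbers for old dates, which is what updateData would have to reconcile.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

// Illustrative only: compares the two calendars available in the JDK.
class CalendarSketch {
    /** Days since 1970-01-01 when the label is read in the proleptic Gregorian calendar. */
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    /** Days since 1970-01-01 when the label is read in the hybrid Julian/Gregorian calendar. */
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                     // zero out the time-of-day fields
        cal.set(year, month - 1, day);   // Calendar months are 0-based
        return Math.floorDiv(cal.getTimeInMillis(), TimeUnit.DAYS.toMillis(1));
    }

    public static void main(String[] args) {
        // Modern dates agree; dates before the cutover do not.
        System.out.println(prolepticEpochDay(2000, 1, 1) == hybridEpochDay(2000, 1, 1));
        System.out.println(prolepticEpochDay(1500, 1, 1) == hybridEpochDay(1500, 1, 1));
    }
}
```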





[jira] [Created] (HIVE-22105) Update ORC to 1.5.6.

2019-08-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-22105:


 Summary: Update ORC to 1.5.6.
 Key: HIVE-22105
 URL: https://issues.apache.org/jira/browse/HIVE-22105
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


ORC has had some important fixes in the 1.5 branch and they should be picked up 
by Hive.





[jira] [Created] (HIVE-21585) Upgrade branch-2.3 to ORC 1.3.4

2019-04-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-21585:


 Summary: Upgrade branch-2.3 to ORC 1.3.4
 Key: HIVE-21585
 URL: https://issues.apache.org/jira/browse/HIVE-21585
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Hive's branch-2.3 currently uses ORC 1.3.3.

I'd like to upgrade it to use the bug fix release [ORC 
1.3.4|https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=project+%3D+ORC+AND+status+%3D+Closed+AND+fixVersion+%3D+%221.3.4%22=500].
 





[jira] [Created] (HIVE-20135) Fix incompatible change in TimestampColumnVector to default to UTC

2018-07-10 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-20135:


 Summary: Fix incompatible change in TimestampColumnVector to 
default to UTC
 Key: HIVE-20135
 URL: https://issues.apache.org/jira/browse/HIVE-20135
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Jesus Camacho Rodriguez


HIVE-20007 changed the default for TimestampColumnVector to use UTC, 
which breaks the API compatibility with storage-api 2.6.





[jira] [Created] (HIVE-19013) Fix some minor build issues in storage-api

2018-03-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-19013:


 Summary: Fix some minor build issues in storage-api
 Key: HIVE-19013
 URL: https://issues.apache.org/jira/browse/HIVE-19013
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, the storage-api tests complain that there isn't a log4j2.xml and the 
javadoc fails.





[jira] [Created] (HIVE-17925) Fix TestHooks so that it avoids ClassNotFound on teardown

2017-10-27 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17925:


 Summary: Fix TestHooks so that it avoids ClassNotFound on teardown
 Key: HIVE-17925
 URL: https://issues.apache.org/jira/browse/HIVE-17925
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


TestHooks gets a ClassNotFound exception during teardown, which messes up some 
following tests.





[jira] [Created] (HIVE-17924) Restore SerDe by reverting HIVE-15167 to unbreak API compatibility

2017-10-27 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17924:


 Summary: Restore SerDe by reverting HIVE-15167 to unbreak API 
compatibility
 Key: HIVE-17924
 URL: https://issues.apache.org/jira/browse/HIVE-17924
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.0, 2.3.1
Reporter: Owen O'Malley
Assignee: Owen O'Malley


HIVE-15167 broke compatibility badly for very little gain and caused a lot of 
pain for our users. We should revert it and restore the SerDe interface.





[jira] [Created] (HIVE-17173) Add some convenience redirects to the Hive site

2017-07-25 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17173:


 Summary: Add some convenience redirects to the Hive site
 Key: HIVE-17173
 URL: https://issues.apache.org/jira/browse/HIVE-17173
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


I'd propose that we add the following redirects to our site's .htaccess:

* http://hive.apache.org/bugs -> https://issues.apache.org/jira/browse/hive
* http://hive.apache.org/downloads -> 
https://www.apache.org/dyn/closer.cgi/hive/
* http://hive.apache.org/releases -> https://hive.apache.org/docs/downloads.html
* http://hive.apache.org/src -> https://github.com/apache/hive
* http://hive.apache.org/web-src -> 
https://svn.apache.org/repos/asf/hive/cms/trunk

Thoughts?





[jira] [Created] (HIVE-17171) Remove old javadoc versions

2017-07-25 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17171:


 Summary: Remove old javadoc versions
 Key: HIVE-17171
 URL: https://issues.apache.org/jira/browse/HIVE-17171
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley


We currently have a lot of old javadoc versions. I'd propose that we keep the 
following versions:

* r1.2.2
* r2.1.1
* r2.2.0

(Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest we 
remove:

* hcat-r0.5.0
* r0.10.0
* r0.11.0
* r0.12.0
* r0.13.1
* r1.0.1
* r1.1.1
* r2.0.1

Any concerns?






[jira] [Created] (HIVE-17154) fix rat problems in branch-2.2

2017-07-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17154:


 Summary: fix rat problems in branch-2.2
 Key: HIVE-17154
 URL: https://issues.apache.org/jira/browse/HIVE-17154
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Fix rat problems in the branch-2.2.





[jira] [Created] (HIVE-17118) Clean up of HIVE-14309 to move the orc source code to org.apache.hive.orc

2017-07-18 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17118:


 Summary: Clean up of HIVE-14309 to move the orc source code to 
org.apache.hive.orc
 Key: HIVE-17118
 URL: https://issues.apache.org/jira/browse/HIVE-17118
 Project: Hive
  Issue Type: Bug
  Components: ORC
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 2.2.0


Just for branch-2.2.

HIVE-14309 shaded the hive-orc jar to use the unique package 
org.apache.hive.orc. This patch moves the source files over to the right 
directory and removes the shading.





[jira] [Created] (HIVE-16787) Fix itests in branch-2.2

2017-05-30 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-16787:


 Summary: Fix itests in branch-2.2
 Key: HIVE-16787
 URL: https://issues.apache.org/jira/browse/HIVE-16787
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 2.2.0


The itests are broken in branch 2.2 and need to be fixed before release.





[jira] [Created] (HIVE-16683) ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files

2017-05-16 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-16683:


 Summary: ORC WriterVersion gets ArrayIndexOutOfBoundsException on 
newer ORC files
 Key: HIVE-16683
 URL: https://issues.apache.org/jira/browse/HIVE-16683
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.1, 2.2.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley


This only impacts branch-2.1 and branch-2.2, because it has been fixed in the 
ORC project's code base via ORC-125.





[jira] [Created] (HIVE-16549) Fix an incompatible change in PredicateLeafImpl from HIVE-15269

2017-04-26 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-16549:


 Summary: Fix an incompatible change in PredicateLeafImpl from 
HIVE-15269
 Key: HIVE-16549
 URL: https://issues.apache.org/jira/browse/HIVE-16549
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


HIVE-15269 added a parameter to the constructor for PredicateLeafImpl for a 
configuration object. The configuration object is only used for the new 
LiteralDelegates.





[jira] [Created] (HIVE-15929) Fix HiveDecimalWritable

2017-02-15 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15929:


 Summary: Fix HiveDecimalWritable 
 Key: HIVE-15929
 URL: https://issues.apache.org/jira/browse/HIVE-15929
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


HIVE-15335 broke compatibility with Hive 2.1 by making 
HiveDecimalWritable.getInternalStorage() throw an exception when called on an 
unset value. It is easy to instead return an empty array, which will allow the 
old code to allocate a new array.





[jira] [Created] (HIVE-15922) SchemaEvolution must guarantee that getFileIncluded is not null

2017-02-14 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15922:


 Summary: SchemaEvolution must guarantee that getFileIncluded is 
not null
 Key: HIVE-15922
 URL: https://issues.apache.org/jira/browse/HIVE-15922
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.1
Reporter: Owen O'Malley
 Fix For: 2.1.2


This only impacts branch-2.1, because it is already fixed in master by 
HIVE-14007.





[jira] [Created] (HIVE-15841) Upgrade Hive to ORC 1.3.2

2017-02-07 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15841:


 Summary: Upgrade Hive to ORC 1.3.2
 Key: HIVE-15841
 URL: https://issues.apache.org/jira/browse/HIVE-15841
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


Hive needs ORC-141 and ORC-135, so we should upgrade to ORC-1.3.2 once it 
releases.





[jira] [Created] (HIVE-15643) remove use of default charset in FastHiveDecimal

2017-01-16 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15643:


 Summary: remove use of default charset in FastHiveDecimal
 Key: HIVE-15643
 URL: https://issues.apache.org/jira/browse/HIVE-15643
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


HIVE-15335 introduced some new uses of String.getBytes(), which uses the 
default char set. These need to be replaced with the version that always uses 
UTF8.
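
As a minimal illustration of the replacement described above (CharsetSketch and encode are hypothetical names, not code from FastHiveDecimal), the charset can be passed explicitly so the result no longer depends on the JVM's default:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows the explicit-charset overload of String.getBytes.
class CharsetSketch {
    /** Encode with UTF-8 regardless of the platform default charset. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // s.getBytes() with no argument would use the platform default instead.
        System.out.println(encode("123.45").length);
    }
}
```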





[jira] [Created] (HIVE-15419) Separate out storage-api to be released independently

2016-12-12 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15419:


 Summary: Separate out storage-api to be released independently
 Key: HIVE-15419
 URL: https://issues.apache.org/jira/browse/HIVE-15419
 Project: Hive
  Issue Type: Task
  Components: storage-api
Reporter: Owen O'Malley


Currently, the Hive project releases a single monolithic release, but this 
makes file formats reading directly into Hive's vector row batches a circular 
dependence. Storage-api is a small module with the vectorized row batches and 
SearchArgument that are necessary for efficient vectorized read and write. By 
releasing storage-api independently, we can make an interface that the file 
formats can read and write from.





[jira] [Created] (HIVE-15375) Port ORC-115 to storage-api

2016-12-06 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15375:


 Summary: Port ORC-115 to storage-api
 Key: HIVE-15375
 URL: https://issues.apache.org/jira/browse/HIVE-15375
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, VectorizedRowBatch.toString() assumes that all BytesColumnVectors 
use the internal buffer for all of the values. This leads to incorrect strings 
in many common cases.





[jira] [Created] (HIVE-15124) Fix OrcInputFormat to use reader's schema for include boolean array

2016-11-03 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15124:


 Summary: Fix OrcInputFormat to use reader's schema for include 
boolean array
 Key: HIVE-15124
 URL: https://issues.apache.org/jira/browse/HIVE-15124
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, the OrcInputFormat uses the file's schema rather than the reader's 
schema. This means that SchemaEvolution fails with an 
ArrayIndexOutOfBoundsException if a partition has a different schema than the 
table.







[jira] [Created] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2016-07-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-14309:


 Summary: Fix naming of classes in orc module to not conflict with 
standalone orc
 Key: HIVE-14309
 URL: https://issues.apache.org/jira/browse/HIVE-14309
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
namespace that clash with the ORC project's classes. From Hive 2.2 onward, the 
classes will only be on ORC, but we'll reduce the problems of classpath issues 
if we rename the classes to org.apache.hive.orc.

I've looked at a set of projects (pig, spark, oozie, flume, & storm) and can't 
find any uses of Hive's versions of the org.apache.orc classes, so I believe 
this is a safe change that will reduce the integration problems downstream.





[jira] [Created] (HIVE-14242) Backport ORC-53 to Hive

2016-07-14 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-14242:


 Summary: Backport ORC-53 to Hive
 Key: HIVE-14242
 URL: https://issues.apache.org/jira/browse/HIVE-14242
 Project: Hive
  Issue Type: Bug
  Components: ORC
Reporter: Owen O'Malley
Assignee: Owen O'Malley


ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem in 
TypeDescription that should be backported to Hive.





[jira] [Created] (HIVE-14220) Protect users from Reader.rows(Options) modifying the Options object

2016-07-12 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-14220:


 Summary: Protect users from Reader.rows(Options) modifying the 
Options object
 Key: HIVE-14220
 URL: https://issues.apache.org/jira/browse/HIVE-14220
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


This is a matching fix to HIVE-14004 where ACID was getting into trouble 
because it was reusing the Reader.Options argument between files and 
Reader.rows was modifying it. HIVE-14004 just fixed the Hive case, but we need 
a corresponding fix over here.
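
A common way to protect callers in this situation is a defensive copy: the reader clones the options before adjusting them. The sketch below is a toy illustration in Java. OptionsCopySketch, its Options class, and the clamping logic are hypothetical and are not the ORC Reader API.

```java
// Illustrative only: a toy options object and a reader that copies it first.
class OptionsCopySketch {
    static final class Options implements Cloneable {
        long offset = 0;
        long length = Long.MAX_VALUE;

        @Override
        public Options clone() {
            try {
                return (Options) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);  // cannot happen: we implement Cloneable
            }
        }
    }

    /** Adjust the range for one file on a private copy, leaving the caller's object alone. */
    static Options rows(Options callerOptions, long fileLength) {
        Options local = callerOptions.clone();
        local.length = Math.min(local.length, fileLength - local.offset);
        return local;
    }

    public static void main(String[] args) {
        Options shared = new Options();
        Options used = rows(shared, 100);
        // The copy was clamped to the file; the shared object is unchanged.
        System.out.println(used.length + " " + (shared.length == Long.MAX_VALUE));
    }
}
```

With the copy in place, the same Options instance can be reused across files of different lengths.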





[jira] [Created] (HIVE-14166) Minor updates to the website.

2016-07-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-14166:


 Summary: Minor updates to the website.
 Key: HIVE-14166
 URL: https://issues.apache.org/jira/browse/HIVE-14166
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Minor updates to the website & documentation.





[jira] [Created] (HIVE-14007) Replace ORC module with ORC release

2016-06-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-14007:


 Summary: Replace ORC module with ORC release
 Key: HIVE-14007
 URL: https://issues.apache.org/jira/browse/HIVE-14007
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.2.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 2.2.0


This completes moving the core ORC reader & writer to the ORC project.





[jira] [Created] (HIVE-13906) Remove guava dependence from storage-api module

2016-06-01 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-13906:


 Summary: Remove guava dependence from storage-api module
 Key: HIVE-13906
 URL: https://issues.apache.org/jira/browse/HIVE-13906
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Guava is a very problematic library to depend on because of the version 
incompatibilities and the use of it in the storage-api module causes it to leak 
into everything that depends on it.





[jira] [Created] (HIVE-13763) Update smart-apply-patch.sh with ability to use patches from git

2016-05-14 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-13763:


 Summary: Update smart-apply-patch.sh with ability to use patches 
from git
 Key: HIVE-13763
 URL: https://issues.apache.org/jira/browse/HIVE-13763
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, the smart-apply-patch.sh doesn't understand git patches.  It is 
relatively easy to make it understand patches generated by:

{code}
% git format-patch apache/master --stdout > HIVE-999.patch
{code}





[jira] [Created] (HIVE-13464) Backport changes to storage-api into branch 2 for release into 2.0.1

2016-04-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-13464:


 Summary: Backport changes to storage-api into branch 2 for release 
into 2.0.1
 Key: HIVE-13464
 URL: https://issues.apache.org/jira/browse/HIVE-13464
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 2.0.1


To release ORC as a separate project, backporting the safe changes for 
storage-api to 2.0.1 will minimize the disruption.





[jira] [Created] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams

2016-03-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-13232:


 Summary: Aggressively drop compression buffers in ORC OutStreams
 Key: HIVE-13232
 URL: https://issues.apache.org/jira/browse/HIVE-13232
 Project: Hive
  Issue Type: Bug
  Components: ORC
Reporter: Owen O'Malley
Assignee: Owen O'Malley


In Hive 0.11, when ORC's OutStreams were flushed, they dropped all of their 
buffers. In the patch for HIVE-4342, we inadvertently changed that behavior so 
that one of the buffers is held on to. For queries with a lot of writers and 
thus under significant memory pressure this can have a significant impact on 
the memory usage. 

Note that "hive.optimize.sort.dynamic.partition" avoids this problem by sorting 
on the dynamic partition key and thus only a single ORC writer is open at once. 
This will use memory more effectively and avoid creating ORC files with very 
small stripes, which will produce better downstream performance.





[jira] [Created] (HIVE-12838) Add methods for getting and storing serialized ORC file tails

2016-01-11 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12838:


 Summary: Add methods for getting and storing serialized ORC file 
tails
 Key: HIVE-12838
 URL: https://issues.apache.org/jira/browse/HIVE-12838
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Provide a pair of routines for getting and restoring from a serialized file 
footer.





[jira] [Created] (HIVE-12638) Hive should not create empty files in partitions

2015-12-09 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12638:


 Summary: Hive should not create empty files in partitions
 Key: HIVE-12638
 URL: https://issues.apache.org/jira/browse/HIVE-12638
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley


Currently Hive creates empty files for buckets with no rows in a directory. I 
believe this was originally because the SMB and bucket join require files to be 
present to get InputSplits. There are customers where this behavior leads to the 
creation of more than 200,000 empty ORC files per hour on a cluster (with peaks 
of more than 725,000 per hour). We've also seen instances where a single 
DataNode is involved in 5600 of these empty ORC files within a 2 minute period. 
This causes significant stress on HDFS at both the NameNode and DataNode and is 
completely unnecessary.





[jira] [Created] (HIVE-12571) Push TypeDescription in to the ReaderImpl and RecordReaderImpl

2015-12-02 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12571:


 Summary: Push TypeDescription in to the ReaderImpl and 
RecordReaderImpl
 Key: HIVE-12571
 URL: https://issues.apache.org/jira/browse/HIVE-12571
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley


We want to use the TypeDescription rather than List because it 
gives us a much better interface.





[jira] [Created] (HIVE-12286) Add option to ORC vectorized reader to not trim spaces from char columns.

2015-10-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12286:


 Summary: Add option to ORC vectorized reader to not trim spaces 
from char columns.
 Key: HIVE-12286
 URL: https://issues.apache.org/jira/browse/HIVE-12286
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley


Currently the ORC reader in nextBatch always strips spaces from char columns. 
It is more natural for non-Hive applications to make it not trim the results on 
read, so I propose adding a switch to ReaderOptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12159) Create vectorized readers for the complex types

2015-10-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12159:


 Summary: Create vectorized readers for the complex types
 Key: HIVE-12159
 URL: https://issues.apache.org/jira/browse/HIVE-12159
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


We need vectorized readers for the complex types.





[jira] [Created] (HIVE-12066) Add javadoc for methods added to public APIs

2015-10-07 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12066:


 Summary: Add javadoc for methods added to public APIs
 Key: HIVE-12066
 URL: https://issues.apache.org/jira/browse/HIVE-12066
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Sergey Shelukhin


Looking through the changes for ORC, there are methods being added without 
documentation:

{code}
--- ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
@@ -360,8 +353,18 @@ RecordReader rows(long offset, long length,

   MetadataReader metadata() throws IOException;

+  List getVersionList();
+
+  int getMetadataSize();
+
+  List getOrcProtoStripeStatistics();
+
+  List getStripeStatistics();
+
+  List getOrcProtoFileStatistics();
+
+  DataReader createDefaultDataReader(boolean useZeroCopy);
+
{code}

You really need to look through all of the interfaces and fix them before 
merging into master.





[jira] [Created] (HIVE-12055) Create row-by-row shims for the write path

2015-10-07 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12055:


 Summary: Create row-by-row shims for the write path 
 Key: HIVE-12055
 URL: https://issues.apache.org/jira/browse/HIVE-12055
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


As part of removing the row-by-row writer, we'll need to shim out the higher 
level API (OrcSerde and OrcOutputFormat) so that we maintain backwards 
compatibility.





[jira] [Created] (HIVE-12054) Create vectorized write method

2015-10-07 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-12054:


 Summary: Create vectorized write method
 Key: HIVE-12054
 URL: https://issues.apache.org/jira/browse/HIVE-12054
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley


We need to add writer methods that can write VectorizedRowBatch to an ORC file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11890) Create ORC module

2015-09-18 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11890:


 Summary: Create ORC module
 Key: HIVE-11890
 URL: https://issues.apache.org/jira/browse/HIVE-11890
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


Start moving classes over to the ORC module.





[jira] [Created] (HIVE-11807) Set ORC buffer size in relation to set stripe size

2015-09-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11807:


 Summary: Set ORC buffer size in relation to set stripe size
 Key: HIVE-11807
 URL: https://issues.apache.org/jira/browse/HIVE-11807
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley


A customer produced ORC files with very small stripe sizes (10k rows/stripe) by 
setting a small 64MB stripe size and 256K buffer size for a 54 column table. At 
that size, each of the streams only gets a buffer or two before the stripe size 
is reached. The current code uses the available memory instead of the stripe 
size and thus doesn't shrink the buffer size if the JVM has much more memory 
than the stripe size.
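
The "buffer or two" figure can be checked with rough arithmetic. The sketch below divides the stripe size evenly across all stream buffers. The number of streams per column is an assumption made for illustration (the report does not state it), and StripeBufferSketch is a hypothetical name, not ORC code.

```java
// Illustrative arithmetic only: assumes the stripe is split evenly across streams.
class StripeBufferSketch {
    /** How many full buffers each stream can fill before the stripe size is reached. */
    static long buffersPerStream(long stripeSize, long bufferSize,
                                 int columns, int streamsPerColumn) {
        return stripeSize / (bufferSize * columns * streamsPerColumn);
    }

    public static void main(String[] args) {
        long stripeSize = 64L << 20;   // 64MB, from the report
        long bufferSize = 256L << 10;  // 256K, from the report
        int columns = 54;              // from the report
        // streamsPerColumn = 2 or 3 is an assumed range, not a measured value.
        System.out.println(buffersPerStream(stripeSize, bufferSize, columns, 2));
        System.out.println(buffersPerStream(stripeSize, bufferSize, columns, 3));
    }
}
```

Under these assumptions each stream fills only one or two buffers per stripe, so basing the buffer size on the stripe size rather than on available JVM memory would shrink the buffers in this case.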





[jira] [Created] (HIVE-11808) In ORC removing the dynamic dispatch for StringTreeReader improves read by 10%

2015-09-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11808:


 Summary: In ORC removing the dynamic dispatch for StringTreeReader 
improves read by 10%
 Key: HIVE-11808
 URL: https://issues.apache.org/jira/browse/HIVE-11808
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


When we introduced the dictionary/direct encodings for ORC, we made subclasses 
of StringTreeReader named StringDirectTreeReader and StringDictionaryTreeReader 
and introduced an additional dynamic dispatch in the inner loop. For tables with 
a lot of string columns, removing that extra dispatch improves performance 10%.





[jira] [Created] (HIVE-11704) Create errata.txt file

2015-08-31 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11704:


 Summary: Create errata.txt file
 Key: HIVE-11704
 URL: https://issues.apache.org/jira/browse/HIVE-11704
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Owen O'Malley
Assignee: Owen O'Malley


As discussed on the email list, we should have a file documenting known 
problems in the commit messages.





[jira] [Created] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG

2015-08-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11618:


 Summary: Correct the SARG api to reunify the PredicateLeaf.Type 
INTEGER and LONG
 Key: HIVE-11618
 URL: https://issues.apache.org/jira/browse/HIVE-11618
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Owen O'Malley


The Parquet binding leaked implementation details into the generic SARG api. 

Rather than make all users of the SARG api deal with each of the specific 
types, reunify the INTEGER and LONG types. 





[jira] [Created] (HIVE-11417) Create ObjectInspectors for VectorizedRowBatch

2015-07-30 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11417:


 Summary: Create ObjectInspectors for VectorizedRowBatch
 Key: HIVE-11417
 URL: https://issues.apache.org/jira/browse/HIVE-11417
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


I'd like to make the vectorized path the default for reading and writing ORC 
files. To ensure that Hive can still read row by row, I'll make 
ObjectInspectors that are backed by the VectorizedRowBatch.





[jira] [Created] (HIVE-11370) Extend SARGs to support binary type

2015-07-24 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11370:


 Summary: Extend SARGs to support binary type
 Key: HIVE-11370
 URL: https://issues.apache.org/jira/browse/HIVE-11370
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


Currently the sargs only apply to string, boolean, integer, decimal, floating, 
date, and timestamp columns. It would be good to support binary blobs also.





[jira] [Created] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.

2015-07-20 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11321:


 Summary: Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
 Key: HIVE-11321
 URL: https://issues.apache.org/jira/browse/HIVE-11321
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


We should pull all of the configuration/table property knobs into a single list.





[jira] [Created] (HIVE-11307) Remove getWritableObject from ColumnVectorBatch

2015-07-18 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11307:


 Summary: Remove getWritableObject from ColumnVectorBatch
 Key: HIVE-11307
 URL: https://issues.apache.org/jira/browse/HIVE-11307
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 2.0.0


ColumnVectorBatch.getWritableObject is only used in a few tests and is really 
problematic when adding the complex types to vectorization.





[jira] [Created] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.

2015-07-14 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11253:


 Summary: Move SearchArgument and VectorizedRowBatch classes to 
storage-api.
 Key: HIVE-11253
 URL: https://issues.apache.org/jira/browse/HIVE-11253
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley








[jira] [Created] (HIVE-11245) Fix the LLAP to ORC APIs

2015-07-13 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11245:


 Summary: Fix the LLAP to ORC APIs
 Key: HIVE-11245
 URL: https://issues.apache.org/jira/browse/HIVE-11245
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Priority: Blocker
 Fix For: llap


Currently the LLAP branch has refactored the ORC code to have different code 
paths depending on whether the data is coming from the cache or a FileSystem.

We need to introduce the concept of a DataSource that is responsible for getting 
the necessary bytes regardless of whether they are coming from a FileSystem, an 
in-memory cache, or both.
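
A minimal sketch of what such an abstraction could look like, assuming a single read method keyed by offset and length. All names here (DataSourceSketch, DataSource, ArraySource, CachingSource) are hypothetical and are not the API that Hive or ORC adopted; the array-backed source only stands in for a FileSystem.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the interface and class names are not from Hive or ORC.
final class DataSourceSketch {
  /** Supplies the bytes for a range without exposing where they came from. */
  interface DataSource {
    byte[] read(long offset, int length);
  }

  /** Stands in for a FileSystem-backed source by slicing an in-memory file image. */
  static final class ArraySource implements DataSource {
    private final byte[] file;
    int reads = 0; // counts how often the underlying "file" is touched

    ArraySource(byte[] file) { this.file = file; }

    @Override public byte[] read(long offset, int length) {
      reads++;
      return Arrays.copyOfRange(file, (int) offset, (int) offset + length);
    }
  }

  /** Serves exact ranges it has seen before from memory and falls back otherwise. */
  static final class CachingSource implements DataSource {
    private final DataSource fallback;
    private final Map<String, byte[]> cache = new HashMap<>();

    CachingSource(DataSource fallback) { this.fallback = fallback; }

    @Override public byte[] read(long offset, int length) {
      return cache.computeIfAbsent(offset + ":" + length,
          key -> fallback.read(offset, length));
    }
  }
}
```

With this shape the reader has one code path: it asks a DataSource for bytes and does not branch on whether the cache or the file system answered.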





[jira] [Created] (HIVE-11209) Clean up dependencies in HiveDecimalWritable

2015-07-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11209:


 Summary: Clean up dependencies in HiveDecimalWritable
 Key: HIVE-11209
 URL: https://issues.apache.org/jira/browse/HIVE-11209
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently HiveDecimalWritable depends on:
* org.apache.hadoop.hive.serde2.ByteStream
* org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils
* org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils

Since we need HiveDecimalWritable for the decimal VectorizedColumnBatch, 
breaking these dependencies will improve things.





[jira] [Created] (HIVE-11210) Remove dependency on HiveConf from Orc reader writer

2015-07-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11210:


 Summary: Remove dependency on HiveConf from Orc reader and writer
 Key: HIVE-11210
 URL: https://issues.apache.org/jira/browse/HIVE-11210
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently the ORC reader and writer get their default values from HiveConf. I 
propose that we make the reader and writer have their own programmatic defaults 
and the OrcInputFormat and OrcOutputFormat can use the version in HiveConf.





[jira] [Created] (HIVE-11212) Create vectorized types for complex types

2015-07-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11212:


 Summary: Create vectorized types for complex types
 Key: HIVE-11212
 URL: https://issues.apache.org/jira/browse/HIVE-11212
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


We need vectorized types for structs, maps, lists, and unions.





[jira] [Created] (HIVE-11144) Replace row by row reader and writer with shims to vectorized path.

2015-06-29 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11144:


 Summary: Replace row by row reader and writer with shims to 
vectorized path.
 Key: HIVE-11144
 URL: https://issues.apache.org/jira/browse/HIVE-11144
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The core ORC reader and writer will be better served if the vectorized read and 
write paths are the primary API and the row by row reader and writer and their 
corresponding object inspectors become Hive-specific shims.





[jira] [Created] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils

2015-06-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11137:


 Summary: In DateWritable remove the use of LazyBinaryUtils
 Key: HIVE-11137
 URL: https://issues.apache.org/jira/browse/HIVE-11137
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently the DateWritable class uses LazyBinaryUtils, which has a lot of 
dependencies.





[jira] [Created] (HIVE-11124) Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory

2015-06-25 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11124:


 Summary: Move OrcRecordUpdater.getAcidEventFields to 
RecordReaderFactory
 Key: HIVE-11124
 URL: https://issues.apache.org/jira/browse/HIVE-11124
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory to avoid the 
extra dependence.





[jira] [Created] (HIVE-11115) Remove dependence from ORC's WriterImpl to OrcInputFormat

2015-06-25 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11115:


 Summary: Remove dependence from ORC's WriterImpl to OrcInputFormat
 Key: HIVE-11115
 URL: https://issues.apache.org/jira/browse/HIVE-11115
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently there is a link from WriterImpl to OrcInputFormat that should be 
removed.





[jira] [Created] (HIVE-11086) Remove use of ErrorMsg in Orc's RunLengthIntegerReaderV2

2015-06-23 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11086:


 Summary: Remove use of ErrorMsg in Orc's RunLengthIntegerReaderV2
 Key: HIVE-11086
 URL: https://issues.apache.org/jira/browse/HIVE-11086
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


ORC's rle v2 reader uses a string literal from ErrorMsg, which forces a large 
dependency on the rle v2 reader. Pulling the string literal in directly doesn't 
change the behavior and fixes the linkage.





[jira] [Created] (HIVE-11080) Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter

2015-06-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11080:


 Summary: Modify VectorizedRowBatch.toString() to not depend on 
VectorExpressionWriter
 Key: HIVE-11080
 URL: https://issues.apache.org/jira/browse/HIVE-11080
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently the VectorizedRowBatch.toString method uses the 
VectorExpressionWriter to convert the row batch to a string.

Since the string is only used for printing error messages, I'd propose making 
the toString use the types of the vector batch instead of the object inspector.





[jira] [Created] (HIVE-10798) Remove dependence on VectorizedBatchUtil from VectorizedOrcAcidRowReader

2015-05-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10798:


 Summary: Remove dependence on VectorizedBatchUtil from 
VectorizedOrcAcidRowReader
 Key: HIVE-10798
 URL: https://issues.apache.org/jira/browse/HIVE-10798
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


VectorizedBatchUtil has a lot of dependences that Orc should avoid and the code 
should be refactored.





[jira] [Created] (HIVE-10796) Remove dependencies on NumericHistogram and NumDistinctValueEstimator from JavaDataModel

2015-05-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10796:


 Summary: Remove dependencies on NumericHistogram and 
NumDistinctValueEstimator from JavaDataModel
 Key: HIVE-10796
 URL: https://issues.apache.org/jira/browse/HIVE-10796
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The JavaDataModel class is used in a lot of places and the non-general 
calculations are better done in the other classes.





[jira] [Created] (HIVE-10797) Simplify the test for vectorized input

2015-05-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10797:


 Summary: Simplify the test for vectorized input
 Key: HIVE-10797
 URL: https://issues.apache.org/jira/browse/HIVE-10797
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The call to Utilities.isVectorMode should be simplified for the readers.





[jira] [Created] (HIVE-10795) Remove use of PerfLogger from Orc

2015-05-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10795:


 Summary: Remove use of PerfLogger from Orc
 Key: HIVE-10795
 URL: https://issues.apache.org/jira/browse/HIVE-10795
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


PerfLogger is yet another class with a huge dependency set that Orc doesn't 
need.





[jira] [Created] (HIVE-10799) Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc

2015-05-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10799:


 Summary: Refactor the SearchArgumentFactory to remove the 
dependence on ExprNodeGenericFuncDesc
 Key: HIVE-10799
 URL: https://issues.apache.org/jira/browse/HIVE-10799
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


SearchArgumentFactory and SearchArgumentImpl are high level and shouldn't 
depend on the internals of Hive's AST model.





[jira] [Created] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils

2015-05-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10794:


 Summary: Remove the dependence from ErrorMsg to HiveUtils
 Key: HIVE-10794
 URL: https://issues.apache.org/jira/browse/HIVE-10794
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley


HiveUtils has a large set of dependencies and ErrorMsg only needs the new line 
constant. Breaking the dependence will reduce the dependency set from ErrorMsg 
significantly.





[jira] [Created] (HIVE-10407) separate out the timestamp ranges for testing purposes

2015-04-20 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10407:


 Summary: separate out the timestamp ranges for testing purposes
 Key: HIVE-10407
 URL: https://issues.apache.org/jira/browse/HIVE-10407
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Some platforms have limits for date ranges, so separate out the test cases that 
are outside of the range 1970 to 2038.
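
The issue does not say where the platform limit comes from. One common cause, assumed here, is a timestamp stored as a signed 32-bit count of seconds since 1970, which runs out on 2038-01-19. A small hypothetical helper (TimestampRangeSketch and inPortableRange are illustrative names) that a test could use to sort cases into the two groups might look like this:

```java
import java.time.Instant;

// Hypothetical helper; assumes the limit is a non-negative 32-bit seconds counter.
final class TimestampRangeSketch {
  /** True when the instant lies between 1970-01-01 and the last 32-bit second in 2038. */
  static boolean inPortableRange(Instant instant) {
    long seconds = instant.getEpochSecond();
    return seconds >= 0 && seconds <= Integer.MAX_VALUE;
  }
}
```

Cases for which the helper returns false would go into the separate test group.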





[jira] [Created] (HIVE-10305) TestOrcFile has a mistake that makes metadata test ineffective

2015-04-10 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10305:


 Summary: TestOrcFile has a mistake that makes metadata test 
ineffective
 Key: HIVE-10305
 URL: https://issues.apache.org/jira/browse/HIVE-10305
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Two of the values that are being stored as user metadata in 
TestOrcFile.metaData weren't flipped and thus were empty buffers. The test 
passes because they are compared to empty buffers. We should fix the test to 
perform the expected test.
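
The effect of a missing flip can be reproduced with java.nio.ByteBuffer alone. The sketch below assumes the buffer is allocated to exactly the size of the data, which is an assumption about the test and not something stated above; FlipSketch and fill are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the flip mistake; not the TestOrcFile code.
final class FlipSketch {
  /** Fills a buffer sized exactly to the text, optionally flipping it for reading. */
  static ByteBuffer fill(String text, boolean flip) {
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
    buffer.put(bytes);   // position now sits at the end of the data
    if (flip) {
      buffer.flip();     // limit = old position, position = 0
    }
    return buffer;
  }
}
```

Without the flip the buffer has nothing remaining, and because ByteBuffer.equals compares only the remaining bytes, it compares equal to any other empty buffer.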





[jira] [Created] (HIVE-10171) Create a storage-api module

2015-03-31 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10171:


 Summary: Create a storage-api module
 Key: HIVE-10171
 URL: https://issues.apache.org/jira/browse/HIVE-10171
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


To support high performance file formats, I'd like to propose that we move the 
minimal set of classes that are required to integrate with Hive into a new 
module named storage-api. This module will include VectorizedRowBatch, the 
various ColumnVector classes, and the SARG classes. It will form the start of 
an API that high performance storage formats can use to integrate with Hive. 
Both ORC and Parquet can use the new API to support vectorization and SARGs 
without performance-destroying shims.





[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams

2015-02-11 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9593:

   Resolution: Fixed
Fix Version/s: 1.1.0
   1.0.1
   Status: Resolved  (was: Patch Available)

I committed this. Thanks for the review, Gopal!

 ORC Reader should ignore unknown metadata streams 
 --

 Key: HIVE-9593
 URL: https://issues.apache.org/jira/browse/HIVE-9593
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Gopal V
Assignee: Owen O'Malley
 Fix For: 1.0.1, 1.1.0

 Attachments: HIVE-9593.no-autogen.patch, hive-9593.patch


 ORC readers should ignore metadata streams which are non-essential additions 
 to the main data streams.
 This will include additional indices, histograms or anything we add as an 
 optional stream.





[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams

2015-02-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9593:

Status: Patch Available  (was: Open)

 ORC Reader should ignore unknown metadata streams 
 --

 Key: HIVE-9593
 URL: https://issues.apache.org/jira/browse/HIVE-9593
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.1, 0.12.0, 0.11.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Gopal V
Assignee: Owen O'Malley
 Attachments: hive-9593.patch


 ORC readers should ignore metadata streams which are non-essential additions 
 to the main data streams.
 This will include additional indices, histograms or anything we add as an 
 optional stream.





[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams

2015-02-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9593:

Attachment: hive-9593.patch

This patch changes all of the required fields to be optional. I've gone through 
the current code to ensure that null pointers from getKind() won't cause an NPE.

 ORC Reader should ignore unknown metadata streams 
 --

 Key: HIVE-9593
 URL: https://issues.apache.org/jira/browse/HIVE-9593
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Gopal V
Assignee: Owen O'Malley
 Attachments: hive-9593.patch


 ORC readers should ignore metadata streams which are non-essential additions 
 to the main data streams.
 This will include additional indices, histograms or anything we add as an 
 optional stream.





[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index

2015-02-02 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302507#comment-14302507
 ] 

Owen O'Malley commented on HIVE-9188:
-

Suggestions:
* Pick m to always be a multiple of 64 (since you are using longs as the 
representation)
* change the representation of BloomFilter in orc_proto to record the number of 
hash functions and not the size or fpp.
* use fixed64 for the bit field
* you'll also need to update the specification in the wiki with the change to 
the format 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-orc-specORCFormatSpecification)
* revert the spurious change to CliDriver.java
* revert the spurious change to .gitignore
* it seems suboptimal to convert long values to bytes before hashing
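
The first two suggestions can be sketched with the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. This is a hedged illustration only: BloomSizingSketch, numBits, and numHashFunctions are hypothetical names and the code is not taken from the patch under review.

```java
// Hypothetical sizing helper based on the textbook Bloom filter formulas.
final class BloomSizingSketch {
  /** Bit count m for n entries at false-positive rate fpp, rounded up to a multiple of 64. */
  static long numBits(long expectedEntries, double fpp) {
    double raw = -expectedEntries * Math.log(fpp) / (Math.log(2) * Math.log(2));
    long bits = (long) Math.ceil(raw);
    return (bits + 63) / 64 * 64; // whole longs only
  }

  /** Hash function count k = round(m / n * ln 2), never less than 1. */
  static int numHashFunctions(long expectedEntries, long numBits) {
    long k = Math.round((double) numBits / expectedEntries * Math.log(2));
    return (int) Math.max(1, k);
  }
}
```

Because m is a multiple of 64, the bit set occupies exactly m / 64 longs, so a serialized form that stores k together with the long array lets a reader recover m from the array length.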


 BloomFilter in ORC row group index
 --

 Key: HIVE-9188
 URL: https://issues.apache.org/jira/browse/HIVE-9188
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, 
 HIVE-9188.4.patch, HIVE-9188.5.patch, HIVE-9188.6.patch


 Bloom filters are a well-known probabilistic data structure for set membership 
 checking. We can use Bloom filters in the ORC index for better row group pruning. 
 Currently, the ORC row group index uses min/max statistics to eliminate row 
 groups (and stripes) that do not satisfy the predicate condition specified in 
 the query. But in some cases, min/max based elimination is not effective, for 
 example on unsorted columns with a wide range of entries. Bloom filters can 
 be an effective and efficient alternative for row group/split elimination for 
 point queries or queries with an IN clause.





[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-01-29 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297178#comment-14297178
 ] 

Owen O'Malley commented on HIVE-9451:
-

We should also record the stripe size that was used as the file was written. 
That gives a strict upper bound on the size of memory in the writer.

 Add max size of column dictionaries to ORC metadata
 ---

 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley

 To predict the amount of memory required to read an ORC file we need to know 
 the size of the dictionaries for the columns that we are reading. I propose 
 adding the number of bytes for each column's dictionary to the stripe's 
 column statistics. The file's column statistics would have the maximum 
 dictionary size for each column.





[jira] [Commented] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-29 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297319#comment-14297319
 ] 

Owen O'Malley commented on HIVE-9317:
-

+1 to not rolling a new RC specifically for this one. I just want to make sure 
it goes into any new RCs.

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.15.0, 1.0.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9317:

   Resolution: Fixed
Fix Version/s: 1.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I committed this. Thanks for the review, Alan.

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.15.0, 1.0.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9317:

Attachment: hive-9327.txt

This patch changes no code, just puts the required Apache header on the source 
files and moves Microsoft's copyright notice to the NOTICE file.

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
 Fix For: 0.15.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9317:

Priority: Blocker  (was: Major)

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.15.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Assigned] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-9317:
---

Assignee: Owen O'Malley

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.15.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-9317:

Status: Patch Available  (was: Open)

 move Microsoft copyright to NOTICE file
 ---

 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.15.0

 Attachments: hive-9327.txt


 There are a set of files that still have the Microsoft copyright notices. 
 Those notices need to be moved into NOTICES and replaced with the standard 
 Apache headers.
 {code}
 ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
 ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
 ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
 ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
 {code}





[jira] [Created] (HIVE-9467) ORC - sort dictionary streams to the end of the stripe

2015-01-26 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-9467:
---

 Summary: ORC - sort dictionary streams to the end of the stripe
 Key: HIVE-9467
 URL: https://issues.apache.org/jira/browse/HIVE-9467
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley


When reading ORC files, it would be convenient to group the dictionary streams 
at the end of the stripe. This would allow the reader to use fewer read 
operations if they want to load the dictionaries before they load the data.
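
A minimal sketch of the ordering, assuming streams can be told apart by a name prefix. The string labels and the StreamOrderSketch class are hypothetical; a real writer would order its stream descriptors by kind.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; stream labels are illustrative strings, not ORC objects.
final class StreamOrderSketch {
  /** Returns the streams with dictionary streams moved to the end, order otherwise kept. */
  static List<String> dictionaryLast(List<String> streams) {
    List<String> ordered = new ArrayList<>(streams);
    // false sorts before true, and List.sort is stable, so relative order survives.
    ordered.sort(Comparator.comparing((String s) -> s.startsWith("DICTIONARY")));
    return ordered;
  }
}
```

With the dictionary streams contiguous at the end of the stripe, a reader that wants them first can fetch them in one ranged read.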





[jira] [Created] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-01-23 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-9451:
---

 Summary: Add max size of column dictionaries to ORC metadata
 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley


To predict the amount of memory required to read an ORC file we need to know 
the size of the dictionaries for the columns that we are reading. I propose 
adding the number of bytes for each column's dictionary to the stripe's column 
statistics. The file's column statistics would have the maximum dictionary size 
for each column.
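
The file-level value is a per-column maximum over the stripe-level values. A sketch of that roll-up, with DictionarySizeSketch and fileMaximums as hypothetical names and a plain long matrix standing in for the stripe statistics:

```java
// Hypothetical roll-up; a long[stripe][column] matrix stands in for stripe statistics.
final class DictionarySizeSketch {
  /** Returns, for each column, the largest dictionary size in bytes seen in any stripe. */
  static long[] fileMaximums(long[][] stripeDictionaryBytes, int columns) {
    long[] max = new long[columns];
    for (long[] stripe : stripeDictionaryBytes) {
      for (int c = 0; c < columns; c++) {
        max[c] = Math.max(max[c], stripe[c]);
      }
    }
    return max;
  }
}
```

A reader could then sum the maximums for the columns it selects to bound the dictionary memory needed for any single stripe.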





[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2015-01-20 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284927#comment-14284927
 ] 

Owen O'Malley commented on HIVE-8966:
-

This looks good, Alan. +1

One minor nit is that the class javadoc for ValidReadTxnList has "And" instead 
of the intended "An".


 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.1

 Attachments: HIVE-8966.2.patch, HIVE-8966.3.patch, HIVE-8966.4.patch, 
 HIVE-8966.5.patch, HIVE-8966.patch


 Hive HCatalog streaming also creates a file named bucket_n_flush_length in 
 each delta directory, where n is the bucket number. However, 
 compactor.CompactorMR treats this file as one that also needs compacting. 
 Because the file cannot be compacted, compactor.CompactorMR does not 
 continue with the compaction. 
 In a test, after removing the bucket_n_flush_length file, the alter 
 table partition compact finished successfully. If that file is not deleted, 
 nothing is compacted. 
 This is probably a very severe bug. Both 0.13 and 0.14 have this issue.





[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2015-01-20 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284935#comment-14284935
 ] 

Owen O'Malley commented on HIVE-8966:
-

After a little more thought, I'm worried that someone will accidentally create 
a ValidCompactorTxnList and get confused by the different behavior. I think it 
would make sense to move it into the compactor package to minimize the chance 
that someone uses it by mistake. 



[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index

2015-01-13 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275997#comment-14275997
 ] 

Owen O'Malley commented on HIVE-9188:
-

[~prasanth_j] Please remove the upper two levels of bloom filters. They are 
utterly useless. Their false positive rate will be far above 99%.

They absolutely should not be stored in the column statistics. That will hurt 
the common ppd case and not help.
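
As a rough check on the false positive claim above, the sketch below applies
the standard Bloom filter estimate p = (1 - e^(-kn/m))^k. It assumes, purely
for illustration, that an upper-level filter keeps the row-group sizing (about
62,353 bits and 4 hash functions, derived from n = 10,000 and p = 0.05) while
absorbing the values of many row groups. The class and method names are
hypothetical and are not part of Hive or ORC.

```java
class OverloadedBloomFilter {
    // Standard estimate of the false positive rate after inserting n distinct
    // values into a filter of m bits with k hash functions.
    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long bits = 62_353L;  // assumed: sized for 10,000 values at p = 0.05
        int hashes = 4;       // assumed: round(bits / 10,000 * ln 2)
        long[] counts = {10_000L, 100_000L, 1_000_000L};
        for (long n : counts) {
            System.out.printf("n=%d fpp=%.4f%n", n, falsePositiveRate(n, bits, hashes));
        }
    }
}
```

Under these assumptions the estimated rate is already above 99% once the
filter holds ten row groups' worth of distinct values.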

 BloomFilter in ORC row group index
 --

 Key: HIVE-9188
 URL: https://issues.apache.org/jira/browse/HIVE-9188
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Attachments: HIVE-9188.1.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, 
 HIVE-9188.4.patch


 Bloom filters are a well-known probabilistic data structure for set membership 
 checking. We can use Bloom filters in the ORC index for better row group 
 pruning. Currently, the ORC row group index uses min/max statistics to 
 eliminate row groups (and stripes as well) that do not satisfy the predicate 
 condition specified in the query. But in some cases, min/max-based elimination 
 is not effective (unsorted columns with a wide range of entries). Bloom filters 
 can be an effective and efficient alternative for row group/split elimination 
 for point queries or queries with an IN clause.





[jira] [Created] (HIVE-9317) move Microsoft copyright to NOTICE file

2015-01-08 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-9317:
---

 Summary: move Microsoft copyright to NOTICE file
 Key: HIVE-9317
 URL: https://issues.apache.org/jira/browse/HIVE-9317
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
 Fix For: 0.15.0


There is a set of files that still have the Microsoft copyright notices. Those 
notices need to be moved into NOTICE and replaced with the standard Apache 
headers.

{code}
./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
{code}





[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index

2015-01-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268573#comment-14268573
 ] 

Owen O'Malley commented on HIVE-9188:
-

[~prasanth_j] Ok, I thought that you said that you were going to have bloom 
filters at row group, stripe, and file level. I agree completely that ORC 
should only have bloom filters at the row group level.

Having the bloom filter as a separate stream means the reader does *far* less 
IO. It will still go through the code that merges adjacent ranges together into 
a single read. So if you need all of the indexes and bloom filters for all of 
the columns, the reader should read them in a single IO operation. On the other 
hand, if it doesn't need any bloom filter, it shouldn't have to load the extra 
megabytes of data it doesn't need.
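
The range merging mentioned above can be sketched as follows. This is a minimal
illustration, not the ORC reader's actual code: the class name, the half-open
[start, end) byte ranges, and the stream offsets in main are all made up for
the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeMergeSketch {
    // Merges byte ranges [start, end) that touch or overlap, so that each
    // merged range can be fetched with a single read.
    static List<long[]> mergeAdjacent(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort((a, b) -> Long.compare(a[0], b[0]));
        List<long[]> merged = new ArrayList<>();
        for (long[] r : sorted) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                last[1] = Math.max(last[1], r[1]);
            } else {
                merged.add(new long[] {r[0], r[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical layout: ROW_INDEX and BLOOM_FILTER streams for two columns.
        long[] rowIndex1 = {0, 100};
        long[] bloom1 = {100, 8_000};
        long[] rowIndex2 = {8_000, 8_100};
        long[] bloom2 = {8_100, 16_000};

        // Everything needed: the streams are contiguous and collapse into one read.
        System.out.println(mergeAdjacent(
            Arrays.asList(rowIndex1, bloom1, rowIndex2, bloom2)).size());
        // Only the row indexes needed: two small reads, bloom filter bytes skipped.
        System.out.println(mergeAdjacent(Arrays.asList(rowIndex1, rowIndex2)).size());
    }
}
```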



[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index

2015-01-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268176#comment-14268176
 ] 

Owen O'Malley commented on HIVE-9188:
-

[~gopalv] I don't understand your concern. The indexes are already stored in 
ROW_INDEX streams. I'm just saying that the bloom filters, which are much 
larger than the rest of the ROW_INDEX, should be split into a BLOOM_FILTER 
stream instead of being bundled in with the ROW_INDEX stream. That would let 
you load just the ROW_INDEX if you don't need the bloom filter.

The size of the bloom filter needs to be changed relative to the number of 
items. You've sized them for the default row group size (n = 10,000, p = 0.05), 
which comes to 7.8 KB. To use them at the file level, you'd need to make the 
bloom filters much, much larger. For a file with 100 million values in a 
column, you'd need a 74 MB bloom filter. I'd propose that you only do the bloom 
filters at the row group level and scale them to match the row index stride 
rather than just use the default 10k.
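
The sizes quoted above follow from the standard Bloom filter sizing formula
m = -n * ln(p) / (ln 2)^2, where m is the number of bits. The sketch below
simply evaluates that formula; the class and method names are hypothetical and
it makes no claim about how the patch itself computes sizes.

```java
class BloomFilterSizing {
    // Optimal number of bits for n expected values at false positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long rowGroupBits = optimalBits(10_000L, 0.05);
        long fileBits = optimalBits(100_000_000L, 0.05);
        System.out.printf("row group: %d bytes%n", rowGroupBits / 8);
        System.out.printf("file: %.1f MiB%n", fileBits / 8.0 / (1024 * 1024));
    }
}
```

The two results come out at roughly 7.8 thousand bytes and 74 MiB, in line with
the figures in the comment.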



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268053#comment-14268053
 ] 

Owen O'Malley commented on HIVE-4639:
-

You should encode four values:
  no_values, all_nulls, some_nulls, no_nulls

This will allow you to support a richer set of sargs.
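
A minimal sketch of how the four states might be derived and used, assuming the
writer tracks a row count and a null count per index entry. The enum, the
three-valued Answer type, and the method names are hypothetical and are not
taken from the ORC code or the attached patch.

```java
class NullStateSketch {
    enum NullState { NO_VALUES, ALL_NULLS, SOME_NULLS, NO_NULLS }

    // Three-valued answer for whether any row in the group can match.
    enum Answer { YES, NO, MAYBE }

    // Classifies an index entry from its (assumed) row and null counts.
    static NullState of(long rowCount, long nullCount) {
        if (rowCount == 0) {
            return NullState.NO_VALUES;
        } else if (nullCount == 0) {
            return NullState.NO_NULLS;
        } else if (nullCount == rowCount) {
            return NullState.ALL_NULLS;
        }
        return NullState.SOME_NULLS;
    }

    // Evaluates "col IS NULL" against the index entry alone.
    static Answer isNull(NullState state) {
        switch (state) {
            case ALL_NULLS:
                return Answer.YES;    // every row matches
            case NO_NULLS:
            case NO_VALUES:
                return Answer.NO;     // no row can match, so the group can be skipped
            default:
                return Answer.MAYBE;  // SOME_NULLS: the rows must be read
        }
    }

    public static void main(String[] args) {
        System.out.println(isNull(of(10_000, 0)));
        System.out.println(isNull(of(10_000, 10_000)));
        System.out.println(isNull(of(10_000, 42)));
    }
}
```

A single has-null boolean cannot tell ALL_NULLS from SOME_NULLS, which is the
distinction that lets "IS NULL" answer YES for a whole group in this sketch.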

 Add has null flag to ORC internal index
 ---

 Key: HIVE-4639
 URL: https://issues.apache.org/jira/browse/HIVE-4639
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth Jayachandran
 Attachments: HIVE-4639.1.patch


 It would enable more predicate pushdown if we added a flag to the index entry 
 recording if there were any null values in the column for the 10k rows.





[jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index

2015-01-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267993#comment-14267993
 ] 

Owen O'Malley commented on HIVE-9188:
-

I'm concerned about the size of the bloom filters and making them an integrated 
part of the column statistics. I think we'd do much better to make a 
BLOOM_FILTER stream kind and place them in a completely separate stream. That 
would allow the predicate push down to only load the bloom filters for the 
columns that it needs.



[jira] [Commented] (HIVE-9166) Place an upper bound for SARG CNF conversion

2014-12-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252406#comment-14252406
 ] 

Owen O'Malley commented on HIVE-9166:
-

+1 LGTM

You probably should add a test case where there is something other than the 
large CNF.

something like (and leaf-1 (or ...))

You should end up with leaf-1 as your final expression.
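
The suggested test case can be illustrated with a toy expression tree. The
sketch below is not the SearchArgument code from the patch: it assumes that an
OR whose CNF expansion would exceed the bound is replaced by a "maybe"
placeholder, and that "maybe" terms are dropped from an enclosing AND. All
class and method names, and the bound of 256, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BoundedCnfSketch {
    enum Kind { LEAF, AND, OR, MAYBE }

    static final class Expr {
        final Kind kind;
        final String name;           // only set for leaves
        final List<Expr> children;

        Expr(Kind kind, String name, List<Expr> children) {
            this.kind = kind;
            this.name = name;
            this.children = children;
        }

        static Expr leaf(String name) {
            return new Expr(Kind.LEAF, name, Collections.<Expr>emptyList());
        }

        static Expr and(Expr... children) {
            return new Expr(Kind.AND, null, Arrays.asList(children));
        }

        static Expr or(Expr... children) {
            return new Expr(Kind.OR, null, Arrays.asList(children));
        }

        @Override
        public String toString() {
            if (kind == Kind.LEAF) return name;
            if (kind == Kind.MAYBE) return "maybe";
            StringBuilder sb = new StringBuilder("(" + kind.name().toLowerCase());
            for (Expr child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    static final Expr MAYBE = new Expr(Kind.MAYBE, null, Collections.<Expr>emptyList());

    // Number of clauses in the CNF of e: an AND adds its children's clause
    // counts, an OR multiplies them. A double avoids overflow on huge products.
    static double cnfClauses(Expr e) {
        if (e.kind == Kind.AND) {
            double sum = 0;
            for (Expr child : e.children) sum += cnfClauses(child);
            return sum;
        } else if (e.kind == Kind.OR) {
            double product = 1;
            for (Expr child : e.children) product *= cnfClauses(child);
            return product;
        }
        return 1;
    }

    // Replaces over-large ORs with MAYBE and drops MAYBE terms from ANDs.
    static Expr prune(Expr e, double maxClauses) {
        if (e.kind == Kind.OR && cnfClauses(e) > maxClauses) {
            return MAYBE;
        }
        if (e.kind == Kind.AND) {
            List<Expr> kept = new ArrayList<>();
            for (Expr child : e.children) {
                Expr pruned = prune(child, maxClauses);
                if (pruned.kind != Kind.MAYBE) kept.add(pruned);
            }
            if (kept.isEmpty()) return MAYBE;
            return kept.size() == 1 ? kept.get(0) : new Expr(Kind.AND, null, kept);
        }
        return e;
    }

    // Builds (and leaf-1 (or (and a0 b0) ... (and a19 b19))); the OR alone
    // would expand to 2^20 clauses.
    static Expr example() {
        Expr[] terms = new Expr[20];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = Expr.and(Expr.leaf("a" + i), Expr.leaf("b" + i));
        }
        return Expr.and(Expr.leaf("leaf-1"), Expr.or(terms));
    }

    public static void main(String[] args) {
        System.out.println(prune(example(), 256));  // prints "leaf-1"
    }
}
```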



 Place an upper bound for SARG CNF conversion
 

 Key: HIVE-9166
 URL: https://issues.apache.org/jira/browse/HIVE-9166
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Attachments: HIVE-9166.1.patch, HIVE-9166.2.patch


 SARG creation in ORC applies several optimizations to the expression tree. Of 
 these, CNF conversion is an exponential algorithm, as it finds all combinations 
 of expressions when converting from OR-of-AND form to AND-of-OR form (CNF). 
 We need an upper bound for this algorithm to prevent it from running for a long 
 time and generating a huge list of combinations.





[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-12-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240415#comment-14240415
 ] 

Owen O'Malley commented on HIVE-8966:
-

Alan, your patch looks good +1

 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.1

 Attachments: HIVE-8966.2.patch, HIVE-8966.patch




[jira] [Commented] (HIVE-8880) non-synchronized access to split list in OrcInputFormat

2014-12-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236177#comment-14236177
 ] 

Owen O'Malley commented on HIVE-8880:
-

+1, this is good.

 non-synchronized access to split list in OrcInputFormat
 ---

 Key: HIVE-8880
 URL: https://issues.apache.org/jira/browse/HIVE-8880
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.14.1

 Attachments: HIVE-8880.patch


 When adding delta files to the list of ORC splits, access to the list is not 
 synchronized even though it is shared across threads.  All other additions to 
 the list are synchronized.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202126#comment-14202126
 ] 

Owen O'Malley commented on HIVE-8732:
-

I should also point out that I added a line about the version to the 
orcfiledump output. New files will get the line:

File Version: 0.12 with HIVE_8732

Files written by the old writer will say either:

File Version: 0.12 with ORIGINAL
or
File Version: 0.11 with ORIGINAL



 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Updated] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-8732:

Attachment: HIVE-8732.patch

I had to fix some minor problems and update a bunch of qfile tests because the 
ORC files are now 2 bytes longer.



[jira] [Created] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2014-11-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-8746:
---

 Summary: ORC timestamp columns are sensitive to daylight savings 
time
 Key: HIVE-8746
 URL: https://issues.apache.org/jira/browse/HIVE-8746
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Hive uses Java's Timestamp class to manipulate timestamp columns. Unfortunately, 
the textual parsing in Timestamp is done in local time and the internal storage 
is in UTC.

ORC mostly sidesteps this issue by computing the difference between the time 
and a base time, both in local time, and storing that difference in the file. 
Reading the file across timezones will mostly work correctly: 2014-01-01 
12:34:56 will read correctly in every timezone.

However, moving between timezones with different daylight saving rules creates 
trouble. In particular, moving from a computer in PST to UTC will read 
2014-06-06 12:34:56 as 2014-06-06 11:34:56.
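
The shift can be reproduced with java.time. This is a sketch of the mechanism
as described above, not ORC's reader or writer code: the base time of
2015-01-01 00:00:00 is an assumption made for the example (the description only
says "a base time"), and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;

class DstShiftDemo {
    // Assumed base time, interpreted in whatever zone the process runs in.
    static final LocalDateTime BASE = LocalDateTime.of(2015, 1, 1, 0, 0, 0);

    // Writer: seconds between the base time and the value, both taken as
    // wall-clock times in the writer's zone.
    static long write(LocalDateTime value, ZoneId writerZone) {
        return Duration.between(
            BASE.atZone(writerZone).toInstant(),
            value.atZone(writerZone).toInstant()).getSeconds();
    }

    // Reader: applies the stored difference to the base time in its own zone.
    static LocalDateTime read(long storedSeconds, ZoneId readerZone) {
        return BASE.atZone(readerZone).toInstant()
            .plusSeconds(storedSeconds)
            .atZone(readerZone)
            .toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId pacific = ZoneId.of("America/Los_Angeles");
        ZoneId utc = ZoneId.of("UTC");

        LocalDateTime winter = LocalDateTime.of(2014, 1, 1, 12, 34, 56);
        LocalDateTime summer = LocalDateTime.of(2014, 6, 6, 12, 34, 56);

        // Same UTC offset as the base time in the writer's zone: round-trips.
        System.out.println(read(write(winter, pacific), utc));  // 2014-01-01T12:34:56
        // Written under daylight saving time: comes back one hour early.
        System.out.println(read(write(summer, pacific), utc));  // 2014-06-06T11:34:56
    }
}
```

The winter value survives because it and the base time share the same UTC
offset in the writer's zone; the summer value differs from the base by one hour
of daylight saving, and that hour is baked into the stored difference.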





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199199#comment-14199199
 ] 

Owen O'Malley commented on HIVE-8732:
-

I've created the timestamp bug as HIVE-8746. The fix for that one is pretty 
touchy and I'll do it in 0.15 I think rather than risk the 0.14 release.

I don't want to create a new write format since the old reader will read the 
corrected files. I will add a flag that I can use to suppress using the split 
elimination code for files with broken stripe/file indexes.

Does that sound reasonable?

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.




