Re: ORC separate project

2015-03-19 Thread Xuefu Zhang
Hi Owen,

I'd like to get involved.

Thanks,
Xuefu

On Thu, Mar 19, 2015 at 2:44 PM, Owen O'Malley omal...@apache.org wrote:

 All,
Over the last year, there have been a fair number of projects that want
 to integrate with ORC but don't want a dependency on Hive's exec jar.
 Additionally, we've been working on a C++ reader (and soon writer) and it
 would be great to host them both in the same project. Toward that end, I'd
 like to create a separate ORC project at Apache. There will be lots of
 technical details to work out, but I wanted to give the Hive community a
 chance to discuss it. Do any of the Hive committers want to be included on
 the proposal?

 Of the current Hive committers, my list looks like:
 * Alan
 * Gunther
 * Prasanth
 * Lefty
 * Owen
 * Sergey
 * Gopal
 * Kevin

 Did I miss anyone?

 Thanks!
Owen



Re: Review Request 32254: HIVE-10007:Support qualified table name in analyze table compute statistics for columns

2015-03-19 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32254/
---

(Updated March 20, 2015, 12:08 a.m.)


Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.


Changes
---

Fixed three failed tests and added a new test case that computes column stats 
for a table not under the default db.


Bugs: HIVE-10007
https://issues.apache.org/jira/browse/HIVE-10007


Repository: hive-git


Description
---

Currently, the analyze table compute statistics for columns command cannot 
compute column stats for a table in a different database, since it does not 
support qualified table names. You need to switch to that table's database in 
order to compute its column stats. The same applies to the ALTER TABLE .. 
UPDATE STATISTICS FOR COLUMN command.
This JIRA adds support for qualified table names in the analyze and update 
column stats commands.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 
e2f696eea7b472d4e417629eedd33406b41caf98 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java 
28c17b75356eb4ffaac2193a80b27bdd67d98009 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
c83523e507f1003eff1a821c216a7f6904f293ed 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
b9e15a1b2f60b291759444a4407acaf1e0d9c515 
  ql/src/test/queries/clientpositive/alter_partition_update_status.q 
1eee9a506f3854eace7deaa3f4ec02c1f7bd3479 
  ql/src/test/queries/clientpositive/alter_table_update_status.q 
fd45cd426c17a7bfa1371081dd81ff6eb1b7353d 
  ql/src/test/queries/clientpositive/columnstats_partlvl.q 
82a9e0f0fef0c9bf909028f23ccd47ee07a61a2b 
  ql/src/test/queries/clientpositive/columnstats_tbllvl.q 
07cc959ec6bb9f5e1f1ba3184da5ce3fc8bf951b 
  ql/src/test/results/clientpositive/alter_partition_update_status.q.out 
7e33a7e8ecc07b2b9d8ab8b0f12b383e5a442cf3 
  ql/src/test/results/clientpositive/alter_table_update_status.q.out 
361359807b9c8c1a2e25a629fd53f026e02957e7 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 
a86c5fbd718a490c5973382810cc02c10184925e 
  ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 
073b387c7b1d5f174928a110381faa851133c7ea 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 
0ef04f5f4b70d84cd6118c2407d29fc555bac116 
  ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 
a176d9a2a9db7eb848d1db022c4d99680db1b62c 
  ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out 
933d24b6cde934faa013f1ea102b9b753a7d44a4 

Diff: https://reviews.apache.org/r/32254/diff/


Testing
---

1. Performed some manual tests
2. Added new qtests for those cases
3. Precommit build


Thanks,

Chaoyu Tang



[jira] [Created] (HIVE-10026) LLAP: AM should get notifications on daemons going down or restarting

2015-03-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10026:
-

 Summary: LLAP: AM should get notifications on daemons going down 
or restarting
 Key: HIVE-10026
 URL: https://issues.apache.org/jira/browse/HIVE-10026
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


Otherwise state is lost, which can cause queries to hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 32268: HIVE-9998 Vectorization support for interval types

2015-03-19 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32268/
---

Review request for hive, Ashutosh Chauhan and Matt McCline.


Bugs: HIVE-9998
https://issues.apache.org/jira/browse/HIVE-9998


Repository: hive-git


Description
---

Enables support for vectorized interval types.
This also fixes some vectorized comparisons for Date, when either the left or 
right side is a constant expression.


Diffs
-

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 375c173 
  common/src/java/org/apache/hive/common/util/DateTimeMath.java 28030e6 
  common/src/java/org/apache/hive/common/util/DateUtils.java b4159d3 
  common/src/test/org/apache/hive/common/util/TestDateTimeMath.java 4886576 
  
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumnWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticScalarWithConvert.txt
 PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/ColumnUnaryMinus.txt 6bf6def 
  
ql/src/gen/vectorization/ExpressionTemplates/DTIColumnArithmeticDTIColumnNoConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/DTIColumnArithmeticDTIScalarNoConvert.txt
 PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/DTIColumnCompareScalar.txt 
PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/DTIScalarArithmeticDTIColumnNoConvert.txt
 PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/DTIScalarCompareColumn.txt 
PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/DateTimeColumnArithmeticIntervalColumnWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/DateTimeColumnArithmeticIntervalScalarWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/DateTimeScalarArithmeticIntervalColumnWithConvert.txt
 PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDTIColumnCompareScalar.txt 
PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDTIScalarCompareColumn.txt 
PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/IntervalColumnArithmeticDateTimeColumnWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/IntervalColumnArithmeticDateTimeScalarWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/IntervalScalarArithmeticDateTimeColumnWithConvert.txt
 PRE-CREATION 
  
ql/src/gen/vectorization/ExpressionTemplates/ScalarArithmeticColumnWithConvert.txt
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/TimestampUtils.java 352e43e 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
 c915f72 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExpressionDescriptor.java
 bb18b32 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
5201c57 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java 
e304cf8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 
88ec2b2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToIntervalDayTime.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToIntervalYearMonth.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpression.java
 d7ace6d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java
 94a47e0 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTBuilder.java
 10bf2bd 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
aca4273 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDTIMinus.java 
a32c133 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDTIPlus.java 
9a5c3a9 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqual.java 
3870b51 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqualOrGreaterThan.java
 65e1835 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqualOrLessThan.java
 3e4a1d2 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPGreaterThan.java 
df7a857 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPLessThan.java 
fafd99b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPMinus.java 
18fbb5a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotEqual.java 
0436488 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPPlus.java 
bfac5a8 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToIntervalDayTime.java
 89c3988 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToIntervalYearMonth.java
 5c05655 
  ql/src/java/org/apache/hadoop/hive/ql/util/DateTimeMath.java PRE-CREATION 
  

[jira] [Created] (HIVE-10025) LLAP: Queued work times out

2015-03-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10025:
-

 Summary: LLAP: Queued work times out
 Key: HIVE-10025
 URL: https://issues.apache.org/jira/browse/HIVE-10025
 Project: Hive
  Issue Type: Improvement
Reporter: Siddharth Seth


If a daemon holds a task in its queue for a long time, it'll eventually time 
out - but it isn't removed from the queue. Ideally, it shouldn't be allowed to 
time out. Otherwise, handle the timeout so that the task doesn't run - or 
starts and fails - likely requiring a change in the TaskCommunicator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-19 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10016:


 Summary: Remove duplicated Hive table schema parsing in 
DataWritableReadSupport
 Key: HIVE-10016
 URL: https://issues.apache.org/jira/browse/HIVE-10016
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


In {{DataWritableReadSupport.init()}}, the table schema is created and its 
string form is set in conf. When the {{ParquetRecordReaderWrapper}} is 
constructed, the schema is fetched from conf and parsed several times.

We could remove this duplicated schema parsing and improve the speed of 
getRecordReader a bit.
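A minimal sketch of the idea in plain Java, with a hypothetical parseSchema() 
standing in for the real Parquet schema parser: parse each distinct schema 
string once and reuse the result, instead of re-parsing it every time the 
reader wrapper is constructed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: parseSchema() is a stand-in for the real Parquet schema
// parser; the point is to parse each distinct schema string exactly once.
public class SchemaCache {
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();
    static int parseCount = 0; // exposed so the saving is observable

    // Pretend "parsing" of a schema string into its column names.
    static String[] parseSchema(String schema) {
        parseCount++;
        return schema.split(",");
    }

    // Parse on the first lookup per schema string, then reuse.
    static String[] getSchema(String schema) {
        return CACHE.computeIfAbsent(schema, SchemaCache::parseSchema);
    }
}
```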



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10017) SparkTask log improvement [Spark Branch]

2015-03-19 Thread Chinna Rao Lalam (JIRA)
Chinna Rao Lalam created HIVE-10017:
---

 Summary: SparkTask log improvement [Spark Branch]
 Key: HIVE-10017
 URL: https://issues.apache.org/jira/browse/HIVE-10017
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chinna Rao Lalam
Priority: Minor
 Fix For: spark-branch


Initialize the log object in its own class for better log messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31800: HIVE-9658 Reduce parquet memory use by bypassing java primitive objects on ETypeConverter

2015-03-19 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/#review77168
---


Hi Sergio, thank you for your update. Just a few more minor suggestions.


ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
https://reviews.apache.org/r/31800/#comment125044

    Is it possible to push the switch block down into 
buildObjectAssignMethod? In the current implementation, we already do this 
kind of thing in that method.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java
https://reviews.apache.org/r/31800/#comment125045

How about moving this code block to the top of the class definition?
Object objs[] = new Object[] { null, null };



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java
https://reviews.apache.org/r/31800/#comment125046

Arrays.fill(elements, null)



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java.orig
https://reviews.apache.org/r/31800/#comment125047

Please remove this file. It shouldn't be committed into trunk.



ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
https://reviews.apache.org/r/31800/#comment125048

Please only import needed packages.


- cheng xu


On March 10, 2015, 6:02 p.m., Sergio Pena wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31800/
 ---
 
 (Updated March 10, 2015, 6:02 p.m.)
 
 
 Review request for hive, Ryan Blue and cheng xu.
 
 
 Bugs: HIVE-9658
 https://issues.apache.org/jira/browse/HIVE-9658
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This patch passes primitive Java objects directly to Hive object inspectors 
 without using primitive Writable objects.
 It helps reduce memory usage.
 
 I did not bypass other complex objects, such as binaries, decimals and 
 date/timestamps, because their Writable objects are needed in other parts of 
 the code,
 and creating them later would cost more ops/s. It is better to save that time 
 at the beginning.
 
 
 Diffs
 -
 
   
 itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java
  4f6985cd13017ce37f4f0c100b16a27aa5b02f8b 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
  c915f728fc9b27da0fabefab5d8f5faa53640b78 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java
  0391229723cc3ecef551fa44b8456b0d2ac93fb5 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java
  d7edd52614771857d1b21971a66894841c248ef9 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 
 6ff6b473c9f1867bc14bb597094ddb92487cc954 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java
  a43661eb54ba29692c07c264584b5aecf648ef99 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
 3fc012970e23bbc188ce2a2e2ba0b04bc6f22317 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java
  f1c8b6f13718b37f590263e5b35ed6c327f5cf4f 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java
  c6d03a19029d5bcc86b998dd7a8609973648c103 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java
  f95d15eddc21bc432fa53572de5756751a13341a 
   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java 
 ee57b31dac53d99af0c5a520f51102796ca32fd3 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
  57ae7a9740d55b407cadfc8bc030593b29f90700 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
  a26199612cf338e336f210f29acb0398c536e1f9 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java.orig
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  49bf1c5325833993f4c09efdf1546af560783c28 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
  609188206f88e296d893b84bcaaab53f974e6b7d 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java
  143d72e76502d4877e8208181d9743259051dcea 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java
  bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8 
   
 

[jira] [Created] (HIVE-10030) The unit test of testNewInputFormat got failure

2015-03-19 Thread Yi Zhou (JIRA)
Yi Zhou created HIVE-10030:
--

 Summary: The unit test of testNewInputFormat got failure
 Key: HIVE-10030
 URL: https://issues.apache.org/jira/browse/HIVE-10030
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.14.0
Reporter: Yi Zhou


Running org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.126 sec  
FAILURE! - in org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
testNewInputFormat(org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat)  
Time elapsed: 4.938 sec   FAILURE!
junit.framework.ComparisonFailure: expected:<...in}, {1234, hat}], 
{[mauddib={1, mauddib}, chani={5, chani]}}, 2000-03-12 15:00...> but 
was:<...in}, {1234, hat}], {[chani={5, chani}, mauddib={1, mauddib]}}, 
2000-03-12 15:00...>
at junit.framework.Assert.assertEquals(Assert.java:100)
at junit.framework.Assert.assertEquals(Assert.java:107)
at 
org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat(TestNewInputOutputFormat.java:125)

To reproduce this test case, run: mvn test 
-Dtest=TestNewInputOutputFormat#testNewInputFormat -Phadoop-2
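For what it's worth, the expected and actual values above differ only in the 
order of the map entries (mauddib vs. chani first), which suggests the test is 
comparing a map's rendered string, whose entry order is not guaranteed. A 
small stand-alone Java illustration (the map contents here are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderDemo {
    // Build a map with entries inserted in the given order.
    static Map<String, Integer> mapOf(String[] keys, int[] vals) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], vals[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = mapOf(new String[]{"mauddib", "chani"}, new int[]{1, 5});
        Map<String, Integer> b = mapOf(new String[]{"chani", "mauddib"}, new int[]{5, 1});
        // Same contents, so the maps compare equal...
        System.out.println(a.equals(b)); // prints true
        // ...but their rendered strings depend on iteration order, so a
        // string-based assertEquals can fail even for equal maps.
        System.out.println(a.toString().equals(b.toString())); // prints false
    }
}
```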






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: ORC separate project

2015-03-19 Thread Nick Dimiduk
This is a great plan, +1!

On Thursday, March 19, 2015, Owen O'Malley omal...@apache.org wrote:

 All,
Over the last year, there have been a fair number of projects that want
 to integrate with ORC but don't want a dependency on Hive's exec jar.
 Additionally, we've been working on a C++ reader (and soon writer) and it
 would be great to host them both in the same project. Toward that end, I'd
 like to create a separate ORC project at Apache. There will be lots of
 technical details to work out, but I wanted to give the Hive community a
 chance to discuss it. Do any of the Hive committers want to be included on
 the proposal?

 Of the current Hive committers, my list looks like:
 * Alan
 * Gunther
 * Prasanth
 * Lefty
 * Owen
 * Sergey
 * Gopal
 * Kevin

 Did I miss anyone?

 Thanks!
Owen



[jira] [Created] (HIVE-10029) LLAP: Scheduling of work from different queries within the daemon

2015-03-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10029:
-

 Summary: LLAP: Scheduling of work from different queries within 
the daemon
 Key: HIVE-10029
 URL: https://issues.apache.org/jira/browse/HIVE-10029
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap


The current implementation is a simple queue - whichever query wins the race to 
submit work to a daemon executes first.
A policy around this may be useful - potentially fair share, or a 
first-query-in-gets-all-slots approach.
Also, the priority associated with work within a query should be considered.
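A rough sketch of the within-query priority part, in plain Java and assuming 
nothing about the real daemon internals (the Work class and its fields are 
hypothetical): replace the plain FIFO queue with one ordered by a per-item 
priority.

```java
import java.util.concurrent.PriorityBlockingQueue;

public class WorkQueueDemo {
    // Hypothetical work item: a query id plus a priority within that query.
    static class Work implements Comparable<Work> {
        final String queryId;
        final int priority; // lower value = scheduled first

        Work(String queryId, int priority) {
            this.queryId = queryId;
            this.priority = priority;
        }

        @Override
        public int compareTo(Work o) {
            return Integer.compare(priority, o.priority);
        }
    }

    // Instead of a plain FIFO queue, queued work is ordered by priority.
    static final PriorityBlockingQueue<Work> QUEUE = new PriorityBlockingQueue<>();
}
```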



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10027) Use descriptions from Avro schema files in column comments

2015-03-19 Thread Jeremy Beard (JIRA)
Jeremy Beard created HIVE-10027:
---

 Summary: Use descriptions from Avro schema files in column comments
 Key: HIVE-10027
 URL: https://issues.apache.org/jira/browse/HIVE-10027
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Jeremy Beard
Priority: Minor


Avro schema files can include field descriptions using the "doc" tag. It would 
be helpful if the Hive metastore used these descriptions as the comments for a 
field when the table is backed by such a schema file, instead of the default 
"from deserializer".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10028) LLAP: Create a fixed size execution queue for daemons

2015-03-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10028:
-

 Summary: LLAP: Create a fixed size execution queue for daemons
 Key: HIVE-10028
 URL: https://issues.apache.org/jira/browse/HIVE-10028
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap


Currently, this is unbounded. This should be a configurable size.
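A minimal sketch of the bounded-queue idea in plain Java (the names and the 
capacity value are made up): with an ArrayBlockingQueue of configurable 
capacity, offers beyond the limit are rejected instead of growing the queue 
without bound.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedExecQueue {
    // A configurable fixed capacity instead of an unbounded queue;
    // offer() returns false once the queue is full.
    static BlockingQueue<Runnable> newQueue(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }
}
```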



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 32189: HIVE-9859 Create bitwise left/right shift UDFs

2015-03-19 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32189/
---

(Updated March 20, 2015, 5:24 a.m.)


Review request for hive and Jason Dere.


Changes
---

I changed bitwise shift operators to -, -, - because >> conflicts with 
nested complex type declarations, e.g. ARRAY<MAP<STRING,STRING>>

I found that HiveParser.g has the following declaration for the array type:

: KW_ARRAY LESSTHAN type GREATERTHAN -> ^(TOK_LIST type)

I think it uses LESSTHAN and GREATERTHAN in the array type declaration to avoid 
conflicts with the < and > operators.

I did not find a way to do this trick with >>.


Bugs: HIVE-9859
https://issues.apache.org/jira/browse/HIVE-9859


Repository: hive-git


Description
---

HIVE-9859 Create bitwise left/right shift UDFs


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
fdbfab9e1c4f098766f58e2d07653a44f45d3350 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 
e7de6c86a3c7a674b54f3678b00f34f2dd903dc8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
d2d998972b64a19bde28cf176b3f948c00ba492a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 
0a05cebf1f71bd32c8023cdb10c8393a0d871cc2 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitLeftShift.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitRightShift.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitUnsignedRightShift.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_bitwise_left_shift.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_bitwise_right_shift.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_bitwise_unsigned_right_shift.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 
81abeb9be4fd47724be544c7bc8da8b25fcd6e75 
  ql/src/test/results/clientpositive/udf_bitwise_left_shift.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitwise_right_shift.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitwise_unsigned_right_shift.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/32189/diff/


Testing
---


Thanks,

Alexander Pivovarov



[jira] [Created] (HIVE-10031) Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor

2015-03-19 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10031:


 Summary: Modify the using of jobConf variable in 
ParquetRecordReaderWrapper constructor
 Key: HIVE-10031
 URL: https://issues.apache.org/jira/browse/HIVE-10031
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


In the {{ParquetRecordReaderWrapper}} constructor, it creates the splits, sets 
projections and filters in conf, creates the task context, and then creates the 
Parquet record reader. In this procedure, we could improve the conf usage logic:
1. The clone of jobConf is not necessary. Removing it could speed up 
getRecordReader a little.
2. The updated jobConf is not passed to Parquet in one case.
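Point 2 is easy to illustrate in plain Java, using a HashMap as a stand-in for 
JobConf: once the conf is cloned, later updates to the clone are invisible to 
code still holding the original, so the updated copy is the one that must be 
passed on (the key names here are made up).

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCloneDemo {
    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("parquet.columns", "a,b");

        // Clone, then update only the clone (projections, filters, ...).
        Map<String, String> cloned = new HashMap<>(jobConf);
        cloned.put("parquet.filter", "x");

        // Passing the original onward silently drops the update.
        System.out.println(jobConf.containsKey("parquet.filter")); // prints false
        System.out.println(cloned.containsKey("parquet.filter"));  // prints true
    }
}
```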



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-03-19 Thread Richard Williams (JIRA)
Richard Williams created HIVE-10021:
---

 Summary: Alter index rebuild statements submitted through 
HiveServer2 fail when Sentry is enabled
 Key: HIVE-10021
 URL: https://issues.apache.org/jira/browse/HIVE-10021
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Indexing
Affects Versions: 0.13.1
 Environment: CDH 5.3.2
Reporter: Richard Williams


When HiveServer2 is configured to authorize submitted queries and statements 
through Sentry, any attempt to issue an alter index rebuild statement fails 
with a SemanticException caused by a NullPointerException. This occurs 
regardless of whether the index is a compact or bitmap index. 

The root cause of the problem appears to be the fact that the static 
createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils 
creates a new 
org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and 
this new Driver object, unlike the one used by HiveServer2 to compile the 
submitted statement, is used without having its userName field initialized 
with the submitting user's username. Adding null checks to the Sentry code is 
insufficient to solve this problem, because Sentry needs the userName to 
determine whether or not the submitting user should be able to execute the 
index rebuild statement.
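The NullPointerException in the trace below originates from looking up groups 
for a null user: ConcurrentHashMap rejects null keys outright. A tiny 
stand-alone illustration (getGroups() here is a hypothetical stand-in for 
Hadoop's Groups.getGroups()):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullUserDemo {
    static final ConcurrentHashMap<String, String[]> GROUP_CACHE =
        new ConcurrentHashMap<>();

    // Stand-in for a group lookup keyed by user name: a null userName
    // throws NullPointerException inside ConcurrentHashMap (which does
    // not permit null keys) before any downstream null check can run.
    static String[] getGroups(String userName) {
        return GROUP_CACHE.getOrDefault(userName, new String[0]);
    }
}
```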

Example stack trace from the HiveServer2 logs:

FAILED: NullPointerException null
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
at org.apache.hadoop.security.Groups.getGroups(Groups.java:161)
at 
org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46)
at 
org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370)
at 
org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
at 
org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258)
at 
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149)
at 
org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
at 
org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at 
org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at 

[jira] [Created] (HIVE-10020) Add Month() and Second() buildin functions

2015-03-19 Thread Alicia Ying Shu (JIRA)
Alicia Ying Shu created HIVE-10020:
--

 Summary: Add Month() and Second() buildin functions
 Key: HIVE-10020
 URL: https://issues.apache.org/jira/browse/HIVE-10020
 Project: Hive
  Issue Type: Bug
Reporter: Alicia Ying Shu


From the Oracle docs: Month(date) and Second(date). Very similar to the 
Year(date) builtin. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 32254: HIVE-10007:Support qualified table name in analyze table compute statistics for columns

2015-03-19 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32254/
---

Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-10007
https://issues.apache.org/jira/browse/HIVE-10007


Repository: hive-git


Description
---

Currently, the analyze table compute statistics for columns command cannot 
compute column stats for a table in a different database, since it does not 
support qualified table names. You need to switch to that table's database in 
order to compute its column stats. The same applies to the ALTER TABLE .. 
UPDATE STATISTICS FOR COLUMN command.
This JIRA adds support for qualified table names in the analyze and update 
column stats commands.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 
e2f696eea7b472d4e417629eedd33406b41caf98 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java 
28c17b75356eb4ffaac2193a80b27bdd67d98009 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
c83523e507f1003eff1a821c216a7f6904f293ed 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
b9e15a1b2f60b291759444a4407acaf1e0d9c515 
  ql/src/test/queries/clientpositive/alter_partition_update_status.q 
1eee9a506f3854eace7deaa3f4ec02c1f7bd3479 
  ql/src/test/queries/clientpositive/alter_table_update_status.q 
fd45cd426c17a7bfa1371081dd81ff6eb1b7353d 
  ql/src/test/queries/clientpositive/columnstats_partlvl.q 
82a9e0f0fef0c9bf909028f23ccd47ee07a61a2b 
  ql/src/test/queries/clientpositive/columnstats_tbllvl.q 
07cc959ec6bb9f5e1f1ba3184da5ce3fc8bf951b 
  ql/src/test/results/clientpositive/alter_partition_update_status.q.out 
7e33a7e8ecc07b2b9d8ab8b0f12b383e5a442cf3 
  ql/src/test/results/clientpositive/alter_table_update_status.q.out 
361359807b9c8c1a2e25a629fd53f026e02957e7 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 
a86c5fbd718a490c5973382810cc02c10184925e 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 
0ef04f5f4b70d84cd6118c2407d29fc555bac116 

Diff: https://reviews.apache.org/r/32254/diff/


Testing
---

1. Performed some manual tests
2. Added new qtests for those cases
3. Precommit build


Thanks,

Chaoyu Tang



Re: Review Request 31386: HIVE-9555 assorted ORC refactorings for LLAP on trunk

2015-03-19 Thread Sergey Shelukhin


 On March 18, 2015, 8:14 a.m., Gopal V wrote:
  common/src/java/org/apache/hadoop/hive/common/DiskRange.java, line 78
  https://reviews.apache.org/r/31386/diff/3/?file=890915#file890915line78
 
  Bad behaviour - the original DiskRange was written with final variables 
  for easier debugging.

Actually only BC had finals, not DR. I will hide the fields and restrict the 
scope of modifications.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31386/#review76879
---


On March 11, 2015, 12:50 a.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31386/
 ---
 
 (Updated March 11, 2015, 12:50 a.m.)
 
 
 Review request for hive and Prasanth_J.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/common/DiskRange.java PRE-CREATION 
   common/src/java/org/apache/hadoop/hive/common/DiskRangeList.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java 5e2d880 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9788c16 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java 62c6f8d 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/MetadataReader.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 25bb15a 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 498ee14 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java 3daa9ba 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java f85c21b 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java 03f8085 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 458ad21 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java 
 4057036 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java 
 23e5f27 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 79dc5a1 
   ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java 
 f4a2e65 
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java 0ea4a7b 
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
 2cc3d7a 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java
  591ec3f 
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java cd1d645 
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java 
 326dde4 
 
 Diff: https://reviews.apache.org/r/31386/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




Re: Review Request 31386: HIVE-9555 assorted ORC refactorings for LLAP on trunk

2015-03-19 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31386/
---

(Updated March 19, 2015, 9:12 p.m.)


Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/DiskRange.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/DiskRangeList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java 5e2d880 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9788c16 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java 62c6f8d 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/MetadataReader.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 25bb15a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 498ee14 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java 3daa9ba 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java f85c21b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java 03f8085 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 458ad21 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java 
4057036 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java 
23e5f27 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 79dc5a1 
  ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java 
f4a2e65 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java 0ea4a7b 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
2cc3d7a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java 
591ec3f 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java cd1d645 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java 
326dde4 

Diff: https://reviews.apache.org/r/31386/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-10023) Fix more cache related concurrency issue

2015-03-19 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10023:
--

 Summary: Fix more cache related concurrency issue
 Key: HIVE-10023
 URL: https://issues.apache.org/jira/browse/HIVE-10023
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Searched the code and found a couple more cache-related concurrency issues, 
for example in LazyBinaryObjectInspectorFactory, 
PrimitiveObjectInspectorFactory, and TypeInfoFactory.
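A common shape for this kind of fix is an atomic get-or-create on a concurrent
map; the stdlib sketch below is illustrative only (names are hypothetical, not
the actual Hive patch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a thread-safe factory cache: replacing an
// unsynchronized HashMap with ConcurrentHashMap.computeIfAbsent means
// concurrent lookups can neither corrupt the map's structure nor observe a
// half-inserted entry.
public class CachedFactory {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // Atomic get-or-create: the mapping function runs at most once per key.
    public static Object getInspector(String typeName) {
        return CACHE.computeIfAbsent(typeName, CachedFactory::build);
    }

    private static Object build(String typeName) {
        return "inspector:" + typeName; // stand-in for real construction
    }

    public static void main(String[] args) {
        System.out.println(getInspector("int"));
        // Repeated lookups return the identical cached instance:
        System.out.println(getInspector("int") == getInspector("int")); // true
    }
}
```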



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10022) DFS in authorization might take too long

2015-03-19 Thread Pankit Thapar (JIRA)
Pankit Thapar created HIVE-10022:


 Summary: DFS in authorization might take too long
 Key: HIVE-10022
 URL: https://issues.apache.org/jira/browse/HIVE-10022
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.14.0
Reporter: Pankit Thapar


I am testing a query like : 

set hive.test.authz.sstd.hs2.mode=true;
set 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
set 
hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
set hive.security.authorization.enabled=true;
set user.name=user1;
create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');

Now, in the above query, since authorization is enabled, we end up calling 
doAuthorizationV2(), which ultimately calls 
SQLAuthorizationUtils.getPrivilegesFromFS(), which in turn calls the recursive 
method FileUtils.isActionPermittedForFileHierarchy() with the object we are 
trying to authorize (or, if that object does not exist, with its ancestor).

FileUtils.isActionPermittedForFileHierarchy() traverses the hierarchy 
depth-first (DFS).

Now assume we are trying to authorize a path a/b/c/d.
If a/b/c/d does not exist, we would call 
FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/ (assuming a/b/c 
also does not exist).
If the subtree under a/b contains millions of files, then 
FileUtils.isActionPermittedForFileHierarchy() is going to check the file 
permissions on every one of those objects.

I do not completely understand why we have to check file permissions on all 
the objects in a branch of the tree that we are not trying to read from or 
write to.
We could instead check the file permission on the nearest ancestor that 
exists and, if it matches what we expect, return true.

Please confirm whether this is a bug so that I can submit a patch; otherwise, 
let me know what I am missing.
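The suggested ancestor-only check can be sketched in stdlib Java as follows
(hypothetical names; this is not a patch against FileUtils, just the idea):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the proposed behavior: when the target path does not exist, walk
// UP to the nearest existing ancestor and probe permissions there once --
// O(depth) -- instead of depth-first scanning every file under that ancestor.
public class AncestorPermCheck {
    static Path nearestExistingAncestor(Path p) {
        Path cur = p.toAbsolutePath();
        while (cur != null && !Files.exists(cur)) {
            cur = cur.getParent();
        }
        return cur; // the filesystem root exists, so this terminates
    }

    static boolean isWriteAuthorized(Path p) {
        Path ancestor = nearestExistingAncestor(p);
        // Single permission probe on the ancestor, not a subtree traversal.
        return ancestor != null && Files.isWritable(ancestor);
    }

    public static void main(String[] args) {
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "a/b/c/d");
        // a/b/c/d likely doesn't exist; the check falls back to tmpdir.
        System.out.println(isWriteAuthorized(target));
    }
}
```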




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10024) LLAP: q file test is broken again

2015-03-19 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10024:
---

 Summary: LLAP: q file test is broken again
 Key: HIVE-10024
 URL: https://issues.apache.org/jira/browse/HIVE-10024
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


ORC separate project

2015-03-19 Thread Owen O'Malley
All,
   Over the last year, there has been a fair number of projects that want
to integrate with ORC, but don't want a dependence on Hive's exec jar.
Additionally, we've been working on a C++ reader (and soon writer) and it
would be great to host them both in the same project. Toward that end, I'd
like to create a separate ORC project at Apache. There will be lots of
technical details to work out, but I wanted to give the Hive community a
chance to discuss it. Do any of the Hive committers want to be included on
the proposal?

Of the current Hive committers, my list looks like:
* Alan
* Gunther
* Prasanth
* Lefty
* Owen
* Sergey
* Gopal
* Kevin

Did I miss anyone?

Thanks!
   Owen


Re: Reading 2 table data in MapReduce for Performing Join

2015-03-19 Thread Suraj Nayak
Hi All,

I was able to integrate HCatMultipleInputs with the patch for tables created 
as TEXTFILE, but I get an error when I read a table stored as ORC. The error 
is below:

15/03/19 10:51:32 INFO mapreduce.Job: Task Id :
attempt_1425012118520_9756_m_00_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable
cannot be cast to org.apache.hadoop.io.LongWritable
at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


Can anyone help?

Thanks in advance!
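For what it's worth, the usual cause of this stack trace is that the mapper's
declared input key type (LongWritable) does not match what the ORC reader
actually emits (NullWritable); declaring the key as NullWritable, or more
loosely as WritableComparable, is the typical fix. The stdlib-only sketch
below (with hypothetical stand-in types, no Hadoop dependency) shows why the
mismatch surfaces only at runtime:

```java
import java.util.function.BiConsumer;

public class ErasureCastDemo {
    // Hypothetical stand-ins for NullWritable and LongWritable.
    static final class NullKey { static final NullKey GET = new NullKey(); }
    static final class LongKey { }

    // A "mapper" declared for LongKey, like Mapper<LongWritable, Text>.
    static final BiConsumer<LongKey, String> MAPPER = (key, value) -> { };

    // The framework sees only the erased accept(Object, Object); the cast to
    // the declared key type happens inside the call at runtime, which is
    // where the reported ClassCastException comes from when the reader emits
    // a different key type than the mapper declared.
    @SuppressWarnings("unchecked")
    static boolean mapperRejectsNullKey() {
        BiConsumer<Object, String> erased =
            (BiConsumer<Object, String>) (BiConsumer<?, ?>) MAPPER;
        try {
            erased.accept(NullKey.GET, "row"); // cast to LongKey fails here
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("mapper rejected NullKey: " + mapperRejectsNullKey());
    }
}
```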

On Wed, Mar 18, 2015 at 11:00 PM, Suraj Nayak snay...@gmail.com wrote:

 Hi All,

 https://issues.apache.org/jira/browse/HIVE-4997 patch helped!


 On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak snay...@gmail.com wrote:

 Hi,

 I tried reading data via HCatalog for 1 Hive table in MapReduce using
 something similar to
 https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog.
 I was able to read successfully.

 Now am trying to read 2 tables, as the requirement is to join 2 tables. I
 did not find API similar to *FileInputFormat.addInputPaths* in
 *HCatInputFormat*. What is the equivalent of the same in HCat ?

 I had performed join using FilesInputFormat in HDFS(by getting split
 information in mapper). This article(
 http://www.codingjunkie.com/mapreduce-reduce-joins) helped me code join.
 http://www.codingjunkie.com/mapreduce-reduce-joins/ Can someone
 suggest how I can perform join operation using HCatalog ?

 Briefly, the aim is to

- Read 2 tables (almost similar schema)
- If key exists in both the table send it to same reducer.
- Do some processing on the records in reducer.
- Save the output into file/Hive table.

 *P.S : The reason for using MapReduce to perform join is because of
 complex requirement which can't be solved via Hive/Pig directly. *

 Any help will be greatly appreciated :)

 --
 Thanks
 Suraj Nayak M




 --
 Thanks
 Suraj Nayak M




-- 
Thanks
Suraj Nayak M
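The reduce-side join steps quoted above can be sketched outside Hadoop,
stdlib-only, as tag / group-by-key / pair (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceJoinSketch {
    // Tagged record: remembers which table a row came from. In real
    // MapReduce the tag would ride along in the map output value.
    static final class Tagged {
        final String table, row;
        Tagged(String table, String row) { this.table = table; this.row = row; }
    }

    // "Map + shuffle": tag every row and group rows by join key, so that
    // matching keys from both tables land in the same group (same reducer).
    static Map<String, List<Tagged>> shuffle(Map<String, String> t1,
                                             Map<String, String> t2) {
        Map<String, List<Tagged>> byKey = new TreeMap<>();
        t1.forEach((k, v) ->
            byKey.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged("t1", v)));
        t2.forEach((k, v) ->
            byKey.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged("t2", v)));
        return byKey;
    }

    // "Reduce": emit joined rows only for keys present in both tables.
    static List<String> join(Map<String, List<Tagged>> byKey) {
        List<String> out = new ArrayList<>();
        byKey.forEach((k, vals) -> {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (Tagged t : vals) {
                (t.table.equals("t1") ? left : right).add(t.row);
            }
            for (String l : left) {
                for (String r : right) {
                    out.add(k + "," + l + "," + r);
                }
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> t1 = Map.of("k1", "a", "k2", "b");
        Map<String, String> t2 = Map.of("k1", "x", "k3", "y");
        System.out.println(join(shuffle(t1, t2))); // only k1 exists in both
    }
}
```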


Re: Reading 2 table data in MapReduce for Performing Join

2015-03-19 Thread Suraj Nayak
Is this related to https://issues.apache.org/jira/browse/HIVE-4329? Is there 
a workaround?

On Thu, Mar 19, 2015 at 9:47 PM, Suraj Nayak snay...@gmail.com wrote:

 Hi All,

 I was successfully able to integrate HCatMultipleInputs with the patch for
 the tables created with TEXTFILE. But I get error when I read table created
 with ORC file. The error is below :

 15/03/19 10:51:32 INFO mapreduce.Job: Task Id :
 attempt_1425012118520_9756_m_00_0, Status : FAILED
 Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable
 cannot be cast to org.apache.hadoop.io.LongWritable
 at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


 Can anyone help?

 Thanks in advance!

 On Wed, Mar 18, 2015 at 11:00 PM, Suraj Nayak snay...@gmail.com wrote:

 Hi All,

 https://issues.apache.org/jira/browse/HIVE-4997 patch helped!


 On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak snay...@gmail.com wrote:

 Hi,

 I tried reading data via HCatalog for 1 Hive table in MapReduce using
 something similar to
 https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog.
 I was able to read successfully.

 Now am trying to read 2 tables, as the requirement is to join 2 tables.
 I did not find API similar to *FileInputFormat.addInputPaths* in
 *HCatInputFormat*. What is the equivalent of the same in HCat ?

 I had performed join using FilesInputFormat in HDFS(by getting split
 information in mapper). This article(
 http://www.codingjunkie.com/mapreduce-reduce-joins) helped me code join.
 http://www.codingjunkie.com/mapreduce-reduce-joins/ Can someone
 suggest how I can perform join operation using HCatalog ?

 Briefly, the aim is to

- Read 2 tables (almost similar schema)
- If key exists in both the table send it to same reducer.
- Do some processing on the records in reducer.
- Save the output into file/Hive table.

 *P.S : The reason for using MapReduce to perform join is because of
 complex requirement which can't be solved via Hive/Pig directly. *

 Any help will be greatly appreciated :)

 --
 Thanks
 Suraj Nayak M




 --
 Thanks
 Suraj Nayak M




 --
 Thanks
 Suraj Nayak M




-- 
Thanks
Suraj Nayak M


[jira] [Created] (HIVE-10018) Activating SQLStandardAuth results in NPE [hbase-metastore branch]

2015-03-19 Thread Alan Gates (JIRA)
Alan Gates created HIVE-10018:
-

 Summary: Activating SQLStandardAuth results in NPE 
[hbase-metastore branch]
 Key: HIVE-10018
 URL: https://issues.apache.org/jira/browse/HIVE-10018
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: hbase-metastore-branch
Reporter: Alan Gates
Assignee: Alan Gates


Setting the config to run SQLStandardAuth and then doing even simple SQL 
statements results in an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10019) Configure jenkins precommit tests to run HMS upgrade tests

2015-03-19 Thread JIRA
Sergio Peña created HIVE-10019:
--

 Summary: Configure jenkins precommit tests to run HMS upgrade tests
 Key: HIVE-10019
 URL: https://issues.apache.org/jira/browse/HIVE-10019
 Project: Hive
  Issue Type: Task
Reporter: Sergio Peña
Assignee: Sergio Peña


This task configures all jenkins precommit jobs, across branches, to run the 
HMS upgrade tests when there are changes to the metastore upgrade scripts.

These tests are already created, so this task is only for final jenkins 
configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)