[jira] [Commented] (HIVE-11693) CommonMergeJoinOperator throws exception with tez

2016-05-26 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302701#comment-15302701
 ] 

Selina Zhang commented on HIVE-11693:
-

[~rajesh.balamohan], we hit the same issue recently. But I think the patch you 
attached did not fix the root problem. 

The issue is that CommonMergeJoinOperator only sets the big table position 
when it has input for the big table. 

{code:title=CommonMergeJoinOperator.java}
  @Override
  public void process(Object row, int tag) throws HiveException {
posBigTable = (byte) conf.getBigTablePosition();
...
{code}

If the input is empty, the above method will not be called. In the query you 
listed, a subquery is involved. The generated table is tagged as 0, while the 
left table is tagged as 1. GenTezWork.java sets the big table position to 1 for 
both the reduce work and the CommonJoinOperator. In the reduce phase, when 
ReduceRecordProcessor is executed, it retrieves records from the big table:

{code:title=ReduceRecordProcessor.java}
@Override
  void run() throws Exception {

// run the operator pipeline
while (sources[bigTablePosition].pushRecord()) {
}
  }
{code}

The big table position here is 1. If the input from the big table is empty, 
this is the only place where pushRecord() is called to read the big table. 
However, because CommonMergeJoinOperator never set the big table position, 
closeOp() treats tag 1 as a small table, so another pushRecord() is called to 
retrieve the table content. Then we see the exception listed in this JIRA. 
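
A minimal sketch of the kind of fix I mean (assuming initializeOp() is the 
right hook; getBigTablePosition() is the only call taken from the existing 
code):

{code:title=CommonMergeJoinOperator.java (sketch)}
  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
    super.initializeOp(hconf);
    // Set the big table position up front, so closeOp() sees the correct
    // value even when process() is never called for the big table input.
    posBigTable = (byte) conf.getBigTablePosition();
  }
{code}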

Please let me know if my analysis has any problems. If you think it is 
correct, can you update the patch?

Thanks

> CommonMergeJoinOperator throws exception with tez
> -
>
> Key: HIVE-11693
> URL: https://issues.apache.org/jira/browse/HIVE-11693
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HIVE-11693.1.patch
>
>
> Got this when executing a simple query with latest hive build + tez latest 
> version.
> {noformat}
> Error: Failure while running task: 
> attempt_1439860407967_0291_2_03_45_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: 
> java.lang.RuntimeException: java.io.IOException: Please check if you are 
> invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.lang.RuntimeException: java.io.IOException: Please check if 
> you are invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:316)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
> ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.io.IOException: Please check if you are 
> invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:482)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:434)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:384)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
> at 
> 

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563668#comment-14563668
 ] 

Selina Zhang commented on HIVE-10841:
-

This seems to have been introduced in Hive 0.13. I tested with Hive 0.12 on MR 
and got the correct result.

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Alexander Pivovarov

 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10	122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10	122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
   Alias -> Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col6 (type: int)
   1 id (type: int)
   

[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-27 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561450#comment-14561450
 ] 

Selina Zhang commented on HIVE-10809:
-

Thanks, [~swarnim]!

I have added a new patch to address concern 1. As for concern 2, because we 
always do a drop table at the beginning of each test case, I think it may not 
be necessary to verify that the table directory exists. 

I also modified the test testMultiPartColsInData() to make sure the 
multi-level partition key case works. 

 HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
 --

 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch


 When a static partition is added through HCatStorer or HCatWriter
 {code}
 JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
 STORE JoinedData INTO 'selina.joined_events_e' USING 
 org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
 {code}
 The table directory looks like
 {noformat}
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/author=selina
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Updated] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-27 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10809:

Attachment: HIVE-10809.3.patch

 HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
 --

 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch, 
 HIVE-10809.3.patch


 When a static partition is added through HCatStorer or HCatWriter
 {code}
 JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
 STORE JoinedData INTO 'selina.joined_events_e' USING 
 org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
 {code}
 The table directory looks like
 {noformat}
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/author=selina
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-26 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10809:

Attachment: HIVE-10809.2.patch

The above unit test failures seem unrelated to this patch. 

Uploaded a new patch that adds verification in TestHCatStorer that the 
scratch directories are removed. 
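
The check is along these lines (a sketch; assertNoScratchDirs, conf, and 
tableLocation are illustrative names, not necessarily what is in the patch):

{code:title=TestHCatStorer.java (sketch)}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Assert;

  // After the STORE completes, the table directory should contain only the
  // partition directories, with no leftover _SCRATCH* entries.
  static void assertNoScratchDirs(Configuration conf, String tableLocation)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path(tableLocation))) {
      Assert.assertFalse("scratch dir left behind: " + status.getPath(),
          status.getPath().getName().startsWith("_SCRATCH"));
    }
  }
{code}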





 HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
 --

 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch


 When a static partition is added through HCatStorer or HCatWriter
 {code}
 JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
 STORE JoinedData INTO 'selina.joined_events_e' USING 
 org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
 {code}
 The table directory looks like
 {noformat}
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/author=selina
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-23 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557498#comment-14557498
 ] 

Selina Zhang commented on HIVE-10809:
-

Thanks! Will do!

 HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
 --

 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10809.1.patch


 When a static partition is added through HCatStorer or HCatWriter
 {code}
 JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
 STORE JoinedData INTO 'selina.joined_events_e' USING 
 org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
 {code}
 The table directory looks like
 {noformat}
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/author=selina
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-22 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10809:

Attachment: HIVE-10809.1.patch

 HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
 --

 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10809.1.patch


 When a static partition is added through HCatStorer or HCatWriter
 {code}
 JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
 STORE JoinedData INTO 'selina.joined_events_e' USING 
 org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
 {code}
 The table directory looks like
 {noformat}
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
 drwx--   - selinaz users  0 2015-05-22 21:19 
 /user/selinaz/joined_events_e/author=selina
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)

2015-05-21 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555069#comment-14555069
 ] 

Selina Zhang commented on HIVE-10729:
-

The above unit test failure seems unrelated to this patch. 

 Query failed when select complex columns from joined table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When a map join happens, if projection columns include complex data types, 
 the query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when projection columns include a map:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.key=test.a;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)

2015-05-20 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10729:

Attachment: HIVE-10729.2.patch

Pre-sized the array list to the field count.
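
Roughly this kind of change (a sketch; the class and names here are 
illustrative stand-ins, not the exact patch hunk):

{code:title=sketch}
import java.util.ArrayList;
import java.util.List;

class RowSketch {
  // Pre-size the list to the known field count instead of starting at the
  // default capacity and growing (and copying) as elements are appended.
  static List<Object> newRow(List<String> fields) {
    return new ArrayList<Object>(fields.size());
  }
}
{code}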

 Query failed when select complex columns from joined table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When a map join happens, if projection columns include complex data types, 
 the query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when projection columns include a map:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.key=test.a;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)

2015-05-20 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552853#comment-14552853
 ] 

Selina Zhang commented on HIVE-10729:
-

Yes. Will do. 

 Query failed when select complex columns from joined table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch


 When a map join happens, if projection columns include complex data types, 
 the query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when projection columns include a map:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.key=test.a;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)

2015-05-15 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10729:

Summary: Query failed when select complex columns from joined table (tez 
map join only)  (was: Query failed when join table with complex types (tez map 
join only))

 Query failed when select complex columns from joined table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch


 When a map join happens, if projection columns from the small table are 
 complex data types, the query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when joining on a map key:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.value=test.b[2];
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10729) Query failed when join table with complex types (tez map join only)

2015-05-15 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10729:

Summary: Query failed when join table with complex types (tez map join 
only)  (was: Query failed when join on an element in complex type (tez map join 
only))

 Query failed when join table with complex types (tez map join only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch


 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when joining on a map key:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.value=test.b[2];
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)

2015-05-15 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10729:

Description: 
When a map join happens, if projection columns are complex data types, the 
query will fail. 

Steps to reproduce:
{code:sql}
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
hive> desc foo;
a	array<int>
hive> select * from foo;
[1,2]
hive> desc src_int;
key	int
value	string
hive> select * from src_int where key=2;
2	val_2
hive> select * from foo join src on src.key = foo.a[1];
{code}
Query will fail with stack trace

{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
[Ljava.lang.Object;
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
... 23 more
{noformat}

Similar error when joining on a map key:
{code:sql}
hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
src LIMIT 1;
hive> select * from src join test where src.key=test.a;
{code}




  was:
When a map join happens, if projection columns from the small table are 
complex data types, the query will fail. 

Steps to reproduce:
{code:sql}
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
hive> desc foo;
a	array<int>
hive> select * from foo;
[1,2]
hive> desc src_int;
key	int
value	string
hive> select * from src_int where key=2;
2	val_2
hive> select * from foo join src on src.key = foo.a[1];
{code}
Query will fail with stack trace

{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
[Ljava.lang.Object;
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
... 23 more
{noformat}

Similar error when joining on a map key:
{code:sql}
hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
src LIMIT 1;
hive> select * from src join test where src.value=test.b[2];
{code}





 Query failed when select complex columns from joined table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: 

[jira] [Updated] (HIVE-10729) Query failed when join on an element in complex type (tez map join only)

2015-05-15 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10729:

Description: 
Steps to reproduce:
{code:sql}
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
hive> desc foo;
a	array<int>
hive> select * from foo;
[1,2]
hive> desc src_int;
key	int
value	string
hive> select * from src_int where key=2;
2	val_2
hive> select * from foo join src on src.key = foo.a[1];
{code}
Query will fail with stack trace

{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
[Ljava.lang.Object;
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
... 23 more
{noformat}

Similar error when joining on a map key:
{code:sql}
hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
src LIMIT 1;
hive> select * from src join test where src.value=test.b[2];
{code}




 Query failed when join on an element in complex type (tez map join only)
 

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang

 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a	array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key	int
 value	string
 hive> select * from src_int where key=2;
 2	val_2
 hive> select * from foo join src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when joining on a map key:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where 

[jira] [Commented] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP

2015-05-14 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544611#comment-14544611
 ] 

Selina Zhang commented on HIVE-10308:
-

[~mmccline] Thanks for reviewing this! Which patch fixed this problem? 

 Vectorization execution throws java.lang.IllegalArgumentException: 
 Unsupported complex type: MAP
 

 Key: HIVE-10308
 URL: https://issues.apache.org/jira/browse/HIVE-10308
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1, 1.2.0, 1.1.0
Reporter: Selina Zhang
Assignee: Matt McCline
 Attachments: HIVE-10308.1.patch


 Steps to reproduce:
 {code:sql}
 CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
 INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, "one", 2, "two") FROM src 
 LIMIT 1;
 CREATE TABLE test(key INT) ;
 INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;
 set hive.vectorized.execution.enabled=true;
 set hive.auto.convert.join=false;
 select l.key from test l left outer join test_orc r on (l.key= r.a) where r.a 
 is not null;
 {code}
 Stack trace:
 {noformat}
 Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537540#comment-14537540
 ] 

Selina Zhang commented on HIVE-10036:
-

The above two unit test failures seem irrelevant to this patch. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 
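
The described change, in miniature (a sketch of the approach only, not the 
patch itself; all names here are illustrative):

{code:title=sketch}
import java.nio.ByteBuffer;

/** A stream buffer that starts small and grows on demand, capped at the
 *  configured buffer size (e.g. 256K). */
class GrowableStreamBuffer {
  private final int maxSize;
  private ByteBuffer buf;

  GrowableStreamBuffer(int initialSize, int maxSize) {
    this.maxSize = maxSize;
    this.buf = ByteBuffer.allocate(initialSize);
  }

  void put(byte b) {
    if (!buf.hasRemaining() && buf.capacity() < maxSize) {
      // Double the capacity, never exceeding the configured upper bound.
      ByteBuffer bigger =
          ByteBuffer.allocate(Math.min(buf.capacity() * 2, maxSize));
      buf.flip();
      bigger.put(buf);
      buf = bigger;
    }
    // Once capacity == maxSize and the buffer is full, the caller must
    // flush/spill, exactly as with the fixed-size buffer.
    buf.put(b);
  }
}
{code}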



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.9.patch

Fixed the unit tests. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-07 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.8.patch

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-05 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.8.patch

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-03 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526023#comment-14526023
 ] 

Selina Zhang commented on HIVE-10036:
-

[~gopalv] Thank you! I added the io.netty dependency to ql/pom.xml and 
uploaded a new patch. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-03 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.7.patch

Fixed ql/pom.xml

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable. Especially when HCatalog dynamic 
 partitioning is involved, several hundred files may be open and being written 
 to at the same time (same problem for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be done here, but the problem is 
 that reducing the buffer size introduces worse compression and more IOs in 
 the read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP

2015-04-13 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493272#comment-14493272
 ] 

Selina Zhang commented on HIVE-10308:
-

Actually the test results show only 1 test failure, and that failure is 33 
days old, so it should be unrelated to this patch. 

 Vectorization execution throws java.lang.IllegalArgumentException: 
 Unsupported complex type: MAP
 

 Key: HIVE-10308
 URL: https://issues.apache.org/jira/browse/HIVE-10308
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1, 1.2.0, 1.1.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10308.1.patch


 Steps to reproduce:
 
 CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
 INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, "one", 2, "two") FROM src 
 LIMIT 1;
 CREATE TABLE test(key INT) ;
 INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;
 set hive.vectorized.execution.enabled=true;
 set hive.auto.convert.join=false;
 select l.key from test l left outer join test_orc r on (l.key= r.a) where r.a 
 is not null;
 Stack trace:
 
 Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-10 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.5.patch

Thanks, Mithun and Prasanth! Uploaded the modified patch. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved and several hundred files may be open and writing at 
 the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 every 5000 rows. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more IO in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improves significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Completely solving OOM for ORC may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP

2015-04-10 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10308:

Attachment: HIVE-10308.1.patch

 Vectorization execution throws java.lang.IllegalArgumentException: 
 Unsupported complex type: MAP
 

 Key: HIVE-10308
 URL: https://issues.apache.org/jira/browse/HIVE-10308
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1, 1.2.0, 1.1.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10308.1.patch


 Steps to reproduce:
 
 CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
 INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, 'one', 2, 'two') FROM src 
 LIMIT 1;
 CREATE TABLE test (key INT);
 INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;
 set hive.vectorized.execution.enabled=true;
 set hive.auto.convert.join=false;
 select l.key from test l left outer join test_orc r on (l.key = r.a) where r.a 
 is not null;
 Stack trace:
 
 Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
   at 
 org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10089) RCFile: lateral view explode caused ConcurrentModificationException

2015-03-27 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10089:

Attachment: HIVE-10089.1.patch

 RCFile: lateral view explode caused ConcurrentModificationException
 ---

 Key: HIVE-10089
 URL: https://issues.apache.org/jira/browse/HIVE-10089
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10089.1.patch


 CREATE TABLE test_table123 (a INT, b MAP<STRING, STRING>) STORED AS RCFILE;
 INSERT OVERWRITE TABLE test_table123 SELECT 1, MAP('a1', 'b1', 'c1', 'd1') 
 FROM src LIMIT 1;
 The following query leads to a ConcurrentModificationException:
 SELECT * FROM (SELECT b FROM test_table123) t1 LATERAL VIEW explode(b) x AS 
 b, c LIMIT 1;
 Failed with exception 
 java.io.IOException:java.util.ConcurrentModificationException
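
 As a generic illustration of this failure mode (plain Java, not the Hive 
 code path from the patch): structurally modifying a map while an iterator 
 over it is live throws ConcurrentModificationException, which is what can 
 happen when explode(b) iterates a map object that is mutated underneath it.

{code:title=CmeDemo.java (illustration)}
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
  public static void main(String[] args) {
    Map<String, String> m = new HashMap<>();
    m.put("a1", "b1");
    m.put("c1", "d1");
    for (Map.Entry<String, String> e : m.entrySet()) {
      // Structural change while the for-each iterator is live: the next
      // iterator step throws ConcurrentModificationException.
      m.put("e1", "f1");
    }
    // A defensive copy before iterating, e.g. new HashMap<>(m), avoids it.
  }
}
{code}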



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-24 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378863#comment-14378863
 ] 

Selina Zhang commented on HIVE-10036:
-

[~gopalv] I checked my Maven repository; it seems avro 1.7.5 pulled in 
netty-3.4.0.Final, which overrides the one defined in the main pom.xml. 
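
For anyone chasing a similar conflict, the standard Maven fix is to exclude 
the transitive netty so the version in the main pom.xml wins. A sketch only 
(the coordinates below are assumptions; confirm with mvn dependency:tree):

{code:title=pom.xml (sketch)}
<!-- Sketch: exclude the netty that avro drags in transitively so the
     version pinned in the main pom.xml is used instead. Coordinates are
     assumed, not verified against Hive's actual pom. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.5</version>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}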

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved and several hundred files may be open and writing at 
 the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 every 5000 rows. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more IO in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improves significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Completely solving OOM for ORC may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-24 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378176#comment-14378176
 ] 

Selina Zhang commented on HIVE-10036:
-

The review request:
https://reviews.apache.org/r/32445/

Thank you!

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved and several hundred files may be open and writing at 
 the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 every 5000 rows. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more IO in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improves significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Completely solving OOM for ORC may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-23 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.2.patch

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved and several hundred files may be open and writing at 
 the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 every 5000 rows. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more IO in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improves significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Completely solving OOM for ORC may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.1.patch

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch


 ORC writer keeps multiple output streams for each column. Each output stream 
 is allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved and several hundred files may be open and writing at 
 the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 every 5000 rows. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more IO in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improves significantly. Compared to Facebook's 
 hive-dwrf, I observed a 2x performance gain with this fix. 
 Completely solving OOM for ORC may need a lot of effort, but this is 
 definitely low-hanging fruit. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)