Re: HIVE-11254 Review Request (HPL/SQL tool)

2015-07-21 Thread Alan Gates

I'll look at it later today.

Alan.


Dmitry Tolpeko <dmtolp...@gmail.com>
July 21, 2015 at 0:41
Can anyone please review 
https://issues.apache.org/jira/browse/HIVE-11254? The changes are 
related to the new HPL/SQL component only.


When renaming the tool from plhql to hplsql I introduced a bug, so it 
cannot find a connection profile.


Thanks,

Dmitry


Hive-0.14 - Build # 1019 - Still Failing

2015-07-21 Thread Apache Jenkins Server
Changes for Build #1000

Changes for Build #1001

Changes for Build #1002

Changes for Build #1003

Changes for Build #1004

Changes for Build #1005

Changes for Build #1006

Changes for Build #1007

Changes for Build #1008

Changes for Build #1009

Changes for Build #1010

Changes for Build #1011

Changes for Build #1012

Changes for Build #1013

Changes for Build #1014

Changes for Build #1015

Changes for Build #1016

Changes for Build #1017

Changes for Build #1018

Changes for Build #1019



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #1019)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-0.14/1019/ to view 
the results.

[jira] [Created] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression

2015-07-21 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-11330:
--

 Summary: Add early termination for recursion in 
StatsRulesProcFactory$FilterStatsRule.evaluateExpression
 Key: HIVE-11330
 URL: https://issues.apache.org/jira/browse/HIVE-11330
 Project: Hive
  Issue Type: Bug
  Components: Hive, Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran


Queries with heavily nested filters can cause a StackOverflowError

{code}
Exception in thread "main" java.lang.StackOverflowError
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525)
 
{code}
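
For illustration, a minimal sketch of the kind of depth cap the summary proposes, using hypothetical expression classes rather than the actual StatsRulesProcFactory code: once the nesting exceeds a fixed limit, evaluation stops recursing and signals the caller to fall back to a default estimate.

{code}
// Hypothetical illustration of depth-capped evaluation of a nested filter expression.
public class DepthCappedEvaluator {
    private static final int MAX_DEPTH = 64;

    interface Expr { }
    static final class Leaf implements Expr { final boolean value; Leaf(boolean v) { value = v; } }
    static final class And implements Expr { final Expr left, right; And(Expr l, Expr r) { left = l; right = r; } }

    // Returns null when the depth cap is hit, telling the caller to use a
    // conservative default estimate instead of recursing further.
    static Boolean evaluate(Expr e, int depth) {
        if (depth > MAX_DEPTH) {
            return null; // early termination
        }
        if (e instanceof Leaf) {
            return ((Leaf) e).value;
        }
        And and = (And) e;
        Boolean l = evaluate(and.left, depth + 1);
        Boolean r = evaluate(and.right, depth + 1);
        return (l == null || r == null) ? null : (l && r);
    }

    public static void main(String[] args) {
        Expr nested = new Leaf(true);
        for (int i = 0; i < 1000; i++) {
            nested = new And(nested, new Leaf(true)); // nesting far beyond the cap
        }
        System.out.println(evaluate(nested, 0)); // prints null instead of overflowing
    }
}
{code}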



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] hive pull request: HIVE-11334-fix substring_index for multiple cha...

2015-07-21 Thread zhichao-li
GitHub user zhichao-li opened a pull request:

https://github.com/apache/hive/pull/47

HIVE-11334-fix substring_index for multiple chars delim



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhichao-li/hive substringindex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #47


commit 2acedeef1ca4ff1211410e9ffe9c437f2902de0d
Author: zhichao.li <zhichao...@intel.com>
Date:   2015-07-22T01:50:03Z

fix substring_index for multiple chars delim




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index

2015-07-21 Thread zhichao-li (JIRA)
zhichao-li created HIVE-11334:
-

 Summary: Incorrect answer when facing multiple chars delim and 
negative count for substring_index 
 Key: HIVE-11334
 URL: https://issues.apache.org/jira/browse/HIVE-11334
 Project: Hive
  Issue Type: Bug
Reporter: zhichao-li
Priority: Minor


substring_index('www||apache||org', '||', -2) would return '|apache||org' 
instead of 'apache||org'.
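
For reference, a self-contained sketch of the expected substring_index semantics for a multi-character delimiter and a negative count (MySQL-style behaviour); this is an illustrative helper, not the Hive UDF code:

{code}
public class SubstringIndexExample {

    // Returns the part of str before count occurrences of delim (count > 0),
    // or after |count| occurrences counted from the right (count < 0).
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            int idx = -1;
            while (count-- > 0) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer occurrences than count: whole string
                }
            }
            return str.substring(0, idx);
        }
        int idx = str.length();
        while (count++ < 0) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        // Expected per this report: "apache||org", not "|apache||org".
        System.out.println(substringIndex("www||apache||org", "||", -2));
    }
}
{code}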



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11333) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ColumnPruner prunes columns of UnionOperator that should be kept

2015-07-21 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-11333:
--

 Summary: CBO: Calcite Operator To Hive Operator (Calcite Return 
Path): ColumnPruner prunes columns of UnionOperator that should be kept
 Key: HIVE-11333
 URL: https://issues.apache.org/jira/browse/HIVE-11333
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


UnionOperator takes its schema from the operator in the first branch. 
Because ColumnPruner prunes columns based on the internal name, columns in 
the other branches may be pruned because their internal names differ from those 
in the first branch. To reproduce, run rcfile_union.q with the return path turned on.
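
A toy illustration of the mismatch (hypothetical internal names, not Hive's operator code): the set of needed column names is computed against the first branch's schema, so pruning the second branch by those names drops a column it still has to produce.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionPruneByNameExample {
    public static void main(String[] args) {
        // Two union branches producing the same logical columns under
        // different internal names (hypothetical values).
        List<String> firstBranch  = Arrays.asList("_col0", "_col1");
        List<String> secondBranch = Arrays.asList("_col2", "_col3");

        // Downstream only needs the first column; the retained set is
        // expressed in the first branch's internal names.
        Set<String> neededByName =
            new HashSet<>(Collections.singletonList(firstBranch.get(0)));

        List<String> prunedSecond = new ArrayList<>();
        for (String col : secondBranch) {
            if (neededByName.contains(col)) {
                prunedSecond.add(col);
            }
        }
        // Pruning by name empties the second branch, although positionally
        // its first column ("_col2") is still required by the union.
        System.out.println(prunedSecond); // []
    }
}
{code}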



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11335) Multi-Join Inner Query producing incorrect results

2015-07-21 Thread fatkun (JIRA)
fatkun created HIVE-11335:
-

 Summary: Multi-Join Inner Query producing incorrect results
 Key: HIVE-11335
 URL: https://issues.apache.org/jira/browse/HIVE-11335
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 1.1.0
 Environment: CDH5.4.0
Reporter: fatkun


test step

```
create table log (uid string, uid2 string);
insert into log values ('1', '1');

create table user (uid string, name string);
insert into user values ('1', 'test1');

select b.name, c.name from log a
 left outer join (select uid, name from user) b on (a.uid=b.uid)
 left outer join user c on (a.uid2=c.uid);

```
This returns the wrong result:
1   test1

It should return test1 for both columns.

While trying to find the error, I found that the following query returns the 
right result (the join key is different):
```
select b.name, c.name from log a
 left outer join (select uid, name from user) b on (a.uid=b.uid)
 left outer join user c on (a.uid=c.uid);
```

The explain output is different; Query 1 only selects one column:
```
b:user 
  TableScan
alias: user
Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: uid (type: string)
  outputColumnNames: _col0
```
I think there is something wrong in ColumnPruner, but I cannot find it.
It may be related to HIVE-10996.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 36650: HIVE-11316 Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()

2015-07-21 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36650/
---

Review request for hive, Gunther Hagleitner and Jesús Camacho Rodríguez.


Repository: hive-git


Description
---

Use a data structure that does not duplicate any part of the string for 
ASTNode::toStringTree()
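
A hedged sketch of the idea: append into one shared StringBuilder while walking the tree instead of building and concatenating a String per node, so no part of the output is copied repeatedly. The names below are illustrative, not the actual ASTNode change.

import java.util.ArrayList;
import java.util.List;

public class TreeToStringExample {

    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Each node appends into the one shared builder; returning a String per
    // node would re-copy every subtree's text at each level of the tree.
    static void toStringTree(Node node, StringBuilder out) {
        if (node.children.isEmpty()) {
            out.append(node.text);
            return;
        }
        out.append('(').append(node.text);
        for (Node child : node.children) {
            out.append(' ');
            toStringTree(child, out);
        }
        out.append(')');
    }

    public static void main(String[] args) {
        Node root = new Node("TOK_SELECT")
            .add(new Node("TOK_TABLE_OR_COL").add(new Node("uid")))
            .add(new Node("TOK_TABLE_OR_COL").add(new Node("name")));
        StringBuilder sb = new StringBuilder();
        toStringTree(root, sb);
        System.out.println(sb); // (TOK_SELECT (TOK_TABLE_OR_COL uid) (TOK_TABLE_OR_COL name))
    }
}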


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 0c111bc 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ASTNode.java c8dbe97 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 5b469e3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b8510 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java e0cd398 
  ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 14a7e9c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 5190bda 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java aab4250 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckCtx.java b19e2bf 
  ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/MatchPath.java cc2b77b 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestIUD.java 9d4457c 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestQBSubQuery.java 77ff79a 
  
ql/src/test/org/apache/hadoop/hive/ql/parse/TestSQL11ReservedKeyWordsPositive.java
 4c84e91 

Diff: https://reviews.apache.org/r/36650/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



Re: Review Request 36280: HIVE-11196

2015-07-21 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36280/
---

(Updated July 21, 2015, 10:01 p.m.)


Review request for hive, Gunther Hagleitner, Jesús Camacho Rodríguez, and John 
Pullokkaran.


Changes
---

Unit test failures addressed in the new patch.


Repository: hive-git


Description
---

Utilities.getPartitionDesc() should try to reuse TableDesc object


Diffs (updated)
-

  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatMultiOutputFormat.java
 049de54 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d8e463d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 29854d8 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java 
317454d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java b123511 

Diff: https://reviews.apache.org/r/36280/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



[jira] [Created] (HIVE-11331) Doc Notes

2015-07-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11331:
-

 Summary: Doc Notes
 Key: HIVE-11331
 URL: https://issues.apache.org/jira/browse/HIVE-11331
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Reporter: Eugene Koifman


This ticket is to track various doc-related issues for HIVE-9675 since the 
work is spread out over time.

1. calling set autocommit = true while a transaction is open will commit the 
transaction
2. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11332) Unicode table comments do not work

2015-07-21 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11332:
---

 Summary: Unicode table comments do not work
 Key: HIVE-11332
 URL: https://issues.apache.org/jira/browse/HIVE-11332
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Noticed by accident.
{noformat}
select ' ', count(*) from moo;
Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83
Total jobs = 1
Launching Job 1 out of 1

[snip]
OK
   0
Time taken: 13.347 seconds, Fetched: 1 row(s)
hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = '' ');
OK
Time taken: 0.292 seconds
hive> desc extended moo;
OK
i   int 
 
Detailed Table Information  Table(tableName:moo, dbName:default, 
owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], 
location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, 
numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0, 
transient_lastDdlTime=1437519883, comment=?? , last_modified_by=sershe}, 
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) 
Time taken: 0.347 seconds, Fetched: 3 row(s)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


HIVE-11254 Review Request (HPL/SQL tool)

2015-07-21 Thread Dmitry Tolpeko
Can anyone please review https://issues.apache.org/jira/browse/HIVE-11254?
The changes are related to the new HPL/SQL component only.

When renaming the tool from plhql to hplsql I introduced a bug, so it cannot
find a connection profile.

Thanks,

Dmitry


issue while reading parquet file in hive

2015-07-21 Thread Santlal J Gupta
Hello,



I have the following issue.



I have created a Parquet file through Cascading Parquet and want to load it 
into a Hive table.

My data file contains data of type timestamp.

Cascading Parquet does not support the timestamp data type, so while creating 
the Parquet file I declared the field as binary type. After generating the 
Parquet file, it loads successfully into Hive.



While creating the Hive table, I declared the column type as timestamp.



Code :



package com.parquet.TimestampTest;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class GenrateTimeStampParquetFile {

    static String inputPath = "target/input/timestampInputFile1";
    static String outputPath = "target/parquetOutput/TimestampOutput";

    public static void main(String[] args) {
        write();
    }

    private static void write() {
        // Source: one string field per line, read from a newline-delimited text file.
        Fields field = new Fields("timestampField").applyTypes(String.class);
        Scheme sourceSch = new TextDelimited(field, false, "\n");

        // Sink: write the field as an optional binary column in the Parquet schema.
        Fields outputField = new Fields("timestampField");
        Scheme sinkSch = new ParquetTupleScheme(field, outputField,
                "message TimeStampTest{optional binary timestampField ;}");

        Tap source = new Hfs(sourceSch, inputPath);
        Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Hive timestamp");

        FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);

        new HadoopFlowConnector().connect(fd).complete();
    }
}



Input file:



timestampInputFile1



timestampField

1988-05-25 15:15:15.254

1987-05-06 14:14:25.362



After running the code, the following files are generated.

Output :

1. part-0-m-0.parquet

2. _SUCCESS

3. _metadata

4. _common_metadata



I have created a table in Hive to load the part-0-m-0.parquet file.



I have run the following queries in Hive.

Query :



hive> create table test3(timestampField timestamp) stored as parquet;

hive> load data local inpath '/home/hduser/parquet_testing/part-0-m-0.parquet' into table test3;

hive> select * from test3;



After running the above commands, I got the following output.



Output :



OK

SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.

SLF4J: Defaulting to no-operation (NOP) logger implementation

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.

Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast 
to org.apache.hadoop.hive.serde2.io.TimestampWritable





I got the above exception.

Please help me to solve this problem.
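
For what it's worth, a minimal sketch of the underlying mismatch, assuming the binary column holds the timestamp text in UTF-8 exactly as in the input file above: Hive hands the reader a BytesWritable, so the value has to be parsed (or the Hive column declared as string and cast in a query) rather than cast directly to TimestampWritable. The helper below is hypothetical, not a Hive or Parquet API:

import java.nio.charset.StandardCharsets;
import java.sql.Timestamp;

public class BinaryTimestampExample {

    // Hypothetical helper: interpret the UTF-8 bytes stored in the Parquet
    // binary column as a timestamp string and parse it.
    static Timestamp fromBinary(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return Timestamp.valueOf(text); // expects yyyy-mm-dd hh:mm:ss[.fff]
    }

    public static void main(String[] args) {
        byte[] raw = "1988-05-25 15:15:15.254".getBytes(StandardCharsets.UTF_8);
        System.out.println(fromBinary(raw)); // 1988-05-25 15:15:15.254
    }
}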



Currently I am using:

Hive 1.1.0-cdh5.4.2

Cascading 2.5.1

parquet-format-2.2.0



Thanks

Santlal J. Gupta








[jira] [Created] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working

2015-07-21 Thread Yannik Zuehlke (JIRA)
Yannik Zuehlke created HIVE-11327:
-

 Summary: HiveQL to HBase - Predicate Pushdown for composite key 
not working
 Key: HIVE-11327
 URL: https://issues.apache.org/jira/browse/HIVE-11327
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler, Hive
Affects Versions: 0.14.0
Reporter: Yannik Zuehlke
Priority: Blocker


I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL to access 
an HBase table.

I created a table with a complex composite rowkey:

{quote}
CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, 
p3:string>, column1 string, column2 string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ';'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{quote}


The table is created successfully, but the HiveQL query takes 
forever:


{quote}
SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
{quote}


I am working with 1 TB of data (around 1.5 bn records) and this query takes 
forever (it ran overnight but had not finished by morning).

I changed the log4j properties to 'DEBUG' and found some interesting 
information:

{quote}
2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory
(OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : 
hive_hbase
2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory 
(OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
{quote}


But some lines later:


{quote}
2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory 
(OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible 
for predicate:  (rowkey.p1 = 'xyz')
{quote}


So my guess is: HiveQL over HBase does not do any predicate pushdown but starts 
a MapReduce job.

The normal HBase scan (via the HBase Shell) takes around 5 seconds.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36540: HIVE-8128: Improve Parquet Vectorization

2015-07-21 Thread Dong Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36540/
---

(Updated July 21, 2015, 8:44 a.m.)


Review request for hive, Ryan Blue, cheng xu, and Sergio Pena.


Changes
---

Review request


Repository: hive-git


Description
---

This patch is based on the Parquet vector API at 
https://github.com/nezihyigitbasi-nflx/parquet-mr/commits/vector

In this POC, the general workflow is in place, two tests pass, and the INT type is 
supported. The idea is that we create a VectorizedParquetRecordReader, which 
wraps the ParquetRecordReader provided by Parquet. Then, in its next() method, 
we convert a Parquet RowBatch to a Hive VectorizedRowBatch.

This is the first patch. To complete the vectorization feature, we still have 
follow-up work to do: 1) support all data types, 2) support partition columns, 
3) add more test cases, 4) evaluate performance on a real cluster.


Diffs
-

  pom.xml 1abf738 
  ql/pom.xml 6026c49 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
e1b6dd8 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java
 98691c7 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 adeb971 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedParquetReader.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorized_parquet_data_types.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_parquet_data_types.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/36540/diff/


Testing
---

unit test passed


Thanks,

Dong Chen



[jira] [Created] (HIVE-11326) Parquet table: where clause with partition column fails

2015-07-21 Thread Thomas Friedrich (JIRA)
Thomas Friedrich created HIVE-11326:
---

 Summary: Parquet table: where clause with partition column fails
 Key: HIVE-11326
 URL: https://issues.apache.org/jira/browse/HIVE-11326
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 1.2.0, 1.2.1
Reporter: Thomas Friedrich


Steps:
create table t1 (c1 int) partitioned by (part string) stored as parquet;
insert into table t1 partition (part='p1') values (1);
select * from t1 where part='p1';

Error message:
Caused by: java.lang.IllegalArgumentException: Column [part] was not found in 
schema!
at parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
at 
parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
at 
parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
at 
parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:275)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:99)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:85)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:67)

It seems the problem was introduced with HIVE-10252 ([~dongc]): the filter can't 
contain any partition columns in the case of a Parquet table. 
While searching for an existing JIRA, I found a similar problem reported for 
Spark: SPARK-6554.

I think the setFilter method should remove all predicates that reference 
partition columns before building the FilterPredicate object.
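
A small sketch of that suggestion, using hypothetical names rather than the actual Hive classes: predicates that reference partition columns are dropped before anything is handed to Parquet, since partition columns never appear in the file schema.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the actual Hive patch.
public class PartitionPredicatePruning {

    static final class Predicate {
        final String column;
        final String expr;
        Predicate(String column, String expr) { this.column = column; this.expr = expr; }
        @Override public String toString() { return expr; }
    }

    // Keep only predicates on columns that actually exist in the Parquet schema.
    static List<Predicate> dropPartitionPredicates(List<Predicate> predicates,
                                                   Set<String> partitionColumns) {
        List<Predicate> kept = new ArrayList<>();
        for (Predicate p : predicates) {
            if (!partitionColumns.contains(p.column)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Predicate> preds = Arrays.asList(
            new Predicate("part", "part = 'p1'"),
            new Predicate("c1", "c1 > 0"));
        // Prints [c1 > 0]; the partition predicate is left to partition pruning.
        System.out.println(dropPartitionPredicates(preds, Collections.singleton("part")));
    }
}
{code}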



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-07-21 Thread Wojciech Indyk (JIRA)
Wojciech Indyk created HIVE-11329:
-

 Summary: Column prefix in key of hbase column prefix map
 Key: HIVE-11329
 URL: https://issues.apache.org/jira/browse/HIVE-11329
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.13.0
Reporter: Wojciech Indyk
Assignee: Wojciech Indyk
Priority: Minor


When I create a table with an HBase column prefix 
(https://issues.apache.org/jira/browse/HIVE-3725), the prefix appears in the 
resulting map keys in Hive. 
E.g. record in HBase
rowkey: 123
column: tag_one, value: 0.5
column: tag_two, value: 0.5
Representation in Hive via column prefix mapping "tag_.*":
column: tag map<string,string>
key: tag_one, value: 0.5
key: tag_two, value: 0.5

should be:
key: one, value: 0.5
key: two, value: 0.5
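
A tiny sketch of the expected key transformation (hypothetical code, not the HBase handler itself): strip the matched prefix before using the qualifier as the Hive map key.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixMapKeyExample {
    public static void main(String[] args) {
        // HBase qualifiers under the mapped prefix "tag_" (hypothetical data).
        Map<String, String> hbaseCells = new LinkedHashMap<>();
        hbaseCells.put("tag_one", "0.5");
        hbaseCells.put("tag_two", "0.5");

        String prefix = "tag_";
        Map<String, String> hiveMap = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hbaseCells.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // Expected behaviour: drop the prefix from the map key.
                hiveMap.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        System.out.println(hiveMap); // {one=0.5, two=0.5}
    }
}
{code}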




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11328) Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary

2015-07-21 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-11328:
--

 Summary: Avoid String representation of expression nodes in 
ConstantPropagateProcFactory unless necessary
 Key: HIVE-11328
 URL: https://issues.apache.org/jira/browse/HIVE-11328
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)