[jira] [Created] (HIVE-7346) Wrong results caused by hive ppd under specific join condition

2014-07-03 Thread dima machlin (JIRA)
dima machlin created HIVE-7346:
--

 Summary: Wrong results caused by hive ppd under specific join 
condition
 Key: HIVE-7346
 URL: https://issues.apache.org/jira/browse/HIVE-7346
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: dima machlin


Assuming two tables :
{code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code}
t1 contains 1 row : 'a','a'
t2 contains 1 row : 'a',2

The following query : 
{code:sql} select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d =1 and c.d=1 {code}

Returns 0 rows as expected because t2.d = 2
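
The expected result can be cross-checked outside Hive. The following is a minimal sketch that uses Python's built-in sqlite3 module as a reference engine (an assumption of this example, not something used in the report); the function name run_inner_query is hypothetical. It loads the same two rows and runs the inner query unchanged.

```python
import sqlite3


def run_inner_query():
    """Run the inner query from the report against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (id1 text, id2 text)")
    conn.execute("create table t2 (id text, d int)")
    conn.execute("insert into t1 values ('a', 'a')")
    conn.execute("insert into t2 values ('a', 2)")
    rows = conn.execute(
        """
        select a.*, b.d d1, c.d d2
        from t1 a join t2 b on (a.id1 = b.id)
        join t2 c on (a.id2 = b.id)
        where b.d = 1 and c.d = 1
        """
    ).fetchall()
    conn.close()
    return rows


# No row can pass "b.d = 1" because the only value of t2.d is 2.
print(run_inner_query())
```

Any outer filter applied on top of an empty result should also yield zero rows, which is why a single returned row points at a lost predicate rather than at the data.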

Wrapping this query, like so : 
{code:sql} select * from (

select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d =1 and c.d=1

) z where d11 or d21 {code}
Adding another filter on these columns causes the plan to lose the =1 filters, 
and the query returns a single row - *Wrong Results*.

The plan is : 
{code:sql}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN 
(TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. 
(TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF 
(TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) 
id (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL 
b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (= (. 
(TOK_TABLE_OR_COL b) d) 1) (= (. (TOK_TABLE_OR_COL c) d) 1) z)) 
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (or ( (TOK_TABLE_OR_COL d1) 1) ( (TOK_TABLE_OR_COL 
d2) 1)

STAGE DEPENDENCIES:
  Stage-7 is a root stage
  Stage-5 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-7
Map Reduce Local Work
  Alias - Map Local Tables:
z:b 
  Fetch Operator
limit: -1
z:c 
  Fetch Operator
limit: -1
  Alias - Map Local Operator Tree:
z:b 
  TableScan
alias: b
HashTable Sink Operator
  condition expressions:
0 {id1} {id2}
1 {id} {d}
  handleSkewJoin: false
  keys:
0 [Column[id1]]
1 [Column[id]]
  Position of Big Table: 0
z:c 
  TableScan
alias: c
HashTable Sink Operator
  condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
  handleSkewJoin: false
  keys:
0 []
1 []
  Position of Big Table: 0

  Stage: Stage-5
Map Reduce
  Alias - Map Operator Tree:
z:a 
  TableScan
alias: a
Map Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 {id1} {id2}
1 {id} {d}
  handleSkewJoin: false
  keys:
0 [Column[id1]]
1 [Column[id]]
  outputColumnNames: _col0, _col1, _col4, _col5
  Position of Big Table: 0
  Filter Operator
predicate:
expr: (_col1 = _col4)
type: boolean
Map Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
  handleSkewJoin: false
  keys:
0 []
1 []
  outputColumnNames: _col1, _col4, _col5, _col9
  Position of Big Table: 0
  Filter Operator
predicate:
expr: ((_col1  1) or (_col9  1))
type: boolean
Select Operator
  expressions:
expr: _col4
type: string
expr: _col5
type: string
expr: _col1
type: int
expr: _col9
type: int
  outputColumnNames: _col0, _col1, _col2, _col3
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: 
org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Local Work:
Map Reduce Local 

[jira] [Created] (HIVE-7314) Wrong results of UDF when hive.cache.expr.evaluation is set

2014-06-30 Thread dima machlin (JIRA)
dima machlin created HIVE-7314:
--

 Summary: Wrong results of UDF when hive.cache.expr.evaluation is 
set
 Key: HIVE-7314
 URL: https://issues.apache.org/jira/browse/HIVE-7314
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: dima machlin


It seems that expression caching doesn't work correctly when a UDF is used 
inside another UDF or a Hive function.

For example:
tbl has one row: 'a','b'
The following query:
select concat(custUDF(a),' ', custUDF(b)) from tbl;

returns 'a a'.

Hive seems to cache the result of custUDF(a) and reuse it for custUDF(b).
The same query without the concat works fine.
Replacing the concat with another custom UDF also returns 'a a'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7314) Wrong results of UDF when hive.cache.expr.evaluation is set

2014-06-30 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-7314:
---

Description: 
It seems that the expression caching doesn't work when using UDF inside another 
UDF or a hive function.

For example :
tbl has one row : 'a','b'
The following query :
{code:sql} select concat(custUDF(a),' ', custUDF(b)) from tbl; {code}

returns 'a a'

seems to cache custUDF(a)  and use it for custUDF(b).
Same query without the concat works fine.
Replacing the concat with another custom UDF also returns 'a a'



[jira] [Commented] (HIVE-7314) Wrong results of UDF when hive.cache.expr.evaluation is set

2014-06-30 Thread dima machlin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047477#comment-14047477
 ] 

dima machlin commented on HIVE-7314:


{code:java} public String getDisplayString(String[] children) {
    return "PurswayCleanNameUDF";
}
{code}
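
One way a display string like this could interact with expression caching is sketched below. This is a toy model in Python, not Hive's code: it assumes the cache is keyed by the string that getDisplayString produces, and the names CachingEvaluator, cust_udf_constant_name, cust_udf_with_children and concat_query are hypothetical. When the display string ignores its children, custUDF(a) and custUDF(b) share one key, so the second call reuses the first result.

```python
class CachingEvaluator:
    """Toy expression cache keyed by display string (an assumption, not Hive's code)."""

    def __init__(self, display):
        self.display = display  # function: column name -> display string
        self.cache = {}

    def eval_udf(self, column, row):
        key = self.display(column)
        if key not in self.cache:
            # custUDF is modelled as the identity function for simplicity.
            self.cache[key] = row[column]
        return self.cache[key]


def cust_udf_constant_name(column):
    # Mirrors a getDisplayString that ignores its children.
    return "custUDF"


def cust_udf_with_children(column):
    # A display string that includes the argument, so keys stay distinct.
    return "custUDF(" + column + ")"


def concat_query(display, row):
    """Model concat(custUDF(a), ' ', custUDF(b)) for a single row."""
    evaluator = CachingEvaluator(display)
    return evaluator.eval_udf("a", row) + " " + evaluator.eval_udf("b", row)


row = {"a": "a", "b": "b"}
print(concat_query(cust_udf_constant_name, row))  # keys collide
print(concat_query(cust_udf_with_children, row))  # keys differ
```

If this model is close to what happens, including the children in the display string would be enough to keep the two calls apart. It does not explain why the same calls work as separate columns.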



[jira] [Commented] (HIVE-7314) Wrong results of UDF when hive.cache.expr.evaluation is set

2014-06-30 Thread dima machlin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047493#comment-14047493
 ] 

dima machlin commented on HIVE-7314:


This is the workaround I'm using. Thank you.

But why does it work as separate columns and not inside a second function?

 Wrong results of UDF when hive.cache.expr.evaluation is set
 ---

 Key: HIVE-7314
 URL: https://issues.apache.org/jira/browse/HIVE-7314
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: dima machlin
Assignee: Navis
 Attachments: HIVE-7314.1.patch.txt




[jira] [Created] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-10 Thread dima machlin (JIRA)
dima machlin created HIVE-7205:
--

 Summary: Wrong results when union all of grouping followed by 
group by with correlation optimization
 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: dima machlin
Priority: Critical


Use case:

Table TBL (a string,b string) contains a single row: 'a','a'
The following query:

select b, sum(cc) from (
select b,count(1) as cc from TBL group by b
union all
select a as b,count(1) as cc from TBL group by a
) z
group by b

returns

a 1
a 1
when hive.optimize.correlation=true is set.

If we change the setting to hive.optimize.correlation=false,
it returns the correct result: a 2
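
The correct answer can be confirmed independently of Hive. The sketch below runs the same query in Python's built-in sqlite3 module, used here only as a reference engine (an assumption of this example); the function name union_all_group_by is hypothetical.

```python
import sqlite3


def union_all_group_by():
    """Run the union all / group by query from the report in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table TBL (a text, b text)")
    conn.execute("insert into TBL values ('a', 'a')")
    rows = conn.execute(
        """
        select b, sum(cc) from (
            select b, count(1) as cc from TBL group by b
            union all
            select a as b, count(1) as cc from TBL group by a
        ) z
        group by b
        """
    ).fetchall()
    conn.close()
    return rows


# Both branches contribute one row with key 'a', so the outer sum is 2.
print(union_all_group_by())
```

The outer group by has to merge the rows coming from both union branches; returning 'a 1' twice suggests the two branches were aggregated separately.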



The plan with correlation optimization :

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
(TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
(TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
(TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
(TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) 
z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
(TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias - Map Operator Tree:
null-subquery1:z-subquery1:TBL 
  TableScan
alias: TBL
Select Operator
  expressions:
expr: b
type: string
  outputColumnNames: b
  Group By Operator
aggregations:
  expr: count(1)
bucketGroup: false
keys:
  expr: b
  type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
  key expressions:
expr: _col0
type: string
  sort order: +
  Map-reduce partition columns:
expr: _col0
type: string
  tag: 0
  value expressions:
expr: _col1
type: bigint
null-subquery2:z-subquery2:TBL 
  TableScan
alias: TBL
Select Operator
  expressions:
expr: a
type: string
  outputColumnNames: a
  Group By Operator
aggregations:
  expr: count(1)
bucketGroup: false
keys:
  expr: a
  type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
  key expressions:
expr: _col0
type: string
  sort order: +
  Map-reduce partition columns:
expr: _col0
type: string
  tag: 1
  value expressions:
expr: _col1
type: bigint
  Reduce Operator Tree:
Demux Operator
  Group By Operator
aggregations:
  expr: count(VALUE._col0)
bucketGroup: false
keys:
  expr: KEY._col0
  type: string
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
  expressions:
expr: _col0
type: string
expr: _col1
type: bigint
  outputColumnNames: _col0, _col1
  Union
Select Operator
  expressions:
expr: _col0
type: string
expr: _col1
type: bigint
  outputColumnNames: _col0, _col1
  Mux Operator
Group By Operator
  aggregations:
expr: sum(_col1)
  bucketGroup: false
  keys:
expr: _col0
type: string
  mode: complete
  outputColumnNames: _col0, _col1
  Select Operator
   

[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-10 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-7205:
---

Description: 
use case :

table TBL (a string,b string) contains single row : 'a','a'
the following query :

select b, sum(cc) from (
select b,count(1) as cc from TBL group by b
union all
select a as b,count(1) as cc from TBL group by a
) z
group by b

returns 

a 1
a 1
while set hive.optimize.correlation=true;

if we change set hive.optimize.correlation=false;
it returns correct results : a 2




[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-10 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-7205:
---

Description: 
use case :

table TBL (a string,b string) contains single row : 'a','a'
the following query :
{code:sql}
select b, sum(cc) from (
select b,count(1) as cc from TBL group by b
union all
select a as b,count(1) as cc from TBL group by a
) z
group by b
{code}
returns 

a 1
a 1
while set hive.optimize.correlation=true;

if we change set hive.optimize.correlation=false;
it returns correct results : a 2



The plan with correlation optimization :
{code:sql}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
(TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
(TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
(TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
(TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) 
z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
(TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias - Map Operator Tree:
null-subquery1:z-subquery1:TBL 
  TableScan
alias: TBL
Select Operator
  expressions:
expr: b
type: string
  outputColumnNames: b
  Group By Operator
aggregations:
  expr: count(1)
bucketGroup: false
keys:
  expr: b
  type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
  key expressions:
expr: _col0
type: string
  sort order: +
  Map-reduce partition columns:
expr: _col0
type: string
  tag: 0
  value expressions:
expr: _col1
type: bigint
null-subquery2:z-subquery2:TBL 
  TableScan
alias: TBL
Select Operator
  expressions:
expr: a
type: string
  outputColumnNames: a
  Group By Operator
aggregations:
  expr: count(1)
bucketGroup: false
keys:
  expr: a
  type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
  key expressions:
expr: _col0
type: string
  sort order: +
  Map-reduce partition columns:
expr: _col0
type: string
  tag: 1
  value expressions:
expr: _col1
type: bigint
  Reduce Operator Tree:
Demux Operator
  Group By Operator
aggregations:
  expr: count(VALUE._col0)
bucketGroup: false
keys:
  expr: KEY._col0
  type: string
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
  expressions:
expr: _col0
type: string
expr: _col1
type: bigint
  outputColumnNames: _col0, _col1
  Union
Select Operator
  expressions:
expr: _col0
type: string
expr: _col1
type: bigint
  outputColumnNames: _col0, _col1
  Mux Operator
Group By Operator
  aggregations:
expr: sum(_col1)
  bucketGroup: false
  keys:
expr: _col0
type: string
  mode: complete
  outputColumnNames: _col0, _col1
  Select Operator
expressions:
  expr: _col0
  type: string
  expr: _col1
  type: 

[jira] [Updated] (HIVE-2627) NPE on MAP-JOIN with a UDF in an external JAR

2014-06-08 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-2627:
---

Affects Version/s: 0.12.0

 NPE on MAP-JOIN with a UDF in an external JAR
 -

 Key: HIVE-2627
 URL: https://issues.apache.org/jira/browse/HIVE-2627
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jonathan Chang

 When a query is converted into a map join, and it depends on some UDF (ADD 
 JAR...; CREATE TEMPORARY FUNCTION...), then an NPE may happen.  Here is an 
 example.
 SELECT
 some_udf(dummy1) as dummies
 FROM (
 SELECT
 a.dummy as dummy1,
 b.dummy as dummy2
 FROM
 test a
 LEFT OUTER JOIN
 test b
 ON
 a.dummy = b.dummy
 ) c;
 My guess is that the JAR classes are not getting propagated to the 
 hashmapjoin operator.





[jira] [Commented] (HIVE-4317) StackOverflowError when add jar concurrently

2014-05-27 Thread dima machlin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009499#comment-14009499
 ] 

dima machlin commented on HIVE-4317:


I can confirm that this also happens in Hive 0.12 and is fully reproducible.

 StackOverflowError when add jar concurrently 
 -

 Key: HIVE-4317
 URL: https://issues.apache.org/jira/browse/HIVE-4317
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0, 0.10.0
Reporter: wangwenli
 Attachments: hive-4317.1.patch


 Scenario: when multiple threads add jars and run select operations over JDBC 
 concurrently, HiveServer sometimes throws a StackOverflowError from 
 XMLEncoder while serializing the MapRedWork (serializeMapRedWork).





[jira] [Commented] (HIVE-2627) NPE on MAP-JOIN with a UDF in an external JAR

2014-05-27 Thread dima machlin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009503#comment-14009503
 ] 

dima machlin commented on HIVE-2627:


I can confirm that this still happens in hive 0.12.
Getting : 

java.lang.ClassNotFoundException: com.some.class.used.by.UDF
Continuing ...
java.lang.NullPointerException: target should not be null
java.lang.NullPointerException: target should not be null
Continuing ...

and eventually 

ERROR mr.MapredLocalTask: Hive Runtime Error: Map local work failed
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1415)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isDeterministic(FunctionRegistry.java:1385)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:132)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isDeterministic(FunctionRegistry.java:1385)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:132)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:83)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:83)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.toCachedEval(ExprNodeEvaluatorFactory.java:73)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:57)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:57)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:419)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:305)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:722)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)





[jira] [Created] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause

2014-05-12 Thread dima machlin (JIRA)
dima machlin created HIVE-7045:
--

 Summary: Wrong results in multi-table insert aggregating without 
group by clause
 Key: HIVE-7045
 URL: https://issues.apache.org/jira/browse/HIVE-7045
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.10.0
Reporter: dima machlin


The scenario :

CREATE  TABLE t1 (a int, b int);
CREATE  TABLE t2 (cnt int) PARTITIONED BY (var_name string);

insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt 
insert overwrite table t2 partition(var_name='b') select count(b) cnt ;

select * from t2;
returns : 
2 a
2 b

as expected.

Setting the number of reducers higher than 1 :

set mapred.reduce.tasks=2;

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1   a
1   a
1   b
1   b

Wrong results.

This also happens whenever t1 is big enough for Hive to automatically use more 
than one reducer, without the number being set explicitly.

Adding group by 1 at the end of each insert works around the problem:

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;

generates:
2 a
2 b

This should work without the group by.
The number of rows in each partition equals the number of reducers:
each reducer writes its own subtotal of the count.
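
The effect described above can be modelled in a few lines. This is a toy simulation in Python, not Hive's execution path: rows are spread over reducers round-robin (an assumption; Hive's actual partitioning may differ), and the function names count_per_reducer and merged_count are hypothetical. It shows the per-reducer subtotals that end up in the table and the single total that a final merge step would produce.

```python
def count_per_reducer(values, num_reducers):
    """Spread rows over reducers round-robin and count non-null values in each."""
    partials = [0] * num_reducers
    for index, value in enumerate(values):
        if value is not None:
            partials[index % num_reducers] += 1
    return partials


def merged_count(values, num_reducers):
    """Sum the per-reducer subtotals into the single count the query should return."""
    return sum(count_per_reducer(values, num_reducers))


column_a = [1, 2]  # the values of t1.a in the scenario
print(count_per_reducer(column_a, 1))  # one reducer: a single row
print(count_per_reducer(column_a, 2))  # two reducers: one subtotal row each
print(merged_count(column_a, 2))       # what a final merge would produce
```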







[jira] [Updated] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause

2014-05-12 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-7045:
---

Description: 
The scenario :

CREATE  TABLE t1 (a int, b int);
CREATE  TABLE t2 (cnt int) PARTITIONED BY (var_name string);

insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

t1 contains :
1 1
2 2

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt 
insert overwrite table t2 partition(var_name='b') select count(b) cnt ;

select * from t2;
returns : 
2 a
2 b

as expected.

Setting the number of reducers higher than 1 :

set mapred.reduce.tasks=2;

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1   a
1   a
1   b
1   b

Wrong results.

This also happens whenever t1 is big enough for Hive to automatically use more 
than one reducer, without the number being set explicitly.

Adding group by 1 at the end of each insert works around the problem:

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;

generates:
2 a
2 b

This should work without the group by.
The number of rows in each partition equals the number of reducers:
each reducer writes its own subtotal of the count.





[jira] [Created] (HIVE-6448) Java heap space pasring a query with many union all

2014-02-17 Thread dima machlin (JIRA)
dima machlin created HIVE-6448:
--

 Summary: Java heap space pasring a query with many union all
 Key: HIVE-6448
 URL: https://issues.apache.org/jira/browse/HIVE-6448
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.11.0
 Environment:  
Reporter: dima machlin


A query with too many union all statements requires a massive amount of memory 
to process.
With the default HiveServer Xmx (256) we can run ~50 union alls.
Memory use seems to grow roughly linearly: after increasing the Xmx to 1024 we 
can run ~200.
The error is :

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at org.apache.hadoop.hive.ql.QueryPlan.getJSONKeyValue(QueryPlan.java:453)
at org.apache.hadoop.hive.ql.QueryPlan.getJSONQuery(QueryPlan.java:598)
at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:609)
at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:499)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:136)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6448) Java heap space pasring a query with many union all

2014-02-17 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-6448:
---

Description: 
Too many UNION ALL branches in a single query require a massive amount of 
memory to process.
With the default HiveServer Xmx (256) we can run ~50 UNION ALLs.
The relationship seems roughly linear: increasing the Xmx to 1024 lets us run ~200.
The error is:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuilder.append(StringBuilder.java:119)
    at org.apache.hadoop.hive.ql.QueryPlan.getJSONKeyValue(QueryPlan.java:453)
    at org.apache.hadoop.hive.ql.QueryPlan.getJSONQuery(QueryPlan.java:598)
    at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:609)
    at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:499)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:136)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

Query example :
select * 
from (
select count(*) from default.dual union all
select count(*) from default.dual union all
..
select count(*) from default.dual) z
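
For reproduction, a query of this shape can be generated for any number of branches. The following is a minimal Python sketch, not part of Hive: the function name `build_union_all_query` and its parameters are hypothetical, and the generated text simply repeats the sub-select shown in the example above.

```python
def build_union_all_query(branches: int, table: str = "default.dual") -> str:
    """Return a query with `branches` count(*) sub-selects joined by UNION ALL."""
    if branches < 1:
        raise ValueError("branches must be at least 1")
    # Every branch is the same sub-select, as in the query example above.
    selects = [f"select count(*) from {table}"] * branches
    body = " union all\n".join(selects)
    return f"select *\nfrom (\n{body}) z"


if __name__ == "__main__":
    # The report saw failures at roughly 50 branches with Xmx 256
    # and roughly 200 branches with Xmx 1024.
    print(build_union_all_query(3))
```

Raising `branches` step by step until the OutOfMemoryError appears should show where the limit lies for a given Xmx.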





[jira] [Updated] (HIVE-6448) Java heap space parsing a query with many union all

2014-02-17 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-6448:
---

Description: 
Too many UNION ALL branches in a single query require a massive amount of 
memory to process.
With the default HiveServer Xmx (256) we can run ~50 UNION ALLs.
The relationship seems roughly linear: increasing the Xmx to 1024 lets us run ~200.
The error is:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuilder.append(StringBuilder.java:119)
    at org.apache.hadoop.hive.ql.QueryPlan.getJSONKeyValue(QueryPlan.java:453)
    at org.apache.hadoop.hive.ql.QueryPlan.getJSONQuery(QueryPlan.java:598)
    at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:609)
    at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:499)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:136)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

Query example :
select * 
from (
select count( * ) from default.dual union all
select count ( * ) from default.dual union all
..
select count( * ) from default.dual) z





[jira] [Created] (HIVE-6141) Can't select from views after upgrading metastore from 0.7 to 0.10

2014-01-05 Thread dima machlin (JIRA)
dima machlin created HIVE-6141:
--

 Summary: Can't select from views after upgrading metastore from 
0.7 to 0.10
 Key: HIVE-6141
 URL: https://issues.apache.org/jira/browse/HIVE-6141
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: dima machlin
Priority: Minor


Selecting from a view created in 0.7 fails after upgrading to 0.10 with:
bq. Invalid table alias or column reference
The reason is that running this in v0.7:
bq. create view a as select b from c.d;
stores this SQL in the metastore:
bq. select `c.d`.b from `c`.`d`;
which fails in v0.10.
Running the same CREATE VIEW statement in v0.10 stores different SQL:
bq. select `d`.b from `c`.`d`;
which succeeds in v0.10.

A workaround is to recreate the view.
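
To decide which views need recreating, the stored view text can be checked for the 0.7-style alias. The following is a rough Python sketch under stated assumptions: the function name `has_db_qualified_alias` is hypothetical, the view SQL is assumed to be already available as a string, and any dot inside a backtick-quoted identifier is treated as the old `db.table` form.

```python
import re

# Matches one backtick-quoted identifier and captures its contents.
QUOTED_IDENTIFIER = re.compile(r"`([^`]*)`")


def has_db_qualified_alias(view_sql: str) -> bool:
    """Return True if any backtick-quoted identifier contains a dot, e.g. `c.d`."""
    return any("." in name for name in QUOTED_IDENTIFIER.findall(view_sql))


if __name__ == "__main__":
    print(has_db_qualified_alias("select `c.d`.b from `c`.`d`"))  # 0.7-style text
    print(has_db_qualified_alias("select `d`.b from `c`.`d`"))    # 0.10-style text
```

This is only a heuristic: it does not parse SQL, so backticks inside string literals would confuse it.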





[jira] [Updated] (HIVE-6141) Can't select from views after upgrading metastore from 0.7 to 0.10

2014-01-05 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-6141:
---

Description: 
Selecting from a view created in 0.7 fails after upgrading to 0.10 with:
bq. Invalid table alias or column reference
The reason is that running this in v0.7:
bq. create view a as select b from c.d;
stores this SQL in the metastore:
bq. select `c.d`.b from `c`.`d`; 
which fails in v0.10.
Running the same CREATE VIEW statement in v0.10 stores different SQL:
bq. select `d`.b from `c`.`d`; 
which succeeds in v0.10.

A workaround is to recreate the view.



[jira] [Updated] (HIVE-6141) Can't select from views after upgrading metastore from 0.7 to 0.10

2014-01-05 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-6141:
---

Description: 
Selecting from a view created in 0.7 fails after upgrading to 0.10 with:
bq. Invalid table alias or column reference
The reason is that running this in v0.7:
bq. create view a as select b from c.d;
stores this SQL in the metastore:
bq. select `c.d`.b from `c`.`d`; 
which fails in v0.10.
Running the same CREATE VIEW statement in v0.10 stores different SQL in the 
metastore:
bq. select `d`.b from `c`.`d`; 
which succeeds in v0.10.

A workaround is to recreate the view.



[jira] [Updated] (HIVE-6141) Can't select from views after upgrading metastore from 0.7 to 0.10

2014-01-05 Thread dima machlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dima machlin updated HIVE-6141:
---

Description: 
Selecting from a view created in 0.7 fails after upgrading to 0.10 with:
bq. Invalid table alias or column reference
The reason is that running this in v0.7:
bq. create view a as select b from c.d;
stores this SQL in the metastore:
bq. select `c.d`.b from `c`.`d`; 
which fails in v0.10.
Running the same CREATE VIEW statement in v0.10 stores different SQL in the 
metastore:
bq. select `d`.b from `c`.`d`; 
which succeeds in v0.10 (notice there is no DB prefix in the select part).

*A workaround is to recreate the view.*



[jira] [Created] (HIVE-5964) Hive missing a filter predicate causing wrong results joining tables after sort by

2013-12-05 Thread dima machlin (JIRA)
dima machlin created HIVE-5964:
--

 Summary: Hive missing a filter predicate causing wrong results 
joining tables after sort by
 Key: HIVE-5964
 URL: https://issues.apache.org/jira/browse/HIVE-5964
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: dima machlin


Predicate pushdown (PPD) seems to fail under certain conditions and produce 
wrong results: a filter predicate appears to be completely disregarded by the 
query processor.

Here is the scenario (assuming a dual table exists):

set hive.optimize.ppd=true;
drop table if exists test_tbl ;
create table test_tbl (id string,name string);

insert into table test_tbl
select 'a','b' from dual;

test_tbl now contains :
a b

the following query :
select t2.* 
from
(select id,name from (select id,name from test_tbl) t1 sort by id) t2
 join test_tbl t3 on (t2.id=t3.id )
where t2.name='c' and t3.id='a';

returns :
a b

The filter t2.name='c' is missing from the execution plan and therefore is 
never applied.
The filter t3.id='a' does appear in the plan and is applied before the join.

If the query is changed slightly (removing the sort by, removing the t1 
sub-query, or disabling hive.optimize.ppd), the predicate appears in the plan.

I can reproduce the problem in both Hive 0.10 and Hive 0.11, although it 
seems to work fine in Hive 0.7.
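
To make the expected result explicit, the query can be evaluated by hand over the single row in test_tbl. The following is a small Python model of the query's semantics only, not of Hive's planner; the names `TEST_TBL`, `run_query` and `apply_name_filter` are hypothetical. With both filters applied the result is empty, and dropping the t2.name='c' filter yields the row that Hive returns.

```python
# Contents of test_tbl as given in the report: one row, id='a', name='b'.
TEST_TBL = [{"id": "a", "name": "b"}]


def run_query(rows, apply_name_filter=True):
    """Evaluate the reported query over in-memory rows.

    apply_name_filter=False mimics a plan in which t2.name='c' was dropped.
    """
    # t1 projects id and name; t2 sorts t1 by id, which never adds or removes rows.
    t2 = sorted(({"id": r["id"], "name": r["name"]} for r in rows), key=lambda r: r["id"])
    t3 = rows
    result = []
    for left in t2:
        for right in t3:
            if left["id"] != right["id"]:  # join condition: t2.id = t3.id
                continue
            if right["id"] != "a":  # where t3.id = 'a'
                continue
            if apply_name_filter and left["name"] != "c":  # where t2.name = 'c'
                continue
            result.append((left["id"], left["name"]))  # select t2.*
    return result


if __name__ == "__main__":
    print(run_query(TEST_TBL))                           # both filters applied
    print(run_query(TEST_TBL, apply_name_filter=False))  # name filter dropped
```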




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5607) Hive fails to parse the % (mod) sign after brackets.

2013-10-22 Thread dima machlin (JIRA)
dima machlin created HIVE-5607:
--

 Summary: Hive fails to parse the % (mod) sign after brackets.
 Key: HIVE-5607
 URL: https://issues.apache.org/jira/browse/HIVE-5607
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: dima machlin
Priority: Minor


The scenario:
create table t(a int);
select * from t order by (a)%7;

fails with the following exception:
FAILED: ParseException line 1:28 mismatched input '%' expecting EOF near ')'

Note that this *does* work in 0.7.1 and does not work in 0.10.


