[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-12-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231935#comment-14231935
 ] 

Prasanth J commented on HIVE-:
--

Committed this patch to trunk. lvj_mapjoin.q ran successfully locally for me 
too. [~brocknoland] Can you re-enable the lvj_mapjoin.q test to see if it runs 
successfully now?

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.1

 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.4.patch, HIVE-.5.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8990) mapjoin_mapjoin.q is failing on Tez (missed golden file update)

2014-12-02 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8990:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 mapjoin_mapjoin.q is failing on Tez (missed golden file update)
 ---

 Key: HIVE-8990
 URL: https://issues.apache.org/jira/browse/HIVE-8990
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.15.0

 Attachments: HIVE-8990.1.patch


 mapjoin_mapjoin.q was updated (SORT_BEFORE_DIFF). However, since the Tez tests 
 were stuck, the accompanying update to the golden file was missed.





[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-12-01 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230905#comment-14230905
 ] 

Prasanth J commented on HIVE-:
--

Last patch looks good to me. +1

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.1

 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.4.patch, HIVE-.5.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226865#comment-14226865
 ] 

Prasanth J commented on HIVE-8975:
--

[~jcamachorodriguez] I see what the issue is here. That check (RS after GBY) 
was used to determine the map-reduce boundary. The map-side GBY has different stats 
logic compared to the reduce-side GBY. 
Now, after the identity projection removal optimization, in the plan
{code}
TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
{code}

both GBY[2] and GBY[4] are identified as map-side GBYs. I think we need to 
improve that if condition to better differentiate map-side and reduce-side GBYs. 
A somewhat better check would be: if an RS is among the upstream operators of a 
GBY, then that GBY is reduce-side. In the above case, GBY[4] has RS[3] among its 
upstream operators. Any thoughts?
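
A minimal sketch of the proposed check, for illustration only. `Op` and `GbySide` are hypothetical stand-ins, not Hive's operator classes or the code in StatsRulesProcFactory; the sketch simply walks the parents of a GBY and reports it as reduce-side if any ancestor is an RS.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an operator node; not Hive's Operator class.
final class Op {
  final String name;                          // e.g. "GBY[4]"
  final List<Op> parents = new ArrayList<>(); // upstream operators

  Op(String name, Op... parents) {
    this.name = name;
    for (Op p : parents) {
      this.parents.add(p);
    }
  }

  // Assumption for this sketch: an RS is recognized by its name prefix.
  boolean isReduceSink() {
    return name.startsWith("RS[");
  }
}

final class GbySide {
  // Returns true if any upstream operator (ancestor) of 'gby' is an RS.
  static boolean isReduceSide(Op gby) {
    Deque<Op> stack = new ArrayDeque<>(gby.parents);
    Set<Op> seen = new HashSet<>();
    while (!stack.isEmpty()) {
      Op op = stack.pop();
      if (!seen.add(op)) {
        continue; // this ancestor was already examined
      }
      if (op.isReduceSink()) {
        return true;
      }
      stack.addAll(op.parents);
    }
    return false;
  }
}
```

For the chain TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4], this check reports GBY[2] as map-side and GBY[4] as reduce-side, independent of whether a SEL sits between a GBY and the following RS.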

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity projection removal optimization in HIVE-8435, 
 the plan in bucket_map_join_tez2.q that runs on Tez became sub-optimal. In 
 particular, it used to do a map-join, and after HIVE-8435 it changed to 
 a reduce-join.
 The query is the following:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  which checks whether the GBY is immediately followed by an RS operator 
 and calculates stats differently depending on the result.





[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226898#comment-14226898
 ] 

Prasanth J commented on HIVE-7896:
--

LGTM, +1

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.





[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227005#comment-14227005
 ] 

Prasanth J commented on HIVE-8975:
--

[~ashutoshc] What are all the possible modes for map-side and reduce-side? 
Stats calculation also has some logic for hash-aggregation enabled vs disabled. 
Is it safe to assume that if mode is HASH/PARTIAL it is map-side? And if the 
mode is FULL then reduce-side?
If so, I can change the logic accordingly without depending on the child/parent 
checks in the operator tree. 

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity projection removal optimization in HIVE-8435, 
 the plan in bucket_map_join_tez2.q that runs on Tez became sub-optimal. In 
 particular, it used to do a map-join, and after HIVE-8435 it changed to 
 a reduce-join.
 The query is the following:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  which checks whether the GBY is immediately followed by an RS operator 
 and calculates stats differently depending on the result.





[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID

2014-11-25 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225436#comment-14225436
 ] 

Prasanth J commented on HIVE-8875:
--

LGTM, +1

 hive.optimize.sort.dynamic.partition should be turned off for ACID
 --

 Key: HIVE-8875
 URL: https://issues.apache.org/jira/browse/HIVE-8875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8875.2.patch, HIVE-8875.patch


 Turning this on causes ACID insert, updates, and deletes to produce 
 non-optimal plans with extra reduce phases.





[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218549#comment-14218549
 ] 

Prasanth J commented on HIVE-:
--

[~hagleitn] I don't think the test failure is related either. The code changes 
should not affect TestCliDriver tests. I ran the test locally and it ran 
successfully.

Also can we have this for 0.14.1?


 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.4.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218809#comment-14218809
 ] 

Prasanth J commented on HIVE-:
--

Committed to trunk

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.15.0

 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.4.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

[~hagleitn]/[~ashutoshc] Should this go into 0.14.1?

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.15.0

 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.4.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-17 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-:
-
Attachment: HIVE-.3.patch

This patch bails out when the operator tree is visited again from the same root.
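
The patch itself is not reproduced in this message. As a rough illustration of the idea only, the sketch below uses a hypothetical `Node` type and a hypothetical `RootGuardedWalker` that remembers which roots it has already started from and bails out on a repeat. None of these names come from Hive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical operator-tree node; not a Hive class.
final class Node {
  final String name;
  final List<Node> children = new ArrayList<>();

  Node(String name) {
    this.name = name;
  }
}

// Hypothetical walker that refuses to walk the same root twice.
final class RootGuardedWalker {
  // Identity-based set: two distinct roots with equal names are still distinct.
  private final Set<Node> visitedRoots =
      Collections.newSetFromMap(new IdentityHashMap<>());
  final List<String> processed = new ArrayList<>();

  // Returns false, without processing anything, if this root was walked before.
  boolean walkFrom(Node root) {
    if (!visitedRoots.add(root)) {
      return false; // bail out: tree already visited from this root
    }
    visit(root);
    return true;
  }

  private void visit(Node n) {
    processed.add(n.name);
    for (Node child : n.children) {
      visit(child);
    }
  }
}
```

With a guard of this kind, a second walk from the same root leaves `processed` unchanged, so each operator under that root is handled once.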

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-:
-
Attachment: HIVE-.2.patch

Wrong if condition in previous patch.

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch, HIVE-.2.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Created] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-14 Thread Prasanth J (JIRA)
Prasanth J created HIVE-:


 Summary: Mapjoin with LateralViewJoin generates wrong plan in Tez
 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J


Queries like these 
{code}
with sub1 as
(select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
sub2 as
(select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
select sub1.aid, sub1.avalue, sub2.bvalue
from sub1,sub2
where sub1.aid=sub2.bid;
{code}

generate twice the number of rows in Tez when compared to MR.





[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-14 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-:
-
Attachment: HIVE-.1.patch

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-14 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-:
-
Status: Patch Available  (was: Open)

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez

2014-11-14 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213371#comment-14213371
 ] 

Prasanth J commented on HIVE-:
--

[~hagleitn] Can you take a look at the fix? https://reviews.apache.org/r/28086/

 Mapjoin with LateralViewJoin generates wrong plan in Tez
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-.1.patch


 Queries like these 
 {code}
 with sub1 as
 (select aid, avalue from expod1 lateral view explode(av) avs as avalue ),
 sub2 as
 (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue)
 select sub1.aid, sub1.avalue, sub2.bvalue
 from sub1,sub2
 where sub1.aid=sub2.bid;
 {code}
 generate twice the number of rows in Tez when compared to MR.





[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-11-13 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211545#comment-14211545
 ] 

Prasanth J commented on HIVE-8137:
--

[~pankit] I am also concerned about the changes with CombineHiveInputFormat. 
CombineHiveInputFormat already sets a PathFilter (CombineFilter) which filters 
out files from the paths. If I understand correctly adding another path filter 
(for filtering out empty files) to combine.createPool() should do the job.
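
A minimal sketch of such an empty-file filter, for illustration only. To stay self-contained it uses the JDK's `java.nio.file.DirectoryStream.Filter` as a stand-in for Hadoop's `PathFilter`; the class name `NonEmptyFileFilter` and the helper `listNonEmpty` are hypothetical, and nothing here is wired into CombineHiveInputFormat.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: accepts only regular files with at least one byte.
final class NonEmptyFileFilter implements DirectoryStream.Filter<Path> {
  @Override
  public boolean accept(Path entry) throws IOException {
    return Files.isRegularFile(entry) && Files.size(entry) > 0;
  }

  // Lists the entries of 'dir' that pass the filter.
  static List<Path> listNonEmpty(Path dir) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream =
        Files.newDirectoryStream(dir, new NonEmptyFileFilter())) {
      for (Path p : stream) {
        result.add(p);
      }
    }
    return result;
  }
}
```

A Hadoop version would presumably need the file length from the FileSystem rather than from `java.nio.file.Files`; that part is not shown here.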

 Empty ORC file handling
 ---

 Key: HIVE-8137
 URL: https://issues.apache.org/jira/browse/HIVE-8137
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8137.2.patch, HIVE-8137.patch


 Hive 0.13 does not handle reading a zero-size ORC file properly. An ORC file 
 is supposed to have a PostScript, which the ReaderImpl class tries to read 
 and use to initialize the footer. 
 But if the file is empty (zero size), ReaderImpl runs into an 
 IndexOutOfBoundsException because it tries to read the PostScript in its 
 constructor.
 Code snippet: 
 //get length of PostScript
 int psLen = buffer.get(readSize - 1) & 0xff; 
 In the above code, readSize for an empty file is zero.
 I see that the ensureOrcFooter() method performs some sanity checks for the 
 footer, so either we can move the above code snippet to ensureOrcFooter() and 
 throw a "Malformed ORC file" exception, or we can create a dummy Reader that 
 does not initialize the footer and whose hasNext() returns false on the first 
 call.
 Basically, I would like to know what the correct way is to handle an 
 empty ORC file in a mapred job.
 Should we ignore it and not throw an exception, or should we throw an exception 
 saying that the ORC file is malformed?
 Please let me know your thoughts on this.
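
 As a sketch of the kind of guard either option needs, the snippet below checks readSize before touching the buffer. `PostScriptLength` is a hypothetical helper, not part of ReaderImpl, and returning -1 for an empty file is an assumption made for illustration; a caller could equally translate that case into a "Malformed ORC file" exception or an empty reader.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not the ORC reader's actual code.
final class PostScriptLength {
  // Returns the PostScript length stored in the last byte that was read,
  // or -1 when nothing was read (empty file).
  static int read(ByteBuffer buffer, int readSize) {
    if (readSize <= 0) {
      return -1; // empty file: there is no last byte to inspect
    }
    // Mask with 0xff so the byte is interpreted as an unsigned value.
    return buffer.get(readSize - 1) & 0xff;
  }
}
```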





[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms

2014-11-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8801:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Test failures are unrelated. Committed to trunk.

 Make orc_merge_incompat1.q deterministic across platforms
 -

 Key: HIVE-8801
 URL: https://issues.apache.org/jira/browse/HIVE-8801
 Project: Hive
  Issue Type: Test
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.15.0

 Attachments: HIVE-8801.1.patch, HIVE-8801.2.patch


 orc_merge_incompat1.q tests for ORC fast file merge when there are 
 incompatible files in a partition. The outcome of merge will be dependent on 
 order of the files that CombineHiveInputFormat passes on to 
 OrcFileMergeOperator. Since the ordering of files is not guaranteed the 
 result of merge operation will be different across different OSes.





[jira] [Created] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-11-10 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8809:


 Summary: Activate maven profile hadoop-2 by default
 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor


For every Maven command, the profile needs to be specified explicitly. It would 
be better to activate the hadoop-2 profile by default, as Hive QA uses the 
hadoop-2 profile. With this change, the following two commands will be equivalent:
{code}
mvn clean install -DskipTests
mvn clean install -DskipTests -Phadoop-2
{code}





[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-11-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8809:
-
Attachment: HIVE-8809.1.patch

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
 Attachments: HIVE-8809.1.patch


 For every Maven command, the profile needs to be specified explicitly. It would 
 be better to activate the hadoop-2 profile by default, as Hive QA uses the 
 hadoop-2 profile. With this change, the following two commands will be equivalent:
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}





[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-11-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8809:
-
Status: Patch Available  (was: Open)

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
 Attachments: HIVE-8809.1.patch


 For every Maven command, the profile needs to be specified explicitly. It would 
 be better to activate the hadoop-2 profile by default, as Hive QA uses the 
 hadoop-2 profile. With this change, the following two commands will be equivalent:
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}





[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default

2014-11-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8809:
-
Attachment: dep_itests_without_hadoop_2.txt
dep_itests_with_hadoop_2.txt
dep_without_hadoop_2.txt
dep_with_hadoop_2.txt

Attaching the output of mvn dependency:tree with and without specifying -Phadoop-2 
explicitly. The dependency tree looks exactly the same. One thing I am not sure 
about is why the hive shims common dependency tree is showing hadoop-core. The 
following dependency is not within a profile in hive/shims/common/pom.xml
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop-20.version}</version>
  <optional>true</optional>
</dependency>
{code}

[~brocknoland] Any idea why?
Also, how can I check that the issue mentioned in HIVE-5755 does not happen? At 
least from the dependency tree it doesn't seem to happen.

 Activate maven profile hadoop-2 by default
 --

 Key: HIVE-8809
 URL: https://issues.apache.org/jira/browse/HIVE-8809
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
 Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, 
 dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, 
 dep_without_hadoop_2.txt


 For every Maven command, the profile needs to be specified explicitly. It would 
 be better to activate the hadoop-2 profile by default, as Hive QA uses the 
 hadoop-2 profile. With this change, the following two commands will be equivalent:
 {code}
 mvn clean install -DskipTests
 mvn clean install -DskipTests -Phadoop-2
 {code}





[jira] [Updated] (HIVE-8736) add ordering to cbo_correctness to make result consistent

2014-11-08 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8736:
-
Fix Version/s: 0.15.0

 add ordering to cbo_correctness to make result consistent
 -

 Key: HIVE-8736
 URL: https://issues.apache.org/jira/browse/HIVE-8736
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8736.patch








[jira] [Commented] (HIVE-8799) boatload of missing apache headers

2014-11-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203726#comment-14203726
 ] 

Prasanth J commented on HIVE-8799:
--

The change in pom.xml, <exclude>**/sit</exclude>: did you mean the **/site 
directory?

 boatload of missing apache headers
 --

 Key: HIVE-8799
 URL: https://issues.apache.org/jira/browse/HIVE-8799
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-8799.1.patch


 Adding missing apache headers to a number of files.





[jira] [Commented] (HIVE-8799) boatload of missing apache headers

2014-11-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203729#comment-14203729
 ] 

Prasanth J commented on HIVE-8799:
--

ha ha :) completely self-contained name.

 boatload of missing apache headers
 --

 Key: HIVE-8799
 URL: https://issues.apache.org/jira/browse/HIVE-8799
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-8799.1.patch


 Adding missing apache headers to a number of files.





[jira] [Commented] (HIVE-8800) Update release notes and notice for hive .14

2014-11-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203744#comment-14203744
 ] 

Prasanth J commented on HIVE-8800:
--

+1

 Update release notes and notice for hive .14
 

 Key: HIVE-8800
 URL: https://issues.apache.org/jira/browse/HIVE-8800
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-8800.1.patch








[jira] [Created] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms

2014-11-08 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8801:


 Summary: Make orc_merge_incompat1.q deterministic across platforms
 Key: HIVE-8801
 URL: https://issues.apache.org/jira/browse/HIVE-8801
 Project: Hive
  Issue Type: Test
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J


orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible 
files in a partition. The outcome of merge will be dependent on order of the 
files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the 
ordering of files is not guaranteed the result of merge operation will be 
different across different OSes.





[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms

2014-11-08 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8801:
-
Status: Patch Available  (was: Open)

 Make orc_merge_incompat1.q deterministic across platforms
 -

 Key: HIVE-8801
 URL: https://issues.apache.org/jira/browse/HIVE-8801
 Project: Hive
  Issue Type: Test
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8801.1.patch


 orc_merge_incompat1.q tests for ORC fast file merge when there are 
 incompatible files in a partition. The outcome of merge will be dependent on 
 order of the files that CombineHiveInputFormat passes on to 
 OrcFileMergeOperator. Since the ordering of files is not guaranteed the 
 result of merge operation will be different across different OSes.





[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms

2014-11-08 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8801:
-
Attachment: HIVE-8801.1.patch

Added one more file to the partition. There are now 3 files written with 
version 0.11 and 3 files written with version 0.12, so the merge produces 4 
files regardless of which input file is chosen first.
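
The arithmetic can be checked with a small simulation. The sketch below
assumes a simplified model that is not taken from the Hive source: the first
file the operator receives fixes the reference version, all files with that
version are merged into one output file, and every incompatible file is moved
to the output unchanged. The function names and the unbalanced 3 + 2 split
are hypothetical and only illustrate the order dependence.

```python
from itertools import permutations


def merged_file_count(versions):
    """Count output files under the simplified model described above.

    versions: writer versions of the input files, in the order the
    (hypothetical) merge operator receives them.
    """
    if not versions:
        return 0
    reference = versions[0]  # the first file decides what counts as compatible
    incompatible = sum(1 for v in versions if v != reference)
    return 1 + incompatible  # one merged file plus the untouched incompatible files


def possible_outcomes(versions):
    """Set of output file counts over every possible input ordering."""
    return {merged_file_count(order) for order in permutations(versions)}


# Unbalanced split: the count depends on which version is seen first.
unbalanced = possible_outcomes(["0.11"] * 3 + ["0.12"] * 2)

# Balanced split (3 and 3): one merged file plus three incompatible files.
balanced = possible_outcomes(["0.11"] * 3 + ["0.12"] * 3)

print(unbalanced, balanced)
```

Under this model a balanced partition gives a single possible outcome, which
is why adding one file is enough to make the golden file stable.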

 Make orc_merge_incompat1.q deterministic across platforms
 -

 Key: HIVE-8801
 URL: https://issues.apache.org/jira/browse/HIVE-8801
 Project: Hive
  Issue Type: Test
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8801.1.patch


 orc_merge_incompat1.q tests ORC fast file merge when there are 
 incompatible files in a partition. The outcome of the merge depends on the 
 order of the files that CombineHiveInputFormat passes to 
 OrcFileMergeOperator. Since the ordering of files is not guaranteed, the 
 result of the merge operation can differ across operating systems.





[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms

2014-11-08 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8801:
-
Attachment: HIVE-8801.2.patch

The previous patch missed the diff for the Tez test.

 Make orc_merge_incompat1.q deterministic across platforms
 -

 Key: HIVE-8801
 URL: https://issues.apache.org/jira/browse/HIVE-8801
 Project: Hive
  Issue Type: Test
Affects Versions: 0.15.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8801.1.patch, HIVE-8801.2.patch


 orc_merge_incompat1.q tests ORC fast file merge when there are 
 incompatible files in a partition. The outcome of the merge depends on the 
 order of the files that CombineHiveInputFormat passes to 
 OrcFileMergeOperator. Since the ordering of files is not guaranteed, the 
 result of the merge operation can differ across operating systems.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202230#comment-14202230
 ] 

Prasanth J commented on HIVE-8732:
--

I have verified the file version in the file dump with old ORC formats.

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly, causing incorrect 
 maximum values.
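
For reference, merging two string ranges is expected to keep the smaller
minimum and the larger maximum. The sketch below is not ORC's code, and the
issue does not say where the defect was. `merge_string_stats` is a
hypothetical name, `None` stands for "no values seen", and Python's
code-point ordering is assumed in place of whatever ordering ORC uses.

```python
def merge_string_stats(a, b):
    """Merge two (minimum, maximum) string ranges.

    Either argument may be None, meaning that side saw no values.
    """
    if a is None:
        return b
    if b is None:
        return a
    # Keep the smaller minimum and the larger maximum.
    return (min(a[0], b[0]), max(a[1], b[1]))


stripe_1 = ("apple", "kiwi")
stripe_2 = ("banana", "zebra")
file_stats = merge_string_stats(stripe_1, stripe_2)
print(file_stats)
```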





[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for sub-second queries

2014-11-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8779:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14.

 Tez in-place progress UI can show wrong estimated time for sub-second queries
 -

 Key: HIVE-8779
 URL: https://issues.apache.org/jira/browse/HIVE-8779
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Trivial
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8779.1.patch


 The in-place progress update UI added as part of HIVE-8495 can show a wrong 
 estimated time for an AM-only job whose DAG goes directly from the INITED 
 state to SUCCEEDED without passing through the RUNNING state.





[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8778:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14.

 ORC split elimination can cause NPE when column statistics is null
 --

 Key: HIVE-8778
 URL: https://issues.apache.org/jira/browse/HIVE-8778
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8778.1.patch


 Row group elimination guards against NULL statistics values in 
 RecordReaderImpl.evaluatePredicate(), which then calls 
 evaluatePredicateRange(). Split elimination, however, calls 
 evaluatePredicateRange() directly, without the NULL guard. This can lead to a 
 NullPointerException when a column is NULL in an entire stripe.
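
The shape of the problem can be sketched as a guarded and an unguarded entry
point into the same range check. This is an illustration in Python, not the
Hive code: both function names are hypothetical, a `TypeError` on `None`
stands in for the NullPointerException, and returning `True` ("cannot
eliminate") for missing statistics is an assumed conservative choice.

```python
def evaluate_range(predicate_value, min_value, max_value):
    """Return True if [min_value, max_value] may contain predicate_value.

    Fails when the statistics are missing, because None cannot be compared.
    """
    return min_value <= predicate_value <= max_value


def evaluate_with_null_guard(predicate_value, min_value, max_value):
    """Guarded entry point: missing statistics mean the data must be read."""
    if min_value is None or max_value is None:
        return True  # nothing is known about the column, so do not eliminate
    return evaluate_range(predicate_value, min_value, max_value)


# A column that is NULL in an entire stripe has no min/max statistics.
print(evaluate_with_null_guard(5, None, None))
print(evaluate_with_null_guard(5, 1, 10))
```

Routing every caller through the guarded function is one way to ensure that
row group elimination and split elimination behave the same on such columns.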





[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200612#comment-14200612
 ] 

Prasanth J commented on HIVE-8753:
--

+1

 TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
 -

 Key: HIVE-8753
 URL: https://issues.apache.org/jira/browse/HIVE-8753
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer
Affects Versions: 0.15.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8753.patch


 Because of HIVE-7111, the test needs a .q.out update.





[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200683#comment-14200683
 ] 

Prasanth J commented on HIVE-8744:
--

HIVE-8735 addresses the same problem. Usually the client that publishes the 
key (FSOperator, StatsTask) has some logic to trim the key down using an MD5 
hash. If the key is longer than the max stats key prefix (from the Hive 
config), the Utilities.getHashedPrefixKey() method is invoked to get a shorter 
key. Can you try the patch from HIVE-8735 to see if the test case works? 
HIVE-8735 truncates the key before publishing.
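
The idea of trimming an over-long key with a hash can be sketched as follows.
This is not the implementation of Utilities.getHashedPrefixKey(): the name
`shorten_stats_key` and the exact policy are assumptions, and the 255 limit
only mirrors the VARCHAR length in the quoted Derby error.

```python
import hashlib

MAX_KEY_LENGTH = 255  # assumed limit, taken from the Derby error message


def shorten_stats_key(key, max_len=MAX_KEY_LENGTH):
    """Return key unchanged if it fits, otherwise its MD5 hex digest.

    The digest is always 32 characters, so max_len is assumed to be >= 32.
    """
    if len(key) <= max_len:
        return key
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Hypothetical path, made long enough to exceed the limit.
long_key = "pfile:/tmp/" + "very-long-directory-name/" * 20
short_key = shorten_stats_key(long_key)
print(len(long_key), len(short_key))
```

Because the digest is deterministic, the publisher and the aggregator derive
the same shortened key as long as both apply the same rule.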

 hbase_stats3.q test fails when paths stored at 
 JDBCStatsUtils.getIdColumnName() are too large
 -

 Key: HIVE-8744
 URL: https://issues.apache.org/jira/browse/HIVE-8744
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch


 This test is related to the bug HIVE-8065, where I am trying to support HDFS 
 encryption. One of the enhancements needed to support it is to create a 
 .hive-staging directory in the same table directory location where the query 
 is executed.
 Now, when the hbase_stats3.q test runs from a temporary directory that has 
 a long path, the new path (a combination of table location + 
 .hive-staging + random temporary subdirectories) is too long to fit into the 
 statistics table, so the path is truncated.
 This causes the following error:
 {noformat}
 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
 jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
 during publishing statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
 Source)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.sql.SQLException: A truncation error was encountered trying 
 to shrink VARCHAR 
 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   

[jira] [Commented] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201047#comment-14201047
 ] 

Prasanth J commented on HIVE-8556:
--

+1

 introduce overflow control and sanity check to BytesBytesMapJoin
 

 Key: HIVE-8556
 URL: https://issues.apache.org/jira/browse/HIVE-8556
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-8556.patch


 When stats are incorrect, a negative or very large number can be passed to the 
 map





[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8744:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Thanks [~spena] for confirming! I will close this issue as a duplicate of 
HIVE-8735.

 hbase_stats3.q test fails when paths stored at 
 JDBCStatsUtils.getIdColumnName() are too large
 -

 Key: HIVE-8744
 URL: https://issues.apache.org/jira/browse/HIVE-8744
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch


 This test is related to the bug HIVE-8065, where I am trying to support HDFS 
 encryption. One of the enhancements needed to support it is to create a 
 .hive-staging directory in the same table directory location where the query 
 is executed.
 Now, when the hbase_stats3.q test runs from a temporary directory that has 
 a long path, the new path (a combination of table location + 
 .hive-staging + random temporary subdirectories) is too long to fit into the 
 statistics table, so the path is truncated.
 This causes the following error:
 {noformat}
 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
 jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
 during publishing statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
 Source)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.sql.SQLException: A truncation error was encountered trying 
 to shrink VARCHAR 
 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
   ... 30 more
 Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
 VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to 
 length 255.
   at 

[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201070#comment-14201070
 ] 

Prasanth J commented on HIVE-8735:
--

[~hagleitn] Can we have this for 0.14? This fixes test failures related to 
stats publishing. Same issue in HIVE-8744 as well.

 statistics update can fail due to long paths
 

 Key: HIVE-8735
 URL: https://issues.apache.org/jira/browse/HIVE-8735
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch


 {noformat}
 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
 (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
 statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
 Source)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: java.sql.SQLException: A truncation error was encountered trying 
 to shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
   ... 31 more
 Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
 VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to 
 length 255.
   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
 Source)
   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown 
 Source)
   at 
 

[jira] [Created] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-06 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8771:


 Summary: Abstract merge file operator does not move/rename 
incompatible files correctly
 Key: HIVE-8771
 URL: https://issues.apache.org/jira/browse/HIVE-8771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0


AbstractFileMergeOperator moves incompatible files (files that cannot be 
merged) to the final destination. The destination path must be a directory 
instead of a file. Passing a file path causes orc_merge_incompat2.q to fail 
under CentOS with an IOException when renaming/moving files.
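
The requirement that the destination be a directory can be illustrated with a
short local-filesystem sketch. It uses Python's standard library instead of
the Hadoop FileSystem API, and `move_into_directory` is a hypothetical helper,
not part of Hive.

```python
import shutil
import tempfile
from pathlib import Path


def move_into_directory(src, dest_dir):
    """Move src into dest_dir, keeping its file name.

    Raises NotADirectoryError if dest_dir exists but is a regular file.
    """
    src, dest_dir = Path(src), Path(dest_dir)
    if dest_dir.exists() and not dest_dir.is_dir():
        raise NotADirectoryError(f"destination is not a directory: {dest_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    target = dest_dir / src.name
    shutil.move(str(src), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    incompatible = root / "part-0"  # hypothetical incompatible file
    incompatible.write_text("data")
    moved = move_into_directory(incompatible, root / "final")
    print(moved.parent.name, moved.name)
```

Checking the destination type up front turns a platform-dependent rename
failure into an explicit error at the call site.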





[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8771:
-
Status: Patch Available  (was: Open)

 Abstract merge file operator does not move/rename incompatible files correctly
 --

 Key: HIVE-8771
 URL: https://issues.apache.org/jira/browse/HIVE-8771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8771.1.patch


 AbstractFileMergeOperator moves incompatible files (files that cannot be 
 merged) to the final destination. The destination path must be a directory 
 instead of a file. Passing a file path causes orc_merge_incompat2.q to fail 
 under CentOS with an IOException when renaming/moving files.





[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8771:
-
Attachment: HIVE-8771.1.patch

 Abstract merge file operator does not move/rename incompatible files correctly
 --

 Key: HIVE-8771
 URL: https://issues.apache.org/jira/browse/HIVE-8771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8771.1.patch


 AbstractFileMergeOperator moves incompatible files (files that cannot be 
 merged) to the final destination. The destination path must be a directory 
 instead of a file. Passing a file path causes orc_merge_incompat2.q to fail 
 under CentOS with an IOException when renaming/moving files.





[jira] [Updated] (HIVE-8735) statistics update can fail due to long paths

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8735:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14.

 statistics update can fail due to long paths
 

 Key: HIVE-8735
 URL: https://issues.apache.org/jira/browse/HIVE-8735
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch


 {noformat}
 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
 (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
 statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
 Source)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: java.sql.SQLException: A truncation error was encountered trying 
 to shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
   ... 31 more
 Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
 VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to 
 length 255.
   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
 Source)
   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown 
 Source)
   at 
 

[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8771:
-
Description: 
AbstractFileMergeOperator moves incompatible files (files that cannot be 
merged) to the final destination. The destination path must be a directory 
instead of a file. Passing a file path causes orc_merge_incompat2.q to fail 
under CentOS with an IOException when renaming/moving files.
Stack trace:
{code}
2014-11-05 02:38:56,588 DEBUG fs.FileSystem 
(RawLocalFileSystem.java:rename(337)) - Falling through to a copy of 
file:/home/prasanth/hive/itests/qtest/target/warehouse/orc_merge5a/st=80.0/00_0
 to 
file:/home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0/00_0
2014-11-05 02:38:56,589 INFO  mapred.LocalJobRunner 
(LocalJobRunner.java:runTasks(456)) - map task executor complete.
2014-11-05 02:38:56,590 WARN  mapred.LocalJobRunner 
(LocalJobRunner.java:run(560)) - job_local1144733438_0036
java.lang.Exception: java.io.IOException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
AbstractFileMergeOperator
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
AbstractFileMergeOperator
at 
org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:100)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
AbstractFileMergeOperator
at 
org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:233)
at 
org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:220)
at 
org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
... 10 more
Caused by: java.io.FileNotFoundException: Destination exists and is not a 
directory: 
/home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:423)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:267)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at 
org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:339)
at 
org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:507)
at 
org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
at org.apache.hadoop.fs.ProxyFileSystem.rename(ProxyFileSystem.java:177)
at 
org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
at 
org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1589)
at 
org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:218)
... 12 more

{code}

  was:AbstractFileMergeOperator moves incompatible files (files which cannot be 
merged) to the final destination. The destination path must be a directory 
instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with 
an IOException when renaming/moving files.
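
A minimal sketch of the "destination must be a directory" requirement 
described above. It uses java.nio.file rather than the Hadoop FileSystem API, 
and the class and method names (MoveIntoDirSketch, moveIntoDirectory) are 
hypothetical, not taken from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; not the code in AbstractFileMergeOperator. */
final class MoveIntoDirSketch {

    /**
     * Moves {@code source} into {@code destDir}, creating the directory if needed.
     * Fails early if the destination path already exists as a regular file.
     */
    static Path moveIntoDirectory(Path source, Path destDir) throws IOException {
        if (Files.exists(destDir) && !Files.isDirectory(destDir)) {
            throw new IOException("Destination exists and is not a directory: " + destDir);
        }
        Files.createDirectories(destDir);
        // Keep the original file name inside the destination directory.
        return Files.move(source, destDir.resolve(source.getFileName()));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("merge-sketch");
        Path incompatible = Files.createFile(tmp.resolve("000000_0"));
        Path moved = moveIntoDirectory(incompatible, tmp.resolve("final"));
        System.out.println("moved to " + moved);
    }
}
```

The explicit check mirrors the failure mode in the stack trace: if an earlier 
step left a plain file at the destination path, the move is rejected instead 
of being attempted.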


 Abstract merge file operator does not move/rename incompatible files correctly
 --

 Key: HIVE-8771
 URL: https://issues.apache.org/jira/browse/HIVE-8771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
  

[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201442#comment-14201442
 ] 

Prasanth J commented on HIVE-8732:
--

The new changes look good to me. +1.
Can you create a follow-up for dealing with NaN in double column statistics?

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly, causing incorrect 
 maximum values.
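
For illustration, a minimal sketch of what merging string min/max statistics 
from two files is expected to produce. StringStatsSketch is a hypothetical 
stand-in, not ORC's actual statistics class, and Java's String.compareTo 
ordering is assumed here for simplicity. The sketch does not reproduce the 
bug or the fix in the attached patches.

```java
/** Hypothetical stand-in for per-file string column statistics. */
final class StringStatsSketch {
    final String min;   // null means no non-null values were seen
    final String max;   // null means no non-null values were seen
    final long count;

    StringStatsSketch(String min, String max, long count) {
        this.min = min;
        this.max = max;
        this.count = count;
    }

    /** Merged statistics must cover the values of both inputs. */
    static StringStatsSketch merge(StringStatsSketch a, StringStatsSketch b) {
        return new StringStatsSketch(
            pick(a.min, b.min, true),
            pick(a.max, b.max, false),
            a.count + b.count);
    }

    /** Returns the smaller (or larger) of two strings, treating null as absent. */
    private static String pick(String x, String y, boolean wantSmaller) {
        if (x == null) return y;
        if (y == null) return x;
        boolean xIsSmaller = x.compareTo(y) <= 0;
        return (xIsSmaller == wantSmaller) ? x : y;
    }

    public static void main(String[] args) {
        StringStatsSketch merged = merge(
            new StringStatsSketch("apple", "kiwi", 3),
            new StringStatsSketch("banana", "zebra", 2));
        System.out.println(merged.min + " .. " + merged.max + " (" + merged.count + ")");
    }
}
```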



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-06 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8778:


 Summary: ORC split elimination can cause NPE when column 
statistics is null
 Key: HIVE-8778
 URL: https://issues.apache.org/jira/browse/HIVE-8778
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0


Row group elimination has protection for NULL statistics values in 
RecordReaderImpl.evaluatePredicate(), which then calls 
evaluatePredicateRange(). But split elimination calls 
evaluatePredicateRange() directly, without NULL protection. This can lead to a 
NullPointerException when a column is NULL in an entire stripe. 
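
The difference between the two call paths can be sketched as follows. This is 
a simplified illustration under stated assumptions: the class, the Result 
enum, and both method signatures are hypothetical and much simpler than the 
real RecordReaderImpl methods, and returning MAYBE for missing statistics is 
a conservative choice made for this sketch, not necessarily what Hive returns.

```java
/** Hypothetical, simplified model of a guarded and an unguarded range check. */
final class NullSafePredicateSketch {
    enum Result { NO, MAYBE }

    /** Range check for "column = literal"; assumes min and max are non-null. */
    static Result evaluateRange(Long min, Long max, long literal) {
        // Unboxing a null min or max here throws NullPointerException.
        if (literal < min || literal > max) {
            return Result.NO;      // literal cannot occur, safe to skip
        }
        return Result.MAYBE;       // literal may occur, must read
    }

    /** Guarded entry point: checks for missing statistics first. */
    static Result evaluate(Long min, Long max, long literal) {
        if (min == null || max == null) {
            return Result.MAYBE;   // no statistics: do not eliminate
        }
        return evaluateRange(min, max, literal);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(1L, 10L, 50L));    // outside the range
        System.out.println(evaluate(null, null, 5L));  // all-NULL column, guarded
    }
}
```

A caller that goes straight to evaluateRange() with null statistics hits the 
NullPointerException, which is the shape of the problem described for split 
elimination.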





[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8778:
-
Attachment: HIVE-8778.1.patch

[~owen.omalley]/[~gopalv] Can someone take a look at this patch?

 ORC split elimination can cause NPE when column statistics is null
 --

 Key: HIVE-8778
 URL: https://issues.apache.org/jira/browse/HIVE-8778
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8778.1.patch


 Row group elimination has protection for NULL statistics values in 
 RecordReaderImpl.evaluatePredicate(), which then calls 
 evaluatePredicateRange(). But split elimination calls 
 evaluatePredicateRange() directly, without NULL protection. This can lead to 
 a NullPointerException when a column is NULL in an entire stripe. 





[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8778:
-
Status: Patch Available  (was: Open)

 ORC split elimination can cause NPE when column statistics is null
 --

 Key: HIVE-8778
 URL: https://issues.apache.org/jira/browse/HIVE-8778
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8778.1.patch


 Row group elimination has protection for NULL statistics values in 
 RecordReaderImpl.evaluatePredicate(), which then calls 
 evaluatePredicateRange(). But split elimination calls 
 evaluatePredicateRange() directly, without NULL protection. This can lead to 
 a NullPointerException when a column is NULL in an entire stripe. 





[jira] [Created] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job

2014-11-06 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8779:


 Summary: Tez in-place progress UI can show wrong estimated time 
for AM only job
 Key: HIVE-8779
 URL: https://issues.apache.org/jira/browse/HIVE-8779
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Trivial


The in-place progress update UI added as part of HIVE-8495 can show the wrong 
estimated time for an AM-only job, which goes from the INITED to the SUCCEEDED 
DAG state directly without passing through the RUNNING state.
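
One plausible way such a problem arises, sketched under assumptions: if the 
start time is only recorded when the DAG is first seen in the RUNNING state, 
a DAG that skips RUNNING never gets a start time. The class below 
(DagElapsedSketch) and its fallback to the submit time are hypothetical and 
are not taken from the Hive monitor code or the attached patch.

```java
/** Hypothetical elapsed-time tracker for a DAG's state transitions. */
final class DagElapsedSketch {
    enum DagState { INITED, RUNNING, SUCCEEDED }

    private final long submitMillis;
    private long startMillis = -1;   // -1: RUNNING was never observed

    DagElapsedSketch(long submitMillis) {
        this.submitMillis = submitMillis;
    }

    /** Records the first time the DAG is observed in the RUNNING state. */
    void onState(DagState state, long nowMillis) {
        if (state == DagState.RUNNING && startMillis < 0) {
            startMillis = nowMillis;
        }
    }

    /** Falls back to the submit time when RUNNING was skipped entirely. */
    long elapsedMillis(long nowMillis) {
        long start = (startMillis >= 0) ? startMillis : submitMillis;
        return Math.max(0L, nowMillis - start);
    }

    public static void main(String[] args) {
        DagElapsedSketch amOnly = new DagElapsedSketch(1_000L);
        amOnly.onState(DagState.INITED, 1_000L);
        amOnly.onState(DagState.SUCCEEDED, 1_200L);   // RUNNING never seen
        System.out.println("elapsed ms: " + amOnly.elapsedMillis(1_200L));
    }
}
```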





[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8779:
-
Attachment: HIVE-8779.1.patch

 Tez in-place progress UI can show wrong estimated time for AM only job
 --

 Key: HIVE-8779
 URL: https://issues.apache.org/jira/browse/HIVE-8779
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Trivial
 Attachments: HIVE-8779.1.patch


 The in-place progress update UI added as part of HIVE-8495 can show the wrong 
 estimated time for an AM-only job, which goes from the INITED to the 
 SUCCEEDED DAG state directly without passing through the RUNNING state.





[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job

2014-11-06 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8779:
-
Status: Patch Available  (was: Open)

 Tez in-place progress UI can show wrong estimated time for AM only job
 --

 Key: HIVE-8779
 URL: https://issues.apache.org/jira/browse/HIVE-8779
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Trivial
 Attachments: HIVE-8779.1.patch


 The in-place progress update UI added as part of HIVE-8495 can show the wrong 
 estimated time for an AM-only job, which goes from the INITED to the 
 SUCCEEDED DAG state directly without passing through the RUNNING state.





[jira] [Commented] (HIVE-8781) Nullsafe joins are busted on Tez

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201711#comment-14201711
 ] 

Prasanth J commented on HIVE-8781:
--

LGTM, +1

 Nullsafe joins are busted on Tez
 

 Key: HIVE-8781
 URL: https://issues.apache.org/jira/browse/HIVE-8781
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8781.1.patch








[jira] [Commented] (HIVE-8781) Nullsafe joins are busted on Tez

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201712#comment-14201712
 ] 

Prasanth J commented on HIVE-8781:
--

Pending tests

 Nullsafe joins are busted on Tez
 

 Key: HIVE-8781
 URL: https://issues.apache.org/jira/browse/HIVE-8781
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8781.1.patch








[jira] [Updated] (HIVE-8716) Partition filters are not pushed down with lateral view

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8716:
-
Attachment: HIVE-8716.2.patch

 Partition filters are not pushed down with lateral view
 ---

 Key: HIVE-8716
 URL: https://issues.apache.org/jira/browse/HIVE-8716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Attachments: HIVE-8716.1.patch, HIVE-8716.2.patch


 Changes in HIVE-8454 revealed that partition filters are not pushed down when 
 a lateral view is present. For more information, see the discussion in 
 HIVE-5718.





[jira] [Updated] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8727:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14

 Dag summary has incorrect row counts and duration per vertex
 

 Key: HIVE-8727
 URL: https://issues.apache.org/jira/browse/HIVE-8727
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8727.1.patch


 During the code review for HIVE-8495 some code was reworked, which broke some 
 of the INPUT/OUTPUT counters and the duration.
 The attached patch fixes that.





[jira] [Commented] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198987#comment-14198987
 ] 

Prasanth J commented on HIVE-8740:
--

Ah, OK! Given Hive's INSERT INTO convention, I thought the new rows would be 
appended to the existing partition rather than replacing it.

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.
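
The indexing problem described above can be sketched as follows. This is an 
illustration under assumptions: SortedDynPartSketch and partitionPositions 
are hypothetical names, and the actual patch may detect the situation 
differently. The idea is that partition columns are assumed to occupy the 
last positions of the parent operator's output, so if folded constants were 
pruned, those positions no longer exist and the optimization is skipped.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical illustration of locating trailing partition columns. */
final class SortedDynPartSketch {

    /**
     * Returns the positions of the partition columns, assumed to be the last
     * {@code partitionColumnCount} output columns, or empty when the output
     * has fewer columns than expected (for example after constants were pruned).
     */
    static Optional<int[]> partitionPositions(
            int outputColumnCount, int dataColumnCount, int partitionColumnCount) {
        if (outputColumnCount < dataColumnCount + partitionColumnCount) {
            return Optional.empty();   // skip the optimization instead of indexing
        }
        int[] positions = new int[partitionColumnCount];
        for (int i = 0; i < partitionColumnCount; i++) {
            positions[i] = outputColumnCount - partitionColumnCount + i;
        }
        return Optional.of(positions);
    }

    public static void main(String[] args) {
        // Three data columns plus two partition columns, nothing pruned.
        System.out.println(Arrays.toString(partitionPositions(5, 3, 2).get()));
        // One partition column was folded to a constant and pruned.
        System.out.println(partitionPositions(4, 3, 2).isPresent());
    }
}
```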





[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths

2014-11-05 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199107#comment-14199107
 ] 

Prasanth J commented on HIVE-8735:
--

Some comments in RB

 statistics update can fail due to long paths
 

 Key: HIVE-8735
 URL: https://issues.apache.org/jira/browse/HIVE-8735
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8735.patch


 {noformat}
 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
 (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
 statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
 Source)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
   at 
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: java.sql.SQLException: A truncation error was encountered trying 
 to shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
   ... 31 more
 Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
 VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to 
 length 255.
   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
 Source)
   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown 
 Source)
   at 
 org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(Unknown 
 Source)
   at 

[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Attachment: HIVE-8740.2.patch

Thanks for the clarification [~alangates]. That was my mistake. I should have 
added where value = 'bar' to the predicate to get the result that I was 
expecting. Updated the queries in this new patch.

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.





[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Description: Sorted dynamic partition optimization looks for partition 
columns from the operator above FileSinkOperator. As per hive convention it 
expects partition columns at the last. But with HIVE-8585 equality filters on 
partition columns gets folded to constant. The column pruner then prunes the 
constant expression as they don't reference any columns. This in some cases 
will yield unexpected results (throw ArrayIndexOutOfBounds exception) with 
sorted dynamic partition insert optimization. In such cases we don't really 
need sorted dynamic partition optimization.  (was: Sorted dynamic partition 
optimization looks for partition columns from the operator above 
FileSinkOperator. As per hive convention it expects partition columns at the 
last. But with HIVE-8585 equality filters on partition columns gets folded to 
constant. The column pruner then prunes the constant expression as they don't 
reference any columns. This in some cases will yield unexpected results (throw 
ArrayIndexOutOfBounds exception) with sorted dynamic partition insert 
optimization. In such we don't really need sorted dynamic partition 
optimization.)

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.





[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Attachment: HIVE-8740.3.patch

Added more tests to cover cases where sorted dynamic partition is enabled and 
constant propagation is disabled to make sure the generated plan and results 
are correct.

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.





[jira] [Resolved] (HIVE-8747) Estimate number of rows for table with 0 rows overflows resulting in an in-efficient plan

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J resolved HIVE-8747.
--
Resolution: Cannot Reproduce

Can't reproduce the issue. Please reopen it if the case is reproducible.

 Estimate number of rows for table with 0 rows overflows resulting in an 
 in-efficient plan 
 --

 Key: HIVE-8747
 URL: https://issues.apache.org/jira/browse/HIVE-8747
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0


 ship_mode table has 0 rows.
 Query 
 {code}
 select count(*) 
 from
   web_sales
  ,date_dim
 ,ship_mode
  where
  web_sales.ws_sold_date_sk = date_dim.d_date_sk
   and web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk
 and d_year = 2002
   and sm_carrier in ('DIAMOND','AIRBORNE')
 {code}
 Explain 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 - Map 4 (BROADCAST_EDGE)
 Map 4 - Map 3 (BROADCAST_EDGE)
 Reducer 2 - Map 1 (SIMPLE_EDGE)
   DagName: mmokhtar_20141105180404_59e6fb65-529f-4eaa-9446-7f34d12bffac:30
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: ship_mode
   filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and 
 sm_ship_mode_sk is not null) (type: boolean)
   Statistics: Num rows: 0 Data size: 45 Basic stats: PARTIAL 
 Column stats: COMPLETE
   Filter Operator
 predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and 
 sm_ship_mode_sk is not null) (type: boolean)
 Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: sm_ship_mode_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0
   1
 keys:
   0 _col1 (type: int)
   1 _col0 (type: int)
 input vertices:
   0 Map 4
 Statistics: Num rows: 9223372036854775807 Data size: 
 0 Basic stats: PARTIAL Column stats: COMPLETE
 Select Operator
   Statistics: Num rows: 9223372036854775807 Data 
 size: 0 Basic stats: PARTIAL Column stats: COMPLETE
   Group By Operator
 aggregations: count()
 mode: hash
 outputColumnNames: _col0
 Statistics: Num rows: 1 Data size: 8 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   sort order:
   Statistics: Num rows: 1 Data size: 8 Basic 
 stats: COMPLETE Column stats: COMPLETE
   value expressions: _col0 (type: bigint)
 Execution mode: vectorized
 Map 3
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: ((d_year = 2002) and d_date_sk is not null) 
 (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ((d_year = 2002) and d_date_sk is not null) 
 (type: boolean)
 Statistics: Num rows: 652 Data size: 5216 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: d_date_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 652 Data size: 2608 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 652 Data size: 2608 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Execution mode: vectorized
 Map 4
 Map Operator Tree:
   

[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Attachment: HIVE-8740.4.patch

Rebased patch after HIVE-8716 commit.

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, 
 HIVE-8740.4.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.





[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, 
 HIVE-8740.4.patch


 Sorted dynamic partition optimization looks for partition columns from the 
 operator above FileSinkOperator. By Hive convention, it expects the partition 
 columns to come last. But with HIVE-8585, equality filters on partition 
 columns get folded to constants. The column pruner then prunes the constant 
 expressions because they don't reference any columns. In some cases this 
 yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with 
 the sorted dynamic partition insert optimization. In such cases we don't 
 really need sorted dynamic partition optimization.





[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths

2014-11-05 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199705#comment-14199705
 ] 

Prasanth J commented on HIVE-8735:
--

+1. It would be good if you could add some tests.

 statistics update can fail due to long paths
 

 Key: HIVE-8735
 URL: https://issues.apache.org/jira/browse/HIVE-8735
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch


 {noformat}
 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
 (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
 statistics. 
 java.sql.SQLDataException: A truncation error was encountered trying to 
 shrink VARCHAR 
 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
 Source)
   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
 Source)
   at 
 org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
   at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
   at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
   at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
   ... 31 more
 Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
   at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
   at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown Source)
   at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
   at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
   at 
 

[jira] [Commented] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196623#comment-14196623
 ] 

Prasanth J commented on HIVE-8720:
--

[~hagleitn] Can we have this for 0.14? These are just test file diffs to make 
the qfile results consistent across platforms.

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, 
 orc_merge5_filedump_opensuse.txt


 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.
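
 To illustrate why row order alone can change the encoded size, here is a toy
 run-length encoder. It is only a sketch: ORC's real integer and string
 encodings are more elaborate, and the function names and sample rows below
 are made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def encoded_size(values):
    """Number of (value, run_length) pairs; a stand-in for bytes on disk."""
    return len(rle_encode(values))


# The same nine rows, inserted in two different orders.
rows_sorted = [1, 1, 1, 2, 2, 2, 3, 3, 3]
rows_interleaved = [1, 2, 3, 1, 2, 3, 1, 2, 3]

size_sorted = encoded_size(rows_sorted)            # three long runs
size_interleaved = encoded_size(rows_interleaved)  # nine runs of length one
```

 Both inputs hold identical rows, yet the interleaved order needs more runs,
 which is why a test that only compares the number of files is more stable
 than one that compares fileSize.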



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex

2014-11-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8727:
-
Status: Patch Available  (was: Open)

 Dag summary has incorrect row counts and duration per vertex
 

 Key: HIVE-8727
 URL: https://issues.apache.org/jira/browse/HIVE-8727
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8727.1.patch


 During the code review for HIVE-8495, some code was reworked, which broke
 some of the INPUT/OUTPUT counters and the duration.
 The attached patch fixes that.





[jira] [Commented] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196665#comment-14196665
 ] 

Prasanth J commented on HIVE-8727:
--

+1

 Dag summary has incorrect row counts and duration per vertex
 

 Key: HIVE-8727
 URL: https://issues.apache.org/jira/browse/HIVE-8727
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8727.1.patch


 During the code review for HIVE-8495, some code was reworked, which broke
 some of the INPUT/OUTPUT counters and the duration.
 The attached patch fixes that.





[jira] [Commented] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196697#comment-14196697
 ] 

Prasanth J commented on HIVE-8727:
--

[~hagleitn] HIVE-8495 broke output of dag summary. Can we have this for 0.14?

 Dag summary has incorrect row counts and duration per vertex
 

 Key: HIVE-8727
 URL: https://issues.apache.org/jira/browse/HIVE-8727
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8727.1.patch


 During the code review for HIVE-8495, some code was reworked, which broke
 some of the INPUT/OUTPUT counters and the duration.
 The attached patch fixes that.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197045#comment-14197045
 ] 

Prasanth J commented on HIVE-8732:
--

LGTM, +1. Pending tests

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently, ORC's string statistics do not merge correctly, causing incorrect
 maximum values.
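
 The issue text does not say where the merge goes wrong, so the following is
 only a sketch of the property a correct merge should satisfy: the merged
 maximum is the larger of the two maxima, and the merged minimum the smaller
 of the two minima. The (minimum, maximum) tuple layout and the function name
 are assumptions for illustration, and Python's code-point string comparison
 stands in for whatever ordering ORC actually uses.

```python
def merge_string_stats(a, b):
    """Merge two (minimum, maximum) pairs; None means 'no values seen'."""
    if a is None:
        return b
    if b is None:
        return a
    # Take the smaller minimum and the larger maximum of the two sides.
    return (min(a[0], b[0]), max(a[1], b[1]))


merged = merge_string_stats(("apple", "kiwi"), ("banana", "zebra"))
```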





[jira] [Created] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-04 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8740:


 Summary: Sorted dynamic partition does not work correctly with 
constant folding
 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J


Sorted dynamic partition optimization looks for partition columns in the
operator above FileSinkOperator. As per Hive convention, it expects the
partition columns to come last. But with HIVE-8585, equality filters on
partition columns get folded to constants. The column pruner then prunes the
constant expressions, as they don't reference any columns. In some cases this
yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with
the sorted dynamic partition insert optimization. In such cases we don't
really need the sorted dynamic partition optimization.
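
The failure mode can be sketched outside Hive. The snippet below is not the
optimizer's code; the helper names and the sample rows are hypothetical, and
Python's IndexError stands in for Java's ArrayIndexOutOfBounds exception. It
only shows how "partition columns are the trailing columns" breaks once a
column has been pruned but the expected column count has not been updated.

```python
def partition_positions(num_columns, num_partition_columns):
    """Positions of the partition columns, assuming they are the trailing ones."""
    start = num_columns - num_partition_columns
    return list(range(start, num_columns))


def pick_partition_values(row, num_columns, num_partition_columns):
    """Read the partition values out of a row by position."""
    return [row[i] for i in partition_positions(num_columns, num_partition_columns)]


# Expected shape: two data columns followed by one partition column.
full_row = ["k1", "v1", "2014-11-04"]
ok = pick_partition_values(full_row, 3, 1)

# After the partition column was folded to a constant and pruned, the row is
# one column short, but the caller still assumes three columns.
pruned_row = ["k1", "v1"]
try:
    pick_partition_values(pruned_row, 3, 1)
    failed = False
except IndexError:
    failed = True
```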





[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Attachment: HIVE-8740.1.patch

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch


 Sorted dynamic partition optimization looks for partition columns in the
 operator above FileSinkOperator. As per Hive convention, it expects the
 partition columns to come last. But with HIVE-8585, equality filters on
 partition columns get folded to constants. The column pruner then prunes the
 constant expressions, as they don't reference any columns. In some cases
 this yields unexpected results (an ArrayIndexOutOfBounds exception is
 thrown) with the sorted dynamic partition insert optimization. In such cases
 we don't really need the sorted dynamic partition optimization.





[jira] [Commented] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197718#comment-14197718
 ] 

Prasanth J commented on HIVE-8740:
--

[~alangates] Can you look at the test case that I added in this patch? It's
related to the ACID DELETE operation. After deleting a newly added row, the
select count(*) query returns 0 rows instead of the actual 1000 rows. Is this
a bug/known issue, or am I doing something wrong?

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch


 Sorted dynamic partition optimization looks for partition columns in the
 operator above FileSinkOperator. As per Hive convention, it expects the
 partition columns to come last. But with HIVE-8585, equality filters on
 partition columns get folded to constants. The column pruner then prunes the
 constant expressions, as they don't reference any columns. In some cases
 this yields unexpected results (an ArrayIndexOutOfBounds exception is
 thrown) with the sorted dynamic partition insert optimization. In such cases
 we don't really need the sorted dynamic partition optimization.





[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
Status: Patch Available  (was: Open)

 Sorted dynamic partition does not work correctly with constant folding
 --

 Key: HIVE-8740
 URL: https://issues.apache.org/jira/browse/HIVE-8740
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8740.1.patch


 Sorted dynamic partition optimization looks for partition columns in the
 operator above FileSinkOperator. As per Hive convention, it expects the
 partition columns to come last. But with HIVE-8585, equality filters on
 partition columns get folded to constants. The column pruner then prunes the
 constant expressions, as they don't reference any columns. In some cases
 this yields unexpected results (an ArrayIndexOutOfBounds exception is
 thrown) with the sorted dynamic partition insert optimization. In such cases
 we don't really need the sorted dynamic partition optimization.





[jira] [Commented] (HIVE-8716) Partition filters are not pushed down with lateral view

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197744#comment-14197744
 ] 

Prasanth J commented on HIVE-8716:
--

I will look into the test failures.

 Partition filters are not pushed down with lateral view
 ---

 Key: HIVE-8716
 URL: https://issues.apache.org/jira/browse/HIVE-8716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Attachments: HIVE-8716.1.patch


 Changes to HIVE-8454 revealed issues with partition filters not being pushed
 down in the case of lateral views. For more info, see the discussion in
 HIVE-5718.





[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-04 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8720:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14.

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, 
 orc_merge5_filedump_opensuse.txt


 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-03 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194817#comment-14194817
 ] 

Prasanth J commented on HIVE-5718:
--

Alternatively, we can just pull the fix for LV PPD out of this patch into a
new, smaller patch.

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.





[jira] [Updated] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5718:
-
Attachment: HIVE-5718.diff-v11-v12.patch

This is the diff between [~navis]'s v11 and v12 of the patch that fixes PPD 
with LV. [~ashutoshc] can you take a look?


 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt, HIVE-5718.diff-v11-v12.patch


 Extend HIVE-2925 with LV and SubQ.





[jira] [Created] (HIVE-8716) Partition filters are not pushed down with lateral view

2014-11-03 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8716:


 Summary: Partition filters are not pushed down with lateral view
 Key: HIVE-8716
 URL: https://issues.apache.org/jira/browse/HIVE-8716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical


Changes to HIVE-8454 revealed issues with partition filters not being pushed
down in the case of lateral views. For more info, see the discussion in
HIVE-5718.





[jira] [Updated] (HIVE-8716) Partition filters are not pushed down with lateral view

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8716:
-
Attachment: HIVE-8716.1.patch

This patch is generated from the diff of the v11 and v12 patches from
HIVE-5718, which seems to fix the issue.

 Partition filters are not pushed down with lateral view
 ---

 Key: HIVE-8716
 URL: https://issues.apache.org/jira/browse/HIVE-8716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Attachments: HIVE-8716.1.patch


 Changes to HIVE-8454 revealed issues with partition filters not being pushed
 down in the case of lateral views. For more info, see the discussion in
 HIVE-5718.





[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-03 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195165#comment-14195165
 ] 

Prasanth J commented on HIVE-5718:
--

Created HIVE-8716 to address PPD issue with LV.

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt, HIVE-5718.diff-v11-v12.patch


 Extend HIVE-2925 with LV and SubQ.





[jira] [Created] (HIVE-8720) Update orc_merge tests to make it consistent across OSes

2014-11-03 Thread Prasanth J (JIRA)
Prasanth J created HIVE-8720:


 Summary: Update orc_merge tests to make it consistent across OSes
 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J


orc_merge*.q test cases fail with qfile diffs related to file size on
different OSes. I have seen failures with openSUSE and CentOS. The order of
insertion of rows into an ORC table impacts the file size because of
run-length encoding. Since the order of rows is not guaranteed during
insertion into a table, we may get different file sizes. We cannot add ORDER
BY to the insert queries, as it would force insertion through a single
reducer, which would disable the ORC merge file optimization. Since these
test cases test whether the files are merged, it is sufficient to know the
number of files after merging. Instead of DESCRIBE FORMATTED (which shows
numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8720:
-
Summary: Update orc_merge tests to make it consistent across OS'es  (was: 
Update orc_merge tests to make it consistent across OSes)

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J

 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8720:
-
Attachment: orc_merge5_filedump_opensuse.txt
orc_merge5_filedump_macosx.txt

Attaching the ORC filedumps for the orc_merge5.q test case run on Mac OS X
and openSUSE. As we can see from the row index statistics of stripes 1 and 2,
the order of rows was different (stripe 1 on Mac OS X ended up as stripe 2 on
openSUSE).

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: orc_merge5_filedump_macosx.txt, 
 orc_merge5_filedump_opensuse.txt


 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8720:
-
Status: Patch Available  (was: Open)

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, 
 orc_merge5_filedump_opensuse.txt


 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es

2014-11-03 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8720:
-
Attachment: HIVE-8720.1.patch

 Update orc_merge tests to make it consistent across OS'es
 -

 Key: HIVE-8720
 URL: https://issues.apache.org/jira/browse/HIVE-8720
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, 
 orc_merge5_filedump_opensuse.txt


 orc_merge*.q test cases fail with qfile diffs related to file size on
 different OSes. I have seen failures with openSUSE and CentOS. The order of
 insertion of rows into an ORC table impacts the file size because of
 run-length encoding. Since the order of rows is not guaranteed during
 insertion into a table, we may get different file sizes. We cannot add ORDER
 BY to the insert queries, as it would force insertion through a single
 reducer, which would disable the ORC merge file optimization. Since these
 test cases test whether the files are merged, it is sufficient to know the
 number of files after merging. Instead of DESCRIBE FORMATTED (which shows
 numFiles and fileSize), we can use dfs -ls to get the number of files.





[jira] [Commented] (HIVE-8495) Add progress bar for Hive on Tez queries

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193731#comment-14193731
 ] 

Prasanth J commented on HIVE-8495:
--

Looks great! But STATUS columns shouldn't be left aligned. sigh!

 Add progress bar for Hive on Tez queries
 

 Key: HIVE-8495
 URL: https://issues.apache.org/jira/browse/HIVE-8495
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, 
 HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, 
 HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 
 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, 
 in-place-progress-update.png, ux-demo.gif


 Build a Progress bar to provide overall progress on running tasks.
 Progress is calculated as : 
  (Completed tasks) / (Total number of tasks)
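
 As a rough sketch of the formula above (not the code in the attached
 patches), the fraction and a text bar could be computed as follows. The
 function names, the bar width, and the choice to report 0 when no tasks are
 known yet are all assumptions made for this example.

```python
def progress_fraction(completed_tasks, total_tasks):
    """(Completed tasks) / (Total number of tasks), or 0.0 if nothing is known yet."""
    if total_tasks <= 0:
        return 0.0
    return completed_tasks / total_tasks


def render_bar(completed_tasks, total_tasks, width=20):
    """Render the fraction as a fixed-width text bar followed by a percentage."""
    fraction = progress_fraction(completed_tasks, total_tasks)
    filled = int(fraction * width)
    return "[" + "=" * filled + " " * (width - filled) + "] " + f"{fraction:.0%}"


bar = render_bar(5, 20)
```

 Summing completed and total tasks across all vertices before dividing gives
 a single overall figure for the running query.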





[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries

2014-11-02 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8495:
-
Attachment: HIVE-8495.10.patch

Fixed alignment of STATUS column. Addressed review comments. 

 Add progress bar for Hive on Tez queries
 

 Key: HIVE-8495
 URL: https://issues.apache.org/jira/browse/HIVE-8495
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8495.1.patch, HIVE-8495.10.patch, 
 HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, 
 HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, 
 Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 
 AM.png, in-place-progress-update.png, ux-demo.gif


 Build a Progress bar to provide overall progress on running tasks.
 Progress is calculated as : 
  (Completed tasks) / (Total number of tasks)





[jira] [Commented] (HIVE-8495) Add progress bar for Hive on Tez queries

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193991#comment-14193991
 ] 

Prasanth J commented on HIVE-8495:
--

Test failures look unrelated.

 Add progress bar for Hive on Tez queries
 

 Key: HIVE-8495
 URL: https://issues.apache.org/jira/browse/HIVE-8495
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-8495.1.patch, HIVE-8495.10.patch, 
 HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, 
 HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, 
 Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 
 AM.png, in-place-progress-update.png, ux-demo.gif


 Build a Progress bar to provide overall progress on running tasks.
 Progress is calculated as : 
  (Completed tasks) / (Total number of tasks)





[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194220#comment-14194220
 ] 

Prasanth J commented on HIVE-5718:
--

[~navis] Sorry about that. I will look into that now and see what the issue is.

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.





[jira] [Updated] (HIVE-8689) handle overflows in statistics better

2014-11-02 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8689:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14. Thanks [~sershe]!

 handle overflows in statistics better
 -

 Key: HIVE-8689
 URL: https://issues.apache.org/jira/browse/HIVE-8689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.14.0

 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch


 Improve overflow checks in StatsAnnotation optimizer.
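
 The issue does not spell out the checks, so this is only one common way to
 handle overflow in row-count or data-size estimates: clamp at the largest
 signed 64-bit value instead of letting the result wrap around. The helper
 names are hypothetical, inputs are assumed to be non-negative, and Python's
 unbounded integers are used to compute the exact result before clamping.

```python
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def safe_add(a, b):
    """Add two non-negative estimates, clamping at LONG_MAX instead of wrapping."""
    return min(a + b, LONG_MAX)


def safe_mult(a, b):
    """Multiply two non-negative estimates, clamping at LONG_MAX instead of wrapping."""
    return min(a * b, LONG_MAX)


clamped = safe_mult(2**40, 2**40)
```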



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8454) Select Operator does not rename column stats properly in case of select star

2014-11-02 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8454:
-
Attachment: (was: HIVE-8474.7.patch)

 Select Operator does not rename column stats properly in case of select star
 

 Key: HIVE-8454
 URL: https://issues.apache.org/jira/browse/HIVE-8454
 Project: Hive
  Issue Type: Sub-task
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8454.1.patch, HIVE-8454.2.patch, HIVE-8454.3.patch, 
 HIVE-8454.3.patch, HIVE-8454.4.patch, HIVE-8454.5.patch, HIVE-8454.6.patch, 
 HIVE-8454.7.patch


 The estimated data size of some Select Operators is 0. BytesBytesHashMap uses 
 data size to determine the estimated initial number of entries in the 
 hashmap. If this data size is 0, an exception is thrown (see below).
 Query 
 {code}
 select count(*) from
  store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
   JOIN item ON store_sales.ss_item_sk = item.i_item_sk
   JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
and d1.d_year = 2001;
 {code}
 {code}
 ], TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 java.lang.AssertionError: Capacity must be a power of two
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity 
 must be a power of two
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:93)
   at 
 

[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194300#comment-14194300
 ] 

Prasanth J commented on HIVE-5718:
--

[~hagleitn] I think it's good to have this in 0.14. As [~navis] mentioned, 
HIVE-8454 revealed a problem with PPD not getting pushed down with lateral 
view. This patch has a fix for it.

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194301#comment-14194301
 ] 

Prasanth J commented on HIVE-5718:
--

[~navis] Thanks for the fix!

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, 
 HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, 
 HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, 
 HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8521) Document the ORC format

2014-11-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194323#comment-14194323
 ] 

Prasanth J commented on HIVE-8521:
--

[~owen.omalley] I took a pass over the document. It mostly looks good. A few things:
1) Section 4.4: "Runs start with an initial byte of 0x00 to 0xf7". Shouldn't it 
be 0x7f?
2) Section 4.5.1: "encoded if they type is signed" should be "the type".
3) Section 4.5.2: DEAD BEEF hex code :)
4) Section 4.5.3: I think we should revert the percentile back to 95. Since we 
only have 5 bits for the patch length, we will not be able to encode lengths > 32, 
which could happen if we consider the 90th percentile (512 * 0.1 = 51 elements 
can be patched).
5) Section 5: The default stripe size is now 64MB. Do we need to mention that 
in this section?
6) Section 5.1: DICTIONARY_DATA, DIRECT_V2, DICTIONARY_V2 have a stray \ 
before _
7) Section 5.2.7: "definition was change" should be "changed".
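
The arithmetic behind item 4 can be checked with a small sketch. This is only an illustration: the class and method names are hypothetical, the run length of 512 comes from the comment above, and it assumes the patch length is stored as a raw 5-bit unsigned value (0 to 31).

```java
// Hypothetical helper, not part of Hive or ORC: checks whether the number of
// values that may need patching fits in a 5-bit patch-length field.
final class PatchLengthCheck {
    // Assumption: the field is a raw 5-bit unsigned integer.
    static final int PATCH_LENGTH_BITS = 5;
    static final int MAX_PATCH_LENGTH = (1 << PATCH_LENGTH_BITS) - 1;

    // Values above the given percentile are the ones that may be patched.
    // Integer math avoids floating-point rounding surprises.
    static int maxPatched(int runLength, int percentile) {
        return runLength * (100 - percentile) / 100;
    }

    static boolean fits(int runLength, int percentile) {
        return maxPatched(runLength, percentile) <= MAX_PATCH_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println("90th percentile: " + maxPatched(512, 90)
                + " patched, fits=" + fits(512, 90));
        System.out.println("95th percentile: " + maxPatched(512, 95)
                + " patched, fits=" + fits(512, 95));
    }
}
```

Under these assumptions the 90th percentile allows up to 51 patched values, which does not fit, while the 95th percentile allows up to 25, which does.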

 Document the ORC format
 ---

 Key: HIVE-8521
 URL: https://issues.apache.org/jira/browse/HIVE-8521
 Project: Hive
  Issue Type: Bug
  Components: Documentation, File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: orc-spec.pdf


 It is past time that we document the ORC file format. I've started and should 
 have a first pass this week.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8671) Overflow in estimate row count and data size with fetch column stats

2014-11-01 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193351#comment-14193351
 ] 

Prasanth J commented on HIVE-8671:
--

[~hagleitn] Can we have this in 0.14?

 Overflow in estimate row count and data size with fetch column stats
 

 Key: HIVE-8671
 URL: https://issues.apache.org/jira/browse/HIVE-8671
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8671.1.patch, HIVE-8671.2.patch, HIVE-8671.3.patch, 
 HIVE-8671.4.patch, HIVE-8671.5.patch


 Overflow in row counts and data size for several TPC-DS queries.
 Interestingly the operators which have overflow end up running with a small 
 parallelism.
 For instance Reducer 2 has an overflow but it only runs with parallelism of 2.
 {code}
Reducer 2 
 Reduce Operator Tree:
   Group By Operator
 aggregations: sum(VALUE._col0)
 keys: KEY._col0 (type: string), KEY._col1 (type: string), 
 KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: float)
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
 Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: _col3 (type: string), _col3 (type: string)
   sort order: ++
   Map-reduce partition columns: _col3 (type: string)
   Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE
   value expressions: _col0 (type: string), _col1 (type: 
 string), _col2 (type: string), _col3 (type: string), _col4 (type: float), 
 _col5 (type: double)
 Execution mode: vectorized
 {code}
 {code}
 VERTEX     TOTAL_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
 Map 1               62             26.41        1,779,510    211,978,502      60,628,390
 Map 5                1              4.28            6,950        138,098         138,098
 Map 6                1              2.44            3,910             31              31
 Reducer 2            2             22.69           61,320     60,628,390          69,182
 Reducer 3            1              2.63            3,910         69,182             100
 Reducer 4            1              1.01            1,180            100             100
 {code}
 Query
 {code}
 explain  
 select  i_item_desc 
   ,i_category 
   ,i_class 
   ,i_current_price
   ,i_item_id
   ,sum(ws_ext_sales_price) as itemrevenue 
   ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from  
   web_sales
   ,item 
   ,date_dim
 where 
   web_sales.ws_item_sk = item.i_item_sk 
   and item.i_category in ('Jewelry', 'Sports', 'Books')
   and web_sales.ws_sold_date_sk = date_dim.d_date_sk
   and date_dim.d_date between '2001-01-12' and '2001-02-11'
 group by 
   i_item_id
 ,i_item_desc 
 ,i_category
 ,i_class
 ,i_current_price
 order by 
   i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
 limit 100
 {code}
 Explain 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
 Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019164343_854cb757-01bd-40cb-843e-9ada7c5e6f38:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: web_sales
   filterExpr: ws_item_sk is not null (type: boolean)
   Statistics: Num rows: 21594638446 Data size: 2850189889652 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ws_item_sk is not null (type: boolean)
 Statistics: Num rows: 21594638446 Data size: 172746300152 
 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: ws_item_sk (type: int), ws_ext_sales_price 
 (type: float), ws_sold_date_sk (type: int)
   outputColumnNames: _col0, _col1, _col2
   Statistics: Num rows: 21594638446 

[jira] [Updated] (HIVE-8671) Overflow in estimate row count and data size with fetch column stats

2014-11-01 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8671:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch-0.14.

 Overflow in estimate row count and data size with fetch column stats
 

 Key: HIVE-8671
 URL: https://issues.apache.org/jira/browse/HIVE-8671
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0, 0.15.0

 Attachments: HIVE-8671.1.patch, HIVE-8671.2.patch, HIVE-8671.3.patch, 
 HIVE-8671.4.patch, HIVE-8671.5.patch


 Overflow in row counts and data size for several TPC-DS queries.
 Interestingly the operators which have overflow end up running with a small 
 parallelism.
 For instance Reducer 2 has an overflow but it only runs with parallelism of 2.
 {code}
Reducer 2 
 Reduce Operator Tree:
   Group By Operator
 aggregations: sum(VALUE._col0)
 keys: KEY._col0 (type: string), KEY._col1 (type: string), 
 KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: float)
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
 Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: _col3 (type: string), _col3 (type: string)
   sort order: ++
   Map-reduce partition columns: _col3 (type: string)
   Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE
   value expressions: _col0 (type: string), _col1 (type: 
 string), _col2 (type: string), _col3 (type: string), _col4 (type: float), 
 _col5 (type: double)
 Execution mode: vectorized
 {code}
 {code}
 VERTEX     TOTAL_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
 Map 1               62             26.41        1,779,510    211,978,502      60,628,390
 Map 5                1              4.28            6,950        138,098         138,098
 Map 6                1              2.44            3,910             31              31
 Reducer 2            2             22.69           61,320     60,628,390          69,182
 Reducer 3            1              2.63            3,910         69,182             100
 Reducer 4            1              1.01            1,180            100             100
 {code}
 Query
 {code}
 explain  
 select  i_item_desc 
   ,i_category 
   ,i_class 
   ,i_current_price
   ,i_item_id
   ,sum(ws_ext_sales_price) as itemrevenue 
   ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
   (partition by i_class) as revenueratio
 from  
   web_sales
   ,item 
   ,date_dim
 where 
   web_sales.ws_item_sk = item.i_item_sk 
   and item.i_category in ('Jewelry', 'Sports', 'Books')
   and web_sales.ws_sold_date_sk = date_dim.d_date_sk
   and date_dim.d_date between '2001-01-12' and '2001-02-11'
 group by 
   i_item_id
 ,i_item_desc 
 ,i_category
 ,i_class
 ,i_current_price
 order by 
   i_category
 ,i_class
 ,i_item_id
 ,i_item_desc
 ,revenueratio
 limit 100
 {code}
 Explain 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
 Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019164343_854cb757-01bd-40cb-843e-9ada7c5e6f38:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: web_sales
   filterExpr: ws_item_sk is not null (type: boolean)
   Statistics: Num rows: 21594638446 Data size: 2850189889652 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ws_item_sk is not null (type: boolean)
 Statistics: Num rows: 21594638446 Data size: 172746300152 
 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: ws_item_sk (type: int), ws_ext_sales_price 
 (type: float), ws_sold_date_sk (type: int)
   outputColumnNames: _col0, _col1, _col2
  

[jira] [Commented] (HIVE-8689) handle overflows in statistics better

2014-11-01 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193394#comment-14193394
 ] 

Prasanth J commented on HIVE-8689:
--

[~sershe] HIVE-8671 is committed now. Can you rebase this patch? Also, can you 
fix Mostafa's change to reducer estimation? It will estimate one reducer less 
than the previous code. For example, if totalInputFileSize is 140 and 
bytesPerReducer is 100, the current change will just say 1 reducer. We should 
either use Math.ceil or Math.max(totalInputFileSize, totalInputFileSize + 
bytesPerReducer - 1)/bytesPerReducer.
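
A minimal sketch of the two estimates, for illustration only. The class and method names are hypothetical and this is not the code in the patch; it just contrasts plain integer division with the rounded-up form suggested above. The Math.max guard keeps the original total when the padded sum wraps around to a negative long.

```java
// Hypothetical sketch, not Hive code: floor vs. ceiling reducer estimates.
final class ReducerEstimate {
    // Plain integer division rounds down: 140 / 100 gives 1.
    static long floorEstimate(long totalInputFileSize, long bytesPerReducer) {
        return totalInputFileSize / bytesPerReducer;
    }

    // Ceiling division. If totalInputFileSize + bytesPerReducer - 1 overflows,
    // the sum becomes negative and Math.max falls back to the unpadded total.
    static long ceilEstimate(long totalInputFileSize, long bytesPerReducer) {
        long padded = Math.max(totalInputFileSize,
                totalInputFileSize + bytesPerReducer - 1);
        return padded / bytesPerReducer;
    }

    public static void main(String[] args) {
        System.out.println("floor: " + floorEstimate(140, 100));
        System.out.println("ceil:  " + ceilEstimate(140, 100));
    }
}
```

With the values from the comment (140 and 100), the floor form gives 1 reducer and the ceiling form gives 2.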

 handle overflows in statistics better
 -

 Key: HIVE-8689
 URL: https://issues.apache.org/jira/browse/HIVE-8689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.14.0

 Attachments: HIVE-8689.01.patch, HIVE-8689.patch


 Improve overflow checks in StatsAnnotation optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8689) handle overflows in statistics better

2014-11-01 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193521#comment-14193521
 ] 

Prasanth J commented on HIVE-8689:
--

+1

 handle overflows in statistics better
 -

 Key: HIVE-8689
 URL: https://issues.apache.org/jira/browse/HIVE-8689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.14.0

 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch


 Improve overflow checks in StatsAnnotation optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8689) handle overflows in statistics better

2014-11-01 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193522#comment-14193522
 ] 

Prasanth J commented on HIVE-8689:
--

[~sershe] minor nit: Can you remove the getMaxIfOverflow() method? Since we are 
using the safeAdd and safeMultiply methods, we don't need that anymore.
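
For readers unfamiliar with the idea, here is a rough sketch of what saturating helpers of this kind could look like. Only the method names safeAdd and safeMultiply come from the comment above; the class name and the bodies are assumptions, not the code in the patch. The sketch uses Math.addExact and Math.multiplyExact (Java 8) and assumes non-negative operands, as is the case for row counts and data sizes.

```java
// Hypothetical sketch, not the Hive implementation: arithmetic that clamps to
// Long.MAX_VALUE instead of wrapping around on overflow.
final class SaturatingMath {
    // Assumes non-negative operands, so any overflow is a positive overflow.
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    static long safeMultiply(long a, long b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeAdd(Long.MAX_VALUE, 1));
        System.out.println(safeMultiply(21594638446L, 21594638446L));
    }
}
```

Once every intermediate result is clamped this way, a separate after-the-fact overflow check adds nothing, which is the point of the nit.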

 handle overflows in statistics better
 -

 Key: HIVE-8689
 URL: https://issues.apache.org/jira/browse/HIVE-8689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.14.0

 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch


 Improve overflow checks in StatsAnnotation optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries

2014-11-01 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8495:
-
Attachment: HIVE-8495.8.patch

 Add progress bar for Hive on Tez queries
 

 Key: HIVE-8495
 URL: https://issues.apache.org/jira/browse/HIVE-8495
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, 
 HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, 
 HIVE-8495.8.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 
 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png


 Build a Progress bar to provide overall progress on running tasks.
 Progress is calculated as : 
  (Completed tasks) / (Total number of tasks)
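
 The formula above can be sketched as follows. This is an illustration under stated assumptions, not the code in the attached patches: the class, the method names, and the bar layout are all hypothetical.

```java
// Hypothetical sketch, not the Hive CLI code: progress as completed / total,
// rendered as a fixed-width text bar.
final class ProgressSketch {
    // Fraction of tasks completed; 0 when the total is not yet known.
    static double progress(int completedTasks, int totalTasks) {
        if (totalTasks <= 0) {
            return 0.0;
        }
        return (double) completedTasks / totalTasks;
    }

    // Renders a bar such as "[=====     ]" using integer math for the fill.
    static String bar(int completedTasks, int totalTasks, int width) {
        int filled = totalTasks <= 0
                ? 0
                : (int) ((long) completedTasks * width / totalTasks);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < width; i++) {
            sb.append(i < filled ? '=' : ' ');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(bar(5, 10, 10) + " " + (int) (progress(5, 10) * 100) + "%");
    }
}
```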



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries

2014-11-01 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8495:
-
Attachment: HIVE-8495.9.patch

Vertex status information fixes.

 Add progress bar for Hive on Tez queries
 

 Key: HIVE-8495
 URL: https://issues.apache.org/jira/browse/HIVE-8495
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, 
 HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, 
 HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 
 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, 
 in-place-progress-update.png


 Build a Progress bar to provide overall progress on running tasks.
 Progress is calculated as : 
  (Completed tasks) / (Total number of tasks)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

