[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231935#comment-14231935 ] Prasanth J commented on HIVE-: -- Committed this patch to trunk. lvj_mapjoin.q ran successfully locally for me too. [~brocknoland] Can you re-enable the lvj_mapjoin.q test to see if it runs successfully now? Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.1 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, HIVE-.4.patch, HIVE-.5.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8990) mapjoin_mapjoin.q is failing on Tez (missed golden file update)
[ https://issues.apache.org/jira/browse/HIVE-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8990: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. mapjoin_mapjoin.q is failing on Tez (missed golden file update) --- Key: HIVE-8990 URL: https://issues.apache.org/jira/browse/HIVE-8990 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.15.0 Attachments: HIVE-8990.1.patch mapjoin_mapjoin.q was updated (SORT_BEFORE_DIFF). However, since the Tez tests were stuck, the accompanying update to the golden file was missed.
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230905#comment-14230905 ] Prasanth J commented on HIVE-: -- Last patch looks good to me. +1 Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.1 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, HIVE-.4.patch, HIVE-.5.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226865#comment-14226865 ] Prasanth J commented on HIVE-8975: -- [~jcamachorodriguez] I see what the issue is here. That check (RS after GBY) was used to determine the map-reduce boundary. The map-side GBY has different stats logic than the reduce-side GBY. Now, after the identity projection removal optimization, in {code} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {code} both GBY[2] and GBY[4] are identified as map-side GBYs. I think we need to improve that if condition to better differentiate map-side and reduce-side GBYs. A somewhat better check would be: if an RS is contained in the upstream operators of a GBY, then that GBY is reduce-side. In the above case, GBY[4] has RS[3] among its upstream operators. Any thoughts? Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity projection removal optimization in HIVE-8435, the plan in bucket_map_join_tez2.q that runs on Tez became sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. 
The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by an RS operator or not, and stats are calculated differently depending on the result.
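A minimal Java sketch of the check proposed in the comment above: a GBY counts as reduce-side when an RS appears anywhere among its upstream operators. PlanOp and GroupBySide are hypothetical stand-ins, not Hive's Operator classes and not the code in StatsRulesProcFactory; only the operator chain is taken from the plan quoted in this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a plan operator; not Hive's Operator class.
class PlanOp {
    final String kind;   // e.g. "TS", "FIL", "GBY", "RS"
    final int id;
    final List<PlanOp> parents = new ArrayList<>();

    PlanOp(String kind, int id, PlanOp... parents) {
        this.kind = kind;
        this.id = id;
        for (PlanOp p : parents) {
            this.parents.add(p);
        }
    }
}

class GroupBySide {
    // Walks every upstream operator of the given GBY and reports whether
    // any of them is an RS (assumed here to mark the map-reduce boundary).
    static boolean isReduceSide(PlanOp gby) {
        Deque<PlanOp> pending = new ArrayDeque<>(gby.parents);
        Set<PlanOp> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            PlanOp op = pending.pop();
            if (!seen.add(op)) {
                continue; // already examined via another path
            }
            if (op.kind.equals("RS")) {
                return true;
            }
            pending.addAll(op.parents);
        }
        return false;
    }

    public static void main(String[] args) {
        // TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4], as in the plan after SEL removal
        PlanOp ts0 = new PlanOp("TS", 0);
        PlanOp fil16 = new PlanOp("FIL", 16, ts0);
        PlanOp gby2 = new PlanOp("GBY", 2, fil16);
        PlanOp rs3 = new PlanOp("RS", 3, gby2);
        PlanOp gby4 = new PlanOp("GBY", 4, rs3);
        System.out.println("GBY[2] reduce-side: " + isReduceSide(gby2));
        System.out.println("GBY[4] reduce-side: " + isReduceSide(gby4));
    }
}
```

Unlike a check on the immediate child or parent, this result does not change when an identity SEL between two operators is removed, which is the property the comment is after.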
[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226898#comment-14226898 ] Prasanth J commented on HIVE-7896: -- LGTM, +1 orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227005#comment-14227005 ] Prasanth J commented on HIVE-8975: -- [~ashutoshc] What are all the possible modes for map-side and reduce-side? Stats calculation also has some logic for hash-aggregation enabled vs disabled. Is it safe to assume that if the mode is HASH/PARTIAL it is map-side, and if the mode is FULL it is reduce-side? If so, I can change the logic accordingly without depending on the child/parent checks in the operator tree. Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity projection removal optimization in HIVE-8435, the plan in bucket_map_join_tez2.q that runs on Tez became sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. 
The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by an RS operator or not, and stats are calculated differently depending on the result.
[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID
[ https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225436#comment-14225436 ] Prasanth J commented on HIVE-8875: -- LGTM, +1 hive.optimize.sort.dynamic.partition should be turned off for ACID -- Key: HIVE-8875 URL: https://issues.apache.org/jira/browse/HIVE-8875 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8875.2.patch, HIVE-8875.patch Turning this on causes ACID insert, updates, and deletes to produce non-optimal plans with extra reduce phases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218549#comment-14218549 ] Prasanth J commented on HIVE-: -- [~hagleitn] I don't think the test failure is related either. The code changes should not affect TestCliDriver tests. I ran the test locally and it ran successfully. Also, can we have this for 0.14.1? Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, HIVE-.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218809#comment-14218809 ] Prasanth J commented on HIVE-: -- Committed to trunk. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.15.0 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, HIVE-.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) [~hagleitn]/[~ashutoshc] Should this go into 0.14.1? Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.15.0 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, HIVE-.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-: - Attachment: HIVE-.3.patch This patch bails out when the operator tree is visited again from the same root. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
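One way to picture "bails out when the operator tree is visited again from the same root" is a walker that remembers which roots it has already started from. This is only a sketch under that reading of the comment: RootOnceWalker, walk and emitted are hypothetical names, and nothing here is taken from the attached patch or from Hive's Tez compiler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class RootOnceWalker {
    // Identity-based set: two distinct root objects never collide,
    // even if they happen to be equal by equals().
    private final Set<Object> visitedRoots =
        Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    // Records the work produced by each walk (hypothetical placeholder
    // for whatever plan fragments a real walk would generate).
    final List<String> emitted = new ArrayList<>();

    // Returns false and does nothing when this root was walked before.
    boolean walk(Object root, String label) {
        if (!visitedRoots.add(root)) {
            return false; // bail out: same root seen again
        }
        emitted.add(label);
        return true;
    }

    public static void main(String[] args) {
        RootOnceWalker walker = new RootOnceWalker();
        Object rootA = new Object();
        Object rootB = new Object();
        walker.walk(rootA, "work for A");
        walker.walk(rootB, "work for B");
        walker.walk(rootA, "work for A"); // ignored
        System.out.println(walker.emitted);
    }
}
```

Without the guard, the second walk from rootA would add its work a second time, which is the kind of duplication a repeated visit can cause.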
[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-: - Attachment: HIVE-.2.patch Wrong if condition in previous patch. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch, HIVE-.2.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Created] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
Prasanth J created HIVE-: Summary: Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-: - Attachment: HIVE-.1.patch Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Updated] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-: - Status: Patch Available (was: Open) Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213371#comment-14213371 ] Prasanth J commented on HIVE-: -- [~hagleitn] Can you take a look at the fix? https://reviews.apache.org/r/28086/ Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-.1.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez when compared to MR.
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211545#comment-14211545 ] Prasanth J commented on HIVE-8137: -- [~pankit] I am also concerned about the changes to CombineHiveInputFormat. CombineHiveInputFormat already sets a PathFilter (CombineFilter) which filters out files from the paths. If I understand correctly, adding another path filter (for filtering out empty files) to combine.createPool() should do the job. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8137.2.patch, HIVE-8137.patch Hive 13 does not handle reading of a zero-size ORC file properly. An ORC file is supposed to have a PostScript, which the ReaderImpl class tries to read in order to initialize the footer. But if the file is empty (zero size), it runs into an IndexOutOfBoundsException because ReaderImpl tries to read the PostScript in its constructor. Code snippet: //get length of PostScript int psLen = buffer.get(readSize - 1) & 0xff; In the above code, readSize for an empty file is zero. I see that the ensureOrcFooter() method performs some sanity checks on the footer, so either we can move the above code snippet into ensureOrcFooter() and throw a Malformed ORC file exception, or we can create a dummy Reader that does not initialize the footer and whose hasNext() returns false on the first call. Basically, I would like to know the correct way to handle an empty ORC file in a mapred job. Should we ignore it and not throw an exception, or should we throw an exception saying that the ORC file is malformed? Please let me know your thoughts on this.
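A minimal Java sketch of the first option raised in the description: check for an empty read before indexing the last byte, and report a malformed file instead of letting the index go negative. PostScriptGuard and postScriptLength are hypothetical names; this is not the body of ReaderImpl or ensureOrcFooter(), and the exception type and message are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

class PostScriptGuard {
    // Returns the PostScript length stored in the last byte that was read.
    // readSize is the number of bytes read from the tail of the file; for an
    // empty file it is zero, so buffer.get(readSize - 1) would index -1.
    static int postScriptLength(ByteBuffer buffer, int readSize) throws IOException {
        if (readSize <= 0) {
            throw new IOException("Malformed ORC file: file is empty, no PostScript to read");
        }
        // Mask with 0xff so the signed byte is read as an unsigned length (0..255).
        return buffer.get(readSize - 1) & 0xff;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer tail = ByteBuffer.wrap(new byte[] {1, 2, (byte) 0xF0});
        System.out.println(postScriptLength(tail, 3));
        try {
            postScriptLength(ByteBuffer.allocate(0), 0);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The other option discussed, skipping empty files before a reader is ever created, would move the same size check into a path filter instead.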
[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms
[ https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8801: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Test failures are unrelated. Committed to trunk. Make orc_merge_incompat1.q deterministic across platforms - Key: HIVE-8801 URL: https://issues.apache.org/jira/browse/HIVE-8801 Project: Hive Issue Type: Test Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.15.0 Attachments: HIVE-8801.1.patch, HIVE-8801.2.patch orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible files in a partition. The outcome of merge will be dependent on order of the files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the ordering of files is not guaranteed the result of merge operation will be different across different OS'es. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8809) Activate maven profile hadoop-2 by default
Prasanth J created HIVE-8809: Summary: Activate maven profile hadoop-2 by default Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8809: - Attachment: HIVE-8809.1.patch Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Attachments: HIVE-8809.1.patch For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8809: - Status: Patch Available (was: Open) Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Attachments: HIVE-8809.1.patch For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8809: - Attachment: dep_itests_without_hadoop_2.txt dep_itests_with_hadoop_2.txt dep_without_hadoop_2.txt dep_with_hadoop_2.txt Attaching the output of mvn dependency:tree with and without specifying -Phadoop-2 explicitly. The dependency tree looks exactly the same. One thing I am not sure about is why the hive shims common dependency tree is showing hadoop-core. The following dependency is not within a profile in hive/shims/common/pom.xml {code} <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-core</artifactId> <version>${hadoop-20.version}</version> <optional>true</optional> </dependency> {code} [~brocknoland] Any idea why? Also, how can we check that the issue mentioned in HIVE-5755 does not happen? At least from the dependency tree it doesn't seem to happen. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every maven command profile needs to be specified explicitly. It will be better to activate hadoop-2 profile by default as HIVE QA uses hadoop-2 profile. With this change both the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code}
[jira] [Updated] (HIVE-8736) add ordering to cbo_correctness to make result consistent
[ https://issues.apache.org/jira/browse/HIVE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8736: - Fix Version/s: 0.15.0 add ordering to cbo_correctness to make result consistent - Key: HIVE-8736 URL: https://issues.apache.org/jira/browse/HIVE-8736 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8736.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8799) boatload of missing apache headers
[ https://issues.apache.org/jira/browse/HIVE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203726#comment-14203726 ] Prasanth J commented on HIVE-8799: -- The change in pom.xml, <exclude>**/sit</exclude>: did you mean the **/site directory? boatload of missing apache headers -- Key: HIVE-8799 URL: https://issues.apache.org/jira/browse/HIVE-8799 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-8799.1.patch Adding missing apache headers to a number of files.
[jira] [Commented] (HIVE-8799) boatload of missing apache headers
[ https://issues.apache.org/jira/browse/HIVE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203729#comment-14203729 ] Prasanth J commented on HIVE-8799: -- ha ha :) completely self-contained name. boatload of missing apache headers -- Key: HIVE-8799 URL: https://issues.apache.org/jira/browse/HIVE-8799 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-8799.1.patch Adding missing apache headers to a number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8800) Update release notes and notice for hive .14
[ https://issues.apache.org/jira/browse/HIVE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203744#comment-14203744 ] Prasanth J commented on HIVE-8800: -- +1 Update release notes and notice for hive .14 Key: HIVE-8800 URL: https://issues.apache.org/jira/browse/HIVE-8800 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-8800.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms
Prasanth J created HIVE-8801: Summary: Make orc_merge_incompat1.q deterministic across platforms Key: HIVE-8801 URL: https://issues.apache.org/jira/browse/HIVE-8801 Project: Hive Issue Type: Test Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible files in a partition. The outcome of merge will be dependent on order of the files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the ordering of files is not guaranteed the result of merge operation will be different across different OS'es. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms
[ https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8801: - Status: Patch Available (was: Open) Make orc_merge_incompat1.q deterministic across platforms - Key: HIVE-8801 URL: https://issues.apache.org/jira/browse/HIVE-8801 Project: Hive Issue Type: Test Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8801.1.patch orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible files in a partition. The outcome of merge will be dependent on order of the files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the ordering of files is not guaranteed the result of merge operation will be different across different OS'es. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms
[ https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8801: - Attachment: HIVE-8801.1.patch Added one more file to partition. Now there are 3 files written with 0.11 version and 3 files written with 0.12 version. The outcome of merge will be 4 files independent of which input file is chosen first. Make orc_merge_incompat1.q deterministic across platforms - Key: HIVE-8801 URL: https://issues.apache.org/jira/browse/HIVE-8801 Project: Hive Issue Type: Test Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8801.1.patch orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible files in a partition. The outcome of merge will be dependent on order of the files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the ordering of files is not guaranteed the result of merge operation will be different across different OS'es. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
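A small Java model of why an even three-and-three split makes the output count independent of file order. It assumes, as a simplification not confirmed by this thread, that the merge folds every file compatible with the first file it receives into one output and passes the incompatible files through unmerged. MergeOrderSketch and outputFileCount are hypothetical names, and the two-and-three case is only an illustrative uneven split.

```java
import java.util.Arrays;
import java.util.List;

class MergeOrderSketch {
    // Assumed model: files matching the version of the first file are merged
    // into a single output; every other file is left as its own output.
    static int outputFileCount(List<String> versionsInArrivalOrder) {
        if (versionsInArrivalOrder.isEmpty()) {
            return 0;
        }
        String first = versionsInArrivalOrder.get(0);
        int passedThrough = 0;
        for (String version : versionsInArrivalOrder) {
            if (!version.equals(first)) {
                passedThrough++;
            }
        }
        return 1 + passedThrough;
    }

    public static void main(String[] args) {
        // Three 0.11 files and three 0.12 files, in either arrival order.
        System.out.println(outputFileCount(
            Arrays.asList("0.11", "0.11", "0.11", "0.12", "0.12", "0.12")));
        System.out.println(outputFileCount(
            Arrays.asList("0.12", "0.12", "0.12", "0.11", "0.11", "0.11")));
        // An uneven split (illustrative only) depends on which version comes first.
        System.out.println(outputFileCount(
            Arrays.asList("0.11", "0.11", "0.12", "0.12", "0.12")));
        System.out.println(outputFileCount(
            Arrays.asList("0.12", "0.12", "0.12", "0.11", "0.11")));
    }
}
```

Under this model the balanced partition always yields four files, matching the comment above, while an unbalanced one can yield different counts depending on which file is chosen first.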
[jira] [Updated] (HIVE-8801) Make orc_merge_incompat1.q deterministic across platforms
[ https://issues.apache.org/jira/browse/HIVE-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8801: - Attachment: HIVE-8801.2.patch Missed the diff for tez test. Make orc_merge_incompat1.q deterministic across platforms - Key: HIVE-8801 URL: https://issues.apache.org/jira/browse/HIVE-8801 Project: Hive Issue Type: Test Affects Versions: 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8801.1.patch, HIVE-8801.2.patch orc_merge_incompat1.q tests for ORC fast file merge when there are incompatible files in a partition. The outcome of merge will be dependent on order of the files that CombineHiveInputFormat passes on to OrcFileMergeOperator. Since the ordering of files is not guaranteed the result of merge operation will be different across different OS'es. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202230#comment-14202230 ]

Prasanth J commented on HIVE-8732:

I have verified the file version in the file dump with old ORC formats.

ORC string statistics are not merged correctly
Key: HIVE-8732
URL: https://issues.apache.org/jira/browse/HIVE-8732
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch

Currently ORC's string statistics do not merge correctly, causing incorrect maximum values.
[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for sub-second queries
[ https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8779:
Resolution: Fixed
Fix Version/s: 0.15.0, 0.14.0
Status: Resolved (was: Patch Available)

Committed to trunk and branch-0.14.

Tez in-place progress UI can show wrong estimated time for sub-second queries
Key: HIVE-8779
URL: https://issues.apache.org/jira/browse/HIVE-8779
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Trivial
Fix For: 0.14.0, 0.15.0
Attachments: HIVE-8779.1.patch

The in-place progress update UI added as part of HIVE-8495 can show a wrong estimated time for an AM-only job that goes directly from the INITED to the SUCCEEDED DAG state without passing through the RUNNING state.
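The message does not show the patch, so the following is only a sketch of one plausible failure mode and a guard against it. The assumption (not confirmed by the source) is that the start timestamp is recorded when RUNNING is first observed, so a DAG that jumps from INITED to SUCCEEDED never gets one. DagElapsedTime and its members are hypothetical names, not Hive or Tez API.

```java
/** Hypothetical elapsed-time tracker; illustrative only, not Hive's or Tez's API. */
final class DagElapsedTime {

    enum DagState { INITED, RUNNING, SUCCEEDED }

    private final long monitorStartMillis;  // when monitoring of the DAG began
    private long runningSinceMillis = -1L;  // -1 means RUNNING was never observed

    DagElapsedTime(long monitorStartMillis) {
        this.monitorStartMillis = monitorStartMillis;
    }

    /** Records the first time the DAG is seen in the RUNNING state. */
    void observe(DagState state, long nowMillis) {
        if (state == DagState.RUNNING && runningSinceMillis < 0) {
            runningSinceMillis = nowMillis;
        }
    }

    /** Falls back to the monitoring start time when RUNNING was skipped. */
    long elapsedMillis(long nowMillis) {
        long start = runningSinceMillis >= 0 ? runningSinceMillis : monitorStartMillis;
        return Math.max(0L, nowMillis - start);
    }
}
```

Without the fallback, an unset start time would make the elapsed time equal to the raw clock value, which is the kind of wrong estimate the issue describes.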
[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null
[ https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8778:
Resolution: Fixed
Fix Version/s: 0.15.0
Status: Resolved (was: Patch Available)

Committed to trunk and branch-0.14.

ORC split elimination can cause NPE when column statistics is null
Key: HIVE-8778
URL: https://issues.apache.org/jira/browse/HIVE-8778
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0, 0.15.0
Attachments: HIVE-8778.1.patch

Row group elimination has protection for NULL statistics values in RecordReaderImpl.evaluatePredicate(), which then calls evaluatePredicateRange(). But split elimination calls evaluatePredicateRange() directly, without NULL protection. This can lead to a NullPointerException when a column is NULL in an entire stripe.
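The guarded and unguarded call paths described above can be sketched as follows. This is an illustration under assumptions, not the RecordReaderImpl code: SplitPruning, inRange and mightContain are hypothetical names, the statistics are reduced to a boxed min and max, and returning "keep the split" for missing statistics is a conservative choice made here; the actual patch may handle the all-NULL case differently.

```java
/** Hypothetical stand-in for statistics-based pruning; not the ORC reader API. */
final class SplitPruning {

    /** Unguarded range test: unboxing a null bound throws NullPointerException. */
    static boolean inRange(long literal, Long min, Long max) {
        return literal >= min && literal <= max;
    }

    /** Guarded entry point: with no statistics, keep the split rather than fail. */
    static boolean mightContain(long literal, Long min, Long max) {
        if (min == null || max == null) {
            return true; // conservative: reading an extra split is safe, skipping one is not
        }
        return inRange(literal, min, max);
    }
}
```

Calling inRange directly with null bounds reproduces the kind of failure in the report, while every caller that goes through mightContain is protected.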
[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200612#comment-14200612 ]

Prasanth J commented on HIVE-8753:

+1

TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
Key: HIVE-8753
URL: https://issues.apache.org/jira/browse/HIVE-8753
Project: Hive
Issue Type: Test
Components: Logical Optimizer
Affects Versions: 0.15.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: HIVE-8753.patch

Needs a .q.out update because of HIVE-7111.
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200683#comment-14200683 ]

Prasanth J commented on HIVE-8744:

HIVE-8735 addresses the same problem. Usually the client that publishes the statistics and provides the key (FSOperator, StatsTask) has some logic to trim down the length of the key using an MD5 hash. If the key grows longer than the maximum stats key prefix length (from the Hive config), the Utilities.getHashedPrefixKey() method is invoked to get a shorter key. Can you try the patch from HIVE-8735 to see if the test case works? HIVE-8735 truncates the key before publishing.

hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
Key: HIVE-8744
URL: https://issues.apache.org/jira/browse/HIVE-8744
Project: Hive
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Sergio Peña
Assignee: Sergio Peña
Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch

This test is related to the bug HIVE-8065, where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory in the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a long path, the new path (a combination of table location + .hive-staging + random temporary subdirectories) is too long to fit into the statistics table, so the path is truncated. This causes the following error:

{noformat}
2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
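The key-shortening step mentioned in the comment on this issue (hash the key with MD5 once it exceeds the configured maximum length) can be sketched as below. StatsKeys.shorten is a hypothetical helper written for illustration; it is not the body of Utilities.getHashedPrefixKey(), whose exact behavior the message does not show. It assumes the limit is at least 32, the length of an MD5 hex digest, so the result always fits.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; it is not Hive's Utilities.getHashedPrefixKey(). */
final class StatsKeys {

    /** Returns the key unchanged if it fits, otherwise its 32-character MD5 hex digest. */
    static String shorten(String key, int maxLength) {
        if (key.length() <= maxLength) {
            return key;
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Zero-pad to 32 hex digits so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest is not available", e);
        }
    }
}
```

With a 255-character column, as in the Derby error above, a long staging path would be replaced by a fixed-length digest before publishing, so the insert no longer depends on the directory depth.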
[jira] [Commented] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin
[ https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201047#comment-14201047 ]

Prasanth J commented on HIVE-8556:

+1

introduce overflow control and sanity check to BytesBytesMapJoin
Key: HIVE-8556
URL: https://issues.apache.org/jira/browse/HIVE-8556
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
Attachments: HIVE-8556.patch

When stats are incorrect, a negative or very large number can be passed to the map.
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8744:
Resolution: Duplicate
Status: Resolved (was: Patch Available)

Thanks [~spena] for confirming! I will close this issue as a duplicate of HIVE-8735.

hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
Key: HIVE-8744
URL: https://issues.apache.org/jira/browse/HIVE-8744
Project: Hive
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Sergio Peña
Assignee: Sergio Peña
Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch

This test is related to the bug HIVE-8065, where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory in the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a long path, the new path (a combination of table location + .hive-staging + random temporary subdirectories) is too long to fit into the statistics table, so the path is truncated. This causes the following error:

{noformat}
2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201070#comment-14201070 ]

Prasanth J commented on HIVE-8735:

[~hagleitn] Can we have this for 0.14? This fixes test failures related to stats publishing. Same issue in HIVE-8744 as well.

statistics update can fail due to long paths
Key: HIVE-8735
URL: https://issues.apache.org/jira/browse/HIVE-8735
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch

{noformat}
2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 31 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown Source)
at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
[jira] [Created] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly
Prasanth J created HIVE-8771:
Summary: Abstract merge file operator does not move/rename incompatible files correctly
Key: HIVE-8771
URL: https://issues.apache.org/jira/browse/HIVE-8771
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0

AbstractFileMergeOperator moves incompatible files (files which cannot be merged) to the final destination. The destination path must be a directory instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with an IOException when renaming/moving files.
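The requirement in the description, that the destination of the move is a directory and the file keeps its own name inside it, can be illustrated on the local filesystem. This sketch uses java.nio.file rather than Hadoop's FileSystem API that Hive goes through, and MoveIntoDirectory.move is a hypothetical helper, not the patched AbstractFileMergeOperator code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Local-filesystem illustration; Hive itself uses Hadoop's FileSystem API. */
final class MoveIntoDirectory {

    /** Moves source into destDir under its own file name; destDir is created if missing. */
    static Path move(Path source, Path destDir) throws IOException {
        if (Files.exists(destDir) && !Files.isDirectory(destDir)) {
            throw new IOException("Destination exists and is not a directory: " + destDir);
        }
        Files.createDirectories(destDir);
        // Resolve the final path inside the directory instead of treating destDir as the file.
        Path target = destDir.resolve(source.getFileName());
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Checking up front that the destination is a directory turns the platform-dependent rename behavior into an explicit, early error.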
[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly
[ https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8771:
Status: Patch Available (was: Open)

Abstract merge file operator does not move/rename incompatible files correctly
Key: HIVE-8771
URL: https://issues.apache.org/jira/browse/HIVE-8771
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0
Attachments: HIVE-8771.1.patch

AbstractFileMergeOperator moves incompatible files (files which cannot be merged) to the final destination. The destination path must be a directory instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with an IOException when renaming/moving files.
[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly
[ https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8771:
Attachment: HIVE-8771.1.patch

Abstract merge file operator does not move/rename incompatible files correctly
Key: HIVE-8771
URL: https://issues.apache.org/jira/browse/HIVE-8771
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0
Attachments: HIVE-8771.1.patch

AbstractFileMergeOperator moves incompatible files (files which cannot be merged) to the final destination. The destination path must be a directory instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with an IOException when renaming/moving files.
[jira] [Updated] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8735:
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to trunk and branch-0.14.

statistics update can fail due to long paths
Key: HIVE-8735
URL: https://issues.apache.org/jira/browse/HIVE-8735
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch

{noformat}
2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 31 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown Source)
at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly
[ https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8771:
Description: AbstractFileMergeOperator moves incompatible files (files which cannot be merged) to the final destination. The destination path must be a directory instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with an IOException when renaming/moving files.

Stack trace:
{code}
2014-11-05 02:38:56,588 DEBUG fs.FileSystem (RawLocalFileSystem.java:rename(337)) - Falling through to a copy of file:/home/prasanth/hive/itests/qtest/target/warehouse/orc_merge5a/st=80.0/00_0 to file:/home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0/00_0
2014-11-05 02:38:56,589 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2014-11-05 02:38:56,590 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1144733438_0036
java.lang.Exception: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:100)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:233)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:220)
at org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
... 10 more
Caused by: java.io.FileNotFoundException: Destination exists and is not a directory: /home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:423)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:267)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:339)
at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:507)
at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
at org.apache.hadoop.fs.ProxyFileSystem.rename(ProxyFileSystem.java:177)
at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
at org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1589)
at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:218)
... 12 more
{code}

was: AbstractFileMergeOperator moves incompatible files (files which cannot be merged) to the final destination. The destination path must be a directory instead of a file. This causes orc_merge_incompat2.q to fail under CentOS with an IOException when renaming/moving files.

Abstract merge file operator does not move/rename incompatible files correctly
Key: HIVE-8771
URL: https://issues.apache.org/jira/browse/HIVE-8771
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201442#comment-14201442 ]

Prasanth J commented on HIVE-8732:

The new changes look good to me. +1. Can you create a follow-up for dealing with NaN in double column statistics?

ORC string statistics are not merged correctly
Key: HIVE-8732
URL: https://issues.apache.org/jira/browse/HIVE-8732
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch

Currently ORC's string statistics do not merge correctly, causing incorrect maximum values.
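The issue does not say what exactly went wrong in the merge, so this sketch only states the invariant a correct merge of string statistics should satisfy: the merged minimum is the smaller of the two minimums and the merged maximum is the larger of the two maximums. StringRange is a hypothetical class, not ORC's string statistics implementation, and it compares with String.compareTo, which may differ from the byte-wise ordering ORC uses.

```java
/** Hypothetical min/max holder; not ORC's string column statistics class. */
final class StringRange {
    final String min;
    final String max;

    StringRange(String min, String max) {
        this.min = min;
        this.max = max;
    }

    /** Merged range: the smaller of the minimums and the larger of the maximums. */
    StringRange merge(StringRange other) {
        String mergedMin = min.compareTo(other.min) <= 0 ? min : other.min;
        String mergedMax = max.compareTo(other.max) >= 0 ? max : other.max;
        return new StringRange(mergedMin, mergedMax);
    }
}
```

On the NaN follow-up requested above: in Java every ordered comparison involving Double.NaN evaluates to false, so comparison-based min/max logic for double statistics needs an explicit rule for NaN values.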
[jira] [Created] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null
Prasanth J created HIVE-8778:
Summary: ORC split elimination can cause NPE when column statistics is null
Key: HIVE-8778
URL: https://issues.apache.org/jira/browse/HIVE-8778
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0

Row group elimination has protection for NULL statistics values in RecordReaderImpl.evaluatePredicate(), which then calls evaluatePredicateRange(). But split elimination calls evaluatePredicateRange() directly, without NULL protection. This can lead to a NullPointerException when a column is NULL in an entire stripe.
[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null
[ https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8778:
Attachment: HIVE-8778.1.patch

[~owen.omalley]/[~gopalv] Can someone take a look at this patch?

ORC split elimination can cause NPE when column statistics is null
Key: HIVE-8778
URL: https://issues.apache.org/jira/browse/HIVE-8778
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0
Attachments: HIVE-8778.1.patch

Row group elimination has protection for NULL statistics values in RecordReaderImpl.evaluatePredicate(), which then calls evaluatePredicateRange(). But split elimination calls evaluatePredicateRange() directly, without NULL protection. This can lead to a NullPointerException when a column is NULL in an entire stripe.
[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null
[ https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8778: - Status: Patch Available (was: Open) ORC split elimination can cause NPE when column statistics is null -- Key: HIVE-8778 URL: https://issues.apache.org/jira/browse/HIVE-8778 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8778.1.patch Row group elimination has protection for NULL statistics values in RecordReaderImpl.evaluatePredicate() which then calls evaluatePredicateRange(). But split elimination directly calls evaluatePredicateRange() without NULL protection. This can lead to NullPointerException when a column is NULL in entire stripe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
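The failure mode described in HIVE-8778 can be sketched as follows. This is not the code in RecordReaderImpl; the class, the method names and the simplified equality predicate are all hypothetical. It shows a range check that assumes non-null min/max statistics, and a guarded entry point that every caller (row group elimination and split elimination alike) would go through.

```java
final class NullSafePredicateSketch {
    // Stand-in for a range evaluation that assumes statistics are present.
    // Unboxing a null Long here throws NullPointerException.
    static boolean mayContain(Long min, Long max, long literal) {
        return min <= literal && literal <= max;
    }

    // Guarded entry point. When the column has no min/max (for example because
    // every value in the stripe is NULL), this sketch conservatively keeps the
    // stripe instead of evaluating the range. That policy is an assumption.
    static boolean mayContainGuarded(Long min, Long max, long literal) {
        if (min == null || max == null) {
            return true;
        }
        return mayContain(min, max, literal);
    }

    public static void main(String[] args) {
        System.out.println(mayContainGuarded(null, null, 5L));
        System.out.println(mayContainGuarded(1L, 10L, 5L));
        System.out.println(mayContainGuarded(1L, 10L, 50L));
    }
}
```

Putting the null check in one shared entry point, rather than in each caller, avoids the situation the issue describes, where one code path is protected and another is not.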
[jira] [Created] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job
Prasanth J created HIVE-8779: Summary: Tez in-place progress UI can show wrong estimated time for AM only job Key: HIVE-8779 URL: https://issues.apache.org/jira/browse/HIVE-8779 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial The in-place progress update UI added as part of HIVE-8495 can show wrong estimated time for AM only job which goes from INITED to SUCCEEDED DAG state directly without going to RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job
[ https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8779: - Attachment: HIVE-8779.1.patch Tez in-place progress UI can show wrong estimated time for AM only job -- Key: HIVE-8779 URL: https://issues.apache.org/jira/browse/HIVE-8779 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Attachments: HIVE-8779.1.patch The in-place progress update UI added as part of HIVE-8495 can show wrong estimated time for AM only job which goes from INITED to SUCCEEDED DAG state directly without going to RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for AM only job
[ https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8779: - Status: Patch Available (was: Open) Tez in-place progress UI can show wrong estimated time for AM only job -- Key: HIVE-8779 URL: https://issues.apache.org/jira/browse/HIVE-8779 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Attachments: HIVE-8779.1.patch The in-place progress update UI added as part of HIVE-8495 can show wrong estimated time for AM only job which goes from INITED to SUCCEEDED DAG state directly without going to RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
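The issue text for HIVE-8779 does not say how the wrong time is computed, so the following is only a sketch of one way such a bug can arise and be avoided. All names are hypothetical. It assumes the elapsed time is measured from the moment the DAG is first seen in RUNNING state, and adds a fallback for DAGs that go from INITED straight to SUCCEEDED.

```java
final class DagTimerSketch {
    enum State { INITED, RUNNING, SUCCEEDED }

    private final long submittedAtMs;
    private long runningAtMs = -1; // -1 means RUNNING was never observed

    DagTimerSketch(long submittedAtMs) {
        this.submittedAtMs = submittedAtMs;
    }

    // Record the first time the DAG is observed in RUNNING state.
    void onState(State state, long nowMs) {
        if (state == State.RUNNING && runningAtMs < 0) {
            runningAtMs = nowMs;
        }
    }

    // Elapsed time since RUNNING, falling back to the submit time when the
    // DAG never entered RUNNING (the AM-only case described in the issue).
    long elapsedMs(long nowMs) {
        long start = runningAtMs >= 0 ? runningAtMs : submittedAtMs;
        return Math.max(0L, nowMs - start);
    }

    public static void main(String[] args) {
        DagTimerSketch amOnly = new DagTimerSketch(1000L);
        amOnly.onState(State.INITED, 1000L);
        amOnly.onState(State.SUCCEEDED, 1500L);
        System.out.println(amOnly.elapsedMs(1500L));
    }
}
```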
[jira] [Commented] (HIVE-8781) Nullsafe joins are busted on Tez
[ https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201711#comment-14201711 ] Prasanth J commented on HIVE-8781: -- LGTM, +1 Nullsafe joins are busted on Tez Key: HIVE-8781 URL: https://issues.apache.org/jira/browse/HIVE-8781 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8781.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8781) Nullsafe joins are busted on Tez
[ https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201712#comment-14201712 ] Prasanth J commented on HIVE-8781: -- Pending tests Nullsafe joins are busted on Tez Key: HIVE-8781 URL: https://issues.apache.org/jira/browse/HIVE-8781 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8781.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8716) Partition filters are not pushed down with lateral view
[ https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8716: - Attachment: HIVE-8716.2.patch Partition filters are not pushed down with lateral view --- Key: HIVE-8716 URL: https://issues.apache.org/jira/browse/HIVE-8716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Attachments: HIVE-8716.1.patch, HIVE-8716.2.patch Changes to HIVE-8454 revealed issues with partition filters not being pushed down in case of lateral view. For more info see discussion in HIVE-5718. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex
[ https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8727: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14 Dag summary has incorrect row counts and duration per vertex Key: HIVE-8727 URL: https://issues.apache.org/jira/browse/HIVE-8727 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8727.1.patch During the code review for HIVE-8495 some code was reworked which broke some of INPUT/OUTPUT counters and duration. Patch attached which fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198987#comment-14198987 ] Prasanth J commented on HIVE-8740: -- Ah, OK! With the Hive convention of INSERT INTO, I was thinking new rows would be appended to the existing partition rather than replacing it. Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199107#comment-14199107 ] Prasanth J commented on HIVE-8735: -- Some comments in RB statistics update can fail due to long paths Key: HIVE-8735 URL: https://issues.apache.org/jira/browse/HIVE-8735 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8735.patch
{noformat}
2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
    at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
    ... 31 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
    at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
    at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
    at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
    at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(Unknown Source)
    at
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Attachment: HIVE-8740.2.patch Thanks for the clarification [~alangates]. That was my mistake. I should have added where value = 'bar' to the predicate to get the result I was expecting. Updated the queries in this new patch. Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Description: Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such cases we don't really need sorted dynamic partition optimization. (was: Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such we don't really need sorted dynamic partition optimization.) Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. 
This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such cases we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Attachment: HIVE-8740.3.patch Added more tests to cover cases where sorted dynamic partition is enabled and constant propagation is disabled to make sure the generated plan and results are correct. Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such cases we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8747) Estimate number of rows for table with 0 rows overflows resulting in an in-efficient plan
[ https://issues.apache.org/jira/browse/HIVE-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J resolved HIVE-8747. -- Resolution: Cannot Reproduce Can't reproduce the issue. Please reopen it if the case is reproducible. Estimate number of rows for table with 0 rows overflows resulting in an in-efficient plan -- Key: HIVE-8747 URL: https://issues.apache.org/jira/browse/HIVE-8747 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Critical Fix For: 0.14.0 ship_mode table has 0 rows.
Query
{code}
select count(*)
from web_sales
    ,date_dim
    ,ship_mode
where web_sales.ws_sold_date_sk = date_dim.d_date_sk
  and web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk
  and d_year = 2002
  and sm_carrier in ('DIAMOND','AIRBORNE')
{code}
Explain
{code}
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Map 4 <- Map 3 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: mmokhtar_20141105180404_59e6fb65-529f-4eaa-9446-7f34d12bffac:30
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: ship_mode
                  filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean)
                  Statistics: Num rows: 0 Data size: 45 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean)
                    Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: sm_ship_mode_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0
                          1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        input vertices:
                          0 Map 4
                        Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                        Select Operator
                          Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                          Group By Operator
                            aggregations: count()
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col0 (type: bigint)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2002) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((d_year = 2002) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Attachment: HIVE-8740.4.patch Rebased patch after HIVE-8716 commit. Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, HIVE-8740.4.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such cases we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Resolution: Fixed Fix Version/s: 0.15.0 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14 Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, HIVE-8740.4.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such cases we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
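The description of HIVE-8740 can be made concrete with a small sketch. It is not the committed patch, and the class, method and parameter names are hypothetical. It assumes the optimization locates partition columns by position at the end of the parent operator's column list, and shows a guard that skips the optimization when constant folding and column pruning have removed those trailing columns, instead of indexing past the end of the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class SortedDynPartSketch {
    // Returns the positions of the trailing partition columns, or empty when
    // the parent operator no longer carries them and the optimization should
    // simply be skipped.
    static Optional<List<Integer>> partitionPositions(
            List<String> parentColumns, int dataColumnCount, int partitionColumnCount) {
        if (parentColumns.size() != dataColumnCount + partitionColumnCount) {
            return Optional.empty();
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = dataColumnCount; i < parentColumns.size(); i++) {
            positions.add(i);
        }
        return Optional.of(positions);
    }

    public static void main(String[] args) {
        // Partition column "ds" still present as the last column.
        System.out.println(partitionPositions(Arrays.asList("key", "value", "ds"), 2, 1));
        // "ds" was folded to a constant and pruned: skip the optimization.
        System.out.println(partitionPositions(Arrays.asList("key", "value"), 2, 1));
    }
}
```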
[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199705#comment-14199705 ] Prasanth J commented on HIVE-8735: -- +1. It would be good if you could add some tests. statistics update can fail due to long paths Key: HIVE-8735 URL: https://issues.apache.org/jira/browse/HIVE-8735 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch
{noformat}
2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics.
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
    at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
    ... 31 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
    at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
    at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
    at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown Source)
    at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
    at
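The trace above shows a path-derived key being rejected because it exceeds a VARCHAR(255) column. One general way to avoid this class of error, sketched below, is to replace any over-long key with a fixed-length digest before publishing it. This is not the HIVE-8735 patch; the class name, the 255 constant taken from the error message, and the choice of SHA-256 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StatsKeySketch {
    // Column width reported in the truncation error.
    static final int MAX_KEY_LENGTH = 255;

    // Returns the key unchanged when it fits, otherwise a 64-character hex
    // SHA-256 digest of it. The same long key always maps to the same digest,
    // so publisher and aggregator still agree on the key.
    static String fitKey(String key) {
        if (key.length() <= MAX_KEY_LENGTH) {
            return key;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder longKey = new StringBuilder("pfile:/");
        for (int i = 0; i < 300; i++) {
            longKey.append('a');
        }
        System.out.println(fitKey("short/key").length());
        System.out.println(fitKey(longKey.toString()).length());
    }
}
```

The trade-off is that a hashed key is no longer human-readable, so prefix-based lookups on the stored key stop working for the hashed entries.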
[jira] [Commented] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196623#comment-14196623 ] Prasanth J commented on HIVE-8720: -- [~hagleitn] Can we have this for 0.14? These are just test file diffs to make the qfile results consistent across platforms. Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt The orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with openSUSE and CentOS. The order in which rows are inserted into an ORC table affects the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases only test whether the files are merged, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize), we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
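The point about row order and run-length encoding can be shown with a deliberately simplified sketch. ORC's actual integer encodings are more elaborate than this; the class and method below are hypothetical and only count how many (value, run length) pairs a basic run-length encoder would emit for the same values in two different orders.

```java
final class RleOrderSketch {
    // Number of runs of equal adjacent values, which is the number of
    // (value, runLength) pairs a basic run-length encoder would write.
    static int runCount(int[] values) {
        if (values.length == 0) {
            return 0;
        }
        int runs = 1;
        for (int i = 1; i < values.length; i++) {
            if (values[i] != values[i - 1]) {
                runs++;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        int[] grouped = {1, 1, 1, 2, 2, 2};
        int[] interleaved = {1, 2, 1, 2, 1, 2};
        // Same multiset of values, different encoded size.
        System.out.println(runCount(grouped));
        System.out.println(runCount(interleaved));
    }
}
```

Because the encoded size depends on arrival order, a test that asserts on fileSize is inherently platform-sensitive, while a test that asserts only on the number of files is not.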
[jira] [Updated] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex
[ https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8727: - Status: Patch Available (was: Open) Dag summary has incorrect row counts and duration per vertex Key: HIVE-8727 URL: https://issues.apache.org/jira/browse/HIVE-8727 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8727.1.patch During the code review for HIVE-8495 some code was reworked which broke some of INPUT/OUTPUT counters and duration. Patch attached which fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex
[ https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196665#comment-14196665 ] Prasanth J commented on HIVE-8727: -- +1 Dag summary has incorrect row counts and duration per vertex Key: HIVE-8727 URL: https://issues.apache.org/jira/browse/HIVE-8727 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8727.1.patch During the code review for HIVE-8495 some code was reworked which broke some of INPUT/OUTPUT counters and duration. Patch attached which fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8727) Dag summary has incorrect row counts and duration per vertex
[ https://issues.apache.org/jira/browse/HIVE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196697#comment-14196697 ] Prasanth J commented on HIVE-8727: -- [~hagleitn] HIVE-8495 broke output of dag summary. Can we have this for 0.14? Dag summary has incorrect row counts and duration per vertex Key: HIVE-8727 URL: https://issues.apache.org/jira/browse/HIVE-8727 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8727.1.patch During the code review for HIVE-8495 some code was reworked which broke some of INPUT/OUTPUT counters and duration. Patch attached which fixes that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197045#comment-14197045 ] Prasanth J commented on HIVE-8732: -- LGTM, +1. Pending tests ORC string statistics are not merged correctly -- Key: HIVE-8732 URL: https://issues.apache.org/jira/browse/HIVE-8732 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8732.patch Currently ORC's string statistics do not merge correctly causing incorrect maximum values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
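For reference, a correct merge of string min/max statistics has a simple shape, sketched below. This is not ORC's implementation and the names are hypothetical. It uses String.compareTo, which orders by UTF-16 code units; a real writer may order by encoded bytes instead, so treat the comparison rule as an assumption. A null bound stands for "no values seen".

```java
final class StringStatsMergeSketch {
    final String min; // null when no values were recorded
    final String max; // null when no values were recorded

    StringStatsMergeSketch(String min, String max) {
        this.min = min;
        this.max = max;
    }

    // Merged minimum is the smaller of the two minimums, merged maximum the
    // larger of the two maximums; an empty side contributes nothing.
    StringStatsMergeSketch merge(StringStatsMergeSketch other) {
        return new StringStatsMergeSketch(
                pick(min, other.min, true),
                pick(max, other.max, false));
    }

    private static String pick(String a, String b, boolean wantSmaller) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        boolean aIsSmaller = a.compareTo(b) <= 0;
        return (aIsSmaller == wantSmaller) ? a : b;
    }

    public static void main(String[] args) {
        StringStatsMergeSketch left = new StringStatsMergeSketch("apple", "mango");
        StringStatsMergeSketch right = new StringStatsMergeSketch("banana", "zebra");
        StringStatsMergeSketch merged = left.merge(right);
        System.out.println(merged.min + " .. " + merged.max);
    }
}
```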
[jira] [Created] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
Prasanth J created HIVE-8740: Summary: Sorted dynamic partition does not work correctly with constant folding Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per hive convention it expects partition columns at the last. But with HIVE-8585 equality filters on partition columns gets folded to constant. The column pruner then prunes the constant expression as they don't reference any columns. This in some cases will yield unexpected results (throw ArrayIndexOutOfBounds exception) with sorted dynamic partition insert optimization. In such we don't really need sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Attachment: HIVE-8740.1.patch Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per Hive convention, it expects the partition columns to come last. But with HIVE-8585, equality filters on partition columns get folded to constants. The column pruner then prunes the constant expressions, as they don't reference any columns. In some cases this yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with the sorted dynamic partition insert optimization. In such cases we don't really need the sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197718#comment-14197718 ] Prasanth J commented on HIVE-8740: -- [~alangates] Can you look at the test case that I added in this patch? It's related to the ACID DELETE operation. After deleting a newly added row, the select count(*) query returns 0 rows instead of the actual 1000 rows. Is this a bug/known issue or am I doing something wrong? Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per Hive convention, it expects the partition columns to come last. But with HIVE-8585, equality filters on partition columns get folded to constants. The column pruner then prunes the constant expressions, as they don't reference any columns. In some cases this yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with the sorted dynamic partition insert optimization. In such cases we don't really need the sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding
[ https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8740: - Status: Patch Available (was: Open) Sorted dynamic partition does not work correctly with constant folding -- Key: HIVE-8740 URL: https://issues.apache.org/jira/browse/HIVE-8740 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8740.1.patch Sorted dynamic partition optimization looks for partition columns from the operator above FileSinkOperator. As per Hive convention, it expects the partition columns to come last. But with HIVE-8585, equality filters on partition columns get folded to constants. The column pruner then prunes the constant expressions, as they don't reference any columns. In some cases this yields unexpected results (an ArrayIndexOutOfBounds exception is thrown) with the sorted dynamic partition insert optimization. In such cases we don't really need the sorted dynamic partition optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8716) Partition filters are not pushed down with lateral view
[ https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197744#comment-14197744 ] Prasanth J commented on HIVE-8716: -- I will look into the test failures. Partition filters are not pushed down with lateral view --- Key: HIVE-8716 URL: https://issues.apache.org/jira/browse/HIVE-8716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Attachments: HIVE-8716.1.patch Changes to HIVE-8454 revealed issues with partition filters not being pushed down in case of lateral view. For more info see discussion in HIVE-5718. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8720: - Resolution: Fixed Fix Version/s: 0.15.0 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14. Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194817#comment-14194817 ] Prasanth J commented on HIVE-5718: -- Alternatively we can just pull out the fix for LV PPD from this patch to a new smaller patch. Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-5718: - Attachment: HIVE-5718.diff-v11-v12.patch This is the diff between [~navis]'s v11 and v12 of the patch that fixes PPD with LV. [~ashutoshc] can you take a look? Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt, HIVE-5718.diff-v11-v12.patch Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8716) Partition filters are not pushed down with lateral view
Prasanth J created HIVE-8716: Summary: Partition filters are not pushed down with lateral view Key: HIVE-8716 URL: https://issues.apache.org/jira/browse/HIVE-8716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Changes to HIVE-8454 revealed issues with partition filters not being pushed down in case of lateral view. For more info see discussion in HIVE-5718. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8716) Partition filters are not pushed down with lateral view
[ https://issues.apache.org/jira/browse/HIVE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8716: - Attachment: HIVE-8716.1.patch This patch is generated from diff of v11 and v12 patches from HIVE-5718 which seems to fix the issue. Partition filters are not pushed down with lateral view --- Key: HIVE-8716 URL: https://issues.apache.org/jira/browse/HIVE-8716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Attachments: HIVE-8716.1.patch Changes to HIVE-8454 revealed issues with partition filters not being pushed down in case of lateral view. For more info see discussion in HIVE-5718. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195165#comment-14195165 ] Prasanth J commented on HIVE-5718: -- Created HIVE-8716 to address PPD issue with LV. Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt, HIVE-5718.diff-v11-v12.patch Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8720) Update orc_merge tests to make it consistent across OSes
Prasanth J created HIVE-8720: Summary: Update orc_merge tests to make it consistent across OSes Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
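To see why row order alone can change the encoded size, here is a minimal sketch using a textbook run-length encoder. It is an illustration only: ORC's real integer encodings are more elaborate, and the function name and sample data are assumptions made for this example.

```python
def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


# The same six rows, arriving in two different orders.
rows_clustered = [1, 1, 1, 2, 2, 2]
rows_interleaved = [1, 2, 1, 2, 1, 2]

print(len(run_length_encode(rows_clustered)), "runs when clustered")
print(len(run_length_encode(rows_interleaved)), "runs when interleaved")
```

Both inputs hold identical data, yet the number of runs (and hence the encoded size) differs, which is why a test that asserts on fileSize is sensitive to insertion order while a test that counts files is not.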
[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8720: - Summary: Update orc_merge tests to make it consistent across OS'es (was: Update orc_merge tests to make it consistent across OSes) Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8720: - Attachment: orc_merge5_filedump_opensuse.txt orc_merge5_filedump_macosx.txt Attaching ORC file dumps for the orc_merge5.q test case run on Mac OS X and OpenSUSE. As we can see from the row index statistics of stripes 1 and 2, the order of rows was different (stripe 1 on Mac OS X ended up as stripe 2 on OpenSUSE). Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8720: - Status: Patch Available (was: Open) Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
[ https://issues.apache.org/jira/browse/HIVE-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8720: - Attachment: HIVE-8720.1.patch Update orc_merge tests to make it consistent across OS'es - Key: HIVE-8720 URL: https://issues.apache.org/jira/browse/HIVE-8720 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8720.1.patch, orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt orc_merge*.q test cases fail with qfile diffs related to file size on different OSes. I have seen failures with Open SUSE and CentOS. The order in which rows are inserted into an ORC table impacts the file size because of run-length encoding. Since the order of rows is not guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY to the insert queries, as it would force insertion through a single reducer, which would disable the ORC merge file optimization. Since these test cases test whether the files are merged or not, it is sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows numFiles and fileSize) we can use dfs -ls to get the number of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8495) Add progress bar for Hive on Tez queries
[ https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193731#comment-14193731 ] Prasanth J commented on HIVE-8495: -- Looks great! But STATUS columns shouldn't be left aligned. sigh! Add progress bar for Hive on Tez queries Key: HIVE-8495 URL: https://issues.apache.org/jira/browse/HIVE-8495 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png, ux-demo.gif Build a Progress bar to provide overall progress on running tasks. Progress is calculated as : (Completed tasks) / (Total number of tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
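The progress formula in the HIVE-8495 description, (Completed tasks) / (Total number of tasks), can be sketched as follows. This is not the patch's code: the function names, the per-vertex dictionary, its counts, and the bar layout are all assumptions chosen for illustration.

```python
def overall_progress(vertex_tasks):
    """Fraction of completed tasks across all vertices.

    vertex_tasks maps a vertex name to (completed_tasks, total_tasks).
    """
    completed = sum(done for done, _ in vertex_tasks.values())
    total = sum(total for _, total in vertex_tasks.values())
    if total <= 0:
        return 0.0  # nothing scheduled yet; avoid dividing by zero
    return completed / total


def render_bar(fraction, width=20):
    """Render a fixed-width text bar such as [=====     ] 50%."""
    filled = int(fraction * width)
    return "[" + "=" * filled + " " * (width - filled) + "] " + f"{fraction:.0%}"


# Hypothetical snapshot of a running DAG.
snapshot = {"Map 1": (31, 62), "Reducer 2": (0, 2)}
print(render_bar(overall_progress(snapshot)))
```

Summing over vertices before dividing weights each vertex by its task count, so a vertex with a single task does not move the bar as much as one with dozens.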
[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries
[ https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8495: - Attachment: HIVE-8495.10.patch Fixed alignment of STATUS column. Addressed review comments. Add progress bar for Hive on Tez queries Key: HIVE-8495 URL: https://issues.apache.org/jira/browse/HIVE-8495 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8495.1.patch, HIVE-8495.10.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png, ux-demo.gif Build a Progress bar to provide overall progress on running tasks. Progress is calculated as : (Completed tasks) / (Total number of tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8495) Add progress bar for Hive on Tez queries
[ https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193991#comment-14193991 ] Prasanth J commented on HIVE-8495: -- Test failures look unrelated. Add progress bar for Hive on Tez queries Key: HIVE-8495 URL: https://issues.apache.org/jira/browse/HIVE-8495 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-8495.1.patch, HIVE-8495.10.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png, ux-demo.gif Build a progress bar to provide overall progress on running tasks. Progress is calculated as: (Completed tasks) / (Total number of tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194220#comment-14194220 ] Prasanth J commented on HIVE-5718: -- [~navis] Sorry about that. I will look into it now and see what the issue is. Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8689) handle overflows in statistics better
[ https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8689: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch-0.14. Thanks [~sershe]! handle overflows in statistics better - Key: HIVE-8689 URL: https://issues.apache.org/jira/browse/HIVE-8689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch Improve overflow checks in StatsAnnotation optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
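As an illustration of what an overflow check on a statistics estimate can look like, the sketch below clamps a product at the largest signed 64-bit value instead of letting it wrap. This is one common approach, shown under stated assumptions: it is not taken from the HIVE-8689 patch, the function names are hypothetical, and Python integers (which never overflow) are used to emulate Java `long` behavior.

```python
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def wrapping_mult(a, b):
    """Emulate an unchecked 64-bit multiplication (two's-complement wraparound)."""
    result = (a * b) & (2**64 - 1)
    return result - 2**64 if result >= 2**63 else result


def safe_mult(a, b):
    """Multiply two non-negative estimates, clamping at LONG_MAX on overflow."""
    result = a * b
    return LONG_MAX if result > LONG_MAX else result


rows = 21594638446          # a large row-count estimate
multiplier = 10**10         # hypothetical join fan-out
print("unchecked:", wrapping_mult(rows, multiplier))
print("clamped:  ", safe_mult(rows, multiplier))
```

A clamped estimate is still wrong in absolute terms, but it stays a large positive number, so anything derived from it (such as a parallelism decision) at least moves in the right direction.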
[jira] [Updated] (HIVE-8454) Select Operator does not rename column stats properly in case of select star
[ https://issues.apache.org/jira/browse/HIVE-8454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8454: - Attachment: (was: HIVE-8474.7.patch) Select Operator does not rename column stats properly in case of select star Key: HIVE-8454 URL: https://issues.apache.org/jira/browse/HIVE-8454 Project: Hive Issue Type: Sub-task Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8454.1.patch, HIVE-8454.2.patch, HIVE-8454.3.patch, HIVE-8454.3.patch, HIVE-8454.4.patch, HIVE-8454.5.patch, HIVE-8454.6.patch, HIVE-8454.7.patch The estimated data size of some Select Operators is 0. BytesBytesHashMap uses data size to determine the estimated initial number of entries in the hashmap. If this data size is 0 then exception is thrown (refer below) Query {code} select count(*) from store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 
ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 and d1.d_year = 2001; {code} {code} ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:93) at
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194300#comment-14194300 ] Prasanth J commented on HIVE-5718: -- [~hagleitn] I think it's good to have this in 0.14. As [~navis] mentioned, HIVE-8454 revealed a problem with predicates not getting pushed down (PPD) with lateral views. This patch has a fix for it. Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194301#comment-14194301 ] Prasanth J commented on HIVE-5718: -- [~navis] Thanks for the fix! Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.10.patch.txt, HIVE-5718.11.patch.txt, HIVE-5718.12.patch.txt, HIVE-5718.13.patch.txt, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8521) Document the ORC format
[ https://issues.apache.org/jira/browse/HIVE-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194323#comment-14194323 ] Prasanth J commented on HIVE-8521: -- [~owen.omalley] I took a pass over the document. It mostly looks good. A few things: 1) Section 4.4: "Runs start with an initial byte of 0x00 to 0xf7." Shouldn't it be 0x7f? 2) Section 4.5.1: "encoded if they type is signed" should be "the type". 3) Section 4.5.2: DEAD BEEF hex code :) 4) Section 4.5.3: I think we should revert the percentile back to 95. Since we only have 5 bits for the patch length, we will not be able to encode lengths > 32, which could happen if we consider the 90th percentile (512 * 0.1 = 51 elements can be patched). 5) Section 5: The default stripe size is now 64MB. Do we need to mention that in this section? 6) Section 5.1: DICTIONARY_DATA, DIRECT_V2, DICTIONARY_V2 have a stray \ before _. 7) Section 5.2.7: "definition was change" should be "changed". Document the ORC format --- Key: HIVE-8521 URL: https://issues.apache.org/jira/browse/HIVE-8521 Project: Hive Issue Type: Bug Components: Documentation, File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: orc-spec.pdf It is past time that we document the ORC file format. I've started and should have a first pass this week. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
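The arithmetic behind review point 4 can be checked with a short sketch. It assumes a run of 512 values (the figure used in the comment) and assumes the 5-bit field stores the patch count directly, which gives a maximum of 31; the exact bound depends on how the field is encoded, and the function names are hypothetical.

```python
def max_encodable(bits):
    """Largest count a field of the given width can hold, if stored directly."""
    return 2**bits - 1


def max_patched(run_length, percentile):
    """Upper bound on values lying above the given percentile in a run."""
    return int(run_length * (100 - percentile) / 100)


limit = max_encodable(5)
for percentile in (90, 95):
    patched = max_patched(512, percentile)
    verdict = "fits" if patched <= limit else "does not fit"
    print(f"{percentile}th percentile: up to {patched} patches, {verdict} in 5 bits")
```

With the 90th percentile up to 51 values may need patching, which exceeds what 5 bits can express, whereas the 95th percentile leaves at most 25.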
[jira] [Commented] (HIVE-8671) Overflow in estimate row count and data size with fetch column stats
[ https://issues.apache.org/jira/browse/HIVE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193351#comment-14193351 ] Prasanth J commented on HIVE-8671: -- [~hagleitn] Can we have this in 0.14? Overflow in estimate row count and data size with fetch column stats Key: HIVE-8671 URL: https://issues.apache.org/jira/browse/HIVE-8671 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8671.1.patch, HIVE-8671.2.patch, HIVE-8671.3.patch, HIVE-8671.4.patch, HIVE-8671.5.patch Overflow in row counts and data size for several TPC-DS queries. Interestingly, the operators that overflow end up running with low parallelism. For instance, Reducer 2 has an overflow but it only runs with a parallelism of 2. {code} Reducer 2 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: float) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col3 (type: string), _col3 (type: string) sort order: ++ Map-reduce partition columns: _col3 (type: string) Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775341 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: float), _col5 (type: double) Execution mode: vectorized {code} {code} VERTEX TOTAL_TASKS DURATION_SECONDS CPU_TIME_MILLIS INPUT_RECORDS OUTPUT_RECORDS Map 1 62 26.41 1,779,510 211,978,502 60,628,390 Map 5 14.28 6,950 138,098 138,098 Map 6 12.44 3,910 31 31 Reducer 2 2 22.69 61,320 60,628,390 69,182 Reducer 3 
12.63 3,910 69,182 100 Reducer 4 11.01 1,180 100 100 {code} Query {code} explain select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio from web_sales ,item ,date_dim where web_sales.ws_item_sk = item.i_item_sk and item.i_category in ('Jewelry', 'Sports', 'Books') and web_sales.ws_sold_date_sk = date_dim.d_date_sk and date_dim.d_date between '2001-01-12' and '2001-02-11' group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 100 {code} Explain {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) Reducer 4 - Reducer 3 (SIMPLE_EDGE) DagName: mmokhtar_20141019164343_854cb757-01bd-40cb-843e-9ada7c5e6f38:1 Vertices: Map 1 Map Operator Tree: TableScan alias: web_sales filterExpr: ws_item_sk is not null (type: boolean) Statistics: Num rows: 21594638446 Data size: 2850189889652 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ws_item_sk is not null (type: boolean) Statistics: Num rows: 21594638446 Data size: 172746300152 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_item_sk (type: int), ws_ext_sales_price (type: float), ws_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 21594638446
[jira] [Updated] (HIVE-8671) Overflow in estimate row count and data size with fetch column stats
[ https://issues.apache.org/jira/browse/HIVE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-8671:
--
Resolution: Fixed
Fix Version/s: 0.15.0
Status: Resolved (was: Patch Available)

Patch committed to trunk and branch-0.14.

Overflow in estimate row count and data size with fetch column stats
Key: HIVE-8671
URL: https://issues.apache.org/jira/browse/HIVE-8671
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
Priority: Critical
Fix For: 0.14.0, 0.15.0
Attachments: HIVE-8671.1.patch, HIVE-8671.2.patch, HIVE-8671.3.patch, HIVE-8671.4.patch, HIVE-8671.5.patch
[jira] [Commented] (HIVE-8689) handle overflows in statistics better
[ https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193394#comment-14193394 ] Prasanth J commented on HIVE-8689: -- [~sershe] HIVE-8671 is committed now. Can you rebase this patch? Also, can you fix Mostafa's change to reducer estimation? It estimates one reducer fewer than the previous code. For example, if totalInputFileSize is 140 and bytesPerReducer is 100, the current change yields just 1 reducer where 2 are needed. We should use either Math.ceil or Math.max(totalInputFileSize, totalInputFileSize + bytesPerReducer - 1)/bytesPerReducer. handle overflows in statistics better - Key: HIVE-8689 URL: https://issues.apache.org/jira/browse/HIVE-8689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8689.01.patch, HIVE-8689.patch Improve overflow checks in StatsAnnotation optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
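The rounding issue raised in the comment above can be illustrated with a minimal Java sketch. This is not the code from the HIVE-8689 patch: the class name, `ceilDiv`, `estimateReducers`, and the `maxReducers` cap are hypothetical names chosen for illustration. It rounds up with a quotient-and-remainder check, which avoids adding `bytesPerReducer - 1` to the input size and so cannot wrap around for very large sizes.

```java
// Illustrative sketch only; names are assumptions, not Hive's actual API.
final class ReducerEstimateSketch {

    // Ceiling division for a non-negative total and a positive divisor.
    // Uses quotient and remainder, so no intermediate sum can overflow.
    static long ceilDiv(long total, long per) {
        if (total < 0 || per <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and per must be > 0");
        }
        long quotient = total / per;
        return (total % per == 0) ? quotient : quotient + 1;
    }

    // Estimates the reducer count: at least 1, at most maxReducers (hypothetical cap).
    static int estimateReducers(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        long reducers = Math.max(1L, ceilDiv(totalInputFileSize, bytesPerReducer));
        return (int) Math.min(reducers, (long) maxReducers);
    }

    public static void main(String[] args) {
        // The example from the comment: 140 bytes at 100 bytes per reducer.
        System.out.println(estimateReducers(140L, 100L, 999));
        // A saturated size estimate is still capped by maxReducers.
        System.out.println(estimateReducers(Long.MAX_VALUE, 100L, 999));
    }
}
```

With plain integer division, 140 / 100 truncates to 1, which is the one-reducer-short behaviour described in the comment.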
[jira] [Commented] (HIVE-8689) handle overflows in statistics better
[ https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193521#comment-14193521 ] Prasanth J commented on HIVE-8689: -- +1 handle overflows in statistics better - Key: HIVE-8689 URL: https://issues.apache.org/jira/browse/HIVE-8689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch Improve overflow checks in StatsAnnotation optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8689) handle overflows in statistics better
[ https://issues.apache.org/jira/browse/HIVE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193522#comment-14193522 ] Prasanth J commented on HIVE-8689: -- [~sershe] Minor nit: can you remove the getMaxIfOverflow() method? Since we are now using the safeAdd and safeMultiply methods, we don't need it anymore. handle overflows in statistics better - Key: HIVE-8689 URL: https://issues.apache.org/jira/browse/HIVE-8689 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8689.01.patch, HIVE-8689.02.patch, HIVE-8689.patch Improve overflow checks in StatsAnnotation optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
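For readers unfamiliar with the idea behind the safeAdd and safeMultiply methods mentioned above, here is a minimal sketch of saturating arithmetic in Java. It is an assumption about the general technique, not the implementation in the patch: the class name is hypothetical, it relies on `Math.addExact` and `Math.multiplyExact` (Java 8 and later), and it assumes non-negative operands, as row counts and data sizes are.

```java
// Illustrative sketch only; not the code from the HIVE-8689 patch.
final class SaturatingMathSketch {

    // Adds two non-negative longs, clamping to Long.MAX_VALUE on overflow.
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    // Multiplies two non-negative longs, clamping to Long.MAX_VALUE on overflow.
    static long safeMultiply(long a, long b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        // Row count times data size from the HIVE-8671 plan exceeds the long range.
        System.out.println(safeMultiply(21594638446L, 2850189889652L));
        System.out.println(safeAdd(Long.MAX_VALUE, 1L));
    }
}
```

Because every operation clamps at the point where it would overflow, a separate after-the-fact check such as getMaxIfOverflow() has nothing left to correct.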
[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries
[ https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8495: - Attachment: HIVE-8495.8.patch Add progress bar for Hive on Tez queries Key: HIVE-8495 URL: https://issues.apache.org/jira/browse/HIVE-8495 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png Build a progress bar to show overall progress across running tasks. Progress is calculated as: (Completed tasks) / (Total number of tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8495) Add progress bar for Hive on Tez queries
[ https://issues.apache.org/jira/browse/HIVE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8495: - Attachment: HIVE-8495.9.patch Vertex status information fixes. Add progress bar for Hive on Tez queries Key: HIVE-8495 URL: https://issues.apache.org/jira/browse/HIVE-8495 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-8495.1.patch, HIVE-8495.2.patch, HIVE-8495.3.patch, HIVE-8495.4.patch, HIVE-8495.5.patch, HIVE-8495.6.patch, HIVE-8495.7.patch, HIVE-8495.8.patch, HIVE-8495.9.patch, Screen Shot 2014-10-16 at 9.35.26 PM.png, Screen Shot 2014-10-22 at 11.48.57 AM.png, in-place-progress-update.png Build a progress bar to show overall progress across running tasks. Progress is calculated as: (Completed tasks) / (Total number of tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
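The progress formula in the issue description, (Completed tasks) / (Total number of tasks), can be sketched in a few lines of Java. This is a hypothetical illustration, not the code attached to HIVE-8495: the class name, method names, and bar layout are all assumptions. It uses integer arithmetic and treats a query with no known tasks as 0% complete.

```java
// Illustrative sketch only; not the HIVE-8495 implementation.
final class ProgressBarSketch {

    // Percentage of completed tasks, clamped to the range 0..100.
    static int percent(int completedTasks, int totalTasks) {
        if (totalTasks <= 0) {
            return 0; // nothing scheduled yet, so report no progress
        }
        int done = Math.max(0, Math.min(completedTasks, totalTasks));
        return (int) ((long) done * 100 / totalTasks);
    }

    // Renders a fixed-width text bar followed by the percentage.
    static String render(int completedTasks, int totalTasks, int width) {
        int pct = percent(completedTasks, totalTasks);
        int filled = pct * width / 100;
        StringBuilder bar = new StringBuilder("[");
        for (int i = 0; i < width; i++) {
            bar.append(i < filled ? '=' : ' ');
        }
        return bar.append("] ").append(pct).append('%').toString();
    }

    public static void main(String[] args) {
        System.out.println(render(5, 10, 10));
        System.out.println(render(10, 10, 10));
    }
}
```

Summing completed and total tasks across all vertices before calling `percent` gives the overall figure the description asks for.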