[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969878#action_12969878 ]

Paul Butler commented on HIVE-1648:
-----------------------------------

Added an SVN patch (5) that applies to the latest trunk.

Automatically gathering stats when reading a table/partition
Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, HIVE-1648.5.patch, HIVE-1648.patch, hive-1648.svn.patch

HIVE-1361 introduces a new command, 'ANALYZE TABLE T COMPUTE STATISTICS', for gathering stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (provided there is no LIMIT operator).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
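The idea in the issue description, counting rows as a side effect of a scan the query already performs instead of running a separate ANALYZE TABLE pass, and skipping it under LIMIT, can be sketched as follows. This is a minimal illustration under stated assumptions, not Hive's TableScanOperator: the class `CountingScan` and its methods are hypothetical, and the only statistic modeled is a row count.

```java
import java.util.Iterator;
import java.util.OptionalLong;
import java.util.function.Consumer;

// Hypothetical sketch of piggy-backing a row count on a table scan.
// Not Hive's TableScanOperator; all names here are illustrative only.
class CountingScan<R> {
    private final Integer limit; // null when the query has no LIMIT
    private long rowCount = 0;   // rows forwarded downstream so far

    CountingScan(Integer limit) {
        this.limit = limit;
    }

    // Forward rows downstream, counting them as a side effect of the scan.
    void scan(Iterator<R> rows, Consumer<R> downstream) {
        while (rows.hasNext()) {
            if (limit != null && rowCount >= limit) {
                return; // LIMIT reached: the rest of the input is never read
            }
            downstream.accept(rows.next());
            rowCount++;
        }
    }

    // The count is a valid table/partition statistic only if the scan was
    // not cut short by a LIMIT, so nothing is published in that case.
    OptionalLong publishableRowCount() {
        return limit == null ? OptionalLong.of(rowCount) : OptionalLong.empty();
    }
}
```

The design choice is to refuse to publish anything under LIMIT rather than publish a partial count, since a partial count would silently understate the table size.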
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970009#action_12970009 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

1. QBParseInfo: add setDestToLimit() for symmetry.
2. I am not sure any of your tests are working: set hive.stats.autogather = false before you create the tables for which you want the stats to be populated while reading. Clearly, this is the reason why piggyback_part.q is working.
3. piggyback_join.q
   Currently ends with:
     show table extended like piggy_table3;
     drop table piggy_table;
   Add:
     show table extended like piggy_table1;
     show table extended like piggy_table2;
   Also, add a test where you are joining:
     piggyback_table1 a join piggyback_table2 b on a.key = b.key join piggyback_table3 c on b.key = c.key
   and then run show table extended on all 3 tables.
4. piggyback_limit.q: add
     show table extended like piggy_table1;
   before the end. It should have no stats.
5. piggyback_subq.q and piggyback_union.q are wrong: you need to create new tables and then run show table extended on them at the end, just like the other tests.
6.
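The review's point about setting hive.stats.autogather = false before creating the test tables can be checked with a toy model: if stats are written when the table is loaded, they already exist before any read, so a read-path test passes whether or not piggy-backing works. The sketch below is hypothetical (a `Map` stands in for the metastore and every name is invented); it only encodes the expected test outcomes, including that a LIMIT query leaves the table without stats.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the metastore for reasoning about the piggyback tests.
// Hypothetical: a map of table name -> row count stands in for real stats.
class ToyMetastore {
    private final Map<String, Long> numRows = new HashMap<>();
    boolean autogatherOnWrite = true; // stand-in for hive.stats.autogather

    // Creating and loading a table records stats only if autogather is on.
    void createAndLoad(String table, long rows) {
        if (autogatherOnWrite) {
            numRows.put(table, rows);
        }
    }

    // A read piggy-backs stats unless the query has a LIMIT.
    void read(String table, long rowsScanned, boolean hasLimit) {
        if (!hasLimit) {
            numRows.put(table, rowsScanned);
        }
    }

    boolean hasStats(String table) {
        return numRows.containsKey(table);
    }

    long rowCount(String table) {
        return numRows.get(table);
    }
}
```

With the default `autogatherOnWrite = true`, `hasStats` is already true right after `createAndLoad`, so the read-path assertions would prove nothing.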
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968841#action_12968841 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

@Yongqiang, you missed the test changes in the patch. Can you add them as well?
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1290#action_1290 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

I don't see any new tests.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966354#action_12966354 ]

Paul Butler commented on HIVE-1648:
-----------------------------------

Changes made. Note that subqueries are not piggybacked, but there are tests to make sure they still run when hive.stats.autogather=true.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935551#action_12935551 ]

Paul Butler commented on HIVE-1648:
-----------------------------------

Namit, it looks like show table extended like `table_name`; doesn't print the number of rows. Unless there's a way to make it do that, I'll have to stick with desc extended. I also sent you an email asking for clarification on the ConditionalTasks.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934619#action_12934619 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

I haven't looked at the code yet, but here are my comments on the tests.

Instead of:
  desc extended table_name
in the tests, please use:
  show table extended like `table_name`;
This prints the stats on a new line, so they can be compared easily. The non-deterministic stats are ignored.

Add a test with a LIMIT in the sub-query.

Don't select from the existing tables (src/src1) in your stats tests. Create new tables and then set hive.stats.autogather.read to true. This way, you can be sure that the remaining tests will not be affected.

Add another test for a 3-way join where the join keys are not the same, something like:
  select .. from A join B on A.key1 = B.key1 join C on B.key2 = C.key2 where
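The request for a 3-way join with different join keys is worth a separate test because of how Hive plans joins: the Hive join documentation states that a multi-table join becomes a single map/reduce job only if every table uses the same column in its join clauses, and otherwise becomes several jobs. Our reading (not stated in the thread) is that the table scans then end up in different tasks, which is the case the new test exercises. The sketch below encodes a simplified form of that rule; `JoinClause` and `fitsInOneJob` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One equi-join condition: leftTable.leftCol = rightTable.rightCol.
final class JoinClause {
    final String leftTable, leftCol, rightTable, rightCol;

    JoinClause(String leftTable, String leftCol, String rightTable, String rightCol) {
        this.leftTable = leftTable;
        this.leftCol = leftCol;
        this.rightTable = rightTable;
        this.rightCol = rightCol;
    }
}

final class JoinPlanSketch {
    // Simplified form of Hive's documented rule: a multi-way join runs as a
    // single map/reduce job only if every table uses the same column in all
    // of the join clauses it appears in.
    static boolean fitsInOneJob(List<JoinClause> clauses) {
        Map<String, Set<String>> colsPerTable = new HashMap<>();
        for (JoinClause c : clauses) {
            colsPerTable.computeIfAbsent(c.leftTable, t -> new HashSet<>()).add(c.leftCol);
            colsPerTable.computeIfAbsent(c.rightTable, t -> new HashSet<>()).add(c.rightCol);
        }
        for (Set<String> cols : colsPerTable.values()) {
            if (cols.size() > 1) {
                return false; // this table joins on two different columns
            }
        }
        return true;
    }
}
```

For the query in the comment, table B joins on key1 and on key2, so `fitsInOneJob` returns false, whereas a join where every clause uses `key` returns true.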
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934682#action_12934682 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

In SemanticAnalyzer:addStatsTask:

      } else {
        List<Node> children = (List<Node>) op.getChildren();
        if (children != null) {
          for (Node child : children) {
            opsToProcess.add((Operator<? extends Serializable>) child);
          }
        }

Why is the above code block needed? A TableScan can only be at the top. Also, can you check for ConditionalTasks in addition to MapRedTask?
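Both review points, that a TableScan can only be a root operator and that ConditionalTasks need handling alongside MapRedTask, can be illustrated with a small traversal. The classes below are hypothetical, heavily simplified stand-ins (`OpModel`, `TaskModel`, `MapRedTaskModel`, `ConditionalTaskModel`), not Hive's API; the sketch only shows the shape of the logic: inspect root operators without descending into children, and recurse into every branch of a conditional task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-ins for Hive's operator and task
// classes; they only model what is needed to find TableScans.
final class OpModel {
    final String name;
    final boolean isTableScan;

    OpModel(String name, boolean isTableScan) {
        this.name = name;
        this.isTableScan = isTableScan;
    }
}

abstract class TaskModel {}

final class MapRedTaskModel extends TaskModel {
    final List<OpModel> rootOps; // roots of the map-side operator trees

    MapRedTaskModel(List<OpModel> rootOps) {
        this.rootOps = rootOps;
    }
}

final class ConditionalTaskModel extends TaskModel {
    final List<TaskModel> branches; // which branch runs is decided at run time

    ConditionalTaskModel(List<TaskModel> branches) {
        this.branches = branches;
    }
}

final class StatsTaskSketch {
    // Collect the TableScans whose reads could carry piggy-backed stats.
    static List<OpModel> tableScans(TaskModel task) {
        List<OpModel> found = new ArrayList<>();
        if (task instanceof MapRedTaskModel) {
            // A TableScan can only be a root operator, so children are not searched.
            for (OpModel op : ((MapRedTaskModel) task).rootOps) {
                if (op.isTableScan) {
                    found.add(op);
                }
            }
        } else if (task instanceof ConditionalTaskModel) {
            // Any branch may be the one that executes, so inspect all of them.
            for (TaskModel branch : ((ConditionalTaskModel) task).branches) {
                found.addAll(tableScans(branch));
            }
        }
        return found;
    }
}
```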
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934683#action_12934683 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

Otherwise, it looks OK.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930326#action_12930326 ]

Paul Butler commented on HIVE-1648:
-----------------------------------

I get a bunch of tests failing when I build the latest trunk, even without applying my patch. I'm trying to figure out what's wrong with those first.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930321#action_12930321 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

Paul, any updates on the unit tests?
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928706#action_12928706 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

Paul, do you have an Apache account? Can you send it to me? I will add you as a Hive contributor. Also, can you click 'Submit Patch' when you add a patch? That way, everyone knows that the patch is ready for review.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928733#action_12928733 ]

Namit Jain commented on HIVE-1648:
-----------------------------------

getOuterQueryLimit is not used correctly. The QB maintains a mapping of limit per destination:

      case HiveParser.TOK_LIMIT:
        qbp.setDestLimit(ctx_1.dest, new Integer(ast.getChild(0).getText()));
        break;

Since you are browsing the input tables, you don't care about the destination. Go over:

      private final HashMap<String, Integer> destToLimit;

in QBParseInfo. If it is non-empty, do not gather stats.
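The check suggested in the last comment, skipping stats whenever the per-destination limit map is non-empty regardless of which destination carries the LIMIT, can be sketched as below. `ParseInfoSketch` is a hypothetical stand-in, not the real QBParseInfo: `setDestLimit` mirrors the call in the quoted parser code, while `getDestLimit` and `canGatherStatsOnRead` are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the LIMIT bookkeeping kept in QBParseInfo:
// one LIMIT value per insert destination.
final class ParseInfoSketch {
    private final Map<String, Integer> destToLimit = new HashMap<>();

    void setDestLimit(String dest, Integer limit) {
        destToLimit.put(dest, limit);
    }

    Integer getDestLimit(String dest) {
        return destToLimit.get(dest);
    }

    // Stats are gathered on the input tables, so the destination that carries
    // the LIMIT is irrelevant: any entry at all disables piggy-backed stats.
    boolean canGatherStatsOnRead() {
        return destToLimit.isEmpty();
    }
}
```

Checking emptiness rather than looking up one particular destination is what makes the check independent of which insert clause the LIMIT belongs to.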