[jira] Commented: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932306#action_12932306 ]

Namit Jain commented on HIVE-78:

A few minor comments:
1. Can you add more comments in the M* files (the new files in the metastore)?
2. MRoleEntiry needs a database name; so does the thrift file?
3. Can you verify that create and create table as select work for hive replication?
4. Can you check who adds inputs/outputs for locking operations?

Authorization infrastructure for Hive
Key: HIVE-78
URL: https://issues.apache.org/jira/browse/HIVE-78
Project: Hive
Issue Type: New Feature
Components: Metastore, Query Processor, Server Infrastructure
Reporter: Ashish Thusoo
Assignee: He Yongqiang
Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, HIVE-78.2.nothrift.patch, HIVE-78.2.thrift.patch, hive-78.diff

Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932354#action_12932354 ]

Namit Jain commented on HIVE-78:

Driver:

    // do the authorization check
    if (HiveConf.getBoolVar(conf,
        HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {
      boolean pass = doAuthorization(sem);
      if (!pass) {
        console.printError("Authorization failed (not enough privileges found to run the query.).");
        return (400);
      }
    }

Can we print the reason, i.e. which privilege was missing?

Can we optimize the scenario where we check all partitions one by one, for both inputs and outputs? If the user/group/role has the table privilege, we don't need to go over all the partitions one by one. We can even do this in a follow-up.

Why do we need the change in QueryPlan?

showGrants: should the output have a schema? Going forward, it will be easier for JDBC clients to parse.

No need to change WriteEntity etc.?

user cannot be made a reserved word: ~20 tables have a column called 'user' in facebook. Please check 'role' and 'option' as well.

SemanticAnalyzer: 3511 not needed

What happens to replication of roles? This needs to be done.

Where are the privileges copied for a newly created partition?

Authorization infrastructure for Hive
Key: HIVE-78
URL: https://issues.apache.org/jira/browse/HIVE-78
Project: Hive
Issue Type: New Feature
Components: Metastore, Query Processor, Server Infrastructure
Reporter: Ashish Thusoo
Assignee: He Yongqiang
Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, HIVE-78.2.nothrift.patch, HIVE-78.2.thrift.patch, hive-78.diff

Allow hive to integrate with existing user repositories for authentication and authorization information.
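Two of the review points above (reporting which privilege is missing, and skipping the per-partition scan when a table-level privilege is already granted) could look roughly like the sketch below. Everything in it is hypothetical: the class name, the `grants` map from object name to privilege names, and `findMissingPrivilege` are illustrative stand-ins, not code from the HIVE-78 patches or Hive's API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrivilegeCheckSketch {

  /**
   * Returns null when the access is allowed, otherwise a short description
   * of the first missing privilege (hypothetical helper, not Hive's API).
   */
  static String findMissingPrivilege(Map<String, Set<String>> grants,
      String table, List<String> partitions, String required) {
    // A table-level grant covers every partition: no per-partition scan needed.
    if (has(grants, table, required)) {
      return null;
    }
    // No partitions involved, so the table itself is the object lacking the grant.
    if (partitions.isEmpty()) {
      return required + " on " + table;
    }
    // Fall back to checking each partition and report the first failure.
    for (String partition : partitions) {
      if (!has(grants, partition, required)) {
        return required + " on " + partition;
      }
    }
    return null;
  }

  private static boolean has(Map<String, Set<String>> grants,
      String object, String privilege) {
    return grants.getOrDefault(object, Collections.<String>emptySet())
        .contains(privilege);
  }
}
```

Returning a description rather than a boolean lets the caller include the missing privilege in the error message instead of a generic failure.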
[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row
[ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932666#action_12932666 ]

Namit Jain commented on HIVE-1642:

hive-default.xml

    <property>
      <name>hive.mapjoin.hashtable.threshold</name>
      <value>10</value>
      <description>the threshold for the mapjoin hashtable</description>
    </property>

    <property>
      <name>hive.mapjoin.hashtable.loadfactor</name>
      <value>0.75</value>
      <description>the load factor for the mapjoin hashtable</description>
    </property>

    <property>
      <name>hive.mapjoin.smalltable.filesize</name>
      <value>2500</value>
      <description>The threshold for the input file size of the small tables; if the file size is smaller than this threshold, it will try to concert the common join into map join</description>
    </property>

    <property>
      <name>hive.mapjoin.localtask.max.memory.usage</name>
      <value>0.90</value>
      <description>The max memory usage of the local task for map join</description>
    </property>

Add more comments for properties 1, 2 and 4.
Spelling mistake in the third: concert -> convert

Uncheckout DriverContext.java

Why should the backup task be obtained from the resolver? It can be created at task creation time itself?

Convert join queries to map-join based on size of table/row
Key: HIVE-1642
URL: https://issues.apache.org/jira/browse/HIVE-1642
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch

Based on the number of rows and size of each table, Hive should automatically be able to convert a join into map-join.
[jira] Created: (HIVE-1796) dumps time at which lock was taken along with the queryid in show locks T extended
dumps time at which lock was taken along with the queryid in show locks T extended
Key: HIVE-1796
URL: https://issues.apache.org/jira/browse/HIVE-1796
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Fix For: 0.7.0

It would be useful to dump the time at which the lock was taken, for debugging.
[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row
[ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932741#action_12932741 ]

Namit Jain commented on HIVE-1642:

ConditionalResolverCommonJoin

    // generate file size to alias mapping; but connot set file size as key,
    // using 2 list to keep mapping

spelling (connot)

Convert join queries to map-join based on size of table/row
Key: HIVE-1642
URL: https://issues.apache.org/jira/browse/HIVE-1642
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1642_5.patch, hive-1642_6.patch, hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch

Based on the number of rows and size of each table, Hive should automatically be able to convert a join into map-join.
[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row
[ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932744#action_12932744 ]

Namit Jain commented on HIVE-1642:

ConditionalResolverCommonJoin

    // Iterate the sorted_set to get big/small table file size
    for (int index = 0; index < sortedList.size(); index++) {
      Long key = sortedList.get(index);
      int i = fileSizeList.indexOf(key);
      String alias = aliasList.get(i);
      if (index != (size - 1)) {
        smallTablesFileSizeSum += key.longValue();
      } else {
        bigTableFileSize += key.longValue();
        bigTableFileAlias = alias;
      }
    }

The lines:

    int i = fileSizeList.indexOf(key);
    String alias = aliasList.get(i);

are only needed in the 'else' block.

Convert join queries to map-join based on size of table/row
Key: HIVE-1642
URL: https://issues.apache.org/jira/browse/HIVE-1642
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1642_5.patch, hive-1642_6.patch, hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch

Based on the number of rows and size of each table, Hive should automatically be able to convert a join into map-join.
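For illustration, the loop with the alias lookup moved into the 'else' branch, as suggested in the comment above, might be sketched as follows. The variable names follow the quoted snippet; the wrapper class `BigTableSelection` and its `select` method are assumptions made only to keep the example self-contained, and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BigTableSelection {
  final long smallTablesFileSizeSum;
  final long bigTableFileSize;
  final String bigTableFileAlias;

  private BigTableSelection(long smallSum, long bigSize, String bigAlias) {
    this.smallTablesFileSizeSum = smallSum;
    this.bigTableFileSize = bigSize;
    this.bigTableFileAlias = bigAlias;
  }

  /** aliasList.get(i) is the alias whose file size is fileSizeList.get(i). */
  static BigTableSelection select(List<String> aliasList, List<Long> fileSizeList) {
    // Sort a copy so the original index-to-alias mapping stays intact.
    List<Long> sortedList = new ArrayList<>(fileSizeList);
    Collections.sort(sortedList);

    int size = sortedList.size();
    long smallTablesFileSizeSum = 0;
    long bigTableFileSize = 0;
    String bigTableFileAlias = null;

    for (int index = 0; index < size; index++) {
      Long key = sortedList.get(index);
      if (index != (size - 1)) {
        // Every table except the largest counts as a small table.
        smallTablesFileSizeSum += key.longValue();
      } else {
        // The alias lookup is only needed for the largest table.
        int i = fileSizeList.indexOf(key);
        bigTableFileSize += key.longValue();
        bigTableFileAlias = aliasList.get(i);
      }
    }
    return new BigTableSelection(smallTablesFileSizeSum, bigTableFileSize, bigTableFileAlias);
  }
}
```

With the lookup inside the 'else' branch, `indexOf` runs once per call instead of once per table.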
[jira] Updated: (HIVE-1795) outputs not correctly populated for alter table
[ https://issues.apache.org/jira/browse/HIVE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1795:

Attachment: hive.1795.1.patch

outputs not correctly populated for alter table
Key: HIVE-1795
URL: https://issues.apache.org/jira/browse/HIVE-1795
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Fix For: 0.7.0
Attachments: hive.1795.1.patch

For any statement of the form:
    alter table T partition p ...
the table T is added to the outputs. This leads to problems with locking, and will lead to problems in the future for authorization. The partition should be in the output, not the table.
[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row
[ https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932843#action_12932843 ]

Namit Jain commented on HIVE-1642:

+1

running tests

Convert join queries to map-join based on size of table/row
Key: HIVE-1642
URL: https://issues.apache.org/jira/browse/HIVE-1642
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1642_10.patch, hive-1642_11.patch, hive-1642_5.patch, hive-1642_6.patch, hive-1642_7.patch, hive-1642_9.patch, hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch

Based on the number of rows and size of each table, Hive should automatically be able to convert a join into map-join.
[jira] Commented: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext
[ https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933044#action_12933044 ]

Namit Jain commented on HIVE-1785:

Can you regenerate the patch? I have already committed HIVE-1642.

change Pre/Post Query Hooks to take in 1 parameter: HookContext
Key: HIVE-1785
URL: https://issues.apache.org/jira/browse/HIVE-1785
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Attachments: hive_1785_1.patch, hive_1785_2.patch

This way, it would be possible to add new parameters to the hooks without changing the existing hooks. This will be an incompatible change, and all the hooks need to change to the new API.
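The idea in the issue description above, a hook that receives a single context object so that new fields can be added without changing hook signatures, can be sketched as below. The names `QueryHookContext`, `QueryHook` and `runAll` are hypothetical and chosen for illustration only; they are not the classes introduced by the HIVE-1785 patches.

```java
import java.util.List;
import java.util.Set;

final class HookContextSketch {

  /** Hypothetical context object: new fields can be added without touching hooks. */
  static final class QueryHookContext {
    private final String queryId;
    private final Set<String> inputs;
    private final Set<String> outputs;

    QueryHookContext(String queryId, Set<String> inputs, Set<String> outputs) {
      this.queryId = queryId;
      this.inputs = inputs;
      this.outputs = outputs;
    }

    String getQueryId() { return queryId; }
    Set<String> getInputs() { return inputs; }
    Set<String> getOutputs() { return outputs; }
  }

  /** Hypothetical hook interface: one parameter, regardless of what the context carries. */
  interface QueryHook {
    void run(QueryHookContext context);
  }

  /** Invokes every registered hook with the same context. */
  static void runAll(List<QueryHook> hooks, QueryHookContext context) {
    for (QueryHook hook : hooks) {
      hook.run(context);
    }
  }
}
```

A hook written against this shape only reads the getters it needs, so adding another getter to the context later does not break it.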
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1783:

Status: Open (was: Patch Available)

Can you refresh the patch? HIVE-1642 has been committed, so this is good to go.

CommonJoinOperator optimize the case of 1:1 join
Key: HIVE-1783
URL: https://issues.apache.org/jira/browse/HIVE-1783
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
Attachments: HIVE-1783.1.patch, HIVE-1783.2.patch

CommonJoinOperator.genObject() is expensive. It recurses and keeps a lot of state because it has to:
1. handle null cases for outer joins
2. handle the case of duplicated keys from one join party
We can do a minor optimization to detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject() and forward columns in a simple for-loop if we are sure that neither 1 nor 2 will happen.
[jira] Updated: (HIVE-1796) dumps time at which lock was taken along with the queryid in show locks T extended
[ https://issues.apache.org/jira/browse/HIVE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1796:

Attachment: hive.1796.1.patch

dumps time at which lock was taken along with the queryid in show locks T extended
Key: HIVE-1796
URL: https://issues.apache.org/jira/browse/HIVE-1796
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Fix For: 0.7.0
Attachments: hive.1796.1.patch

It would be useful to dump the time at which the lock was taken, for debugging.
[jira] Commented: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext
[ https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933062#action_12933062 ]

Namit Jain commented on HIVE-1785:

+1

running tests

change Pre/Post Query Hooks to take in 1 parameter: HookContext
Key: HIVE-1785
URL: https://issues.apache.org/jira/browse/HIVE-1785
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Attachments: hive-1785_3.patch, hive_1785_1.patch, hive_1785_2.patch

This way, it would be possible to add new parameters to the hooks without changing the existing hooks. This will be an incompatible change, and all the hooks need to change to the new API.
[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext
[ https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1785:

Hadoop Flags: [Reviewed]
Status: Patch Available (was: Open)

change Pre/Post Query Hooks to take in 1 parameter: HookContext
Key: HIVE-1785
URL: https://issues.apache.org/jira/browse/HIVE-1785
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Attachments: hive-1785_3.patch, hive_1785_1.patch, hive_1785_2.patch

This way, it would be possible to add new parameters to the hooks without changing the existing hooks. This will be an incompatible change, and all the hooks need to change to the new API.
[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933065#action_12933065 ]

Namit Jain commented on HIVE-1611:

@Edward, can we get this in?

Add alternative search-provider to Hive site
Key: HIVE-1611
URL: https://issues.apache.org/jira/browse/HIVE-1611
Project: Hive
Issue Type: Improvement
Reporter: Alex Baranau
Assignee: Edward Capriolo
Priority: Minor
Attachments: HIVE-1611.patch

Use the search-hadoop.com service to make search available in Hive sources, MLs, wiki, etc. This was initially proposed on the user mailing list. The search service was already added to the site's skin (common to all Hadoop-related projects), so this issue is about enabling it for Hive. The ultimate goal is to use it on all Hadoop sub-project sites.
[jira] Commented: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933516#action_12933516 ]

Namit Jain commented on HIVE-1783:

+1

running tests

CommonJoinOperator optimize the case of 1:1 join
Key: HIVE-1783
URL: https://issues.apache.org/jira/browse/HIVE-1783
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
Attachments: HIVE-1783.1.patch, HIVE-1783.2.patch, HIVE-1783.3.patch, HIVE-1783.4.patch

CommonJoinOperator.genObject() is expensive. It recurses and keeps a lot of state because it has to:
1. handle null cases for outer joins
2. handle the case of duplicated keys from one join party
We can do a minor optimization to detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject() and forward columns in a simple for-loop if we are sure that neither 1 nor 2 will happen.
[jira] Updated: (HIVE-1783) CommonJoinOperator optimize the case of 1:1 join
[ https://issues.apache.org/jira/browse/HIVE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1783:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Siying

CommonJoinOperator optimize the case of 1:1 join
Key: HIVE-1783
URL: https://issues.apache.org/jira/browse/HIVE-1783
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
Attachments: HIVE-1783.1.patch, HIVE-1783.2.patch, HIVE-1783.3.patch, HIVE-1783.4.patch

CommonJoinOperator.genObject() is expensive. It recurses and keeps a lot of state because it has to:
1. handle null cases for outer joins
2. handle the case of duplicated keys from one join party
We can do a minor optimization to detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject() and forward columns in a simple for-loop if we are sure that neither 1 nor 2 will happen.
[jira] Commented: (HIVE-1787) optimize the code path when there are no outer joins
[ https://issues.apache.org/jira/browse/HIVE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934405#action_12934405 ]

Namit Jain commented on HIVE-1787:

+1

running tests.

How much improvement did it lead to in the join queries?

optimize the code path when there are no outer joins
Key: HIVE-1787
URL: https://issues.apache.org/jira/browse/HIVE-1787
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-1787.1.patch

Currently, outer joins and joins are handled in the same manner. A special case for no outer joins would be useful.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934619#action_12934619 ]

Namit Jain commented on HIVE-1648:

I haven't taken a look at the code, but here are the comments for the tests.

Instead of:
    desc extended table_name
in the tests, please use:
    show table extended like `table_name`;
This will dump each stat on a new line, so the output can be easily compared. The non-deterministic stats are ignored.

Add a test for limit in the sub-query.

Don't select from the existing tables src/src1 for your stats tests. Create new tables and then set hive.stats.autogather.read to true. This way, you can be sure that the remaining tests will not be affected.

Add another test for a 3-way join where the join keys are not the same, something like:
    select .. from A join B on A.key1 = B.key1 join C on B.key2 = C.key2 where

Automatically gathering stats when reading a table/partition
Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.patch

HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1805) Ability to create dynamic partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934657#action_12934657 ]

Namit Jain commented on HIVE-1805:

Currently, if a query creates partitions dynamically, some of them may be created while others fail. It would be useful to have an atomic way of running the query: either all the partitions should be created or none of them.

The same problem exists for multi-table inserts, but that is not a very common scenario.

Ability to create dynamic partitions atomically
Key: HIVE-1805
URL: https://issues.apache.org/jira/browse/HIVE-1805
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934682#action_12934682 ]

Namit Jain commented on HIVE-1648:

In SemanticAnalyzer:addStatsTask:

    } else {
      List<Node> children = (List<Node>) op.getChildren();
      if (children != null) {
        for (Node child : children) {
          opsToProcess.add((Operator<? extends Serializable>) child);
        }
      }

Why is the above code block needed? TableScan can only be at the top.

Also, can you check for Conditional Tasks in addition to MapRedTask?

Automatically gathering stats when reading a table/partition
Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.patch

HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934683#action_12934683 ]

Namit Jain commented on HIVE-1648:

Otherwise, it looks OK

Automatically gathering stats when reading a table/partition
Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.patch

HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935167#action_12935167 ]

Namit Jain commented on HIVE-1792:

No need for this

track the joins which are being converted to map-join automatically
Key: HIVE-1792
URL: https://issues.apache.org/jira/browse/HIVE-1792
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1792-1.patch, hive-1792-2.patch

We should be able to track how many queries (join) got converted to map-join
[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935399#action_12935399 ]

Namit Jain commented on HIVE-1792:

Why don't we do the same in plan/ConditionalResolverCommonJoin? There we know what is going on.

Also, can we remove the unrelated changes in this patch, e.g. using a different DistributedCache API etc.?

track the joins which are being converted to map-join automatically
Key: HIVE-1792
URL: https://issues.apache.org/jira/browse/HIVE-1792
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch

We should be able to track how many queries (join) got converted to map-join
[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935450#action_12935450 ]

Namit Jain commented on HIVE-1792:

+1

running tests

track the joins which are being converted to map-join automatically
Key: HIVE-1792
URL: https://issues.apache.org/jira/browse/HIVE-1792
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch, hive-1792-4.patch

We should be able to track how many queries (join) got converted to map-join
[jira] Resolved: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1792.

Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Liyin

track the joins which are being converted to map-join automatically
Key: HIVE-1792
URL: https://issues.apache.org/jira/browse/HIVE-1792
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch, hive-1792-4.patch

We should be able to track how many queries (join) got converted to map-join
[jira] Created: (HIVE-1813) Hive should be able to run on multiple data centers
Hive should be able to run on multiple data centers
Key: HIVE-1813
URL: https://issues.apache.org/jira/browse/HIVE-1813
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Fix For: 0.7.0

Currently, hive assumes a single metastore, and HADOOP_HOME is passed as an environment variable.

It would be desirable to support hive on top of multiple data centers (dfs + mr). For example, there could be 2 metastores: primary and secondary. They would have different dfs's, and there would be a dfs-mr mapping maintained by the metastore.

Hive would be enhanced to support multiple metastores, and all operations (ddl + query) would span multiple metastores. Different pluggable consistency policies can be employed: for example, if a table/partition is present in both metastores with different last modification times, either the latest one can be used or an error can be thrown.

It will be up to the application (outside hive) to copy the data from one metastore to another, and to maintain consistency inside.
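As a rough illustration of the pluggable consistency policies described above, the sketch below resolves a table/partition that is present in two metastores with possibly different last modification times. The `Entry` and `Policy` types and their names are assumptions made for this example; the issue does not specify an interface.

```java
final class ConsistencySketch {

  /** Hypothetical view of one table/partition as recorded in one metastore. */
  static final class Entry {
    final String metastore;
    final long lastModified;

    Entry(String metastore, long lastModified) {
      this.metastore = metastore;
      this.lastModified = lastModified;
    }
  }

  /** Hypothetical pluggable policies for entries present in both metastores. */
  enum Policy {
    /** Use whichever entry was modified last (the primary wins ties). */
    LATEST_WINS {
      @Override
      Entry resolve(Entry primary, Entry secondary) {
        return primary.lastModified >= secondary.lastModified ? primary : secondary;
      }
    },
    /** Refuse to continue when the modification times differ. */
    FAIL_ON_MISMATCH {
      @Override
      Entry resolve(Entry primary, Entry secondary) {
        if (primary.lastModified != secondary.lastModified) {
          throw new IllegalStateException("last modification times differ between "
              + primary.metastore + " and " + secondary.metastore);
        }
        return primary;
      }
    };

    abstract Entry resolve(Entry primary, Entry secondary);
  }
}
```

Keeping each policy behind a single `resolve` method means the choice between "use the latest" and "throw an error" can be made by configuration rather than by code changes.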
[jira] Commented: (HIVE-1813) Hive should be able to run on multiple data centers
[ https://issues.apache.org/jira/browse/HIVE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965126#action_12965126 ]

Namit Jain commented on HIVE-1813:

The data can be copied from one dfs to another using distcp. Later on, a wrapper can be developed in hive for the same. Something like:

    alter table T partition P copy src to dst;
    alter table T partition P move src to dst;

Hive should be able to run on multiple data centers
Key: HIVE-1813
URL: https://issues.apache.org/jira/browse/HIVE-1813
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Fix For: 0.7.0

Currently, hive assumes a single metastore, and HADOOP_HOME is passed as an environment variable.

It would be desirable to support hive on top of multiple data centers (dfs + mr). For example, there could be 2 metastores: primary and secondary. They would have different dfs's, and there would be a dfs-mr mapping maintained by the metastore.

Hive would be enhanced to support multiple metastores, and all operations (ddl + query) would span multiple metastores. Different pluggable consistency policies can be employed: for example, if a table/partition is present in both metastores with different last modification times, either the latest one can be used or an error can be thrown.

It will be up to the application (outside hive) to copy the data from one metastore to another, and to maintain consistency inside.
[jira] Assigned: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-1819:

Assignee: Namit Jain

maintain lastAccessTime in the metastore
Key: HIVE-1819
URL: https://issues.apache.org/jira/browse/HIVE-1819
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
[jira] Created: (HIVE-1819) maintain lastAccessTime in the metastore
maintain lastAccessTime in the metastore
Key: HIVE-1819
URL: https://issues.apache.org/jira/browse/HIVE-1819
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
[jira] Updated: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1819:

Attachment: hive.1819.1.patch

maintain lastAccessTime in the metastore
Key: HIVE-1819
URL: https://issues.apache.org/jira/browse/HIVE-1819
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1819.1.patch
[jira] Updated: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1819:

Status: Patch Available (was: Open)

maintain lastAccessTime in the metastore
Key: HIVE-1819
URL: https://issues.apache.org/jira/browse/HIVE-1819
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1819.1.patch
[jira] Commented: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965736#action_12965736 ] Namit Jain commented on HIVE-1819: -- The reason I did not use it is because it is an int. maintain lastAccessTime in the metastore Key: HIVE-1819 URL: https://issues.apache.org/jira/browse/HIVE-1819 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1819.1.patch
[jira] Created: (HIVE-1822) Hive Conf variables should be relative to the dfs
Hive Conf variables should be relative to the dfs - Key: HIVE-1822 URL: https://issues.apache.org/jira/browse/HIVE-1822 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Namit Jain Currently, the parameter hive.metastore.warehouse.dir specifies the complete path. This becomes difficult to maintain if a mapping from Hive database to DFS is added. This change is needed for multi-data-center support in Hive.
[jira] Commented: (HIVE-1820) Make Hive database data center aware
[ https://issues.apache.org/jira/browse/HIVE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965918#action_12965918 ] Namit Jain commented on HIVE-1820: -- Going forward, none of the other hive configuration parameters should access the dfs directly. Make Hive database data center aware Key: HIVE-1820 URL: https://issues.apache.org/jira/browse/HIVE-1820 Project: Hive Issue Type: New Feature Reporter: Ning Zhang Assignee: Ning Zhang In order to support multiple data centers (different DFS, MR clusters) for hive, it is desirable to extend the Hive database to be data center aware. Currently a Hive database is a logical concept and has no DFS or MR cluster info associated with it. A database has the location property indicating the default warehouse directory, but the user cannot specify or change it. In order to make it data center aware, the following info needs to be maintained: 1) data warehouse root location, which is the default HDFS location for newly created tables (default=hive.metastore.warehouse.dir). 2) scratch dir, which is the HDFS location where MR intermediate files are created (default=hive.exec.scratch.dir) 3) MR job tracker URI that jobs should be submitted to (default=mapred.job.tracker) 4) hadoop (bin) dir ($HADOOP_HOME/bin/hadoop) These parameters should be saved in database.parameters as (key, value) pairs, and they override the jobconf parameters (so if the default database has no parameter, it will get the value from hive-default.xml or hive-site.xml as it does now).
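The lookup order described in HIVE-1820 (database parameters first, then the values from hive-default.xml or hive-site.xml) could look roughly like the sketch below. This is only an illustration under stated assumptions: DatabaseParamResolver and its resolve method are hypothetical names, and plain maps stand in for database.parameters and the job configuration. It is not code from any patch.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Hive: resolves a setting for a database,
// preferring the database's own parameters over the session configuration.
class DatabaseParamResolver {

    // dbParams stands in for database.parameters; conf stands in for the
    // values loaded from hive-default.xml / hive-site.xml.
    static Optional<String> resolve(Map<String, String> dbParams,
                                    Map<String, String> conf,
                                    String key) {
        String fromDatabase = dbParams.get(key);
        if (fromDatabase != null) {
            // A per-database value overrides the configuration value.
            return Optional.of(fromDatabase);
        }
        // Fall back to the configuration; empty if the key is set nowhere.
        return Optional.ofNullable(conf.get(key));
    }
}
```

With this shape, a database that sets no parameters behaves exactly as it does today, which matches the fallback described in the issue.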
[jira] Updated: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1819: - Attachment: hive.1819.2.patch maintain lastAccessTime in the metastore Key: HIVE-1819 URL: https://issues.apache.org/jira/browse/HIVE-1819 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1819.1.patch, hive.1819.2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965950#action_12965950 ] Namit Jain commented on HIVE-1819: -- added comments maintain lastAccessTime in the metastore Key: HIVE-1819 URL: https://issues.apache.org/jira/browse/HIVE-1819 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1819.1.patch, hive.1819.2.patch, hive.1819.3.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1517) ability to select across a database
[ https://issues.apache.org/jira/browse/HIVE-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966243#action_12966243 ] Namit Jain commented on HIVE-1517: -- We would like to use it right away. ability to select across a database --- Key: HIVE-1517 URL: https://issues.apache.org/jira/browse/HIVE-1517 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Carl Steinbach Fix For: 0.7.0 Attachments: HIVE-1517.1.patch.txt After https://issues.apache.org/jira/browse/HIVE-675, we need a way to select across databases for this feature to be useful. For example: use db1; create table foo(); use db2; select .. from db1.foo.
[jira] Updated: (HIVE-1819) maintain lastAccessTime in the metastore
[ https://issues.apache.org/jira/browse/HIVE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1819: - Attachment: hive.1819.4.patch maintain lastAccessTime in the metastore Key: HIVE-1819 URL: https://issues.apache.org/jira/browse/HIVE-1819 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1819.1.patch, hive.1819.2.patch, hive.1819.3.patch, hive.1819.4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1826) StatsTask updates the table/partition object leaving a inconsistent version in hooks
[ https://issues.apache.org/jira/browse/HIVE-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966357#action_12966357 ] Namit Jain commented on HIVE-1826: -- The inputs and outputs from the ReadEntity and WriteEntity are passed to the hooks. However, the StatsTask may have updated these objects. Isn't it possible that the hooks (post execution) will see a stale version of this data ? And, if these hooks update these objects and write them back to the metastore, the Stats changes will be lost. StatsTask updates the table/partition object leaving a inconsistent version in hooks Key: HIVE-1826 URL: https://issues.apache.org/jira/browse/HIVE-1826 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1828) show locks should not use getTable()/getPartition
show locks should not use getTable()/getPartition -- Key: HIVE-1828 URL: https://issues.apache.org/jira/browse/HIVE-1828 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1822) Hive Conf variables should be relative to the dfs
[ https://issues.apache.org/jira/browse/HIVE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1822: - Status: Patch Available (was: Open) Hive Conf variables should be relative to the dfs - Key: HIVE-1822 URL: https://issues.apache.org/jira/browse/HIVE-1822 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1822.1.patch Currently, the parameter hive.metastore.warehouse.dir specifies the complete path. This becomes difficult to maintain if a mapping from Hive database to DFS is added. This change is needed for multi-data-center support in Hive.
[jira] Updated: (HIVE-1822) Hive Conf variables should be relative to the dfs
[ https://issues.apache.org/jira/browse/HIVE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1822: - Attachment: hive.1822.1.patch Hive Conf variables should be relative to the dfs - Key: HIVE-1822 URL: https://issues.apache.org/jira/browse/HIVE-1822 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1822.1.patch Currently, the parameter hive.metastore.warehouse.dir specifies the complete path. This becomes difficult to maintain if a mapping from Hive database to DFS is added. This change is needed for multi-data-center support in Hive.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1290#action_1290 ] Namit Jain commented on HIVE-1648: -- I don't see any new tests. Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Paul Butler Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, HIVE-1648.patch HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1828) show locks should not use getTable()/getPartition
[ https://issues.apache.org/jira/browse/HIVE-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966847#action_12966847 ] Namit Jain commented on HIVE-1828: -- One minor comment: In case of show locks T extended; Does anyone check that the table exists ? The DDLTask can do that before calling zookeeper show locks should not use getTable()/getPartition -- Key: HIVE-1828 URL: https://issues.apache.org/jira/browse/HIVE-1828 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1828.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1828) show locks should not use getTable()/getPartition
[ https://issues.apache.org/jira/browse/HIVE-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966981#action_12966981 ] Namit Jain commented on HIVE-1828: -- can you add the new patch ? also, can you add a negative test (if you have not done so already) ? show locks should not use getTable()/getPartition -- Key: HIVE-1828 URL: https://issues.apache.org/jira/browse/HIVE-1828 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1828.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1828) show locks should not use getTable()/getPartition
[ https://issues.apache.org/jira/browse/HIVE-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1828: - Status: Open (was: Patch Available) show locks should not use getTable()/getPartition -- Key: HIVE-1828 URL: https://issues.apache.org/jira/browse/HIVE-1828 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1828.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1830) mappers in group followed by joins may die OOM
mappers in group followed by joins may die OOM -- Key: HIVE-1830 URL: https://issues.apache.org/jira/browse/HIVE-1830 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Liyin Tang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967105#action_12967105 ] Namit Jain commented on HIVE-1830: -- After HIVE-1642, joins are automatically converted into map-joins at physical optimization time. However, this may lead to problems. For example, consider the query: select T1.val, count(1) from T1 join T2 on T1.key=T2.key group by T1.val This will have 2 map-reduce jobs, one for the join and the other for the group by. Before HIVE-1642, the partial aggregation for the group by was performed in the reducer where the join is performed. After HIVE-1642, it is performed in the mapper. The local task confirms that there is just enough memory to hold the map-join data. However, it does not take into account the memory needed for the partial group by. So, when a join is followed by a group by, it is a good idea to reduce the memory limit that the local task uses to validate whether the small table fits in memory. The limit can be controlled by a new configuration parameter, with some default: say 70% of total memory (instead of 90%). Also, the group by may still run out of memory, so it might be a good idea to check for free memory in the group by and periodically flush memory. mappers in group followed by joins may die OOM -- Key: HIVE-1830 URL: https://issues.apache.org/jira/browse/HIVE-1830 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Liyin Tang
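The two checks proposed in the HIVE-1830 comment can be sketched as below, purely as an illustration. The class and method names are hypothetical and are not Hive APIs. The 0.90 and 0.70 fractions are the figures mentioned in the comment, and in a real setup they would presumably come from configuration.

```java
// Hypothetical sketch; class, method, and parameter names are illustrative
// and do not correspond to Hive classes.
class MapJoinMemoryCheck {

    // Fraction of the heap the local task may fill with the small table.
    // A stricter limit is used when a group by follows the join, so that
    // the partial aggregation in the mapper has room left.
    static double thresholdFor(boolean followedByGroupBy) {
        return followedByGroupBy ? 0.70 : 0.90;
    }

    // True if the small table still fits under the given fraction of the heap.
    static boolean fitsInMemory(long usedBytes, long maxBytes, double maxUsageFraction) {
        return usedBytes <= (long) (maxBytes * maxUsageFraction);
    }

    // Group-by side: true if the in-memory aggregation state should be
    // flushed because usage has gone past the given fraction of the heap.
    static boolean shouldFlush(long usedBytes, long maxBytes, double flushFraction) {
        return usedBytes > (long) (maxBytes * flushFraction);
    }
}
```

In a running JVM, usedBytes and maxBytes could be taken from Runtime.getRuntime() (totalMemory() minus freeMemory(), and maxMemory()). The sketch takes them as arguments so the logic can be checked without depending on the actual heap.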
[jira] Created: (HIVE-1831) Add a option to run task to check map-join possibility in non-local mode
Add a option to run task to check map-join possibility in non-local mode Key: HIVE-1831 URL: https://issues.apache.org/jira/browse/HIVE-1831 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Liyin Tang In HIVE-1642, we run a local task to figure out if the small table can be held in memory, and then convert the join into a map-join. However, this may not be a good idea for thin clients (which may not have enough memory). This should be made configurable: the default can still be to run the task locally on the client machine, but an option should be added for thin clients, where the task would be run as a map-only task.
[jira] Created: (HIVE-1834) more debugging for locking
more debugging for locking -- Key: HIVE-1834 URL: https://issues.apache.org/jira/browse/HIVE-1834 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Along with the time and the queryid, it might be a good idea to log if the lock was acquired explicitly (by a lock command) or implicitly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1823) upgrade the database thrift interface to allow parameters key-value pairs
[ https://issues.apache.org/jira/browse/HIVE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968463#action_12968463 ] Namit Jain commented on HIVE-1823: -- +1 running tests upgrade the database thrift interface to allow parameters key-value pairs - Key: HIVE-1823 URL: https://issues.apache.org/jira/browse/HIVE-1823 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1823.patch In order to store data-center-specific parameters in a Hive database, it is desirable to extend the Hive database thrift interface with a parameters map similar to Table and Partitions.
[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968612#action_12968612 ] Namit Jain commented on HIVE-1763: -- +1 The approach looks fine drop table (or view) should issue warning if table doesn't exist Key: HIVE-1763 URL: https://issues.apache.org/jira/browse/HIVE-1763 Project: Hive Issue Type: Improvement Components: Metastore Reporter: dan f Assignee: Paul Butler Priority: Minor Attachments: HIVE-1763.patch drop table reports OK even if the table doesn't exist. Better to report something like mysql's Unknown table 'foo' so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1823) upgrade the database thrift interface to allow parameters key-value pairs
[ https://issues.apache.org/jira/browse/HIVE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1823. -- Resolution: Fixed Hadoop Flags: [Reviewed] Committed. Thanks Ning upgrade the database thrift interface to allow parameters key-value pairs - Key: HIVE-1823 URL: https://issues.apache.org/jira/browse/HIVE-1823 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1823.2.patch, HIVE-1823.patch In order to store data-center-specific parameters in a Hive database, it is desirable to extend the Hive database thrift interface with a parameters map similar to Table and Partitions.
[jira] Updated: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1763: - Status: Open (was: Patch Available) drop table (or view) should issue warning if table doesn't exist Key: HIVE-1763 URL: https://issues.apache.org/jira/browse/HIVE-1763 Project: Hive Issue Type: Improvement Components: Metastore Reporter: dan f Assignee: Paul Butler Priority: Minor Attachments: HIVE-1763.patch drop table reports OK even if the table doesn't exist. Better to report something like mysql's Unknown table 'foo' so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968835#action_12968835 ] Namit Jain commented on HIVE-1763: -- However, it will need a lot of test result files to be updated. Most of the tests will break drop table (or view) should issue warning if table doesn't exist Key: HIVE-1763 URL: https://issues.apache.org/jira/browse/HIVE-1763 Project: Hive Issue Type: Improvement Components: Metastore Reporter: dan f Assignee: Paul Butler Priority: Minor Attachments: HIVE-1763.patch drop table reports OK even if the table doesn't exist. Better to report something like mysql's Unknown table 'foo' so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968841#action_12968841 ] Namit Jain commented on HIVE-1648: -- @Yongqiang, you have missed the test changes in the patch - can you add them also ? Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Paul Butler Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, HIVE-1648.patch, hive-1648.svn.patch HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1508) Add cleanup method to HiveHistory class
[ https://issues.apache.org/jira/browse/HIVE-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968845#action_12968845 ] Namit Jain commented on HIVE-1508: -- +1 Add cleanup method to HiveHistory class --- Key: HIVE-1508 URL: https://issues.apache.org/jira/browse/HIVE-1508 Project: Hive Issue Type: Bug Components: Metastore Reporter: Anurag Phadke Assignee: Edward Capriolo Priority: Blocker Fix For: 0.7.0 Attachments: hive-1508-1-patch.txt Running the hive server for a long time (90 minutes) results in too many open file handles, eventually causing the server to crash as it runs out of file handles. Actual bug as described by Carl Steinbach: the hive_job_log_* files are created by the HiveHistory class. This class creates a PrintWriter for writing to the file, but never closes the writer. It looks like we need to add a cleanup method to HiveHistory that closes the PrintWriter and does any other necessary cleanup.
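The kind of cleanup method asked for in HIVE-1508 might look like the following minimal sketch. SessionHistoryLog is a hypothetical stand-in and not the real HiveHistory class. The only point illustrated is that the PrintWriter, and with it the underlying file handle, is released by an explicit and idempotent close.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;

// Hypothetical stand-in for a per-session history log; not the real
// HiveHistory class.
class SessionHistoryLog implements Closeable {
    private PrintWriter writer;

    SessionHistoryLog(File logFile) throws FileNotFoundException {
        // Opening the writer takes a file handle that must be released later.
        this.writer = new PrintWriter(logFile);
    }

    void log(String event) {
        if (writer == null) {
            throw new IllegalStateException("history log already closed");
        }
        writer.println(event);
    }

    // Cleanup method: flushes and closes the writer. Safe to call twice.
    @Override
    public void close() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    boolean isClosed() {
        return writer == null;
    }
}
```

Because the class implements Closeable, a caller could also use try-with-resources so the handle is released even when a query fails partway through.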
[jira] Resolved: (HIVE-1821) describe database command
[ https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1821. -- Resolution: Duplicate Duplicate of HIVE-1836 describe database command - Key: HIVE-1821 URL: https://issues.apache.org/jira/browse/HIVE-1821 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang A describe (extended) database command would be helpful if we introduce parameters associated with databases.
[jira] Commented: (HIVE-1821) describe database command
[ https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968939#action_12968939 ] Namit Jain commented on HIVE-1821: -- If you are doing this, do you want to add an 'alter database' also ? describe database command - Key: HIVE-1821 URL: https://issues.apache.org/jira/browse/HIVE-1821 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang A describe (extended) database command would be helpful if we introduce parameters associated with databases.
[jira] Commented: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969037#action_12969037 ] Namit Jain commented on HIVE-1836: -- +1 Extend the CREATE DATABASE command with DBPROPERTIES Key: HIVE-1836 URL: https://issues.apache.org/jira/browse/HIVE-1836 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1836.patch We should be able to assign key-value pairs of properties to Hive databases. The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands: {code} CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2'); {code} The {code} DESC DATABASE EXTENDED DB_NAME; {code} should be able to display the properties. (requires HIVE-1821) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969089#action_12969089 ] Namit Jain commented on HIVE-1096: -- Sure, that would be very useful. Let me know if you run into any issues. Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, hive-1096-15.patch.txt, hive-1096-2.diff, hive-1096-20.patch.txt, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: --Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?-- This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions at that level, and code further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing it in compile unless someone wants to discuss this more.
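The string substitution suggested in HIVE-1096 can be sketched as a plain text rewrite applied to the query before compilation. The class below is a hypothetical illustration, not the code from any attached patch. The variable-name pattern and the choice to leave unknown variables untouched are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${NAME} substitution on the query text; not the
// implementation from the HIVE-1096 patches.
class VariableSubstitution {
    // Matches ${NAME}, where NAME is a simple identifier (an assumption).
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Unknown variables are left as written rather than dropped.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, with DT mapped to 2009-12-09 as in the mailing-list message, a query containing dt='${DT}' would be passed downstream as dt='2009-12-09', and the parser never sees the variable syntax.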
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969093#action_12969093 ] Namit Jain commented on HIVE-1837: -- @Ashutosh, we can't wait for this feature till secure hadoop is available. Once Hive is migrated to that, we can change the implementation of this feature. @Yongqiang, can you add the new parameter definition in hive-default.xml ? Also, can you make the thread sleep time (10 min.) configurable ? Can you add a new test for the same - I mean, have a very small timeout and thread sleep time, and a custom script which is sleeping indefinitely ? optional timeout for hive clients - Key: HIVE-1837 URL: https://issues.apache.org/jira/browse/HIVE-1837 Project: Hive Issue Type: New Feature Reporter: Namit Jain Assignee: He Yongqiang Attachments: hive-1837.1.patch, hive-1837.2.patch It would be a good idea to have an optional timeout for hive clients. We encountered a query today, which seemed to have been run by mistake, and it had been running for about a month. This was holding zookeeper locks, and making the whole debugging more complex than it should be. @Ning, I remember there was some issue with the Hive client having a timeout of 1 day with HiPal. Do you remember the details ?
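A client timeout with a configurable check interval, as requested in the HIVE-1837 comment, could be sketched with a scheduled background check. Everything here is an assumption for illustration: ClientTimeoutWatchdog and its parameters are hypothetical, and the attached patches may use a different mechanism.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog; names and structure are illustrative only.
class ClientTimeoutWatchdog {

    // Runs onTimeout once the client has been alive for timeoutMillis,
    // checking every checkIntervalMillis (the configurable "sleep time").
    static ScheduledExecutorService start(long timeoutMillis,
                                          long checkIntervalMillis,
                                          Runnable onTimeout) {
        final long deadline =
            System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "client-timeout-watchdog");
                t.setDaemon(true); // never keeps the client process alive
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            if (System.nanoTime() - deadline >= 0) {
                onTimeout.run();      // e.g. release locks and exit
                scheduler.shutdown(); // stop further checks
            }
        }, checkIntervalMillis, checkIntervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Keeping both values as parameters makes the test suggested in the comment straightforward: a very small timeout and check interval against a script that sleeps indefinitely.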
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969203#action_12969203 ] Namit Jain commented on HIVE-1830: -- if (groupByOp.getConf() == null) { System.out.println("Group by desc is null"); return null; } This should never happen. GroupByOperator: memoryThreshold = HiveConf.getFloatVar(hconf, HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD); This should also be in groupByDesc. mappers in group followed by joins may die OOM -- Key: HIVE-1830 URL: https://issues.apache.org/jira/browse/HIVE-1830 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Liyin Tang Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, hive-1830-4.patch
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969562#action_12969562 ] Namit Jain commented on HIVE-1837: -- OK, the changes look good. +1 optional timeout for hive clients - Key: HIVE-1837 URL: https://issues.apache.org/jira/browse/HIVE-1837 Project: Hive Issue Type: New Feature Reporter: Namit Jain Assignee: He Yongqiang Attachments: hive-1837.1.patch, hive-1837.2.patch It would be a good idea to have an optional timeout for hive clients. We encountered a query today, which seemed to have been run by mistake, and it had been running for about a month. This was holding zookeeper locks, and making the whole debugging more complex than it should be. @Ning, I remember there was some issue with the Hive client having a timeout of 1 day with HiPal. Do you remember the details ?
[jira] Commented: (HIVE-1842) Add the local flag to all the map red tasks, if the query is running locally.
[ https://issues.apache.org/jira/browse/HIVE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969565#action_12969565 ] Namit Jain commented on HIVE-1842: -- +1 Add the local flag to all the map red tasks, if the query is running locally. - Key: HIVE-1842 URL: https://issues.apache.org/jira/browse/HIVE-1842 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.4.1 Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hive-1842-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1844) Hanging hive client caused by TaskRunner's OutOfMemoryError
[ https://issues.apache.org/jira/browse/HIVE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969614#action_12969614 ] Namit Jain commented on HIVE-1844: -- Great find, Yongqiang +1 Hanging hive client caused by TaskRunner's OutOfMemoryError --- Key: HIVE-1844 URL: https://issues.apache.org/jira/browse/HIVE-1844 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive-1844.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1842) Add the local flag to all the map red tasks, if the query is running locally.
[ https://issues.apache.org/jira/browse/HIVE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1842: - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Liyin Add the local flag to all the map red tasks, if the query is running locally. - Key: HIVE-1842 URL: https://issues.apache.org/jira/browse/HIVE-1842 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.4.1 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1842-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969633#action_12969633 ] Namit Jain commented on HIVE-1526: -- @Ning, can you take care of this ? So many other patches are waiting for this ? Hive should depend on a release version of Thrift - Key: HIVE-1526 URL: https://issues.apache.org/jira/browse/HIVE-1526 Project: Hive Issue Type: Task Components: Build Infrastructure, Clients Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 Attachments: compile.err, HIVE-1526-complete.4.patch.txt, HIVE-1526-complete.5.patch.txt, HIVE-1526-complete.6.patch.txt, HIVE-1526-complete.7.patch.txt, HIVE-1526-complete.8.patch.txt, HIVE-1526-no-codegen.3.patch.txt, HIVE-1526-no-codegen.4.patch.txt, HIVE-1526-no-codegen.5.patch.txt, HIVE-1526-no-codegen.6.patch.txt, HIVE-1526-no-codegen.7.patch.txt, HIVE-1526-no-codegen.8.patch.txt, HIVE-1526.2.patch.txt, HIVE-1526.3.patch.txt, hive-1526.txt, libfb303.jar, libthrift.jar, serde2_test.patch, svn_rm.sh, test.log, thrift-0.5.0.jar, thrift-fb303-0.5.0.jar Hive should depend on a release version of Thrift, and ideally it should use Ivy to resolve this dependency. The Thrift folks are working on adding Thrift artifacts to a maven repository here: https://issues.apache.org/jira/browse/THRIFT-363 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1830. -- Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Committed. Thanks Liyin mappers in group followed by joins may die OOM -- Key: HIVE-1830 URL: https://issues.apache.org/jira/browse/HIVE-1830 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, hive-1830-4.patch, hive-1830-5.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1844) Hanging hive client caused by TaskRunner's OutOfMemoryError
[ https://issues.apache.org/jira/browse/HIVE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1844: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang Hanging hive client caused by TaskRunner's OutOfMemoryError --- Key: HIVE-1844 URL: https://issues.apache.org/jira/browse/HIVE-1844 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive-1844.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1843) add an option in dynamic partition inserts to throw an error if 0 partitions are created
[ https://issues.apache.org/jira/browse/HIVE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969861#action_12969861 ] Namit Jain commented on HIVE-1843: -- +1 add an option in dynamic partition inserts to throw an error if 0 partitions are created Key: HIVE-1843 URL: https://issues.apache.org/jira/browse/HIVE-1843 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1843.patch Currently, we print an error message in that scenario. However, it would be very useful if an option was added where we would error out. This would help a lot in debugging. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970009#action_12970009 ] Namit Jain commented on HIVE-1648: -- 1. QBParseInfo: add setDestToLimit() for symmetry. 2. I am not sure any of your tests are working - set hive.stats.autogather = false before you create the tables for which you want the stats to be populated while reading. Clearly, this is the reason why piggyback_part.q is working. 3. piggyback_join.q End: show table extended like piggy_table3; drop table piggy_table; Add: show table extended like piggy_table1; show table extended like piggy_table2; Also, add a test where you are joining: piggyback_table1 a join piggyback_table2 b on a.key = b.key join piggyback_table3 c on b.key = c.key and then show table extended for all 3 tables. 4. piggyback_limit.q add: show table extended like piggy_table1; before the end. It should have no stats. 5. piggyback_subq.q and _union.q are wrong - you need to create new tables, and then show table extended them at the end, just like the other tests. Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Paul Butler Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, HIVE-1648.5.patch, HIVE-1648.patch, hive-1648.svn.patch HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1843) add an option in dynamic partition inserts to throw an error if 0 partitions are created
[ https://issues.apache.org/jira/browse/HIVE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1843: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Ning add an option in dynamic partition inserts to throw an error if 0 partitions are created Key: HIVE-1843 URL: https://issues.apache.org/jira/browse/HIVE-1843 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Attachments: HIVE-1843.patch Currently, we print an error message in that scenario. However, it would be very useful if an option was added where we would error out. This would help a lot in debugging. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1694) Accelerate query execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970245#action_12970245 ] Namit Jain commented on HIVE-1694: -- I think having a mechanism which lets us issue internal or recursive sql is better in the long term. That is something we will need anyway for future optimizations. We can create a thin API around SemanticAnalyzer (analyze etc.), which is indirectly present in Driver. Another implementation of that API can be the internal API, say RecursiveDriver. In a recursive context, you are only allowed to invoke RecursiveDriver. External Clients (CliDriver, HiveServer etc.) invoke Driver directly. As John said, definitely keep your optimizations pluggable. Currently, they are invoked as rule-based, but should be flexible enough to be invoked based on some costs in the future. Accelerate query execution using indexes Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff The index building patch (Hive-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in the optimizer and query execution. The aim of this effort is to use indexes to accelerate query execution (for a certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans and operator implementations for the above mentioned cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1847) option of continue on error
option of continue on error --- Key: HIVE-1847 URL: https://issues.apache.org/jira/browse/HIVE-1847 Project: Hive Issue Type: Improvement Reporter: Namit Jain In hive -f script, if any sql/command in that script fails, then hive exits with exit status -1 without continuing with the remaining hive commands. Sometimes it is better to continue the script even after errors. For example, if a hive sql script contains many drop table commands, hive would exit when it could not find a table. But in this case, it is preferable to continue dropping the remaining tables. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
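The behaviour requested in HIVE-1847 can be illustrated with a small sketch. This is not Hive's CliDriver code: the class ScriptRunner, the continueOnError flag and the executor callback are hypothetical names chosen for illustration, and the choice to report the last non-zero status as the overall exit code is an assumption the issue does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of a script runner with a continue-on-error option. */
final class ScriptRunner {

    /** Outcome of a run: overall exit code plus the commands that failed. */
    static final class Result {
        final int exitCode;
        final List<String> failed;

        Result(int exitCode, List<String> failed) {
            this.exitCode = exitCode;
            this.failed = failed;
        }
    }

    /**
     * Runs each command through the executor, which returns 0 on success.
     * With continueOnError == false the run stops at the first failure
     * (the behaviour described in the issue); with true it keeps going.
     * The exit code is the last non-zero status seen, or 0 (an assumption).
     */
    static Result run(List<String> commands,
                      ToIntFunction<String> executor,
                      boolean continueOnError) {
        List<String> failed = new ArrayList<>();
        int exitCode = 0;
        for (String command : commands) {
            int status = executor.applyAsInt(command);
            if (status != 0) {
                failed.add(command);
                exitCode = status;
                if (!continueOnError) {
                    break; // stop here, remaining commands are skipped
                }
            }
        }
        return new Result(exitCode, failed);
    }
}
```

For a script of drop table commands where one table is missing, the continue-on-error run still attempts every drop and lists the failed command afterwards, while the default run stops at the missing table.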
[jira] Created: (HIVE-1848) bug in MAPJOIN
bug in MAPJOIN -- Key: HIVE-1848 URL: https://issues.apache.org/jira/browse/HIVE-1848 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain explain FROM srcpart c JOIN srcpart d ON ( c.key=d.key AND c.ds='2008-04-08' AND d.ds='2008-04-08') SELECT /*+ MAPJOIN(d) */ DISTINCT c.campaign_id; The above query throws an error: FAILED: Error in semantic analysis: line 0:-1 Invalid Function TOK_MAPJOIN -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1849) add more logging to partition pruning
add more logging to partition pruning - Key: HIVE-1849 URL: https://issues.apache.org/jira/browse/HIVE-1849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain In facebook, we are seeing some intermittent errors, where it seems that either all the partitions are not returned by the metastore or some of them are pruned wrongly. This patch adds more logging for debugging such scenarios. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1849) add more logging to partition pruning
[ https://issues.apache.org/jira/browse/HIVE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1849: - Attachment: hive.1849.1.patch add more logging to partition pruning - Key: HIVE-1849 URL: https://issues.apache.org/jira/browse/HIVE-1849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1849.1.patch In facebook, we are seeing some intermittent errors, where it seems that either all the partitions are not returned by the metastore or some of them are pruned wrongly. This patch adds more logging for debugging such scenarios. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job
[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1695: - Status: Open (was: Patch Available) MapJoin followed by ReduceSink should be done as single MapReduce Job - Key: HIVE-1695 URL: https://issues.apache.org/jira/browse/HIVE-1695 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Sreekanth Ramakrishnan Attachments: hive-1695-1.patch, hive-1695.patch Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map only job followed by a Map-Reduce job. It can be combined into single MapReduce Job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-78: --- Status: Open (was: Patch Available) Authorization infrastructure for Hive - Key: HIVE-78 URL: https://issues.apache.org/jira/browse/HIVE-78 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor, Server Infrastructure Reporter: Ashish Thusoo Assignee: He Yongqiang Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, HIVE-78.1.nothrift.patch, HIVE-78.1.thrift.patch, HIVE-78.2.nothrift.patch, HIVE-78.2.thrift.patch, HIVE-78.4.complete.patch, HIVE-78.4.no_thrift.patch, HIVE-78.5.complete.patch, HIVE-78.5.no_thrift.patch, HIVE-78.6.complete.patch, HIVE-78.6.no_thrift.patch, hive-78.diff Allow hive to integrate with existing user repositories for authentication and authorization infromation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1848) bug in MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1848: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang bug in MAPJOIN -- Key: HIVE-1848 URL: https://issues.apache.org/jira/browse/HIVE-1848 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Attachments: hive-1848.1.patch explain FROM srcpart c JOIN srcpart d ON ( c.key=d.key AND c.ds='2008-04-08' AND d.ds='2008-04-08') SELECT /*+ MAPJOIN(d) */ DISTINCT c.campaign_id; The above query throws an error: FAILED: Error in semantic analysis: line 0:-1 Invalid Function TOK_MAPJOIN -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1845) Some attributes in the Eclipse template file is deprecated
[ https://issues.apache.org/jira/browse/HIVE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1845: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Liyin Some attributes in the Eclipse template file is deprecated Key: HIVE-1845 URL: https://issues.apache.org/jira/browse/HIVE-1845 Project: Hive Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hive-1845-1.patch In the eclipse template file, it will reference this jar file, which is deprecated. /@PROJECT@/build/metastore/hive-mod...@hive_version@.jar So the correct one should be: /@PROJECT@/build/metastore/hive-metasto...@hive_version@.jar Just update all the eclipse template files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1849) add more logging to partition pruning
[ https://issues.apache.org/jira/browse/HIVE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12971425#action_12971425 ] Namit Jain commented on HIVE-1849: -- We need this log to confirm that add more logging to partition pruning - Key: HIVE-1849 URL: https://issues.apache.org/jira/browse/HIVE-1849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1849.1.patch In facebook, we are seeing some intermittent errors, where it seems that either all the partitions are not returned by the metastore or some of them are pruned wrongly. This patch adds more logging for debugging such scenarios. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1851) wrong number of rows inserted reported by Hive
wrong number of rows inserted reported by Hive -- Key: HIVE-1851 URL: https://issues.apache.org/jira/browse/HIVE-1851 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang The counters that hive uses to report the number of rows inserted are not very reliable. Unless they become correct, it is a good idea to disable these reports. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1851) wrong number of rows inserted reported by Hive
[ https://issues.apache.org/jira/browse/HIVE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1851. -- Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/HIVE-934 wrong number of rows inserted reported by Hive -- Key: HIVE-1851 URL: https://issues.apache.org/jira/browse/HIVE-1851 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang The counters that hive uses to report the number of rows inserted are not very reliable. Unless they become correct, it is a good idea to disable these reports. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1806) The merge criteria on dynamic partitons should be per partiton
[ https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12971870#action_12971870 ] Namit Jain commented on HIVE-1806: -- +1 The merge criteria on dynamic partitons should be per partiton -- Key: HIVE-1806 URL: https://issues.apache.org/jira/browse/HIVE-1806 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1806.patch Currently the criterion for whether a merge job should be fired on dynamically generated partitions is the average size of files across all dynamic partitions. It is very common that some dynamic partitions contain mostly large files and some contain mostly small files. Even though the average size of all the files is larger than hive.merge.smallfiles.avgsize, we should merge only those partitions containing small files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1806) The merge criteria on dynamic partitons should be per partiton
[ https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12971977#action_12971977 ] Namit Jain commented on HIVE-1806: -- test dyn_part_empty.q failed - can you take a look ? The merge criteria on dynamic partitons should be per partiton -- Key: HIVE-1806 URL: https://issues.apache.org/jira/browse/HIVE-1806 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1806.patch Currently the criterion for whether a merge job should be fired on dynamically generated partitions is the average size of files across all dynamic partitions. It is very common that some dynamic partitions contain mostly large files and some contain mostly small files. Even though the average size of all the files is larger than hive.merge.smallfiles.avgsize, we should merge only those partitions containing small files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1806) The merge criteria on dynamic partitons should be per partiton
[ https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1806: - Status: Open (was: Patch Available) The merge criteria on dynamic partitons should be per partiton -- Key: HIVE-1806 URL: https://issues.apache.org/jira/browse/HIVE-1806 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1806.patch Currently the criterion for whether a merge job should be fired on dynamically generated partitions is the average size of files across all dynamic partitions. It is very common that some dynamic partitions contain mostly large files and some contain mostly small files. Even though the average size of all the files is larger than hive.merge.smallfiles.avgsize, we should merge only those partitions containing small files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
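The difference between the global criterion and the per-partition criterion described in HIVE-1806 can be sketched as follows. This is an illustration only, not the code in HIVE-1806.patch: MergeSelector and its methods are hypothetical names, and the threshold parameter merely stands in for the hive.merge.smallfiles.avgsize setting named in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: choose partitions to merge by their own average file size. */
final class MergeSelector {

    /** Average file size over every file in every partition (the old, global criterion). */
    static double globalAverage(Map<String, List<Long>> filesByPartition) {
        long total = 0;
        long count = 0;
        for (List<Long> sizes : filesByPartition.values()) {
            for (long size : sizes) {
                total += size;
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) total / count;
    }

    /**
     * Per-partition criterion: a partition is selected when the average
     * size of its own files is below the threshold. Empty partitions are
     * skipped (an assumption; the issue does not discuss them).
     */
    static List<String> partitionsToMerge(Map<String, List<Long>> filesByPartition,
                                          long avgSizeThreshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : filesByPartition.entrySet()) {
            List<Long> sizes = entry.getValue();
            if (sizes.isEmpty()) {
                continue;
            }
            long total = 0;
            for (long size : sizes) {
                total += size;
            }
            double average = (double) total / sizes.size();
            if (average < avgSizeThreshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

With one partition holding two 256 MB files and another holding three files of 1 to 2 MB, the global average is far above a 16 MB threshold, so the global test would fire no merge at all, whereas the per-partition test selects only the partition with the small files.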
[jira] Created: (HIVE-1853) downgrade JDO version
downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Paul Yang After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1853) downgrade JDO version
[ https://issues.apache.org/jira/browse/HIVE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972198#action_12972198 ] Namit Jain commented on HIVE-1853: -- +1 Running tests downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Paul Yang Attachments: HIVE-1853.1.patch, HIVE-1853.2.patch After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1853) downgrade JDO version
[ https://issues.apache.org/jira/browse/HIVE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1853: - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Paul downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.7.0 Attachments: HIVE-1853.1.patch, HIVE-1853.2.patch After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1853) downgrade JDO version
[ https://issues.apache.org/jira/browse/HIVE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972276#action_12972276 ] Namit Jain commented on HIVE-1853: -- Also, is there some other more stable version of JDO which does not have this problem ? downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.7.0 Attachments: HIVE-1853.1.patch, HIVE-1853.2.patch After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1853) downgrade JDO version
[ https://issues.apache.org/jira/browse/HIVE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972280#action_12972280 ] Namit Jain commented on HIVE-1853: -- Ashutosh, what is your timeline ? Right now, we don't have the infrastructure in place to pick some patches and ignore others. We pick all the patches from the open source tree into our internal tree. For the time it will take us to develop this, can you live with the current trunk (lower JDO) ? downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.7.0 Attachments: HIVE-1853.1.patch, HIVE-1853.2.patch After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1854) Temporarily disable metastore tests for listPartitionsByFilter()
[ https://issues.apache.org/jira/browse/HIVE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1854: - Status: Open (was: Patch Available) Temporarily disable metastore tests for listPartitionsByFilter() Key: HIVE-1854 URL: https://issues.apache.org/jira/browse/HIVE-1854 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-1854.1.patch After the JDO downgrade in HIVE-1853, the tests for the disabled function listPartitionByFilter() should be disabled as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1854) Temporarily disable metastore tests for listPartitionsByFilter()
[ https://issues.apache.org/jira/browse/HIVE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973322#action_12973322 ] Namit Jain commented on HIVE-1854: -- +1 Temporarily disable metastore tests for listPartitionsByFilter() Key: HIVE-1854 URL: https://issues.apache.org/jira/browse/HIVE-1854 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-1854.1.patch After the JDO downgrade in HIVE-1853, the tests for the disabled function listPartitionByFilter() should be disabled as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1854) Temporarily disable metastore tests for listPartitionsByFilter()
[ https://issues.apache.org/jira/browse/HIVE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-1854. -- Resolution: Fixed Hadoop Flags: [Reviewed] Committed. Thanks Paul Temporarily disable metastore tests for listPartitionsByFilter() Key: HIVE-1854 URL: https://issues.apache.org/jira/browse/HIVE-1854 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-1854.1.patch After the JDO downgrade in HIVE-1853, the tests for the disabled function listPartitionByFilter() should be disabled as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1855) Include Process ID in the log4j log file name
[ https://issues.apache.org/jira/browse/HIVE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973383#action_12973383 ] Namit Jain commented on HIVE-1855: -- +1 Include Process ID in the log4j log file name - Key: HIVE-1855 URL: https://issues.apache.org/jira/browse/HIVE-1855 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1855.patch Hive client side always logs to /tmp/${user.name}/hive.log. If there are multiple CLIs running on the same host, logging could be stopped, or if it is not, it is difficult to distinguish messages between them. It would be easier for debugging if different CLIs output to different log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1855) Include Process ID in the log4j log file name
[ https://issues.apache.org/jira/browse/HIVE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1855: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Ning Include Process ID in the log4j log file name - Key: HIVE-1855 URL: https://issues.apache.org/jira/browse/HIVE-1855 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1855.patch Hive client side always logs to /tmp/${user.name}/hive.log. If there are multiple CLIs running on the same host, logging could be stopped, or if it is not, it is difficult to distinguish messages between them. It would be easier for debugging if different CLIs output to different log files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
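One common way to get a per-process log file name on the JVM, as HIVE-1855 asks for, is sketched below. It is not taken from HIVE-1855.patch: the class LogFileName and the hive_<pid>.log pattern are hypothetical. It also relies on RuntimeMXBean.getName() returning a string of the form pid@hostname, which is the usual HotSpot behaviour but is not guaranteed by the Java specification, so the sketch falls back to the whole name when no '@' is present.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical sketch: derive a per-process log file name from the JVM runtime name. */
final class LogFileName {

    /**
     * Extracts the part before '@' from a runtime name such as "12345@host".
     * If there is no '@' (the format is implementation specific), the whole
     * name is returned so that the file name is still distinct per JVM.
     */
    static String pidFromRuntimeName(String runtimeName) {
        int at = runtimeName.indexOf('@');
        return at > 0 ? runtimeName.substring(0, at) : runtimeName;
    }

    /** Builds a log path like dir/hive_12345.log (the pattern is an assumption). */
    static String logFileFor(String dir, String runtimeName) {
        return dir + "/hive_" + pidFromRuntimeName(runtimeName) + ".log";
    }

    /** Log path for the current JVM. */
    static String currentLogFile(String dir) {
        return logFileFor(dir, ManagementFactory.getRuntimeMXBean().getName());
    }
}
```

Two CLI processes started by the same user on the same host then write to different files under /tmp/${user.name}, which is the separation the issue is after.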
[jira] Commented: (HIVE-1853) downgrade JDO version
[ https://issues.apache.org/jira/browse/HIVE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973847#action_12973847 ] Namit Jain commented on HIVE-1853: -- Unfortunately, the query that I was running used some production tables. I will try to reproduce the query with some non-production tables. downgrade JDO version - Key: HIVE-1853 URL: https://issues.apache.org/jira/browse/HIVE-1853 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.7.0 Attachments: HIVE-1853.1.patch, HIVE-1853.2.patch After HIVE-1609, we are seeing some "table not found" errors intermittently. We have a test case where 5 processes concurrently issue the same query - explain extended insert .. select from T - and once in a while we get an error "T not found". When we revert the JDO version, the error is gone. We can investigate later to find the JDO bug, but for now this is a show-stopper for facebook, and the upgrade needs to be reverted immediately. This also means that the filters will not be pushed to mysql. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1818) Call frequency and duration metrics for HiveMetaStore via jmx
[ https://issues.apache.org/jira/browse/HIVE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1818: - Status: Patch Available (was: Open) Call frequency and duration metrics for HiveMetaStore via jmx - Key: HIVE-1818 URL: https://issues.apache.org/jira/browse/HIVE-1818 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Sushanth Sowmyan Priority: Minor Attachments: HIVE-1818.patch As recently brought up in the hive-dev mailing list, it'd be useful if the HiveMetaStore had some sort of instrumentation capability so as to measure frequency of calls to various calls on the HiveMetaStore and the duration of time spent in these calls. There are already incrementCounter() and logStartFunction() / logStartTableFunction() ,etc calls in HiveMetaStore, and they could be refactored/repurposed to make calls that expose JMX MBeans as well. Or, a Metrics subsystem could be introduced which made calls to incrementCounter()/etc as a refactor. It might also be possible to specify a -D parameter that the Metrics subsystem could use to determine whether or not to be enabled, and if so, on to what port. And once we have the capability to instrument and expose MBeans, it might also be possible for other subsystems to also adopt and use this system. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
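The instrumentation idea in HIVE-1818, counting calls and time spent per metastore function and exposing the numbers over JMX, might look roughly like the sketch below. It is not the code in HIVE-1818.patch: MetaStoreMetrics, CallStats, and the hive.sketch ObjectName domain are hypothetical names, and how HiveMetaStore would hook this into incrementCounter() or logStartFunction() is left open. Only standard javax.management calls are used.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: per-function call count and duration exposed as a Standard MBean. */
final class MetaStoreMetrics {

    /** Management interface; Standard MBeans require the name <Class>MBean and public access. */
    public interface CallStatsMBean {
        long getCallCount();
        long getTotalNanos();
    }

    /** Thread-safe counters for one metastore function. */
    public static final class CallStats implements CallStatsMBean {
        private final AtomicLong calls = new AtomicLong();
        private final AtomicLong nanos = new AtomicLong();

        /** Records one completed call and the time it took. */
        public void record(long elapsedNanos) {
            calls.incrementAndGet();
            nanos.addAndGet(elapsedNanos);
        }

        @Override public long getCallCount() { return calls.get(); }
        @Override public long getTotalNanos() { return nanos.get(); }
    }

    /** ObjectName under a made-up domain; the naming scheme is an assumption. */
    static ObjectName nameFor(String function) throws JMException {
        return new ObjectName("hive.sketch:type=MetaStoreCalls,name=" + function);
    }

    /** Creates the counters for a function and registers them with the platform MBean server. */
    static CallStats register(String function) {
        try {
            CallStats stats = new CallStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, nameFor(function));
            return stats;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MBean for " + function, e);
        }
    }

    /** Reads a numeric attribute back through JMX, as a monitoring client would. */
    static long readAttribute(String function, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(nameFor(function), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read " + attribute, e);
        }
    }
}
```

A caller would wrap each metastore method with System.nanoTime() before and after and pass the difference to record(); a tool such as jconsole can then display CallCount and TotalNanos for each registered function.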