[jira] [Commented] (HIVE-6701) Analyze table compute statistics for decimal columns.
[ https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943626#comment-13943626 ] Shreepadma Venugopalan commented on HIVE-6701: -- The extra unused fields were added in HIVE-1362 precisely to avoid upgrading the schema. Analyze table compute statistics for decimal columns. - Key: HIVE-6701 URL: https://issues.apache.org/jira/browse/HIVE-6701 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Sergey Shelukhin Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch Analyze table should compute statistics for decimal columns as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876140#comment-13876140 ] Shreepadma Venugopalan commented on HIVE-6157: -- Currently, the API fetches statistics for a given column. hive.stats.fetch.column.stats fetches stats for all columns for all partitions in all tables. Bad idea. HIVE-4301 was filed to support a bulk fetch API so that stats for all columns for all partitions in multiple tables can be fetched with a single call. Feel free to pick up HIVE-4301. Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is whether, in addition to that, we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
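As a rough illustration of the numbers reported in HIVE-6157, the arithmetic can be sketched as below. It assumes exactly one metastore call per column per partition, which the description only offers as the probable cause, and the function name `per_call_overhead` is hypothetical.

```python
def per_call_overhead(partitions, columns, extra_seconds):
    """Return (number of metastore calls, average milliseconds per call).

    Assumes one call per (partition, column) pair, as suggested in the
    issue description; this is an assumption, not a measured call count.
    """
    calls = partitions * columns
    ms_per_call = extra_seconds * 1000.0 / calls
    return calls, ms_per_call


# Setup from the issue: 1 table, 4000 partitions, 24 columns, ~65 s extra.
calls, ms = per_call_overhead(partitions=4000, columns=24, extra_seconds=65.0)
print(f"{calls} calls, ~{ms:.2f} ms per call")
```

Under that assumption each call costs well under a millisecond, so the total is dominated by the sheer number of round trips, which is the case a bulk fetch API such as the one proposed in HIVE-4301 would address.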
[jira] [Commented] (HIVE-5780) Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 in TCLIService.thrift
[ https://issues.apache.org/jira/browse/HIVE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816940#comment-13816940 ] Shreepadma Venugopalan commented on HIVE-5780: -- Regenerating the thrift bindings fails without this patch. Thanks for putting this patch together. Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 in TCLIService.thrift - Key: HIVE-5780 URL: https://issues.apache.org/jira/browse/HIVE-5780 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5780.1.patch TCLIService.thrift was updated as part of HIVE-5355. The new enum HIVE_CLI_SERVICE_PROTOCOL_V4 is referenced in the file, but the declaration is missing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801070#comment-13801070 ] Shreepadma Venugopalan commented on HIVE-4957: -- Thanks, Brock! Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Fix For: 0.13.0 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Once the number of bit vectors exceeds 50, increasing it no longer improves accuracy, but it still increases memory usage and crashes Hive if the number is too large. Hive currently does not prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
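The kind of guard HIVE-4957 asks for could look roughly like the sketch below: reject the bit-vector argument up front instead of letting the job run out of heap. The cap of 1024 and the names `MAX_BIT_VECTORS` and `validate_num_bit_vectors` are assumptions made for illustration; they are not taken from the attached patches.

```python
MAX_BIT_VECTORS = 1024  # hypothetical cap, chosen only for this illustration


def validate_num_bit_vectors(requested, limit=MAX_BIT_VECTORS):
    """Return `requested` if it is a usable bit-vector count, else raise.

    Failing fast here stands in for rejecting the query before any
    memory is allocated for the bit vectors.
    """
    if requested <= 0:
        raise ValueError("number of bit vectors must be positive")
    if requested > limit:
        raise ValueError(
            f"number of bit vectors {requested} exceeds the limit of {limit}"
        )
    return requested
```

With a check like this, a call such as compute_stats(a, 40) would pass through unchanged, while an oversized argument would fail with a clear error rather than a Java heap space failure in the map task.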
[jira] [Created] (HIVE-5536) Incorrect Operation Name is passed to hookcontext
Shreepadma Venugopalan created HIVE-5536: Summary: Incorrect Operation Name is passed to hookcontext Key: HIVE-5536 URL: https://issues.apache.org/jira/browse/HIVE-5536 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0, 0.12.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HS2 passes incorrect operation name to hookcontext.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786375#comment-13786375 ] Shreepadma Venugopalan commented on HIVE-4669: -- Thank you Brock! Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.13.0 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, HIVE-4669.4.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Status: Open (was: Patch Available) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Status: Patch Available (was: Open) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Attachment: HIVE-4669.3.patch Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784352#comment-13784352 ] Shreepadma Venugopalan commented on HIVE-4669: -- Attached a new patch with the changes. Not sure if we'd have had to modify the patch except to remove {noformat} this.userName = userName {noformat} from Driver.java. Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Attachment: HIVE-4669.4.patch Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, HIVE-4669.4.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784371#comment-13784371 ] Shreepadma Venugopalan commented on HIVE-4669: -- No worries. Let's make sure the new code is clean. Uploaded a new patch. Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, HIVE-4669.4.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4957: - Attachment: HIVE-4957.2.patch Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Once the number of bit vectors exceeds 50, increasing it no longer improves accuracy, but it still increases memory usage and crashes Hive if the number is too large. Hive currently does not prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783358#comment-13783358 ] Shreepadma Venugopalan commented on HIVE-4957: -- New patch addresses review comments. Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Once the number of bit vectors exceeds 50, increasing it no longer improves accuracy, but it still increases memory usage and crashes Hive if the number is too large. Hive currently does not prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Status: Open (was: Patch Available) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Status: Patch Available (was: Open) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch Make username available to the semantic analyzer hooks.
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Attachment: HIVE-4669.2.patch Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783433#comment-13783433 ] Shreepadma Venugopalan commented on HIVE-4669: -- Attached new patch rebased to the tip of trunk. Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779271#comment-13779271 ] Shreepadma Venugopalan commented on HIVE-4669: -- Is there anything else needed from my side? Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779269#comment-13779269 ] Shreepadma Venugopalan commented on HIVE-4670: -- Is there anything else needed from my side? Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal.
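For readers unfamiliar with the behaviour HIVE-4670 requests, stripping the realm from a principal can be sketched as below. This is only an illustration of the idea, not the code in the attached patches: the function name `strip_realm` and the example principals are hypothetical, and the sketch assumes the part before the realm contains no escaped '@' characters.

```python
def strip_realm(principal):
    """Return the part of a Kerberos principal before the '@REALM' suffix.

    A principal without a realm is returned unchanged. Escaped '@'
    characters inside the name are not handled (assumption of this sketch).
    """
    name, _sep, _realm = principal.partition("@")
    return name


# Hypothetical principals, used only to show the effect.
print(strip_realm("alice@EXAMPLE.COM"))
print(strip_realm("hive/host.example.com@EXAMPLE.COM"))
```

The first call yields "alice" and the second "hive/host.example.com", which is the form that would be handed on instead of the full instance@realm string.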
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778224#comment-13778224 ] Shreepadma Venugopalan commented on HIVE-4629: -- I'm able to apply the patch with -p0 to the tip of trunk. I've re-attached the patch to trigger a run. HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Attachment: HIVE-4629.2.patch HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776701#comment-13776701 ] Shreepadma Venugopalan commented on HIVE-4669: -- My apologies for not responding earlier. We need this for integrating Sentry with Hive. Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776709#comment-13776709 ] Shreepadma Venugopalan commented on HIVE-4670: -- Apologies for not responding sooner. We need this for integrating Sentry with Hive. Users of Sentry prefer to mention the username without the realm when granting privileges. Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Status: Patch Available (was: In Progress) HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Attachment: HIVE-4629.1.patch HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Attachment: HIVE-4629-no_thrift.1.patch HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776969#comment-13776969 ] Shreepadma Venugopalan commented on HIVE-4629: -- Review board: https://reviews.apache.org/r/14326/ HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Attachment: HIVE-4629.1.patch HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Attachment: (was: HIVE-4629.1.patch) HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-5330: - Attachment: HIVE-5330.1.patch Pass query text and IPAddress to SemanticAnalyzerHooks -- Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-5330.1.patch Today, semantic analyzer hooks don't have IPAddress of the client and query text available. Adding these additional pieces of information to the semantic analyzer hook will make auditing useful and meaningful.
[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-5330: - Status: Patch Available (was: Open) Pass query text and IPAddress to SemanticAnalyzerHooks -- Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-5330.1.patch Today, semantic analyzer hooks don't have the client's IP address or the query text available. Passing these additional pieces of information to the semantic analyzer hook would make auditing more useful and meaningful.
[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-5330: - Component/s: SQL Affects Version/s: 0.11.0 Pass query text and IPAddress to SemanticAnalyzerHooks -- Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Today, semantic analyzer hooks don't have the client's IP address or the query text available. Passing these additional pieces of information to the semantic analyzer hook would make auditing more useful and meaningful.
[jira] [Created] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
Shreepadma Venugopalan created HIVE-5330: Summary: Pass query text and IPAddress to SemanticAnalyzerHooks Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Today, semantic analyzer hooks don't have the client's IP address or the query text available. Passing these additional pieces of information to the semantic analyzer hook would make auditing more useful and meaningful.
[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-5330: - Attachment: (was: HIVE-5330.1.patch) Pass query text and IPAddress to SemanticAnalyzerHooks -- Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Today, semantic analyzer hooks don't have the client's IP address or the query text available. Passing these additional pieces of information to the semantic analyzer hook would make auditing more useful and meaningful.
[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks
[ https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-5330: - Status: Open (was: Patch Available) Pass query text and IPAddress to SemanticAnalyzerHooks -- Key: HIVE-5330 URL: https://issues.apache.org/jira/browse/HIVE-5330 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Today, semantic analyzer hooks don't have the client's IP address or the query text available. Passing these additional pieces of information to the semantic analyzer hook would make auditing more useful and meaningful.
[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773351#comment-13773351 ] Shreepadma Venugopalan commented on HIVE-4957: -- RB: https://reviews.apache.org/r/14250/ Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond 50 bit vectors, accuracy no longer improves, yet memory usage keeps growing, and Hive crashes if the number is too large. Hive currently doesn't prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
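A minimal Java sketch of the kind of guard this issue asks for: check the requested number of bit vectors before any estimator memory is allocated. The class name, the method name, and the cap of 1024 are all hypothetical, chosen for illustration only; the attached patch may use a different limit or put the check somewhere else.

```java
// Hypothetical sketch: reject an unreasonable bit-vector count up front,
// before any per-column estimator memory is allocated.
final class BitVectorLimit {
    // Assumed cap, for illustration only; not taken from the Hive source.
    static final int MAX_BIT_VECTORS = 1024;

    // Returns the requested count unchanged when it is within bounds.
    static int validate(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException(
                "Number of bit vectors must be positive, got " + requested);
        }
        if (requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "Number of bit vectors " + requested
                + " exceeds the maximum of " + MAX_BIT_VECTORS);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(validate(40));
        try {
            validate(1_000_000);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing with a clear message at argument-validation time turns a task-level "Java heap space" error into something the user can act on directly.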
[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4957: - Attachment: HIVE-4957.1.patch Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Attachments: HIVE-4957.1.patch Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond 50 bit vectors, accuracy no longer improves, yet memory usage keeps growing, and Hive crashes if the number is too large. Hive currently doesn't prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4957: - Status: Patch Available (was: In Progress) Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond 50 bit vectors, accuracy no longer improves, yet memory usage keeps growing, and Hive crashes if the number is too large. Hive currently doesn't prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764658#comment-13764658 ] Shreepadma Venugopalan commented on HIVE-5272: -- Thanks, Prasanth. The code in question assumes, incorrectly, that the validation done later by the SemanticAnalyzer is sufficient to raise an invalid column error. But it looks like the IndexOutOfBoundsException occurs before that validation runs. I think we can either fix the if condition in getTableColumnType() or alternatively perform the validation early. One of the reasons for deferring the validation was to piggyback on the existing logic later during SemanticAnalysis and avoid duplicating work. But the patch you have put together looks simple enough. Column statistics on a invalid column name results in IndexOutOfBoundsException --- Key: HIVE-5272 URL: https://issues.apache.org/jira/browse/HIVE-5272 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: statistics Fix For: 0.13.0 Attachments: HIVE-5272.txt When an invalid column name is specified for column statistics, an IndexOutOfBoundsException is thrown. {code}hive> analyze table customer_staging compute statistics for columns c_first_name, invalid_name, c_customer_sk; FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code} If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is thrown at the query planning stage. But if the invalid column name appears somewhere in the middle of the column list, IndexOutOfBoundsException is thrown at the semantic analysis step. The problem is with the getTableColumnType() and getPartitionColumnType() methods. The following segment {code}for (int i=0; i < numCols; i++) { colName = colNames.get(i); for (FieldSchema col: cols) { if (colName.equalsIgnoreCase(col.getName())) { colTypes.add(i, new String(col.getType())); } } }{code} is the reason for it. If the invalid column name appears in the middle of the column list, the equalsIgnoreCase() check never matches it, so nothing is added for that index while i is still incremented. The next colTypes.add(i, ...) then targets a position past the end of the list, which results in the exception.
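With the three names from the report, index 0 is added, index 1 is skipped, and add(2, ...) on a one-element list throws, which matches "Index: 2, Size: 1". The "perform the validation early" option from the comment could look roughly like the Java sketch below. It is not the attached patch: the class and method names are hypothetical, and a plain Map of column name to type stands in for Hive's list of FieldSchema objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve column types and fail on the first unknown
// name instead of silently skipping it and adding at a stale index.
final class ColumnTypeLookup {
    // schema maps column name -> type; a stand-in for List<FieldSchema>.
    static List<String> resolveTypes(List<String> colNames, Map<String, String> schema) {
        List<String> colTypes = new ArrayList<>();
        for (String colName : colNames) {
            String type = null;
            for (Map.Entry<String, String> col : schema.entrySet()) {
                if (colName.equalsIgnoreCase(col.getKey())) {
                    type = col.getValue();
                    break;
                }
            }
            if (type == null) {
                // Early, descriptive failure rather than IndexOutOfBoundsException.
                throw new IllegalArgumentException("Invalid column reference: " + colName);
            }
            colTypes.add(type); // append keeps names and types aligned
        }
        return colTypes;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("c_customer_sk", "int");
        schema.put("c_first_name", "string");
        System.out.println(resolveTypes(Arrays.asList("c_first_name", "c_customer_sk"), schema));
        try {
            resolveTypes(Arrays.asList("c_first_name", "invalid_name", "c_customer_sk"), schema);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Appending instead of inserting at index i means the position of the invalid name no longer matters: any unknown name fails the same way.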
[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764688#comment-13764688 ] Shreepadma Venugopalan commented on HIVE-5272: -- In case I wasn't clear, I'm +1 on it. Column statistics on a invalid column name results in IndexOutOfBoundsException --- Key: HIVE-5272 URL: https://issues.apache.org/jira/browse/HIVE-5272 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: statistics Fix For: 0.13.0 Attachments: HIVE-5272.txt When an invalid column name is specified for column statistics, an IndexOutOfBoundsException is thrown. {code}hive> analyze table customer_staging compute statistics for columns c_first_name, invalid_name, c_customer_sk; FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code} If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is thrown at the query planning stage. But if the invalid column name appears somewhere in the middle of the column list, IndexOutOfBoundsException is thrown at the semantic analysis step. The problem is with the getTableColumnType() and getPartitionColumnType() methods. The following segment {code}for (int i=0; i < numCols; i++) { colName = colNames.get(i); for (FieldSchema col: cols) { if (colName.equalsIgnoreCase(col.getName())) { colTypes.add(i, new String(col.getType())); } } }{code} is the reason for it. If the invalid column name appears in the middle of the column list, the equalsIgnoreCase() check never matches it, so nothing is added for that index while i is still incremented. The next colTypes.add(i, ...) then targets a position past the end of the list, which results in the exception.
[jira] [Commented] (HIVE-5240) Column statistics on a partitioned column should fail early with proper error message
[ https://issues.apache.org/jira/browse/HIVE-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763363#comment-13763363 ] Shreepadma Venugopalan commented on HIVE-5240: -- Thanks, [~ashutoshc]. Column statistics on a partitioned column should fail early with proper error message - Key: HIVE-5240 URL: https://issues.apache.org/jira/browse/HIVE-5240 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: statistics Fix For: 0.12.0 Attachments: HIVE-5240.txt When computing column statistics on a partitioned table, if one of the listed columns is the partition column, an IndexOutOfBoundsException is thrown. The following analyze query throws IndexOutOfBoundsException during the semantic analysis phase: {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics for columns year,month,week,type; FAILED: IndexOutOfBoundsException Index: 1, Size: 0{code} If the partition column is specified last, as below, the same exception is thrown at runtime: {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics for columns month,week,type,year; Hadoop job information for null: number of mappers: 0; number of reducers: 0 2013-09-06 18:05:06,587 null map = 0%, reduce = 100% Ended Job = job_local861862820_0001 Execution completed successfully Mapred Local Task Succeeded . Convert the Join into MapJoin java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.LinkedList.entry(LinkedList.java:365) at java.util.LinkedList.get(LinkedList.java:315) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.constructColumnStatsFromPackedRow(ColumnStatsTask.java:262) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistPartitionStats(ColumnStatsTask.java:302) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:345) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1407) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1187) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1017) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code}
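"Fail early with a proper error message" could be sketched in Java as a check that runs before any type lookup or job launch. The class name, method name, and message text below are hypothetical and only illustrate the idea; they are not taken from the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: reject partition columns in the stats column list
// up front, regardless of where they appear in the list.
final class PartitionColumnCheck {
    static void rejectPartitionColumns(List<String> statsCols, List<String> partitionCols) {
        for (String col : statsCols) {
            for (String part : partitionCols) {
                if (col.equalsIgnoreCase(part)) {
                    throw new IllegalArgumentException(
                        "Column statistics are not supported on partition column: " + col);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> partitionCols = Arrays.asList("year");
        try {
            rejectPartitionColumns(Arrays.asList("month", "week", "type", "year"), partitionCols);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check does not depend on list position, both queries from the report would fail the same way at analysis time instead of one failing only after the job has run.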
[jira] [Commented] (HIVE-5240) Column statistics on a partitioned column should fail early with proper error message
[ https://issues.apache.org/jira/browse/HIVE-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762536#comment-13762536 ] Shreepadma Venugopalan commented on HIVE-5240: -- There is already a JIRA for this issue - HIVE-4426. However, HIVE-4426 aims to allow stats collection on the partitioning key. I think this can be useful. I'll be able to start working on HIVE-4426 next week. Let me know if there's interest. Thanks! Column statistics on a partitioned column should fail early with proper error message - Key: HIVE-5240 URL: https://issues.apache.org/jira/browse/HIVE-5240 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: statistics Fix For: 0.12.0 Attachments: HIVE-5240.txt When computing column statistics on a partitioned table, if one of the listed columns is the partition column, an IndexOutOfBoundsException is thrown. The following analyze query throws IndexOutOfBoundsException during the semantic analysis phase: {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics for columns year,month,week,type; FAILED: IndexOutOfBoundsException Index: 1, Size: 0{code} If the partition column is specified last, as below, the same exception is thrown at runtime: {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics for columns month,week,type,year; Hadoop job information for null: number of mappers: 0; number of reducers: 0 2013-09-06 18:05:06,587 null map = 0%, reduce = 100% Ended Job = job_local861862820_0001 Execution completed successfully Mapred Local Task Succeeded . Convert the Join into MapJoin java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.LinkedList.entry(LinkedList.java:365) at java.util.LinkedList.get(LinkedList.java:315) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.constructColumnStatsFromPackedRow(ColumnStatsTask.java:262) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistPartitionStats(ColumnStatsTask.java:302) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:345) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1407) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1187) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1017) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code}
[jira] [Commented] (HIVE-1719) Move RegexSerDe out of hive-contrib and over to hive-serde
[ https://issues.apache.org/jira/browse/HIVE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748033#comment-13748033 ] Shreepadma Venugopalan commented on HIVE-1719: -- It was left in contrib so that we don't break backwards compatibility for existing users. Move RegexSerDe out of hive-contrib and over to hive-serde -- Key: HIVE-1719 URL: https://issues.apache.org/jira/browse/HIVE-1719 Project: Hive Issue Type: Task Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3141.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.4.patch, HIVE-1719.3.patch, HIVE-1719.D3249.1.patch RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it.
[jira] [Assigned] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-4957: Assignee: Shreepadma Venugopalan Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond 50 bit vectors, accuracy no longer improves, yet memory usage keeps growing, and Hive crashes if the number is too large. Hive currently doesn't prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Work started] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory
[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4957 started by Shreepadma Venugopalan. Restrict number of bit vectors, to prevent out of Java heap memory -- Key: HIVE-4957 URL: https://issues.apache.org/jira/browse/HIVE-4957 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Brock Noland Assignee: Shreepadma Venugopalan Normally, increasing the number of bit vectors increases calculation accuracy. For example, {noformat} select compute_stats(a, 40) from test_hive; {noformat} generally gives better accuracy than {noformat} select compute_stats(a, 16) from test_hive; {noformat} But a larger number of bit vectors also makes the query run slower. Beyond 50 bit vectors, accuracy no longer improves, yet memory usage keeps growing, and Hive crashes if the number is too large. Hive currently doesn't prevent a user from passing a ridiculously large number of bit vectors in a 'compute_stats' query. One example, {noformat} select compute_stats(a, 9) from column_eight_types; {noformat} crashes Hive. {noformat} 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0% 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec MapReduce Total cumulative CPU time: 290 msec Ended Job = job_1354923204155_0777 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/ Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777 Task with the most failures(4): - Task ID: task_1354923204155_0777_m_00 URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00 - Diagnostic Messages for this Task: Error: Java heap space {noformat}
[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701665#comment-13701665 ] Shreepadma Venugopalan commented on HIVE-4669: -- Ping :) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks.
[jira] [Commented] (HIVE-4549) JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM
[ https://issues.apache.org/jira/browse/HIVE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678217#comment-13678217 ] Shreepadma Venugopalan commented on HIVE-4549: -- +1 (non-committer). LGTM. JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM -- Key: HIVE-4549 URL: https://issues.apache.org/jira/browse/HIVE-4549 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.10.0 Environment: Hive 0.10 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Trivial Labels: newbie Fix For: 0.12.0 Attachments: HIVE-4549-1.patch The ResultSet returned by HiveDatabaseMetadata.getTables has the metadata columns TABLE_CAT, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, REMARKS. The second column name is not compliant with the JDBC standard (http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getSchemas()): the column name should be TABLE_SCHEM instead of TABLE_SCHEMA. Suggested fix in Hive (org.apache.hive.service.cli.operation.GetTablesOperation.java): change from private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") .addStringColumn("TABLE_SCHEMA", "Schema name.") .addStringColumn("TABLE_NAME", "Table name.") .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") .addStringColumn("REMARKS", "Comments about the table."); to private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") .addStringColumn("TABLE_SCHEM", "Schema name.") .addStringColumn("TABLE_NAME", "Table name.") .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") .addStringColumn("REMARKS", "Comments about the table.");
[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4670: - Attachment: HIVE-4670.3.patch Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal.
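As a rough illustration of "strip the realm", the Java sketch below drops everything from the last '@' onward. The class and method names are hypothetical, the example principals are made up, and the sketch deliberately ignores escaped '@' characters inside name components; it is not the code from the attached patches.

```java
// Hypothetical sketch: remove the "@REALM" suffix from a Kerberos principal.
// Simplified on purpose: escaped '@' characters are not handled.
final class PrincipalNames {
    // Returns the principal without its realm; input without '@' is returned unchanged.
    static String stripRealm(String principal) {
        int at = principal.lastIndexOf('@');
        return at < 0 ? principal : principal.substring(0, at);
    }

    public static void main(String[] args) {
        System.out.println(stripRealm("alice@EXAMPLE.COM"));
        System.out.println(stripRealm("hive/host.example.com@EXAMPLE.COM"));
    }
}
```

Downstream code such as hooks and authorization checks would then see the same name whether or not Kerberos is enabled, which appears to be the intent of the issue.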
[jira] [Commented] (HIVE-4588) Support session level hooks for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678515#comment-13678515 ] Shreepadma Venugopalan commented on HIVE-4588: -- +1 (non-binding), LGTM. Support session level hooks for HiveServer2 --- Key: HIVE-4588 URL: https://issues.apache.org/jira/browse/HIVE-4588 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4588-1.patch Support session level hooks for HiveServer2. The configured hooks will get executed at the beginning of each new session. This is useful for auditing connections, possibly tuning the session level properties, etc.
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676744#comment-13676744 ] Shreepadma Venugopalan commented on HIVE-4561: -- [~clarkyzl]: My suggestion is to use the Long.MIN_VALUE/Long.MAX_VALUE values instead of a null value. The code that looks at column stats can use the min/max in conjunction with other stats, such as the number of rows, to infer that the values are initialization values for min/max and not true values that represent the bounds on the column. Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0. hive (default)> create table src_test (price double); hive (default)> load data local inpath './test.txt' into table src_test; hive (default)> select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default)> analyze table src_test compute statistics for columns price; mysql> select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec)
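The symptom in HIVE-4561 is what one would see if the running minimum and maximum start at zero. The sketch below assumes that cause, which the thread does not confirm, and contrasts it with sentinel initialization in the spirit of the comment above. `MinMaxSketch` and its methods are hypothetical names, and for doubles the sentinels are `Double.MAX_VALUE` and `-Double.MAX_VALUE` (note that `Double.MIN_VALUE` is the smallest positive double, so it is not a usable lower sentinel).

```java
/** Hypothetical illustration; not code from Hive's column statistics implementation. */
final class MinMaxSketch {
    private MinMaxSketch() {}

    /** Bounds that start at zero: reproduces the reported symptom. Returns {low, high}. */
    static double[] zeroInit(double[] values) {
        double low = 0.0;
        double high = 0.0;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    /** Bounds that start at sentinels, so the first value always replaces them. Returns {low, high}. */
    static double[] sentinelInit(double[] values) {
        double low = Double.MAX_VALUE;
        double high = -Double.MAX_VALUE;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        // With zero rows the sentinels survive; a reader can detect that via the row count.
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        double[] prices = {1.0, 2.0, 3.0};
        System.out.println("zero init low:     " + zeroInit(prices)[0]);
        System.out.println("sentinel init low: " + sentinelInit(prices)[0]);
    }
}
```

A consumer of the stats would combine the bounds with the number of rows: if no rows were seen, the sentinels are initialization values rather than real bounds.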
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13677319#comment-13677319 ] Shreepadma Venugopalan commented on HIVE-4675: -- +1 to the proposal. Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland The current ptest tool is great, but it has the following limitations: - Requires an NFS filer - Unless the NFS filer is dedicated, ptests can easily become IO bound - Investigating failures is troublesome because the source directory for the failure is not saved - Ignoring or isolating tests is not supported - No unit tests for the ptest framework exist It'd be great to have a ptest tool that addresses these limitations.
[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4670: - Status: Patch Available (was: In Progress) Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4670 started by Shreepadma Venugopalan. Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4670: - Attachment: HIVE-4670.2.patch Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13677740#comment-13677740 ] Shreepadma Venugopalan commented on HIVE-4670: -- https://reviews.apache.org/r/11705/ Authentication module should pass the instance part of the Kerberos principle - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670
[ https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4657: - Attachment: HIVE-4657.1.patch HCatalog checkstyle violation after HIVE-2670 -- Key: HIVE-4657 URL: https://issues.apache.org/jira/browse/HIVE-4657 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Shreepadma Venugopalan Attachments: HIVE-4657.1.patch After HIVE-2670 was committed, I see the following error, {noformat} checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 416 files [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [for] hcatalog: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/build.xml:310: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32: Got 3 errors and 0 warnings. BUILD FAILED /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 of 11 iterations failed. {noformat} -- This message is automatically generated by JIRA. 
[jira] [Updated] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670
[ https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4657: - Status: Patch Available (was: Open) HCatalog checkstyle violation after HIVE-2670 -- Key: HIVE-4657 URL: https://issues.apache.org/jira/browse/HIVE-4657 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Shreepadma Venugopalan Attachments: HIVE-4657.1.patch After HIVE-2670 was committed, I see the following error, {noformat} checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 416 files [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [for] hcatalog: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/build.xml:310: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32: Got 3 errors and 0 warnings. BUILD FAILED /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 of 11 iterations failed. {noformat} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670
[ https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676190#comment-13676190 ] Shreepadma Venugopalan commented on HIVE-4657: -- This fixes the build which is currently broken. HCatalog checkstyle violation after HIVE-2670 -- Key: HIVE-4657 URL: https://issues.apache.org/jira/browse/HIVE-4657 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Shreepadma Venugopalan Attachments: HIVE-4657.1.patch After HIVE-2670 was committed, I see the following error, {noformat} checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 416 files [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [for] hcatalog: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/build.xml:310: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32: Got 3 errors and 0 warnings. BUILD FAILED /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 of 11 iterations failed. {noformat} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676365#comment-13676365 ] Shreepadma Venugopalan commented on HIVE-4435: -- [~ashutoshc]: I've updated the .q files in the patches. Thanks! Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
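For readers unfamiliar with the term in HIVE-4435, a standard pairwise-independent family is h(x) = (a*x + b) mod p with p prime and a, b drawn at random from [0, p). The sketch below shows that textbook construction only; it is not the code in the attached patches. The class name, the choice p = 2^31 - 1, and the `rank` helper (trailing-zero count, as used by Flajolet-Martin style estimators) are assumptions made for illustration. Keys are reduced modulo p first, so keys that differ by a multiple of p collide.

```java
import java.util.Random;

/** Textbook (a*x + b) mod p hash family; a sketch, not the HIVE-4435 patch. */
final class PairwiseHash {
    /** The Mersenne prime 2^31 - 1; a*x + b then stays below 2^63 and cannot overflow a long. */
    static final long P = 2147483647L;

    private final long a;
    private final long b;

    PairwiseHash(long a, long b) {
        if (a < 0 || a >= P || b < 0 || b >= P) {
            throw new IllegalArgumentException("a and b must be in [0, P)");
        }
        this.a = a;
        this.b = b;
    }

    /** Draws one member of the family at random. */
    static PairwiseHash random(Random rng) {
        return new PairwiseHash(Math.floorMod(rng.nextLong(), P), Math.floorMod(rng.nextLong(), P));
    }

    /** Hashes x into [0, P). */
    long hash(long x) {
        long reduced = Math.floorMod(x, P);
        return (a * reduced + b) % P;
    }

    /** Number of trailing zero bits of a hash value; 31 is used for zero since hashes fit in 31 bits. */
    static int rank(long h) {
        return h == 0 ? 31 : Long.numberOfTrailingZeros(h);
    }

    public static void main(String[] args) {
        PairwiseHash h = PairwiseHash.random(new Random(42));
        // A monotonically increasing key sequence, like the primary keys mentioned in the issue.
        for (long key = 1; key <= 5; key++) {
            System.out.println(key + " -> " + h.hash(key) + " (rank " + rank(h.hash(key)) + ")");
        }
    }
}
```

An estimator would typically keep several independently drawn members of the family and combine the maximum rank observed under each.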
[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4435: - Status: Patch Available (was: Open) Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4435: - Attachment: HIVE-4435.2.patch Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4641) Support post execution/fetch hook for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676367#comment-13676367 ] Shreepadma Venugopalan commented on HIVE-4641: -- Enforcing security on a per row basis could be one use of such a hook. The hook can be used in other ways to apply custom transformations to the result set before returning to the client. Support post execution/fetch hook for HiveServer2 - Key: HIVE-4641 URL: https://issues.apache.org/jira/browse/HIVE-4641 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Support post execution/fetch hook that is invoked prior to returning results to the client. This can be used to filter results to enforce a specific security policy before returning the result set to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4641) Support post execution/fetch hook for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676521#comment-13676521 ] Shreepadma Venugopalan commented on HIVE-4641: -- This is a general purpose hook and is not specific to any feature. Hive has hooks at various stages of compilation and execution - pre semantic analysis, post semantic analysis, pre execution etc, but misses a post execution/post fetch hook. This JIRA just adds that. Support post execution/fetch hook for HiveServer2 - Key: HIVE-4641 URL: https://issues.apache.org/jira/browse/HIVE-4641 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Support post execution/fetch hook that is invoked prior to returning results to the client. This can be used to filter results before returning the result set to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
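To make the per-row filtering use mentioned for HIVE-4641 concrete, here is a purely illustrative sketch. The `PostFetchHook` interface and the `OwnerOnlyHook` class are hypothetical; they are not Hive interfaces and say nothing about how the eventual patch shapes the hook. Rows are modeled as string arrays whose first column is assumed to hold the owning user.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical hook shape; not an interface from Hive. */
interface PostFetchHook {
    /** Called after rows are fetched and before they are returned to the client. */
    List<String[]> afterFetch(String userName, List<String[]> rows);
}

/** Example policy: keep only rows whose first column equals the requesting user. */
final class OwnerOnlyHook implements PostFetchHook {
    @Override
    public List<String[]> afterFetch(String userName, List<String[]> rows) {
        List<String[]> visible = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length > 0 && userName.equals(row[0])) {
                visible.add(row);
            }
        }
        return visible;
    }
}

final class PostFetchHookDemo {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"alice", "1"},
            new String[] {"bob", "2"},
            new String[] {"alice", "3"});
        for (String[] row : new OwnerOnlyHook().afterFetch("alice", rows)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The same shape would also cover the other use named in the comments, applying a custom transformation to the result set, by returning modified rows instead of a filtered list.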
[jira] [Created] (HIVE-4669) Make username available to semantic analyzer hooks
Shreepadma Venugopalan created HIVE-4669: Summary: Make username available to semantic analyzer hooks Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Make username available to the semantic analyzer hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Status: Patch Available (was: In Progress) Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4669: - Attachment: HIVE-4669.1.patch Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4669) Make username available to semantic analyzer hooks
[ https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4669 started by Shreepadma Venugopalan. Make username available to semantic analyzer hooks -- Key: HIVE-4669 URL: https://issues.apache.org/jira/browse/HIVE-4669 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4669.1.patch Make username available to the semantic analyzer hooks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676598#comment-13676598 ] Shreepadma Venugopalan commented on HIVE-4561: -- [~clarkyzl]: I'm not sure I understand the fix here. Can you please elaborate on what it means to leave it empty in the ColumnStatsTask? Thanks! Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0. hive (default)> create table src_test (price double); hive (default)> load data local inpath './test.txt' into table src_test; hive (default)> select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default)> analyze table src_test compute statistics for columns price; mysql> select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec)
[jira] [Created] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle
Shreepadma Venugopalan created HIVE-4670: Summary: Authentication module should pass the instance part of the Kerberos principle Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670
Shreepadma Venugopalan created HIVE-4657: Summary: HCatalog checkstyle violation after HIVE-2670 Key: HIVE-4657 URL: https://issues.apache.org/jira/browse/HIVE-4657 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Shreepadma Venugopalan After HIVE-2670 was committed, I see the following error, {noformat} checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 416 files [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [checkstyle] /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1: Line does not match expected header line of '\W*or more contributor license agreements. See the NOTICE file$'. [for] hcatalog: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/build.xml:310: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: The following error occurred while executing this line: [for] /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32: Got 3 errors and 0 warnings. BUILD FAILED /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 of 11 iterations failed. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4648) Add ability to set hadoop conf overrides in JDBC for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673367#comment-13673367 ] Shreepadma Venugopalan commented on HIVE-4648: -- [~harisekhon]: Config variables that can be set or unset through the command line can also be set or unset through JDBC. To do so, you'd need to execute a statement of the form set config.var = value. To set the scratch dir, you can do the following in JDBC, {noformat} statement.execute("set hive.exec.scratchdir = /tmp/mydir"); {noformat} Note that this property is set only for that particular JDBC connection. Add ability to set hadoop conf overrides in JDBC for HiveServer2 Key: HIVE-4648 URL: https://issues.apache.org/jira/browse/HIVE-4648 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Hari Sekhon It's possible in BeeLine to specify set command overrides of hadoop config variables, but I haven't seen any example code of how to do this in JDBC with HiveServer2. We need an ability to specify hadoop conf overrides on a per session basis or even half way through the session. See this Hive ticket for some background: https://issues.apache.org/jira/browse/HIVE-4644
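The approach described in the HIVE-4648 comment can be wrapped in a small helper. This is a minimal sketch using only standard `java.sql` calls; the class `SessionConf` and its methods are hypothetical names, and how the `Connection` to HiveServer2 is obtained (driver, URL, credentials) is left out because the thread does not specify it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper around the "set key=value" statement described in the thread. */
final class SessionConf {
    private SessionConf() {}

    /** Builds the statement text, e.g. "set hive.exec.scratchdir=/tmp/mydir". */
    static String setCommand(String key, String value) {
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalArgumentException("key must not be empty");
        }
        return "set " + key.trim() + "=" + value;
    }

    /** Applies the override on the given connection; it affects only that connection's session. */
    static void apply(Connection conn, String key, String value) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(setCommand(key, value));
        }
    }

    public static void main(String[] args) {
        // Prints the statement text only; no server connection is opened here.
        System.out.println(setCommand("hive.exec.scratchdir", "/tmp/mydir"));
    }
}
```

Because the override is scoped to the connection, a caller that pools connections would need to reapply it after checking a connection out.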
[jira] [Commented] (HIVE-4648) Add ability to set hadoop conf overrides in JDBC for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673369#comment-13673369 ] Shreepadma Venugopalan commented on HIVE-4648: -- Please note that setting hive.exec.scratchdir is just an example of doing sets through JDBC. Add ability to set hadoop conf overrides in JDBC for HiveServer2 Key: HIVE-4648 URL: https://issues.apache.org/jira/browse/HIVE-4648 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Hari Sekhon It's possible in BeeLine to specify set command overrides of hadoop config variables, but I haven't seen any example code of how to do this in JDBC with HiveServer2. We need an ability to specify hadoop conf overrides on a per session basis or even half way through the session. See this Hive ticket for some background: https://issues.apache.org/jira/browse/HIVE-4644
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673508#comment-13673508 ] Shreepadma Venugopalan commented on HIVE-4629: -- [~cwsteinbach]: Can you look at this? Thanks! HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673599#comment-13673599 ] Shreepadma Venugopalan commented on HIVE-4435: -- Thanks Ashutosh! Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673616#comment-13673616 ] Shreepadma Venugopalan commented on HIVE-4561: -- [~ashutoshc]: Sure, I'll take a look at this today. Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000, if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch
If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;
mysql> select * from TAB_COL_STATS \G;
CS_ID: 16
DB_NAME: default
TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
TBL_ID: 2586
LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1.
DOUBLE_HIGH_VALUE: 3.
BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
NUM_NULLS: 0
NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
NUM_TRUES: 0
NUM_FALSES: 0
LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673887#comment-13673887 ] Shreepadma Venugopalan commented on HIVE-4561: -- LGTM! +1 (non-binding). Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000, if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch
If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;
mysql> select * from TAB_COL_STATS \G;
CS_ID: 16
DB_NAME: default
TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
TBL_ID: 2586
LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1.
DOUBLE_HIGH_VALUE: 3.
BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
NUM_NULLS: 0
NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
NUM_TRUES: 0
NUM_FALSES: 0
LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)
[jira] [Assigned] (HIVE-4641) Support post execution/fetch hook for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-4641: Assignee: Shreepadma Venugopalan Support post execution/fetch hook for HiveServer2 - Key: HIVE-4641 URL: https://issues.apache.org/jira/browse/HIVE-4641 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Support post execution/fetch hook that is invoked prior to returning results to the client. This can be used to filter results to enforce a specific security policy before returning the result set to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4641) Support post execution/fetch hook for HiveServer2
Shreepadma Venugopalan created HIVE-4641: Summary: Support post execution/fetch hook for HiveServer2 Key: HIVE-4641 URL: https://issues.apache.org/jira/browse/HIVE-4641 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Shreepadma Venugopalan Support post execution/fetch hook that is invoked prior to returning results to the client. This can be used to filter results to enforce a specific security policy before returning the result set to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4641) Support post execution/fetch hook for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4641 started by Shreepadma Venugopalan. Support post execution/fetch hook for HiveServer2 - Key: HIVE-4641 URL: https://issues.apache.org/jira/browse/HIVE-4641 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Support post execution/fetch hook that is invoked prior to returning results to the client. This can be used to filter results to enforce a specific security policy before returning the result set to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4426) Support statistics collection for partitioning key
[ https://issues.apache.org/jira/browse/HIVE-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4426 started by Shreepadma Venugopalan. Support statistics collection for partitioning key -- Key: HIVE-4426 URL: https://issues.apache.org/jira/browse/HIVE-4426 Project: Hive Issue Type: Bug Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan We should support the ability to collect statistics on the partitioning key column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4301) Bulk retrieval API for column stats
[ https://issues.apache.org/jira/browse/HIVE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4301: - Description: Provide APIs to bulk fetch column stats, i.e., stats for all columns in a table and stats for all columns in all partitions in a table. This is necessary when fetching per-partition column stats to avoid unnecessary network round trips. This is particularly relevant when running a remote metastore service. (was: Provide APIs to bulk fetch column stats i.e., stats for all columns in a table and stats for all columns in all partitions in a table.) Bulk retrieval API for column stats --- Key: HIVE-4301 URL: https://issues.apache.org/jira/browse/HIVE-4301 Project: Hive Issue Type: Bug Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Provide APIs to bulk fetch column stats, i.e., stats for all columns in a table and stats for all columns in all partitions in a table. This is necessary when fetching per-partition column stats to avoid unnecessary network round trips. This is particularly relevant when running a remote metastore service.
[jira] [Commented] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map
[ https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670516#comment-13670516 ] Shreepadma Venugopalan commented on HIVE-4628: -- Good catch Tejas. Looks like this is not an issue any more. I've set the appropriate status. HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map - Key: HIVE-4628 URL: https://issues.apache.org/jira/browse/HIVE-4628 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical HS2 SessionManager maintains a hashmap of active HS2 sessions. However, inserts into and deletes from this hashmap are not synchronized. As a consequence, a racing thread could overwrite a valid session object in the hashmap and we could end up losing a session!
[jira] [Resolved] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map
[ https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan resolved HIVE-4628. -- Resolution: Not A Problem HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map - Key: HIVE-4628 URL: https://issues.apache.org/jira/browse/HIVE-4628 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical HS2 SessionManager maintains a hashmap of active HS2 sessions. However, inserts into and deletes from this hashmap are not synchronized. As a consequence, a racing thread could overwrite a valid session object in the hashmap and we could end up losing a session!
[jira] [Work started] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4629 started by Shreepadma Venugopalan. HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670583#comment-13670583 ] Shreepadma Venugopalan commented on HIVE-4629: -- @Carl: The proposed addition to TCLIService.thrift is the following new API and structs,
{noformat}
// GetLog()
// Fetch operation log from the server corresponding to
// a particular OperationHandle.
struct TGetLogReq {
  // Operation whose log is requested
  1: required TOperationHandle operationHandle
}
struct TGetLogResp {
  1: required TStatus status
  2: required string log
}
service TCLIService {
  ... ...
  TGetLogResp GetLog(1:TGetLogReq req);
}
{noformat}
HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.
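To illustrate how a client might use a GetLog-style call to report progress during async execution, here is a rough sketch of a polling loop. It does not use any generated Thrift client: the log source and the completion check are passed in as plain functions, and the class QueryLogPoller and its method names are hypothetical. It also assumes each call returns the whole log accumulated so far, which the proposal above does not specify.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class QueryLogPoller {
    // Returns the part of the latest log that has not been shown yet,
    // assuming the server returns the full log so far on every call.
    public static String newPortion(String seen, String latest) {
        return latest.startsWith(seen) ? latest.substring(seen.length()) : latest;
    }

    // Polls getLog until finished reports true, printing only fresh log text.
    // Returns everything that was printed.
    public static String poll(Supplier<String> getLog, BooleanSupplier finished,
                              long sleepMillis) {
        StringBuilder printed = new StringBuilder();
        String seen = "";
        boolean done;
        do {
            // Check completion first so the final fetch still picks up
            // whatever was logged just before the operation finished.
            done = finished.getAsBoolean();
            String latest = getLog.get();
            String fresh = newPortion(seen, latest);
            if (!fresh.isEmpty()) {
                System.out.print(fresh);
                printed.append(fresh);
            }
            seen = latest;
            if (!done) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } while (!done);
        return printed.toString();
    }
}
```

In a real client, getLog would wrap a call that sends a TGetLogReq for the operation handle and returns the log field of the TGetLogResp.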
[jira] [Commented] (HIVE-4618) show create table creating unusable DDL when field delimiter is \001
[ https://issues.apache.org/jira/browse/HIVE-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671073#comment-13671073 ] Shreepadma Venugopalan commented on HIVE-4618: -- LGTM. +1 (non-binding). show create table creating unusable DDL when field delimiter is \001 Key: HIVE-4618 URL: https://issues.apache.org/jira/browse/HIVE-4618 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0 Environment: CDH4.2 Hive 0.10 Reporter: Johndee Burks Assignee: Navis Priority: Minor Attachments: HIVE-4618.D11007.1.patch
When the create statement includes a fields terminated by clause and the delimiter is '\001', hive turns this into \u0001, which is correct. However, show create table then gives you a DDL that does not work, because the parser changes the \u0001 into u0001. Example:
hive> create table j1 (a string) row format delimited fields terminated by '\001';
hive> show create table j1;
CREATE TABLE j1( a string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://forza-1.cloud.rtp.cloudera.com:8020/user/hive/warehouse/j1'
TBLPROPERTIES ( 'transient_lastDdlTime'='1369664999')
hive> desc formatted j1;
…shortened to save space
Storage Desc Params:
field.delim \u0001
serialization.format \u0001
hive> drop table j1;
hive> CREATE TABLE j1( a string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://forza-1.cloud.rtp.cloudera.com:8020/user/hive/warehouse/j1'
TBLPROPERTIES ( 'transient_lastDdlTime'='1369664999');
hive> desc formatted j1;
…shortened to save space
Storage Desc Params:
field.delim u0001
serialization.format u0001
[jira] [Created] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map
Shreepadma Venugopalan created HIVE-4628: Summary: HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map Key: HIVE-4628 URL: https://issues.apache.org/jira/browse/HIVE-4628 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical HS2 SessionManager maintains a hashmap of active HS2 sessions. However, inserts into and deletes from this hashmap are not synchronized. As a consequence, a racing thread could overwrite a valid session object in the hashmap and we could end up losing a session!
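As a general illustration of the kind of race this report describes, and of one common way to rule it out, here is a small sketch that keeps sessions in a ConcurrentHashMap and inserts with putIfAbsent. This is not the HiveServer2 SessionManager code (the issue was later resolved as Not A Problem); the class SessionRegistry, its methods, and the use of random UUID handles are assumptions made for the example.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistry {
    // A thread-safe map: concurrent inserts and removes cannot corrupt it,
    // unlike an unsynchronized java.util.HashMap.
    private final Map<String, Object> sessions = new ConcurrentHashMap<>();

    // Registers a session under a fresh handle. putIfAbsent never overwrites
    // an existing entry, so a racing thread cannot replace a valid session.
    public String open(Object session) {
        String handle;
        do {
            handle = UUID.randomUUID().toString();
        } while (sessions.putIfAbsent(handle, session) != null);
        return handle;
    }

    // Removes a session; returns false if the handle was unknown.
    public boolean close(String handle) {
        return sessions.remove(handle) != null;
    }

    public int size() {
        return sessions.size();
    }
}
```

An alternative with the same effect is to keep a plain HashMap and mark the insert and remove methods synchronized; the concurrent map simply avoids a single lock around every lookup.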
[jira] [Created] (HIVE-4629) HS2 should support an API to retrieve query logs
Shreepadma Venugopalan created HIVE-4629: Summary: HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Bug Reporter: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-4629: Assignee: Shreepadma Venugopalan HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Bug Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Issue Type: New Feature (was: Bug) HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: New Feature Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4629: - Issue Type: Sub-task (was: New Feature) Parent: HIVE-2935 HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map
[ https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4628 started by Shreepadma Venugopalan. HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map - Key: HIVE-4628 URL: https://issues.apache.org/jira/browse/HIVE-4628 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical HS2 SessionManager maintains a hashmap of active HS2 sessions. However, inserts into and deletes from this hashmap are not synchronized. As a consequence, a racing thread could overwrite a valid session object in the hashmap and we could end up losing a session!
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668883#comment-13668883 ] Shreepadma Venugopalan commented on HIVE-4435: -- Ping :) Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648987#comment-13648987 ] Shreepadma Venugopalan commented on HIVE-4435: -- Can a committer take a look at this? Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4435: - Status: Patch Available (was: Open) Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4435.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4435: - Attachment: HIVE-4435.1.patch Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4435.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-4435: - Description: The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644840#comment-13644840 ] Shreepadma Venugopalan commented on HIVE-4435: -- The fix is to use hash functions that are pairwise independent. More on pairwise independence and family of hash functions - http://people.csail.mit.edu/ronitt/COURSE/S12/handouts/lec5.pdf Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
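For context on what a pairwise independent hash family looks like, here is a minimal sketch of the standard construction h(x) = (a*x + b) mod p, with a and b drawn uniformly from [0, p) and p prime. It is not taken from HIVE-4435.1.patch; the class PairwiseHash, the choice p = 2^31 - 1, and the trailing-zeros helper (the quantity a Flajolet-Martin style estimator tracks) are assumptions for illustration.

```java
import java.util.Random;

public class PairwiseHash {
    // Mersenne prime 2^31 - 1; inputs are reduced into [0, P) before hashing.
    public static final long P = 2147483647L;

    private final long a;
    private final long b;

    public PairwiseHash(long a, long b) {
        this.a = Math.floorMod(a, P);
        this.b = Math.floorMod(b, P);
    }

    // Draws one member of the family: a and b uniform in [0, P).
    public static PairwiseHash random(Random rng) {
        return new PairwiseHash(rng.nextInt(Integer.MAX_VALUE),
                                rng.nextInt(Integer.MAX_VALUE));
    }

    // h(x) = (a*x + b) mod P. Both factors are below 2^31,
    // so the product plus b fits in a signed 64-bit long.
    public long hash(long x) {
        return (a * Math.floorMod(x, P) + b) % P;
    }

    // Number of trailing zero bits of a hash value; a hash of 0 is
    // treated as having the maximum of 31 for this 31-bit range.
    public static int trailingZeros(long h) {
        return h == 0 ? 31 : Long.numberOfTrailingZeros(h);
    }
}
```

With a hash drawn this way, a monotonically increasing key sequence such as 1, 2, 3, ... is spread over the range instead of keeping its regular bit pattern, which is the property the issue description says the previous hash functions lacked.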
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644844#comment-13644844 ] Shreepadma Venugopalan commented on HIVE-4435: -- review board: https://reviews.apache.org/r/10841/ Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4435.1.patch The current implementation of Flajolet-Martin estimator to estimate the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns. Primary key columns are typically a monotonically increasing sequence. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira