[jira] [Commented] (HIVE-13866) flatten callstack for directSQL errors
[ https://issues.apache.org/jira/browse/HIVE-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325707#comment-15325707 ] Hive QA commented on HIVE-13866: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809339/HIVE-13866.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/81/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/81/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-81/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12809339 - PreCommit-HIVE-MASTER-Build > flatten callstack for directSQL errors > -- > > Key: HIVE-13866 > URL: https://issues.apache.org/jira/browse/HIVE-13866 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13866.01.patch, HIVE-13866.patch > > > These errors look like final errors and confuse people. The callstack may be > useful if it's some datanucleus/db issue, but it needs to be flattened and > logged with a warning that this is not a final query error and that there's a > fallback -- This message was sent by Atlassian JIRA (v6.3.4#6332)
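The idea described in HIVE-13866 (collapse the callstack to one line and log it as a warning that mentions the fallback) can be sketched as follows. This is only an illustration in Python, not the attached patch; `flatten_exception`, `run_with_fallback` and the log wording are hypothetical names chosen for the example.

```python
import os
import traceback


def flatten_exception(exc, max_frames=5):
    """Collapse an exception and its innermost frames into a single line."""
    frames = traceback.extract_tb(exc.__traceback__)[-max_frames:]
    # Render each frame as file:line(function), innermost frame first.
    trail = " <- ".join(
        f"{os.path.basename(f.filename)}:{f.lineno}({f.name})"
        for f in reversed(frames)
    )
    return f"{type(exc).__name__}: {exc} at {trail}"


def run_with_fallback(direct, fallback, log):
    """Try the fast path; on failure log one flattened warning and fall back."""
    try:
        return direct()
    except Exception as exc:
        log("WARN: direct SQL failed (not a final query error, falling back): "
            + flatten_exception(exc))
        return fallback()
```

Keeping the frames on one line makes it harder to mistake the entry for a fatal multi-line stack trace while still preserving where the failure came from.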
[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader
[ https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325650#comment-15325650 ] Hive QA commented on HIVE-13913: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809343/HIVE-13913.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_uncompressed org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_join_part_col_char org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/80/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/80/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-80/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12809343 - PreCommit-HIVE-MASTER-Build > LLAP: introduce backpressure to recordreader > > > Key: HIVE-13913 > URL: https://issues.apache.org/jira/browse/HIVE-13913 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, > HIVE-13913.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13921) Fix spark on yarn tests for HoS
[ https://issues.apache.org/jira/browse/HIVE-13921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325631#comment-15325631 ] Ashutosh Chauhan commented on HIVE-13921: - I see. Let's use this jira for the golden file update. The {{INSERT OVERWRITE DIRECTORY}} bug should be easy to reproduce outside of this in TestCliDriver. Let's come up with a standalone test case for it and track it in a separate jira. > Fix spark on yarn tests for HoS > --- > > Key: HIVE-13921 > URL: https://issues.apache.org/jira/browse/HIVE-13921 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13921.1.patch > > > {{index_bitmap3}} and {{constprog_partitioner}} have been failing. Let's fix > them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo
[ https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13970: Assignee: Sergey Shelukhin Status: Patch Available (was: Open) > refactor LLAPIF splits - get rid of SubmitWorkInfo > -- > > Key: HIVE-13970 > URL: https://issues.apache.org/jira/browse/HIVE-13970 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13970.only.patch, HIVE-13970.patch > > > First we build the signable vertex spec, convert it into bytes (as we > should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] > and put it into LlapInputSplit. Then we serialize that to return... We should > get rid of one of the steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo
[ https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13970: Attachment: HIVE-13970.patch HIVE-13970.only.patch A simple patch to remove SWI, merging it with LLAP split (plus the same w/some other patch for Hive QA) I wonder if we should make it protobuf instead of writable... [~hagleitn] [~sseth] fyi > refactor LLAPIF splits - get rid of SubmitWorkInfo > -- > > Key: HIVE-13970 > URL: https://issues.apache.org/jira/browse/HIVE-13970 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin > Attachments: HIVE-13970.only.patch, HIVE-13970.patch > > > First we build the signable vertex spec, convert it into bytes (as we > should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] > and put it into LlapInputSplit. Then we serialize that to return... We should > get rid of one of the steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
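The redundant step described in HIVE-13970 can be illustrated with a toy length-prefixed encoding. This is a sketch under assumptions: the field names and byte values are made up, and the real classes use Hadoop Writable serialization, not this format.

```python
import struct


def pack_fields(*fields):
    """Length-prefix each bytes field and concatenate them."""
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def unpack_fields(buf):
    """Inverse of pack_fields: split a buffer back into its fields."""
    out, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        out.append(buf[pos:pos + n])
        pos += n
    return out


vertex_spec = b"signed-vertex-spec"  # hypothetical already-serialized spec
token = b"tok"                       # hypothetical extra field
split_meta = b"split-meta"           # hypothetical split-level field

# Nested: the spec goes into a wrapper that is itself serialized into the split.
nested = pack_fields(pack_fields(vertex_spec, token), split_meta)
# Flat: the wrapper's fields are written directly into the split.
flat = pack_fields(vertex_spec, token, split_meta)
```

The flat layout carries the same fields with one fewer encode/decode pass; in this toy format it is also one length prefix (4 bytes) shorter.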
[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported
[ https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325627#comment-15325627 ] Ashutosh Chauhan commented on HIVE-13990: - Usual practice is to first commit on master and then do backports. Would you like to put up a patch against master? > Client should not check dfs.namenode.acls.enabled to determine if extended > ACLs are supported > - > > Key: HIVE-13990 > URL: https://issues.apache.org/jira/browse/HIVE-13990 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Chris Drome > Attachments: HIVE-13990-branch-1.patch > > > dfs.namenode.acls.enabled is a server side configuration and the client > should not presume to know how the server is configured. Barring a method for > querying the NN whether ACLs are supported the client should try and catch > the appropriate exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
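The "try and catch" approach suggested in HIVE-13990 might look like the sketch below. Everything here is hypothetical (`AclsNotSupportedError`, the `set_acl` method and the fake filesystem classes stand in for the real client API and exception); the point is only that the server's response, not a client-side config value, decides whether ACLs are usable.

```python
class AclsNotSupportedError(Exception):
    """Hypothetical stand-in for the server's 'ACLs are disabled' error."""


class FakeFsWithAcls:
    def __init__(self):
        self.acls = {}

    def set_acl(self, path, entries):
        self.acls[path] = list(entries)


class FakeFsWithoutAcls:
    def set_acl(self, path, entries):
        raise AclsNotSupportedError("ACLs are disabled on the server")


def apply_acls(fs, path, entries):
    """Attempt the call and treat the 'not supported' error as the answer.

    Returns True if the ACLs were applied, False if the server rejected them.
    """
    try:
        fs.set_acl(path, entries)
        return True
    except AclsNotSupportedError:
        # The server told us ACLs are unavailable; no client config consulted.
        return False
```

Other exceptions are deliberately left to propagate, so a genuine failure is not mistaken for "ACLs unsupported".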
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325602#comment-15325602 ] Sergey Shelukhin commented on HIVE-13901: - +1 pending tests and [~ashutoshc] feedback > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g. msck which adds all partitions). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin
[ https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325604#comment-15325604 ] Sergey Shelukhin commented on HIVE-13986: - Test failures are unrelated > LLAP: kill Tez AM on token errors from plugin > - > > Key: HIVE-13986 > URL: https://issues.apache.org/jira/browse/HIVE-13986 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13986.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-13971. Resolution: Fixed > Address testcase failures of acid_globallimit.q and etc > --- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325599#comment-15325599 ] Pengcheng Xiong commented on HIVE-13971: list_bucket_dml_12.q,list_bucket_dml_13.q > Address testcase failures of acid_globallimit.q and etc > --- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325594#comment-15325594 ] Pengcheng Xiong commented on HIVE-13971: update test cases using java8 > Address testcase failures of acid_globallimit.q and etc > --- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325594#comment-15325594 ] Pengcheng Xiong edited comment on HIVE-13971 at 6/11/16 1:14 AM: - update test case golden files using java8 was (Author: pxiong): update test cases using java8 > Address testcase failures of acid_globallimit.q and etc > --- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13971) Address testcase failures of acid_globallimit.q and etc
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13971: --- Summary: Address testcase failures of acid_globallimit.q and etc (was: Address testcase failures of acid_globallimit.q and acid_table_stats.q) > Address testcase failures of acid_globallimit.q and etc > --- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-13971) Address testcase failures of acid_globallimit.q and acid_table_stats.q
[ https://issues.apache.org/jira/browse/HIVE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reopened HIVE-13971: > Address testcase failures of acid_globallimit.q and acid_table_stats.q > -- > > Key: HIVE-13971 > URL: https://issues.apache.org/jira/browse/HIVE-13971 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13827) LLAPIF: authentication on the output channel
[ https://issues.apache.org/jira/browse/HIVE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13827: Attachment: HIVE-13827.01.patch Finished and rebased the patch > LLAPIF: authentication on the output channel > > > Key: HIVE-13827 > URL: https://issues.apache.org/jira/browse/HIVE-13827 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13827.01.patch, HIVE-13827.patch > > > The current thinking is that we'd send the token. There's no protocol on the > channel right now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13988) zero length file is being created for empty bucket in tez mode
[ https://issues.apache.org/jira/browse/HIVE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325553#comment-15325553 ] Hive QA commented on HIVE-13988: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809325/HIVE-13988.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_SortUnionTransposeRule org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_join_transpose org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_ppd_optimizer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/79/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/79/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-79/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12809325 - PreCommit-HIVE-MASTER-Build > zero length file is being created for empty bucket in tez mode > -- > > Key: HIVE-13988 > URL: https://issues.apache.org/jira/browse/HIVE-13988 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13988.01.patch > > > Even though bucket is empty, zero length file is being created in tez mode. > steps to reproduce the issue: > {noformat} > hive> set hive.execution.engine; > hive.execution.engine=tez > hive> drop table if exists emptybucket_orc; > OK > Time taken: 5.416 seconds > hive> create table emptybucket_orc(age int) clustered by (age) sorted by > (age) into 99 buckets stored as orc; > OK > Time taken: 0.493 seconds > hive> insert into table emptybucket_orc select distinct(age) from > studenttab10k limit 0; > Query ID = hrt_qa_20160523231955_8b981be7-68c4-4416-8a48-5f8c7ff551c3 > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1464045121842_0002) > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. llap SUCCEEDED 1 100 > 0 0 > Reducer 2 .. llap SUCCEEDED 1 100 > 0 0 > Reducer 3 .. llap SUCCEEDED 1 100 > 0 0 > Reducer 4 .. 
llap SUCCEEDED 99 9900 > 0 0 > -- > VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 11.00 s > > -- > Loading data to table default.emptybucket_orc > OK > Time taken: 16.907 seconds > hive> dfs -ls /apps/hive/warehouse/emptybucket_orc; > Found 99 items > -rwxrwxrwx 3 hrt_qa hdfs 0 2016-05-23 23:20 > /apps/hive/warehouse/emptybucket_orc/00_0 > -rwxrwxrwx 3 hrt_qa hdfs 0 2016-05-23 23:20 > /apps/hive/warehouse/emptybucket_orc/01_0 > .. > {noformat} > Expected behavior: > In tez mode, zero length file shouldn't get created on hdfs if bucket is empty --
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Status: Patch Available (was: Open) address [~hsubramaniyan]'s comments. > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325534#comment-15325534 ] Pengcheng Xiong commented on HIVE-13984: The test results are good. > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325535#comment-15325535 ] Ashutosh Chauhan commented on HIVE-13984: - [~prasanth_j] You are familiar with the multi-threaded listStatus code in ORC. This is very similar. Can you help review this? > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Attachment: HIVE-13984.02.patch > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Status: Open (was: Patch Available) > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
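A multi-threaded listing of the kind HIVE-13984 asks for can be sketched as a level-by-level directory walk in which every directory of a level is listed concurrently. This is an illustration against the local filesystem using Python's standard library, not the patch itself; `list_files_parallel` and `max_workers` are names chosen for the example, and a real implementation would list HDFS paths instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _list_dir(path):
    """List one directory, returning (path, is_dir) pairs."""
    with os.scandir(path) as it:
        return [(e.path, e.is_dir(follow_symlinks=False)) for e in it]


def list_files_parallel(root, max_workers=8):
    """Walk a tree breadth-first, listing each level's directories in parallel."""
    files, pending = [], [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            next_level = []
            # pool.map keeps input order, so results are deterministic.
            for entries in pool.map(_list_dir, pending):
                for path, is_dir in entries:
                    (next_level if is_dir else files).append(path)
            pending = next_level
    return sorted(files)
```

Listing is I/O bound, so threads help most when each list call has high latency (for example a remote object store); on a fast local disk the gain is small.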
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325520#comment-15325520 ] Ashutosh Chauhan commented on HIVE-13901: - * Are any of the failures related? * I think we should pick a different name for the config, like hive.metastore.fshandler.threads or something similar. * [~sershe] Can you take another look at the patch? > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g. msck which adds all partitions). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID
[ https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13771: Attachment: HIVE-13771.01.patch Rebase (noop) > LLAPIF: generate app ID > --- > > Key: HIVE-13771 > URL: https://issues.apache.org/jira/browse/HIVE-13771 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13771.01.patch, HIVE-13771.patch > > > See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the > user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for > ease of tracking -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID
[ https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13771: Attachment: (was: HIVE-13771.01.wo.13731.patch) > LLAPIF: generate app ID > --- > > Key: HIVE-13771 > URL: https://issues.apache.org/jira/browse/HIVE-13771 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13771.01.patch, HIVE-13771.patch > > > See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the > user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for > ease of tracking -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID
[ https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13771: Attachment: (was: HIVE-13771.01.patch) > LLAPIF: generate app ID > --- > > Key: HIVE-13771 > URL: https://issues.apache.org/jira/browse/HIVE-13771 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13771.01.patch, HIVE-13771.patch > > > See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the > user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for > ease of tracking -- This message was sent by Atlassian JIRA (v6.3.4#6332)
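One way to meet the two requirements in HIVE-13771 (uniqueness plus an optional user-supplied prefix) is to append a random UUID to the prefix. This is a sketch under assumptions, not the patch: `generate_app_id` is a hypothetical name, the separator is arbitrary, and uniqueness with a random UUID is probabilistic, not strictly guaranteed.

```python
import uuid


def generate_app_id(prefix=None):
    """Return a practically unique ID, optionally tagged with a caller prefix."""
    unique = uuid.uuid4().hex  # 32 hex characters from a random UUID
    return f"{prefix}_{unique}" if prefix else unique
```

For example, a caller could pass its YARN application ID as `prefix` so the generated IDs are easy to correlate in logs.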
[jira] [Updated] (HIVE-13731) LLAP: return LLAP token with the splits
[ https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13731: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. > LLAP: return LLAP token with the splits > --- > > Key: HIVE-13731 > URL: https://issues.apache.org/jira/browse/HIVE-13731 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, > HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, > HIVE-13731.wo.13444-13675-13443.patch > > > Need to return the token with the splits, then take it in LLAPIF and make > sure it's used when talking to LLAP -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan reassigned HIVE-13995: Assignee: Hari Sankar Sivarama Subramaniyan > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing a large IN clause, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For example: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since hive gets an ordered list of partition names. This > performs about as well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the hive query are listed here. It is not clear whether statistics for > these columns are required for hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
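The equivalence of the IN-list and range forms from HIVE-13995 can be checked on a toy table. The sketch below uses an in-memory SQLite database as a stand-in for the metastore RDBMS and a handful of made-up rows; it adds "PARTITION_NAME" to the select list only so the two result sets can be compared, and it says nothing about timing. The range form relies on the partition names sorting as strings in the same order as their values, which holds here because all the numbers have the same width.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table "PART_COL_STATS" '
    '("DB_NAME" text, "TABLE_NAME" text, "COLUMN_NAME" text, "PARTITION_NAME" text)'
)
# Toy data: 5 partitions x 2 columns (the issue has 1836 partitions).
partitions = [f"cs_sold_date_sk={2450815 + i}" for i in range(5)]
columns = ["cs_item_sk", "cs_quantity"]
conn.executemany(
    'insert into "PART_COL_STATS" values (?, ?, ?, ?)',
    [("db", "catalog_sales", c, p) for p in partitions for c in columns],
)

base = (
    'select "PARTITION_NAME", count("COLUMN_NAME") from "PART_COL_STATS" '
    'where "DB_NAME" = ? and "TABLE_NAME" = ? '
)
tail = ' group by "PARTITION_NAME" order by "PARTITION_NAME"'

# Form 1: explicit IN list with one placeholder per partition name.
placeholders = ",".join("?" for _ in partitions)
with_in = conn.execute(
    base + f'and "PARTITION_NAME" in ({placeholders})' + tail,
    ["db", "catalog_sales", *partitions],
).fetchall()

# Form 2: range predicate built from the first and last partition names.
with_range = conn.execute(
    base + 'and "PARTITION_NAME" >= ? and "PARTITION_NAME" <= ?' + tail,
    ["db", "catalog_sales", min(partitions), max(partitions)],
).fetchall()
```

The range form keeps the statement size constant regardless of the partition count, but it is only equivalent when every partition between the two bounds is meant to be included.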
[jira] [Updated] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13617: Attachment: HIVE-13617.06.patch More q file updates > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, > HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, > HIVE-13617.05.patch, HIVE-13617.06.patch, HIVE-13617.patch, HIVE-13617.patch, > HIVE-15396-with-oi.patch > > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325434#comment-15325434 ] Sergey Shelukhin commented on HIVE-13957: - Test failures are unrelated. > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
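To see why the placement of the cast described in HIVE-13957 matters, here is a toy model in plain Java. It is only an illustration under stated assumptions: it assumes the column is cast to a string in one path and the literals are cast to decimal in the other, and it uses `BigDecimal` rather than Hive's own decimal and cast rules. `InCastDemo` and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

/** Toy model of the two cast placements; not Hive's actual cast semantics. */
final class InCastDemo {
    /** Cast applied to the column: compare the decimal's string form with the literals. */
    static boolean castColumnToString(BigDecimal column, List<String> inList) {
        return inList.contains(column.toString());
    }

    /** Cast applied to the IN() list: parse each literal as a decimal and compare numerically. */
    static boolean castListToDecimal(BigDecimal column, List<String> inList) {
        for (String literal : inList) {
            if (new BigDecimal(literal).compareTo(column) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BigDecimal column = new BigDecimal("1.50");
        List<String> inList = Arrays.asList("1.5");
        System.out.println(castColumnToString(column, inList)); // false: "1.50" is not "1.5"
        System.out.println(castListToDecimal(column, inList));  // true: 1.5 equals 1.50 numerically
    }
}
```

The same row is kept by one strategy and dropped by the other, which is the kind of divergence that would make vectorized and non-vectorized results differ.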
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325427#comment-15325427 ] Sergey Shelukhin commented on HIVE-13617: - [~spena] I have a question; I added a test (orc_llap_nonvector) to a separate minillap.query.files variable, and that to excludeQueryFile for standard CLI tests (it's ok to run that test in regular CliDriver, but it's pretty useless). However, the test has been run by HiveQA in the CliDriver anyway... does the configuration only propagate on commit? I can add the out file now and remove it after commit. > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, > HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, > HIVE-13617.05.patch, HIVE-13617.patch, HIVE-13617.patch, > HIVE-15396-with-oi.patch > > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits
[ https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325418#comment-15325418 ] Jason Dere commented on HIVE-13731: --- +1 > LLAP: return LLAP token with the splits > --- > > Key: HIVE-13731 > URL: https://issues.apache.org/jira/browse/HIVE-13731 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, > HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, > HIVE-13731.wo.13444-13675-13443.patch > > > Need to return the token with the splits, then take it in LLAPIF and make > sure it's used when talking to LLAP -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325406#comment-15325406 ] Hive QA commented on HIVE-13984: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809322/HIVE-13984.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/78/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/78/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-78/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12809322 - PreCommit-HIVE-MASTER-Build > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325394#comment-15325394 ] Vaibhav Gumashta commented on HIVE-13725: - [~ekoifman] Sorry should've done that before. Just made it patch available > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-13725: Status: Patch Available (was: Open) > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 2.0.0, 1.2.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325387#comment-15325387 ] Eugene Koifman commented on HIVE-13725: --- [~vgumashta] should this be Patch Available? For some reason I don't see Submit Patch button in this ticket?! > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-13725: - Assignee: Eugene Koifman (was: Vaibhav Gumashta) > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Eugene Koifman >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13725: -- Assignee: Vaibhav Gumashta (was: Eugene Koifman) > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
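The synchronization asked for in HIVE-13725 can be pictured with a small stand-alone sketch. Everything here is hypothetical: `FakeMetastoreClient` stands in for the non-thread-safe metastore client, and `SynchronizedEndpoint` is not the streaming API's real class. The only point is that every API method takes the same lock before touching the shared client.

```java
/** Stand-in for a client that is not thread safe (hypothetical). */
final class FakeMetastoreClient {
    private long calls = 0; // plain field: concurrent updates could be lost without external locking

    long rpc(String method) {
        calls = calls + 1;
        return calls;
    }

    long calls() {
        return calls;
    }
}

/** Endpoint wrapper that serializes every client call on one lock. */
final class SynchronizedEndpoint {
    private final Object clientLock = new Object();
    private final FakeMetastoreClient client = new FakeMetastoreClient();

    long openTxnBatch() {
        synchronized (clientLock) { return client.rpc("open_txns"); }
    }

    long heartbeat() {
        synchronized (clientLock) { return client.rpc("heartbeat"); }
    }

    long totalCalls() {
        synchronized (clientLock) { return client.calls(); }
    }
}

final class StreamingSyncDemo {
    /** Runs two threads against one endpoint and returns how many calls the client saw. */
    static long run(int callsPerThread) {
        SynchronizedEndpoint endpoint = new SynchronizedEndpoint();
        Thread opener = new Thread(() -> {
            for (int i = 0; i < callsPerThread; i++) endpoint.openTxnBatch();
        });
        Thread heartbeater = new Thread(() -> {
            for (int i = 0; i < callsPerThread; i++) endpoint.heartbeat();
        });
        opener.start();
        heartbeater.start();
        try {
            opener.join();
            heartbeater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return endpoint.totalCalls();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}
```

With the lock in place, each request and its response stay paired on the calling thread, which is what prevents the "out of sequence response" described in the issue.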
[jira] [Commented] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus
[ https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325344#comment-15325344 ] Ashutosh Chauhan commented on HIVE-13788: - +1 > hive msck listpartitions need to make use of directSQL instead of datanucleus > - > > Key: HIVE-13788 > URL: https://issues.apache.org/jira/browse/HIVE-13788 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Minor > Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, > msck_call_stack_with_fix.png, msck_stack_trace.png > > > Currently, for tables having 1000s of partitions too many DB calls are made > via datanucleus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus
[ https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13788: - Attachment: HIVE-13788.2.patch > hive msck listpartitions need to make use of directSQL instead of datanucleus > - > > Key: HIVE-13788 > URL: https://issues.apache.org/jira/browse/HIVE-13788 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Minor > Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, > msck_call_stack_with_fix.png, msck_stack_trace.png > > > Currently, for tables having 1000s of partitions too many DB calls are made > via datanucleus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-13930:
    Attachment: HIVE-13930.01.patch

Trying again, with new dependencies for some Hive classes. Spark tests still failed for me locally due to a CNF (class not found) error, but that CNF was in Hadoop, for a class that hasn't moved to a different package, so it might just be a local issue.

> upgrade Hive to latest Hadoop version
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13930: Attachment: (was: HIVE-13930.01.patch) > upgrade Hive to latest Hadoop version > - > > Key: HIVE-13930 > URL: https://issues.apache.org/jira/browse/HIVE-13930 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13930.01.patch, HIVE-13930.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13930: Attachment: HIVE-13930.01.patch > upgrade Hive to latest Hadoop version > - > > Key: HIVE-13930 > URL: https://issues.apache.org/jira/browse/HIVE-13930 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13930.01.patch, HIVE-13930.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus
[ https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13788: - Status: Open (was: Patch Available) > hive msck listpartitions need to make use of directSQL instead of datanucleus > - > > Key: HIVE-13788 > URL: https://issues.apache.org/jira/browse/HIVE-13788 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Minor > Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, > msck_call_stack_with_fix.png, msck_stack_trace.png > > > Currently, for tables having 1000s of partitions too many DB calls are made > via datanucleus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus
[ https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13788: - Status: Patch Available (was: Open) > hive msck listpartitions need to make use of directSQL instead of datanucleus > - > > Key: HIVE-13788 > URL: https://issues.apache.org/jira/browse/HIVE-13788 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Minor > Attachments: HIVE-13788.1.patch, HIVE-13788.2.patch, > msck_call_stack_with_fix.png, msck_stack_trace.png > > > Currently, for tables having 1000s of partitions too many DB calls are made > via datanucleus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13833) Add an initial delay when starting the heartbeat
[ https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-13833:
    Attachment: HIVE-13833.3.patch

> Add an initial delay when starting the heartbeat
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Priority: Minor
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, HIVE-13833.3.patch
>
> Since the heartbeat is scheduled immediately after lock acquisition, it's
> unnecessary to send a heartbeat at the moment the locks are acquired. Add an
> initial delay to skip this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
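The initial delay proposed in HIVE-13833 can be sketched with the JDK's `ScheduledExecutorService`, whose `scheduleAtFixedRate` takes an initial delay separately from the period. This is an illustration only, not the patch itself: `DelayedHeartbeat` is a hypothetical class, and the "heartbeat" is just a counter.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical heartbeat scheduler; the "heartbeat" only increments a counter. */
final class DelayedHeartbeat {
    final AtomicInteger beats = new AtomicInteger();
    private final ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();

    /** The first beat fires after initialDelayMs rather than right after lock acquisition. */
    ScheduledFuture<?> start(long initialDelayMs, long intervalMs) {
        return pool.scheduleAtFixedRate(
            beats::incrementAndGet, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        pool.shutdownNow();
    }
}

final class DelayedHeartbeatDemo {
    /** Returns how many beats had been sent by the time start() returned. */
    static int beatsImmediatelyAfterStart(long initialDelayMs, long intervalMs) {
        DelayedHeartbeat heartbeat = new DelayedHeartbeat();
        try {
            heartbeat.start(initialDelayMs, intervalMs);
            return heartbeat.beats.get();
        } finally {
            heartbeat.stop();
        }
    }
}
```

Setting the initial delay equal to the interval is one natural choice, since the locks were just acquired and need no refresh until one full interval has passed.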
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325252#comment-15325252 ]

Abdullah Yousufi commented on HIVE-13964:

I attached a new patch addressing the exit error issue. You also should not get the "No such file or directory" error.

> Add a parameter to beeline to allow a properties file to be passed in
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
> Issue Type: New Feature
> Components: Beeline
> Affects Versions: 2.0.1
> Reporter: Abdullah Yousufi
> Assignee: Abdullah Yousufi
> Priority: Minor
> Fix For: 2.2.0
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, HIVE-13964.03.patch
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline
> parameter. It may be a useful feature to be able to pass the file in as a
> parameter, such as --property-file.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdullah Yousufi updated HIVE-13964:
    Attachment: HIVE-13964.03.patch

> Add a parameter to beeline to allow a properties file to be passed in
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
> Issue Type: New Feature
> Components: Beeline
> Affects Versions: 2.0.1
> Reporter: Abdullah Yousufi
> Assignee: Abdullah Yousufi
> Priority: Minor
> Fix For: 2.2.0
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, HIVE-13964.03.patch
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline
> parameter. It may be a useful feature to be able to pass the file in as a
> parameter, such as --property-file.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin
[ https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325233#comment-15325233 ] Hive QA commented on HIVE-13986: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809296/HIVE-13986.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10221 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/76/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/76/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-76/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12809296 - PreCommit-HIVE-MASTER-Build > LLAP: kill Tez AM on token errors from plugin > - > > Key: HIVE-13986 > URL: https://issues.apache.org/jira/browse/HIVE-13986 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13986.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Drome updated HIVE-13989: --- Attachment: HIVE-13989-branch-1.patch > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989-branch-1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported
[ https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Drome updated HIVE-13990:
    Attachment: HIVE-13990-branch-1.patch

> Client should not check dfs.namenode.acls.enabled to determine if extended
> ACLs are supported
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1
> Reporter: Chris Drome
> Attachments: HIVE-13990-branch-1.patch
>
> dfs.namenode.acls.enabled is a server-side configuration, and the client
> should not presume to know how the server is configured. Barring a method for
> querying the NN about whether ACLs are supported, the client should attempt
> the operation and catch the appropriate exception.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
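The "try and catch" approach from HIVE-13990 might look like the following sketch. All names are assumptions made for illustration: `AclsUnsupportedException` is a stand-in for whatever exception the server actually raises when ACLs are disabled, and `AclOperation` wraps any ACL call the client would make. The client never reads `dfs.namenode.acls.enabled`; it lets the server's answer decide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the exception the server raises when ACLs are disabled. */
final class AclsUnsupportedException extends IOException {
    AclsUnsupportedException(String message) {
        super(message);
    }
}

/** Any ACL call against the filesystem, for example setting an ACL on a path. */
@FunctionalInterface
interface AclOperation {
    void apply() throws IOException;
}

final class AclProbe {
    /**
     * Attempts the operation instead of consulting client-side configuration.
     * Returns true if it was applied, false if the server reported ACLs as unsupported.
     */
    static boolean tryApply(AclOperation operation) {
        try {
            operation.apply();
            return true;
        } catch (AclsUnsupportedException e) {
            return false; // the server decides; the client simply skips ACL handling
        } catch (IOException e) {
            throw new UncheckedIOException(e); // unrelated failures still surface
        }
    }
}
```

A caller could remember a `false` result for the lifetime of the connection so that it does not repeat a call the server has already rejected.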
[jira] [Commented] (HIVE-13994) increase large varchar limits in db scripts to be db-specific
[ https://issues.apache.org/jira/browse/HIVE-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325208#comment-15325208 ] Ashutosh Chauhan commented on HIVE-13994: - sounds good to me. > increase large varchar limits in db scripts to be db-specific > - > > Key: HIVE-13994 > URL: https://issues.apache.org/jira/browse/HIVE-13994 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Right now all our max varchar limits are 4k, presumably due to Oracle > limitations. All other dbs support larger values and/or MAX; given that we > moved away from schema auto-creation and towards db-specific scripts, we can > increase these limits per database to maximum allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment
[ https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-13864 started by Reuben Kuhnert.

> Beeline ignores the command that follows a semicolon and comment
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
> Issue Type: Bug
> Reporter: Muthu Manickam
> Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
> Beeline ignores the next line/command that follows a command with a semicolon
> and a comment.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed. The second command,
> "select * from table2", is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, the first and third commands are executed. The second command,
> "select * from table2", is not executed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment
[ https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reuben Kuhnert updated HIVE-13864:
    Attachment: HIVE-13864.01.patch

> Beeline ignores the command that follows a semicolon and comment
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
> Issue Type: Bug
> Reporter: Muthu Manickam
> Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
> Beeline ignores the next line/command that follows a command with a semicolon
> and a comment.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed. The second command,
> "select * from table2", is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, the first and third commands are executed. The second command,
> "select * from table2", is not executed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13864) Beeline ignores the command that follows a semicolon and comment
[ https://issues.apache.org/jira/browse/HIVE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reuben Kuhnert updated HIVE-13864:
    Status: Patch Available (was: In Progress)

> Beeline ignores the command that follows a semicolon and comment
>
> Key: HIVE-13864
> URL: https://issues.apache.org/jira/browse/HIVE-13864
> Project: Hive
> Issue Type: Bug
> Reporter: Muthu Manickam
> Assignee: Reuben Kuhnert
> Attachments: HIVE-13864.01.patch
>
> Beeline ignores the next line/command that follows a command with a semicolon
> and a comment.
> Example 1:
> select *
> from table1; -- comments
> select * from table2;
> In this case, only the first command is executed. The second command,
> "select * from table2", is not executed.
> --
> Example 2:
> select *
> from table1; -- comments
> select * from table2;
> select * from table3;
> In this case, the first and third commands are executed. The second command,
> "select * from table2", is not executed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
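The behaviour expected in HIVE-13864 can be illustrated with a small statement splitter that drops `--` comments before splitting on semicolons. This is a sketch of the expected outcome, not Beeline's parser or the attached patch: `ScriptSplitter` is a hypothetical class, and it deliberately ignores escaped quotes and block comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter; it does not handle escaped quotes or block comments. */
final class ScriptSplitter {
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = '\0'; // '\0' while outside a quoted string
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote != '\0') {
                current.append(c);
                if (c == quote) quote = '\0'; // closing quote
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == '-' && i + 1 < script.length() && script.charAt(i + 1) == '-') {
                // Skip the comment up to the end of the line, but keep the line break.
                while (i < script.length() && script.charAt(i) != '\n') i++;
                current.append('\n');
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // trailing statement without a semicolon
        return statements;
    }

    private static void addIfNotBlank(List<String> statements, StringBuilder text) {
        String statement = text.toString().trim();
        if (!statement.isEmpty()) statements.add(statement);
    }
}
```

Applied to Example 2 from the issue, this yields three statements, so the `select * from table2` that follows the commented line is not lost.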
[jira] [Updated] (HIVE-13731) LLAP: return LLAP token with the splits
[ https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13731: Attachment: HIVE-13731.03.patch > LLAP: return LLAP token with the splits > --- > > Key: HIVE-13731 > URL: https://issues.apache.org/jira/browse/HIVE-13731 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, > HIVE-13731.02.patch, HIVE-13731.03.patch, HIVE-13731.patch, > HIVE-13731.wo.13444-13675-13443.patch > > > Need to return the token with the splits, then take it in LLAPIF and make > sure it's used when talking to LLAP -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits
[ https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325098#comment-15325098 ] Sergey Shelukhin commented on HIVE-13731: - Test failures are known or have a namenode in safe mode > LLAP: return LLAP token with the splits > --- > > Key: HIVE-13731 > URL: https://issues.apache.org/jira/browse/HIVE-13731 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, > HIVE-13731.02.patch, HIVE-13731.patch, HIVE-13731.wo.13444-13675-13443.patch > > > Need to return the token with the splits, then take it in LLAPIF and make > sure it's used when talking to LLAP -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13994) increase large varchar limits in db scripts to be db-specific
[ https://issues.apache.org/jira/browse/HIVE-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325086#comment-15325086 ] Sergey Shelukhin commented on HIVE-13994: - [~ashutoshc] [~sushanth] thoughts/objections? > increase large varchar limits in db scripts to be db-specific > - > > Key: HIVE-13994 > URL: https://issues.apache.org/jira/browse/HIVE-13994 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Right now all our max varchar limits are 4k, presumably due to Oracle > limitations. All other dbs support larger values and/or MAX; given that we > moved away from schema auto-creation and towards db-specific scripts, we can > increase these limits per database to maximum allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325076#comment-15325076 ]

Abdullah Yousufi commented on HIVE-13964:

It shouldn't be displaying that error. Could you possibly retry and see if you get the error again?

> Add a parameter to beeline to allow a properties file to be passed in
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
> Issue Type: New Feature
> Components: Beeline
> Affects Versions: 2.0.1
> Reporter: Abdullah Yousufi
> Assignee: Abdullah Yousufi
> Priority: Minor
> Fix For: 2.2.0
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline
> parameter. It may be a useful feature to be able to pass the file in as a
> parameter, such as --property-file.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor
[ https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325068#comment-15325068 ] Alan Gates commented on HIVE-13392: --- Patch seems fine, though it seems to contain some stuff unrelated to the stated purpose of the JIRA (e.g. moving ValidCompactorTxnList around). +1 > disable speculative execution for ACID Compactor > > > Key: HIVE-13392 > URL: https://issues.apache.org/jira/browse/HIVE-13392 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, > HIVE-13392.4.patch, HIVE-13392.patch > > > https://developer.yahoo.com/hadoop/tutorial/module4.html > Speculative execution is enabled by default. You can disable speculative > execution for the mappers and reducers by setting the > mapred.map.tasks.speculative.execution and > mapred.reduce.tasks.speculative.execution JobConf options to false, > respectively. 
> CompactorMR is currently not set up to handle speculative execution and may > lead to something like > {code} > 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): > Failed to CREATE_FILE > /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4 > for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on > 172.18.129.12 because this file lease is currently owned by > DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on > 172.18.129.18 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > {code} > Short term: disable speculative execution for this job > Longer term perhaps make each task write to dir with UUID... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications
[ https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325043#comment-15325043 ] Sravya Tirukkovalur commented on HIVE-13966: What we really need here is to make DbNotificationListener part of the transaction, without requiring all post-event listeners to be part of the transaction, as their contracts can be different. AFAICT, all listeners are synchronous, so should we think of a better name? > DbNotificationListener: can loose DDL operation notifications > - > > Key: HIVE-13966 > URL: https://issues.apache.org/jira/browse/HIVE-13966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Nachiket Vaidya >Priority: Critical > > The code for each API in HiveMetaStore.java is like this: > 1. openTransaction() > 2. -- operation-- > 3. commit() or rollback() based on result of the operation. > 4. add entry to notification log (unconditionally) > If the operation fails (in step 2), we still add an entry to the notification > log. Found this issue in testing. > This is still OK, as it is a false positive. > If the operation succeeds and adding to the notification log fails, the > user will get a MetaException. The operation will not be rolled back, as it is > already committed. We need to handle this case so that we will not have false > negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
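A minimal sketch of the idea raised in the HIVE-13966 comment above: write the notification log entry inside the same transaction as the operation, so that a failed notification write rolls the operation back. This is only an illustration. `FakeStore`, `createTable` and the `failNotification` flag are hypothetical stand-ins, not HiveMetaStore code, and the snapshot-based rollback is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class NotifyInTxnSketch {
    /** Hypothetical in-memory stand-in for the metastore backing store. */
    static final class FakeStore {
        final List<String> tables = new ArrayList<>();
        final List<String> notifications = new ArrayList<>();
        private List<String> tablesSnapshot;
        private List<String> notificationsSnapshot;

        void openTransaction() {
            // Remember the state so that rollback() can restore it.
            tablesSnapshot = new ArrayList<>(tables);
            notificationsSnapshot = new ArrayList<>(notifications);
        }

        void commit() {
            tablesSnapshot = null;
            notificationsSnapshot = null;
        }

        void rollback() {
            tables.clear();
            tables.addAll(tablesSnapshot);
            notifications.clear();
            notifications.addAll(notificationsSnapshot);
        }
    }

    /**
     * Performs the operation and the notification write in one transaction.
     * failNotification simulates an error while writing the notification log.
     */
    static boolean createTable(FakeStore store, String name, boolean failNotification) {
        store.openTransaction();
        try {
            store.tables.add(name);                          // step 2: the operation
            if (failNotification) {
                throw new IllegalStateException("notification log write failed");
            }
            store.notifications.add("CREATE_TABLE " + name); // notification before commit
            store.commit();                                  // both changes become visible together
            return true;
        } catch (RuntimeException e) {
            store.rollback();                                // neither change survives
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        System.out.println(createTable(store, "t1", false));
        System.out.println(createTable(store, "t2", true));
        System.out.println(store.tables + " " + store.notifications);
    }
}
```

With this ordering there is no window in which the operation is committed but its notification is missing, which is the false-negative case described in the issue.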
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325038#comment-15325038 ] Sergio Peña commented on HIVE-13964: We should not display the 'No such file or directory' error. I think that if an unknown parameter is passed, then we can continue beeline. > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat
[ https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325036#comment-15325036 ] Hive QA commented on HIVE-13833: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809297/HIVE-13833.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testHeartbeater org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking10 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking11 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking3 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking5 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking7 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking8 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking9 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/75/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/75/console Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-75/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12809297 - PreCommit-HIVE-MASTER-Build > Add an initial delay when starting the heartbeat > > > Key: HIVE-13833 > URL: https://issues.apache.org/jira/browse/HIVE-13833 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Minor > Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch > > > Since the scheduling of heartbeat happens immediately after lock acquisition, > it's unnecessary to send a heartbeat at the time when locks are acquired. Add an > initial delay to skip this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
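To illustrate the HIVE-13833 description above, here is a minimal sketch of a periodic task whose first run is postponed by an initial delay, using the JDK's ScheduledExecutorService. The class name, the `start` helper and the 200 ms values are assumptions chosen for the example. This is not the Hive heartbeater implementation or its actual interval.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DelayedHeartbeatSketch {
    /** Schedules heartbeat to run every intervalMs, with the first run after initialDelayMs. */
    static ScheduledExecutorService start(Runnable heartbeat, long initialDelayMs, long intervalMs) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // An initial delay of 0 would fire immediately, i.e. right after lock acquisition.
        pool.scheduleAtFixedRate(heartbeat, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger beats = new AtomicInteger();
        ScheduledExecutorService pool = start(beats::incrementAndGet, 200, 200);
        System.out.println("beats right after start: " + beats.get());
        Thread.sleep(500);
        pool.shutdownNow();
        System.out.println("beats after about 500 ms: " + beats.get());
    }
}
```

Setting the initial delay equal to the interval is one plausible choice, since the locks were just acquired and are fresh for at least one interval. Whether the patch does exactly that is not stated in the message.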
[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session
[ https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-13264: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks [~nithinmahesh] for the work. > JDBC driver makes 2 Open Session Calls for every open session > - > > Key: HIVE-13264 > URL: https://issues.apache.org/jira/browse/HIVE-13264 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 1.2.1, 2.0.1 >Reporter: NITHIN MAHESH >Assignee: NITHIN MAHESH > Labels: jdbc > Fix For: 2.2.0 > > Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, > HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, > HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, > HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch > > > When HTTP is used as the transport mode by the Hive JDBC driver, we noticed > that there is an additional open/close session just to validate the > connection. > > TCLIService.Iface client = new TCLIService.Client(new > TBinaryProtocol(transport)); > TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq()); > if (openResp != null) { > client.CloseSession(new > TCloseSessionReq(openResp.getSessionHandle())); > } > > The open session call is a costly one and should not be used to test > transport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session
[ https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-13264: Affects Version/s: 1.2.1 2.0.1 > JDBC driver makes 2 Open Session Calls for every open session > - > > Key: HIVE-13264 > URL: https://issues.apache.org/jira/browse/HIVE-13264 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 1.2.1, 2.0.1 >Reporter: NITHIN MAHESH >Assignee: NITHIN MAHESH > Labels: jdbc > Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, > HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, > HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, > HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch > > > When HTTP is used as the transport mode by the Hive JDBC driver, we noticed > that there is an additional open/close session just to validate the > connection. > > TCLIService.Iface client = new TCLIService.Client(new > TBinaryProtocol(transport)); > TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq()); > if (openResp != null) { > client.CloseSession(new > TCloseSessionReq(openResp.getSessionHandle())); > } > > The open session call is a costly one and should not be used to test > transport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324967#comment-15324967 ] Abdullah Yousufi edited comment on HIVE-13964 at 6/10/16 6:22 PM: -- I resolved the first issue and now the error is set to 1 in that case. The second issue is pretty important though, because that was the issue HIVE-6652 addressed, so it's not good if it still exists. However, I checked and it seems that's the current behavior, so my patch doesn't reintroduce that issue. This is what I get if I do this in the repo (or with my patch): {code} $ ./beeline BLA Beeline version 2.2.0-SNAPSHOT by Apache Hive beeline> {code} Note, how I don't get the 'No such file or directory' statement error. What behavior do we want here? It seems that the fix from HIVE-6652 was reverted at some point. [~xuefuz] was (Author: ayousufi): I resolved the first issue and now the error is set to 1 in that case. The second issue is pretty important though, because that was the issue HIVE-6652 addressed, so it's not good if it still exists. However, I checked and it seems that's the current behavior in upstream currently, so my patch doesn't reintroduce that issue. This is what I get if I do this in upstream (or with my patch): {code} $ ./beeline BLA Beeline version 2.2.0-SNAPSHOT by Apache Hive beeline> {code} Note, how I don't get the 'No such file or directory' statement error. What behavior do we want here? It seems that the fix from HIVE-6652 was reverted at some point. 
[~xuefuz] > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Description: Pointed out by [~gopalv]. Queries which follow the format are not optimal with map-side aggregation, because the Map 1 does not have TopN in the reduce sink. These queries shuffle 100% of the aggregate in cases where the reduce de-dup does not kick in. {code} select state, city, sum(sales) from table group by state, city order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state desc, city limit 10; {code} was: Pointed out by [~gopalv]. Queries which follow the format are not optimal with map-side aggregation, because the Map 1 does not have TopN in the reduce sink. These queries shuffle 100% of the aggregate in cases where the reduce de-dup does not kick in. As input data grows, it falls off a cliff of performance after 4 reducers. {code} select state, city, sum(sales) from table group by state, city order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state desc, city limit 10; {code} > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. 
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
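The HIVE-13982 description above is about keeping a TopN filter on the map side, so that only rows that can still reach the final `limit` are shuffled. A rough sketch of that idea in plain Java follows. It assumes ascending key order and a hypothetical `state|city` string key, and it is not Hive's TopN or reduce sink code.

```java
import java.util.TreeMap;

final class MapSideTopNSketch {
    /**
     * Partial sum(sales) per group key, keeping only the n smallest keys seen by this mapper.
     * A key that belongs to the global top n is also in the top n of every mapper that sees it,
     * so dropping the larger keys here does not change the final result.
     */
    static TreeMap<String, Long> topNPartialSums(String[] keys, long[] sales, int n) {
        TreeMap<String, Long> kept = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            kept.merge(keys[i], sales[i], Long::sum); // map-side aggregation
            if (kept.size() > n) {
                kept.pollLastEntry();                 // evict the largest key, it cannot make the limit
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] keys = {"CA|LA", "NY|NYC", "CA|LA", "AZ|Phoenix", "TX|Austin"};
        long[] sales = {10, 5, 7, 3, 1};
        // Only the two smallest keys and their partial sums would be shuffled.
        System.out.println(topNPartialSums(keys, sales, 2));
    }
}
```

An evicted key can never re-enter the kept set, because the largest kept key only decreases once n keys are held. This is why losing its earlier partial sum is harmless.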
[jira] [Commented] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session
[ https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325009#comment-15325009 ] Vaibhav Gumashta commented on HIVE-13264: - Failures look unrelated. Will commit shortly. > JDBC driver makes 2 Open Session Calls for every open session > - > > Key: HIVE-13264 > URL: https://issues.apache.org/jira/browse/HIVE-13264 > Project: Hive > Issue Type: Bug > Components: JDBC >Reporter: NITHIN MAHESH >Assignee: NITHIN MAHESH > Labels: jdbc > Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, > HIVE-13264.3.patch, HIVE-13264.4.patch, HIVE-13264.5.patch, > HIVE-13264.6.patch, HIVE-13264.6.patch, HIVE-13264.7.patch, > HIVE-13264.8.patch, HIVE-13264.9.patch, HIVE-13264.patch > > > When HTTP is used as the transport mode by the Hive JDBC driver, we noticed > that there is an additional open/close session just to validate the > connection. > > TCLIService.Iface client = new TCLIService.Client(new > TBinaryProtocol(transport)); > TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq()); > if (openResp != null) { > client.CloseSession(new > TCloseSessionReq(openResp.getSessionHandle())); > } > > The open session call is a costly one and should not be used to test > transport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13993) Hive should provide built-in UDF that can apply another UDF to each element of an array
[ https://issues.apache.org/jira/browse/HIVE-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324998#comment-15324998 ] Sergey Shelukhin commented on HIVE-13993: - Then we also need fold, and we can have map-reduce on top of Hive ;) > Hive should provide built-in UDF that can apply another UDF to each element > of an array > --- > > Key: HIVE-13993 > URL: https://issues.apache.org/jira/browse/HIVE-13993 > Project: Hive > Issue Type: New Feature >Reporter: Anthony Hsu > > There is currently no simple way to take an array field and apply a UDF on > each element of the array, returning a new array. This is a basic use case > that Hive should provide a built-in UDF for. More motivation: > http://stackoverflow.com/questions/27722493/how-to-invoke-udf-for-each-element-in-an-array-in-hive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
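For readers unfamiliar with the request in HIVE-13993, the operation being asked for is the usual "map over an array". A small sketch in plain Java is shown below. `applyToEach` is a hypothetical helper, not a Hive UDF and not the GenericUDF API. Returning null for a null array is an assumption about the desired SQL-like behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ArrayMapSketch {
    /** Applies fn to every element and returns a new list; a null array yields null. */
    static <T, R> List<R> applyToEach(List<T> input, Function<T, R> fn) {
        if (input == null) {
            return null;
        }
        return input.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> upper = applyToEach(Arrays.asList("a", "b", "c"), s -> s.toUpperCase());
        System.out.println(upper);
    }
}
```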
[jira] [Updated] (HIVE-13954) Parquet logs should go to STDERR
[ https://issues.apache.org/jira/browse/HIVE-13954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13954: - Reporter: Takahiko Saito (was: Prasanth Jayachandran) > Parquet logs should go to STDERR > > > Key: HIVE-13954 > URL: https://issues.apache.org/jira/browse/HIVE-13954 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Takahiko Saito >Assignee: Prasanth Jayachandran > Labels: TODOC1.3, TODOC2.1 > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-13954-branch-1.patch, HIVE-13954.1.patch > > > Parquet uses java util logging. When java logging is not configured using > default logging.properties file, parquet's default fallback handler writes to > STDOUT at INFO level. Hive writes all logging to STDERR and writes only the > query output to STDOUT. Writing logs to STDOUT may cause issues when > comparing query results. > If we provide default logging.properties for parquet then we can configure it > to write to file or stderr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
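As a rough illustration of the HIVE-13954 description above, a java.util.logging logger can be routed to STDERR by attaching a ConsoleHandler, which publishes to System.err. The sketch below does this programmatically. The class and method names are hypothetical, the logger name is whatever the caller passes in, and the actual patch may instead ship a logging.properties file, as the description suggests.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class StderrLoggingSketch {
    /** Detaches the logger from its parent handlers and sends INFO and above to System.err. */
    static Logger routeToStderr(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setUseParentHandlers(false);            // avoid duplicate output via parent handlers
        ConsoleHandler handler = new ConsoleHandler(); // ConsoleHandler writes to System.err
        handler.setLevel(Level.INFO);
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        logger.setLevel(Level.INFO);
        return logger;
    }

    public static void main(String[] args) {
        // "sketch.demo" is an arbitrary logger name used only for this example.
        Logger logger = routeToStderr("sketch.demo");
        logger.info("this line goes to stderr, not stdout");
        System.out.println("query output stays on stdout");
    }
}
```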
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324967#comment-15324967 ] Abdullah Yousufi commented on HIVE-13964: - I resolved the first issue and now the error is set to 1 in that case. The second issue is pretty important though, because that was the issue HIVE-6652 addressed, so it's not good if it still exists. However, I checked and it seems that's the current behavior upstream, so my patch doesn't reintroduce that issue. This is what I get if I do this in upstream (or with my patch): {code} $ ./beeline BLA Beeline version 2.2.0-SNAPSHOT by Apache Hive beeline> {code} Note how I don't get the 'No such file or directory' statement error. What behavior do we want here? It seems that the fix from HIVE-6652 was reverted at some point. [~xuefuz] > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Attachment: HIVE-13982.2.patch > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. > These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > As input data grows, it falls off a cliff of performance after 4 reducers. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-13982 started by Jesus Camacho Rodriguez. -- > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. > These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > As input data grows, it falls off a cliff of performance after 4 reducers. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Status: Patch Available (was: In Progress) > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. > These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > As input data grows, it falls off a cliff of performance after 4 reducers. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Status: Open (was: Patch Available) > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. > These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > As input data grows, it falls off a cliff of performance after 4 reducers. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor
[ https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324924#comment-15324924 ] Wei Zheng commented on HIVE-13392: -- +1 > disable speculative execution for ACID Compactor > > > Key: HIVE-13392 > URL: https://issues.apache.org/jira/browse/HIVE-13392 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, > HIVE-13392.4.patch, HIVE-13392.patch > > > https://developer.yahoo.com/hadoop/tutorial/module4.html > Speculative execution is enabled by default. You can disable speculative > execution for the mappers and reducers by setting the > mapred.map.tasks.speculative.execution and > mapred.reduce.tasks.speculative.execution JobConf options to false, > respectively. > CompactorMR is currently not set up to handle speculative execution and may > lead to something like > {code} > 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): > Failed to CREATE_FILE > /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4 > for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on > 172.18.129.12 because this file lease is currently owned by > DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on > 172.18.129.18 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > {code} > Short term: disable speculative execution for this job > Longer term perhaps make each task write to dir with UUID... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat
[ https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324846#comment-15324846 ] Eugene Koifman commented on HIVE-13833: --- +1 pending tests > Add an initial delay when starting the heartbeat > > > Key: HIVE-13833 > URL: https://issues.apache.org/jira/browse/HIVE-13833 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Minor > Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch > > > Since the scheduling of heartbeat happens immediately after lock acquisition, > it's unnecessary to send a heartbeat at the time when locks are acquired. Add an > initial delay to skip this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324820#comment-15324820 ] Sergio Peña commented on HIVE-13964: #2 is ok. It is displayed now. Maybe I wasn't noticing it. #3 It does exit. There are other problems. If the file passed does not exist, it exits (OK), but the exit code is 0. We should return an exit code higher than 0. Users sometimes check this number in their scripts to see whether beeline ran correctly. {noformat} # beeline --property-file /tmp/a /tmp/a (No such file or directory) Beeline version 2.2.0-SNAPSHOT by Apache Hive # echo $? 0 {noformat} If I pass a different argument, Beeline only displays 'No such file or directory', and it continues with it. {noformat} # beeline BLA Beeline version 2.2.0-SNAPSHOT by Apache Hive No such file or directory beeline> {noformat} > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
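The exit-code point in the HIVE-13964 comment above can be sketched as follows: load the properties file and return a non-zero status when it cannot be read, so that calling scripts can test `$?`. The class and method names are hypothetical and the status value 1 is an assumption. This is not Beeline's argument handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

final class PropertyFileExitCodeSketch {
    /** Loads path into target; returns 0 on success and 1 if the file is missing or unreadable. */
    static int loadOrFail(String path, Properties target) {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            target.load(in);
            return 0;
        } catch (IOException e) {
            // Report on stderr and signal the failure through the return value.
            System.err.println(path + " (" + e.getClass().getSimpleName() + ")");
            return 1;
        }
    }
}
```

A launcher would pass the returned value to `System.exit`, so that a missing file no longer reports status 0.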
[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None
[ https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-13159: -- Status: Patch Available (was: Open) > TxnHandler should support datanucleus.connectionPoolingType = None > -- > > Key: HIVE-13159 > URL: https://issues.apache.org/jira/browse/HIVE-13159 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Sergey Shelukhin >Assignee: Alan Gates > Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch > > > Right now, one has to choose bonecp or dbcp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None
[ https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-13159: -- Attachment: HIVE-13159.3.patch New version of the patch updated to match current master. > TxnHandler should support datanucleus.connectionPoolingType = None > -- > > Key: HIVE-13159 > URL: https://issues.apache.org/jira/browse/HIVE-13159 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Sergey Shelukhin >Assignee: Alan Gates > Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch > > > Right now, one has to choose bonecp or dbcp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-13725: Attachment: (was: HIVE-13725.1.patch) > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
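The synchronization asked for in HIVE-13725 can be pictured as a thin wrapper that serializes every call to the shared client. The sketch below is an assumption-laden illustration, not the attached patch: `RpcClient`, `SynchronizedClient`, and `FakeClient` are hypothetical stand-ins, and the real metastore client has a different interface.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedEndpointSketch {

    /** Hypothetical stand-in for a client that is not internally thread safe. */
    interface RpcClient {
        long openTxnBatch(int size);
        void heartbeat(long batchId);
    }

    /** Serializes all calls so that a request and its response cannot interleave. */
    static final class SynchronizedClient implements RpcClient {
        private final RpcClient delegate;
        private final Object lock = new Object();

        SynchronizedClient(RpcClient delegate) { this.delegate = delegate; }

        @Override public long openTxnBatch(int size) {
            synchronized (lock) { return delegate.openTxnBatch(size); }
        }

        @Override public void heartbeat(long batchId) {
            synchronized (lock) { delegate.heartbeat(batchId); }
        }
    }

    /** Fake client that counts how often two calls overlap in time. */
    static final class FakeClient implements RpcClient {
        private final AtomicBoolean busy = new AtomicBoolean();
        final AtomicInteger overlaps = new AtomicInteger();
        private long nextId;

        private void enter() { if (!busy.compareAndSet(false, true)) overlaps.incrementAndGet(); }
        private void exit() { busy.set(false); }

        @Override public long openTxnBatch(int size) {
            enter();
            try { return ++nextId; } finally { exit(); }
        }

        @Override public void heartbeat(long batchId) {
            enter();
            try { Thread.yield(); } finally { exit(); }
        }
    }

    // Thread 1 opens batches while thread 2 heartbeats, as in the issue's use case.
    static void hammer(RpcClient client, int callsPerThread) {
        Thread t1 = new Thread(() -> { for (int i = 0; i < callsPerThread; i++) client.openTxnBatch(10); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < callsPerThread; i++) client.heartbeat(1L); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        FakeClient fake = new FakeClient();
        hammer(new SynchronizedClient(fake), 100_000);
        System.out.println("overlapping calls: " + fake.overlaps.get());
    }
}
```

A single lock is the simplest choice here; it trades some concurrency for the guarantee that responses cannot be delivered to the wrong caller.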
[jira] [Updated] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination
[ https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Rajaperumal updated HIVE-13968: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for reviewing the change [~ruili] > CombineHiveInputFormat does not honor InputFormat that implements > AvoidSplitCombination > --- > > Key: HIVE-13968 > URL: https://issues.apache.org/jira/browse/HIVE-13968 > Project: Hive > Issue Type: Bug >Reporter: Prasanna Rajaperumal >Assignee: Prasanna Rajaperumal > Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, > HIVE-13968.3.patch > > > If I have 100 path[] , the nonCombinablePaths will have only the paths > paths[0-9] and the rest of the paths will be in combinablePaths, even if the > inputformat returns false for AvoidSplitCombination.shouldSkipCombine() for > all the paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
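The behaviour the HIVE-13968 description expects, that every path is classified and not only the first few, might be sketched as below. This is not the committed fix. The predicate stands in for `AvoidSplitCombination.shouldSkipCombine()`, and the sketch assumes that a path for which the check returns true must not be combined; `PathPartitionSketch` and `Partitioned` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PathPartitionSketch {

    /** Result holder: every input path ends up in exactly one of the two lists. */
    static final class Partitioned {
        final List<String> combinable = new ArrayList<>();
        final List<String> nonCombinable = new ArrayList<>();
    }

    // Applies the skip-combine check to each path in turn, so the outcome
    // does not depend on a path's position in the input.
    static Partitioned partition(List<String> paths, Predicate<String> shouldSkipCombine) {
        Partitioned out = new Partitioned();
        for (String path : paths) {
            if (shouldSkipCombine.test(path)) {
                out.nonCombinable.add(path);
            } else {
                out.combinable.add(path);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            paths.add("/warehouse/t/part=" + i);
        }
        Partitioned p = partition(paths, path -> true);
        System.out.println(p.nonCombinable.size() + " non-combinable, "
                + p.combinable.size() + " combinable");
    }
}
```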
[jira] [Updated] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-13725: Attachment: HIVE-13725.1.patch > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extension to limit push down through order by & group by
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Description: Pointed out by [~gopalv]. Queries which follow the format are not optimal with map-side aggregation, because the Map 1 does not have TopN in the reduce sink. These queries shuffle 100% of the aggregate in cases where the reduce de-dup does not kick in. As input data grows, it falls off a cliff of performance after 4 reducers. {code} select state, city, sum(sales) from table group by state, city order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state desc, city limit 10; {code} was: Queries which follow the format are not optimal with map-side aggregation, because the Map 1 does not have TopN in the reduce sink. These queries shuffle 100% of the aggregate in cases where the reduce de-dup does not kick in. As input data grows, it falls off a cliff of performance after 4 reducers. {code} select state, city, sum(sales) from table group by state, city order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state, city limit 10; {code} {code} select state, city, sum(sales) from table group by city, state order by state desc, city limit 10; {code} > Extension to limit push down through order by & group by > > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.patch > > > Pointed out by [~gopalv]. > Queries which follow the format are not optimal with map-side aggregation, > because the Map 1 does not have TopN in the reduce sink. 
> These queries shuffle 100% of the aggregate in cases where the reduce de-dup > does not kick in. > As input data grows, it falls off a cliff of performance after 4 reducers. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
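The idea behind a TopN in the map-side reduce sink can be sketched independently of Hive's operators. When the ORDER BY keys are the GROUP BY keys and the query has LIMIT n, each map task only needs to forward its n smallest group keys with their partial sums, because a key among the global first n is also among the first n of any task that saw it. The code is a hedged illustration under that assumption: `MapSideTopNSketch` is a hypothetical name and the composite string key is a simplification of the (state, city) sort key.

```java
import java.util.TreeMap;

public class MapSideTopNSketch {

    // Aggregates sales per key and keeps only the 'limit' smallest keys.
    // A key evicted here has at least 'limit' smaller keys in this task,
    // so it cannot appear in the final limited result.
    static TreeMap<String, Long> topN(String[] keys, long[] sales, int limit) {
        TreeMap<String, Long> partial = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            partial.merge(keys[i], sales[i], Long::sum);
            if (partial.size() > limit) {
                partial.pollLastEntry();   // drop the largest key
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        String[] keys = {"CA/SF", "NY/NYC", "CA/LA", "CA/SF", "TX/Austin"};
        long[] sales = {10, 20, 5, 7, 3};
        // Only two groups leave this task instead of four.
        System.out.println(topN(keys, sales, 2));   // prints {CA/LA=5, CA/SF=17}
    }
}
```

Without this bound, every distinct group is shuffled to the reducers, which is the "100% of the aggregate" cost the description mentions.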
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324750#comment-15324750 ] Hive QA commented on HIVE-13964: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809293/HIVE-13964.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10225 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/74/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/74/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-74/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12809293 - PreCommit-HIVE-MASTER-Build > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file.
[jira] [Commented] (HIVE-13838) Set basic stats as inaccurate for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324748#comment-15324748 ] Pengcheng Xiong commented on HIVE-13838: Thanks [~ekoifman], I will take another look. Thanks for finding this! > Set basic stats as inaccurate for all ACID tables > - > > Key: HIVE-13838 > URL: https://issues.apache.org/jira/browse/HIVE-13838 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-13838.01.patch
[jira] [Commented] (HIVE-13908) Beeline adds extra fractional digits when you insert values to table with float data type
[ https://issues.apache.org/jira/browse/HIVE-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324738#comment-15324738 ] Sergio Peña commented on HIVE-13908: This issue is related to a Thrift communication. The FLOAT data is correctly loaded on HS2, but in order to send it to beeline, it needs to cast it as DOUBLE because Thrift does not support FLOAT data types. This is where the decimals got extended. This is the code where it happens: https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/thrift/ColumnBuffer.java#L368 {noformat} case FLOAT_TYPE: nulls.set(size, field == null); doubleVars()[size] = field == null ? 0 : new Double(field.toString()); break; {noformat} > Beeline adds extra fractional digits when you insert values to table with > float data type > -- > > Key: HIVE-13908 > URL: https://issues.apache.org/jira/browse/HIVE-13908 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Takahiko Saito > > Via beeline, although -35664.76 is inserted, -35664.76171875 is displayed > {noformat} > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table test; > No rows affected (0.067 seconds) > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test(f float); > No rows affected (0.248 seconds) > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test > values(-35664.76),(29497.34); > INFO : Tez session hasn't been created yet. 
Opening session > INFO : Dag name: insert into table tes...35664.76),(29497.34)(Stage-1) > INFO : > INFO : Status: Running (Executing on YARN cluster with App id > application_1464727816747_0019) > INFO : Map 1: -/- > INFO : Map 1: 0/1 > INFO : Map 1: 0/1 > INFO : Map 1: 0(+1)/1 > INFO : Map 1: 1/1 > INFO : Loading data to table default.test from > hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-01_20-16-32_885_9161749848563358684-1/-ext-1 > INFO : Table default.test stats: [numFiles=1, numRows=2, totalSize=19, > rawDataSize=17] > No rows affected (31.725 seconds) > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; > +--+--+ > | test.f | > +--+--+ > | -35664.76171875 | > | 29497.33984375 | > +--+--+ > 2 rows selected (0.143 seconds) > {noformat} > The issue is not seen via Hive CLI: > {noformat} > hive> create table test(f float); > OK > Time taken: 0.32 seconds > hive> insert into table test values(-35664.76),(29497.34); > Query ID = hrt_qa_20160601202446_75f38c5d-f52b-45b3-b67a-8a8b0a194305 > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1464727816747_0020) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED > KILLED > > Map 1 .. SUCCEEDED 1 100 0 > 0 > > VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 7.66 s > > Loading data to table default.test > Table default.test stats: [numFiles=1, numRows=2, totalSize=19, > rawDataSize=17] > OK > Time taken: 11.477 seconds > hive> select * from test; > OK > -35664.76 > 29497.34 > Time taken: 0.144 seconds, Fetched: 2 row(s) > {noformat} > hdfs file shows expected value: > {noformat} > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> dfs -cat > hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/00_0 > 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> ; > +-+--+ > | DFS Output | > +-+--+ > | -35664.76 | > | 29497.34| > +-+--+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
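The extra digits reported in HIVE-13908 are what a plain float-to-double widening produces, and the effect can be reproduced with the JDK alone. The sketch below illustrates only that general effect and makes no claim about what `ColumnBuffer` does on any particular branch; `FloatWideningSketch` and its two helper methods are hypothetical names.

```java
public class FloatWideningSketch {

    // Direct widening keeps the exact binary value of the float,
    // including digits the user never typed.
    static double widenDirectly(float f) {
        return (double) f;
    }

    // Going through the float's shortest decimal string yields the double
    // closest to what was originally entered.
    static double widenViaString(float f) {
        return Double.parseDouble(Float.toString(f));
    }

    public static void main(String[] args) {
        float stored = -35664.76f;
        System.out.println(widenDirectly(stored));   // prints -35664.76171875
        System.out.println(widenViaString(stored));  // prints -35664.76
    }
}
```

The first value matches what the issue shows for Beeline and the second matches the Hive CLI output, which is consistent with the precision being lost at the point where FLOAT is carried as DOUBLE.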
[jira] [Commented] (HIVE-13788) hive msck listpartitions need to make use of directSQL instead of datanucleus
[ https://issues.apache.org/jira/browse/HIVE-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324659#comment-15324659 ] Ashutosh Chauhan commented on HIVE-13788: - Let's keep changes minimal to msck. Can you please get rid of the changes in other code paths until we get a better understanding of getPartsWithAuthInfo()? Second, to retrieve all partitions, please use the PartitionPruner::prune() method instead of adding a new method. > hive msck listpartitions need to make use of directSQL instead of datanucleus > - > > Key: HIVE-13788 > URL: https://issues.apache.org/jira/browse/HIVE-13788 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Hari Sankar Sivarama Subramaniyan >Priority: Minor > Attachments: HIVE-13788.1.patch, msck_call_stack_with_fix.png, > msck_stack_trace.png > > > Currently, for tables having 1000s of partitions, too many DB calls are made > via datanucleus.
[jira] [Commented] (HIVE-13838) Set basic stats as inaccurate for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324589#comment-15324589 ] Eugene Koifman commented on HIVE-13838: --- [~pxiong] as far as I can tell this is still not fixed. Please see https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/70/testReport/ or https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/testReport/ the same set of tests keeps failing > Set basic stats as inaccurate for all ACID tables > - > > Key: HIVE-13838 > URL: https://issues.apache.org/jira/browse/HIVE-13838 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-13838.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13392) disable speculative execution for ACID Compactor
[ https://issues.apache.org/jira/browse/HIVE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324582#comment-15324582 ] Eugene Koifman commented on HIVE-13392: --- all failures have age > 1 [~wzheng] or [~alangates] could you review please? > disable speculative execution for ACID Compactor > > > Key: HIVE-13392 > URL: https://issues.apache.org/jira/browse/HIVE-13392 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-13392.2.patch, HIVE-13392.3.patch, > HIVE-13392.4.patch, HIVE-13392.patch > > > https://developer.yahoo.com/hadoop/tutorial/module4.html > Speculative execution is enabled by default. You can disable speculative > execution for the mappers and reducers by setting the > mapred.map.tasks.speculative.execution and > mapred.reduce.tasks.speculative.execution JobConf options to false, > respectively. > CompactorMR is currently not set up to handle speculative execution and may > lead to something like > {code} > 2016-02-08 22:56:38,256 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): > Failed to CREATE_FILE > /apps/hive/warehouse/service_logs_v2/ds=2016-01-20/_tmp_6cf08b9f-c2e2-4182-bc81-e032801b147f/base_13858600/bucket_4 > for DFSClient_attempt_1454628390210_27756_m_01_1_131224698_1 on > 172.18.129.12 because this file lease is currently owned by > DFSClient_attempt_1454628390210_27756_m_01_0_-2027182532_1 on > 172.18.129.18 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2937) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2562) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2451) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:688) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > {code} > Short term: disable speculative execution for this job > Longer term perhaps make each task write to dir with UUID... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
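The longer-term idea mentioned at the end of HIVE-13392, giving each task attempt its own UUID-named directory, can be sketched with plain `java.nio`. This is only an illustration on a local file system and not how CompactorMR is implemented: `AttemptDirSketch`, `writeAttempt`, and the `_tmp_` prefix are assumptions, and the step that promotes one attempt's output to the final location is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

public class AttemptDirSketch {

    // Writes 'data' to <baseDir>/_tmp_<uuid>/<fileName>. Two attempts of the
    // same task therefore never try to create the same file.
    static Path writeAttempt(Path baseDir, String fileName, byte[] data) throws IOException {
        Path attemptDir = baseDir.resolve("_tmp_" + UUID.randomUUID());
        Files.createDirectories(attemptDir);
        Path out = attemptDir.resolve(fileName);
        // CREATE_NEW fails if the file already exists, so a collision would be loud.
        Files.write(out, data, StandardOpenOption.CREATE_NEW);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("compactor-sketch");
        byte[] rows = "row data".getBytes(StandardCharsets.UTF_8);
        // Original attempt and speculative attempt of the same bucket.
        System.out.println(writeAttempt(base, "bucket_4", rows));
        System.out.println(writeAttempt(base, "bucket_4", rows));
    }
}
```

In the stack trace above both attempts target the same `_tmp_...` path, which is why the second one fails on the file lease; separate directories avoid that contention at the cost of cleaning up the losing attempt.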
[jira] [Commented] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination
[ https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324529#comment-15324529 ] Hive QA commented on HIVE-13968: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809280/HIVE-13968.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10224 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/73/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-73/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12809280 - PreCommit-HIVE-MASTER-Build > CombineHiveInputFormat does not honor InputFormat that implements > AvoidSplitCombination > --- > > Key: HIVE-13968 > URL: https://issues.apache.org/jira/browse/HIVE-13968 > Project: Hive > Issue Type: Bug >Reporter: Prasanna Rajaperumal >Assignee: Prasanna Rajaperumal > Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, > HIVE-13968.3.patch > > > If I have 100 path[] , the nonCombinablePaths will have only the paths > paths[0-9] and the rest of the paths will be in combinablePaths, even if the > inputformat returns false for AvoidSplitCombination.shouldSkipCombine() for > all the paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324449#comment-15324449 ] BELUGA BEHR commented on HIVE-13278: This problem does not seem specific to Spark. I believe it happens when Hive starts a map-only MapReduce job. It doesn't generate a reduce.xml as there isn't a reduce phase, but later it tries to put reduce.xml in the distributed cache when submitting the job, which causes this error. > Many redundant 'File not found' messages appeared in container log during > query execution with Hive on Spark > > > Key: HIVE-13278 > URL: https://issues.apache.org/jira/browse/HIVE-13278 > Project: Hive > Issue Type: Bug > Environment: Hive on Spark engine > Found based on : > Apache Hive 2.0.0 > Apache Spark 1.6.0 >Reporter: Xin Hao >Priority: Minor > > Many redundant 'File not found' messages appeared in container log during > query execution with Hive on Spark. > Certainly, it doesn't prevent the query from running successfully. So mark it > as Minor currently. 
> Error message example: > 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: > /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) 
[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajat Khandelwal updated HIVE-13903: Attachment: HIVE-13903.02.patch > getFunctionInfo is downloading jar on every call > > > Key: HIVE-13903 > URL: https://issues.apache.org/jira/browse/HIVE-13903 > Project: Hive > Issue Type: Bug >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, > HIVE-13903.02.patch > > > on queries using permanent udfs, the jar file of the udf is downloaded > multiple times. Each call originating from Registry.getFunctionInfo. This > increases time for the query, especially if that query is just an explain > query. The jar should be downloaded once, and not downloaded again if the udf > class is accessible in the current thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
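The "download once" behaviour asked for in HIVE-13903 can be approximated with a per-URI cache. Note that this swaps in a different technique from the one the description suggests (checking whether the UDF class is already accessible in the current thread) and is not the attached patch. `JarDownloadCacheSketch`, `localPathFor`, and the downloader function are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class JarDownloadCacheSketch {

    // Maps a remote jar URI to the local path it was downloaded to.
    private final ConcurrentMap<String, String> localCopies = new ConcurrentHashMap<>();
    private final Function<String, String> downloader;

    JarDownloadCacheSketch(Function<String, String> downloader) {
        this.downloader = downloader;
    }

    // computeIfAbsent runs the downloader at most once per URI, even when
    // several threads ask for the same jar at the same time.
    String localPathFor(String jarUri) {
        return localCopies.computeIfAbsent(jarUri, downloader);
    }

    public static void main(String[] args) {
        AtomicInteger downloads = new AtomicInteger();
        JarDownloadCacheSketch cache = new JarDownloadCacheSketch(uri -> {
            downloads.incrementAndGet();           // stands in for the real copy
            return "/tmp/udf-cache/" + uri.hashCode() + ".jar";
        });
        for (int i = 0; i < 3; i++) {
            cache.localPathFor("hdfs:///udfs/my-udf.jar");
        }
        System.out.println("downloads: " + downloads.get());   // prints downloads: 1
    }
}
```

For an explain query that resolves the same permanent UDF several times, a cache of this kind would turn the repeated downloads into lookups.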
[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324416#comment-15324416 ] Rajat Khandelwal commented on HIVE-13903: - Taking patch from reviewboard and attaching > getFunctionInfo is downloading jar on every call > > > Key: HIVE-13903 > URL: https://issues.apache.org/jira/browse/HIVE-13903 > Project: Hive > Issue Type: Bug >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, > HIVE-13903.02.patch > > > on queries using permanent udfs, the jar file of the udf is downloaded > multiple times. Each call originating from Registry.getFunctionInfo. This > increases time for the query, especially if that query is just an explain > query. The jar should be downloaded once, and not downloaded again if the udf > class is accessible in the current thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13731) LLAP: return LLAP token with the splits
[ https://issues.apache.org/jira/browse/HIVE-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324376#comment-15324376 ] Hive QA commented on HIVE-13731: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809265/HIVE-13731.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10193 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-vectorized_distinct_gby.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vectorization_13.q-schema_evol_text_nonvec_mapwork_part_all_primitive.q-bucket3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/72/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/72/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-72/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This 
message is automatically generated. ATTACHMENT ID: 12809265 - PreCommit-HIVE-MASTER-Build > LLAP: return LLAP token with the splits > --- > > Key: HIVE-13731 > URL: https://issues.apache.org/jira/browse/HIVE-13731 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13731.01.patch, HIVE-13731.01.wo.13675-13443.patch, > HIVE-13731.02.patch, HIVE-13731.patch, HIVE-13731.wo.13444-13675-13443.patch > > > Need to return the token with the splits, then take it in LLAPIF and make > sure it's used when talking to LLAP -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13968) CombineHiveInputFormat does not honor InputFormat that implements AvoidSplitCombination
[ https://issues.apache.org/jira/browse/HIVE-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324340#comment-15324340 ] Rui Li commented on HIVE-13968: --- +1 > CombineHiveInputFormat does not honor InputFormat that implements > AvoidSplitCombination > --- > > Key: HIVE-13968 > URL: https://issues.apache.org/jira/browse/HIVE-13968 > Project: Hive > Issue Type: Bug >Reporter: Prasanna Rajaperumal >Assignee: Prasanna Rajaperumal > Attachments: HIVE-13968.1.patch, HIVE-13968.2.patch, > HIVE-13968.3.patch > > > If I have 100 path[] , the nonCombinablePaths will have only the paths > paths[0-9] and the rest of the paths will be in combinablePaths, even if the > inputformat returns false for AvoidSplitCombination.shouldSkipCombine() for > all the paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324336#comment-15324336 ]

Rajat Khandelwal commented on HIVE-13903:
-
Created https://reviews.apache.org/r/48544/

> getFunctionInfo is downloading jar on every call
>
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded
> multiple times, with each download originating from a call to
> Registry.getFunctionInfo. This increases the time taken by the query,
> especially if that query is just an explain query. The jar should be
> downloaded once, and not downloaded again if the UDF class is accessible in
> the current thread.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
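
The behaviour requested in the issue description above (download the jar once, and skip the download when the UDF class is already accessible) can be illustrated with a small sketch. This is not Hive's code and not the attached patch: `ResourceCache`, `ensure_loaded`, the class name and the jar URI are all hypothetical, and the sketch tracks loaded classes per cache instance rather than per thread.

```python
from typing import Callable, Set


class ResourceCache:
    """Hypothetical helper: remembers which UDF classes are already loaded."""

    def __init__(self, download: Callable[[str], None]) -> None:
        self._download = download          # callable that fetches a jar by URI
        self._loaded_classes: Set[str] = set()
        self.download_count = 0

    def ensure_loaded(self, udf_class: str, jar_uri: str) -> None:
        # If the class is already accessible, do not fetch the jar again.
        if udf_class in self._loaded_classes:
            return
        self._download(jar_uri)
        self.download_count += 1
        self._loaded_classes.add(udf_class)


if __name__ == "__main__":
    fetched = []
    cache = ResourceCache(fetched.append)
    # Repeated lookups of the same (hypothetical) UDF trigger one download.
    for _ in range(3):
        cache.ensure_loaded("com.example.MyUdf", "hdfs:///udfs/my-udf.jar")
    print(cache.download_count)
```

The check happens before the download rather than after it, so repeated lookups of the same function during planning (for example in an explain query) cost only a set membership test.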
[jira] [Commented] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324338#comment-15324338 ]

Rajat Khandelwal commented on HIVE-13903:
-
Taking patch from reviewboard and attaching

> getFunctionInfo is downloading jar on every call
>
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded
> multiple times, with each download originating from a call to
> Registry.getFunctionInfo. This increases the time taken by the query,
> especially if that query is just an explain query. The jar should be
> downloaded once, and not downloaded again if the UDF class is accessible in
> the current thread.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Khandelwal updated HIVE-13903:
    Status: Patch Available (was: In Progress)

> getFunctionInfo is downloading jar on every call
>
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded
> multiple times, with each download originating from a call to
> Registry.getFunctionInfo. This increases the time taken by the query,
> especially if that query is just an explain query. The jar should be
> downloaded once, and not downloaded again if the UDF class is accessible in
> the current thread.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Khandelwal updated HIVE-13903:
    Attachment: HIVE-13903.01.patch

> getFunctionInfo is downloading jar on every call
>
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch
>
>
> On queries using permanent UDFs, the jar file of the UDF is downloaded
> multiple times, with each download originating from a call to
> Registry.getFunctionInfo. This increases the time taken by the query,
> especially if that query is just an explain query. The jar should be
> downloaded once, and not downloaded again if the UDF class is accessible in
> the current thread.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)