[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Description: Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and data files corresponding to each event. Events are dumped in the same sequence in which they were generated. REPL LOAD then lists the directories inside $dumpdir using listStatus and sorts them with the compareTo method of the FileStatus class, which sorts names alphabetically without considering their length. Due to this, event-100 is processed before event-99, leaving the replica database out of sync with the source. A customized comparator is needed to sort the FileStatus entries. was: Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and data files corresponding to each event. Events are dumped in the same sequence in which they were generated. REPL LOAD then lists the directories inside $dumpdir using listStatus and sorts them with the compareTo method of the FileStatus class, which sorts names alphabetically without considering their length. Due to this, event-100 is processed before event-99, making the replica database unreliable. A customized comparator is needed to sort the FileStatus entries. > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > > Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata > and data files corresponding to each event. Events are dumped in the same > sequence in which they were generated. 
> REPL LOAD then lists the directories inside $dumpdir using listStatus and > sorts them with the compareTo method of the FileStatus class, which sorts > names alphabetically without considering their length. > Due to this, event-100 is processed before event-99, leaving the > replica database out of sync with the source. > A customized comparator is needed to sort the FileStatus entries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
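The ordering bug above can be sketched as a comparator that orders event dump directories by their numeric event id instead of lexicographically, so that "100" no longer sorts before "99". This is an illustrative sketch, not the actual Hive patch: the class and method names are invented, and the directory names are assumed to be bare numeric event ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: sort event dump directory names numerically, the way
// the customized FileStatus comparator described in the issue would.
public class EventDirSorter {

  // Parse a directory name as its event id (names assumed to be numeric).
  static long eventId(String name) {
    return Long.parseLong(name);
  }

  // Numeric sort; a plain lexicographic sort would put "100" before "99".
  public static void sortByEventId(List<String> dirs) {
    dirs.sort(Comparator.comparingLong(EventDirSorter::eventId));
  }

  public static void main(String[] args) {
    List<String> dirs = new ArrayList<>(List.of("100", "99", "101", "9"));
    sortByEventId(dirs);
    System.out.println(dirs); // prints [9, 99, 100, 101]
  }
}
```

With lexicographic compareTo, the same list would come out as [100, 101, 9, 99], which is exactly the event-100-before-event-99 problem described.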
[jira] [Work started] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16813 started by Sankar Hariappan. --- > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036805#comment-16036805 ] Rui Li commented on HIVE-16573: --- [~anishek], yeah spark only supports in-place update in CLI. In the current method we don't check {{isHiveServerQuery}} for tez either (actually as Bing mentioned we can't access SessionState here), so I suppose the change is OK, right? > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036787#comment-16036787 ] Hive QA commented on HIVE-16824: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871197/HIVE-16824.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5530/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5530/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5530/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871197 - PreCommit-HIVE-Build > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16824: - Affects Version/s: 3.0.0 > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036830#comment-16036830 ] Hive QA commented on HIVE-11297: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871201/HIVE-11297.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5531/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5531/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5531/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871201 - PreCommit-HIVE-Build > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-12412) Multi insert queries fail to run properly in hive 1.1.x or later.
[ https://issues.apache.org/jira/browse/HIVE-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklaus Xiao updated HIVE-12412: Affects Version/s: 2.3.0 > Multi insert queries fail to run properly in hive 1.1.x or later. > - > > Key: HIVE-12412 > URL: https://issues.apache.org/jira/browse/HIVE-12412 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.0, 2.3.0 >Reporter: John P. Petrakis > Labels: Correctness, CorrectnessBug > > We use multi insert queries to take data in one table and manipulate it by > inserting it into a results table. Queries are of this form: > from (select * from data_table lateral view explode(data_table.f2) f2 as > explode_f2) as explode_data_table >insert overwrite table results_table partition (q_id='C.P1',rl='1') >select >array(cast(if(explode_data_table.f1 is null or > explode_data_table.f1='', 'UNKNOWN',explode_data_table.f1) as > String),cast(explode_f2.s1 as String)) as dimensions, >ARRAY(CAST(sum(explode_f2.d1) as Double)) as metrics, >null as rownm >where (explode_data_table.date_id between 20151016 and 20151016) >group by >if(explode_data_table.f1 is null or explode_data_table.f1='', > 'UNKNOWN',explode_data_table.f1), >explode_f2.s1 >INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P2',rl='0') >SELECT ARRAY(CAST('Total' as String),CAST('Total' as String)) AS > dimensions, >ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, >null AS rownm >WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016) >INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P5',rl='0') >SELECT >ARRAY(CAST('Total' as String)) AS dimensions, >ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, >null AS rownm >WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016) > This query is meant to total a given field of a struct that is potentially a > list of structs. 
For our test data set, which consists of a single row, the > summation yields "Null", with messages in the hive log of the nature: > Missing fields! Expected 2 fields but only got 1! Ignoring similar problems. > or "Extra fields detected..." > For significantly more data, this query will eventually cause a run time > error while processing a column (caused by array index out of bounds > exception in one of the lazy binary classes such as LazyBinaryString or > LazyBinaryStruct). > Using the query above from the hive command line, the following data was used: > (note there are tabs in the data below) > string oneone:1.0:1.00:10.0,eon:1.0:1.00:100.0 > string twotwo:2.0:2.00:20.0,otw:2.0:2.00:20.0,wott:2.0:2.00:20.0 > string thrthree:3.0:3.00:30.0 > string foufour:4.0:4.00:40.0 > There are two fields, a string, (eg. 'string one') and a list of structs. > The following is used to create the table: > create table if not exists t1 ( > f1 string, > f2 > array> > ) > partitioned by (clid string, date_id string) > row format delimited fields > terminated by '09' > collection items terminated by ',' > map keys terminated by ':' > lines terminated by '10' > location '/user/hive/warehouse/t1'; > And the following is used to load the data: > load data local inpath '/path/to/data/file/cplx_test.data2' OVERWRITE into > table t1 partition(client_id='987654321',date_id='20151016'); > The resulting table should yield the following: > ["string fou","four"] [4.0] nullC.P11 > ["string one","eon"] [1.0] nullC.P11 > ["string one","one"] [1.0] nullC.P11 > ["string thr","three"][3.0] nullC.P11 > ["string two","otw"] [2.0] nullC.P11 > ["string two","two"] [2.0] nullC.P11 > ["string two","wott"] [2.0] nullC.P11 > ["Total","Total"] [15.0] nullC.P20 > ["Total"] [15.0] nullC.P50 > However what we get is: > Hive Runtime Error while processing row > {"_col2":2.5306499719322744E-258,"_col3":""} (ultimately due to an array > index out of bounds exception) > If we reduce the above data to a SINGLE 
row, then we don't get an exception > but the total fields come out as NULL. > The ONLY way this query would work is > 1) if I added a group by (date_id) or even group by ('') as the last line in > the query... or removed the last where
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036697#comment-16036697 ] Rui Li commented on HIVE-16573: --- +1. [~anishek] would you mind also having a look? Thanks > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails
[ https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036722#comment-16036722 ] Hive QA commented on HIVE-16780: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871199/HIVE-16780.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_reverse] (batchId=83) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5529/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5529/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871199 - PreCommit-HIVE-Build > Case "multiple sources, single key" in spark_dynamic_pruning.q fails > - > > Key: HIVE-16780 > URL: https://issues.apache.org/jira/browse/HIVE-16780 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16780.patch > > > script.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > set hive.spark.dynamic.partition.pruning=true; > -- multiple sources, single key > select count(*) from srcpart join srcpart_date on (srcpart.ds = > srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr) > {code} > if disabling "hive.optimize.index.filter", case passes otherwise it always > hang out in the first job. Exception > {code} > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: method=SparkInitializeOperators start=1495899585574 end=1495899585933 > duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler> > 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = > hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan > in cache for name: map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting > to datanode 10.239.47.162:50010 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing > alias(es) srcpart_hour for file > hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating > root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry > 17/05/27 23:39:45 ERROR 
Executor task launch worker-0 SparkMapRecordHandler: > Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive > Runtime Error while processing row {"hr":"11","hour":"11"} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"hr":"11","hour":"11"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at >
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036736#comment-16036736 ] anishek commented on HIVE-16573: +1, looks good. I think this bug is about managing the in-place update progress on the hive-cli side; this still does not take care of showing the progress on the beeline side? > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036829#comment-16036829 ] anishek commented on HIVE-16573: [~lirui] yes change is fine, I was just confirming that we were on the same page: that this is only for cli. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-11297: Attachment: HIVE-11297.2.patch > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036845#comment-16036845 ] ZhangBing Lin commented on HIVE-16824: -- The failed unit tests are not related to the patch > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>
[ https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037073#comment-16037073 ] Fei Hui commented on HIVE-16768: [~sterligovak] Resolved it as a duplicate. Please reopen it if it is not fixed. > NOT operator returns NULL from result of <=> > > > Key: HIVE-16768 > URL: https://issues.apache.org/jira/browse/HIVE-16768 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: Alexander Sterligov >Assignee: Fei Hui > > {{SELECT "foo" <=> null;}} > returns {{false}} as expected. > {{SELECT NOT("foo" <=> null);}} > returns NULL, but should return {{true}}. > Workaround is > {{SELECT NOT(COALESCE("foo" <=> null));}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16736) General Improvements to BufferedRows
[ https://issues.apache.org/jira/browse/HIVE-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16736: --- Description: General improvements for {{BufferedRows.java}}. Use {{ArrayList}} instead of {{LinkedList}} to conserve memory for large data sets, prevent having to loop through the entire data set twice in {{normalizeWidths}} method, some simplifications. (was: General improvements for {{BufferedRows.java}}. Use {{ArrayList}} instead of {{LinkedList}}, prevent having to loop through the entire data set twice in {{normalizeWidths}} method, some simplifications.) > General Improvements to BufferedRows > > > Key: HIVE-16736 > URL: https://issues.apache.org/jira/browse/HIVE-16736 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16736.1.patch > > > General improvements for {{BufferedRows.java}}. Use {{ArrayList}} instead of > {{LinkedList}} to conserve memory for large data sets, prevent having to loop > through the entire data set twice in {{normalizeWidths}} method, some > simplifications. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HIVE-16768) NOT operator returns NULL from result of <=>
[ https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hui resolved HIVE-16768. Resolution: Duplicate > NOT operator returns NULL from result of <=> > > > Key: HIVE-16768 > URL: https://issues.apache.org/jira/browse/HIVE-16768 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: Alexander Sterligov >Assignee: Fei Hui > > {{SELECT "foo" <=> null;}} > returns {{false}} as expected. > {{SELECT NOT("foo" <=> null);}} > returns NULL, but should return {{true}}. > Workaround is > {{SELECT NOT(COALESCE("foo" <=> null));}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
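The expected behavior in HIVE-16768 follows from SQL three-valued logic: the NULL-safe equality {{<=>}} always yields TRUE or FALSE (never NULL), so applying NOT to its result should also be non-NULL. Below is a minimal model of that logic in plain Java; it is an illustrative sketch, not Hive code, and the method names are invented.

```java
// Model SQL BOOLEAN as java.lang.Boolean, with null standing in for SQL NULL.
public class ThreeValuedLogic {

  // NULL-safe equality (<=>): NULL <=> NULL is TRUE, NULL <=> x is FALSE,
  // otherwise ordinary equality. Never returns null.
  static Boolean nullSafeEq(Object a, Object b) {
    if (a == null && b == null) return Boolean.TRUE;
    if (a == null || b == null) return Boolean.FALSE;
    return a.equals(b);
  }

  // SQL NOT: NOT NULL is NULL; otherwise plain negation.
  static Boolean not(Boolean v) {
    return v == null ? null : !v;
  }

  public static void main(String[] args) {
    System.out.println(nullSafeEq("foo", null));      // prints false
    System.out.println(not(nullSafeEq("foo", null))); // prints true, the expected result
  }
}
```

Since nullSafeEq never produces null, not(nullSafeEq(...)) can never be null either; returning NULL from {{NOT("foo" <=> null)}} is therefore a bug in the operator's evaluation, not a consequence of three-valued logic.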
[jira] [Commented] (HIVE-16736) General Improvements to BufferedRows
[ https://issues.apache.org/jira/browse/HIVE-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037140#comment-16037140 ] BELUGA BEHR commented on HIVE-16736: Unrelated test failures > General Improvements to BufferedRows > > > Key: HIVE-16736 > URL: https://issues.apache.org/jira/browse/HIVE-16736 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16736.1.patch > > > General improvements for {{BufferedRows.java}}. Use {{ArrayList}} instead of > {{LinkedList}}, prevent having to loop through the entire data set twice in > {{normalizeWidths}} method, some simplifications. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
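The single-pass idea behind the {{normalizeWidths}} improvement can be sketched as follows. This is a hypothetical illustration, not the actual BufferedRows code: the class name, the String[] row representation, and the method signature are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: compute per-column display widths with one pass over
// rows buffered in an ArrayList, rather than a second full traversal of a
// LinkedList (the pattern the improvement removes).
public class WidthNormalizer {

  public static int[] maxWidths(List<String[]> rows, int numCols) {
    int[] widths = new int[numCols];
    for (String[] row : rows) {            // single pass over the buffer
      for (int i = 0; i < numCols; i++) {
        widths[i] = Math.max(widths[i], row[i].length());
      }
    }
    return widths;
  }

  public static void main(String[] args) {
    // ArrayList gives O(1) random access and a compact backing array,
    // unlike LinkedList's per-element node objects.
    List<String[]> rows = new ArrayList<>();
    rows.add(new String[] {"id", "name"});
    rows.add(new String[] {"1", "alexander"});
    int[] w = maxWidths(rows, 2);
    System.out.println(w[0] + " " + w[1]); // prints 2 9
  }
}
```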
[jira] [Updated] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16758: --- Attachment: HIVE-16758.1.patch > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... > int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16758: --- Status: Patch Available (was: Open) > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... > int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037185#comment-16037185 ] Xuefu Zhang commented on HIVE-6348: --- I'm wondering if it makes more sense to optimize the query rather than banning it. While it might be dumb and inefficient, I don't quite see anything wrong in semantics. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-15144: - Attachment: HIVE-15144.patch Update patch to include fixes from dvoros. > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Priority: Blocker > Fix For: 2.2.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037203#comment-16037203 ] ASF GitHub Bot commented on HIVE-15144: --- Github user omalley closed the pull request at: https://github.com/apache/hive/pull/188 > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Priority: Blocker > Fix For: 2.2.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037179#comment-16037179 ] BELUGA BEHR commented on HIVE-16758: Patch: # Set the default number of replications to 1 to support single-node test clusters # Determine the number of replications based on {{mapreduce.client.submit.file.replication}} instead of DFS replication max # Removed logic which increased the Hash Table Sink file replication to be based on the target directory's default replication instead of the configured amount. This is confusing because it overrides a user setting without explaining to the user why their configuration has been changed. Additionally, this replication is about making the data locality reasonable for Executor tasks and not about protecting data. The default replication value has a very different goal than this replication value and therefore should not be linked. > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... 
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
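The clamping logic quoted above can be sketched as a standalone snippet. This is a hypothetical, simplified version for illustration only: the class name, the plain-int interface, and the defaults are assumptions, not Hive's actual SparkHashTableSinkOperator code, which reads these values from a Hadoop Configuration.

```java
// Hedged sketch of the replication-clamping idea from the snippet above.
// ReplicationChooser and its method names are hypothetical; the real code
// obtains "dfs.replication.max" and the requested value from configuration.
public class ReplicationChooser {

    /** Clamp a requested replication factor into [1, dfs.replication.max]. */
    static int chooseReplication(int requested, int dfsReplicationMax) {
        return Math.max(1, Math.min(requested, dfsReplicationMax));
    }

    public static void main(String[] args) {
        // With a generous cluster cap, the requested value wins.
        System.out.println(chooseReplication(10, 512)); // 10
        // On a single-node test cluster with dfs.replication.max=1 we drop
        // to 1, matching the patch's goal of supporting such clusters.
        System.out.println(chooseReplication(10, 1));   // 1
    }
}
```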
[jira] [Updated] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-15144: - Attachment: HIVE-15144.patch Remove the json license file. > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Priority: Blocker > Fix For: 2.2.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037241#comment-16037241 ] Ashutosh Chauhan commented on HIVE-6348: Indeed, optimizing away the inner query sort (without limit) is much more user-friendly than throwing an exception. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16808) WebHCat statusdir parameter doesn't properly handle Unicode characters when using relative path
[ https://issues.apache.org/jira/browse/HIVE-16808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037253#comment-16037253 ] Daniel Dai commented on HIVE-16808: --- +1 > WebHCat statusdir parameter doesn't properly handle Unicode characters when > using relative path > --- > > Key: HIVE-16808 > URL: https://issues.apache.org/jira/browse/HIVE-16808 > Project: Hive > Issue Type: Bug > Components: WebHCat >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16808.01.patch, HIVE-16808.02.patch, > HIVE-16808.03.patch > > > {noformat} > curl http://.:20111/templeton/v1/hive?user.name=hive -d execute="select > count(*) from default.all100k" -d statusdir="/user/hive/düsseldorf7" > curl http://:20111/templeton/v1/hive?user.name=hive -d execute="select > count(*) from default.all100k" -d statusdir="/user/hive/䶴狝A﨩O" > {noformat} > will create statusdirs like so > {noformat} > /user/hive/düsseldorf-1 > drwxr-xr-x - hive hive 0 2017-06-01 19:01 /user/hive/düsseldorf7 > drwxr-xr-x - hive hive 0 2017-06-01 19:08 /user/hive/䶴狝A﨩O > {noformat} > but > {noformat} > curl http://.:20111/templeton/v1/hive?user.name=hive -d execute="select > count(*) from default.all100k" -d statusdir="düsseldorf7" > curl http://:20111/templeton/v1/hive?user.name=hive -d execute="select > count(*) from default.all100k" -d statusdir="䶴狝A﨩O" > {noformat} > Will create > {noformat} > drwxr-xr-x - hive hive 0 2017-06-01 00:27 > /user/hive/d%C3%BCsseldorf7 > drwxr-xr-x - hive hive 0 2017-06-01 22:33 > /user/hive/%E4%B6%B4%E7%8B%9DA%EF%A8%A9O > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
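The garbled names shown for relative paths are consistent with the name being UTF-8 percent-encoded somewhere on the relative-path code path instead of used verbatim. A minimal, self-contained demonstration (not WebHCat's actual code) that percent-encoding reproduces exactly the directory name observed in the bug:

```java
import java.net.URLEncoder;

public class StatusDirEncoding {
    public static void main(String[] args) throws Exception {
        // UTF-8 percent-encoding of the Unicode name yields exactly the
        // garbled directory seen for relative statusdir paths:
        // "düsseldorf7" -> "d%C3%BCsseldorf7" (ü is bytes C3 BC in UTF-8).
        String encoded = URLEncoder.encode("düsseldorf7", "UTF-8");
        System.out.println(encoded); // d%C3%BCsseldorf7
    }
}
```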
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Status: Patch Available (was: In Progress) > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16797: --- Status: Patch Available (was: Open) > Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16797: --- Attachment: HIVE-16797.02.patch > Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16797: --- Status: Open (was: Patch Available) > Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16758) Better Select Number of Replications
[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037254#comment-16037254 ] Hive QA commented on HIVE-16758: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871255/HIVE-16758.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=46) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5532/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5532/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5532/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871255 - PreCommit-HIVE-Build > Better Select Number of Replications > > > Key: HIVE-16758 > URL: https://issues.apache.org/jira/browse/HIVE-16758 > Project: Hive > Issue Type: Improvement >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16758.1.patch > > > {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}} > We should be smarter about how we pick a replication number. We should add a > new configuration equivalent to {{mapreduce.client.submit.file.replication}}. > This value should be around the square root of the number of nodes and not > hard-coded in the code. > {code} > public static final String DFS_REPLICATION_MAX = "dfs.replication.max"; > private int minReplication = 10; > @Override > protected void initializeOp(Configuration hconf) throws HiveException { > ... > int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication); > // minReplication value should not cross the value of dfs.replication.max > minReplication = Math.min(minReplication, dfsMaxReplication); > } > {code} > https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037261#comment-16037261 ] Vineet Garg commented on HIVE-6348: --- I agree with [~ashutoshc] and [~xuefuz]. If we do insist on letting users know about order by/sort by, IMHO showing a warning and then proceeding with the query, or optimizing to remove the order by, would be better. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037265#comment-16037265 ] Pengcheng Xiong commented on HIVE-16797: I use pull-up-constant to pull the constant out of the union, then use RexSimplify to see if the filter can be reduced to always false. However, there are still several comments regarding patch 02: (1) it seems I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here ($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a Hive union merge rule (btw, the Calcite union merge rule does not fire well in current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), respectively, which works with RexSimplify. (2) it seems RexSimplify also cannot reduce ($2>2 AND $2=3) to false. There is a test case in filter_union.q for that. (3) if we can assume that there is always a project under the union, we may have better options. > Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
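The branch-pruning idea being discussed can be illustrated with a toy snippet: once the union-tag constant has been pulled up, each branch contributes a fixed value for the tagged column, so the outer filter can be evaluated per branch and constantly-false branches dropped. This is a conceptual sketch under assumed names, not the actual Calcite/Hive rule code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of pruning union branches: each branch carries a constant tag
// (the pulled-up constant), and a branch survives only if the outer filter
// can be true for that tag. UnionBranchPruning is a hypothetical name.
public class UnionBranchPruning {

    /** Keep only the branch tags for which the filter is satisfiable. */
    static List<Integer> surviving(int[] branchTags, IntPredicate filter) {
        List<Integer> keep = new ArrayList<>();
        for (int tag : branchTags) {
            if (filter.test(tag)) {
                keep.add(tag);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Branches carry tags 1, 2, 3; the outer filter is $2 = 3.
        // Checking (tag=1 AND tag=3) and (tag=2 AND tag=3) branch-wise is
        // trivially false, so only the tag-3 branch survives.
        int[] tags = {1, 2, 3};
        System.out.println(surviving(tags, t -> t == 3)); // [3]
    }
}
```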
[jira] [Comment Edited] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037265#comment-16037265 ] Pengcheng Xiong edited comment on HIVE-16797 at 6/5/17 5:42 PM: I use pull-up-constant to pull the constant out of the union, then use RexSimplify to see if the filter can be reduced to always false. However, there are still several comments regarding patch 02: (1) it seems I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here ($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a Hive union merge rule (btw, the Calcite union merge rule does not fire well in current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), respectively, which works with RexSimplify. (2) it seems RexSimplify also cannot reduce ($2>2 AND $2=3) to false. There is a test case in filter_union.q for that. (3) if we can assume that there is always a project under the union, we may have better options. (4) for TPC-DS queries, the current patch is good enough. was (Author: pxiong): I use pull-up-constant to pull the constant out of the union, then use RexSimplify to see if the filter can be reduced to always false. However, there are still several comments regarding patch 02: (1) it seems I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here ($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a Hive union merge rule (btw, the Calcite union merge rule does not fire well in current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), respectively, which works with RexSimplify. (2) it seems RexSimplify also cannot reduce ($2>2 AND $2=3) to false. There is a test case in filter_union.q for that. (3) if we can assume that there is always a project under the union, we may have better options. 
> Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037283#comment-16037283 ] ASF GitHub Bot commented on HIVE-16813: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/192 HIVE-16813: Incremental REPL LOAD should load the events in the same sequence as it is dumped. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-16813 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/192.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #192 commit ba709ab7101cad69d5f6fc82bf031fea217e669e Author: Sankar Hariappan Date: 2017-06-05T17:53:34Z HIVE-16813: Incremental REPL LOAD should load the events in the same sequence as it is dumped. > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Attachment: HIVE-16813.01.patch Added 01.patch with below changes. - Added a custom comparator (EventDumpDirComparator) to sort the directories listed during REPL LOAD. It compares the dir length before comparing the directory name Strings. - Added unit tests to verify the new comparator and also to verify the bug with default FileStatus comparator. > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
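The length-then-lexicographic ordering described in the patch can be sketched in a few lines. Note this is a simplified stand-in: the real EventDumpDirComparator operates on FileStatus paths, while here plain directory-name strings are used to show why the default lexicographic sort puts event 100 before event 99.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified sketch of the event-dump ordering: shorter names (fewer digits)
// sort first, and names of equal length fall back to lexicographic order.
public class EventDirOrdering {

    static final Comparator<String> EVENT_DIR_ORDER =
        Comparator.comparingInt(String::length)
                  .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        String[] dirs = {"100", "99", "101", "9"};

        // Default lexicographic sort: "100" sorts before "99" -- the bug.
        Arrays.sort(dirs);
        System.out.println(Arrays.toString(dirs)); // [100, 101, 9, 99]

        // Length-aware sort restores the event sequence.
        Arrays.sort(dirs, EVENT_DIR_ORDER);
        System.out.println(Arrays.toString(dirs)); // [9, 99, 100, 101]
    }
}
```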
[jira] [Comment Edited] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037291#comment-16037291 ] Sankar Hariappan edited comment on HIVE-16813 at 6/5/17 6:00 PM: - Added 01.patch with below changes. - Added a custom comparator (EventDumpDirComparator) to sort the directories listed during REPL LOAD. It compares the dir length before comparing the directory name Strings. - Added unit tests to verify the new comparator and also to verify the bug with default FileStatus comparator. Request [~anishek] / [~sushanth] to kindly review the patch! cc [~thejas] was (Author: sankarh): Added 01.patch with below changes. - Added a custom comparator (EventDumpDirComparator) to sort the directories listed during REPL LOAD. It compares the dir length before comparing the directory name Strings. - Added unit tests to verify the new comparator and also to verify the bug with default FileStatus comparator. > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038188#comment-16038188 ] Hive QA commented on HIVE-16821: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871471/HIVE-16821.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5541/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5541/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5541/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12871471 - PreCommit-HIVE-Build > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038237#comment-16038237 ] Vineet Garg commented on HIVE-6348: --- I think it's better to remove it in the AST or during logical plan generation, because once HiveSubqueryRemoveRule is executed, which is the very first rule, the subquery will be rewritten into a join and there is no way to figure out if the original query had a subquery. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038240#comment-16038240 ] Hive QA commented on HIVE-16813: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871479/HIVE-16813.01.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10816 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=239) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5542/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5542/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5542/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12871479 - PreCommit-HIVE-Build > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. 
> -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Status: Open (was: Patch Available) > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Attachment: HIVE-16813.01.patch > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Attachment: (was: HIVE-16813.01.patch) > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038186#comment-16038186 ] Gopal V commented on HIVE-16821: Map 1 is getting vectorized due to {{hive.vectorized.use.vector.serde.deserialize=true}} & the operator-ids change when the vectorizer runs. I'll do a few more scale tests to make sure that the VRB calls are not accidentally going to the parent method. > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.
[ https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16785 started by Sankar Hariappan. --- > Ensure replication actions are idempotent if any series of events are applied > again. > > > Key: HIVE-16785 > URL: https://issues.apache.org/jira/browse/HIVE-16785 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > > Some of the events(ALTER, RENAME, TRUNCATE) are not idempotent and hence > leads to failure of REPL LOAD if applied twice or applied on an object which > is latest than current event. For example, if TRUNCATE is applied on a table > which is already dropped will fail instead of noop. > Also, need to consider the scenario where the object is missing while > applying an event. For example, if RENAME_TABLE event is applied on target > where the old table is missing should validate if table should be recreated > or should treat the event as noop. This can be done by verifying the DB level > last repl ID against the current event ID. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
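The guard described above for HIVE-16785, verifying the DB-level last repl ID against the current event ID so that replayed events become no-ops, can be sketched as follows. The class and method names are illustrative, not the actual Hive code:

```java
public class ReplEventGuard {
    // An event is applied only if it is newer than what the target database
    // has already replicated; otherwise replaying it is a no-op, which is
    // what makes re-application of a series of events idempotent.
    public static boolean shouldApply(long incomingEventId, long targetLastReplId) {
        return incomingEventId > targetLastReplId;
    }

    public static void main(String[] args) {
        // Replaying event 95 against a replica already at repl ID 100 is skipped.
        System.out.println(shouldApply(95, 100));  // false
        System.out.println(shouldApply(101, 100)); // true
    }
}
```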
[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.
[ https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16813: Status: Patch Available (was: Open) > Incremental REPL LOAD should load the events in the same sequence as it is > dumped. > -- > > Key: HIVE-16813 > URL: https://issues.apache.org/jira/browse/HIVE-16813 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Attachments: HIVE-16813.01.patch > > > Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata > and data files corresponding to the event. The event is dumped in the same > sequence in which it was generated. > Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and > sort it using compareTo algorithm of FileStatus class which doesn't check the > length before sorting it alphabetically. > Due to this, the event-100 is processed before event-99 and hence making the > replica database non-sync with source. > Need to use a customized compareTo algorithm to sort the FileStatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
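The ordering bug in HIVE-16813 comes down to lexicographic versus numeric comparison of event directory names: alphabetically, "100" sorts before "99". A minimal sketch of the customized ordering (illustrative names; the actual fix sorts FileStatus objects from listStatus, not plain strings):

```java
import java.util.Arrays;
import java.util.Comparator;

public class EventDirOrder {
    // Event dump directories are named by event ID; compare them as numbers
    // so "99" sorts before "100". Plain String.compareTo would reverse them.
    static final Comparator<String> BY_EVENT_ID =
        Comparator.comparingLong(Long::parseLong);

    public static void main(String[] args) {
        String[] dirs = {"100", "99", "101", "9"};
        Arrays.sort(dirs);                       // lexicographic: [100, 101, 9, 99]
        System.out.println(Arrays.toString(dirs));
        Arrays.sort(dirs, BY_EVENT_ID);          // numeric: [9, 99, 100, 101]
        System.out.println(Arrays.toString(dirs));
    }
}
```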
[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038154#comment-16038154 ] Prasanth Jayachandran commented on HIVE-16821: -- Why would this patch make Map 1 vectorized (in the explain diff)? Also, I don't understand why this would change operator IDs. Other than that, looks good to me. +1 > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037305#comment-16037305 ] Chao Sun commented on HIVE-11297: - [~kellyzly]: it seems the same TableScan [could be added multiple times|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java#L116] in {{SplitOpTreeForDPP}}, and so multiple MapWorks are generated for the same TableScan. Can you check if we can avoid doing that? > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
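The duplicate-TableScan problem raised above can be avoided by tracking which operator instances have already been split, so the same TableScan only produces one MapWork. A generic sketch using an identity-based seen set (hypothetical class, not the actual SplitOpTreeForDPP code):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class OnceGuard<T> {
    // Identity-based "seen" set: the same object instance is admitted exactly
    // once, so a tree walker that reaches the same TableScan via multiple
    // pruning sinks would only split it the first time.
    private final Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());

    public boolean firstVisit(T op) {
        return seen.add(op); // true only on the first add of this instance
    }
}
```

Identity rather than equality matters here: two operators that happen to compare equal must still be treated as distinct scan sites.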
[jira] [Commented] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037334#comment-16037334 ] Hive QA commented on HIVE-15144: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871260/HIVE-15144.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 272 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_9] (batchId=37) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_without_gby] (batchId=50) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_between_columns] (batchId=65) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_bround] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_bucket] (batchId=25) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] (batchId=8) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_4] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_simple] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce_2] (batchId=68) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_join] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_count] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] (batchId=72) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] (batchId=20) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=17) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_cast] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_expressions] (batchId=50) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=53) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_math_funcs] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_precision] (batchId=48) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] (batchId=34) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_udf2] (batchId=69) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_distinct_2] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_elt] (batchId=34) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_empty_where] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] (batchId=83) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_3] (batchId=62) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] (batchId=71) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] (batchId=53) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] (batchId=79) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_if_expr] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_include_no_sel] (batchId=4) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic] (batchId=4) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] (batchId=36) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_left_outer_join2] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_left_outer_join] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=74) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mr_diff_schema_alias] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_multi_insert] (batchId=81) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_constant_in_expr] (batchId=71) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_string_partition] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_null_projection] (batchId=9)
[jira] [Assigned] (HIVE-16825) NPE on parallel DP creation
[ https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-16825: --- > NPE on parallel DP creation > --- > > Key: HIVE-16825 > URL: https://issues.apache.org/jira/browse/HIVE-16825 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > > {noformat} > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16825) NPE on parallel DP creation
[ https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16825: Reporter: Dharmesh Kakadia (was: Sergey Shelukhin) > NPE on parallel DP creation > --- > > Key: HIVE-16825 > URL: https://issues.apache.org/jira/browse/HIVE-16825 > Project: Hive > Issue Type: Bug >Reporter: Dharmesh Kakadia >Assignee: Prasanth Jayachandran > > {noformat} > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037383#comment-16037383 ] Prasanth Jayachandran commented on HIVE-16323: -- metaStoreClient is causing NPE in a test. We should use getMSC() instead. > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with new embedded rawstore, but > never close them, thus we leak PersistenceManager one per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037385#comment-16037385 ] Sergey Shelukhin commented on HIVE-16323: - This exposes HIVE-16825, should be fixed before commit. > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with new embedded rawstore, but > never close them, thus we leak PersistenceManager one per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HIVE-16825) NPE on parallel DP creation
[ https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-16825. - Resolution: Invalid Part of HIVE-16323 that is not committed yet > NPE on parallel DP creation > --- > > Key: HIVE-16825 > URL: https://issues.apache.org/jira/browse/HIVE-16825 > Project: Hive > Issue Type: Bug >Reporter: Dharmesh Kakadia >Assignee: Prasanth Jayachandran > > {noformat} > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) > at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037396#comment-16037396 ] Sergey Shelukhin commented on HIVE-16323: - Also; Hive client is used via a threadlocal. Is sharing metastore client between threads safe? That exception seems to imply someone closes metastore client while the pool threads are still running. I am guessing other code doesn't hit that because it calls getMSC, but the whole thing where some threads close and null it out and other threads reopen it, seems fragile. > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with new embedded rawstore, but > never close them, thus we leak PersistenceManager one per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16452) Database UUID for metastore DB
[ https://issues.apache.org/jira/browse/HIVE-16452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036608#comment-16036608 ] Lefty Leverenz commented on HIVE-16452: --- [~vihangk1], do you need anything more from me for this? Putting the information in the wiki is more important than finding the best location -- we can always move it later. > Database UUID for metastore DB > -- > > Key: HIVE-16452 > URL: https://issues.apache.org/jira/browse/HIVE-16452 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Fix For: 3.0.0 > > > In cloud environments it is possible that a same database instance is used as > the long running metadata persistence layer and multiple HMS access this > database. These HMS instances could be running the same time or in case of > transient workloads come up on an on-demand basis. HMS is used by multiple > projects in the Hadoop eco-system as the de-facto metadata keeper for various > SQL engines on the cluster. Currently, there is no way to uniquely identify > the database instance which is backing the HMS. For example, if there are two > instances of HMS running on top of same metastore DB, there is no way to > identify that data received from both the metastore clients is coming from > the same database. Similarly, if there in case of transient workloads > multiple HMS services come up and go, a external application which is > fetching data from a HMS has no way to identify that these multiple instances > of HMS are in fact returning the same data. > We can potentially use the combination of javax.jdo.option.ConnectionURL, > javax.jdo.option.ConnectionDriverName configuration of each HMS instance but > this is approach may not be very robust. If the database is migrated to > another server for some reason the ConnectionURL can change. Having a UUID in > the metastore DB which can be queried using a Thrift API can help solve this > problem. 
This way any application talking to multiple HMS instances can > recognize if the data is coming the same backing database. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036612#comment-16036612 ] Rui Li commented on HIVE-6348: -- The latest failures are due to the sub-query order/sort by in our tests. I'd like to get some feedback before updating them. cc [~hagleitn], [~ashutoshc], [~xuefuz]. Do you think the proposal makes sense? > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-16503: -- Labels: TODOC3.0 (was: ) > LLAP: Oversubscribe memory for noconditional task size > -- > > Key: HIVE-16503 > URL: https://issues.apache.org/jira/browse/HIVE-16503 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, > HIVE-16503.3.patch, HIVE-16503.4.patch > > > When running map joins in llap, it can potentially use more memory for hash > table loading (assuming other executors in the daemons have some memory to > spare). This map join conversion decision has to be made during compilation > that can provide some more room for LLAP. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Attachment: HIVE-16573.1.patch Generate the patch file based on master branch > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036625#comment-16036625 ] Lefty Leverenz commented on HIVE-16503: --- Doc note: This adds two configs (*hive.llap.mapjoin.memory.oversubscribe.factor* and *hive.llap.memory.oversubscription.max.executors.per.query*) to HiveConf.java, so they need to be documented in the wiki. * [Configuration Properties -- LLAP | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP] Added a TODOC3.0 label. > LLAP: Oversubscribe memory for noconditional task size > -- > > Key: HIVE-16503 > URL: https://issues.apache.org/jira/browse/HIVE-16503 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, > HIVE-16503.3.patch, HIVE-16503.4.patch > > > When running map joins in llap, it can potentially use more memory for hash > table loading (assuming other executors in the daemons have some memory to > spare). This map join conversion decision has to be made during compilation > that can provide some more room for LLAP. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Attachment: (was: HIVE-16573-branch2.3.patch) > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>
[ https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036634#comment-16036634 ] Fei Hui commented on HIVE-16768: [~pxiong] HIVE-15517 is not fixed on 2.1.1, should we pick it up on branch-2.1? > NOT operator returns NULL from result of <=> > > > Key: HIVE-16768 > URL: https://issues.apache.org/jira/browse/HIVE-16768 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: Alexander Sterligov >Assignee: Fei Hui > > {{SELECT "foo" <=> null;}} > returns {{false}} as expected. > {{SELECT NOT("foo" <=> null);}} > returns NULL, but should return {{true}}. > Workaround is > {{SELECT NOT(COALESCE("foo" <=> null));}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>
[ https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036636#comment-16036636 ] Pengcheng Xiong commented on HIVE-16768: [~ferhui], sorry, my bad; it is marked as fixed in 2.2, but 2.2 has not been published yet... > NOT operator returns NULL from result of <=> > > > Key: HIVE-16768 > URL: https://issues.apache.org/jira/browse/HIVE-16768 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: Alexander Sterligov >Assignee: Fei Hui > > {{SELECT "foo" <=> null;}} > returns {{false}} as expected. > {{SELECT NOT("foo" <=> null);}} > returns NULL, but should return {{true}}. > Workaround is > {{SELECT NOT(COALESCE("foo" <=> null));}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
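The expected behavior for HIVE-16768 follows from SQL's three-valued logic: NOT(NULL) is NULL, but the null-safe equality operator {{<=>}} never yields NULL, so NOT of its result should always be TRUE or FALSE. This can be modeled with a nullable Boolean (a sketch of the SQL semantics, not Hive's actual UDF implementation):

```java
public class ThreeValuedNot {
    // SQL three-valued NOT: NOT(NULL) is NULL, otherwise plain negation.
    static Boolean not(Boolean b) {
        return b == null ? null : !b;
    }

    // Null-safe equals (<=>): treats two NULLs as equal and never returns NULL.
    static Boolean nullSafeEquals(Object a, Object b) {
        return (a == null) ? (b == null) : a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(nullSafeEquals("foo", null));      // false
        System.out.println(not(nullSafeEquals("foo", null))); // true, the expected result
    }
}
```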
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036644#comment-16036644 ] Lefty Leverenz commented on HIVE-16343: --- [~prasanth_j], so far the particular LLAP metrics haven't been documented. But should they be? And if so, where -- the LLAP design doc or the Metrics doc? * [LLAP -- Monitoring | https://cwiki.apache.org/confluence/display/Hive/LLAP#LLAP-Monitoring] * [Hive Metrics | https://cwiki.apache.org/confluence/display/Hive/Hive+Metrics] > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-16343.1.patch, HIVE-16343.2.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also for setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin reassigned HIVE-16824: > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16824: - Attachment: HIVE-16824.1.patch > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16824: - Description: PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16824: - Status: Patch Available (was: Open) > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
[ https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036668#comment-16036668 ] ZhangBing Lin commented on HIVE-16824: -- Submit a patch! > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers > -- > > Key: HIVE-16824 > URL: https://issues.apache.org/jira/browse/HIVE-16824 > Project: Hive > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16824.1.patch > > > PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails
[ https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16780: Attachment: HIVE-16780.patch > Case "multiple sources, single key" in spark_dynamic_pruning.q fails > - > > Key: HIVE-16780 > URL: https://issues.apache.org/jira/browse/HIVE-16780 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16780.patch > > > script.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > set hive.spark.dynamic.partition.pruning=true; > -- multiple sources, single key > select count(*) from srcpart join srcpart_date on (srcpart.ds = > srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr) > {code} > if "hive.optimize.index.filter" is disabled, the case passes; otherwise it always > hangs in the first job. 
Exception > {code} > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: method=SparkInitializeOperators start=1495899585574 end=1495899585933 > duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler> > 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = > hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan > in cache for name: map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting > to datanode 10.239.47.162:50010 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing > alias(es) srcpart_hour for file > hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating > root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry > 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: > Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive > Runtime Error while processing row {"hr":"11","hour":"11"} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"hr":"11","hour":"11"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) > at > 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_7_srcpart__col3_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126) > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getWritableValue(DynamicValue.java:101) > at > org.apache.hadoop.hive.ql.exec.ExprNodeDynamicValueEvaluator._evaluate(ExprNodeDynamicValueEvaluator.java:51) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80) > at >
[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails
[ https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036670#comment-16036670 ] liyunzhang_intel commented on HIVE-16780: - [~csun]: case "multiple sources, single key" pass if hive.tez.dynamic.semijoin.reduction is false. bq.Maybe we should first disable this optimization for Spark in DynamicPartitionPruningOptimization agree, update HIVE-16780.1.patch. the explain when enabling hive.tez.dynamic.semijoin.reduction {noformat} TAGE DEPENDENCIES: Stage-2 is a root stage Stage-3 depends on stages: Stage-2 Stage-1 depends on stages: Stage-3 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark DagName: root_20170605152828_4c4f4f82-d08f-41e9-9a07-4147b8529dd0:2 Vertices: Map 4 Map Operator Tree: TableScan alias: srcpart_date filterExpr: ds is not null (type: boolean) Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ds is not null (type: boolean) Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 ds (type: string) 1 ds (type: string) Select Operator expressions: ds (type: string) outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 1 Local Work: Map Reduce Local Work Map 5 Map Operator Tree: TableScan alias: srcpart_hour filterExpr: (hr is not null and (hr BETWEEN DynamicValue(RS_7_srcpart__col3_min) AND DynamicValue(RS_7_srcpart__col3_max) and in_bloom_filter(hr, DynamicValue(RS_7_srcpart__col3_bloom_filter (type: boolean) Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE 
Column stats: NONE Filter Operator predicate: (hr is not null and (hr BETWEEN DynamicValue(RS_7_srcpart__col3_min) AND DynamicValue(RS_7_srcpart__col3_max) and in_bloom_filter(hr, DynamicValue(RS_7_srcpart__col3_bloom_filter (type: boolean) Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 _col3 (type: string) 1 hr (type: string) Select Operator expressions: hr (type: string) outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: hr Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE target column name: hr target work: Map 1 Local Work: Map Reduce Local Work Stage: Stage-3 Spark DagName: root_20170605152828_4c4f4f82-d08f-41e9-9a07-4147b8529dd0:3 Vertices: Map 4 Map Operator Tree: TableScan alias: srcpart_date filterExpr: ds is not null (type: boolean) Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ds is not null (type: boolean) Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 ds (type: string) 1 ds (type: string) Select
[jira] [Updated] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails
[ https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16780: Status: Patch Available (was: Open) > Case "multiple sources, single key" in spark_dynamic_pruning.q fails > - > > Key: HIVE-16780 > URL: https://issues.apache.org/jira/browse/HIVE-16780 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16780.patch > > > script.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > set hive.spark.dynamic.partition.pruning=true; > -- multiple sources, single key > select count(*) from srcpart join srcpart_date on (srcpart.ds = > srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr) > {code} > if "hive.optimize.index.filter" is disabled, the case passes; otherwise it always > hangs in the first job. 
Exception > {code} > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: method=SparkInitializeOperators start=1495899585574 end=1495899585933 > duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler> > 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = > hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan > in cache for name: map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting > to datanode 10.239.47.162:50010 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing > alias(es) srcpart_hour for file > hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating > root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry > 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: > Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive > Runtime Error while processing row {"hr":"11","hour":"11"} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"hr":"11","hour":"11"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) > at > 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_7_srcpart__col3_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126) > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getWritableValue(DynamicValue.java:101) > at > org.apache.hadoop.hive.ql.exec.ExprNodeDynamicValueEvaluator._evaluate(ExprNodeDynamicValueEvaluator.java:51) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80) > at >
[jira] [Updated] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16323: -- Attachment: HIVE-16323.3.patch Switched metaStoreClient to getMSC; also closed syncMetaStoreClient, as Thejas commented. I think this Hive client is only shared with load-dynamic-partitions threads, and within the threads, write operations are synchronized via SynchronizedMetaStoreClient. cc [~rajesh.balamohan]. > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, > HIVE-16323.3.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with a new embedded rawstore, but > never closes them, so we leak one PersistenceManager per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
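The leak pattern in HIVE-16323 is generic: worker threads acquire a per-thread resource (here an embedded rawstore, and with it a PersistenceManager) and exit without releasing it. A hedged Python sketch of the fix shape, releasing the resource in a finally block on thread exit; `FakeStore` and `worker` are illustrative stand-ins, not Hive code:

```python
import threading

class FakeStore:
    """Stand-in for a per-thread store that must be shut down explicitly."""
    open_count = 0  # tracks stores that were opened but never shut down

    def __init__(self):
        FakeStore.open_count += 1

    def shutdown(self):
        FakeStore.open_count -= 1

def worker(results, i):
    store = FakeStore()          # per-thread resource, like the embedded rawstore
    try:
        results[i] = "loaded"    # do the partition-loading work
    finally:
        store.shutdown()         # the fix shape: always release on thread exit

results = {}
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `finally` block, `open_count` would remain 4 after the threads exit, which is exactly the one-PersistenceManager-per-thread leak described above.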
[jira] [Updated] (HIVE-16571) HiveServer2: Prefer LIFO over round-robin for Tez session reuse
[ https://issues.apache.org/jira/browse/HIVE-16571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16571: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > HiveServer2: Prefer LIFO over round-robin for Tez session reuse > --- > > Key: HIVE-16571 > URL: https://issues.apache.org/jira/browse/HIVE-16571 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Tez >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Fix For: 3.0.0 > > Attachments: HIVE-16571.2.patch, HIVE-16571.patch > > > Currently Tez session reuse is entirely round-robin, which means a single > user might have to run up to 32 queries before reusing a warm session on a > HiveServer2. > This is not the case when session reuse is disabled, with a user warming up > their session on the first query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
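The difference is easy to see with a toy pool: round-robin hands each request the next session in a cycle, so a single user touches every session before any reuse, while LIFO returns the most recently released (warmest) session first. A hedged Python sketch, not the HiveServer2 pool code:

```python
from collections import deque

sessions = ["s0", "s1", "s2", "s3"]

# Round-robin: take from the front, return to the back of the line.
rr = deque(sessions)
rr_used = []
for _ in range(4):
    s = rr.popleft()
    rr_used.append(s)
    rr.append(s)          # released session goes to the back; reuse is delayed

# LIFO: take from and return to the same end (a stack).
lifo = list(sessions)
lifo_used = []
for _ in range(4):
    s = lifo.pop()
    lifo_used.append(s)
    lifo.append(s)        # released session is reused immediately while warm
```

With a pool of 32 sessions, the round-robin loop is the "run up to 32 queries before reusing a warm session" behavior the issue describes; the LIFO loop keeps hitting the same warm session.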
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037458#comment-16037458 ] Carter Shanklin commented on HIVE-6348: --- I don't think banning is a good idea; there's just no way to know what will break in users' environments. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails
[ https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037480#comment-16037480 ] Chao Sun commented on HIVE-16780: - {quote} One interesting thing is when enabling hive.tez.dynamic.semijoin.reduction, there is an extra reduce Reducer 2 <- Map 6 (GROUP, 1). But what's purpose of Reducer 2? {quote} I think that's for the aggregation of min/max and bloom filter. See [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java#L489]. > Case "multiple sources, single key" in spark_dynamic_pruning.q fails > - > > Key: HIVE-16780 > URL: https://issues.apache.org/jira/browse/HIVE-16780 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16780.patch > > > script.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > set hive.spark.dynamic.partition.pruning=true; > -- multiple sources, single key > select count(*) from srcpart join srcpart_date on (srcpart.ds = > srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr) > {code} > if "hive.optimize.index.filter" is disabled, the case passes; otherwise it always > hangs in the first job. 
Exception > {code} > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: method=SparkInitializeOperators start=1495899585574 end=1495899585933 > duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler> > 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = > hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan > in cache for name: map.xml > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting > to datanode 10.239.47.162:50010 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing > alias(es) srcpart_hour for file > hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0 > 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating > root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry > 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: > Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive > Runtime Error while processing row {"hr":"11","hour":"11"} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"hr":"11","hour":"11"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) > at > 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_7_srcpart__col3_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126) > at >
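The "extra" reducer discussed above aggregates, over the build side's join keys, a min, a max, and a bloom filter; the probe side then evaluates these as the `BETWEEN DynamicValue(...)` and `in_bloom_filter(...)` predicates visible in the explain plan, and the `IllegalStateException` fires when a probe task cannot fetch those aggregated values. A rough Python sketch of the reduction itself (a set stands in for the bloom filter; a real bloom filter admits false positives):

```python
def build_semijoin_filter(build_keys):
    """Aggregate the min, max, and an (exact, set-based) membership filter
    over the build side's join keys, as the extra reducer would."""
    return min(build_keys), max(build_keys), set(build_keys)

def probe_survives(key, lo, hi, bloom):
    # Mirrors the probe-side predicate shape:
    #   key BETWEEN DynamicValue(min) AND DynamicValue(max)
    #   AND in_bloom_filter(key, DynamicValue(bloom_filter))
    return lo <= key <= hi and key in bloom

# Hypothetical hr values: build side has "11" and "12"; probe side has four.
lo, hi, bloom = build_semijoin_filter(["11", "12"])
kept = [k for k in ["08", "11", "12", "23"] if probe_survives(k, lo, hi, bloom)]
```

The key values here are invented for illustration; the point is only that the probe side is useless (and in this bug, fails) until the aggregated min/max/bloom values are actually delivered to it.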
[jira] [Assigned] (HIVE-16827) Merge stats task and column stats task into a single task
[ https://issues.apache.org/jira/browse/HIVE-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-16827: -- > Merge stats task and column stats task into a single task > - > > Key: HIVE-16827 > URL: https://issues.apache.org/jira/browse/HIVE-16827 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > Within the task, we can specify whether to compute basic stats only or column > stats only or both. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16826: --- Description: Proposing changes to class {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}. # Simplify the code # Code currently creates and destroys {{CsvListWriter}}, which contains a buffer, for every line printed # Use Apache Commons libraries for certain actions # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized {{StringWriter}} was: Proposing changes to class {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}. # Simplify the code # Code currently creates and destroys {{CsvListWriter}}, which contains a buffer, for every line printed # Use Apache Commons libraries for certain actions > Improvements for SeparatedValuesOutputFormat > > > Key: HIVE-16826 > URL: https://issues.apache.org/jira/browse/HIVE-16826 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 2.1.1, 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > > Proposing changes to class > {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}. > # Simplify the code > # Code currently creates and destroys {{CsvListWriter}}, which contains a > buffer, for every line printed > # Use Apache Commons libraries for certain actions > # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized > {{StringWriter}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
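The second item above (creating and destroying a buffered `CsvListWriter` per output line) is a classic allocation-churn pattern; the improvement is to allocate the writer and its buffer once and reuse them per row. A hedged Python analogue, with `io.StringIO` and the `csv` module standing in for the Java classes:

```python
import csv
import io

class SeparatedValuesWriter:
    """Reuse one buffer and one csv writer for every row, instead of
    allocating a fresh writer (and its internal buffer) per line."""

    def __init__(self, delimiter=","):
        self._buf = io.StringIO()
        self._writer = csv.writer(self._buf, delimiter=delimiter,
                                  lineterminator="")

    def format_row(self, row):
        self._buf.seek(0)
        self._buf.truncate(0)     # reset the shared buffer instead of rebuilding it
        self._writer.writerow(row)
        return self._buf.getvalue()

w = SeparatedValuesWriter()
lines = [w.format_row(r) for r in [["a", "b"], ["1", "2"]]]
```

The same reasoning drives the fourth item: if the writer is confined to one caller, an unsynchronized buffer (like `StringBuilderWriter` versus Java's synchronized `StringWriter`) avoids pointless lock overhead.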
[jira] [Updated] (HIVE-14514) OrcRecordUpdater should clone writerOptions when creating delete event writers
[ https://issues.apache.org/jira/browse/HIVE-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14514: -- Priority: Critical (was: Minor) > OrcRecordUpdater should clone writerOptions when creating delete event writers > -- > > Key: HIVE-14514 > URL: https://issues.apache.org/jira/browse/HIVE-14514 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Eugene Koifman >Priority: Critical > > When split-update is enabled for ACID, OrcRecordUpdater creates two sets of > writers: one for the insert deltas and one for the delete deltas. The > deleteEventWriter is initialized with similar writerOptions as the normal > writer, except that it has a different callback handler. Due to the lack of a > copy constructor/clone() method in writerOptions, the same writerOptions > object is mutated to specify a different callback for the delete case. > Although this is harmless for now, it may become a source of confusion > and possible errors in the future. The ideal way to fix this would be to create a > clone() method for writerOptions; however, this requires that the parent class, > OrcFile.WriterOptions, implement Cloneable or > provide a copy constructor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
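The hazard described (mutating one shared options object to install a different callback) and the proposed fix (clone before mutating) can be sketched generically; the option names below are illustrative, not the OrcFile.WriterOptions API:

```python
import copy

# Buggy shape: both writers share one mutable options object, so setting
# the delete callback silently changes the insert writer's options too.
base_options = {"schema": "acid", "callback": "insert_handler"}
shared = base_options                      # no clone: same object, two users
shared["callback"] = "delete_handler"      # insert writer's callback just changed

# Fixed shape: clone first, then mutate only the copy.
insert_opts = {"schema": "acid", "callback": "insert_handler"}
delete_opts = copy.deepcopy(insert_opts)   # plays the role of the missing clone()
delete_opts["callback"] = "delete_handler"
```

In the buggy shape the original object ends up carrying the delete callback; in the fixed shape each writer keeps its own options, which is exactly what a clone() method or copy constructor on WriterOptions would guarantee.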
[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037638#comment-16037638 ] Hive QA commented on HIVE-16323: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871284/HIVE-16323.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5536/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5536/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5536/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871284 - PreCommit-HIVE-Build > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, > HIVE-16323.3.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with new embedded rawstore, but > never close them, thus we leak PersistenceManager one per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16826: --- Status: Patch Available (was: Open) > Improvements for SeparatedValuesOutputFormat > > > Key: HIVE-16826 > URL: https://issues.apache.org/jira/browse/HIVE-16826 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 2.1.1, 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16826.1.patch > > > Proposing changes to class > {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}. > # Simplify the code > # Code currently creates and destroys {{CsvListWriter}}, which contains a > buffer, for every line printed > # Use Apache Commons libraries for certain actions > # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized > {{StringWriter}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16826: --- Attachment: HIVE-16826.1.patch > Improvements for SeparatedValuesOutputFormat > > > Key: HIVE-16826 > URL: https://issues.apache.org/jira/browse/HIVE-16826 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 2.1.1, 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16826.1.patch > > > Proposing changes to class > {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}. > # Simplify the code > # Code currently creates and destroys {{CsvListWriter}}, which contains a > buffer, for every line printed > # Use Apache Commons libraries for certain actions > # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized > {{StringWriter}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches
[ https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037454#comment-16037454 ] Hive QA commented on HIVE-16797: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871266/HIVE-16797.02.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 10822 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_aggr] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union24] (batchId=57) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union30] (batchId=74) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union34] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[unionall_unbalancedppd] (batchId=2) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union_multiinsert] (batchId=151) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query33] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query56] 
(batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query5] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query60] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query71] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query76] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query77] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query80] (batchId=232) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union30] (batchId=134) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5534/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5534/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5534/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12871266 - PreCommit-HIVE-Build > Enhance HiveFilterSetOpTransposeRule to remove union branches > - > > Key: HIVE-16797 > URL: https://issues.apache.org/jira/browse/HIVE-16797 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch > > > in query4.q, we can see that it creates a CTE with union all of 3 branches. > Then it is going to do a 3 way self-join of the CTE with predicates. The > predicates actually specifies only one of the branch in CTE to participate in > the join. Thus, in some cases, e.g., > {code} >/- filter(false) -TS0 > union all - filter(false) -TS1 >\-TS2 > {code} > we can cut the branches of TS0 and TS1. 
The union becomes only TS2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
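The pruning idea above can be sketched outside of Calcite: a union branch sitting under a constant-false filter can never contribute rows, so it can be dropped, and if only one branch survives the union collapses to that branch. Below is a minimal, hypothetical model of that idea; the `prune` method and its inputs are illustrative only, not Hive's actual HiveFilterSetOpTransposeRule API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a UNION ALL with filtered branches; names are
// illustrative, not Hive's rule API.
public class UnionPrune {
    // branches: the union inputs (e.g. table scans TS0, TS1, TS2).
    // filterIsFalse: whether the filter above each branch is a constant false.
    static List<String> prune(List<String> branches, List<Boolean> filterIsFalse) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < branches.size(); i++) {
            // A branch under filter(false) produces no rows, so drop it.
            if (!filterIsFalse.get(i)) {
                kept.add(branches.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> kept = prune(List.of("TS0", "TS1", "TS2"),
                                  List.of(true, true, false));
        // Only TS2 survives, so the union collapses to a single input.
        System.out.println(kept);  // [TS2]
    }
}
```

When the surviving-branch list has size one, the planner can replace the whole union operator with that single input, which is exactly the "union becomes only TS2" outcome described in the issue.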
[jira] [Updated] (HIVE-16804) Semijoin hint : Needs support for target table.
[ https://issues.apache.org/jira/browse/HIVE-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16804: -- Attachment: HIVE-16804.2.patch Added exceptions. If a hint fails to create an edge, it should throw. > Semijoin hint : Needs support for target table. > --- > > Key: HIVE-16804 > URL: https://issues.apache.org/jira/browse/HIVE-16804 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-16804.1.patch, HIVE-16804.2.patch > > > Currently the semijoin hint takes the source table as input. However, to provide > better control, the hint should also accept the target table name. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037986#comment-16037986 ] Xuefu Zhang commented on HIVE-6348: --- [~lirui], I think it's better to remove it from operator tree and the optimization can be put as one of the optimization rules. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
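The double-sort problem discussed in HIVE-6348 comes down to the outer ORDER BY fully determining the final row order, which makes the subquery's sort wasted work. The sketch below (illustrative Java, not Hive planner code) shows that executing both sorts and executing only the outer sort yield the same result:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Why the inner ORDER BY is redundant: the outer sort alone determines the
// final row order, so the subquery's sort is a wasted full pass over the data.
public class RedundantSort {
    // Plan as written: inner "order by c asc", then outer "order by c desc".
    static List<Integer> sortTwice(List<Integer> rows) {
        List<Integer> out = new ArrayList<>(rows);
        Collections.sort(out);                 // subquery sort (wasted work)
        out.sort(Collections.reverseOrder());  // outer sort overrides it
        return out;
    }

    // Optimized plan: outer sort only.
    static List<Integer> sortOnce(List<Integer> rows) {
        List<Integer> out = new ArrayList<>(rows);
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(3, 1, 2);
        // Both plans produce the same final order.
        System.out.println(sortTwice(rows).equals(sortOnce(rows)));  // true
    }
}
```

This is also why the exception mentioned in the issue matters: an inner sort followed by `limit` does change the result, so only an unlimited subquery sort can be safely removed.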
[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16821: --- Status: Patch Available (was: Open) > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16821: --- Attachment: HIVE-16821.2.patch > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16821: --- Attachment: HIVE-16821.2.patch > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
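The tradeoff described in HIVE-16821 — keeping a branch out of the operator's inner loop versus collecting runtime stats — has a natural resolution in vectorized execution, where rows arrive in batches: the stats check can run once per batch instead of once per row. The sketch below is an illustrative model of that idea, not Hive's actual operator code; all names are assumptions.

```java
// Illustrative sketch: per-batch rather than per-row stats collection keeps
// the inner loop branch-free. Names are hypothetical, not Hive's operator API.
public class BatchStats {
    static long rowCount = 0;  // runtime stat for Explain Analyze
    static long checksum = 0;  // stands in for the operator's real work

    static void processBatch(int[] batch, boolean collectStats) {
        long sum = 0;
        for (int row : batch) {
            sum += row;  // tight inner loop: no per-row stats branch
        }
        checksum += sum;
        if (collectStats) {
            rowCount += batch.length;  // one branch per batch, not per row
        }
    }

    public static void main(String[] args) {
        processBatch(new int[] {1, 2, 3}, true);
        processBatch(new int[] {4, 5}, true);
        System.out.println(rowCount);  // 5
    }
}
```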
[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
[ https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037800#comment-16037800 ] Rajesh Balamohan commented on HIVE-16323: - SynchronizedMetaStoreClient is used only in load-dynamic-partition threads. Should {{ObjectStore.shutdown()}} set {{pm}} to null, since it can be invoked many times? Also, getPartition() called via loadPartition() should use "getSynchronizedMSC()::getPartitionWithAuthInfo()" to be on the safer side. > HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204 > --- > > Key: HIVE-16323 > URL: https://issues.apache.org/jira/browse/HIVE-16323 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, > HIVE-16323.3.patch, PM_leak.png > > > Hive.loadDynamicPartitions creates threads with a new embedded RawStore but > never closes them; thus we leak one PersistenceManager per such thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
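The leak pattern described above — a per-thread store opened but never closed — has a standard remedy: release the store in a finally block when the thread's work is done. The sketch below is hypothetical; `RawStoreLike` stands in for Hive's embedded RawStore (whose close path releases the underlying JDO PersistenceManager), and is not the real metastore API.

```java
// Hypothetical sketch of the leak and its fix; RawStoreLike models Hive's
// per-thread embedded RawStore, not the real interface.
public class PerThreadStore {
    interface RawStoreLike extends AutoCloseable {
        void loadPartition(String part);
        @Override void close();  // releases the underlying PersistenceManager
    }

    static int openCount = 0;  // stands in for pmCache's size

    static RawStoreLike open() {
        openCount++;
        return new RawStoreLike() {
            public void loadPartition(String part) { /* metastore work */ }
            public void close() { openCount--; }
        };
    }

    // The pattern Hive.loadDynamicPartitions needs in each worker thread.
    static void loadDynamicPartition(String part) {
        RawStoreLike store = open();
        try {
            store.loadPartition(part);
        } finally {
            store.close();  // without this, one PersistenceManager leaks per thread
        }
    }

    public static void main(String[] args) {
        for (String p : new String[] {"p=1", "p=2", "p=3"}) {
            loadDynamicPartition(p);
        }
        System.out.println(openCount);  // 0: every per-thread store was closed
    }
}
```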
[jira] [Commented] (HIVE-16804) Semijoin hint : Needs support for target table.
[ https://issues.apache.org/jira/browse/HIVE-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037988#comment-16037988 ] Hive QA commented on HIVE-16804: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871375/HIVE-16804.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5538/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5538/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5538/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12871375 - PreCommit-HIVE-Build > Semijoin hint : Needs support for target table. > --- > > Key: HIVE-16804 > URL: https://issues.apache.org/jira/browse/HIVE-16804 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-16804.1.patch, HIVE-16804.2.patch > > > Currently the semijoin hint takes source table input. 
However, to provide > better control, the hint should also accept the target table name. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038031#comment-16038031 ] Hive QA commented on HIVE-16573: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871188/HIVE-16573.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] (batchId=232) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5539/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5539/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5539/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871188 - PreCommit-HIVE-Build > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16573: --- Status: Patch Available (was: In Progress) I verified this patch; it works with the Spark engine on the Hive CLI. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled
[ https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037924#comment-16037924 ] Bing Li commented on HIVE-16573: [~ruili] and [~anishek], thank you for your review. I just submitted the patch. > In-place update for HoS can't be disabled > - > > Key: HIVE-16573 > URL: https://issues.apache.org/jira/browse/HIVE-16573 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Bing Li >Priority: Minor > Attachments: HIVE-16573.1.patch > > > {{hive.spark.exec.inplace.progress}} has no effect -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037930#comment-16037930 ] Rui Li commented on HIVE-6348: -- Thanks guys for the suggestions. Yeah I agree ignoring such order/sort by is better. Do you think I can just remove it from the AST? > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038070#comment-16038070 ] Hive QA commented on HIVE-16821: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871470/HIVE-16821.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5540/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5540/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5540/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-06-06 03:51:56.839 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5540/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-06-06 03:51:56.842 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at bdacb10 HIVE-16571 : HiveServer2: Prefer LIFO over round-robin for Tez session reuse (Gopal Vijayaraghavan, reviewed by Sergey Shelukhin) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at bdacb10 HIVE-16571 : HiveServer2: Prefer LIFO over round-robin for Tez session reuse (Gopal Vijayaraghavan, reviewed by Sergey Shelukhin) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-06-06 03:52:00.381 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p0 patching file pom.xml patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java patching file ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven [ERROR] Failed to execute goal on project hive-hcatalog: Could not resolve dependencies for project org.apache.hive.hcatalog:hive-hcatalog:pom:3.0.0-SNAPSHOT: Failed to collect dependencies for [org.mockito:mockito-all:jar:1.10.19 (test), org.apache.hadoop:hadoop-common:jar:2.7.3 (test), 
org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.3 (test), org.apache.pig:pig:jar:h2:0.16.0 (test), org.slf4j:slf4j-api:jar:1.7.10 (compile)]: Failed to read artifact descriptor for org.apache.hadoop:hadoop-common:jar:2.7.3: Could not find artifact org.apache.hadoop:hadoop-project-dist:pom:2.7.3 in datanucleus (http://www.datanucleus.org/downloads/maven2) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hive-hcatalog + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12871470 - PreCommit-HIVE-Build > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 >