Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35576/ --- Review request for hive, Gunther Hagleitner and John Pullokkaran. Bugs: HIVE-11028 https://issues.apache.org/jira/browse/HIVE-11028 Repository: hive-git Description --- Change TezCompiler to only run short-cutting of expressions rather than full constant folding. Diffs - itests/src/test/resources/testconfiguration.properties b9f39fb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/35576/diff/ Testing --- qfile test added Thanks, Jason Dere
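To make the distinction in the description concrete, here is a minimal sketch of what "short-cutting" means on a hypothetical boolean expression tree. These types are illustrative only and are not Hive's ExprNodeDesc classes. Short-cutting only simplifies AND/OR nodes that have a constant operand. Full constant folding would go further, for example by evaluating function calls whose inputs are all constants or by propagating column constants.

```java
// Hypothetical boolean expression tree, for illustration only;
// these are not Hive's ExprNodeDesc classes.
interface Expr {}
record Const(boolean value) implements Expr {}
record Column(String name) implements Expr {}
record And(Expr left, Expr right) implements Expr {}
record Or(Expr left, Expr right) implements Expr {}

final class ShortCut {
    // Simplifies AND/OR nodes that have a constant operand:
    //   x AND false -> false, x AND true -> x,
    //   x OR true   -> true,  x OR false -> x.
    // Anything else (columns, non-boolean subexpressions) is left untouched.
    static Expr shortCut(Expr e) {
        if (e instanceof And a) {
            Expr l = shortCut(a.left()), r = shortCut(a.right());
            if (isConst(l, false) || isConst(r, false)) return new Const(false);
            if (isConst(l, true)) return r;
            if (isConst(r, true)) return l;
            return new And(l, r);
        }
        if (e instanceof Or o) {
            Expr l = shortCut(o.left()), r = shortCut(o.right());
            if (isConst(l, true) || isConst(r, true)) return new Const(true);
            if (isConst(l, false)) return r;
            if (isConst(r, false)) return l;
            return new Or(l, r);
        }
        return e;
    }

    private static boolean isConst(Expr e, boolean v) {
        return e instanceof Const c && c.value() == v;
    }
}
```

For example, `ds AND (key OR true)` reduces to `ds`, without anything else in the expression being evaluated.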
Re: Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35576/#review88269 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java (line 46) https://reviews.apache.org/r/35576/#comment140696 As we discussed, couldn't we move foldExpr from ConstantPropagate to ExprNodeDescUtils? Then DPP could use ExprNodeDescUtils directly without depending on ConstantProp. IMO, it's a better separation. - John Pullokkaran On June 17, 2015, 6:38 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35576/ --- (Updated June 17, 2015, 6:38 p.m.) Review request for hive, Gunther Hagleitner and John Pullokkaran. Bugs: HIVE-11028 https://issues.apache.org/jira/browse/HIVE-11028 Repository: hive-git Description --- Change TezCompiler to only run short-cutting of expressions rather than full constant folding. Diffs - itests/src/test/resources/testconfiguration.properties b9f39fb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/35576/diff/ Testing --- qfile test added Thanks, Jason Dere
Re: PL/HQL and Hive
Alan, HPL/SQL is a good name; I am OK with this change. Right now I am the only developer of PL/HQL. What status will I be given in the Hive project so that I can continue developing the tool? I will read the docs and try to create a patch. Thanks, Dmitry On Wed, Jun 17, 2015 at 9:55 PM, Alan Gates alanfga...@gmail.com wrote: Here's what we need to do: 1) You need to file a JIRA proposing to contribute the code. 2) You can then contribute the code as a patch to that JIRA. As long as you've written all the code yourself this is sufficient to hand legal rights to Apache to contribute the code. If others beyond you have legal claim to the code (i.e., they wrote it or paid you to write it) we'll need to work with Apache and those authors to get clearance to include the code. 3) Before committing the code we need to move it to an org.apache.hive packaging structure. I propose that we put it in a new package org.apache.hive.hplsql (see below for why I chose that). We can take the patch you submit and make this change before committing or you can move it yourself before you contribute the patch. 4) One of the current committers can then take the patch and get it committed. One suggestion that might be controversial: I propose we change the name from PL/HQL to HPL/SQL (hence my packaging name suggestion above). We want to move away from saying Hive has a language called HQL which is SQL-like. At this point Hive's SQL is most of the way to SQL-92 so talking about HQL just confuses people. Hence Hive PL/SQL (HPL/SQL) seems better. Or if you prefer we could do PL/HSQL. Alan. Dmitry Tolpeko dmtolp...@gmail.com June 15, 2015 at 8:03 Hi Alan, I am back from my vacation. Please let me know what actions and information are required from me regarding IP. Can we talk about Jira creation and first steps to make PL/HQL conform to Hive standards? Thanks, Dmitry
Re: Review Request 34897: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34897/ --- (Updated June 17, 2015, 5:54 p.m.) Review request for hive and Jesús Camacho Rodríguez. Changes --- Addressed all the comments. Repository: hive-git Description --- In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains the empty string and key is also an empty string, the code concludes that aliases contains key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java 9c21238 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java e7c8342 Diff: https://reviews.apache.org/r/34897/diff/ Testing --- Thanks, pengcheng xiong
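The PPD problem described above comes down to an empty string being treated as a real table alias: if the alias set contains "" and the lookup key is also "", a plain `contains` check succeeds. The sketch below shows that failure next to one possible guard. Both helper names are hypothetical and this is not the code in OpProcFactory. Note that the diffs listed above touch HiveOpConverter and HiveOpConverterPostProc, which is where the alias is produced, rather than adding a guard at the lookup.

```java
import java.util.Set;

final class AliasCheck {
    // Naive membership test: an empty key "matches" an empty alias, so a column
    // with no table alias looks as if it belongs to this input.
    static boolean naiveContains(Set<String> aliases, String key) {
        return aliases.contains(key);
    }

    // Hypothetical guarded version: a null or empty key never identifies an alias.
    static boolean guardedContains(Set<String> aliases, String key) {
        return key != null && !key.isEmpty() && aliases.contains(key);
    }
}
```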
Re: PL/HQL and Hive
Here's what we need to do: 1) You need to file a JIRA proposing to contribute the code. 2) You can then contribute the code as a patch to that JIRA. As long as you've written all the code yourself this is sufficient to hand legal rights to Apache to contribute the code. If others beyond you have legal claim to the code (i.e., they wrote it or paid you to write it) we'll need to work with Apache and those authors to get clearance to include the code. 3) Before committing the code we need to move it to an org.apache.hive packaging structure. I propose that we put it in a new package org.apache.hive.hplsql (see below for why I chose that). We can take the patch you submit and make this change before committing or you can move it yourself before you contribute the patch. 4) One of the current committers can then take the patch and get it committed. One suggestion that might be controversial: I propose we change the name from PL/HQL to HPL/SQL (hence my packaging name suggestion above). We want to move away from saying Hive has a language called HQL which is SQL-like. At this point Hive's SQL is most of the way to SQL-92 so talking about HQL just confuses people. Hence Hive PL/SQL (HPL/SQL) seems better. Or if you prefer we could do PL/HSQL. Alan. Dmitry Tolpeko mailto:dmtolp...@gmail.com June 15, 2015 at 8:03 Hi Alan, I am back from my vacation. Please let me know what actions and information are required from me regarding IP. Can we talk about Jira creation and first steps to make PL/HQL conform to Hive standards? Thanks, Dmitry
Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/ --- (Updated June 17, 2015, 10:47 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Diffs (updated) - data/files/emp2.txt 650aff7 ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2 ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f Diff: https://reviews.apache.org/r/35532/diff/ Testing --- Thanks, Aihua Xu
Re: PL/HQL and Hive
In Apache projects there are contributors and committers. Contributors are anyone who helps with the project via code, docs, tests, bug reports, etc. Committers can commit code, though it must still be reviewed by other committers. For the process of becoming a committer in Hive, see https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter Obviously, contributing a large bit of functionality starts you on that road nicely. If you need help getting the patch together, let me know. Alan. Dmitry Tolpeko mailto:dmtolp...@gmail.com June 17, 2015 at 13:02 Alan, HPL/SQL is a good name; I am OK with this change. Right now I am the only developer of PL/HQL. What status will I be given in the Hive project so that I can continue developing the tool? I will read the docs and try to create a patch. Thanks, Dmitry Alan Gates mailto:alanfga...@gmail.com June 17, 2015 at 11:55 Here's what we need to do: 1) You need to file a JIRA proposing to contribute the code. 2) You can then contribute the code as a patch to that JIRA. As long as you've written all the code yourself this is sufficient to hand legal rights to Apache to contribute the code. If others beyond you have legal claim to the code (i.e., they wrote it or paid you to write it) we'll need to work with Apache and those authors to get clearance to include the code. 3) Before committing the code we need to move it to an org.apache.hive packaging structure. I propose that we put it in a new package org.apache.hive.hplsql (see below for why I chose that). We can take the patch you submit and make this change before committing or you can move it yourself before you contribute the patch. 4) One of the current committers can then take the patch and get it committed. One suggestion that might be controversial: I propose we change the name from PL/HQL to HPL/SQL (hence my packaging name suggestion above). We want to move away from saying Hive has a language called HQL which is SQL-like. 
At this point Hive's SQL is most of the way to SQL-92 so talking about HQL just confuses people. Hence Hive PL/SQL (HPL/SQL) seems better. Or if you prefer we could do PL/HSQL. Alan. Dmitry Tolpeko mailto:dmtolp...@gmail.com June 15, 2015 at 8:03 Hi Alan, I am back from my vacation. Please let me know what actions and information are required from me regarding IP. Can we talk about Jira creation and first steps to make PL/HQL conform to Hive standards? Thanks, Dmitry Dmitry Tolpeko mailto:dmtolp...@gmail.com June 2, 2015 at 12:35 Alan, I am new to the Hive project structure and development process, so I would highly appreciate your guidance (e.g., whether you can initiate the Jira or tell me how to do that). Also, I can grant the software to Apache if required, although I am not sure which IP clearance is required. For me, uploading the code is sufficient. Thank you, Dmitry Alan Gates mailto:alanfga...@gmail.com June 1, 2015 at 15:50 Dmitry, I'm thrilled to hear that you're open to integrating PL/HQL into Hive. As for how we'd do it, this is obviously something we'll have to discuss in the community on the dev list. But my initial thought is that we start by importing it as is, mostly focusing on package name changes, etc. So it starts as a standalone. Then over time we work on integrating it directly into Hive. This will have a number of benefits for users as they'll be able to create and store procedures, invoke them from JDBC connections, grant and revoke access to procedures, etc. So I think the next step is to open a JIRA on it and then we can start building a patch to contribute the code. Given that PL/HQL has already been released as a separate entity, I'm not sure if we need additional IP clearance (i.e., you have to sign a grant) or if your uploading the code to a JIRA is sufficient. Do any of the Hive PMC know? No worries if you can't respond until June 12; there's no rush. Enjoy your vacation. Alan.
[jira] [Created] (HIVE-11039) Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming
Sushanth Sowmyan created HIVE-11039: --- Summary: Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming Key: HIVE-11039 URL: https://issues.apache.org/jira/browse/HIVE-11039 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, with a column called INTEGER_IDX that holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, so the list retains its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., so any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table.
For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order; this is how we discovered this bug. This problem, however, can be far worse if there are no type problems: if a, b, c were all strings, for example, the insert query could succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
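A minimal simulation of the ordering problem, assuming a mapping table in which JDO writes populated IDX while INTEGER_IDX was left at 0 for every row. The record and method names are hypothetical. In a real database the order of rows with equal sort keys is unspecified; here a stable sort simply keeps the order in which the rows were "stored", which stands in for the native hashing order. The last method sketches the short-term mitigation from the description (skip directSql when the identifier factory is datanucleus2) as a hypothetical check.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ColumnOrderDemo {
    // One row of a hypothetical column-mapping table that has both the
    // datanucleus1-style INTEGER_IDX column and the datanucleus2-style IDX column.
    record ColumnRow(String name, String type, int integerIdx, int idx) {}

    // Mimics a hand-written "ORDER BY INTEGER_IDX": every key is 0, so the
    // (stable) sort leaves the rows in whatever order they were stored.
    static List<String> orderByIntegerIdx(List<ColumnRow> rows) {
        return sortedNames(rows, Comparator.comparingInt(ColumnRow::integerIdx));
    }

    // Mimics what JDO does under datanucleus2 naming: "ORDER BY IDX".
    static List<String> orderByIdx(List<ColumnRow> rows) {
        return sortedNames(rows, Comparator.comparingInt(ColumnRow::idx));
    }

    private static List<String> sortedNames(List<ColumnRow> rows, Comparator<ColumnRow> order) {
        List<ColumnRow> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy.stream().map(ColumnRow::name).toList();
    }

    // Hypothetical guard for the short-term fix: only use directSql when the
    // identifier factory is not datanucleus2.
    static boolean useDirectSql(String identifierFactory) {
        return !"datanucleus2".equals(identifierFactory);
    }
}
```

With rows stored as c, a, b, the INTEGER_IDX ordering yields (c, a, b) while the IDX ordering yields the declared schema (a, b, c), which is the mismatch described above.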
Re: Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException
On June 17, 2015, 7:39 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java, line 46 https://reviews.apache.org/r/35576/diff/1/?file=986551#file986551line46 As we discussed, couldn't we move foldExpr from ConstantPropagate to ExprNodeDescUtils? Then DPP could use ExprNodeDescUtils directly without depending on ConstantProp. IMO, it's a better separation. It looks like ConstantPropagate does additional things that we actually want done, such as deleting filter operators whose filter expression consists of a single constant TRUE expression. We end up in those kinds of situations because of dynamic partition pruning. I believe it would be easier, and a more generic, reusable solution, to use ConstantPropagate here than to re-implement similar logic in DynamicPartitionPrunerProc. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35576/#review88269 --- On June 17, 2015, 6:38 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35576/ --- (Updated June 17, 2015, 6:38 p.m.) Review request for hive, Gunther Hagleitner and John Pullokkaran. Bugs: HIVE-11028 https://issues.apache.org/jira/browse/HIVE-11028 Repository: hive-git Description --- Change TezCompiler to only run short-cutting of expressions rather than full constant folding. 
Diffs - itests/src/test/resources/testconfiguration.properties b9f39fb ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/35576/diff/ Testing --- qfile test added Thanks, Jason Dere
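Jason's argument rests on one extra step ConstantPropagate performs: once a filter predicate has been reduced to the constant TRUE (as can happen after dynamic partition pruning), the filter operator passes every row and can be dropped. A minimal sketch of that step over a hypothetical linear operator pipeline follows. These classes are not Hive's Operator or FilterOperator, and predicates are kept as plain strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterRemoval {
    // Hypothetical operators in a linear pipeline; a filter's predicate is the
    // text left over after constant folding, e.g. "true" or "key > 10".
    interface Op {}
    record Scan(String table) implements Op {}
    record Filter(String predicate) implements Op {}
    record Sink() implements Op {}

    // Drops any filter whose predicate has been folded to the constant TRUE,
    // since it passes every row; all other operators keep their order.
    static List<Op> removeTrueFilters(List<Op> pipeline) {
        List<Op> result = new ArrayList<>();
        for (Op op : pipeline) {
            if (op instanceof Filter f && f.predicate().equalsIgnoreCase("true")) {
                continue;
            }
            result.add(op);
        }
        return result;
    }
}
```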
Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
On June 17, 2015, 4:08 p.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, line 1240 https://reviews.apache.org/r/35532/diff/1/?file=985909#file985909line1240 This doesn't seem right. isGreater() (as opposed to isEqual()) is not symmetric w.r.t. the order of its two arguments. E.g., consider v1 = 23 and v2 = NULL: this call will return v1 > v2. However, if v1 = NULL and v2 = 23, it will still return v1 > v2. Either NULLs should always be greater or always be smaller; otherwise this has the potential to generate an incorrect result set. Aihua Xu wrote: The name isGreater() is probably a little misleading. It actually means whether the distance from the second value to the first value is greater than the given amt. When v1 = 23 and v2 = null, the distance is considered greater than 10 (actually, any value) since 23 and null are not comparable; the same holds for v1 = null and v2 = 23. v1 = null and v2 = null are considered less than 10 since they are both null and the distance is 0. I can change the name to reflect what it means, like isDistanceGreater(), so that it won't be confused with isEqual() (I initially confused them as well), or add some comments. Suggestions? I see. Yeah, change the name and add comments too. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/#review88235 --- On June 16, 2015, 8:13 p.m., Aihua Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/ --- (Updated June 16, 2015, 8:13 p.m.) Review request for hive. 
Repository: hive-git Description --- HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Diffs - data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88 ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2dc864c38a2969909efa5b21508e27d7f8 ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf45e3c1e0b928800dae0470e8acfd77734 ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc Diff: https://reviews.apache.org/r/35532/diff/ Testing --- Thanks, Aihua Xu
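The NULL semantics Aihua describes, under the proposed name isDistanceGreater(), can be sketched as follows. This is a hedged illustration, not the patch: it uses java.math.BigDecimal in place of Hive's decimal type, and measuring the distance as v1 - v2 is an assumption made only for this sketch.

```java
import java.math.BigDecimal;

final class DecimalBoundary {
    // Returns true if the distance between v1 and v2 (assumed here to be v1 - v2)
    // is greater than amt.
    // Two NULLs have distance 0, so the result is false.
    // Exactly one NULL is not comparable with a number and is treated as
    // "farther than any amt", regardless of which argument is NULL.
    static boolean isDistanceGreater(BigDecimal v1, BigDecimal v2, int amt) {
        if (v1 == null && v2 == null) {
            return false;
        }
        if (v1 == null || v2 == null) {
            return true;
        }
        return v1.subtract(v2).compareTo(BigDecimal.valueOf(amt)) > 0;
    }
}
```

Both one-NULL cases return true whichever argument is NULL, which is the behavior Aihua explains; renaming the method and documenting this is what the thread agrees on.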
Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
On June 17, 2015, 4:08 p.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, line 1240 https://reviews.apache.org/r/35532/diff/1/?file=985909#file985909line1240 This doesn't seem right. isGreater() (as opposed to isEqual()) is not symmetric w.r.t. the order of its two arguments. E.g., consider v1 = 23 and v2 = NULL: this call will return v1 > v2. However, if v1 = NULL and v2 = 23, it will still return v1 > v2. Either NULLs should always be greater or always be smaller; otherwise this has the potential to generate an incorrect result set. The name isGreater() is probably a little misleading. It actually means whether the distance from the second value to the first value is greater than the given amt. When v1 = 23 and v2 = null, the distance is considered greater than 10 (actually, any value) since 23 and null are not comparable; the same holds for v1 = null and v2 = 23. v1 = null and v2 = null are considered less than 10 since they are both null and the distance is 0. I can change the name to reflect what it means, like isDistanceGreater(), so that it won't be confused with isEqual() (I initially confused them as well), or add some comments. Suggestions? - Aihua --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/#review88235 --- On June 16, 2015, 8:13 p.m., Aihua Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/ --- (Updated June 16, 2015, 8:13 p.m.) Review request for hive. 
Repository: hive-git Description --- HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Diffs - data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88 ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2dc864c38a2969909efa5b21508e27d7f8 ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf45e3c1e0b928800dae0470e8acfd77734 ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc Diff: https://reviews.apache.org/r/35532/diff/ Testing --- Thanks, Aihua Xu
[jira] [Created] (HIVE-11038) MiniTezCli tests are hanging
Wei Zheng created HIVE-11038: Summary: MiniTezCli tests are hanging Key: HIVE-11038 URL: https://issues.apache.org/jira/browse/HIVE-11038 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 2.0.0 Reporter: Wei Zheng Priority: Blocker Whenever I run a MiniTezCli test, it just hangs. Here's the Maven command to run a test:
{code}
$ mvn test -Phadoop-2 -Dtest=TestMiniTezCliDriver -Dqfile=dynamic_partition_pruning.q
{code}
Here's the tail of org.apache.hadoop.hive.cli.TestMiniTezCliDriver-output.txt:
{code}
Status: Running (Executing on YARN cluster with App id application_1434574617753_0001)
Map 1: -/-  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
POSTHOOK: query: analyze table lineitem compute statistics for columns
POSTHOOK: type: QUERY
POSTHOOK: Input: default@lineitem
POSTHOOK: Output: file:/Users/wzheng/bf/hive/itests/qtest/target/tmp/localscratchdir/c684ea6a-11b1-4253-a529-c3778695b72a/hive_2015-06-17_13-57-19_047_1275844087077606719-1/-mr-1
OK
Time taken: 0.387 seconds
Begin query: dynamic_partition_pruning.q
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/Users/wzheng/bf/hive/conf/ivysettings.xml will be used
{code}
And here's the jstack output (partial):
{code}
main #1 prio=5 os_prio=31 tid=0x7fc75e805800 nid=0x1303 waiting on condition [0x000101d84000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:378)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:168)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1657)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1416)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1197)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007)
	at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:146)
	at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning(TestMiniTezCliDriver.java:130)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
	at junit.framework.TestSuite.runTest(TestSuite.java:255)
	at junit.framework.TestSuite.run(TestSuite.java:250)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

VM Thread os_prio=31 tid=0x7fc75e830800 nid=0x3103 runnable
GC task thread#0 (ParallelGC) os_prio=31 tid=0x7fc75e811800 nid=0x2103 runnable
GC task thread#1 (ParallelGC) os_prio=31 tid=0x7fc75f00 nid=0x2303 runnable
GC task thread#2 (ParallelGC) os_prio=31 tid=0x7fc75f001000 nid=0x2503 runnable
GC task thread#3 (ParallelGC) os_prio=31 tid=0x7fc75f80 nid=0x2703 runnable
GC task thread#4 (ParallelGC) os_prio=31 tid=0x7fc75f801000 nid=0x2903
{code}
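In the dump, the main thread is TIMED_WAITING in Thread.sleep, called from TezJobMonitor.monitorExecution. That is what a sleep-and-poll status loop looks like while the job it watches never reaches a terminal state. The following is a minimal sketch of such a loop, not TezJobMonitor's code. The status supplier, the state names, and the maxPolls limit are all hypothetical; the limit only guarantees that the sketch itself always terminates.

```java
import java.util.Set;
import java.util.function.Supplier;

final class PollingMonitor {
    // Hypothetical terminal states for the watched job.
    private static final Set<String> TERMINAL = Set.of("SUCCEEDED", "FAILED", "KILLED");

    // Polls the status every intervalMs until a terminal state is seen or maxPolls
    // status reads have been made. While it waits, a thread dump shows this thread
    // as TIMED_WAITING (sleeping) in Thread.sleep.
    static String monitor(Supplier<String> status, long intervalMs, int maxPolls) {
        String state = status.get();
        for (int polls = 1; polls < maxPolls && !TERMINAL.contains(state); polls++) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return state;
            }
            state = status.get();
        }
        return state;
    }
}
```

If the job never leaves a non-terminal state, a loop like this without a limit keeps sleeping forever, matching the hang reported here.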
Review Request 35582: HIVE-11029:hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be impersonated
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35582/ --- Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-11029 https://issues.apache.org/jira/browse/HIVE-11029 Repository: hive-git Description --- Currently the Hive session UGI uses the createRemoteUser API instead of the createProxyUser API in unsecured mode. As a result, the impersonated user is not passed to the JobTracker/ResourceManager, so hadoop.proxyuser.mapr.groups does not restrict the groups that can be impersonated: any impersonated user can launch a MapReduce job. The fix replaces the createRemoteUser call with createProxyUser. Diffs - service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 56af643 Diff: https://reviews.apache.org/r/35582/diff/ Testing --- Thanks, Na Yang
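To show what the hadoop.proxyuser.<user>.groups setting is meant to enforce, here is a simplified, stdlib-only model of the check: a real (service) user such as mapr may impersonate someone only if that person belongs to one of the configured groups, or if the setting is the wildcard "*". This is a hypothetical sketch, not Hadoop's ProxyUsers implementation. As the description explains, the cluster can only apply such a check when the session UGI is created with createProxyUser, so that both the real user and the impersonated user are known.

```java
import java.util.Map;
import java.util.Set;

final class ProxyUserCheck {
    // Hypothetical model of hadoop.proxyuser.<realUser>.groups:
    // allowedGroupsByRealUser maps a real user to the groups whose members it may
    // impersonate; groupsByUser maps each user to the groups it belongs to.
    static boolean mayImpersonate(Map<String, Set<String>> allowedGroupsByRealUser,
                                  Map<String, Set<String>> groupsByUser,
                                  String realUser, String impersonatedUser) {
        Set<String> allowed = allowedGroupsByRealUser.getOrDefault(realUser, Set.of());
        if (allowed.contains("*")) {
            return true; // wildcard: members of any group may be impersonated
        }
        for (String group : groupsByUser.getOrDefault(impersonatedUser, Set.of())) {
            if (allowed.contains(group)) {
                return true;
            }
        }
        return false;
    }
}
```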
questions about hive CBO
Hi all, I'm reading the source code of Hive CBO (CalcitePlanner), but I find it hard to follow. Listed below are some of my questions: 1. What's the relationship between HepPlanner and HiveVolcanoPlanner? 2. I don't understand these concepts: clusters, traitDef, and collectGarbage(). Thanks for any help. Best regards, -zhenhua
Re: questions about hive CBO
HepPlanner is a greedy planner; VolcanoPlanner is a more exhaustive planner. RelOptCluster captures the environment for planning; it holds on to the type factory, metadata provider, etc. Having said that, these are just the plumbing required to explore plan alternatives. CalcitePlanner, the metadata providers, and RelOptHiveTable are some of the key pieces you need to understand. On 6/17/15, 6:03 PM, wangzhenhua (G) wangzhen...@huawei.com wrote: Hi all, I'm reading the source code of Hive CBO (CalcitePlanner), but I find it hard to follow. Listed below are some of my questions: 1. What's the relationship between HepPlanner and HiveVolcanoPlanner? 2. I don't understand these concepts: clusters, traitDef, and collectGarbage(). Thanks for any help. Best regards, -zhenhua
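The greedy-versus-exhaustive distinction can be illustrated with a toy model in which plans are plain strings and rewrite rules map one plan to an equivalent one. This is a sketch under heavy simplification and is not Calcite's API. It leaves out everything that makes the real planners practical, such as HepProgram match order, Volcano's memo of equivalence sets, and physical traits. The greedy variant keeps the first rewrite that fires without costing it, until it reaches a fixpoint. The exhaustive variant registers every reachable plan and then picks the cheapest one.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

final class PlannerSketch {
    // A toy rewrite rule: turns plan `from` into the equivalent plan `to`, else no-op.
    static UnaryOperator<String> rule(String from, String to) {
        return plan -> plan.equals(from) ? to : plan;
    }

    // Greedy (HepPlanner-like): fire the first rule that changes the plan, keep the
    // result without consulting any cost, and repeat until no rule applies.
    static String greedy(String plan, List<UnaryOperator<String>> rules, int maxSteps) {
        for (int step = 0; step < maxSteps; step++) {
            String before = plan;
            for (UnaryOperator<String> r : rules) {
                String next = r.apply(plan);
                if (!next.equals(plan)) {
                    plan = next;
                    break;
                }
            }
            if (plan.equals(before)) {
                break; // fixpoint: no rule changed the plan
            }
        }
        return plan;
    }

    // Exhaustive (Volcano-like): register every plan reachable through the rules,
    // then return the cheapest one according to the cost model.
    static String exhaustive(String plan, List<UnaryOperator<String>> rules,
                             ToIntFunction<String> cost) {
        Set<String> seen = new LinkedHashSet<>(List.of(plan));
        Deque<String> work = new ArrayDeque<>(seen);
        while (!work.isEmpty()) {
            String current = work.poll();
            for (UnaryOperator<String> r : rules) {
                String next = r.apply(current);
                if (seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return Collections.min(seen, Comparator.comparingInt(cost));
    }
}
```

With rules A→B, A→C, B→D and costs A=10, B=8, C=2, D=5, the greedy variant follows A→B→D and stops at D, while the exhaustive variant finds C. That is the trade-off between a fast rule-driven pass and a cost-based search.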
[jira] [Created] (HIVE-11040) Change Derby dependency version to 10.10.2.0
Jason Dere created HIVE-11040: - Summary: Change Derby dependency version to 10.10.2.0 Key: HIVE-11040 URL: https://issues.apache.org/jira/browse/HIVE-11040 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere We don't see this on the Apache pre-commit tests because they use PTest, but running the entire TestCliDriver suite results in failures in some of the partition-related qtests (partition_coltype_literals, partition_date, partition_date2). I've only really seen this on Linux (I was using CentOS). HIVE-8879 changed the Derby dependency version from 10.10.1.1 to 10.11.1.1. Testing with 10.10.1.1 or 10.10.2.0 seems to allow the partition-related tests to pass. I'd like to change the dependency version to 10.10.2.0, since that version should also contain the fix for HIVE-8879.
Re: Review Request 34897: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34897/#review88242 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java (line 976) https://reviews.apache.org/r/34897/#comment140666 Is your intention here to change the table alias in the schema of the child too? ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java (line 55) https://reviews.apache.org/r/34897/#comment140664 It seems that joinOpToAlias is never used. Could it be removed? ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java (line 110) https://reviews.apache.org/r/34897/#comment140665 Blank space - Jesús Camacho Rodríguez On June 16, 2015, 9:55 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34897/ --- (Updated June 16, 2015, 9:55 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains the empty string and key is also an empty string, the code concludes that aliases contains key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java 9c21238 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java e7c8342 Diff: https://reviews.apache.org/r/34897/diff/ Testing --- Thanks, pengcheng xiong
Hive-0.14 - Build # 987 - Still Failing
Changes for Build #980 Changes for Build #981 Changes for Build #982 Changes for Build #983 Changes for Build #984 Changes for Build #985 Changes for Build #986 Changes for Build #987 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #987) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/987/ to view the results.
[jira] [Created] (HIVE-11036) Race condition in DataNucleus makes Metastore to hang
Ashutosh Chauhan created HIVE-11036: --- Summary: Race condition in DataNucleus makes Metastore to hang Key: HIVE-11036 URL: https://issues.apache.org/jira/browse/HIVE-11036 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0, 1.0.0, 0.14.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Under a moderate to high concurrent query workload, the Metastore gets deadlocked in DataNucleus.
[jira] [Created] (HIVE-11037) HiveOnTez: make explain user level = true as default
Pengcheng Xiong created HIVE-11037: -- Summary: HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong In HIVE-9780, we introduced a new, user-level explain output for Hive on Tez. We would like to enable it by default.
Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- (Updated June 17, 2015, 8:59 a.m.) Review request for hive and Xuefu Zhang. Changes --- Improve the compare algorithm and update qfile output. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries (like TPCDS Q39) may share the same subquery, which is translated into separate but equivalent Works in SparkWork. Combining these equivalent Works into a single one would let the subsequent dynamic RDD caching optimization take effect. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 2cb126d ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
ql/src/test/results/clientpositive/spark/union28.q.out 98582df ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 ql/src/test/results/clientpositive/spark/union30.q.out 3409623 ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 ql/src/test/results/clientpositive/spark/union5.q.out afee988 ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
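The idea of combining equivalent Works can be sketched roughly in Java, assuming each Work can be reduced to a comparable signature of its operator tree. `Work`, `combine`, and the operator strings are hypothetical; the actual CombineEquivalentWorkResolver and OperatorComparatorFactory compare operators and their descriptors in more detail than a list of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class CombineEquivalent {
    private CombineEquivalent() {}

    // Hypothetical stand-in for a Work: an id plus a flattened operator tree.
    record Work(String id, List<String> operatorTree) {}

    // Maps each work id to the id of the representative it is merged into.
    static Map<String, String> combine(List<Work> works) {
        Map<List<String>, String> representatives = new LinkedHashMap<>();
        Map<String, String> mergedInto = new LinkedHashMap<>();
        for (Work w : works) {
            // The first Work with a given operator tree becomes the representative;
            // putIfAbsent returns the existing representative, or null if none.
            String rep = representatives.putIfAbsent(w.operatorTree(), w.id());
            mergedInto.put(w.id(), rep == null ? w.id() : rep);
        }
        return mergedInto;
    }

    public static void main(String[] args) {
        List<Work> works = List.of(
            new Work("Map 1", List.of("TS[t1]", "FIL", "RS")),
            new Work("Map 2", List.of("TS[t1]", "FIL", "RS")),
            new Work("Map 3", List.of("TS[t2]", "RS")));
        System.out.println(combine(works));
        // {Map 1=Map 1, Map 2=Map 1, Map 3=Map 3}
    }
}
```

Once "Map 2" is folded into "Map 1", the shared output is computed once, which is what makes it a candidate for RDD caching.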
Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/#review88235 --- ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java (line 1238) https://reviews.apache.org/r/35532/#comment140647 This doesn't seem right. isGreater() (as opposed to isEqual()) is not symmetric with respect to the order of its two arguments. For example, with v1 = 23 and v2 = NULL, this call will return v1 > v2. However, with v1 = NULL and v2 = 23, it will still return v1 > v2. Either NULLs should always be greater or always be smaller; otherwise this can produce an incorrect result set. - Ashutosh Chauhan On June 16, 2015, 8:13 p.m., Aihua Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35532/ --- (Updated June 16, 2015, 8:13 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Diffs - data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88 ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2dc864c38a2969909efa5b21508e27d7f8 ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf45e3c1e0b928800dae0470e8acfd77734 ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc Diff: https://reviews.apache.org/r/35532/diff/ Testing --- Thanks, Aihua Xu
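The asymmetry the review points out can be shown with a small Java sketch on BigDecimal. It is not the WindowingTableFunction code: `brokenIsGreater` is a hypothetical reconstruction of the behavior described in the comment, and `isGreater` shows one consistent choice (NULLs always smallest) using the JDK's `Comparator.nullsFirst`.

```java
import java.math.BigDecimal;
import java.util.Comparator;

public final class NullOrdering {
    private NullOrdering() {}

    // Hypothetical asymmetric version: a NULL on either side makes the call
    // report "v1 > v2", so swapping the arguments gives the same answer.
    static boolean brokenIsGreater(BigDecimal v1, BigDecimal v2) {
        if (v1 == null || v2 == null) {
            return true;
        }
        return v1.compareTo(v2) > 0;
    }

    // Consistent version: NULLs always sort first (smaller than any value).
    private static final Comparator<BigDecimal> NULLS_FIRST =
        Comparator.nullsFirst(Comparator.naturalOrder());

    static boolean isGreater(BigDecimal v1, BigDecimal v2) {
        return NULLS_FIRST.compare(v1, v2) > 0;
    }

    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("23");
        System.out.println(brokenIsGreater(v, null) + " " + brokenIsGreater(null, v)); // true true
        System.out.println(isGreater(v, null) + " " + isGreater(null, v));             // true false
    }
}
```

Whether NULLs should sort first or last is a semantic decision for the windowing code; the point is only that the choice must be the same regardless of argument order.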
[jira] [Created] (HIVE-11035) PPD: Orc Split elimination fails because filterColumns=[-1]
Gopal V created HIVE-11035: -- Summary: PPD: Orc Split elimination fails because filterColumns=[-1] Key: HIVE-11035 URL: https://issues.apache.org/jira/browse/HIVE-11035 Project: Hive Issue Type: Bug Reporter: Gopal V {code} create temporary table xx (x int) stored as orc ; insert into xx values (20),(200); set hive.fetch.task.conversion=none; select * from xx where x is null; {code} This should generate zero tasks after optional split elimination in the app master, instead of generating one task that is certain to hit the row-index filters and remove all rows anyway. Right now, this runs one task for the stripe containing (min=20, max=200, has_null=false), which is broken. Instead of eliminating the stripe, the evaluation returns YES_NO_NULL from the following default case: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L976
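Why the stripe in this report should be eliminated can be sketched in Java, assuming stripe statistics expose a has-null flag and a count of non-null values. `Truth`, `ColumnStats`, `evaluateIsNull`, and `canSkip` are hypothetical; the enum is only loosely modeled on ORC's search-argument truth values and is not the OrcInputFormat code.

```java
public final class IsNullStripeCheck {
    private IsNullStripeCheck() {}

    // Hypothetical subset of truth values. YES_NO_NULL is the "don't know"
    // answer the report says the default case returns.
    enum Truth { YES, NO, YES_NO, YES_NO_NULL }

    // Hypothetical per-stripe statistics; numValues counts non-null values.
    record ColumnStats(long min, long max, boolean hasNull, long numValues) {}

    // Evaluates "col IS NULL" against stripe statistics. Only hasNull and
    // numValues matter for this predicate; min and max are irrelevant.
    static Truth evaluateIsNull(ColumnStats stats) {
        if (!stats.hasNull()) {
            return Truth.NO; // no NULLs anywhere in the stripe
        }
        return stats.numValues() == 0 ? Truth.YES : Truth.YES_NO;
    }

    // Split elimination may drop a stripe only when the predicate is definitely false.
    static boolean canSkip(Truth t) {
        return t == Truth.NO;
    }

    public static void main(String[] args) {
        // Stripe from the report: values 20 and 200, no NULLs.
        ColumnStats stripe = new ColumnStats(20, 200, false, 2);
        Truth t = evaluateIsNull(stripe);
        System.out.println(t + " skip=" + canSkip(t));  // NO skip=true
        System.out.println(canSkip(Truth.YES_NO_NULL)); // false
    }
}
```

Returning the conservative YES_NO_NULL keeps the stripe, which is correct but launches the unnecessary task described above.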