[jira] [Commented] (HIVE-3790) UDF to introduce an OFFSET(day,month or year) for a given date or timestamp
[ https://issues.apache.org/jira/browse/HIVE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752149#comment-13752149 ] Jithin John commented on HIVE-3790:
---
Could someone review this and provide comments?

UDF to introduce an OFFSET (day, month or year) for a given date or timestamp
Key: HIVE-3790
URL: https://issues.apache.org/jira/browse/HIVE-3790
Project: Hive
Issue Type: New Feature
Components: UDF
Affects Versions: 0.9.0
Reporter: Jithin John
Fix For: 0.9.1
Attachments: HIVE-3790.patch

Current releases of Hive lack a generic function that computes a date offset from a given date/timestamp. They provide date_add(date) and date_sub(date), which let the user add or subtract days only; year or month cannot be used as the unit. The function DATE_OFFSET(date, offset, unit) returns the date offset from the start date according to the unit, where the unit can be YEAR, MONTH, or DAY. It could be used for date-range queries and is more flexible than the existing functions.

Functionality:
Function name: DATE_OFFSET(date, offset, unit). Adds an offset value to the given unit of the date/timestamp. Returns the date in the format yyyy-MM-dd.
Example:
hive> select date_offset('2009-07-29', -1, 'MONTH') FROM src LIMIT 1;
2009-06-29

Usage:
Case: calculate the expiry date of an item from its manufacturing date.

Table ITEM_TAB:
Manufacturing_date|item id|store id|value|unit|price
2012-12-01|110001|00003|0.99|1.00|0.99
2012-12-02|110001|00008|0.99|0.00|0.00
2012-12-03|110001|00009|0.99|0.00|0.00
2012-12-04|110001|001112002|0.99|0.00|0.00
2012-12-05|110001|001112003|0.99|0.00|0.00
2012-12-06|110001|001112006|0.99|1.00|0.99
2012-12-07|110001|001112007|0.99|0.00|0.00
2012-12-08|110001|001112008|0.99|0.00|0.00
2012-12-09|110001|001112009|0.99|0.00|0.00
2012-12-10|110001|001112010|0.99|0.00|0.00
2012-12-11|110001|001113003|0.99|0.00|0.00
2012-12-12|110001|001113006|0.99|0.00|0.00
2012-12-13|110001|001113008|0.99|0.00|0.00
2012-12-14|110001|001113010|0.99|0.00|0.00
2012-12-15|110001|001114002|0.99|0.00|0.00
2012-12-16|110001|001114004|0.99|1.00|0.99
2012-12-17|110001|001114005|0.99|0.00|0.00
2012-12-18|110001|001121004|0.99|0.00|0.00

QUERY:
select man_date, date_offset(man_date, 5, 'year') as expiry_date from item_tab;

RESULT:
2012-12-01 2017-12-01
2012-12-02 2017-12-02
2012-12-03 2017-12-03
2012-12-04 2017-12-04
2012-12-05 2017-12-05
2012-12-06 2017-12-06
2012-12-07 2017-12-07
2012-12-08 2017-12-08
2012-12-09 2017-12-09
2012-12-10 2017-12-10
2012-12-11 2017-12-11
2012-12-12 2017-12-12
2012-12-13 2017-12-13
2012-12-14 2017-12-14
2012-12-15 2017-12-15
2012-12-16 2017-12-16
2012-12-17 2017-12-17
2012-12-18 2017-12-18

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
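The proposed semantics can be sketched in plain Java. This is a hypothetical stand-in, not the attached patch (which, for a Hive 0.9-era UDF, would presumably use java.util.Calendar rather than java.time):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for the proposed DATE_OFFSET(date, offset, unit) UDF:
// shifts a yyyy-MM-dd date by `offset` units, where unit is YEAR, MONTH, or DAY.
public class DateOffsetSketch {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static String dateOffset(String date, int offset, String unit) {
        LocalDate d = LocalDate.parse(date, FMT);
        switch (unit.toUpperCase()) {
            case "YEAR":  d = d.plusYears(offset);  break;
            case "MONTH": d = d.plusMonths(offset); break;
            case "DAY":   d = d.plusDays(offset);   break;
            default: throw new IllegalArgumentException("unit must be YEAR, MONTH or DAY");
        }
        return d.format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(dateOffset("2009-07-29", -1, "MONTH")); // 2009-06-29
        System.out.println(dateOffset("2012-12-01", 5, "YEAR"));   // 2017-12-01
    }
}
```

Note that a real implementation also has to decide how to clamp day-of-month overflow (e.g. 2012-01-31 plus one MONTH); java.time clamps to the last valid day.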
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752239#comment-13752239 ] Mohammad Kamrul Islam commented on HIVE-1511:
---
Thanks to [~ashutoshc] and [~brocknoland] for moving it this far!

I think I have isolated the issue to some extent. It looks like a bug in Kryo. First, I created an XML plan file for the failing case using our existing Java-based serialization. Then I wrote (copied from Ashutosh) an independent Java class that deserializes the plan XML into a MapredWork object using XMLDecoder. Next, the code serializes the MapredWork object using Kryo. Finally, it deserializes it again using Kryo. In this case, serialization with Kryo succeeds but deserialization with Kryo fails with the following exception. It is important to note that a simpler version of the plan XML succeeds with the same utility.

I'm going to attach three files:
1. KryoHiveTest.java: independent Java code to test.
2. run.sh: script to compile and run. (Run with: run.sh generated_plan.xml)
3. generated_plan.xml: generated plan in XML that fails.

[~romixlev]: do you have any suggestions? I think you are also active in Kryo. Should I send an email to the Kryo list?
Exception:
{quote}
Exception in thread "main" com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 12416, Size: 1504
Serialization trace:
rslvMap (org.apache.hadoop.hive.ql.parse.RowResolver)
rr (org.apache.hadoop.hive.ql.parse.OpParseContext)
opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:760)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at KryoHiveTest.fun(KryoHiveTest.java:51)
at KryoHiveTest.main(KryoHiveTest.java:25)
Caused by: java.lang.IndexOutOfBoundsException: Index: 12416, Size: 1504
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:804)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:728)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:127)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
... 16 more
{quote}

Hive plan serialization is slow
---
Key: HIVE-1511
URL: https://issues.apache.org/jira/browse/HIVE-1511
Project: Hive
Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
Attachments: HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch

As reported by Edward Capriolo: For reference I did this as a test case
SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these)
No OOM, but I gave up after the test case did not go anywhere for about 2 minutes.
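The round-trip harness described in the comment above can be sketched in miniature. The sketch below uses a trivial bean in place of MapredWork and only the stdlib XMLEncoder/XMLDecoder stage (the Kryo stage is a third-party dependency and is omitted); the class and field names here are illustrative, not the attached KryoHiveTest.java:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Minimal shape of a plan round-trip harness: write the plan object out with
// the JavaBeans XML serializer, then read it back and compare.
public class PlanRoundTrip {
    public static class Plan { // hypothetical stand-in for MapredWork
        private String query = "";
        public String getQuery() { return query; }
        public void setQuery(String q) { query = q; }
    }

    public static Plan roundTrip(Plan in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(bos)) {
            enc.writeObject(in);              // serialize, like Hive's XML plan writer
        }
        try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()))) {
            return (Plan) dec.readObject();   // deserialize, like the harness's first step
        }
    }

    public static void main(String[] args) {
        Plan p = new Plan();
        p.setQuery("select * from src");
        System.out.println(roundTrip(p).getQuery());
    }
}
```

In the real harness, the deserialized MapredWork would then be fed through Kryo serialize/deserialize, which is the step that fails with the IndexOutOfBoundsException above.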
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-1511:
---
Attachment: KryoHiveTest.java
generated_plan.xml
run.sh
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-1511:
---
Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752350#comment-13752350 ] Hudson commented on HIVE-5147:
---
FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5147 : Newly added test TestSessionHooks is failing on trunk (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517873)
* /hive/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContext.java
* /hive/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContextImpl.java
* /hive/trunk/service/src/java/org/apache/hive/service/cli/session/SessionManager.java

Newly added test TestSessionHooks is failing on trunk
---
Key: HIVE-5147
URL: https://issues.apache.org/jira/browse/HIVE-5147
Project: Hive
Issue Type: Test
Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Fix For: 0.12.0
Attachments: HIVE-5147.D12543.1.patch

This was recently added via HIVE-4588.
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752349#comment-13752349 ] Hudson commented on HIVE-5144:
---
FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java

HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
---
Key: HIVE-5144
URL: https://issues.apache.org/jira/browse/HIVE-5144
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Labels: performance
Fix For: 0.12.0
Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch

The map-join hashtable sink in the local task creates an in-memory hashtable with the following code:
{code}
Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ...
MapJoinRowContainer rowContainer = tableContainer.get(key);
if (rowContainer == null) {
  rowContainer = new MapJoinRowContainer();
  rowContainer.add(value);
{code}
But for a query where joinValues[alias].size() == 0, this results in a large number of unnecessary allocations, which would be better served by a pre-allocated, immutable zero-length object array as a shared default value (the only immutable array there is in Java).

The query tested is roughly the following, to scan all of customer_demographics in the hash sink:
{code}
select c_salutation, count(1)
from customer JOIN customer_demographics
  ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
group by c_salutation limit 10;
{code}
When running with current trunk, the code results in an OOM with 512 MB of RAM:
{code}
2013-08-23 05:11:26 Processing rows: 140 Hashtable size: 139 Memory usage: 292418944 percentage: 0.579
Execution failed with exit status: 3
Obtaining error information
{code}
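The fix's core idea can be illustrated outside Hive. The sketch below (with a hypothetical stand-in for JoinUtil.computeMapJoinValues) returns one shared zero-length array whenever there are no value columns, instead of allocating a fresh Object[0] per row:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HIVE-5144 idea: a zero-length array is immutable by
// construction, so a single static instance can be shared by every row.
public class EmptyRowSketch {
    private static final Object[] EMPTY_ROW = new Object[0]; // shared across all rows

    // Hypothetical stand-in for JoinUtil.computeMapJoinValues(...)
    public static Object[] computeValues(List<Object> row, int[] valueColumns) {
        if (valueColumns.length == 0) {
            return EMPTY_ROW;                 // no per-row garbage when the join emits no values
        }
        Object[] out = new Object[valueColumns.length];
        for (int i = 0; i < valueColumns.length; i++) {
            out[i] = row.get(valueColumns[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> row = new ArrayList<>();
        row.add("c_salutation");
        // Both calls return the very same array instance.
        System.out.println(computeValues(row, new int[0]) == computeValues(row, new int[0]));
    }
}
```

With millions of hashtable entries, replacing per-row Object[0] allocations with one shared instance removes both the allocation cost and the per-object heap overhead that drove the OOM above.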
Possible to get table metadata in UDTF or UDF?
Hi all,

Is it possible to get metadata (e.g. column names, column IDs) for a given table name inside a user-defined table function?

Best regards,
Shawn
[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3562:
---
Resolution: Fixed
Fix Version/s: 0.12.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Navis, for your persistence on this one!

Some limit can be pushed down to map stage
---
Key: HIVE-3562
URL: https://issues.apache.org/jira/browse/HIVE-3562
Project: Hive
Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
Fix For: 0.12.0
Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch

Queries with a LIMIT clause (with a reasonable number), for example
{noformat}
select * from src order by key limit 10;
{noformat}
produce the operator tree TS-SEL-RS-EXT-LIMIT-FS. But the LIMIT can be partially calculated in RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
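The idea behind the RS(TOP-N) step is a bounded heap on the map side: each mapper retains only the N best keys, so at most N rows per mapper reach the shuffle. A minimal sketch of that retention logic (plain Java, not Hive's TopNHash implementation; assumes ascending sort keys modeled as ints):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Bounded top-N retention, as a map-side ReduceSink might apply it:
// keep the N smallest keys seen so far and drop everything else early.
public class TopNSketch {
    public static int[] topN(int[] keys, int n) {
        // Max-heap of size <= n whose root is the worst of the current top N.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (k < heap.peek()) {    // beats the current worst: swap it in
                heap.poll();
                heap.add(k);
            }                                 // otherwise: row never enters the shuffle
        }
        int[] out = new int[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();             // drain from worst to best
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = topN(new int[]{5, 1, 9, 3, 7, 2}, 3);
        System.out.println(java.util.Arrays.toString(r)); // [1, 2, 3]
    }
}
```

This preserves correctness because any row outside a mapper's local top N can never be in the global top N; the final LIMIT operator still runs on the reducer side.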
[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752481#comment-13752481 ] Ashutosh Chauhan commented on HIVE-5166:
---
Stacktrace:
{noformat}
TestCase: TestWebHCatE2e

getStatus: Failure (0.125s)
GET http://localhost:50111/templeton/v1/status?user.name=johndoe
returned an HTML error page: "HTTP ERROR: 503. Problem accessing /templeton/v1/status. Reason: java.lang.RuntimeException: Could not load wadl generators from wadlGeneratorDescriptions. Powered by Jetty://"
junit.framework.AssertionFailedError: expected:<200> but was:<503>
at org.apache.hcatalog.templeton.TestWebHCatE2e.getStatus(TestWebHCatE2e.java:85)

invalidPath: Failure
GET http://localhost:50111/templeton/v1/no_such_mapping/database?user.name=johndoe
returned an HTML error page: "HTTP ERROR: 503. Problem accessing /templeton/v1/no_such_mapping/database. Reason: java.lang.RuntimeException: Could not load wadl generators from wadlGeneratorDescriptions. Powered by Jetty://"
junit.framework.AssertionFailedError: expected:<500> but was:<503>
at org.apache.hcatalog.templeton.TestWebHCatE2e.invalidPath(TestWebHCatE2e.java:105)
{noformat}

TestWebHCatE2e is failing on trunk
---
Key: HIVE-5166
URL: https://issues.apache.org/jira/browse/HIVE-5166
Project: Hive
Issue Type: Bug
Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan

I observed these while running the full test suite the last couple of times.
[jira] [Created] (HIVE-5166) TestWebHCatE2e is failing on trunk
Ashutosh Chauhan created HIVE-5166:
---
Summary: TestWebHCatE2e is failing on trunk
Key: HIVE-5166
URL: https://issues.apache.org/jira/browse/HIVE-5166
Project: Hive
Issue Type: Bug
Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan

I observed these while running the full test suite the last couple of times.
[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752486#comment-13752486 ] Ashutosh Chauhan commented on HIVE-5166:
---
Also, I should note this is happening inconsistently. On another box, in a full test run, these tests indeed passed.
[jira] [Updated] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5128:
---
Resolution: Fixed
Fix Version/s: 0.12.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Sergey!

Direct SQL for view is failing
---
Key: HIVE-5128
URL: https://issues.apache.org/jira/browse/HIVE-5128
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
Fix For: 0.12.0
Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch

I cannot be sure of this, but when dropping views (it falls back to JPA and works fine):
{noformat}
metastore.ObjectStore: Direct SQL failed, falling back to ORM
MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
...
{noformat}
Should it be disabled for views, or can it be fixed?
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752490#comment-13752490 ] Gopal V commented on HIVE-3562:
---
Good work, Navis. Let me mark HIVE-5093 as obsoleted by this; no need for that hack anymore.
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752496#comment-13752496 ] Phabricator commented on HIVE-5158:
---
ashutoshc has requested changes to the revision "HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path". Question on supporting max.

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387 Seems like it's not hard to support max in this scenario. We can simply do query.setRange(0, max) for it. Did you consider supporting it?

REVISION DETAIL
https://reviews.facebook.net/D12573

BRANCH
HIVE-5158

ARCANIST PROJECT
hive

To: JIRA, ashutoshc, sershe

allow getting all partitions for table to also use direct SQL path
---
Key: HIVE-5158
URL: https://issues.apache.org/jira/browse/HIVE-5158
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-5158.D12573.1.patch

While testing some queries I noticed that getPartitions can be very slow (which happens, e.g., in non-strict mode with no partition column filter); with a table with many partitions it can easily take 10-12s. The SQL perf path can also be used for this case.
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752526#comment-13752526 ] Sergey Shelukhin commented on HIVE-5158:
---
Actually there's another path that needs to be changed...
[jira] [Commented] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752530#comment-13752530 ] Hudson commented on HIVE-5128:
---
FAILURE: Integrated in Hive-trunk-hadoop2-ptest #74 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/74/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518258)
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752529#comment-13752529 ] Hudson commented on HIVE-3562:
---
FAILURE: Integrated in Hive-trunk-hadoop2-ptest #74 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/74/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out
[jira] [Resolved] (HIVE-5093) Use a combiner for LIMIT with GROUP BY and ORDER BY operators
[ https://issues.apache.org/jira/browse/HIVE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-5093.
---
Resolution: Not A Problem

HIVE-3562 made this redundant.

Use a combiner for LIMIT with GROUP BY and ORDER BY operators
---
Key: HIVE-5093
URL: https://issues.apache.org/jira/browse/HIVE-5093
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-5093-WIP-01.patch

Operator trees of the structure GBY-LIM and OBY-LIM can have a memory-friendly combiner put in place after the sort phase. This will cut down on I/O when spilling to disk, particularly during the merge phase of the reducer. There are two possible combiners: LimitNKeysCombiner and LimitNValuesCombiner. The first would be ideal for the GROUP BY case, while the latter would be more useful for the ORDER BY case. The combiners are still relevant even if there are 1:1 forward operators on the reducer side; also, for small data items, the MR base layer does not run the combiners at all.
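The "LimitNKeysCombiner" idea can be sketched independently of Hive. Since combiner input is sorted by key, records for keys beyond the first N distinct keys cannot contribute to the final LIMIT N of a GROUP BY, so a combiner may drop them. This is an illustrative sketch, not the attached WIP patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a LimitNKeysCombiner: forward records only for the first N
// distinct keys of a key-sorted combiner input; drop everything after.
public class LimitNKeysCombinerSketch {
    public static List<Map.Entry<String, Integer>> combine(
            List<Map.Entry<String, Integer>> sortedRecords, int n) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Set<String> keysSeen = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> rec : sortedRecords) {
            if (!keysSeen.contains(rec.getKey()) && keysSeen.size() == n) {
                break;                  // a new key past the first N: drop the rest
            }
            keysSeen.add(rec.getKey());
            out.add(rec);               // within the first N keys: forward unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> recs = new ArrayList<>();
        recs.add(Map.entry("a", 1));
        recs.add(Map.entry("a", 2));
        recs.add(Map.entry("b", 1));
        recs.add(Map.entry("c", 1));
        // With n = 2, only records for keys "a" and "b" survive.
        System.out.println(combine(recs, 2).size());
    }
}
```

The map-side top-N of HIVE-3562 achieves the same shuffle reduction earlier in the pipeline, which is why this combiner approach was resolved as Not A Problem.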
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752543#comment-13752543 ] Sivaramakrishnan Narayanan commented on HIVE-3562:
---
Good stuff, Navis!
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752552#comment-13752552 ]

Hudson commented on HIVE-3562:
FAILURE: Integrated in Hive-trunk-hadoop1-ptest #142 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/142/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out
[jira] [Commented] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752553#comment-13752553 ]

Hudson commented on HIVE-5128:
FAILURE: Integrated in Hive-trunk-hadoop1-ptest #142 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/142/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518258)
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

Direct SQL for view is failing
Key: HIVE-5128
URL: https://issues.apache.org/jira/browse/HIVE-5128
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
Fix For: 0.12.0
Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch

I cannot be sure of this, but it happens when dropping views (it falls back to JPA and works fine):

{noformat}
metastore.ObjectStore: Direct SQL failed, falling back to ORM
MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
  ...
{noformat}

Should it be disabled for views, or can it be fixed?
hive pull request: Kk wb 1228
GitHub user krishna-verticloud opened a pull request: https://github.com/apache/hive/pull/11 (Kk wb 1228)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/VertiPub/hive kk-WB-1228

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/11.patch
hive pull request: Kk wb 1228
Github user krishna-verticloud closed the pull request at: https://github.com/apache/hive/pull/11
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752565#comment-13752565 ]

Phabricator commented on HIVE-5158:
sershe has commented on the revision "HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path".

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387
1) I am not sure this will work for SQL JDO; probably it will just get all of them and return a limited number of rows.
2) Due to the absence of offset, it's really a semi-useless parameter; I don't see it used.

REVISION DETAIL: https://reviews.facebook.net/D12573
BRANCH: HIVE-5158
ARCANIST PROJECT: hive
To: JIRA, ashutoshc, sershe

allow getting all partitions for table to also use direct SQL path
Key: HIVE-5158
URL: https://issues.apache.org/jira/browse/HIVE-5158
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-5158.D12573.1.patch

While testing some queries I noticed that getPartitions can be very slow (which happens e.g. in non-strict mode with no partition column filter); with a table with many partitions it can easily take 10-12s. The SQL perf path can also be used for this code path.
[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-5158:
Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752588#comment-13752588 ]

Phabricator commented on HIVE-5029:
ashutoshc has requested changes to the revision "HIVE-5029 [jira] direct SQL perf optimization cannot be tested well". Couple of comments.

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76
Why is this change required?
metastore/src/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java:19
Can you place this class in metastore/src/test instead of metastore/src/java?

REVISION DETAIL: https://reviews.facebook.net/D12483
BRANCH: HIVE-sqltest
ARCANIST PROJECT: hive
To: JIRA, ashutoshc, sershe

direct SQL perf optimization cannot be tested well
Key: HIVE-5029
URL: https://issues.apache.org/jira/browse/HIVE-5029
Project: Hive
Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, HIVE-5029.patch

HIVE-4051 introduced a perf optimization that involves getting partitions directly via SQL in the metastore. Given that the SQL queries might not work on all datastores (and will not work on non-SQL ones), a JDO fallback is in place. Given that the perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If the SQL code is broken, tests may fall back to JDO and pass. If the JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default until the testing problem is resolved. There are several possible solutions:
1) Separate build for this setting. Seems like overkill...
2) Enable by default; disable by default in tests; create a clone of TestCliDriver with a subset of queries that will exercise the SQL path.
3) Have some sort of test hook inside the metastore that will run both ORM and SQL and compare.
3') Or make a subclass of ObjectStore that will do that. ObjectStore is already pluggable.
4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests.
3' seems like the easiest. For now we will disable SQL by default.
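Option 3' above, a verifying store subclass, might look roughly like this. The interface and class names here are hypothetical stand-ins for RawStore/ObjectStore/VerifyingObjectStore, not the actual metastore code: the point is simply that running both paths and comparing results means a silent fallback can no longer hide a broken implementation.

```java
import java.util.List;
import java.util.Objects;

// Sketch of a "verifying" store: run both the ORM path and the direct-SQL
// path and fail loudly if they disagree. Names are hypothetical stand-ins,
// not Hive's actual ObjectStore/VerifyingObjectStore classes.
interface PartitionSource {
    List<String> getPartitionsViaOrm(String table);
    List<String> getPartitionsViaSql(String table);
}

class VerifyingStore {
    private final PartitionSource store;

    VerifyingStore(PartitionSource store) { this.store = store; }

    List<String> getPartitions(String table) {
        List<String> orm = store.getPartitionsViaOrm(table);
        List<String> sql = store.getPartitionsViaSql(table);
        // A divergence is a bug in one of the two paths, not a fallback case.
        if (!Objects.equals(orm, sql)) {
            throw new IllegalStateException("ORM and direct-SQL results diverge for " + table);
        }
        return sql;
    }
}
```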
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752591#comment-13752591 ]

Phabricator commented on HIVE-5158:
ashutoshc has commented on the revision "HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path".

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387
1) That should still be as fast as the ORM path, if not faster.
2) The metastore thrift API is public. There can be consumers of it apart from Hive.

REVISION DETAIL: https://reviews.facebook.net/D12573
BRANCH: HIVE-5158
ARCANIST PROJECT: hive
To: JIRA, ashutoshc, sershe
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752624#comment-13752624 ]

Phabricator commented on HIVE-5029:
sershe has commented on the revision "HIVE-5029 [jira] direct SQL perf optimization cannot be tested well".

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76
This class inherits from ObjectStore, so the interface this is looking for is on the base class. getInterfaces doesn't give you all the interfaces in the hierarchy.
metastore/src/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java:19
Let me try this...

REVISION DETAIL: https://reviews.facebook.net/D12483
BRANCH: HIVE-sqltest
ARCANIST PROJECT: hive
To: JIRA, ashutoshc, sershe
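The getInterfaces point in the review comment above is plain Java reflection behavior and can be demonstrated directly: Class.getInterfaces() returns only the interfaces a class declares itself, not those implemented by its superclasses, so finding the interface on a subclass requires walking the hierarchy. The types below are stand-ins for ObjectStore/RawStore, not the metastore classes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Demonstrates why a subclass needs a hierarchy walk to find an interface
// implemented by its superclass. Types are stand-ins for RawStore/ObjectStore.
class InterfaceWalk {
    interface RawStoreLike {}                          // stand-in for RawStore
    static class BaseStore implements RawStoreLike {}  // stand-in for ObjectStore
    static class DerivedStore extends BaseStore {}     // declares no interfaces itself

    // Collect interfaces declared anywhere in the class hierarchy.
    static Set<Class<?>> allInterfaces(Class<?> c) {
        Set<Class<?>> out = new LinkedHashSet<>();
        for (; c != null; c = c.getSuperclass()) {
            for (Class<?> i : c.getInterfaces()) out.add(i);
        }
        return out;
    }
}
```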
[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-4460:
Attachment: HIVE-4460.3.patch

HIVE-4460.3.patch incorporates RB comments.

Publish HCatalog artifacts for Hadoop 2.x
Key: HIVE-4460
URL: https://issues.apache.org/jira/browse/HIVE-4460
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.patch
Original Estimate: 72h
Time Spent: 40h 40m
Remaining Estimate: 31h 20m

HCatalog artifacts are only published for Hadoop 1.x versions. As more projects add HCatalog integration, HCatalog artifacts are needed for all Hadoop versions supported by the product, so that automated builds targeting different Hadoop releases can be built successfully. For example, SQOOP-931 introduces Sqoop/HCatalog integration, and Sqoop builds with both Hadoop 1.x and 2.x releases.
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752671#comment-13752671 ]

Phabricator commented on HIVE-5029:
ashutoshc has commented on the revision "HIVE-5029 [jira] direct SQL perf optimization cannot be tested well".

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76
Can you add this in a comment here?
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:81
You can do list.toArray() for this.

REVISION DETAIL: https://reviews.facebook.net/D12483
BRANCH: HIVE-sqltest
ARCANIST PROJECT: hive
To: JIRA, ashutoshc, sershe
[jira] [Updated] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-5163:
Attachment: HIVE-5163.update, HIVE-5163.patch, HIVE-5163.move

Moved HCatMapRedUtil to org.apache.hcatalog.mapreduce to make the above-mentioned bugs easier to fix.
HIVE-5163.patch - cumulative (for automated build)
HIVE-5163.move - just the rename, as an SVN rename to preserve history
HIVE-5163.update - changes to apply after the SVN move is done

refactor org.apache.hadoop.mapred.HCatMapRedUtil
Key: HIVE-5163
URL: https://issues.apache.org/jira/browse/HIVE-5163
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update

Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. This will make things easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog, making way to provide binary backwards compatibility. This class won't change its name, so it's more difficult to provide backwards compatibility for it. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752685#comment-13752685 ]

Eugene Koifman commented on HIVE-5163:
This must be checked in after HIVE-4460.
[jira] [Updated] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-5163:
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-4895:
Status: Open (was: Patch Available)

This patch will need to be redone after HIVE-4460 and HIVE-5163.

Move all HCatalog classes to org.apache.hive.hcatalog
Key: HIVE-4895
URL: https://issues.apache.org/jira/browse/HIVE-4895
Project: Hive
Issue Type: Sub-task
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Fix For: 0.12.0
Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch
Original Estimate: 24h
Time Spent: 12h
Remaining Estimate: 12h

Make sure to preserve history in SCM.
Re: Review Request 13862: ReduceSinkDeDuplication can pick the wrong partitioning columns
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13862/

(Updated Aug. 28, 2013, 7:03 p.m.)

Review request for hive.

Changes
-------
update comments

Bugs: HIVE-5149
https://issues.apache.org/jira/browse/HIVE-5149

Repository: hive-git

Description
-------
https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

Diffs (updated)
-------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d
ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128
ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb
ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471
ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb

Diff: https://reviews.apache.org/r/13862/diff/

Testing
-------

Thanks,
Yin Huai
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-5149:
Attachment: HIVE-5149.2.patch

ReduceSinkDeDuplication can pick the wrong partitioning columns
Key: HIVE-5149
URL: https://issues.apache.org/jira/browse/HIVE-5149
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0, 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
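Why the choice of partitioning columns matters for correctness can be shown with a toy simulation (this is not the ReduceSinkDeDuplication code, just an illustration of the invariant it must preserve): when ReduceSinks are merged, the surviving ReduceSink must partition on the grouping key, because all rows of one group must hash to the same reducer for per-reducer aggregation to see the whole group.

```java
import java.util.*;

// Toy simulation of the invariant ReduceSinkDeDuplication must preserve:
// partitioning on a non-grouping column can scatter one group's rows across
// reducers, breaking aggregation. Not Hive's actual optimizer logic.
class PartitionDemo {
    // row[0] = group key, row[1] = some other column.
    // Returns, for each group, the set of reducers its rows landed on.
    static Map<Object, Set<Integer>> reducersPerGroup(List<Object[]> rows,
                                                      int partitionCol, int numReducers) {
        Map<Object, Set<Integer>> seen = new HashMap<>();
        for (Object[] row : rows) {
            int reducer = Math.floorMod(row[partitionCol].hashCode(), numReducers);
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(reducer);
        }
        return seen;
    }
}
```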
[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x
[ https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752783#comment-13752783 ]

Thejas M Nair commented on HIVE-4460:
+1. I will kick off the tests on my machine, as the pre-commit tests are not working right now.
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752805#comment-13752805 ]

Phabricator commented on HIVE-5029:
sershe has commented on the revision "HIVE-5029 [jira] direct SQL perf optimization cannot be tested well".

INLINE COMMENTS
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:81
ClassUtils.getAllInterfaces returns a non-generic list, so it only has toArray(Object[]).

REVISION DETAIL: https://reviews.facebook.net/D12483
To: JIRA, ashutoshc, sershe
[jira] [Updated] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5029:
Attachment: HIVE-5029.D12483.2.patch

sershe updated the revision "HIVE-5029 [jira] direct SQL perf optimization cannot be tested well". Updated with feedback. The moved file didn't change.

Reviewers: ashutoshc, JIRA
REVISION DETAIL: https://reviews.facebook.net/D12483
CHANGE SINCE LAST DIFF: https://reviews.facebook.net/D12483?vs=38841&id=39177#toc

AFFECTED FILES
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java
metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

To: JIRA, ashutoshc, sershe
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752820#comment-13752820 ]

Thejas M Nair commented on HIVE-5163:
Looks good, +1. This is an hcat-only change, so I will make sure it doesn't break the hive build and run the hcat unit tests before committing.
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: HIVE-4844.9.patch attaching HIVE-4844.9.patch, changes per review from hbutani: - descriptive comment about numericTypes map - TypeInfoParser fix and tests for invalid TypeInfo parameter syntax - raise error if Hive tries to instantiate varchar TypeInfo without type params. - fixed typo in constant value in Thrift file Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
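The review item "raise error if Hive tries to instantiate varchar TypeInfo without type params" amounts to rejecting a bare `varchar` while accepting `varchar(n)`. A toy sketch of that check, with a hypothetical helper name (not Hive's actual TypeInfoParser):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: pull the length parameter out of a type string such
// as "varchar(10)", rejecting a bare "varchar" with no type params.
public class VarcharTypeParam {
    private static final Pattern VARCHAR = Pattern.compile("^varchar\\((\\d+)\\)$");

    public static int parseLength(String typeName) {
        Matcher m = VARCHAR.matcher(typeName.trim());
        if (!m.matches()) {
            // Mirrors the review request: no silent default length.
            throw new IllegalArgumentException(
                "varchar requires an explicit length parameter: " + typeName);
        }
        return Integer.parseInt(m.group(1));
    }
}
```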
[jira] [Updated] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4961: -- Attachment: vectorUDF.8.patch Added unit tests, plus support for isRepeating performance optimization for the case when all input vectors passed into a function are marked as isRepeating = true. Fixed a bug related to setting string output. Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
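The isRepeating optimization mentioned in the patch note can be sketched as follows. All class and method names here are invented stand-ins for Hive's vectorization classes; the point is only the shape of the bridge: when an input vector is marked isRepeating, one scalar UDF call covers the whole batch.

```java
import java.util.function.LongUnaryOperator;

// Minimal sketch of a row-mode-UDF-to-vectorized bridge (hypothetical
// names, not Hive's actual VectorizedRowBatch API).
public class VectorUdfBridge {
    public static class LongColumnVector {
        public long[] vector;
        public boolean isRepeating;  // when true, every row shares vector[0]
        public LongColumnVector(int n) { vector = new long[n]; }
    }

    public static void apply(LongUnaryOperator udf, LongColumnVector in,
                             LongColumnVector out, int n) {
        if (in.isRepeating) {
            // One scalar call serves the entire batch.
            out.vector[0] = udf.applyAsLong(in.vector[0]);
            out.isRepeating = true;
            return;
        }
        out.isRepeating = false;
        for (int i = 0; i < n; i++) {
            out.vector[i] = udf.applyAsLong(in.vector[i]);
        }
    }
}
```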
[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes
[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5102: -- Attachment: HIVE-5102.D12579.1.patch omalley requested code review of HIVE-5102 [jira] ORC getSplits should create splits based the stripes. Reviewers: JIRA working on orcinputformat Currently ORC inherits getSplits from FileFormat, which basically makes a split per an HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12579 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/30219/ To: JIRA, omalley ORC getSplits should create splits based the stripes - Key: HIVE-5102 URL: https://issues.apache.org/jira/browse/HIVE-5102 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5102.D12579.1.patch Currently ORC inherits getSplits from FileFormat, which basically makes a split per an HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
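The stripe-based split idea reads naturally as: each stripe recorded in the ORC footer is self-contained, so it can map one-to-one to an input split. A hedged sketch (not the actual OrcInputFormat patch; `Stripe` stands in for ORC's StripeInformation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: turn the (offset, length) pairs an ORC footer
// records for each stripe into one split per stripe, instead of the
// one-split-per-HDFS-block default inherited from the file format base.
public class StripeSplits {
    public static class Stripe {
        public final long offset, length;
        public Stripe(long offset, long length) { this.offset = offset; this.length = length; }
    }
    public static class Split {
        public final long start, length;
        public Split(long start, long length) { this.start = start; this.length = length; }
    }

    public static List<Split> splitsFromStripes(List<Stripe> stripes) {
        List<Split> splits = new ArrayList<>();
        for (Stripe s : stripes) {
            // Each stripe is independently readable, so it maps 1:1 to a split.
            splits.add(new Split(s.offset, s.length));
        }
        return splits;
    }
}
```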
[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5091: Component/s: File Formats ORC files should have an option to pad stripes to the HDFS block boundaries --- Key: HIVE-5091 URL: https://issues.apache.org/jira/browse/HIVE-5091 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
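The padding arithmetic behind this issue can be illustrated in a few lines. This is a sketch of the idea only, with hypothetical names; stripes larger than a block will still straddle boundaries regardless of padding.

```java
// Illustrative sketch: if the next stripe would straddle an HDFS block
// boundary, pad the file out to the boundary so the stripe starts on a
// fresh block and reads stay local to one datanode.
public class StripePadding {
    // Returns the number of filler bytes to write before a stripe of
    // stripeLen bytes beginning at file position pos, given blockSize.
    public static long padToBlockBoundary(long pos, long stripeLen, long blockSize) {
        long remainingInBlock = blockSize - (pos % blockSize);
        // The stripe fits in the current block: no padding needed.
        if (stripeLen <= remainingInBlock) {
            return 0;
        }
        // Otherwise pad to the boundary so the stripe starts block-aligned.
        return remainingInBlock;
    }
}
```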
[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes
[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5102: Status: Patch Available (was: Open) ORC getSplits should create splits based the stripes - Key: HIVE-5102 URL: https://issues.apache.org/jira/browse/HIVE-5102 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5102.D12579.1.patch Currently ORC inherits getSplits from FileFormat, which basically makes a split per an HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
Thejas M Nair created HIVE-5167: --- Summary: webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5167: Attachment: HIVE-5167.1.patch webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752941#comment-13752941 ] Thejas M Nair commented on HIVE-5167: - Also, the check for environment variables being set should be changed from a fatal error to a warning. These are necessary only for the default configuration of webhcat; HIVE_HOME is not used in the default config file. webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752946#comment-13752946 ] Roshan Naik commented on HIVE-4196: --- {quote} According to the Hive coding conventions lines should be bounded at 100 characters. Many lines in this patch exceed that. {quote} Will fix the ones which are not in the thrift-generated files. {quote} I'm surprised to see that streamingStatus sets the chunk id for the table. {quote} Seems like a bug. Will fix. {quote} The logic at the end of these functions doesn't look right. Take getNextChunkID for example. If commitTransaction fails (line 2132) rollback will be called but the next chunk id will still be returned. It seems you need a check on success after commit. I realize many of the calls in the class follow this, but it doesn't seem right. {quote} Good catch. At the time I thought commitTxn() would only fail with an exception and not return false. But on closer inspection there is indeed a corner case (if rollBack was called) where it returns false as well. It's a bizarre thing for a function to fail without an exception, but for now I will fix my code to live with it. {quote} In HiveMetaStoreClient.java, is assert what you want? Are you ok with the validity of the arguments not being checked most of the time?{quote} Not all checks are in place. There are some checks that will happen at lower layers, some at higher. Will be adding more checks. {quote} I'm trying to figure out whether the chunk files are moved, deleted, or left alone during the partition rolling. {quote} That would depend on whether the table is defined to be an external or internal table. It is essentially an add_partition of the new partition. It calls HiveMetastore.add_partition_core_notxn() inside a transaction. 
Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch Motivation: Allow Hive users to immediately query data streaming in through clients such as Flume. Currently Hive partitions must be created after all the data for the partition is available. Thereafter, data in the partitions is considered immutable. This proposal introduces the notion of a streaming partition into which new files can be committed periodically and made available for queries before the partition is closed and converted into a standard partition. The admin enables streaming partition on a table using DDL. He provides the following pieces of information: - Name of the partition in the table on which streaming is enabled - Frequency at which the streaming partition should be closed and converted into a standard partition. Tables with streaming partition enabled will be partitioned by one and only one column. It is assumed that this column will contain a timestamp. Closing the current streaming partition converts it into a standard partition. Based on the specified frequency, the current streaming partition is closed and a new one created for future writes. This is referred to as 'rolling the partition'. A streaming partition's life cycle is as follows: - A new streaming partition is instantiated for writes - Streaming clients request (via webhcat) an HDFS file name into which they can write a chunk of records for a specific table. - Streaming clients write a chunk (via webhdfs) to that file and commit it (via webhcat). 
Committing merely indicates that the chunk has been written completely and is ready to serve queries. - When the partition is rolled, all committed chunks are swept into a single directory and a standard partition pointing to that directory is created. The streaming partition is closed and a new streaming partition is created. Rolling the partition is atomic. Streaming clients are agnostic of partition rolling. - Hive queries will be able to query the partition that is currently open for streaming. Only committed chunks will be visible. Read consistency will be ensured so that repeated reads of the same partition will be idempotent for the lifespan of the query. Partition rolling requires an active agent/thread running to check when it is time to roll and trigger the roll. This could be achieved either by using an external agent such as Oozie (preferably) or an internal agent.
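The commit/rollback corner case discussed in the review above — rollback being called on a failed commit but the new chunk id still being returned — suggests a success check after commit. A minimal sketch of that pattern, with entirely hypothetical names (not the actual metastore transaction API):

```java
// Sketch of the fix: treat a false return from commit the same as an
// exception, rather than returning the new chunk id after a failed commit.
public class TxnTemplate {
    public interface Txn {
        boolean commit();   // may return false if a rollback already occurred
        void rollback();
    }

    public static long getNextChunkId(Txn txn, long candidateId) {
        boolean committed = false;
        try {
            committed = txn.commit();
        } finally {
            if (!committed) {
                txn.rollback();
            }
        }
        if (!committed) {
            // Do not hand back an id the datastore never durably assigned.
            throw new IllegalStateException("commit failed; chunk id not assigned");
        }
        return candidateId;
    }
}
```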
[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5158: -- Attachment: HIVE-5158.D12573.2.patch sershe updated the revision HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path. Change the patch instead in such manner that PartitionPruner calls the method I already modified. It seems like it doesn't need auth (get-by-filter and get-by-name don't use it). Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12573 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12573?vs=39141id=39201#toc MANIPHEST TASKS https://reviews.facebook.net/T63 AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java To: JIRA, ashutoshc, sershe allow getting all partitions for table to also use direct SQL path -- Key: HIVE-5158 URL: https://issues.apache.org/jira/browse/HIVE-5158 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch While testing some queries I noticed that getPartitions can be very slow (which happens e.g. in non-strict mode with no partition column filter); with a table with many partitions it can take 10-12s easily. SQL perf path can also be used for this path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5091: -- Attachment: HIVE-5091.D12249.3.patch omalley updated the revision HIVE-5091 [jira] ORC files should have an option to pad stripes to the HDFS block boundaries. Updated test file dump output Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D12249 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12249?vs=38865id=39207#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java ql/src/test/resources/orc-file-dump.out To: JIRA, omalley Cc: hagleitn ORC files should have an option to pad stripes to the HDFS block boundaries --- Key: HIVE-5091 URL: https://issues.apache.org/jira/browse/HIVE-5091 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, HIVE-5091.D12249.3.patch With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-5159) Change the kind fields in ORC's proto file to optional
[ https://issues.apache.org/jira/browse/HIVE-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere reassigned HIVE-5159: Assignee: Jason Dere Change the kind fields in ORC's proto file to optional -- Key: HIVE-5159 URL: https://issues.apache.org/jira/browse/HIVE-5159 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Jason Dere Java's protobuf generated code uses a null value to represent enum values that were added after the reader was compiled. To reflect that reality, the enum values should always be marked as optional. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: pre-commit-build offline untill wed
OK I just fixed this and verified the restart script works. Sorry about the delay, as Edward said things went to hell, I was out of town without a laptop, and our restart scripts were untested. We'll do our best to make sure this doesn't happen again. On Sat, Aug 24, 2013 at 12:07 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I do have access to the build box, however I never poked the sudo mechanism to restart the service before and that is not correct. No pre-commit testing anymore, we have to go back to the old system for a while . -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Updated] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4964: -- Attachment: HIVE-4964.D12585.1.patch hbutani requested code review of HIVE-4964 [jira] Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced. Reviewers: JIRA, ashutoshc merge with trunk There are still pieces of code that deal with: supporting select expressions with Windowing supporting a filter with windowing Need to do this before introducing Perf. improvements. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12585 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/30237/ To: JIRA, ashutoshc, hbutani Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4617: Attachment: HIVE-4617.D12507Test.1.patch HIVE-4617.D12507Test.1.patch - Copy of HIVE-4617.D12507.1.patch to kick off tests. ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch Provide a way to run a queries asynchronously. Current executeStatement call blocks until the query run is complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4617: Status: Open (was: Patch Available) ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch Provide a way to run a queries asynchronously. Current executeStatement call blocks until the query run is complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4617: Status: Patch Available (was: Open) ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507Test.1.patch Provide a way to run a queries asynchronously. Current executeStatement call blocks until the query run is complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753058#comment-13753058 ] Xuefu Zhang commented on HIVE-4844: --- Hi Jason, Thanks for your response. I understand it's hard to separate your patch into small patches. On the other hand, I'm wondering if the changes you made dealing with precision/scale are required for char/varchar support. If not, could you spare them from your patch? The problem I have is the difficulty of rebasing my changes on your patch because of its progressive nature. This might make it easier for both of us to proceed. In the meantime, please feel free to include whatever changes are needed for both features. Please let me know. Thanks. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3183) case expression should allow different types per ISO-SQL 2012
[ https://issues.apache.org/jira/browse/HIVE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiu updated HIVE-3183: -- Attachment: udf_when_type_wrong3.q.out udf_when_type_wrong2.q.out Hive-3183.patch.txt This patch removes the restriction on 'when' clause. So some negative testcases become positive, namely: udf_when_type_wrong2.q udf_when_type_wrong3.q They should be moved from 'ql/src/test/queries/clientnegative/' to 'ql/src/test/queries/clientpositive/', and be renamed to reflect its positive nature. Also in ql/src/test/results/clientpositive/ udf_when_type_wrong2.q.out udf_when_type_wrong3.q.out need to be added. case expression should allow different types per ISO-SQL 2012 - Key: HIVE-3183 URL: https://issues.apache.org/jira/browse/HIVE-3183 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.8.0 Reporter: N Campbell Attachments: Hive-3183.patch.txt, udf_when_type_wrong2.q.out, udf_when_type_wrong3.q.out The ISO-SQL standard specification for CASE allows the specification to include different types in the WHEN and ELSE blocks including this example which mixes smallint and integer types select case when vsint.csint is not null then vsint.csint else 1 end from cert.vsint vsint The Apache Hive docs do not state how it deviates from the standard or any given restrictions so unsure if this is a bug vs an enhancement. Many SQL applications mix so this seems to be a restrictive implementation if this is by design. Argument type mismatch '1': The expression after ELSE should have the same type as those after THEN: smallint is expected but int is found -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
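The ISO-SQL behavior this patch enables — mixing smallint and int across THEN/ELSE branches — comes down to resolving a common wider type instead of raising a type-mismatch error. A toy illustration with a hypothetical helper (not Hive's actual FunctionRegistry logic):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: resolve two numeric branch types of a CASE
// expression to their common wider type, per the standard's behavior.
public class CaseTypeResolver {
    // Numeric widening order, smallest to widest.
    private static final List<String> ORDER =
        Arrays.asList("tinyint", "smallint", "int", "bigint", "float", "double");

    public static String commonType(String a, String b) {
        int ia = ORDER.indexOf(a), ib = ORDER.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("not a numeric type: " + a + ", " + b);
        }
        // The wider of the two types can represent values of both branches.
        return ORDER.get(Math.max(ia, ib));
    }
}
```

Under this scheme the JIRA's example, `case when vsint.csint is not null then vsint.csint else 1 end`, would type the result as int rather than failing.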
RFC: Major HCatalog refactoring
Hi, Here is the plan for refactoring HCatalog as was agreed to when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive in 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically: 1. HIVE-4895 - change package name from org.apache.hcatalog to org.apache.hive.hcatalog 2. HIVE-4896 - create binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12 For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has an M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work w/o change with hive 0.12. The proposal is to make the changes with as little impact on the build system as possible, in part to make the upcoming ‘mavenization’ of hive easier, in part to make the changes more manageable. The list of public interfaces (and their transitive closure) for which backwards compat will be provided is: 1. HCatLoader 2. HCatStorer 3. HCatInputFormat 4. HCatOutputFormat 5. HCatReader 6. HCatWriter 7. HCatRecord 8. HCatSchema To achieve this, the 0.11 versions of these classes will be added in the org.apache.hcatalog package (after item 1 is done). Each of these classes as well as their dependencies will be deprecated to make it clear that any new development needs to happen in org.apache.hive.hcatalog. The 0.11 versions of the JUnit tests for hcat will also be brought to trunk and handled the same way as mainline code. A sunset clause will be added to the deprecation message. Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. 
Since these changes are unavoidably disruptive, we’ll need to lock down the hcatalog part of hive, check in all existing patches (which are ready, i.e. apply/test cleanly and don’t have review comments which need to be addressed) and then make the refactoring changes. Thanks, Eugene
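The compat layer described in item 2 is essentially a deprecated shell class kept at the old package coordinates that extends the relocated implementation, so 0.11 client code keeps compiling and linking against the old name. A sketch under stated assumptions — the class names below are illustrative stand-ins, and in a real project the two classes would live in separate files under their respective packages (shown here as comments, since one source file cannot declare two packages):

```java
// --- new location: package org.apache.hive.hcatalog.mapreduce ---
class RelocatedHCatInputFormat {          // stands in for the moved class
    public String describe() { return "real implementation"; }
}

// --- old location: package org.apache.hcatalog.mapreduce ---
// Kept only for binary backwards compatibility until the sunset date.
@Deprecated
class LegacyHCatInputFormat extends RelocatedHCatInputFormat {
    // No body: every method resolves to the relocated implementation.
}
```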
[jira] [Commented] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753094#comment-13753094 ] Jason Dere commented on HIVE-4844: -- Hi Xuefu, sorry about that. I did add precision/scale in a few places, let's take a look: 1. JDBC: The precision/scale is also used for returning varchar length, so these changes are necessary. 2. BaseTypeParams/TypeQualifiers/TTypeQualifiers: These are objects used to hold type qualifier information, and I did add precision/scale fields/setters to these objects. If you'd like them removed I can remove any mention of precision/scale in these objects. 3. TCLIService.thrift: Add constant string values to represent precision/scale fields. I can also remove those constant definitions if you like. Let me know if you want me to remove mention of precision/scale from (2) and (3). Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753109#comment-13753109 ] Hive QA commented on HIVE-5029: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600451/HIVE-5029.D12483.2.patch {color:green}SUCCESS:{color} +1 2902 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/552/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/552/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. direct SQL perf optimization cannot be tested well -- Key: HIVE-5029 URL: https://issues.apache.org/jira/browse/HIVE-5029 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, HIVE-5029.patch, HIVE-5029.patch HIVE-4051 introduced perf optimization that involves getting partitions directly via SQL in metastore. Given that SQL queries might not work on all datastores (and will not work on non-SQL ones), JDO fallback is in place. Given that perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If SQL code is broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default before the testing problem is resolved. There are several possible solutions: 1) Separate build for this setting. Seems like an overkill... 2) Enable by default; disable by default in tests, create a clone of TestCliDriver with a subset of queries that will exercise the SQL path.
3) Have some sort of test hook inside metastore that will run both ORM and SQL and compare. 3') Or make a subclass of ObjectStore that will do that. ObjectStore is already pluggable. 4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests. 3' seems like the easiest. For now we will disable SQL by default.
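The "disable SQL by default" step described above comes down to one metastore setting; a sketch, assuming the config key introduced by HIVE-4051 is {{hive.metastore.try.direct.sql}} (normally set in hive-site.xml on the metastore side, shown here as a SET command for brevity):

{code:sql}
-- When false, the metastore always takes the JDO/ORM path
-- instead of issuing direct SQL against the backing database.
SET hive.metastore.try.direct.sql=false;
{code}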
[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX
[ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753163#comment-13753163 ] indrajit commented on HIVE-951: --- CREATE EXTERNAL TABLE allows users to use a table on top of HDFS. It's a good feature, and it does not check whether the supplied path exists; after creating the table you can lazily create the path. Selectively include EXTERNAL TABLE source files via REGEX - Key: HIVE-951 URL: https://issues.apache.org/jira/browse/HIVE-951 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-951.patch CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression. CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and currently makes the assumption that all of the files located under the supplied path should be included in the new table. Users frequently encounter directories containing multiple datasets, or directories that contain data in heterogeneous schemas, and it's often impractical or impossible to adjust the layout of the directory to meet the requirements of CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based on the contents of an S3 bucket. One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE as follows: CREATE EXTERNAL TABLE ... LOCATION path [file_regex] ...
For example: {code:sql} CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string ) STORED AS TEXTFILE LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$'; {code} Creates mytable1, which includes all files in s3://my.bucket with a filename matching 'folder/2009.*\.bz2$' {code:sql} CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int ) STORED AS TEXTFILE LOCATION 'hdfs://data/' 'xyz.*2009.bz2$'; {code} Creates mytable2, including all files matching 'xyz.*2009.bz2$' located under hdfs://data/
[jira] [Updated] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4964: --- Assignee: Harish Butani Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, HIVE-4964.D12585.1.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing perf improvements.
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753169#comment-13753169 ] Phabricator commented on HIVE-4964: --- ashutoshc has accepted the revision HIVE-4964 [jira] Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced. +1 REVISION DETAIL https://reviews.facebook.net/D12585 BRANCH HIVE-4964-2 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753175#comment-13753175 ] Sergey Shelukhin commented on HIVE-5029: Hive QA passed
[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5158: -- Attachment: HIVE-5158.D12573.3.patch sershe updated the revision HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path. Adding the limit support to this and other call... tests are running Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12573 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12573?vs=39201id=39237#toc MANIPHEST TASKS https://reviews.facebook.net/T63 AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java To: JIRA, ashutoshc, sershe allow getting all partitions for table to also use direct SQL path -- Key: HIVE-5158 URL: https://issues.apache.org/jira/browse/HIVE-5158 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch, HIVE-5158.D12573.3.patch While testing some queries I noticed that getPartitions can be very slow (which happens e.g. in non-strict mode with no partition column filter); with a table with many partitions it can take 10-12s easily. SQL perf path can also be used for this path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
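To illustrate the slow path described above, a hedged sketch of the kind of statement that forces an unfiltered getPartitions call (the table name is hypothetical; hive.mapred.mode is the standard strict/nonstrict switch):

{code:sql}
-- With no predicate on a partition column, the metastore must
-- materialize every partition object for the table; in strict mode
-- this query would be rejected outright.
SET hive.mapred.mode=nonstrict;
SELECT * FROM sales_by_day;
{code}

With many partitions this metadata fetch alone can dominate query startup, which is why routing it through the direct SQL path matters.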
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753217#comment-13753217 ] Phabricator commented on HIVE-5029: --- ashutoshc has commented on the revision HIVE-5029 [jira] direct SQL perf optimization cannot be tested well. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1684 Before throwing exceptions, don't we need to rollbackTransaction() ? metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1783 Before throwing exception, don't we need to rollbackTransaction() ? REVISION DETAIL https://reviews.facebook.net/D12483 To: JIRA, ashutoshc, sershe
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753233#comment-13753233 ] Roshan Naik commented on HIVE-5107: --- Curious: is ant's 'makepom' task (to convert an ivy file into a pom file) a useful starting point for such an effort? Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow.
[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path
[ https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753248#comment-13753248 ] Phabricator commented on HIVE-5158: --- ashutoshc has accepted the revision HIVE-5158 [jira] allow getting all partitions for table to also use direct SQL path. +1 REVISION DETAIL https://reviews.facebook.net/D12573 BRANCH HIVE-5158 ARCANIST PROJECT hive To: JIRA, ashutoshc, sershe
[jira] [Commented] (HIVE-5102) ORC getSplits should create splits based on the stripes
[ https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753305#comment-13753305 ] Hive QA commented on HIVE-5102: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12600472/HIVE-5102.D12579.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2907 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testFileGenerator {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/554/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/554/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ORC getSplits should create splits based on the stripes - Key: HIVE-5102 URL: https://issues.apache.org/jira/browse/HIVE-5102 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5102.D12579.1.patch Currently ORC inherits getSplits from FileInputFormat, which basically makes a split per HDFS block. This can create too little parallelism and would be better done by having getSplits look at the file footer and create splits based on the stripes.
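For context on the stripes mentioned above: stripe size is a per-table ORC property set at creation time, so under the proposed getSplits each stripe could become its own split. A hedged sketch (table name and the 64 MB value are illustrative; "orc.stripe.size" is the standard ORC table property):

{code:sql}
-- Smaller stripes would yield more splits, and hence more map-side
-- parallelism, under stripe-based split generation.
CREATE TABLE logs_orc (msg STRING)
STORED AS ORC
TBLPROPERTIES ("orc.stripe.size"="67108864");
{code}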
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753307#comment-13753307 ] Hudson commented on HIVE-3562: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2295 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2295/]) HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518234) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/ql/build.xml * /hive/trunk/ql/ivy.xml * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java * /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q * /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q * /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out * /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out Some limit can be pushed down to map stage -- Key: HIVE-3562 URL: https://issues.apache.org/jira/browse/HIVE-3562 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.12.0 Attachments: 
HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch Queries with a limit clause (with a reasonable number), for example {noformat} select * from src order by key limit 10; {noformat} make the operator tree TS-SEL-RS-EXT-LIMIT-FS. But LIMIT can be partially calculated in RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
[jira] [Commented] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753308#comment-13753308 ] Hudson commented on HIVE-5128: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2295 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2295/]) HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518258) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java Direct SQL for view is failing --- Key: HIVE-5128 URL: https://issues.apache.org/jira/browse/HIVE-5128 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.12.0 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch I cannot be sure of this, but it fails when dropping views (it falls back to JPA and works fine): {noformat} metastore.ObjectStore: Direct SQL failed, falling back to ORM MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758) ... {noformat} Should it be disabled for views, or can it be fixed?