[jira] [Commented] (HIVE-3430) group by followed by join with the same key should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590321#comment-13590321 ]
Lianhui Wang commented on HIVE-3430:
We should also consider the following query:
SELECT a.key, a.cnt, b.key, a.cnt FROM (SELECT x.key as key, count(x.value) AS cnt FROM src x group by x.key) a JOIN src b ON (a.key = b.key);

group by followed by join with the same key should be optimized
Key: HIVE-3430
URL: https://issues.apache.org/jira/browse/HIVE-3430
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Namit Jain

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4095) Add exchange partition in Hive
Namit Jain created HIVE-4095:
Summary: Add exchange partition in Hive
Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain

It would be very useful to support exchange partition in Hive, similar to the Oracle feature described at http://www.orafaq.com/node/2570.
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590406#comment-13590406 ]
Namit Jain commented on HIVE-4095:
More details can be found at: https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition

Add exchange partition in Hive
Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain

It would be very useful to support exchange partition in Hive, similar to the Oracle feature described at http://www.orafaq.com/node/2570.
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590410#comment-13590410 ]
Namit Jain commented on HIVE-4095:
It might be easier to have a syntax closer to http://www.techrepublic.com/blog/datacenter/partition-switching-in-sql-server-2005/143

Add exchange partition in Hive
Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain

It would be very useful to support exchange partition in Hive, similar to the Oracle feature described at http://www.orafaq.com/node/2570.
[jira] [Issue Comment Deleted] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-4095:
Comment: was deleted
(was: It might be easier to have a syntax closer to http://www.techrepublic.com/blog/datacenter/partition-switching-in-sql-server-2005/143)

Add exchange partition in Hive
Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain

It would be very useful to support exchange partition in Hive, similar to the Oracle feature described at http://www.orafaq.com/node/2570.
[jira] [Created] (HIVE-4096) problem in hive.map.groupby.sorted with distincts
Namit Jain created HIVE-4096:
Summary: problem in hive.map.groupby.sorted with distincts
Key: HIVE-4096
URL: https://issues.apache.org/jira/browse/HIVE-4096
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain

set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
set hive.exec.reducers.max = 10;
set hive.map.groupby.sorted=true;

CREATE TABLE T1(key STRING, val STRING) PARTITIONED BY (ds string)
CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1 PARTITION (ds='1');

-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE T1 PARTITION (ds='1') select key, val from T1 where ds = '1';

CREATE TABLE outputTbl1(cnt INT);

-- The plan should be converted to a map-side group by, since the
-- sorting columns and grouping columns match, and all the bucketing columns
-- are part of sorting columns
EXPLAIN
select count(distinct key) from T1;
select count(distinct key) from T1;

explain
INSERT OVERWRITE TABLE outputTbl1
select count(distinct key) from T1;

INSERT OVERWRITE TABLE outputTbl1
select count(distinct key) from T1;

SELECT * FROM outputTbl1;

DROP TABLE T1;

The above query gives wrong results.
[jira] [Updated] (HIVE-4073) Make partition by optional in over clause
[ https://issues.apache.org/jira/browse/HIVE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland updated HIVE-4073:
Attachment: HIVE-4073-2.patch

Attached patch seems to work well. All the ptf tests pass and the query discussed above fails.

Make partition by optional in over clause
Key: HIVE-4073
URL: https://issues.apache.org/jira/browse/HIVE-4073
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Brock Noland
Attachments: HIVE-4073-0.patch, HIVE-4073-1.patch, HIVE-4073-2.patch

select s, sum( i ) over() from tt;
should work.
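For readers unfamiliar with the request in HIVE-4073: in standard SQL, an empty OVER() clause treats the whole input as a single partition, so every row carries the same aggregate value. The Java sketch below only illustrates that expected result for "select s, sum( i ) over() from tt"; it is not Hive's PTF code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyOverSketch {
    /**
     * Mimics "select s, sum(i) over() from tt": with no PARTITION BY,
     * all rows form one partition, so each row is paired with the grand total.
     * The two lists stand in for the columns s and i and must have equal length.
     */
    static List<String> sumOverAll(List<String> s, List<Integer> i) {
        long total = 0;
        for (int value : i) {
            total += value;
        }
        List<String> rows = new ArrayList<>();
        for (String name : s) {
            rows.add(name + "\t" + total);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three input rows; every output row carries the same sum.
        for (String row : sumOverAll(Arrays.asList("a", "b", "c"), Arrays.asList(1, 2, 3))) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is only that no partitioning key is needed to define the result, which is why the clause can reasonably be optional.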
[jira] [Updated] (HIVE-4073) Make partition by optional in over clause
[ https://issues.apache.org/jira/browse/HIVE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland updated HIVE-4073:
Status: Patch Available (was: Open)

Make partition by optional in over clause
Key: HIVE-4073
URL: https://issues.apache.org/jira/browse/HIVE-4073
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Brock Noland
Attachments: HIVE-4073-0.patch, HIVE-4073-1.patch, HIVE-4073-2.patch

select s, sum( i ) over() from tt;
should work.
Hive-trunk-h0.21 - Build # 1994 - Fixed
Changes for Build #1992
Changes for Build #1993
Changes for Build #1994

All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1994)
Status: Fixed
Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1994/ to view the results.
[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HIVE-3874:
Status: Patch Available (was: Open)

Pamela,
Yeah, that probably makes sense. I'll file the follow-up jiras.

Create a new Optimized Row Columnar file format for Hive
Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, HIVE-3874.D8529.2.patch, HIVE-3874.D8529.3.patch, HIVE-3874.D8529.4.patch, HIVE-3874.D8871.1.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows isn't stored in the file
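The last two limitations are easier to picture with a small example. The Java sketch below shows one way a lightweight index could let a push-down filter skip entire row groups: keep a minimum and maximum per row group and read only the groups whose range can contain the filter value. This is an illustration of the general idea only, not the ORC file layout or reader API; RowGroupStats and groupsToRead are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupSkipSketch {
    /** Hypothetical index entry: min and max of one long column within a row group. */
    static final class RowGroupStats {
        final long min;
        final long max;

        RowGroupStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the positions of the row groups that may contain a row with
     * column == target. Groups whose [min, max] range excludes the target
     * can be skipped without being read or decompressed.
     */
    static List<Integer> groupsToRead(List<RowGroupStats> index, long target) {
        List<Integer> keep = new ArrayList<>();
        for (int g = 0; g < index.size(); g++) {
            RowGroupStats stats = index.get(g);
            if (target >= stats.min && target <= stats.max) {
                keep.add(g);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<RowGroupStats> index = new ArrayList<>();
        index.add(new RowGroupStats(0, 99));
        index.add(new RowGroupStats(100, 199));
        index.add(new RowGroupStats(150, 400));
        // Only the groups whose range covers 160 need to be read.
        System.out.println(groupsToRead(index, 160));
    }
}
```

A range check like this can only rule groups out; a group that passes the check still has to be read and filtered row by row.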
[jira] [Created] (HIVE-4097) ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
Owen O'Malley created HIVE-4097:
Summary: ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
Key: HIVE-4097
URL: https://issues.apache.org/jira/browse/HIVE-4097
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

Hive assumes that an empty string in hive.io.file.readcolumn.ids means all columns. The ORC reader currently assumes it means no columns.
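The convention described in HIVE-4097 can be sketched in a few lines of Java. The example assumes the property value is a comma-separated list of column positions, and it treats a null value like an empty one; both are assumptions made for illustration. The class and method names are hypothetical and this is not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadColumnIdsSketch {
    /**
     * Interprets a hive.io.file.readcolumn.ids style value.
     * An empty (or, by assumption, null) value means "read every column";
     * otherwise the value is parsed as comma-separated column positions.
     */
    static List<Integer> columnsToRead(String ids, int totalColumns) {
        List<Integer> result = new ArrayList<>();
        if (ids == null || ids.trim().isEmpty()) {
            for (int c = 0; c < totalColumns; c++) {
                result.add(c);
            }
            return result;
        }
        for (String part : ids.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(columnsToRead("", 4));     // all four columns
        System.out.println(columnsToRead("0,2", 4));  // only the listed columns
    }
}
```

A reader that instead returned an empty list for the empty string would read no columns at all, which is the mismatch the issue reports.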
[jira] [Created] (HIVE-4098) OrcInputFormat assumes Hive always calls createValue
Owen O'Malley created HIVE-4098:
Summary: OrcInputFormat assumes Hive always calls createValue
Key: HIVE-4098
URL: https://issues.apache.org/jira/browse/HIVE-4098
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

Hive's HiveContextAwareRecordReader doesn't create a new value for each InputFormat and instead reuses the same row between input formats. That causes the first record of the second (and third, etc.) partition to be dropped and replaced with the last row of the previous partition.
[jira] [Created] (HIVE-4099) Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Zafar Gilani created HIVE-4099:
Summary: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Key: HIVE-4099
URL: https://issues.apache.org/jira/browse/HIVE-4099
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.1
Environment: GNU/Linux x86_64, kernel 2.6.32-131.0.15.e16.x86_64, 16 cores, 48 GB main memory, 16 mappers, 8 reducers, mapred.java.child.opts set to 2g.
Reporter: Zafar Gilani

A join query fails with "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask".

hive.log:
ERROR exec.MapredLocalTask (SessionState.java:printError(365))
ERROR ql.Driver (SessionState.java:printError(365))

Select and insert queries work fine. Even the simplest join fails.

Data-set size: the two tables being joined have 27k records each, each record having three fields.

Already tried, without success:
- Add contrib jar to the Hive classpath
- Set Hadoop mapred.child.java.opts to 2 to 8g of memory
- Set Hive mapred.child.java.opts to 2 to 8g of memory
- Set hive.auto.convert.join to true (regular join to mapjoin)
- Set hive.optimize.skewjoin to true (handle skewness in data)
- Set hive.mapjoin.maxsize to 100 (small table rows, both tables have 27k rows)
[jira] [Commented] (HIVE-4015) Add ORC file to the grammar as a file format
[ https://issues.apache.org/jira/browse/HIVE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590722#comment-13590722 ]
Owen O'Malley commented on HIVE-4015:
Gunther, this looks good. I'd suggest removing the code that lets you override the serde, since with ORC you really don't want to do that.

Add ORC file to the grammar as a file format
Key: HIVE-4015
URL: https://issues.apache.org/jira/browse/HIVE-4015
Project: Hive
Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Gunther Hagleitner
Attachments: HIVE-4015.1.patch, HIVE-4015.2.patch, HIVE-4015.3.patch

It would be much more convenient for users if we enable them to use ORC as a file format in the HQL grammar.
[jira] [Updated] (HIVE-4097) ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
[ https://issues.apache.org/jira/browse/HIVE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4097:
Attachment: HIVE-4097.D9015.1.patch

omalley requested code review of "HIVE-4097 [jira] ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids".
Reviewers: JIRA

HIVE-4097
Hive assumes that an empty string in hive.io.file.readcolumn.ids means all columns. The ORC reader currently assumes it means no columns.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D9015

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21861/

To: JIRA, omalley

ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
Key: HIVE-4097
URL: https://issues.apache.org/jira/browse/HIVE-4097
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: HIVE-4097.D9015.1.patch

Hive assumes that an empty string in hive.io.file.readcolumn.ids means all columns. The ORC reader currently assumes it means no columns.
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590757#comment-13590757 ]
Kevin Wilfong commented on HIVE-3874:
Thanks Pam and Owen. +1 again

Create a new Optimized Row Columnar file format for Hive
Key: HIVE-3874
URL: https://issues.apache.org/jira/browse/HIVE-3874
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, HIVE-3874.D8529.2.patch, HIVE-3874.D8529.3.patch, HIVE-3874.D8529.4.patch, HIVE-3874.D8871.1.patch, OrcFileIntro.pptx, orc.tgz
[jira] [Commented] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590769#comment-13590769 ]
Li Yang commented on HIVE-4045:
Namit,
Can you please review this change?
Thanks,
Li

Modify PreDropPartitionEvent to pass Table parameter
Key: HIVE-4045
URL: https://issues.apache.org/jira/browse/HIVE-4045
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Li Yang
Assignee: Li Yang
Priority: Minor

MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to.
[jira] [Work started] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-4045 started by Li Yang.

Modify PreDropPartitionEvent to pass Table parameter
Key: HIVE-4045
URL: https://issues.apache.org/jira/browse/HIVE-4045
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Li Yang
Assignee: Li Yang
Priority: Minor

MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to.
[jira] [Updated] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Yang updated HIVE-4045:
Status: Patch Available (was: In Progress)

Modify PreDropPartitionEvent to pass Table parameter
Key: HIVE-4045
URL: https://issues.apache.org/jira/browse/HIVE-4045
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Li Yang
Assignee: Li Yang
Priority: Minor

MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to.
[jira] [Created] (HIVE-4100) Improve regex_replace UDF to allow non-ascii characters
Mark Grover created HIVE-4100:
Summary: Improve regex_replace UDF to allow non-ascii characters
Key: HIVE-4100
URL: https://issues.apache.org/jira/browse/HIVE-4100
Project: Hive
Issue Type: Improvement
Components: UDF
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Mark Grover
Fix For: 0.11.0

There have been a few email threads on the user mailing list regarding the regex_replace UDF not supporting non-ASCII characters. We should validate that and improve the UDF to allow it. The Translate UDF will be a good reference, since it handles this by using code points instead of characters.
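To show why working with code points rather than characters matters, here is a minimal Java sketch. A Java char is a 16-bit UTF-16 unit, so a character outside the Basic Multilingual Plane occupies two chars, and char-by-char processing can split it. The sketch walks the string by code point instead. It is not the Translate UDF's code; the class and method names are hypothetical.

```java
public class CodePointReplaceSketch {
    /**
     * Replaces every occurrence of the code point 'from' with 'to',
     * advancing by whole code points so surrogate pairs are never split.
     */
    static String replaceCodePoint(String input, int from, int to) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            out.appendCodePoint(cp == from ? to : cp);
            i += Character.charCount(cp); // 1 for BMP characters, 2 for surrogate pairs
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // number of UTF-16 units
        System.out.println(s.codePointCount(0, s.length())); // number of code points
        System.out.println(replaceCodePoint(s, 0x1F600, '?'));
    }
}
```

Whether regex_replace actually mishandles such input still needs to be validated, as the issue says; the sketch only shows the technique the issue points to.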
[jira] [Created] (HIVE-4101) Partition By field must be in select field list
Brock Noland created HIVE-4101:
Summary: Partition By field must be in select field list
Key: HIVE-4101
URL: https://issues.apache.org/jira/browse/HIVE-4101
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Brock Noland

The following query:
{noformat}
SELECT year, quarter, sales,avg(sales) OVER (PARTITION BY department, year)
FROM quarterly_sales
WHERE department = 'Appliances';
{noformat}
fails as below. If department is moved to the select field list it passes.
{noformat}
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:160)
    ... 14 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:_col1, 1:_col2, 2:_col3]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:143)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.PTFOperator.setupKeysWrapper(PTFOperator.java:193)
    at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:100)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:152)
    ... 14 more
{noformat}
[jira] [Commented] (HIVE-4101) Partition By field must be in select field list
[ https://issues.apache.org/jira/browse/HIVE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590789#comment-13590789 ]
Ashutosh Chauhan commented on HIVE-4101:
This is the same as HIVE-4085. It is caused by HIVE-4035, before which such a query used to succeed.

Partition By field must be in select field list
Key: HIVE-4101
URL: https://issues.apache.org/jira/browse/HIVE-4101
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Brock Noland
[jira] [Resolved] (HIVE-4101) Partition By field must be in select field list
[ https://issues.apache.org/jira/browse/HIVE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland resolved HIVE-4101.
Resolution: Duplicate

Thanks! Resolving as dup..

Partition By field must be in select field list
Key: HIVE-4101
URL: https://issues.apache.org/jira/browse/HIVE-4101
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Brock Noland
[jira] [Assigned] (HIVE-3985) Update new UDAFs introduced for Windowing to work with new Decimal Type
[ https://issues.apache.org/jira/browse/HIVE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland reassigned HIVE-3985:
Assignee: Brock Noland

Update new UDAFs introduced for Windowing to work with new Decimal Type
Key: HIVE-3985
URL: https://issues.apache.org/jira/browse/HIVE-3985
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Brock Noland
[jira] [Commented] (HIVE-4100) Improve regex_replace UDF to allow non-ascii characters
[ https://issues.apache.org/jira/browse/HIVE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590803#comment-13590803 ]
thattommyhall commented on HIVE-4100:
The particular case I had was
regexp_replace(some_column,[^\\u-\\u],\ufffd)
does not work, whereas
regexp_replace(some_column,[^\\u-\\u],�)
does. So we need a way to specify unicode chars in the replace string.

Improve regex_replace UDF to allow non-ascii characters
Key: HIVE-4100
URL: https://issues.apache.org/jira/browse/HIVE-4100
Project: Hive
Issue Type: Improvement
Components: UDF
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Mark Grover
Fix For: 0.11.0

There have been a few email threads on the user mailing list regarding the regex_replace UDF not supporting non-ASCII characters. We should validate that and improve the UDF to allow it. The Translate UDF will be a good reference, since it handles this by using code points instead of characters.
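One possible way to "specify unicode chars in the replace string", sketched in Java under stated assumptions: expand backslash-u escapes with four hex digits in the replacement text before handing it to the regex engine. This is only an illustration of the idea, not a proposed patch for the UDF; UnicodeEscapeSketch and unescape are hypothetical names, and only four-digit escapes are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeSketch {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Expands each four-digit escape in the text into the character it names. */
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The replacement arrives as six plain characters: backslash, u, f, f, f, d.
        String replacement = unescape("\\ufffd");
        String cleaned = "caf\u00e9".replaceAll("[^a-z]", Matcher.quoteReplacement(replacement));
        System.out.println(cleaned);
    }
}
```

Expanding escapes up front keeps the regex call itself unchanged, at the cost of making a literal backslash-u sequence in the replacement impossible to express without a further escaping rule.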
[jira] [Updated] (HIVE-4093) Remove sprintf from PTFTranslator and use String.format()
[ https://issues.apache.org/jira/browse/HIVE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4093: --- Status: Patch Available (was: Open) Remove sprintf from PTFTranslator and use String.format() - Key: HIVE-4093 URL: https://issues.apache.org/jira/browse/HIVE-4093 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-4093-0.patch, HIVE-4093-1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4045: Status: Open (was: Patch Available) Modify PreDropPartitionEvent to pass Table parameter Key: HIVE-4045 URL: https://issues.apache.org/jira/browse/HIVE-4045 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Li Yang Assignee: Li Yang Priority: Minor MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590867#comment-13590867 ] Kevin Wilfong commented on HIVE-4045: - Comments on Phabricator Modify PreDropPartitionEvent to pass Table parameter Key: HIVE-4045 URL: https://issues.apache.org/jira/browse/HIVE-4045 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Li Yang Assignee: Li Yang Priority: Minor MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4102) Can't drop table with Postgresql metastore
sekine coulibaly created HIVE-4102:
Summary: Can't drop table with Postgresql metastore
Key: HIVE-4102
URL: https://issues.apache.org/jira/browse/HIVE-4102
Project: Hive
Issue Type: Bug
Components: Database/Schema
Affects Versions: 0.10.0
Environment: Centos 6.3 CDH 4.2.0
Reporter: sekine coulibaly

Set up a fresh Hive install and create a table pointing to an HDFS file. Then, when trying to drop that table, the CLI hangs for a while and then displays:
hive> drop table log;
FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Trying another time:
hive> drop table log;
FAILED: SemanticException [Error 10001]: Table not found log

Getting the tables list:
hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive>

For this last query, the Postgresql logs show the following:
LOG: connection received: host=127.0.0.1 port=49717
LOG: connection authorized: user=hiveuser database=metastore
LOG: execute unnamed: SHOW TRANSACTION ISOLATION LEVEL
LOG: execute S_1: BEGIN
LOG: execute unnamed: SELECT 'org.apache.hadoop.hive.metastore.model.MDatabase' AS NUCLEUS_TYPE,THIS.DESC,THIS.DB_LOCATION_URI,THIS.NAME,THIS.DB_ID FROM DBS THIS WHERE THIS.NAME = $1
DETAIL: parameters: $1 = 'default'
LOG: execute unnamed: SELECT A0.PARAM_KEY,A0.PARAM_VALUE FROM DATABASE_PARAMS A0 WHERE A0.DB_ID = $1 AND A0.PARAM_KEY IS NOT NULL
DETAIL: parameters: $1 = '1'
LOG: execute S_2: COMMIT
LOG: execute unnamed: SHOW TRANSACTION ISOLATION LEVEL
LOG: execute S_1: BEGIN
WARNING: nonstandard use of \\ in a string literal at character 234
HINT: Use the escape string syntax for backslashes, e.g., E'\\'. (standard_conforming_strings = off).

Would this help?
http://mapredit.blogspot.fr/2012/12/hive-drop-table-hangs-postgres-metastore.html
Hive-trunk-h0.21 - Build # 1995 - Failure
Changes for Build #1995 No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1995) Status: Failure Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1995/ to view the results.
[jira] [Updated] (HIVE-4097) ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
[ https://issues.apache.org/jira/browse/HIVE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4097: Status: Patch Available (was: Open) This patch fixes the problem and adds a test case to ensure that the empty string is correctly handled. ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids - Key: HIVE-4097 URL: https://issues.apache.org/jira/browse/HIVE-4097 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-4097.D9015.1.patch Hive assumes that an empty string in hive.io.file.readcolumn.ids means all columns. The ORC reader currently assumes it means no columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
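The mismatch described above can be illustrated with a small sketch. It assumes the property holds a comma-separated list of column indexes; the class and method names are hypothetical and this is not the ORC reader's actual code. The only point is that an empty value is read as "all columns" rather than "no columns".

```java
import java.util.Arrays;

// Hypothetical helper; not the actual ORC reader code.
final class ReadColumnIdsSketch {
    /**
     * Builds an include mask for numColumns columns from a comma-separated
     * id list. A null or blank list is interpreted as "all columns".
     */
    static boolean[] includedColumns(String ids, int numColumns) {
        boolean[] include = new boolean[numColumns];
        if (ids == null || ids.trim().isEmpty()) {
            Arrays.fill(include, true);   // empty setting: read everything
            return include;
        }
        for (String id : ids.split(",")) {
            int index = Integer.parseInt(id.trim());
            if (index >= 0 && index < numColumns) {
                include[index] = true;    // ids outside the schema are ignored
            }
        }
        return include;
    }
}
```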
[jira] [Created] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
Gopal V created HIVE-4103:
Summary: Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
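As a rough illustration of checking memory pressure without forcing a collection, here is a minimal sketch. It is not the attached patch and not Hive's HashMapWrapper; the class name, the threshold parameter, and the use of MemoryMXBean are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical stand-in for the memory check; not Hive's HashMapWrapper.
final class HeapUsageCheck {
    private final MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
    private final double maxUsageRatio;

    HeapUsageCheck(double maxUsageRatio) {
        this.maxUsageRatio = maxUsageRatio;
    }

    /** Fraction of the maximum heap currently in use, read without calling System.gc(). */
    double usageRatio() {
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        long max = heap.getMax();
        if (max <= 0) {
            // getMax() returns -1 when the maximum is undefined; fall back to the runtime's limit.
            max = Runtime.getRuntime().maxMemory();
        }
        return (double) heap.getUsed() / (double) max;
    }

    /** True when heap usage exceeds the configured ratio. */
    boolean shouldAbort() {
        return usageRatio() > maxUsageRatio;
    }
}
```

The trade-off is that, without a preceding collection, the used figure can include objects that are already unreachable, so the ratio may overestimate live data.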
[jira] [Updated] (HIVE-4098) OrcInputFormat assumes Hive always calls createValue
[ https://issues.apache.org/jira/browse/HIVE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4098:
Attachment: HIVE-4098.D9021.1.patch

omalley requested code review of HIVE-4098 [jira] OrcInputFormat assumes Hive always calls createValue.
Reviewers: JIRA

hive-4098 remove assumption that only an inputformat's createValue() is used in the next() calls.

Hive's HiveContextAwareRecordReader doesn't create a new value for each InputFormat and instead reuses the same row between input formats. That causes the first record of second (and third, etc.) partition to be dropped and replaced with the last row of the previous partition.

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D9021

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java

MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/21897/

To: JIRA, omalley

OrcInputFormat assumes Hive always calls createValue
Key: HIVE-4098
URL: https://issues.apache.org/jira/browse/HIVE-4098
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Attachments: HIVE-4098.D9021.1.patch

Hive's HiveContextAwareRecordReader doesn't create a new value for each InputFormat and instead reuses the same row between input formats. That causes the first record of second (and third, etc.) partition to be dropped and replaced with the last row of the previous partition.
[jira] [Updated] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4103:
Priority: Minor (was: Major)

Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Priority: Minor

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
[jira] [Updated] (HIVE-4098) OrcInputFormat assumes Hive always calls createValue
[ https://issues.apache.org/jira/browse/HIVE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4098: Status: Patch Available (was: Open) The patch removes the assumption of a dedicated row for each RecordReader. OrcInputFormat assumes Hive always calls createValue Key: HIVE-4098 URL: https://issues.apache.org/jira/browse/HIVE-4098 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-4098.D9021.1.patch Hive's HiveContextAwareRecordReader doesn't create a new value for each InputFormat and instead reuses the same row between input formats. That causes the first record of second (and third, etc.) partition to be dropped and replaced with the last row of the previous partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4071) Map-join outer join produces incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4071: - Attachment: HIVE-4071_2.patch Updated to remove the dead code. Still needs work to address tests comment. Map-join outer join produces incorrect results. --- Key: HIVE-4071 URL: https://issues.apache.org/jira/browse/HIVE-4071 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4071_2.patch, HIVE-4071.patch For example, if one sets the size of noConditionalTask.size to 10 with corresponding auto join configurations set to true in auto_join28.q instead of the current smalltable.filesize configuration, we will observe different results if a select query is run. (The test only has explain statements at present). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590981#comment-13590981 ] Carl Steinbach commented on HIVE-2655: -- @Brock: Can you open a new phabricator review request for your updated version of the patch? Thanks. Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Brock Noland Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch, HIVE-2655-9.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4103:
Attachment: HIVE-4103.patch
Remove the thread-stopping System.gc() calls from isAbort()

Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Priority: Minor
Attachments: HIVE-4103.patch

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590986#comment-13590986 ] Brock Noland commented on HIVE-2655: [~cwsteinbach] I did here https://reviews.facebook.net/D8673 and setup as a weblink not sure why it's not updating the JIRA automatically. Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Brock Noland Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch, HIVE-2655-9.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590988#comment-13590988 ]
Gopal V commented on HIVE-4103:
On a run, the difference was
{code}
2013-03-01 04:57:21 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-53_785_1192800933446838868/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:57:21 End of local task; Time Taken: 22.426 sec.
{code}
versus, after the fix,
{code}
2013-03-01 04:56:26 Upload 1 File to: file:/tmp/root/hive_2013-03-01_16-56-01_539_5116929752955084952/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 04:56:26 End of local task; Time Taken: 19.874 sec.
{code}

Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Priority: Minor
Attachments: HIVE-4103.patch

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
[jira] [Updated] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4103:
Release Note: Remove System.gc() calls from HashMapWrapper::isAbort() to avoid slow-downs during the local task of the map-join
Status: Patch Available (was: Open)

Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Priority: Minor
Attachments: HIVE-4103.patch

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
[jira] [Assigned] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V reassigned HIVE-4103:
Assignee: Gopal V

Remove System.gc() call from the map-join local-task loop
Key: HIVE-4103
URL: https://issues.apache.org/jira/browse/HIVE-4103
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: HIVE-4103.patch

Hive's HashMapWrapper calls System.gc() twice within HashMapWrapper::isAbort(), which produces a significant slow-down during the loop.
{code}
2013-03-01 04:54:28 The gc calls took 677 ms
2013-03-01 04:54:28 Processing rows:20 Hashtable size: 19 Memory usage: 62955432 rate: 0.033
2013-03-01 04:54:31 The gc calls took 956 ms
2013-03-01 04:54:31 Processing rows:30 Hashtable size: 29 Memory usage: 90826656 rate: 0.048
2013-03-01 04:54:33 The gc calls took 967 ms
2013-03-01 04:54:33 Processing rows:384160 Hashtable size: 384160 Memory usage: 114412712 rate: 0.06
{code}
[jira] [Created] (HIVE-4104) Hive localtask does not buffer disk-writes or reads
Gopal V created HIVE-4104:
Summary: Hive localtask does not buffer disk-writes or reads
Key: HIVE-4104
URL: https://issues.apache.org/jira/browse/HIVE-4104
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor

Hive's HashMapWrapper does not use any buffering in its file I/O, but operates sequentially for writes and reads. The strace logs show this clearly:
{code}
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\5, 6) = 6
9495 write(222, w\25, 2) = 2
9495 write(222, \0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\5\3\1M\1S, 21) = 21
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\2, 6) = 6
9495 write(222, w\t, 2) = 2
9495 write(222, \0\0\0\5\1\215\r\325v, 9) = 9
{code}
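The effect of the missing buffer can be reproduced in isolation. The sketch below is not the HIVE-4104 patch: CountingSink and BufferedDumpSketch are hypothetical names, the sink counts write calls as a stand-in for the write(2) syscalls seen in strace, and the 64 KB buffer size is an arbitrary choice for the example. It serializes the same key/value pairs through ObjectOutputStream with and without a BufferedOutputStream in between.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sink that counts how many write calls reach it.
final class CountingSink extends OutputStream {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int writeCalls = 0;

    @Override
    public void write(int b) {
        writeCalls++;
        bytes.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        writeCalls++;
        bytes.write(b, off, len);
    }

    int writeCalls() {
        return writeCalls;
    }

    int size() {
        return bytes.size();
    }
}

// Hypothetical dump routine; not Hive's HashMapWrapper.
final class BufferedDumpSketch {
    /** Serializes `rows` key/value pairs and returns the sink for inspection. */
    static CountingSink dump(int rows, boolean buffered) {
        CountingSink sink = new CountingSink();
        // The only difference between the two modes is the BufferedOutputStream wrapper.
        OutputStream target = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            for (int i = 0; i < rows; i++) {
                out.writeObject(Integer.valueOf(i));   // key
                out.writeObject("value-" + i);         // value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }
}
```

Calling `BufferedDumpSketch.dump(1000, false)` and `BufferedDumpSketch.dump(1000, true)` produces the same number of bytes, but the buffered variant reaches the sink in far fewer write calls.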
[jira] [Commented] (HIVE-4104) Hive localtask does not buffer disk-writes or reads
[ https://issues.apache.org/jira/browse/HIVE-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591008#comment-13591008 ]
Gopal V commented on HIVE-4104:
Before
{code}
2013-03-01 05:15:13 Dump the hashtable into file: file:/tmp/root/hive_2013-03-01_17-14-59_468_442960319525994949/-local-10002/HashTable-Stage-1/MapJoin-customer_demographics-01--.hashtable
2013-03-01 05:15:27 Upload 1 File to: file:/tmp/root/hive_2013-03-01_17-14-59_468_442960319525994949/-local-10002/HashTable-Stage-1/MapJoin-customer_demographics-01--.hashtable File size: 18426794
2013-03-01 05:15:27 End of local task; Time Taken: 22.314 sec.
{code}
After
{code}
2013-03-01 05:15:53 Dump the hashtable into file: file:/tmp/root/hive_2013-03-01_17-15-39_668_1531738824783900468/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable
2013-03-01 05:15:54 Upload 1 File to: file:/tmp/root/hive_2013-03-01_17-15-39_668_1531738824783900468/-local-10002/HashTable-Stage-1/MapJoin-demographics-01--.hashtable File size: 18426794
2013-03-01 05:15:54 End of local task; Time Taken: 9.601 sec.
{code}
Savings are found on the map-side read as well.
Before
{code}
Job 0: Map: 4 Cumulative CPU: 64.79 sec HDFS Read: 300156 HDFS Write: 1682 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 4 seconds 790 msec
Time taken: 56.385 seconds, Fetched: 100 row(s)
{code}
After
{code}
Job 0: Map: 4 Cumulative CPU: 26.95 sec HDFS Read: 300156 HDFS Write: 1682 SUCCESS
Total MapReduce CPU Time Spent: 26 seconds 950 msec
Time taken: 38.173 seconds, Fetched: 100 row(s)
{code}

Hive localtask does not buffer disk-writes or reads
Key: HIVE-4104
URL: https://issues.apache.org/jira/browse/HIVE-4104
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor

Hive's HashMapWrapper does not use any buffering in its file I/O, but operates sequentially for writes and reads. The strace logs show this clearly:
{code}
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\5, 6) = 6
9495 write(222, w\25, 2) = 2
9495 write(222, \0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\5\3\1M\1S, 21) = 21
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\2, 6) = 6
9495 write(222, w\t, 2) = 2
9495 write(222, \0\0\0\5\1\215\r\325v, 9) = 9
{code}
[jira] [Commented] (HIVE-4104) Hive localtask does not buffer disk-writes or reads
[ https://issues.apache.org/jira/browse/HIVE-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591010#comment-13591010 ]
Brock Noland commented on HIVE-4104:
Nice find!

Hive localtask does not buffer disk-writes or reads
Key: HIVE-4104
URL: https://issues.apache.org/jira/browse/HIVE-4104
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: HIVE-4104.patch

Hive's HashMapWrapper does not use any buffering in its file I/O, but operates sequentially for writes and reads. The strace logs show this clearly:
{code}
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\5, 6) = 6
9495 write(222, w\25, 2) = 2
9495 write(222, \0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\5\3\1M\1S, 21) = 21
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\2, 6) = 6
9495 write(222, w\t, 2) = 2
9495 write(222, \0\0\0\5\1\215\r\325v, 9) = 9
{code}
[jira] [Updated] (HIVE-4104) Hive localtask does not buffer disk-writes or reads
[ https://issues.apache.org/jira/browse/HIVE-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4104:
Attachment: HIVE-4104.patch
Buffer I/O for HashMapWrapper

Hive localtask does not buffer disk-writes or reads
Key: HIVE-4104
URL: https://issues.apache.org/jira/browse/HIVE-4104
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: HIVE-4104.patch

Hive's HashMapWrapper does not use any buffering in its file I/O, but operates sequentially for writes and reads. The strace logs show this clearly:
{code}
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\5, 6) = 6
9495 write(222, w\25, 2) = 2
9495 write(222, \0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\5\3\1M\1S, 21) = 21
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\2, 6) = 6
9495 write(222, w\t, 2) = 2
9495 write(222, \0\0\0\5\1\215\r\325v, 9) = 9
{code}
[jira] [Updated] (HIVE-4104) Hive localtask does not buffer disk-writes or reads
[ https://issues.apache.org/jira/browse/HIVE-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-4104:
Release Note: Buffer I/O on HashMapWrapper to speed up write/read ops
Status: Patch Available (was: Open)

Hive localtask does not buffer disk-writes or reads
Key: HIVE-4104
URL: https://issues.apache.org/jira/browse/HIVE-4104
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Attachments: HIVE-4104.patch

Hive's HashMapWrapper does not use any buffering in its file I/O, but operates sequentially for writes and reads. The strace logs show this clearly:
{code}
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\5, 6) = 6
9495 write(222, w\25, 2) = 2
9495 write(222, \0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\5\3\1M\1S, 21) = 21
9495 write(222, x, 1) = 1
9495 write(222, sq\0~\0\2, 6) = 6
9495 write(222, w\t, 2) = 2
9495 write(222, \0\0\0\5\1\215\r\325v, 9) = 9
{code}
[jira] [Updated] (HIVE-3963) Allow Hive to connect to RDBMS
[ https://issues.apache.org/jira/browse/HIVE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxime LANCIAUX updated HIVE-3963: -- Description: I am thinking about something like : SELECT jdbcload('driver','url','user','password','sql') FROM dual; There is already a JIRA https://issues.apache.org/jira/browse/HIVE-1555 for JDBCStorageHandler was: I am thinking about something like : CREATE JDBCEXTERNAL TABLE ( col1 int, col2 string ) TBLPROPERTIES ... and/or SELECT jdbcload('driver','url','user','password','sql') FROM dual; Allow Hive to connect to RDBMS -- Key: HIVE-3963 URL: https://issues.apache.org/jira/browse/HIVE-3963 Project: Hive Issue Type: New Feature Components: Import/Export, JDBC, SQL, StorageHandler Affects Versions: 0.10.0, 0.9.1 Reporter: Maxime LANCIAUX I am thinking about something like : SELECT jdbcload('driver','url','user','password','sql') FROM dual; There is already a JIRA https://issues.apache.org/jira/browse/HIVE-1555 for JDBCStorageHandler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
[ https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591058#comment-13591058 ] Vinod Kumar Vavilapalli commented on HIVE-4014: --- Okay, I cannot reproduce this on trunk, though I was consistently hitting this on hive-0.10. I'll try hive-0.10 again to be sure some other patch fixed this. [~tamastarjanyi], what version are you using? Hive+RCFile is not doing column pruning and reading much more data than necessary - Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that HDFS bytes read counter doesn't show any reduction in the amount of data read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys
Vinod Kumar Vavilapalli created HIVE-4105:
Summary: Hive MapJoinOperator unnecessarily deserializes values for all join-keys
Key: HIVE-4105
URL: https://issues.apache.org/jira/browse/HIVE-4105
Project: Hive
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli

We can avoid this for inner joins. Hive does an explicit value de-serialization up front, even for those rows which won't emit output. In these cases, key de-serialization alone is enough.
[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys
[ https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated HIVE-4105:
Attachment: HIVE-4105-20130301.txt
Here's a patch to avoid value de-serialization where it is not needed for inner joins. In my microbenchmark, map-joining a big table with a small table, this brought the task execution time down from 15 seconds to 10 seconds on about 3 million records in the big table; the second table is very small and the output is small too. Note that you won't see this much of an improvement for non-selective inner joins. If folks are interested, I'll try productionizing the benchmark.

Hive MapJoinOperator unnecessarily deserializes values for all join-keys
Key: HIVE-4105
URL: https://issues.apache.org/jira/browse/HIVE-4105
Project: Hive
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Attachments: HIVE-4105-20130301.txt

We can avoid this for inner joins. Hive does an explicit value de-serialization up front, even for those rows which won't emit output. In these cases, key de-serialization alone is enough.
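The idea in the patch description can be sketched as a probe loop that de-serializes the key first and touches the value only when the key has a match. This is a simplified model under stated assumptions, not the attached patch or Hive's MapJoinOperator: rows are modelled as string pairs, the small table as a plain map, and InnerJoinProbeSketch with its counter is a hypothetical name introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a map-join probe; not Hive's MapJoinOperator.
final class InnerJoinProbeSketch {
    private final Map<Integer, String> smallTable;
    private int valueDeserializations = 0;

    InnerJoinProbeSketch(Map<Integer, String> smallTable) {
        this.smallTable = smallTable;
    }

    /** Each big-table row is {serializedKey, serializedValue}. */
    List<String> join(List<String[]> bigTableRows) {
        List<String> output = new ArrayList<>();
        for (String[] row : bigTableRows) {
            int key = Integer.parseInt(row[0]);        // key de-serialization is always needed
            String match = smallTable.get(key);
            if (match == null) {
                continue;                              // inner join: no match, no output, value left untouched
            }
            String value = deserializeValue(row[1]);   // only rows that emit output pay this cost
            output.add(key + "," + value + "," + match);
        }
        return output;
    }

    // Stand-in for an expensive value de-serialization step.
    private String deserializeValue(String serialized) {
        valueDeserializations++;
        return serialized.trim();
    }

    int valueDeserializations() {
        return valueDeserializations;
    }
}
```

The saving scales with how selective the join is, which matches the note above that non-selective inner joins benefit less.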
[jira] [Updated] (HIVE-4105) Hive MapJoinOperator unnecessarily deserializes values for all join-keys
[ https://issues.apache.org/jira/browse/HIVE-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HIVE-4105: Attachment: HIVE-4105-20130301.1.txt Patch upmerged to the latest trunk. Hive MapJoinOperator unnecessarily deserializes values for all join-keys Key: HIVE-4105 URL: https://issues.apache.org/jira/browse/HIVE-4105 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Attachments: HIVE-4105-20130301.1.txt, HIVE-4105-20130301.txt We can avoid this for inner joins. Hive does an explicit value de-serialization up front, even for those rows which won't emit output. In these cases, key de-serialization alone is enough.
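The idea described in HIVE-4105 can be illustrated with a minimal sketch. This is not the attached patch and does not use Hive's MapJoinOperator or SerDe classes; the class LazyValueJoinSketch, its methods, and the byte-array row layout are hypothetical names chosen for illustration. It only shows the principle: for an inner join, decode the key first, probe the small table, and decode the value only when a match exists.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Hive classes.
final class LazyValueJoinSketch {
    // Counts value decodes so the saving can be observed.
    static int valueDecodes = 0;

    static String decodeKey(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    static String decodeValue(byte[] raw) {
        valueDecodes++;
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Inner join of big-table rows against an in-memory small table.
    // Each big-table row is {serializedKey, serializedValue}.
    static List<String> innerJoin(List<byte[][]> bigRows, Map<String, String> smallTable) {
        List<String> out = new ArrayList<>();
        for (byte[][] row : bigRows) {
            String key = decodeKey(row[0]);      // the key is always needed for the probe
            String match = smallTable.get(key);
            if (match == null) {
                continue;                        // inner join: no match means no output, so the value is never decoded
            }
            out.add(key + "," + decodeValue(row[1]) + "," + match);
        }
        return out;
    }
}
```

With a selective join most rows take the early `continue`, which is consistent with the note above that non-selective inner joins would benefit less.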
[jira] [Updated] (HIVE-3987) Update PTF invocation and windowing grammar
[ https://issues.apache.org/jira/browse/HIVE-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3987: Attachment: HIVE-3987.patch Patch that takes care of the second and third bullet points of this JIRA. Update PTF invocation and windowing grammar Key: HIVE-3987 URL: https://issues.apache.org/jira/browse/HIVE-3987 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Attachments: HIVE-3987.patch Changes to grammar to make it more Standards based: - support Partition Order style along with Hive specific Distribute/Cluster and Sort in windowing specification. - PTF args should come after Input details like in Aster. - tbd: do we need to support named parameters.
[jira] [Commented] (HIVE-3987) Update PTF invocation and windowing grammar
[ https://issues.apache.org/jira/browse/HIVE-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591188#comment-13591188 ] Ashutosh Chauhan commented on HIVE-3987: https://reviews.facebook.net/D9027 Update PTF invocation and windowing grammar Key: HIVE-3987 URL: https://issues.apache.org/jira/browse/HIVE-3987 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Attachments: HIVE-3987.patch Changes to grammar to make it more Standards based: - support Partition Order style along with Hive specific Distribute/Cluster and Sort in windowing specification. - PTF args should come after Input details like in Aster. - tbd: do we need to support named parameters.
[jira] [Updated] (HIVE-3987) Update PTF invocation and windowing grammar
[ https://issues.apache.org/jira/browse/HIVE-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3987: Assignee: Ashutosh Chauhan Update PTF invocation and windowing grammar Key: HIVE-3987 URL: https://issues.apache.org/jira/browse/HIVE-3987 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Ashutosh Chauhan Attachments: HIVE-3987.patch Changes to grammar to make it more Standards based: - support Partition Order style along with Hive specific Distribute/Cluster and Sort in windowing specification. - PTF args should come after Input details like in Aster. - tbd: do we need to support named parameters.
[jira] [Updated] (HIVE-3987) Update PTF invocation and windowing grammar
[ https://issues.apache.org/jira/browse/HIVE-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3987: Status: Patch Available (was: Open) Update PTF invocation and windowing grammar Key: HIVE-3987 URL: https://issues.apache.org/jira/browse/HIVE-3987 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Ashutosh Chauhan Attachments: HIVE-3987.patch Changes to grammar to make it more Standards based: - support Partition Order style along with Hive specific Distribute/Cluster and Sort in windowing specification. - PTF args should come after Input details like in Aster. - tbd: do we need to support named parameters.
[jira] [Created] (HIVE-4106) SMB joins fail in multi-way joins
Vikram Dixit K created HIVE-4106: Summary: SMB joins fail in multi-way joins Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K I see an array-out-of-bounds exception in multi-way SMB joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891.
Jenkins build is back to normal : Hive-0.10.0-SNAPSHOT-h0.20.1 #80
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/80/
[jira] [Updated] (HIVE-4106) SMB joins fail in multi-way joins
[ https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4106: Attachment: HIVE-4106.patch SMB joins fail in multi-way joins Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4106.patch I see an array-out-of-bounds exception in multi-way SMB joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891.
[jira] [Updated] (HIVE-4106) SMB joins fail in multi-way joins
[ https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4106: Status: Patch Available (was: Open) SMB joins fail in multi-way joins Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4106.patch I see an array-out-of-bounds exception in multi-way SMB joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891.
[jira] [Updated] (HIVE-3490) Implement * or a.* for arguments to UDFs
[ https://issues.apache.org/jira/browse/HIVE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3490: Status: Patch Available (was: Open) Implement * or a.* for arguments to UDFs Key: HIVE-3490 URL: https://issues.apache.org/jira/browse/HIVE-3490 Project: Hive Issue Type: Bug Components: Query Processor, UDF Reporter: Adam Kramer Assignee: Navis Attachments: HIVE-3490.D8889.1.patch, HIVE-3490.D8889.2.patch For a random UDF, we should be able to use * or a.* to refer to all of the columns in their natural order. This is not currently implemented. I'm reporting this as a bug because it is a manner in which Hive is inconsistent with the SQL spec, and because Hive claims to implement *. hive> select all_non_null(a.*) from table a where a.ds='2012-09-01'; FAILED: ParseException line 1:25 mismatched input '*' expecting Identifier near '.' in expression specification
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: Attachment: HIVE-3996_4.patch Correctly enforce the memory limit on the multi-table map-join Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996_4.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value.
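The problem described in HIVE-3996 is that each small table may pass the size check on its own while their combined size does not. A rough sketch of a cumulative check follows. It is not taken from any of the attached patches; MapJoinMergeSketch and mergeablePrefix are hypothetical names, and the threshold parameter merely stands in for the value of hive.auto.convert.join.noconditionaltask.size.

```java
import java.util.List;

// Hypothetical illustration only; not Hive's optimizer code.
final class MapJoinMergeSketch {
    // Returns how many leading small tables can share one map-join task
    // before the running total of their sizes would exceed the threshold.
    static int mergeablePrefix(List<Long> smallTableSizes, long threshold) {
        long total = 0;
        int count = 0;
        for (long size : smallTableSizes) {
            if (total + size > threshold) {
                break;              // merging this table would exceed the limit
            }
            total += size;          // track the sum, not just each table on its own
            count++;
        }
        return count;
    }
}
```

For example, with a threshold of 10 and tables of sizes 4, 5 and 3, every table is below the threshold individually, but only the first two fit together (4 + 5 = 9); adding the third would bring the total to 12.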
[jira] [Updated] (HIVE-3490) Implement * or a.* for arguments to UDFs
[ https://issues.apache.org/jira/browse/HIVE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3490: Attachment: HIVE-3490.D8889.2.patch navis updated the revision HIVE-3490 [jira] Implement * or a.* for arguments to UDFs. Addressed comments Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D8889 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8889?vs=28635&id=28989#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeColumnListDesc.java ql/src/test/queries/clientpositive/allcolref_in_udf.q ql/src/test/results/clientpositive/allcolref_in_udf.q.out To: JIRA, navis Cc: njain Implement * or a.* for arguments to UDFs Key: HIVE-3490 URL: https://issues.apache.org/jira/browse/HIVE-3490 Project: Hive Issue Type: Bug Components: Query Processor, UDF Reporter: Adam Kramer Assignee: Navis Attachments: HIVE-3490.D8889.1.patch, HIVE-3490.D8889.2.patch For a random UDF, we should be able to use * or a.* to refer to all of the columns in their natural order. This is not currently implemented. I'm reporting this as a bug because it is a manner in which Hive is inconsistent with the SQL spec, and because Hive claims to implement *. hive> select all_non_null(a.*) from table a where a.ds='2012-09-01'; FAILED: ParseException line 1:25 mismatched input '*' expecting Identifier near '.' in expression specification
[jira] [Commented] (HIVE-3490) Implement * or a.* for arguments to UDFs
[ https://issues.apache.org/jira/browse/HIVE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591227#comment-13591227 ] Phabricator commented on HIVE-3490: navis has commented on the revision HIVE-3490 [jira] Implement * or a.* for arguments to UDFs. INLINE COMMENTS ql/src/test/queries/clientpositive/allcolref_in_udf.q:4 ok. ql/src/test/queries/clientpositive/allcolref_in_udf.q:9 ok. ql/src/test/queries/clientpositive/allcolref_in_udf.q:8 The order is decided by the row schema of the previous operator. For joins, it runs from the leftmost alias to the right. Added comments. REVISION DETAIL https://reviews.facebook.net/D8889 To: JIRA, navis Cc: njain Implement * or a.* for arguments to UDFs Key: HIVE-3490 URL: https://issues.apache.org/jira/browse/HIVE-3490 Project: Hive Issue Type: Bug Components: Query Processor, UDF Reporter: Adam Kramer Assignee: Navis Attachments: HIVE-3490.D8889.1.patch, HIVE-3490.D8889.2.patch For a random UDF, we should be able to use * or a.* to refer to all of the columns in their natural order. This is not currently implemented. I'm reporting this as a bug because it is a manner in which Hive is inconsistent with the SQL spec, and because Hive claims to implement *. hive> select all_non_null(a.*) from table a where a.ds='2012-09-01'; FAILED: ParseException line 1:25 mismatched input '*' expecting Identifier near '.' in expression specification
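The column ordering described in the inline comment above (row schema order, and for joins the leftmost alias first) can be sketched roughly as follows. This is not the code from the D8889 revision; StarExpansionSketch and expand are hypothetical names, and the schema is assumed to be a map from alias to column names kept in left-to-right join order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hive's SemanticAnalyzer or TypeCheckProcFactory.
final class StarExpansionSketch {
    // Expands a UDF argument into qualified column names.
    // schema maps each alias to its columns; its iteration order is assumed to be
    // the left-to-right join order (for example, a LinkedHashMap).
    static List<String> expand(String arg, Map<String, List<String>> schema) {
        List<String> cols = new ArrayList<>();
        if (arg.equals("*")) {
            // Bare star: every alias, leftmost first, columns in schema order.
            for (Map.Entry<String, List<String>> e : schema.entrySet()) {
                for (String c : e.getValue()) {
                    cols.add(e.getKey() + "." + c);
                }
            }
        } else if (arg.endsWith(".*")) {
            // Qualified star: only the columns of the named alias.
            String alias = arg.substring(0, arg.length() - 2);
            List<String> aliasCols = schema.get(alias);
            if (aliasCols == null) {
                throw new IllegalArgumentException("unknown alias: " + alias);
            }
            for (String c : aliasCols) {
                cols.add(alias + "." + c);
            }
        } else {
            cols.add(arg);          // an ordinary column reference is passed through unchanged
        }
        return cols;
    }
}
```

Under these assumptions, a call such as all_non_null(a.*) would receive the columns of alias a in their schema order as separate arguments.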
[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job
[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HIVE-3952: Attachment: HIVE-3952-20130301.txt Ran my new test again; it passes. This patch can be applied on top of HIVE-4106. merge map-job followed by map-reduce job Key: HIVE-3952 URL: https://issues.apache.org/jira/browse/HIVE-3952 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Vinod Kumar Vavilapalli Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt, HIVE-3952-20130301.txt Consider a query like: select count(*) FROM ( select idOne, idTwo, value FROM bigTable JOIN smallTableOne on (bigTable.idOne = smallTableOne.idOne) ) firstjoin JOIN smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo); where smallTableOne and smallTableTwo are smaller than hive.auto.convert.join.noconditionaltask.size and hive.auto.convert.join.noconditionaltask is set to true. The joins are collapsed into mapjoins, and it leads to a map-only job (for the map-joins) followed by a map-reduce job (for the group by). Ideally, the map-only job should be merged with the following map-reduce job.
[jira] [Updated] (HIVE-4071) Map-join outer join produces incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4071: Attachment: (was: HIVE-4071_3.patch) Map-join outer join produces incorrect results. Key: HIVE-4071 URL: https://issues.apache.org/jira/browse/HIVE-4071 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4071_2.patch, HIVE-4071_3.patch, HIVE-4071.patch For example, if one sets noConditionalTask.size to 10, with the corresponding auto join configurations set to true, in auto_join28.q instead of the current smalltable.filesize configuration, we will observe different results if a select query is run. (The test only has explain statements at present).
[jira] [Updated] (HIVE-4071) Map-join outer join produces incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4071: Attachment: HIVE-4071_3.patch Map-join outer join produces incorrect results. Key: HIVE-4071 URL: https://issues.apache.org/jira/browse/HIVE-4071 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4071_2.patch, HIVE-4071_3.patch, HIVE-4071.patch For example, if one sets noConditionalTask.size to 10, with the corresponding auto join configurations set to true, in auto_join28.q instead of the current smalltable.filesize configuration, we will observe different results if a select query is run. (The test only has explain statements at present).
[jira] [Commented] (HIVE-4093) Remove sprintf from PTFTranslator and use String.format()
[ https://issues.apache.org/jira/browse/HIVE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591243#comment-13591243 ] Ashutosh Chauhan commented on HIVE-4093: Removing sprintf is useful, but I am not sure about the test case you added. Remove sprintf from PTFTranslator and use String.format() Key: HIVE-4093 URL: https://issues.apache.org/jira/browse/HIVE-4093 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-4093-0.patch, HIVE-4093-1.patch
[jira] [Commented] (HIVE-4093) Remove sprintf from PTFTranslator and use String.format()
[ https://issues.apache.org/jira/browse/HIVE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591249#comment-13591249 ] Brock Noland commented on HIVE-4093: Hi, I added that test case to exercise this check: https://github.com/apache/hive/blob/ptf-windowing/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java#L560 Is that a limitation we should remove? Remove sprintf from PTFTranslator and use String.format() Key: HIVE-4093 URL: https://issues.apache.org/jira/browse/HIVE-4093 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-4093-0.patch, HIVE-4093-1.patch
[jira] [Commented] (HIVE-4093) Remove sprintf from PTFTranslator and use String.format()
[ https://issues.apache.org/jira/browse/HIVE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591254#comment-13591254 ] Ashutosh Chauhan commented on HIVE-4093: Yes, we need to address this limitation. For this JIRA, I would suggest we address only the sprintf issue. Don't add this extra negative test case; file a new JIRA for the limitation. Remove sprintf from PTFTranslator and use String.format() Key: HIVE-4093 URL: https://issues.apache.org/jira/browse/HIVE-4093 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-4093-0.patch, HIVE-4093-1.patch
[jira] [Commented] (HIVE-4093) Remove sprintf from PTFTranslator and use String.format()
[ https://issues.apache.org/jira/browse/HIVE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591260#comment-13591260 ] Brock Noland commented on HIVE-4093: OK sounds good, will do! Remove sprintf from PTFTranslator and use String.format() Key: HIVE-4093 URL: https://issues.apache.org/jira/browse/HIVE-4093 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-4093-0.patch, HIVE-4093-1.patch
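For readers unfamiliar with the replacement that HIVE-4093 proposes, the standard java.lang.String.format call looks roughly like this. The class FormatSketch, the method describe, and the message text are made up for illustration and do not come from PTFTranslator or from the attached patches; Locale.ROOT is passed only to keep the output independent of the machine's default locale.

```java
import java.util.Locale;

// Hypothetical illustration only; the message text is not from PTFTranslator.
final class FormatSketch {
    // Builds a message with String.format instead of a custom sprintf-style helper.
    static String describe(String function, int argCount) {
        return String.format(Locale.ROOT,
                "Function %s expects %d argument(s)", function, argCount);
    }
}
```

Here %s formats the function name as a string and %d formats the count as a decimal integer, so no project-specific formatting helper is required.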
[jira] [Updated] (HIVE-4082) Break up ptf tests in PTF, Windowing and Lead/Lag tests
[ https://issues.apache.org/jira/browse/HIVE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4082: Attachment: HIVE-4082.D9033.1.patch pkalmegh requested code review of HIVE-4082 [jira] Break up ptf tests in PTF, Windowing and Lead/Lag tests. Reviewers: JIRA HIVE-4082: Refactor tests TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D9033 AFFECTED FILES data/files/flights_tiny.txt data/files/part.rc data/files/part.seq ql/src/test/queries/clientpositive/leadlag.q ql/src/test/queries/clientpositive/ptf.q ql/src/test/queries/clientpositive/ptf_general_queries.q ql/src/test/queries/clientpositive/ptf_npath.q ql/src/test/queries/clientpositive/ptf_window_boundaries.q ql/src/test/queries/clientpositive/windowing.q ql/src/test/results/clientpositive/leadlag.q.out ql/src/test/results/clientpositive/ptf.q.out ql/src/test/results/clientpositive/ptf_general_queries.q.out ql/src/test/results/clientpositive/ptf_npath.q.out ql/src/test/results/clientpositive/ptf_window_boundaries.q.out ql/src/test/results/clientpositive/windowing.q.out To: JIRA, pkalmegh Break up ptf tests in PTF, Windowing and Lead/Lag tests Key: HIVE-4082 URL: https://issues.apache.org/jira/browse/HIVE-4082 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Prajakta Kalmegh Attachments: HIVE-4082.D9033.1.patch
[jira] [Created] (HIVE-4107) Update Hive 0.10.0 RELEASE_NOTES.txt
Lefty Leverenz created HIVE-4107: Summary: Update Hive 0.10.0 RELEASE_NOTES.txt Key: HIVE-4107 URL: https://issues.apache.org/jira/browse/HIVE-4107 Project: Hive Issue Type: Bug Components: Documentation Affects Versions: 0.10.0 Reporter: Lefty Leverenz Hive release 0.10.0 includes a RELEASE_NOTES.txt file left over from release 0.8.1 (branch-0.8-r2). It needs to be updated to match the JIRA change log here: [https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843]. Thanks to Eric Chu for drawing attention to this problem on u...@hive.apache.org.
[jira] [Updated] (HIVE-3908) create view statement's outputs contains the view and a temporary dir.
[ https://issues.apache.org/jira/browse/HIVE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3908: Status: Patch Available (was: Open) Patch attached create view statement's outputs contains the view and a temporary dir. Key: HIVE-3908 URL: https://issues.apache.org/jira/browse/HIVE-3908 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Prasad Mujumdar Attachments: HIVE-3908-1.patch It should only contain the view
[jira] [Assigned] (HIVE-3908) create view statement's outputs contains the view and a temporary dir.
[ https://issues.apache.org/jira/browse/HIVE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar reassigned HIVE-3908: Assignee: Prasad Mujumdar create view statement's outputs contains the view and a temporary dir. Key: HIVE-3908 URL: https://issues.apache.org/jira/browse/HIVE-3908 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Prasad Mujumdar Attachments: HIVE-3908-1.patch It should only contain the view
[jira] [Updated] (HIVE-3908) create view statement's outputs contains the view and a temporary dir.
[ https://issues.apache.org/jira/browse/HIVE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3908: Attachment: HIVE-3908-1.patch create view statement's outputs contains the view and a temporary dir. Key: HIVE-3908 URL: https://issues.apache.org/jira/browse/HIVE-3908 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Prasad Mujumdar Attachments: HIVE-3908-1.patch It should only contain the view
[jira] [Commented] (HIVE-3908) create view statement's outputs contains the view and a temporary dir.
[ https://issues.apache.org/jira/browse/HIVE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591346#comment-13591346 ] Prasad Mujumdar commented on HIVE-3908: Review request on https://reviews.facebook.net/D9039 create view statement's outputs contains the view and a temporary dir. Key: HIVE-3908 URL: https://issues.apache.org/jira/browse/HIVE-3908 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Prasad Mujumdar Attachments: HIVE-3908-1.patch It should only contain the view
[jira] [Updated] (HIVE-3908) create view statement's outputs contains the view and a temporary dir.
[ https://issues.apache.org/jira/browse/HIVE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3908: Component/s: Query Processor Affects Version/s: 0.10.0 create view statement's outputs contains the view and a temporary dir. Key: HIVE-3908 URL: https://issues.apache.org/jira/browse/HIVE-3908 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Prasad Mujumdar Attachments: HIVE-3908-1.patch It should only contain the view