[jira] [Created] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y
Alan Gates created HIVE-8637: Summary: In insert into X select from Y, table properties from X are clobbering those from Y Key: HIVE-8637 URL: https://issues.apache.org/jira/browse/HIVE-8637 Project: Hive Issue Type: Task Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 With a query like: {code} insert into table X select * from Y; {code} the table properties from table X are being sent to the input formats for table Y. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y
[ https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187445#comment-14187445 ] Alan Gates commented on HIVE-8637: -- The issue is that HiveOutputFormatImpl.checkOutputSpecs writes the table properties for table X into the conf file. When HiveInputFormat.getInputSplits later takes that same conf file and goes to copy the table properties in, it calls Utilities.copyTableJobPropertiesToConf (the same method that checkOutputSpecs used). The problem is that copyTableJobPropertiesToConf does not overwrite a given table property in the job conf if it is already set. This means that many of the table properties from Y don't get propagated, because the values from X are already set. I do not believe this is a new problem, but it is showing up now because reading transactional tables depends on the bucket count being accurate. So a query like: {code} create table notbucketed (a string, b int); create table transactional (a string, b int) clustered by (b) into 2 buckets stored as orc tblproperties ('transactional' = 'true'); insert into table notbucketed select * from transactional; {code} results in the table 'transactional' being told it has no buckets. Since the acid reader depends on this value, it concludes that with no buckets it has no splits, and thus the above insert writes nothing into 'notbucketed', regardless of how many records are in 'transactional'. In insert into X select from Y, table properties from X are clobbering those from Y --- Key: HIVE-8637 URL: https://issues.apache.org/jira/browse/HIVE-8637 Project: Hive Issue Type: Task Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 With a query like: {code} insert into table X select * from Y; {code} the table properties from table X are being sent to the input formats for table Y. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
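The copy-if-absent behavior described above can be illustrated with a minimal sketch. This is not Hive's actual Utilities code; the class, method name, and the {{bucket_count}} values are illustrative assumptions, showing only why a merge that skips already-set keys drops table Y's properties once table X's have been written:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch (not the real Hive code) of a copy-if-absent merge.
public class PropCopySketch {
    // Mimics the described behavior: only sets keys not already in the conf.
    static void copyIfAbsent(Properties tableProps, Map<String, String> conf) {
        for (String key : tableProps.stringPropertyNames()) {
            if (!conf.containsKey(key)) {
                conf.put(key, tableProps.getProperty(key));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        Properties outputTableX = new Properties();
        outputTableX.setProperty("bucket_count", "-1");   // X is not bucketed

        Properties inputTableY = new Properties();
        inputTableY.setProperty("bucket_count", "2");     // Y has 2 buckets

        copyIfAbsent(outputTableX, conf);  // checkOutputSpecs writes X's props first
        copyIfAbsent(inputTableY, conf);   // the later copy for Y is a no-op here

        // Y's accurate bucket count never lands in the conf:
        System.out.println(conf.get("bucket_count"));  // prints -1
    }
}
```

With the reader seeing {{bucket_count = -1}}, the acid code path concludes there are no buckets and produces no splits, which matches the empty-insert symptom described above.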
[jira] [Updated] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y
[ https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8637: - Attachment: HIVE-8637.patch This is not a permanent fix. This fix works by changing HiveInputFormat.getInputSplits to call a new method in Utilities that sets values from table properties in the job conf whether they are already set or not. This seems safe, since the table should properly understand its own properties. I believe the correct long term solution is to make sure a different copy of JobConf goes to the input and output tables, so each can write whatever it wants there. I think that would have to be done in ExecDriver.execute, since calls to checkOutputSpecs and getInputSplits are done by Hadoop after Hive submits the job. I think that would fix the MR case. I'm sure the fix for Tez would be slightly different (since the job is submitted all at once). But this would also destroy any ability to communicate information across jobs via the conf file. I don't know if anything is doing that or not. I'm loath to make that big a change when [~hagleitn] has said he wants to cut a release in a week. So, I propose this smaller change now, and we file a JIRA for the bigger, more complete fix. In insert into X select from Y, table properties from X are clobbering those from Y --- Key: HIVE-8637 URL: https://issues.apache.org/jira/browse/HIVE-8637 Project: Hive Issue Type: Task Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8637.patch With a query like: {code} insert into table X select * from Y; {code} the table properties from table X are being sent to the input formats for table Y. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y
[ https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8637: - Status: Patch Available (was: Open) In insert into X select from Y, table properties from X are clobbering those from Y --- Key: HIVE-8637 URL: https://issues.apache.org/jira/browse/HIVE-8637 Project: Hive Issue Type: Task Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8637.patch With a query like: {code} insert into table X select * from Y; {code} the table properties from table X are being sent to the input formats for table Y. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187676#comment-14187676 ] Alan Gates commented on HIVE-8629: -- Doing LOG.debug and documenting it should be fine. Other than that, +1. Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Attachments: HIVE-8629.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7408) HCatPartition needs getPartCols method
[ https://issues.apache.org/jira/browse/HIVE-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185260#comment-14185260 ] Alan Gates commented on HIVE-7408: -- +1 HCatPartition needs getPartCols method -- Key: HIVE-7408 URL: https://issues.apache.org/jira/browse/HIVE-7408 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.13.0 Reporter: JongWon Park Assignee: Navis Priority: Minor Attachments: HIVE-7408.1.patch.txt, HIVE-7408.2.patch.txt org.apache.hive.hcatalog.api.HCatPartition has a getColumns method; however, that does not return the partition columns. HCatPartition needs a getPartCols method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config
[ https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185315#comment-14185315 ] Alan Gates commented on HIVE-8605: -- As far as I know all of the time units were whole integers, so 'd' for double and 'f' for float probably don't make sense. 'l' for long is the only one I know of people using (we found this when a co-worker copied a config file from 0.13 and used it against the 0.14 branch). So I could change the patch to just support 'l'. We have to find some way not to break that backward compatibility without also breaking your changes to do the time units. HIVE-5799 breaks backward compatibility for time values in config - Key: HIVE-8605 URL: https://issues.apache.org/jira/browse/HIVE-8605 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8605.patch It is legal for long values in the config file to have an L or for float values to have an f. For example, the default value for hive.compactor.check.interval was 300L. As part of HIVE-5799, many long values were converted to TimeUnit. Attempts to read these values now throw java.lang.IllegalArgumentException: Invalid time unit l We need to change this to ignore the L or f, so that users' existing config files don't break. I propose to do this by changing HiveConf.unitFor to detect the L or f and interpret it to mean the default time unit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
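The proposal above can be sketched as follows. This is not the real HiveConf.unitFor implementation; only the method name comes from the issue text, and per the comment above only the legacy 'l' suffix is mapped to the default unit (the unit-suffix matching here is a simplified assumption):

```java
import java.util.concurrent.TimeUnit;

// Hedged sketch of the proposed behavior, not the actual HiveConf code:
// a trailing 'l'/'L' left over from old long-typed configs (e.g. "300L")
// falls back to the caller's default time unit instead of throwing.
public class UnitForSketch {
    static TimeUnit unitFor(String unitSuffix, TimeUnit defaultUnit) {
        String u = unitSuffix.trim().toLowerCase();
        if (u.isEmpty() || u.equals("l")) {
            return defaultUnit;   // "300" or legacy "300L" -> default unit
        }
        if (u.startsWith("ms")) return TimeUnit.MILLISECONDS;  // before "m"
        if (u.startsWith("s")) return TimeUnit.SECONDS;
        if (u.startsWith("m")) return TimeUnit.MINUTES;
        if (u.startsWith("h")) return TimeUnit.HOURS;
        if (u.startsWith("d")) return TimeUnit.DAYS;
        throw new IllegalArgumentException("Invalid time unit " + u);
    }
}
```

With this, a 0.13-era value such as hive.compactor.check.interval=300L parses as 300 of whatever default unit the property declares, rather than failing with "Invalid time unit l".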
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185335#comment-14185335 ] Alan Gates commented on HIVE-8583: -- I'm not opposed to changing the order of the modifiers, I just didn't understand why it mattered. So no need for a new patch. We do need the tests to run on this patch though. I don't think the build failure has anything to do with your patch. So just canceling the patch, re-attaching the file, and re-submitting the patch should force the tests to run. HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341: {code} String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString()); if (bl != null && bl.length() > 0) { String[] bls = bl.split(","); for (String b : bls) { b.replaceAll("\\.", "_"); blackListedConfEntries.add(b); } } {code} The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following: * Minor style modification (missorted modifiers) * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST * Removes replaceAll * Lets blackListed take a Configuration job as parameter, which allowed me to add a test for this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
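The no-op nature of that {{replaceAll}} call is easy to demonstrate in isolation. This small standalone example (the variable values are illustrative, not from the patch) shows that java.lang.String is immutable, so {{replaceAll}} returns a new string and leaves the original untouched unless the result is assigned:

```java
// Why the quoted replaceAll call does nothing: String is immutable,
// so replaceAll returns a new string rather than modifying its receiver.
public class ReplaceAllSketch {
    public static void main(String[] args) {
        String b = "hive.script.operator.env.blacklist";
        b.replaceAll("\\.", "_");                     // result discarded; b unchanged
        String replaced = b.replaceAll("\\.", "_");   // result must be assigned
        System.out.println(b);         // hive.script.operator.env.blacklist
        System.out.println(replaced);  // hive_script_operator_env_blacklist
    }
}
```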
[jira] [Resolved] (HIVE-8562) ResultSet.isClosed sometimes doesn't work with mysql
[ https://issues.apache.org/jira/browse/HIVE-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-8562. -- Resolution: Invalid Turns out I was using an old version of the mysql JDBC jar. Once I use the proper version this issue goes away. ResultSet.isClosed sometimes doesn't work with mysql Key: HIVE-8562 URL: https://issues.apache.org/jira/browse/HIVE-8562 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Calls to ResultSet.isClosed are sometimes throwing an AbstractMethodException when used against MySQL. This is causing issues for the compactor when it tries to update stats. As far as I can tell it only happens when the result set is empty (which is weird). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql & oracle
[ https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185591#comment-14185591 ] Alan Gates commented on HIVE-6669: -- All the txn tables are already in hive-schema-0.14.0.mssql.sql. I don't know why. But that's why I didn't add them. sourcing txn-script from schema script results in failure for mysql & oracle Key: HIVE-6669 URL: https://issues.apache.org/jira/browse/HIVE-6669 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Alan Gates Priority: Blocker Attachments: HIVE-6669.2.patch, HIVE-6669.patch This issue was addressed in 0.13 by in-lining the transaction schema statements in the schema initialization script (HIVE-6559). The 0.14 schema initialization is not fixed. This is the follow-up ticket to address the problem in 0.14. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config
Alan Gates created HIVE-8605: Summary: HIVE-5799 breaks backward compatibility for time values in config Key: HIVE-8605 URL: https://issues.apache.org/jira/browse/HIVE-8605 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 It is legal for long values in the config file to have an L or for float values to have an f. For example, the default value for hive.compactor.check.interval was 300L. As part of HIVE-5799, many long values were converted to TimeUnit. Attempts to read these values now throw java.lang.IllegalArgumentException: Invalid time unit l We need to change this to ignore the L or f, so that users' existing config files don't break. I propose to do this by changing HiveConf.unitFor to detect the L or f and interpret it to mean the default time unit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config
[ https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8605: - Attachment: HIVE-8605.patch HIVE-5799 breaks backward compatibility for time values in config - Key: HIVE-8605 URL: https://issues.apache.org/jira/browse/HIVE-8605 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8605.patch It is legal for long values in the config file to have an L or for float values to have an f. For example, the default value for hive.compactor.check.interval was 300L. As part of HIVE-5799, many long values were converted to TimeUnit. Attempts to read these values now throw java.lang.IllegalArgumentException: Invalid time unit l We need to change this to ignore the L or f, so that users' existing config files don't break. I propose to do this by changing HiveConf.unitFor to detect the L or f and interpret it to mean the default time unit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config
[ https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8605: - Status: Patch Available (was: Open) [~navis], if you have a chance to review this that would be great. HIVE-5799 breaks backward compatibility for time values in config - Key: HIVE-8605 URL: https://issues.apache.org/jira/browse/HIVE-8605 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8605.patch It is legal for long values in the config file to have an L or for float values to have an f. For example, the default value for hive.compactor.check.interval was 300L. As part of HIVE-5799, many long values were converted to TimeUnit. Attempts to read these values now throw java.lang.IllegalArgumentException: Invalid time unit l We need to change this to ignore the L or f, so that users' existing config files don't break. I propose to do this by changing HiveConf.unitFor to detect the L or f and interpret it to mean the default time unit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183002#comment-14183002 ] Alan Gates commented on HIVE-8583: -- Yes, Lars is correct. That is just a piece of earlier code that I neglected to take out. HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341: {code} String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString()); if (bl != null && bl.length() > 0) { String[] bls = bl.split(","); for (String b : bls) { b.replaceAll("\\.", "_"); blackListedConfEntries.add(b); } } {code} The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following: * Minor style modification (missorted modifiers) * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST * Removes replaceAll * Lets blackListed take a Configuration job as parameter, which allowed me to add a test for this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183171#comment-14183171 ] Alan Gates commented on HIVE-8583: -- +1, patch looks fine. The statement "missorted modifiers" implies there is a correct order. If the compiler doesn't care about {{final static private}} versus {{private static final}}, why should we? HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341: {code} String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString()); if (bl != null && bl.length() > 0) { String[] bls = bl.split(","); for (String b : bls) { b.replaceAll("\\.", "_"); blackListedConfEntries.add(b); } } {code} The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following: * Minor style modification (missorted modifiers) * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST * Removes replaceAll * Lets blackListed take a Configuration job as parameter, which allowed me to add a test for this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables
[ https://issues.apache.org/jira/browse/HIVE-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183735#comment-14183735 ] Alan Gates commented on HIVE-8516: -- Actually, I think this bug is invalid. I was confused. Hive certainly supports inserts into bucketed tables. insert/values allowed against bucketed, non-transactional tables Key: HIVE-8516 URL: https://issues.apache.org/jira/browse/HIVE-8516 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Matt McCline Hive does not support insert into bucketed tables. A special exception is made for transactional tables, as they require bucketing. Insert/values works against non-transactional tables, since it just dumps the values into a temp table and rewrites the query into insert/select from that temp table. However, the check that prevents doing inserts into non-transactional, bucketed tables is not catching this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables
[ https://issues.apache.org/jira/browse/HIVE-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-8516. -- Resolution: Invalid insert/values allowed against bucketed, non-transactional tables Key: HIVE-8516 URL: https://issues.apache.org/jira/browse/HIVE-8516 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Matt McCline Hive does not support insert into bucketed tables. A special exception is made for transactional tables, as they require bucketing. Insert/values works against non-transactional tables, since it just dumps the values into a temp table and rewrites the query into insert/select from that temp table. However, the check that prevents doing inserts into non-transactional, bucketed tables is not catching this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8543: - Resolution: Fixed Status: Resolved (was: Patch Available) Test failures are not related. I ran the streaming test locally and saw no issues. Patch committed to trunk and 0.14 branch. Thanks Damien for writing most of the code for this and Eugene for reviewing it. Compactions fail on metastore using postgres Key: HIVE-8543 URL: https://issues.apache.org/jira/browse/HIVE-8543 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8543.patch The worker fails to update the stats when the metastore is using Postgres as the RDBMS. {code} org.postgresql.util.PSQLException: ERROR: relation tab_col_stats does not exist {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch 0.14. Thanks Ashutosh and Matt for the reviews. Vectorized reads of transactional tables fail when not all columns are selected --- Key: HIVE-8474 URL: https://issues.apache.org/jira/browse/HIVE-8474 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8474.2.patch, HIVE-8474.patch {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); select name, age from concur_orc_tab order by name; {code} results in {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 13 more {code} The issue is that the object inspector passed to VectorizedOrcAcidRowReader has all of the columns in the file rather than only the projected columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8562) ResultSet.isClosed sometimes doesn't work with mysql
Alan Gates created HIVE-8562: Summary: ResultSet.isClosed sometimes doesn't work with mysql Key: HIVE-8562 URL: https://issues.apache.org/jira/browse/HIVE-8562 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Calls to ResultSet.isClosed are sometimes throwing an AbstractMethodException when used against MySQL. This is causing issues for the compactor when it tries to update stats. As far as I can tell it only happens when the result set is empty (which is weird). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8235) Insert into partitioned bucketed sorted tables fails with this file is already being created by
[ https://issues.apache.org/jira/browse/HIVE-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-8235. -- Resolution: Cannot Reproduce Closing as cannot reproduce, as I cannot reproduce this. Please re-open if you see it again. Insert into partitioned bucketed sorted tables fails with this file is already being created by - Key: HIVE-8235 URL: https://issues.apache.org/jira/browse/HIVE-8235 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: insert_into_partitioned_bucketed_table.txt.tar.gz.zip When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at
[jira] [Created] (HIVE-8543) Compactions fail on metastore using postgres
Alan Gates created HIVE-8543: Summary: Compactions fail on metastore using postgres Key: HIVE-8543 URL: https://issues.apache.org/jira/browse/HIVE-8543 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 The worker fails to update the stats when the metastore is using Postgres as the RDBMS. {code} org.postgresql.util.PSQLException: ERROR: relation tab_col_stats does not exist {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178936#comment-14178936 ] Alan Gates commented on HIVE-7689: -- Opened HIVE-8543 to deal with the issue in the compactor. Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch The current 0.14 patch creates tables with lower case names. This patch fixes wrong lower case table names in the Postgres Metastore back end. Mixing lower case and upper case triggers bugs in {{JDBCStatsPublisher}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179063#comment-14179063 ] Alan Gates commented on HIVE-8543: -- I'm testing a fix as well. If it passes I'll post it shortly so you can make sure it works in your environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8543: - Status: Patch Available (was: Open) Attachments: HIVE-8543.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8543: - Attachment: HIVE-8543.patch This patch adds quotes to operations on the TAB_COL_STATS and PART_COL_STATS tables. The code for this was taken almost exclusively from [~damien.carol]'s patch on HIVE-7689. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
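For context: Postgres folds unquoted SQL identifiers to lower case, so a table created with quoted upper-case names (as the metastore schema does) is invisible to a statement that writes the name bare — which is exactly the {{relation tab_col_stats does not exist}} failure above. A minimal sketch of the quoting idea; the helper name is hypothetical and this is not the actual patch code:

```java
public class IdentifierQuoting {
    // Postgres folds unquoted identifiers to lower case, so a table created
    // as "TAB_COL_STATS" (quoted, upper case) cannot be found via the bare
    // name. Double-quoting the identifier preserves its exact case.
    static String quoteIfPostgres(String tableName, boolean isPostgres) {
        return isPostgres ? "\"" + tableName + "\"" : tableName;
    }
}
```

On Postgres, `select count(*) from TAB_COL_STATS` resolves to `tab_col_stats` and fails, while `select count(*) from "TAB_COL_STATS"` matches the table as created.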
[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179194#comment-14179194 ] Alan Gates commented on HIVE-8341: -- [~leftylev] what needs to be documented here? Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179220#comment-14179220 ] Alan Gates commented on HIVE-8341: -- It doesn't need to be documented in Hive Transactions. There's nothing transaction-specific about it. I don't expect users to set this themselves. I've updated the Configuration Properties with information on this value. I'm glad you caught this, as I didn't realize we recorded all conf keys there. I'll also add the same information to the release notes for this bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Release Note: A new configuration property, hive.script.operator.env.blacklist, was added in 0.14. Its default value is hive.txn.valid.txns,hive.script.operator.env.blacklist. By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment. However, some values can grow large or are not amenable to translation to environment variables. This property gives a comma-separated list of configuration keys that will not be set in the environment when calling a script operator. By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
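The conversion the release note describes can be sketched as follows. Class and method names here are illustrative, not Hive's actual code: each conf key becomes an environment variable with dots replaced by underscores, unless the key appears in the blacklist.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ScriptEnvSketch {
    // Illustrative sketch of the behavior in the release note: conf keys
    // become environment variable names ('.' -> '_'), except blacklisted
    // keys, which are skipped entirely.
    static Map<String, String> confToEnv(Map<String, String> conf, Set<String> blacklist) {
        Map<String, String> env = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (blacklist.contains(e.getKey())) {
                continue; // e.g. hive.txn.valid.txns can be huge or compressed
            }
            env.put(e.getKey().replace('.', '_'), e.getValue());
        }
        return env;
    }
}
```

With hive.txn.valid.txns blacklisted, hive_txn_valid_txns never appears in the script's environment, while other keys pass through as before.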
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Status: Open (was: Patch Available) Vectorized reads of transactional tables fail when not all columns are selected --- Key: HIVE-8474 URL: https://issues.apache.org/jira/browse/HIVE-8474 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8474.patch {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); select name, age from concur_orc_tab order by name; {code} results in {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 13 more {code} The issue is that the object inspector passed to VectorizedOrcAcidRowReader has all of the columns in the file rather than only the projected columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177104#comment-14177104 ] Alan Gates commented on HIVE-8474: -- OK, I'll rework it not to use addToBatchFrom. I do plan on factoring out the switch statement so that it can be shared, but hopefully that will be all right. Are you OK with the changes to VectorizedRowBatch to add tracking of the partition columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177198#comment-14177198 ] Alan Gates commented on HIVE-8474: -- Are you saying that rather than changing VectorizedRowBatch I should just pass along the Ctx to my new acidAddToBatchFrom method and use that to figure out the partition columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Attachment: HIVE-8474.2.patch A second version of the patch that incorporates Matt's feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Status: Patch Available (was: Open) Attachments: HIVE-8474.2.patch, HIVE-8474.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8515) Column projection not being pushed to ORC delta files
Alan Gates created HIVE-8515: Summary: Column projection not being pushed to ORC delta files Key: HIVE-8515 URL: https://issues.apache.org/jira/browse/HIVE-8515 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Currently when only some columns are projected, that projection is pushed to the base file but not to delta files. This does not cause incorrect results (the columns are projected out later in the query execution), but it is less efficient than it could be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8515) Column projection not being pushed to ORC delta files
[ https://issues.apache.org/jira/browse/HIVE-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176077#comment-14176077 ] Alan Gates commented on HIVE-8515: -- The issue is in OrcInputFormat.getReader: {code} if (split.hasBase()) { bucket = AcidUtils.parseBaseBucketFilename(split.getPath(), conf).getBucket(); reader = OrcFile.createReader(path, OrcFile.readerOptions(conf)); final List<OrcProto.Type> types = reader.getTypes(); setIncludedColumns(readOptions, types, conf, split.isOriginal()); setSearchArgument(readOptions, types, conf, split.isOriginal()); } else { bucket = (int) split.getStart(); reader = null; } {code} setIncludedColumns is called if there is a base, but not if there isn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
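For readers unfamiliar with setIncludedColumns: ORC readers take a boolean array, one entry per type in the file schema, marking which columns to decode. The sketch below models the simplest flat-schema case; the helper name and the assumption that each top-level column is one schema entry are mine, not the actual OrcInputFormat code (real ORC schemas have extra entries for nested types):

```java
import java.util.List;

public class IncludedColumnsSketch {
    // Illustrative: build the boolean include mask ORC uses for column
    // projection. Index 0 is the root struct and is always read; only the
    // projected top-level columns are flagged, so unselected columns are
    // never decoded. Assumes a flat schema (no nested types).
    static boolean[] includedColumns(int numTopLevelCols, List<Integer> projectedIds) {
        boolean[] included = new boolean[numTopLevelCols + 1];
        included[0] = true; // root struct
        for (int id : projectedIds) {
            included[id + 1] = true; // +1 skips the root entry
        }
        return included;
    }
}
```

When this mask is never built (the delta-only branch above), the reader decodes every column, which is the inefficiency the issue describes.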
[jira] [Created] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables
Alan Gates created HIVE-8516: Summary: insert/values allowed against bucketed, non-transactional tables Key: HIVE-8516 URL: https://issues.apache.org/jira/browse/HIVE-8516 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Hive does not support insert into bucketed tables. A special exception is made for transactional tables, as they require bucketing. Insert/values works against non-transactional tables, since it just dumps the values into a temp table and rewrites the query into insert/select from that temp table. However, the check that prevents doing inserts into non-transactional, bucketed tables is not catching this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
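The missing check reduces to a simple predicate that the rewritten insert/select query must also pass through. The sketch below is illustrative (names are mine, not Hive's semantic-analyzer code):

```java
public class BucketedInsertCheck {
    // Illustrative rule from the description: inserting into a bucketed
    // table is only allowed when the table is transactional, since Hive
    // cannot otherwise guarantee the new rows land in the right buckets.
    static boolean insertAllowed(boolean bucketed, boolean transactional) {
        return !bucketed || transactional;
    }
}
```

The bug is that the insert/values path rewrites the statement into an insert/select from a temp table before this kind of check fires, so the bucketed, non-transactional case slips through.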
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175151#comment-14175151 ] Alan Gates commented on HIVE-8290: -- bq. Was hive.support.concurrency required for transactions in 0.13.0? Yes. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.2.patch, HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175664#comment-14175664 ] Alan Gates commented on HIVE-8474: -- Further testing has also determined that attempting to select a partition column with vectorization results in an NPE as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Attachment: HIVE-8474.patch This patch makes several changes in vectorization. [~mmccline] and [~ashutoshc], as I am not very familiar with this code, and as I know the code is very performance sensitive, I would appreciate your feedback on the patch. The issue causing problems was that VectorizedBatchUtil.addRowToBatchFrom is used by VectorizedOrcAcidRowReader to take the merged rows from an acid read and put them in a vector batch. But this method appears to have been built to be used by vector operators, not file formats, where columns may be missing because they have been projected out or may already have values set because they are partition columns. So I made the following changes: # I changed addRowToBatchFrom to skip writing values into ColumnVectors that are null. This handles the case where columns have been projected out and thus the ColumnVector is null. # I changed VectorizedRowBatch to have a boolean array to track which columns are partition columns, and VectorizedRowBatchCtx.createVectorizedRowBatch to populate this array. # I changed addRowToBatchFrom to skip writing values into ColumnVectors that are marked in VectorizedRowBatch as partition columns, since writing them would overwrite the values that have already been put there by VectorizedRowBatchCtx.addPartitionColsToBatch. My concern is whether it is appropriate to mix this skipping of projected-out and partition columns into addRowToBatchFrom. If you think it isn't good, I can write a new method to do this, but that will involve a fair amount of duplicate code. [~owen.omalley], I also changed VectorizedOrcAcidRowReader to set the partition column values after every call to VectorizedRowBatch.reset in next. Without doing this the code was NPEing later in the pipeline because the partition column had been set to null. 
It appeared that you had copied the code from VectorizedOrcInputFormat, which only called addPartitionColsToBatch once, but which never called reset. I tried removing the call to reset but that caused other issues. Vectorized reads of transactional tables fail when not all columns are selected --- Key: HIVE-8474 URL: https://issues.apache.org/jira/browse/HIVE-8474 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8474.patch {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); select name, age from concur_orc_tab order by name; {code} results in {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443) at
[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8474: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
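The skip logic described in the patch comment above can be sketched outside of Hive. This is a minimal Python sketch, not Hive's actual vectorization API: the `Batch` class and its fields are hypothetical stand-ins for VectorizedRowBatch, with `None` standing in for a projected-out ColumnVector and a boolean list standing in for the partition-column flags.

```python
class Batch:
    """Hypothetical stand-in for a vectorized row batch: one list per
    column (None = column projected out) plus parallel flags marking
    partition columns whose values are already populated."""
    def __init__(self, cols, is_partition):
        self.cols = cols
        self.is_partition = is_partition

def add_row(batch, row_idx, row):
    """Copy one row into the batch, skipping columns that are either
    projected out (vector is None) or pre-filled partition columns."""
    for c, value in enumerate(row):
        if batch.cols[c] is None:       # projected out: nothing to write into
            continue
        if batch.is_partition[c]:       # partition value already set: keep it
            continue
        batch.cols[c][row_idx] = value

# Column 0 is a data column, column 1 is projected out, column 2 is a
# partition column pre-filled by the batch context.
batch = Batch(cols=[[0, 0], None, ["today", "today"]],
              is_partition=[False, False, True])
add_row(batch, 0, ("fred", 43, "SHOULD_NOT_APPEAR"))
```

After the call, the data column holds the new value while the projected-out and partition columns are untouched, which is the behavior the patch gives addRowToBatchFrom.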
[jira] [Commented] (HIVE-8235) Insert into partitioned bucketed sorted tables fails with this file is already being created by
[ https://issues.apache.org/jira/browse/HIVE-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175690#comment-14175690 ] Alan Gates commented on HIVE-8235: -- [~mmokhtar], ping, have you had a chance to run this? I can't reproduce it. Insert into partitioned bucketed sorted tables fails with this file is already being created by - Key: HIVE-8235 URL: https://issues.apache.org/jira/browse/HIVE-8235 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: insert_into_partitioned_bucketed_table.txt.tar.gz.zip When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Status: Open (was: Patch Available) The TestOperators failure is caused by this patch. The rest I believe are unrelated. I'll put up a new version of the patch that addresses the TestOperators failure. Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.2.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Status: Patch Available (was: Open) Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Attachment: HIVE-8341.3.patch Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
Alan Gates created HIVE-8474: Summary: Vectorized reads of transactional tables fail when not all columns are selected Key: HIVE-8474 URL: https://issues.apache.org/jira/browse/HIVE-8474 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); select name, age from concur_orc_tab order by name; {code} results in {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 13 more {code} The issue is that the object inspector passed to VectorizedOrcAcidRowReader has all of the columns in the file rather than only the projected columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Attachment: HIVE-8341.2.patch A new version of the patch. The transaction list is compressed, as before. However, with this patch I added a blacklist to the ScriptOperator to strain out specified conf variables and not make them environment variables. Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.2.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
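The blacklist described in this patch note can be sketched as a filter applied while turning job conf entries into a child script's environment. A minimal Python sketch under stated assumptions: the conf is modeled as a plain dict, the dot-to-underscore renaming is illustrative of common Hadoop-style env naming, and the `hive.txn.valid.txns` key is used only as an example of a variable one would blacklist.

```python
def conf_to_env(conf, blacklist):
    """Build the child script's environment from the job conf, straining
    out blacklisted variables (e.g. a huge transaction list) so they are
    never exported as environment variables."""
    env = {}
    for key, value in conf.items():
        if key in blacklist:
            continue
        # Illustrative env-var naming: dots become underscores.
        env[key.replace(".", "_")] = value
    return env

conf = {"mapred.task.id": "attempt_1",
        "hive.txn.valid.txns": "…potentially enormous transaction list…"}
env = conf_to_env(conf, blacklist={"hive.txn.valid.txns"})
```

With the blacklist in place, the oversized transaction list stays in the conf for readers that need it but never bloats the script's environment.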
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Status: Patch Available (was: Open) Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.2.patch, HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171100#comment-14171100 ] Alan Gates commented on HIVE-7689: -- Ok, we should quote calls to just that stats table then. That way we don't pollute all the TxnHandler code, but we can fix this problem. We can either re-open this bug or open a new one. Do you want to post a patch for this or do you want me to? If you post it I can review it. If not, I can post a patch in a day or two and you can test that it works in your system. Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch Current 0.14 patch create table with lower case names. This patch fix wrong lower case tables names in Postgres Metastore back end. Mixing lower case and upper case throws bugs in {{JDBCStatsPublisher}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
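The quoting fix discussed above (quote only the references to the stats table, rather than changing all of the TxnHandler SQL) comes down to double-quoting the identifier so Postgres preserves its exact case instead of folding it to lower case. A minimal Python sketch; the table name used below is illustrative, not necessarily the real stats table name.

```python
def quote_ident(name):
    """Double-quote an SQL identifier so Postgres keeps its exact case
    (unquoted identifiers are folded to lower case). Embedded double
    quotes are doubled, per the SQL standard."""
    return '"' + name.replace('"', '""') + '"'

# Illustrative: a mixed-case stats table name survives quoting.
sql = "select * from " + quote_ident("PARTITION_STATS_V2")
```

Applying this only where the stats table is referenced keeps the rest of the metastore SQL untouched while fixing the case-mismatch bug.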
[jira] [Created] (HIVE-8459) DbLockManager locking table in addition to partitions
Alan Gates created HIVE-8459: Summary: DbLockManager locking table in addition to partitions Key: HIVE-8459 URL: https://issues.apache.org/jira/browse/HIVE-8459 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Queries and operations on partitioned tables are generating locks on the whole table when they should only be locking the partition. For example: {code} insert into table concur_orc_tab_part partition (ds='today') values ('fred flintstone', 43, 1.95); {code} This should only be locking the partition ds='today'. But instead: {code}
mysql> select * from HIVE_LOCKS;
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE              | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
| 425            | 1              | 204      | default | values__tmp__table__1 | NULL         | a             | r            | 141331074         | 1413310738000  | hive    | node-1.example.com |
| 425            | 2              | 204      | default | concur_orc_tab_part   | ds=today     | a             | r            | 141331074         | 1413310738000  | hive    | node-1.example.com |
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8459) DbLockManager locking table in addition to partitions
[ https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8459: - Description: Queries and operations on partitioned tables are generating locks on the whole table when they should only be locking the partition. For example: {code} select count(*) from concur_orc_tab_part where ds = 'today'; {code} This should only be locking the partition ds='today'. But instead: {code}
mysql> select * from HIVE_LOCKS;
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
| 428            | 1              | 0        | default | concur_orc_tab_part | NULL         | a             | r            | 1413311172000     | 1413311171000  | hive    | node-1.example.com |
| 428            | 2              | 0        | default | concur_orc_tab_part | ds=today     | a             | r            | 1413311172000     | 1413311171000  | hive    | node-1.example.com |
{code} was: Queries and operations on partitioned tables are generating locks on the whole table when they should only be locking the partition. For example: {code} insert into table concur_orc_tab_part partition (ds='today') values ('fred flintstone', 43, 1.95); {code} This should only be locking the partition ds='today'. 
But instead: {code}
mysql> select * from HIVE_LOCKS;
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE              | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
| 425            | 1              | 204      | default | values__tmp__table__1 | NULL         | a             | r            | 141331074         | 1413310738000  | hive    | node-1.example.com |
| 425            | 2              | 204      | default | concur_orc_tab_part   | ds=today     | a             | r            | 141331074         | 1413310738000  | hive    | node-1.example.com |
{code}
[jira] [Commented] (HIVE-8459) DbLockManager locking table in addition to partitions
[ https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171328#comment-14171328 ] Alan Gates commented on HIVE-8459: -- Note that this is only happening on the read side, not the write side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8459) DbLockManager locking table in addition to partitions
[ https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8459: - Priority: Major (was: Critical) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8442) Revert HIVE-8403
Alan Gates created HIVE-8442: Summary: Revert HIVE-8403 Key: HIVE-8442 URL: https://issues.apache.org/jira/browse/HIVE-8442 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 HIVE-8403 caused the number of tests run to drop from ~6K to ~4K. Also, the datanucleus repo is back up. So we should revert this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8442) Revert HIVE-8403
[ https://issues.apache.org/jira/browse/HIVE-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8442: - Attachment: HIVE-8442.patch For the record, here's the reversion patch. Revert HIVE-8403 Key: HIVE-8442 URL: https://issues.apache.org/jira/browse/HIVE-8442 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8442.patch HIVE-8403 caused the number of tests run to drop from ~6K to ~4K. Also, the datanucleus repo is back up. So we should revert this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8442) Revert HIVE-8403
[ https://issues.apache.org/jira/browse/HIVE-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-8442. -- Resolution: Fixed Reverted in both trunk and branch-0.14. Revert HIVE-8403 Key: HIVE-8442 URL: https://issues.apache.org/jira/browse/HIVE-8442 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8442.patch HIVE-8403 caused the number of tests run to drop from ~6K to ~4K. Also, the datanucleus repo is back up. So we should revert this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8332) Reading an ACID table with vectorization on results in NPE
[ https://issues.apache.org/jira/browse/HIVE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8332: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked into branch 0.14 and trunk. Reading an ACID table with vectorization on results in NPE -- Key: HIVE-8332 URL: https://issues.apache.org/jira/browse/HIVE-8332 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8332.patch On a transactional table, insert some data, then with vectorization turned on do a select. The result is: {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.getObjectInspector(OrcInputFormat.java:1137) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.init(VectorizedOrcAcidRowReader.java:61) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1041) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246) ... 25 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch 6 checked into trunk and branch 0.14. Thanks Eugene for the review. Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
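The fix described above — clean once every lock that existed at compaction time has expired, rather than waiting for a lock-free instant — can be sketched with a simple watermark. A minimal Python sketch; the id-based bookkeeping is an illustrative simplification of the actual patch, assuming lock ids are assigned in increasing order.

```python
class Cleaner:
    """Remember the highest lock id present when compaction finished.
    Only locks at or below that watermark can be reading the old files;
    anything granted later sees the compacted files instead."""
    def __init__(self, lock_ids_at_compaction):
        self.watermark = max(lock_ids_at_compaction, default=0)

    def can_clean(self, current_lock_ids):
        # Locks granted after compaction (id > watermark) must not block
        # cleaning, so only pre-compaction locks matter.
        return all(lid > self.watermark for lid in current_lock_ids)

cleaner = Cleaner([425, 428])
busy = cleaner.can_clean([428, 431])   # pre-compaction lock 428 still held
ok = cleaner.can_clean([431, 432])     # only post-compaction locks remain
```

Under this scheme a busy table can no longer starve the cleaner: new locks keep arriving, but they all sit above the watermark.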
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk and branch 0.14. Thanks Eugene for the review. compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8368.2.patch, HIVE-8368.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
[ https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8402: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk and branch 0.14. Thanks Owen for the review. Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions - Key: HIVE-8402 URL: https://issues.apache.org/jira/browse/HIVE-8402 Project: Hive Issue Type: Bug Components: File Formats, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8402.patch ORC is in some instances pushing SARGs into delta files. This is wrong behavior in general as it may result in failing to pull the most recent version of a row. When the SARG is applied to a row that is deleted it causes an ArrayOutOfBoundsException because there is no data in the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Status: Open (was: Patch Available) Missed method signature change in TestCompactor. Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Status: Patch Available (was: Open) Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Attachment: HIVE-8258.6.patch A new patch with the signature change for TestCompactor. Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects
[ https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8347: - Resolution: Fixed Fix Version/s: 0.15.0 Assignee: Alan Gates Status: Resolved (was: Patch Available) Patch checked into trunk. Thanks Mariappan for the patch. Note, this should be assigned to Mariappan Asokan, but as Mariappan is not in the contributor list I couldn't do that. JIRA seemed to want it to be assigned to someone so I assigned it to me. But if one of the JIRA admins can add Mariappan to the contributor list then we can properly assign the JIRA. Use base-64 encoding instead of custom encoding for serialized objects -- Key: HIVE-8347 URL: https://issues.apache.org/jira/browse/HIVE-8347 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.13.1 Reporter: Mariappan Asokan Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8347.patch Serialized objects that are shipped via Hadoop {{Configuration}} are encoded using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement {{HCatUtil.decodeBytes()}}) which has 100% overhead. In other words, each byte in the serialized object becomes 2 bytes after encoding. Perhaps, this might be one of the reasons for the problem reported in HCATALOG-453. The patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to solve the problem. By using Base64 encoding, the overhead will be reduced to about 33%. This will alleviate the problem for all serialized objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects
[ https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167048#comment-14167048 ] Alan Gates commented on HIVE-8347: -- You can send an email to dev@hive.apache.org and ask to be added. Use base-64 encoding instead of custom encoding for serialized objects -- Key: HIVE-8347 URL: https://issues.apache.org/jira/browse/HIVE-8347 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.13.1 Reporter: Mariappan Asokan Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8347.patch Serialized objects that are shipped via Hadoop {{Configuration}} are encoded using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement {{HCatUtil.decodeBytes()}}) which has 100% overhead. In other words, each byte in the serialized object becomes 2 bytes after encoding. Perhaps, this might be one of the reasons for the problem reported in HCATALOG-453. The patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to solve the problem. By using Base64 encoding, the overhead will be reduced to about 33%. This will alleviate the problem for all serialized objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
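The overhead figures in the description are easy to check. The snippet below uses hex encoding as a stand-in for the custom two-bytes-per-byte scheme (the actual {{HCatUtil.encodeBytes()}} encoding is not necessarily hex, but it has the same 100% expansion):

```python
import base64

payload = bytes(range(256)) * 4  # 1024 bytes of sample "serialized object" data

# Stand-in for the custom scheme: 2 output bytes per input byte -> 100% overhead.
hex_encoded = payload.hex().encode("ascii")
assert len(hex_encoded) == 2 * len(payload)

# Base64: 4 output bytes per 3 input bytes -> roughly 33% overhead.
b64_encoded = base64.b64encode(payload)
overhead = len(b64_encoded) / len(payload) - 1
assert abs(overhead - 1 / 3) < 0.01
```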
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.14. Thanks Eugene for the patch. delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.2.patch, HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects
[ https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8347: - Assignee: Mariappan Asokan (was: Alan Gates) Use base-64 encoding instead of custom encoding for serialized objects -- Key: HIVE-8347 URL: https://issues.apache.org/jira/browse/HIVE-8347 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.13.1 Reporter: Mariappan Asokan Assignee: Mariappan Asokan Fix For: 0.15.0 Attachments: HIVE-8347.patch Serialized objects that are shipped via Hadoop {{Configuration}} are encoded using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement {{HCatUtil.decodeBytes()}}) which has 100% overhead. In other words, each byte in the serialized object becomes 2 bytes after encoding. Perhaps, this might be one of the reasons for the problem reported in HCATALOG-453. The patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to solve the problem. By using Base64 encoding, the overhead will be reduced to about 33%. This will alleviate the problem for all serialized objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects
[ https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165416#comment-14165416 ] Alan Gates commented on HIVE-8347: -- +1, [~sushanth], any concerns? Use base-64 encoding instead of custom encoding for serialized objects -- Key: HIVE-8347 URL: https://issues.apache.org/jira/browse/HIVE-8347 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.13.1 Reporter: Mariappan Asokan Attachments: HIVE-8347.patch Serialized objects that are shipped via Hadoop {{Configuration}} are encoded using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement {{HCatUtil.decodeBytes()}}) which has 100% overhead. In other words, each byte in the serialized object becomes 2 bytes after encoding. Perhaps, this might be one of the reasons for the problem reported in HCATALOG-453. The patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to solve the problem. By using Base64 encoding, the overhead will be reduced to about 33%. This will alleviate the problem for all serialized objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8308) Acid related table properties should be defined in one place and should be case insensitive
[ https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8308: - Summary: Acid related table properties should be defined in one place and should be case insensitive (was: Acid related table properties should be defined in one place) Acid related table properties should be defined in one place and should be case insensitive --- Key: HIVE-8308 URL: https://issues.apache.org/jira/browse/HIVE-8308 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT are defined in the classes that use them. Since these are both potential table properties and they both are ACID related it makes sense to collect them together. There's no central place for Table properties at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8308) Acid related table properties should be defined in one place
[ https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8308: - Priority: Major (was: Minor) Acid related table properties should be defined in one place Key: HIVE-8308 URL: https://issues.apache.org/jira/browse/HIVE-8308 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT are defined in the classes that use them. Since these are both potential table properties and they both are ACID related it makes sense to collect them together. There's no central place for Table properties at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8308) Acid related table properties should be defined in one place and should be case insensitive
[ https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165435#comment-14165435 ] Alan Gates commented on HIVE-8308: -- In addition to being defined in one place, they should be case insensitive. Right now transactional has to be lower case, and NO_AUTO_COMPACT has to be upper case. Acid related table properties should be defined in one place and should be case insensitive --- Key: HIVE-8308 URL: https://issues.apache.org/jira/browse/HIVE-8308 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT are defined in the classes that use them. Since these are both potential table properties and they both are ACID related it makes sense to collect them together. There's no central place for Table properties at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
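The case-insensitive behavior the comment asks for amounts to a lookup like the following (a hypothetical helper for illustration, not the actual Hive code):

```python
# Case-insensitive table-property lookup, the behavior HIVE-8308 requests.
def get_property(tblproperties, name):
    wanted = name.lower()
    for key, value in tblproperties.items():
        if key.lower() == wanted:
            return value
    return None

props = {"transactional": "true", "NO_AUTO_COMPACT": "true"}
assert get_property(props, "TRANSACTIONAL") == "true"
assert get_property(props, "no_auto_compact") == "true"
assert get_property(props, "missing") is None
```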
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165436#comment-14165436 ] Alan Gates commented on HIVE-8290: -- [~leftylev], actually looking at the code, both are case sensitive, but one I made all caps and one all lower. That doesn't seem good. I've added this to HIVE-8308 to fix. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.2.patch, HIVE-8290.patch Currently, once a user configures DbTxnManager to the be transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8341: - Status: Open (was: Patch Available) Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
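One illustrative way to shrink a long open-transaction list — not necessarily what the eventual patch does — is to collapse consecutive ids into ranges before serializing:

```python
# Hypothetical compaction of a transaction-id list into (start, end) ranges.
def to_ranges(txn_ids):
    out, start, prev = [], None, None
    for t in sorted(txn_ids):
        if start is None:
            start = prev = t
        elif t == prev + 1:
            prev = t
        else:
            out.append((start, prev))
            start = prev = t
    if start is not None:
        out.append((start, prev))
    return out

# Mostly-contiguous id lists, the common case, compress very well.
assert to_ranges([1, 2, 3, 7, 8, 10]) == [(1, 3), (7, 8), (10, 10)]
```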
[jira] [Created] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
Alan Gates created HIVE-8402: Summary: Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions Key: HIVE-8402 URL: https://issues.apache.org/jira/browse/HIVE-8402 Project: Hive Issue Type: Bug Components: File Formats, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 ORC is in some instances pushing SARGs into delta files. This is wrong behavior in general as it may result in failing to pull the most recent version of a row. When the SARG is applied to a row that is deleted it causes an ArrayOutOfBoundsException because there is no data in the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
[ https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163876#comment-14163876 ] Alan Gates commented on HIVE-8402: -- In my tests at the moment I only see this after compaction. I don't know if this is because ORC correctly doesn't push the SARGs when there are only delta files or if something else is causing this. Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions - Key: HIVE-8402 URL: https://issues.apache.org/jira/browse/HIVE-8402 Project: Hive Issue Type: Bug Components: File Formats, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 ORC is in some instances pushing SARGs into delta files. This is wrong behavior in general as it may result in failing to pull the most recent version of a row. When the SARG is applied to a row that is deleted it causes an ArrayOutOfBoundsException because there is no data in the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
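The hazard described above can be sketched with a toy base/delta merge (hypothetical structures, not ORC's actual reader): filtering delta rows before the merge silently resurrects the older version of an updated row.

```python
# Why pushing a SARG (predicate) into delta files is unsafe.
base = {1: {"a": 5}}     # rowid -> row as written in the base file
delta = {1: {"a": 50}}   # a later update of the same row

def read(base, delta, sarg, push_into_delta):
    merged = dict(base)
    for rid, row in delta.items():
        if push_into_delta and not sarg(row):
            continue  # delta row filtered out *before* the merge
        merged[rid] = row
    # The predicate is always applied after the merge.
    return {rid: row for rid, row in merged.items() if sarg(row)}

sarg = lambda row: row["a"] < 10
# Pushdown into the delta drops the update, so the stale base row survives:
assert read(base, delta, sarg, push_into_delta=True) == {1: {"a": 5}}
# Filtering only after the merge sees the current version and returns nothing:
assert read(base, delta, sarg, push_into_delta=False) == {}
```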
[jira] [Created] (HIVE-8403) Build broken by datanucleus.org being offline
Alan Gates created HIVE-8403: Summary: Build broken by datanucleus.org being offline Key: HIVE-8403 URL: https://issues.apache.org/jira/browse/HIVE-8403 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Alan Gates Priority: Blocker Fix For: 0.14.0 datanucleus.org is not available, making it impossible to download jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8403) Build broken by datanucleus.org being offline
[ https://issues.apache.org/jira/browse/HIVE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8403: - Attachment: HIVE-8403.patch Removes datanucleus.org as a repository and adds JBoss in order to pick up JMS jars. Build broken by datanucleus.org being offline - Key: HIVE-8403 URL: https://issues.apache.org/jira/browse/HIVE-8403 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8403.patch datanucleus.org is not available, making it impossible to download jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8403) Build broken by datanucleus.org being offline
[ https://issues.apache.org/jira/browse/HIVE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8403: - Assignee: Alan Gates Status: Patch Available (was: Open) Build broken by datanucleus.org being offline - Key: HIVE-8403 URL: https://issues.apache.org/jira/browse/HIVE-8403 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8403.patch datanucleus.org is not available, making it impossible to download jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164332#comment-14164332 ] Alan Gates commented on HIVE-8367: -- bq. What was the original query where the issue showed up? {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); insert into table concur_orc_tab select * from texttab; -- loads 10k records into the table delete from concur_orc_tab where age = 20 and age 30; {code} This resulted in only some rows being deleted (~300 of the 1700 that should have been deleted). bq. What precisely was the problem and how does the RS deduplication change help? The problem was that because the code was turning off the RS deduplication it was getting a plan with two MR jobs. The sort by ROW__ID was done in job one, and the bucketing was done in job two. This meant that the bucketing in job 2 partially undid the sorting of job 1, resulting in only some of the records showing up as deleted (since the records have to be written in the delta file in proper order). The minimum number of reducers on which to apply the RS deduplication is pushed to 1 so that this optimization is used for even small queries. bq. How are the changes to sort order of ROW__ID related? That should never have been set to descending in the first place. ROW__ID needs to be stored ascending to work properly. I suspect it was a fluke of most of the qfile tests that they worked with this on. (Actually Thejas asked at the time why this was necessary and rather than fixing it (which I should have done) I just said I didn't know. Oops.) bq. ReduceSinkDeDuplication.java change is not needed What change? I don't see any changes to that file in the patch.
delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
[ https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8402: - Attachment: HIVE-8402.patch A patch to change orc to not push sargs into the deltas. And to answer my earlier unknown, this did only happen when a base was also present. When there was no base file the sarg was not being written into the options passed to OrcRawRecordMerge (see OrcInputFormat.getReader, around line 1121). Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions - Key: HIVE-8402 URL: https://issues.apache.org/jira/browse/HIVE-8402 Project: Hive Issue Type: Bug Components: File Formats, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8402.patch ORC is in some instances pushing SARGs into delta files. This is wrong behavior in general as it may result in failing to pull the most recent version of a row. When the SARG is applied to a row that is deleted it causes an ArrayOutOfBoundsException because there is no data in the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
[ https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8402: - Status: Patch Available (was: Open) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions - Key: HIVE-8402 URL: https://issues.apache.org/jira/browse/HIVE-8402 Project: Hive Issue Type: Bug Components: File Formats, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8402.patch ORC is in some instances pushing SARGs into delta files. This is wrong behavior in general as it may result in failing to pull the most recent version of a row. When the SARG is applied to a row that is deleted it causes an ArrayOutOfBoundsException because there is no data in the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Status: Open (was: Patch Available) compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8368.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Attachment: HIVE-8368.2.patch Rebased version of the patch. compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8368.2.patch, HIVE-8368.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Status: Patch Available (was: Open) compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8368.2.patch, HIVE-8368.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Status: Open (was: Patch Available) delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Status: Patch Available (was: Open) delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.2.patch, HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Attachment: HIVE-8367.2.patch Rebased version of the patch. delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.2.patch, HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Status: Patch Available (was: Open) Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Attachment: HIVE-8258.5.patch Rebased patch. Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.5.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql oracle
[ https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6669: - Attachment: HIVE-6669.2.patch A new version of the patch that quotes postgres table and field names. sourcing txn-script from schema script results in failure for mysql oracle Key: HIVE-6669 URL: https://issues.apache.org/jira/browse/HIVE-6669 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Alan Gates Priority: Blocker Attachments: HIVE-6669.2.patch, HIVE-6669.patch This issues is addressed in 0.13 by in-lining the the transaction schema statements in the schema initialization script (HIVE-6559) The 0.14 schema initialization is not fixed. This is the followup ticket for to address the problem in 0.14. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large
[ https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164472#comment-14164472 ] Alan Gates commented on HIVE-8341: -- Do you have a simple query with a transform in it that shows the issue with the process builder? Transaction information in config file can grow excessively large - Key: HIVE-8341 URL: https://issues.apache.org/jira/browse/HIVE-8341 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8341.patch In our testing we have seen cases where the transaction list grows very large. We need a more efficient way of communicating the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
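One way to keep the communicated transaction list small (a hypothetical sketch, not necessarily the encoding the patch adopts) is to send a high-water mark plus only the exceptions, i.e. the open or aborted transactions below it, rather than enumerating every valid transaction id:

```python
# Hypothetical compact encoding of a valid-transaction list: instead of
# listing every committed txn id, send the high-water mark plus the
# (usually short) list of open or aborted txns below it.

def encode(high_watermark, exceptions):
    return f"{high_watermark}:" + ",".join(map(str, sorted(exceptions)))

def is_valid(encoded, txn_id):
    hwm_part, _, exc_part = encoded.partition(":")
    hwm = int(hwm_part)
    exceptions = {int(x) for x in exc_part.split(",") if x}
    return txn_id <= hwm and txn_id not in exceptions

enc = encode(1_000_000, [42, 99])   # a million txns in a handful of bytes
print(is_valid(enc, 100))           # True
print(is_valid(enc, 42))            # open/aborted -> False
```

The string stays small regardless of how many transactions have committed, since only the exception set grows with actual concurrent activity.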
[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.
[ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8258: - Status: Patch Available (was: Open) Ignore that last comment, the issue was just pilot error. Compactor cleaners can be starved on a busy table or partition. --- Key: HIVE-8258 URL: https://issues.apache.org/jira/browse/HIVE-8258 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, HIVE-8258.patch Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition. This leaves it open to starvation in the case of a busy table or partition. It only needs to wait until all locks on the table/partition at the time of the compaction have expired. Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql oracle
[ https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6669: - Status: Patch Available (was: Open) NO PRECOMMIT TESTS sourcing txn-script from schema script results in failure for mysql oracle Key: HIVE-6669 URL: https://issues.apache.org/jira/browse/HIVE-6669 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Alan Gates Priority: Blocker Attachments: HIVE-6669.patch This issue is addressed in 0.13 by in-lining the transaction schema statements in the schema initialization script (HIVE-6559). The 0.14 schema initialization is not fixed. This is the followup ticket to address the problem in 0.14. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Status: Patch Available (was: Open) compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8367.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
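The intended compactor behavior can be modeled as follows (an illustrative sketch, not the actual compactor implementation): when merging base and delta events into a new base, any row whose latest event is a delete must be omitted entirely, never written out.

```python
# Illustrative model of what major compaction should do with deletes:
# merge events per row id, keep the latest event for each row, and drop
# rows whose latest event is a delete. Not the actual compactor code.

def compact(events):
    """events: list of (row_id, txn_id, op, value), op in {'insert', 'update', 'delete'}"""
    latest = {}
    for row_id, txn_id, op, value in events:
        cur = latest.get(row_id)
        if cur is None or txn_id > cur[0]:
            latest[row_id] = (txn_id, op, value)
    # Delete records must be dropped, not carried into the new base;
    # writing them out is what inflates the base file.
    return {rid: val for rid, (txn, op, val) in latest.items() if op != "delete"}

events = [
    (1, 10, "insert", "a"),
    (2, 10, "insert", "b"),
    (1, 11, "delete", None),   # row 1 deleted in a later delta
]
print(compact(events))          # {2: 'b'} -- row 1 is gone
```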
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Attachment: HIVE-8367.patch The issue comes out when input sizes are large enough that they exceed one map task. This patch fixes it by turning on reduce deduplication in the optimizer (which was being turned off before) and dropping the minimum number of reducers to 1 (instead of 4). This has the side effect of halving the time it takes to do an update or delete. compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8367.patch When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162040#comment-14162040 ] Alan Gates commented on HIVE-8368: -- Ignore the last comment, it was intended for a different JIRA. compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Attachment: HIVE-8367.patch The issue comes out when input sizes are large enough that they exceed one map task. This patch fixes it by turning on reduce deduplication in the optimizer (which was being turned off before) and dropping the minimum number of reducers to 1 (instead of 4). This has the side effect of halving the time it takes to do an update or delete. delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
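The invariant behind the ordering bug above (sketched here with illustrative field names mirroring Hive's RecordIdentifier, not its actual code) is that ACID delta records must be written sorted by row identifier, i.e. (original transaction id, bucket, row id). With more than one map task, no single mapper sees all the rows, so the global ordering has to be enforced in a reduce stage, which is why the fix routes the plan through reduce deduplication.

```python
# Sketch of the ordering invariant for ACID delete deltas: records must
# be written sorted by (original txn id, bucket, row id). Illustrative
# only; field names mirror Hive's RecordIdentifier but this is not its code.

def sort_delta(records):
    return sorted(records, key=lambda r: (r["origTxn"], r["bucket"], r["rowId"]))

records = [
    {"origTxn": 5, "bucket": 0, "rowId": 7},
    {"origTxn": 5, "bucket": 0, "rowId": 2},   # arrived out of order from a 2nd mapper
    {"origTxn": 4, "bucket": 0, "rowId": 9},
]
for r in sort_delta(records):
    print(r["origTxn"], r["rowId"])   # 4 9, then 5 2, then 5 7
```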
[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases
[ https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8367: - Status: Patch Available (was: Open) delete writes records in wrong order in some cases -- Key: HIVE-8367 URL: https://issues.apache.org/jira/browse/HIVE-8367 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8367.patch I have found one query with 10k records where you do: create table insert into table -- 10k records delete from table -- just some records The records in the delete delta are not ordered properly by rowid. I assume this applies to updates as well, but I haven't tested it yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Attachment: (was: HIVE-8367.patch) compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file
[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8368: - Status: Open (was: Patch Available) compactor is improperly writing delete records in base file --- Key: HIVE-8368 URL: https://issues.apache.org/jira/browse/HIVE-8368 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 When the compactor reads records from the base and deltas, it is not properly dropping delete records. This leads to oversized base files, and possibly to wrong query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)