[jira] [Created] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y

2014-10-28 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8637:


 Summary: In insert into X select from Y, table properties from X 
are clobbering those from Y
 Key: HIVE-8637
 URL: https://issues.apache.org/jira/browse/HIVE-8637
 Project: Hive
  Issue Type: Task
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


With a query like:
{code}
insert into table X select * from Y;
{code}
the table properties from table X are being sent to the input formats for table 
Y.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y

2014-10-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187445#comment-14187445
 ] 

Alan Gates commented on HIVE-8637:
--

The issue is that HiveOutputFormatImpl.checkOutputSpecs writes the table 
properties for table X into the conf file.  When HiveInputFormat.getInputSplits 
later takes that same conf file and goes to copy the table properties in, it 
calls Utilities.copyTableJobPropertiesToConf (the same method that 
checkOutputSpecs used).  The problem is that copyTableJobPropertiesToConf does 
not overwrite a given table property in the job conf if it is already set.  
This means that many of the table properties from Y don't get propagated, 
because the values from X are already set.
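In sketch form, the non-overwriting copy behaves like this (illustrative only; 
the actual Utilities method takes a TableDesc rather than bare Properties):
{code}
import java.util.Properties;
import org.apache.hadoop.mapred.JobConf;

public class NonOverwritingCopySketch {
  // Copies a table's properties into the job conf, but only for keys not
  // already present -- so X's values, written first by checkOutputSpecs,
  // shadow Y's when getInputSplits copies later.
  static void copyTableJobPropertiesToConf(Properties tblProps, JobConf job) {
    for (String key : tblProps.stringPropertyNames()) {
      if (job.get(key) == null) {
        job.set(key, tblProps.getProperty(key));
      }
    }
  }
}
{code}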

I do not believe this is a new problem, but it is showing up now because 
reading transactional tables depends on the bucket count being accurate.  So a 
query like:
{code}
create table notbucketed (a string, b int);
create table transactional (a string, b int) clustered by (b) into 2 buckets 
stored as orc tblproperties ('transactional' = 'true');
insert into table notbucketed select * from transactional;
{code}
results in the table 'transactional' being told it has no buckets.  Since the 
acid reader depends on this value, it concludes that with no buckets it has no 
splits, and thus the above insert writes nothing into 'notbucketed', regardless 
of how many records are in 'transactional'. 

 In insert into X select from Y, table properties from X are clobbering those 
 from Y
 ---

 Key: HIVE-8637
 URL: https://issues.apache.org/jira/browse/HIVE-8637
 Project: Hive
  Issue Type: Task
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 With a query like:
 {code}
 insert into table X select * from Y;
 {code}
 the table properties from table X are being sent to the input formats for 
 table Y.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y

2014-10-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8637:
-
Attachment: HIVE-8637.patch

This is not a permanent fix.  It works by changing 
HiveInputFormat.getInputSplits to call a new method in Utilities that sets 
values from table properties in the job conf whether or not they are already 
set.  This seems safe, since the table should properly understand its own 
properties.
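Roughly, the new method just drops the already-set check (sketch only; the 
method name here is hypothetical):
{code}
// Always set the value, so the table being read wins over any stale entries.
// (Properties = java.util.Properties, JobConf = org.apache.hadoop.mapred.JobConf)
static void overwritingCopy(Properties tblProps, JobConf job) {
  for (String key : tblProps.stringPropertyNames()) {
    job.set(key, tblProps.getProperty(key));
  }
}
{code}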

I believe the correct long term solution is to make sure a different copy of 
JobConf goes to the input and output tables, so each can write whatever it 
wants there.  I think that would have to be done in ExecDriver.execute, since 
calls to checkOutputSpecs and getInputSplits are done by Hadoop after Hive 
submits the job.  I think that would fix the MR case.  I'm sure the fix for Tez 
would be slightly different (since the job is submitted all at once).
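In sketch form, the separation would start from JobConf's copy constructor 
(illustrative only):
{code}
// Give each side its own copy so neither clobbers the other's table properties.
JobConf inputConf = new JobConf(job);   // table Y's properties land here
JobConf outputConf = new JobConf(job);  // table X's properties land here
{code}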

But this would also destroy any ability to communicate information across jobs 
via the conf file.  I don't know whether anything is doing that.  I'm loath 
to make that big a change when [~hagleitn] has said he wants to cut a release 
in a week.

So, I propose this smaller change now, and we file a JIRA for the bigger, more 
complete fix.

 In insert into X select from Y, table properties from X are clobbering those 
 from Y
 ---

 Key: HIVE-8637
 URL: https://issues.apache.org/jira/browse/HIVE-8637
 Project: Hive
  Issue Type: Task
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8637.patch


 With a query like:
 {code}
 insert into table X select * from Y;
 {code}
 the table properties from table X are being sent to the input formats for 
 table Y.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y

2014-10-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8637:
-
Status: Patch Available  (was: Open)

 In insert into X select from Y, table properties from X are clobbering those 
 from Y
 ---

 Key: HIVE-8637
 URL: https://issues.apache.org/jira/browse/HIVE-8637
 Project: Hive
  Issue Type: Task
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8637.patch


 With a query like:
 {code}
 insert into table X select * from Y;
 {code}
 the table properties from table X are being sent to the input formats for 
 table Y.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez

2014-10-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187676#comment-14187676
 ] 

Alan Gates commented on HIVE-8629:
--

Doing LOG.debug and documenting it should be fine.  Other than that, +1.

 Streaming / ACID : hive cli session creation takes too long and times out if 
 execution engine is tez
 

 Key: HIVE-8629
 URL: https://issues.apache.org/jira/browse/HIVE-8629
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: ACID, Streaming
 Attachments: HIVE-8629.patch


 When creating a hive session to run basic alter table create partition 
 queries, the session creation takes too long (more than 5 sec) if the hive 
 execution engine is set to tez.
 Since the streaming clients don't care about Tez, they can explicitly override 
 the setting to mr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7408) HCatPartition needs getPartCols method

2014-10-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185260#comment-14185260
 ] 

Alan Gates commented on HIVE-7408:
--

+1

 HCatPartition needs getPartCols method
 --

 Key: HIVE-7408
 URL: https://issues.apache.org/jira/browse/HIVE-7408
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: JongWon Park
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7408.1.patch.txt, HIVE-7408.2.patch.txt


 org.apache.hive.hcatalog.api.HCatPartition has getColumns method. However, it 
 is not partition column. HCatPartition needs getPartCols method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config

2014-10-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185315#comment-14185315
 ] 

Alan Gates commented on HIVE-8605:
--

As far as I know all of the time units were whole integers, so 'd' for double 
and 'f' for float probably don't make sense.  'l' for long is the only one I 
know of people using (we found this when a co-worker copied a config file from 
0.13 and used it against the 0.14 branch).  So I could change the patch to just 
support 'l'.  We have to find some way not to break that backward compatibility 
without also breaking your changes to do the time units.
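A sketch of that narrower fix, as a unitFor-style helper (illustrative only; 
not the actual HiveConf code):
{code}
import java.util.concurrent.TimeUnit;

public class UnitForSketch {
  static TimeUnit unitFor(String unit, TimeUnit defaultUnit) {
    String u = unit.trim().toLowerCase();
    // Legacy 0.13 configs wrote values like "300l"; treat a bare 'l' (or no
    // suffix at all) as "no unit given" and fall back to the default unit.
    if (u.isEmpty() || u.equals("l")) {
      return defaultUnit;
    }
    if (u.startsWith("d")) return TimeUnit.DAYS;
    if (u.startsWith("h")) return TimeUnit.HOURS;
    if (u.startsWith("m")) return TimeUnit.MINUTES;
    if (u.startsWith("s")) return TimeUnit.SECONDS;
    throw new IllegalArgumentException("Invalid time unit " + unit);
  }
}
{code}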

 HIVE-5799 breaks backward compatibility for time values in config
 -

 Key: HIVE-8605
 URL: https://issues.apache.org/jira/browse/HIVE-8605
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8605.patch


 It is legal for long values in the config file to have an L or for float 
 values to have an f.  For example, the default value for 
 hive.compactor.check.interval was 300L.  As part of HIVE-5799, many long 
 values were converted to TimeUnit.  Attempts to read these values now throw 
 java.lang.IllegalArgumentException: Invalid time unit l
 We need to change this to ignore the L or f, so that users' existing config 
 files don't break.  I propose to do this by changing HiveConf.unitFor to 
 detect the L or f and interpret it to mean the default time unit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist

2014-10-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185335#comment-14185335
 ] 

Alan Gates commented on HIVE-8583:
--

I'm not opposed to changing the order of the modifiers, I just didn't 
understand why it mattered.  So no need for a new patch.

We do need the tests to run on this patch though.  I don't think the build 
failure has anything to do with your patch.  So just canceling the patch, 
re-attaching the file, and re-submitting the patch should force the tests to 
run.

 HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist
 ---

 Key: HIVE-8583
 URL: https://issues.apache.org/jira/browse/HIVE-8583
 Project: Hive
  Issue Type: Improvement
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor
 Attachments: HIVE-8583.1.patch


 [~alangates] added the following in HIVE-8341:
 {code}
 String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
 if (bl != null && bl.length() > 0) {
   String[] bls = bl.split(",");
   for (String b : bls) {
     b.replaceAll(".", "_");
     blackListedConfEntries.add(b);
   }
 }
 {code}
 The {{replaceAll}} call is confusing as its result is not used at all.
 This patch contains the following:
 * Minor style modification (missorted modifiers)
 * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST
 * Removes replaceAll
 * Lets blackListed take a Configuration job as parameter which allowed me to 
 add a test for this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8562) ResultSet.isClosed sometimes doesn't work with mysql

2014-10-27 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-8562.
--
Resolution: Invalid

It turns out I was using an old version of the MySQL JDBC jar.  Once I use 
the proper version, this issue goes away.

 ResultSet.isClosed sometimes doesn't work with mysql
 

 Key: HIVE-8562
 URL: https://issues.apache.org/jira/browse/HIVE-8562
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 Calls to ResultSet.isClosed are sometimes throwing an AbstractMethodError 
 when used against MySQL.  This is causing issues for the compactor when it 
 tries to update stats.  As far as I can tell it only happens when the result 
 set is empty (which is weird).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql & oracle

2014-10-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185591#comment-14185591
 ] 

Alan Gates commented on HIVE-6669:
--

All the txn tables are already in the hive-schema-0.14.0.mssql.sql.  I don't 
know why.  But that's why I didn't add them.

 sourcing txn-script from schema script results in failure for mysql & oracle
 

 Key: HIVE-6669
 URL: https://issues.apache.org/jira/browse/HIVE-6669
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Alan Gates
Priority: Blocker
 Attachments: HIVE-6669.2.patch, HIVE-6669.patch


 This issue is addressed in 0.13 by in-lining the transaction schema 
 statements in the schema initialization script (HIVE-6559).
 The 0.14 schema initialization is not fixed.  This is the followup ticket 
 to address the problem in 0.14. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config

2014-10-25 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8605:


 Summary: HIVE-5799 breaks backward compatibility for time values 
in config
 Key: HIVE-8605
 URL: https://issues.apache.org/jira/browse/HIVE-8605
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


It is legal for long values in the config file to have an L or for float values 
to have an f.  For example, the default value for hive.compactor.check.interval 
was 300L.  As part of HIVE-5799, many long values were converted to TimeUnit.  
Attempts to read these values now throw java.lang.IllegalArgumentException: 
Invalid time unit l

We need to change this to ignore the L or f, so that users' existing config 
files don't break.  I propose to do this by changing HiveConf.unitFor to detect 
the L or f and interpret it to mean the default time unit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config

2014-10-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8605:
-
Attachment: HIVE-8605.patch

 HIVE-5799 breaks backward compatibility for time values in config
 -

 Key: HIVE-8605
 URL: https://issues.apache.org/jira/browse/HIVE-8605
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8605.patch


 It is legal for long values in the config file to have an L or for float 
 values to have an f.  For example, the default value for 
 hive.compactor.check.interval was 300L.  As part of HIVE-5799, many long 
 values were converted to TimeUnit.  Attempts to read these values now throw 
 java.lang.IllegalArgumentException: Invalid time unit l
 We need to change this to ignore the L or f, so that users' existing config 
 files don't break.  I propose to do this by changing HiveConf.unitFor to 
 detect the L or f and interpret it to mean the default time unit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8605) HIVE-5799 breaks backward compatibility for time values in config

2014-10-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8605:
-
Status: Patch Available  (was: Open)

[~navis], if you have a chance to review this, that would be great.

 HIVE-5799 breaks backward compatibility for time values in config
 -

 Key: HIVE-8605
 URL: https://issues.apache.org/jira/browse/HIVE-8605
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8605.patch


 It is legal for long values in the config file to have an L or for float 
 values to have an f.  For example, the default value for 
 hive.compactor.check.interval was 300L.  As part of HIVE-5799, many long 
 values were converted to TimeUnit.  Attempts to read these values now throw 
 java.lang.IllegalArgumentException: Invalid time unit l
 We need to change this to ignore the L or f, so that users' existing config 
 files don't break.  I propose to do this by changing HiveConf.unitFor to 
 detect the L or f and interpret it to mean the default time unit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist

2014-10-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183002#comment-14183002
 ] 

Alan Gates commented on HIVE-8583:
--

Yes, Lars is correct.  That is just a piece of earlier code that I neglected to 
take out.

 HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist
 ---

 Key: HIVE-8583
 URL: https://issues.apache.org/jira/browse/HIVE-8583
 Project: Hive
  Issue Type: Improvement
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor
 Attachments: HIVE-8583.1.patch


 [~alangates] added the following in HIVE-8341:
 {code}
 String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
 if (bl != null && bl.length() > 0) {
   String[] bls = bl.split(",");
   for (String b : bls) {
     b.replaceAll(".", "_");
     blackListedConfEntries.add(b);
   }
 }
 {code}
 The {{replaceAll}} call is confusing as its result is not used at all.
 This patch contains the following:
 * Minor style modification (missorted modifiers)
 * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST
 * Removes replaceAll
 * Lets blackListed take a Configuration job as parameter which allowed me to 
 add a test for this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist

2014-10-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183171#comment-14183171
 ] 

Alan Gates commented on HIVE-8583:
--

+1, patch looks fine.

The statement "missorted modifiers" implies there is a correct order.  If the 
compiler doesn't care about "final static private" versus "private static 
final", why should we?
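For reference, the order the style checkers expect is the JLS-suggested one, 
e.g.:
{code}
// JLS-suggested modifier order (what checkstyle's ModifierOrder check flags):
private static final int MAX_RETRIES = 3;
// compiles identically to: final static private int MAX_RETRIES = 3;
{code}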

 HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist
 ---

 Key: HIVE-8583
 URL: https://issues.apache.org/jira/browse/HIVE-8583
 Project: Hive
  Issue Type: Improvement
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor
 Attachments: HIVE-8583.1.patch


 [~alangates] added the following in HIVE-8341:
 {code}
 String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
 if (bl != null && bl.length() > 0) {
   String[] bls = bl.split(",");
   for (String b : bls) {
     b.replaceAll(".", "_");
     blackListedConfEntries.add(b);
   }
 }
 {code}
 The {{replaceAll}} call is confusing as its result is not used at all.
 This patch contains the following:
 * Minor style modification (missorted modifiers)
 * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST
 * Removes replaceAll
 * Lets blackListed take a Configuration job as parameter which allowed me to 
 add a test for this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables

2014-10-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183735#comment-14183735
 ] 

Alan Gates commented on HIVE-8516:
--

Actually, I think this bug is invalid.  I was confused.  Hive certainly 
supports inserts into bucketed tables.

 insert/values allowed against bucketed, non-transactional tables
 

 Key: HIVE-8516
 URL: https://issues.apache.org/jira/browse/HIVE-8516
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Matt McCline

 Hive does not support insert into bucketed tables.  A special exception is 
 made for transactional tables, as they require bucketing.  
 Insert/values works against non-transactional tables, since it just dumps the 
 values into a temp table and rewrites the query into insert/select from that 
 temp table.  However, the check that prevents doing inserts into 
 non-transactional, bucketed tables is not catching this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables

2014-10-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-8516.
--
Resolution: Invalid

 insert/values allowed against bucketed, non-transactional tables
 

 Key: HIVE-8516
 URL: https://issues.apache.org/jira/browse/HIVE-8516
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Matt McCline

 Hive does not support insert into bucketed tables.  A special exception is 
 made for transactional tables, as they require bucketing.  
 Insert/values works against non-transactional tables, since it just dumps the 
 values into a temp table and rewrites the query into insert/select from that 
 temp table.  However, the check that prevents doing inserts into 
 non-transactional, bucketed tables is not catching this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres

2014-10-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8543:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Test failures are not related.  I ran the streaming test locally and saw no 
issues.

Patch committed to trunk and 0.14 branch.  Thanks Damien for writing most of 
the code for this and Eugene for reviewing it.

 Compactions fail on metastore using postgres
 

 Key: HIVE-8543
 URL: https://issues.apache.org/jira/browse/HIVE-8543
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8543.patch


 The worker fails to update the stats when the metastore is using Postgres as 
 the RDBMS.  
 {code}
 org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not 
 exist
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and branch 0.14.  Thanks Ashutosh and Matt for the reviews.

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.2.patch, HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8562) ResultSet.isClosed sometimes doesn't work with mysql

2014-10-22 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8562:


 Summary: ResultSet.isClosed sometimes doesn't work with mysql
 Key: HIVE-8562
 URL: https://issues.apache.org/jira/browse/HIVE-8562
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


Calls to ResultSet.isClosed are sometimes throwing an AbstractMethodError 
when used against MySQL.  This is causing issues for the compactor when it 
tries to update stats.  As far as I can tell it only happens when the result 
set is empty (which is weird).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8235) Insert into partitioned bucketed sorted tables fails with "this file is already being created by"

2014-10-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-8235.
--
Resolution: Cannot Reproduce

Closing as cannot reproduce, as I cannot reproduce this.  Please re-open if you 
see it again.

 Insert into partitioned bucketed sorted tables fails with "this file is 
 already being created by"
 -

 Key: HIVE-8235
 URL: https://issues.apache.org/jira/browse/HIVE-8235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: insert_into_partitioned_bucketed_table.txt.tar.gz.zip


 When loading into a partitioned bucketed sorted table the query fails with 
 {code}
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
  Failed to create file 
 [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
  for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
 client [172.21.128.111], because this file is already being created by 
 [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
 [172.21.128.122]
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy15.create(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy15.create(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at 
 

[jira] [Created] (HIVE-8543) Compactions fail on metastore using postgres

2014-10-21 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8543:


 Summary: Compactions fail on metastore using postgres
 Key: HIVE-8543
 URL: https://issues.apache.org/jira/browse/HIVE-8543
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


The worker fails to update the stats when the metastore is using Postgres as 
the RDBMS.  

{code}
org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not 
exist
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178936#comment-14178936
 ] 

Alan Gates commented on HIVE-7689:
--

Opened HIVE-8543 to deal with the issue in the compactor.

 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 The current 0.14 patch creates tables with lower case names.
 This patch fixes the wrong lower case table names in the Postgres Metastore back end.
 Mixing lower case and upper case triggers bugs in {{JDBCStatsPublisher}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8543) Compactions fail on metastore using postgres

2014-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179063#comment-14179063
 ] 

Alan Gates commented on HIVE-8543:
--

I'm testing a fix as well.  If it passes I'll post it shortly so you can make 
sure it works in your environment.

 Compactions fail on metastore using postgres
 

 Key: HIVE-8543
 URL: https://issues.apache.org/jira/browse/HIVE-8543
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 The worker fails to update the stats when the metastore is using Postgres as 
 the RDBMS.  
 {code}
 org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not 
 exist
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres

2014-10-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8543:
-
Status: Patch Available  (was: Open)

 Compactions fail on metastore using postgres
 

 Key: HIVE-8543
 URL: https://issues.apache.org/jira/browse/HIVE-8543
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8543.patch


 The worker fails to update the stats when the metastore is using Postgres as 
 the RDBMS.  
 {code}
 org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not 
 exist
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8543) Compactions fail on metastore using postgres

2014-10-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8543:
-
Attachment: HIVE-8543.patch

This patch adds quotes to operations for the TABLE_COL_STATS and PART_COL_STATS 
tables.  The code for this was taken almost exclusively from [~damien.carol]'s 
patch on HIVE-7689.
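A minimal illustration of the underlying problem, as plain SQL strings in Java 
(not the patch itself):
{code}
// Postgres folds unquoted identifiers to lower case, so a schema created with
// quoted upper-case names must be queried with quotes as well.
String broken = "select count(*) from TAB_COL_STATS";      // resolves to tab_col_stats -> fails
String works  = "select count(*) from \"TAB_COL_STATS\"";  // matches "TAB_COL_STATS"
{code}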

 Compactions fail on metastore using postgres
 

 Key: HIVE-8543
 URL: https://issues.apache.org/jira/browse/HIVE-8543
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8543.patch


 The worker fails to update the stats when the metastore is using Postgres as 
 the RDBMS.  
 {code}
 org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not 
 exist
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179194#comment-14179194
 ] 

Alan Gates commented on HIVE-8341:
--

[~leftylev] what needs to be documented here?

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179220#comment-14179220
 ] 

Alan Gates commented on HIVE-8341:
--

It doesn't need to be documented in Hive Transactions.  There's nothing 
transaction-specific about it.  I don't expect users to set this themselves.

I've updated the Configuration Properties with information on this value.  I'm 
glad you caught this, as I didn't realize we recorded all conf keys there.

I'll also add the same information to the release notes for this bug.



 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Release Note: 
A new configuration value, hive.script.operator.env.blacklist, was added in 
0.14.  Its default value is 
hive.txn.valid.txns,hive.script.operator.env.blacklist

By default all values in the HiveConf object are converted to environment 
variables of the same name as the key (with '.' (dot) converted to '_' 
(underscore)) and set as part of the script operator's environment.  However, 
some values can grow large or are not amenable to translation to environment 
variables.  This value gives a comma-separated list of configuration values 
that will not be set in the environment when calling a script operator.  By 
default the valid transaction list is excluded, as it can grow large and is 
sometimes compressed, which does not translate well to an environment variable.
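A sketch of the behavior described above (illustrative only; Hadoop's 
Configuration is iterable as key/value pairs):
{code}
import java.util.*;
import org.apache.hadoop.conf.Configuration;

public class ScriptEnvSketch {
  // Build the script operator's environment: skip blacklisted keys and map
  // '.' to '_' in the rest.
  static Map<String, String> buildEnv(Configuration conf) {
    Set<String> blackListed = new HashSet<>(Arrays.asList(
        conf.get("hive.script.operator.env.blacklist",
            "hive.txn.valid.txns,hive.script.operator.env.blacklist")
            .split(",")));
    Map<String, String> env = new HashMap<>();
    for (Map.Entry<String, String> e : conf) {
      if (!blackListed.contains(e.getKey())) {
        env.put(e.getKey().replace('.', '_'), e.getValue());
      }
    }
    return env;
  }
}
{code}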

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Status: Open  (was: Patch Available)

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177104#comment-14177104
 ] 

Alan Gates commented on HIVE-8474:
--

Ok, I'll rework it not to use addRowToBatchFrom.  I do plan on factoring out 
the switch statement so that it can be shared, but hopefully that will be 
alright.

Are you ok with the changes to VectorizedRowBatch to add tracking the partition 
columns?

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177198#comment-14177198
 ] 

Alan Gates commented on HIVE-8474:
--

Are you saying that rather than changing VectorizedRowBatch I should just pass 
along the Ctx to my new acidAddToBatchFrom method and use that to figure out 
the partition columns?

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Attachment: HIVE-8474.2.patch

A second version of the patch that incorporates Matt's feedback.  

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.2.patch, HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Status: Patch Available  (was: Open)

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.2.patch, HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8515) Column projection not being pushed to ORC delta files

2014-10-18 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8515:


 Summary: Column projection not being pushed to ORC delta files
 Key: HIVE-8515
 URL: https://issues.apache.org/jira/browse/HIVE-8515
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates


Currently when only some columns are projected, that projection is pushed to 
the base file but not to delta files.  This does not cause incorrect results 
(the columns are projected out later in the query execution), but it is less 
efficient than it could be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8515) Column projection not being pushed to ORC delta files

2014-10-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176077#comment-14176077
 ] 

Alan Gates commented on HIVE-8515:
--

The issue is in OrcInputFormat.getReader:
{code}
if (split.hasBase()) {
  bucket = AcidUtils.parseBaseBucketFilename(split.getPath(), conf)
      .getBucket();
  reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
  final List<OrcProto.Type> types = reader.getTypes();
  setIncludedColumns(readOptions, types, conf, split.isOriginal());
  setSearchArgument(readOptions, types, conf, split.isOriginal());
} else {
  bucket = (int) split.getStart();
  reader = null;
}
{code}

setIncludedColumns is called if there is a base, but not if there isn't.
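
A sketch of the shape of a fix (typesFromDelta is an assumed helper, not 
actual Hive code; note that per HIVE-8402 the SARG should still not be pushed 
into deltas):
{code}
// Sketch only -- the committed patch may take a different route.
} else {
  bucket = (int) split.getStart();
  reader = null;
  // Recover the schema for the delta-only case too, then push the column
  // projection down.  The SARG is deliberately not pushed into deltas.
  final List<OrcProto.Type> types = typesFromDelta(split, conf);  // assumed helper
  setIncludedColumns(readOptions, types, conf, split.isOriginal());
}
{code}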

 Column projection not being pushed to ORC delta files
 -

 Key: HIVE-8515
 URL: https://issues.apache.org/jira/browse/HIVE-8515
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates

 Currently when only some columns are projected, that projection is pushed to 
 the base file but not to delta files.  This does not cause incorrect results 
 (the columns are projected out later in the query execution), but it is less 
 efficient than it could be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8516) insert/values allowed against bucketed, non-transactional tables

2014-10-18 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8516:


 Summary: insert/values allowed against bucketed, non-transactional 
tables
 Key: HIVE-8516
 URL: https://issues.apache.org/jira/browse/HIVE-8516
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates


Hive does not support insert into bucketed tables.  A special exception is made 
for transactional tables, as they require bucketing.  

Insert/values works against non-transactional tables, since it just dumps the 
values into a temp table and rewrites the query into insert/select from that 
temp table.  However, the check that prevents doing inserts into 
non-transactional, bucketed tables is not catching this case.
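
As a sketch of the rewrite (the table name bucketed_tab is illustrative; the 
temp table name matches the values__tmp__table__N pattern visible in the lock 
output of HIVE-8459):
{code}
-- What the user writes:
insert into table bucketed_tab values ('fred flintstone', 43, 1.95);
-- Roughly what Hive runs after the rewrite:
insert into table bucketed_tab select * from values__tmp__table__1;
{code}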



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-10-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175151#comment-14175151
 ] 

Alan Gates commented on HIVE-8290:
--

bq. Was hive.support.concurrency required for transactions in 0.13.0?
Yes.
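
For reference, the minimal settings involved (a sketch; site files typically 
set more):
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
{code}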

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to the be transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175664#comment-14175664
 ] 

Alan Gates commented on HIVE-8474:
--

Further testing has also determined that selecting a partition column with 
vectorization enabled results in an NPE as well.

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-17 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Attachment: HIVE-8474.patch

This patch makes several changes in vectorization.  [~mmccline] and 
[~ashutoshc], as I am not very familiar with this code, and as I know it is 
very performance sensitive, I would appreciate your feedback on the patch.

The issue causing problems was that VectorizedBatchUtil.addRowToBatchFrom is 
used by VectorizedOrcAcidRowReader to take the merged rows from an acid read 
and put them in a vector batch.  But this method appears to have been built 
for vector operators, not for file formats, where columns may be missing 
because they have been projected out, or may already have values set because 
they are partition columns.  So I made the following changes:
# I changed addRowToBatchFrom to skip writing values into ColumnVectors that 
are null.  This handles the case where columns have been projected out and thus 
the ColumnVector is null.
# I changed VectorizedRowBatch to have a boolean array to track which columns 
are partition columns and VectorizedRowBatchCtx.createVectorizedRowBatch to 
populate this array
# I changed addRowToBatchFrom to skip writing values into ColumnVectors that 
are marked in VectorizedRowBatch as partition columns, since this results in 
overwriting the values that have already been put there by 
VectorizedRowBatchCtx.addPartitionColumnsToBatch

My concern is whether it is appropriate to mix this functionality for skipping 
projected-out and partition columns into addRowToBatchFrom.  If you think it 
isn't a good fit, I can write a new method to do this, but that will involve a 
fair amount of duplicate code.
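
As a minimal sketch of the skip logic from changes #1 and #3 above (names 
such as isPartitionCol and setVectorValue are illustrative, not the committed 
patch):
{code}
// Sketch only -- illustrative names, not the committed patch.
for (int i = 0; i < fields.size(); i++) {
  ColumnVector col = batch.cols[i];
  if (col == null) {
    continue;                      // projected out, no vector to fill (#1)
  }
  if (batch.isPartitionCol[i]) {
    continue;                      // already set by addPartitionColsToBatch (#3)
  }
  setVectorValue(col, rowIndex, fields.get(i));  // assumed helper
}
{code}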

[~owen.omalley], I also changed VectorizedOrcAcidRowReader to set the partition 
column values after every call to VectorizedRowBatch.reset in next.  Without 
doing this the code was NPEing later in the pipeline because the partition 
column had been set to null.  It appeared that you had copied the code from 
VectorizedOrcInputFormat, which only called addPartitionColsToBatch once, but 
which never called reset.  I tried removing the call to reset but that caused 
other issues.

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 

[jira] [Updated] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-17 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8474:
-
Status: Patch Available  (was: Open)

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8235) Insert into partitioned bucketed sorted tables fails with this file is already being created by

2014-10-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175690#comment-14175690
 ] 

Alan Gates commented on HIVE-8235:
--

[~mmokhtar], ping, have you had a chance to run this?  I can't reproduce it.

 Insert into partitioned bucketed sorted tables fails with this file is 
 already being created by
 -

 Key: HIVE-8235
 URL: https://issues.apache.org/jira/browse/HIVE-8235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: insert_into_partitioned_bucketed_table.txt.tar.gz.zip


 When loading into a partitioned bucketed sorted table the query fails with 
 {code}
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
  Failed to create file 
 [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0]
  for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for 
 client [172.21.128.111], because this file is already being created by 
 [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on 
 [172.21.128.122]
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy15.create(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy15.create(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600)
   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at 
 

[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Status: Open  (was: Patch Available)

The TestOperators failure is caused by this patch.  The rest I believe are 
unrelated.  I'll put up a new version of the patch that addresses the 
TestOperators failure.

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.2.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Status: Patch Available  (was: Open)

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Attachment: HIVE-8341.3.patch

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.2.patch, HIVE-8341.3.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-15 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8474:


 Summary: Vectorized reads of transactional tables fail when not 
all columns are selected
 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


{code}
create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
('transactional'='true');
select name, age from concur_orc_tab order by name;
{code}
results in
{code}
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
... 13 more
{code}

The issue is that the object inspector passed to VectorizedOrcAcidRowReader has 
all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Attachment: HIVE-8341.2.patch

A new version of the patch.  The transaction list is compressed, as before.  
However, with this patch I added a blacklist to the ScriptOperator to filter 
out specified conf variables so they are not turned into environment variables.
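
Roughly the shape of the blacklist check (a sketch, not the committed code; 
the conf key for the transaction list is assumed):
{code}
// Sketch only.  Blacklisted conf variables are not exported as environment
// variables for the child script process.
private static final Set<String> ENV_BLACKLIST =
    new HashSet<String>(Arrays.asList("hive.txn.valid.txns"));  // assumed key

for (Map.Entry<String, String> e : conf) {
  if (!ENV_BLACKLIST.contains(e.getKey())) {
    env.put(safeEnvVarName(e.getKey()), e.getValue());  // assumed helper
  }
}
{code}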

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.2.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Status: Patch Available  (was: Open)

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.2.patch, HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end

2014-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171100#comment-14171100
 ] 

Alan Gates commented on HIVE-7689:
--

Ok, we should quote identifiers in calls to just that stats table then.  That 
way we don't pollute all the TxnHandler code, but we can still fix this 
problem.  We can either re-open this bug or open a new one.
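
For example (the table name here is illustrative; Postgres folds unquoted 
identifiers to lower case):
{code}
-- Unquoted: Postgres resolves this as partition_stats and fails to find it.
select * from PARTITION_STATS;
-- Quoted: matches the upper-case table name exactly.
select * from "PARTITION_STATS";
{code}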

Do you want to post a patch for this or do you want me to?  If you post it I 
can review it.  If not, I can post a patch in a day or two and you can test 
that it works in your system.

 Fix wrong lower case table names in Postgres Metastore back end
 ---

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Blocker
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, 
 HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, 
 HIVE-7889.3.patch, HIVE-7889.4.patch


 The current 0.14 code creates tables with lower-case names.
 This patch fixes the wrong lower-case table names in the Postgres Metastore back end.
 Mixing lower case and upper case triggers bugs in {{JDBCStatsPublisher}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8459) DbLockManager locking table in addition to partitions

2014-10-14 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8459:


 Summary: DbLockManager locking table in addition to partitions
 Key: HIVE-8459
 URL: https://issues.apache.org/jira/browse/HIVE-8459
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical


Queries and operations on partitioned tables are generating locks on the whole 
table when they should only be locking the partition.  For example:

{code}
insert into table concur_orc_tab_part partition (ds='today') values ('fred 
flintstone', 43, 1.95);
{code}

This should only be locking the partition ds='today'.  But instead:
{code}
mysql> select * from HIVE_LOCKS;
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE              | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
|            425 |              1 |      204 | default | values__tmp__table__1 | NULL         | a             | r            |         141331074 |  1413310738000 | hive    | node-1.example.com |
|            425 |              2 |      204 | default | concur_orc_tab_part   | ds=today     | a             | r            |         141331074 |  1413310738000 | hive    | node-1.example.com |
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8459) DbLockManager locking table in addition to partitions

2014-10-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8459:
-
Description: 
Queries and operations on partitioned tables are generating locks on the whole 
table when they should only be locking the partition.  For example:

{code}
select count(*) from concur_orc_tab_part where ds = 'today';
{code}

This should only be locking the partition ds='today'.  But instead:
{code}
mysql> select * from HIVE_LOCKS;
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
|            428 |              1 |        0 | default | concur_orc_tab_part | NULL         | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
|            428 |              2 |        0 | default | concur_orc_tab_part | ds=today     | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
{code}

  was:
Queries and operations on partitioned tables are generating locks on the whole 
table when they should only be locking the partition.  For example:

{code}
insert into table concur_orc_tab_part partition (ds='today') values ('fred 
flintstone', 43, 1.95);
{code}

This should only be locking the partition ds='today'.  But instead:
{code}
mysql> select * from HIVE_LOCKS;
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE              | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
|            425 |              1 |      204 | default | values__tmp__table__1 | NULL         | a             | r            |         141331074 |  1413310738000 | hive    | node-1.example.com |
|            425 |              2 |      204 | default | concur_orc_tab_part   | ds=today     | a             | r            |         141331074 |  1413310738000 | hive    | node-1.example.com |
+----------------+----------------+----------+---------+-----------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
{code}


 DbLockManager locking table in addition to partitions
 -

 Key: HIVE-8459
 URL: https://issues.apache.org/jira/browse/HIVE-8459
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical

 Queries and operations on partitioned tables are generating locks on the 
 whole table when they should only be locking the partition.  For example:
 {code}
 select count(*) from concur_orc_tab_part where ds = 'today';
 {code}
 This should only be locking the partition ds='today'.  But instead:
 {code}
 mysql> select * from HIVE_LOCKS;
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 | HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 |            428 |              1 |        0 | default | concur_orc_tab_part | NULL         | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 |            428 |              2 |        0 | default | concur_orc_tab_part | ds=today     | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 {code}

[jira] [Commented] (HIVE-8459) DbLockManager locking table in addition to partitions

2014-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171328#comment-14171328
 ] 

Alan Gates commented on HIVE-8459:
--

Note that this is only happening on the read side, not the write side.

 DbLockManager locking table in addition to partitions
 -

 Key: HIVE-8459
 URL: https://issues.apache.org/jira/browse/HIVE-8459
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical

 Queries and operations on partitioned tables are generating locks on the 
 whole table when they should only be locking the partition.  For example:
 {code}
 select count(*) from concur_orc_tab_part where ds = 'today';
 {code}
 This should only be locking the partition ds='today'.  But instead:
 {code}
 mysql> select * from HIVE_LOCKS;
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 | HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 |            428 |              1 |        0 | default | concur_orc_tab_part | NULL         | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 |            428 |              2 |        0 | default | concur_orc_tab_part | ds=today     | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8459) DbLockManager locking table in addition to partitions

2014-10-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8459:
-
Priority: Major  (was: Critical)

 DbLockManager locking table in addition to partitions
 -

 Key: HIVE-8459
 URL: https://issues.apache.org/jira/browse/HIVE-8459
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates

 Queries and operations on partitioned tables are generating locks on the 
 whole table when they should only be locking the partition.  For example:
 {code}
 select count(*) from concur_orc_tab_part where ds = 'today';
 {code}
 This should only be locking the partition ds='today'.  But instead:
 {code}
 mysql> select * from HIVE_LOCKS;
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 | HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 |            428 |              1 |        0 | default | concur_orc_tab_part | NULL         | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 |            428 |              2 |        0 | default | concur_orc_tab_part | ds=today     | a             | r            |     1413311172000 |  1413311171000 | hive    | node-1.example.com |
 +----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8442) Revert HIVE-8403

2014-10-13 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8442:


 Summary: Revert HIVE-8403
 Key: HIVE-8442
 URL: https://issues.apache.org/jira/browse/HIVE-8442
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


HIVE-8403 caused the number of tests run to drop from ~6K to ~4K.  Also, the 
datanucleus repo is back up.  So we should revert this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8442) Revert HIVE-8403

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8442:
-
Attachment: HIVE-8442.patch

For the record, here's the reversion patch.

 Revert HIVE-8403
 

 Key: HIVE-8442
 URL: https://issues.apache.org/jira/browse/HIVE-8442
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8442.patch


 HIVE-8403 caused the number of tests run to drop from ~6K to ~4K.  Also, the 
 datanucleus repo is back up.  So we should revert this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8442) Revert HIVE-8403

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-8442.
--
Resolution: Fixed

Reverted in both trunk and branch-0.14.

 Revert HIVE-8403
 

 Key: HIVE-8442
 URL: https://issues.apache.org/jira/browse/HIVE-8442
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8442.patch


 HIVE-8403 caused the number of tests run to drop from ~6K to ~4K.  Also, the 
 datanucleus repo is back up.  So we should revert this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8332) Reading an ACID table with vectorization on results in NPE

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8332:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked into branch 0.14 and trunk.

 Reading an ACID table with vectorization on results in NPE
 --

 Key: HIVE-8332
 URL: https://issues.apache.org/jira/browse/HIVE-8332
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8332.patch


 On a transactional table, insert some data, then with vectorization turned on 
 do a select.  The result is:
 {code}
 Caused by: java.lang.NullPointerException at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.getObjectInspector(OrcInputFormat.java:1137)
  at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.init(VectorizedOrcAcidRowReader.java:61)
  at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1041)
  at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
   ... 25 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch 6 checked into trunk and branch 0.14.  Thanks Eugene for the review.

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.
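 The idea, as a sketch (the method names are assumed, not the actual patch):
 {code}
 // Sketch only; assumed method names.  When the compaction finishes,
 // record the highest open lock id as a watermark.
 long watermark = lockManager.getMaxOpenLockId();
 markCompacted(compactionInfo, watermark);

 // The Cleaner may run once every lock at or below the watermark is gone;
 // locks obtained later refer to the post-compaction files.
 if (lockManager.getMinOpenLockId() > compactionInfo.watermark) {
   clean(compactionInfo);
 }
 {code}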



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch 0.14.  Thanks Eugene for the review.

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8368.2.patch, HIVE-8368.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions

2014-10-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8402:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch 0.14.  Thanks Owen for the review.

 Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
 -

 Key: HIVE-8402
 URL: https://issues.apache.org/jira/browse/HIVE-8402
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8402.patch


 ORC is in some instances pushing SARGs into delta files.  This is wrong 
 behavior in general as it may result in failing to pull the most recent 
 version of a row.  When the SARG is applied to a row that is deleted it 
 causes an ArrayOutOfBoundsException because there is no data in the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Status: Open  (was: Patch Available)

Missed method signature change in TestCompactor.

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Status: Patch Available  (was: Open)

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Attachment: HIVE-8258.6.patch

A new patch with the signature change for TestCompactor.

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.6.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8347:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
 Assignee: Alan Gates
   Status: Resolved  (was: Patch Available)

Patch checked into trunk.  Thanks Mariappan for the patch.

Note, this should be assigned to Mariappan Asokan, but as Mariappan is not in 
the contributor list I couldn't do that.  JIRA seemed to want it to be assigned 
to someone, so I assigned it to me.  If one of the JIRA admins adds Mariappan 
to the contributor list, then we can properly assign the JIRA.

 Use base-64 encoding instead of custom encoding for serialized objects
 --

 Key: HIVE-8347
 URL: https://issues.apache.org/jira/browse/HIVE-8347
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mariappan Asokan
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8347.patch


 Serialized objects that are shipped via Hadoop {{Configuration}} are encoded 
 using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement 
 {{HCatUtil.decodeBytes()}}) which has 100% overhead.  In other words, each 
 byte in the serialized object becomes 2 bytes after encoding.  Perhaps, this 
 might be one of the reasons for the problem reported in HCATALOG-453.  The 
 patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to 
 solve the problem.
 By using Base64 encoding, the overhead will be reduced to about 33%.  This 
 will alleviate the problem for all serialized objects.
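 A sketch of the difference, assuming commons-codec is on the classpath 
 (serialize() is an assumed helper):
 {code}
 import org.apache.commons.codec.binary.Base64;

 byte[] raw = serialize(inputJobInfo);  // assumed helper
 // Custom encoding: 2 output chars per input byte (100% overhead).
 // Base64: 4 output chars per 3 input bytes (~33% overhead).
 String encoded = Base64.encodeBase64String(raw);
 byte[] decoded = Base64.decodeBase64(encoded);
 {code}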



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects

2014-10-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167048#comment-14167048
 ] 

Alan Gates commented on HIVE-8347:
--

You can send an email to dev@hive.apache.org and ask to be added.  

 Use base-64 encoding instead of custom encoding for serialized objects
 --

 Key: HIVE-8347
 URL: https://issues.apache.org/jira/browse/HIVE-8347
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mariappan Asokan
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8347.patch


 Serialized objects that are shipped via Hadoop {{Configuration}} are encoded 
 using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement 
 {{HCatUtil.decodeBytes()}}) which has 100% overhead.  In other words, each 
 byte in the serialized object becomes 2 bytes after encoding.  Perhaps, this 
 might be one of the reasons for the problem reported in HCATALOG-453.  The 
 patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to 
 solve the problem.
 By using Base64 encoding, the overhead will be reduced to about 33%.  This 
 will alleviate the problem for all serialized objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch-0.14.  Thanks Eugene for the patch.

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.2.patch, HIVE-8367.patch


 I have found one case with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects

2014-10-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8347:
-
Assignee: Mariappan Asokan  (was: Alan Gates)

 Use base-64 encoding instead of custom encoding for serialized objects
 --

 Key: HIVE-8347
 URL: https://issues.apache.org/jira/browse/HIVE-8347
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
 Fix For: 0.15.0

 Attachments: HIVE-8347.patch


 Serialized objects that are shipped via Hadoop {{Configuration}} are encoded 
 using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement 
 {{HCatUtil.decodeBytes()}}) which has 100% overhead.  In other words, each 
 byte in the serialized object becomes 2 bytes after encoding.  Perhaps, this 
 might be one of the reasons for the problem reported in HCATALOG-453.  The 
 patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to 
 solve the problem.
 By using Base64 encoding, the overhead will be reduced to about 33%.  This 
 will alleviate the problem for all serialized objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8347) Use base-64 encoding instead of custom encoding for serialized objects

2014-10-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165416#comment-14165416
 ] 

Alan Gates commented on HIVE-8347:
--

+1, [~sushanth], any concerns?

 Use base-64 encoding instead of custom encoding for serialized objects
 --

 Key: HIVE-8347
 URL: https://issues.apache.org/jira/browse/HIVE-8347
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mariappan Asokan
 Attachments: HIVE-8347.patch


 Serialized objects that are shipped via Hadoop {{Configuration}} are encoded 
 using custom encoding (see {{HCatUtil.encodeBytes()}} and its complement 
 {{HCatUtil.decodeBytes()}}) which has 100% overhead.  In other words, each 
 byte in the serialized object becomes 2 bytes after encoding.  Perhaps, this 
 might be one of the reasons for the problem reported in HCATALOG-453.  The 
 patch for HCATALOG-453 compressed serialized {{InputJobInfo}} objects to 
 solve the problem.
 By using Base64 encoding, the overhead will be reduced to about 33%.  This 
 will alleviate the problem for all serialized objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8308) Acid related table properties should be defined in one place and should be case insensitive

2014-10-09 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8308:
-
Summary: Acid related table properties should be defined in one place and 
should be case insensitive  (was: Acid related table properties should be 
defined in one place)

 Acid related table properties should be defined in one place and should be 
 case insensitive
 ---

 Key: HIVE-8308
 URL: https://issues.apache.org/jira/browse/HIVE-8308
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates

 Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT 
 are defined in the classes that use them.  Since these are both potential 
 table properties and both are ACID related, it makes sense to collect 
 them together.  There's no central place for table properties at this point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8308) Acid related table properties should be defined in one place

2014-10-09 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8308:
-
Priority: Major  (was: Minor)

 Acid related table properties should be defined in one place
 

 Key: HIVE-8308
 URL: https://issues.apache.org/jira/browse/HIVE-8308
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates

 Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT 
 are defined in the classes that use them.  Since these are both potential 
 table properties and both are ACID related, it makes sense to collect 
 them together.  There's no central place for table properties at this point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8308) Acid related table properties should be defined in one place and should be case insensitive

2014-10-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165435#comment-14165435
 ] 

Alan Gates commented on HIVE-8308:
--

In addition to being defined in one place, they should be case insensitive.  
Right now transactional has to be lower case, and NO_AUTO_COMPACT has to be 
upper case.
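A case-insensitive lookup is cheap to sketch; something like the following 
(findProperty is a hypothetical helper, not the method the eventual patch 
adds):
{code}
import java.util.HashMap;
import java.util.Map;

public class TablePropertyLookup {
  // Returns the value of a table property regardless of how the user cased
  // the key in TBLPROPERTIES, or null if the property is not set.
  static String findProperty(Map<String, String> tblProps, String name) {
    for (Map.Entry<String, String> e : tblProps.entrySet()) {
      if (e.getKey().equalsIgnoreCase(name)) {
        return e.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<String, String>();
    props.put("TRANSACTIONAL", "true");
    props.put("no_auto_compact", "false");
    System.out.println(findProperty(props, "transactional"));   // true
    System.out.println(findProperty(props, "NO_AUTO_COMPACT")); // false
  }
}
{code}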

 Acid related table properties should be defined in one place and should be 
 case insensitive
 ---

 Key: HIVE-8308
 URL: https://issues.apache.org/jira/browse/HIVE-8308
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates

 Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT 
 are defined in the classes that use them.  Since these are both potential 
 table properties and both are ACID related, it makes sense to collect 
 them together.  There's no central place for table properties at this point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-10-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165436#comment-14165436
 ] 

Alan Gates commented on HIVE-8290:
--

[~leftylev], actually looking at the code, both are case sensitive, but one I 
made all caps and one all lower.  That doesn't seem good.  I've added this to 
HIVE-8308 to fix.

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to be the transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-09 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8341:
-
Status: Open  (was: Patch Available)

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.
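One obvious compact representation is a high-water mark plus the usually 
short list of exceptions below it; a hypothetical sketch of that idea (not 
necessarily the format the patch ends up using):
{code}
import java.util.Arrays;
import java.util.List;

public class CompactTxnList {
  // Encode "every txn <= highWaterMark is valid except the listed ones"
  // instead of enumerating every valid transaction id.
  static String encode(long highWaterMark, List<Long> exceptions) {
    StringBuilder sb = new StringBuilder(Long.toString(highWaterMark));
    for (long txn : exceptions) {
      sb.append(':').append(txn);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Everything at or below 1000000 is valid except txns 17 and 42.
    System.out.println(encode(1000000L, Arrays.asList(17L, 42L))); // 1000000:17:42
  }
}
{code}
The size of the encoding then tracks the number of open or aborted 
transactions rather than the total transaction count.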



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions

2014-10-08 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8402:


 Summary: Orc pushing SARGs into delta files causing 
ArrayOutOfBoundsExceptions
 Key: HIVE-8402
 URL: https://issues.apache.org/jira/browse/HIVE-8402
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


ORC is in some instances pushing SARGs into delta files.  This is wrong 
behavior in general as it may result in failing to pull the most recent version 
of a row.  When the SARG is applied to a row that is deleted it causes an 
ArrayOutOfBoundsException because there is no data in the row.
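A minimal sketch of the failure mode (illustrative only, not the ORC reader 
code; the compiled predicate and the event layout are hypothetical):
{code}
public class SargOnDeleteEvent {
  // Hypothetical compiled form of a SARG like "age > 20" that reads the
  // column at field index 1.
  static boolean agePredicate(Object[] rowFields) {
    return ((Integer) rowFields[1]) > 20;
  }

  public static void main(String[] args) {
    Object[] insertEvent = { "alice", 25 }; // data columns present
    Object[] deleteEvent = {};              // delete event: no data columns

    System.out.println(agePredicate(insertEvent)); // true
    try {
      agePredicate(deleteEvent); // indexes past the empty field array
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("SARG applied to a delete event: " + e);
    }
  }
}
{code}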



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions

2014-10-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163876#comment-14163876
 ] 

Alan Gates commented on HIVE-8402:
--

In my tests at the moment I only see this after compaction.  I don't know if 
this is because ORC correctly doesn't push the SARGs when there are only delta 
files or if something else is causing this.

 Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
 -

 Key: HIVE-8402
 URL: https://issues.apache.org/jira/browse/HIVE-8402
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


 ORC is in some instances pushing SARGs into delta files.  This is wrong 
 behavior in general as it may result in failing to pull the most recent 
 version of a row.  When the SARG is applied to a row that is deleted it 
 causes an ArrayOutOfBoundsException because there is no data in the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8403) Build broken by datanucleus.org being offline

2014-10-08 Thread Alan Gates (JIRA)
Alan Gates created HIVE-8403:


 Summary: Build broken by datanucleus.org being offline
 Key: HIVE-8403
 URL: https://issues.apache.org/jira/browse/HIVE-8403
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Alan Gates
Priority: Blocker
 Fix For: 0.14.0


datanucleus.org is not available, making it impossible to download jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8403) Build broken by datanucleus.org being offline

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8403:
-
Attachment: HIVE-8403.patch

Removes datanucleus.org as a repository and adds JBoss in order to pick up JMS 
jars.

 Build broken by datanucleus.org being offline
 -

 Key: HIVE-8403
 URL: https://issues.apache.org/jira/browse/HIVE-8403
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8403.patch


 datanucleus.org is not available, making it impossible to download jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8403) Build broken by datanucleus.org being offline

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8403:
-
Assignee: Alan Gates
  Status: Patch Available  (was: Open)

 Build broken by datanucleus.org being offline
 -

 Key: HIVE-8403
 URL: https://issues.apache.org/jira/browse/HIVE-8403
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8403.patch


 datanucleus.org is not available, making it impossible to download jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164332#comment-14164332
 ] 

Alan Gates commented on HIVE-8367:
--

bq. What was the original query where the issue showed up?
{code}
create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
('transactional'='true');
insert into table concur_orc_tab select * from texttab; -- loads 10k records 
into the table
delete from concur_orc_tab where age >= 20 and age < 30;
{code}
This resulted in only some rows being deleted (~300 of the 1700 that should 
have been deleted).

bq. What precisely was the problem and how does the RS deduplication change help?
The problem was that, because the code was turning off RS deduplication, the 
query got a plan with two MR jobs.  The sort by ROW__ID was done in job one, 
and the bucketing was done in job two.  The bucketing in job 2 partially undid 
the sorting of job 1, so only some of the records showed up as deleted (the 
records have to be written into the delta file in proper order).  The minimum 
number of reducers on which to apply RS deduplication is pushed to 1 so that 
this optimization is used even for small queries.
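For reference, a minimal sketch of the two settings discussed above 
(assuming the hive.optimize.reducededuplication* config keys; setting them 
on a bare Configuration here is only illustrative):
{code}
import org.apache.hadoop.conf.Configuration;

public class RsDedupSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Keep ReduceSink deduplication on so the sort by ROW__ID and the
    // bucketing collapse into a single reduce stage.
    conf.setBoolean("hive.optimize.reducededuplication", true);
    // Apply the optimization even when the plan has a single reducer.
    conf.setInt("hive.optimize.reducededuplication.min.reducer", 1);
    System.out.println(conf.get("hive.optimize.reducededuplication.min.reducer"));
  }
}
{code}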

bq. How are the changes to the sort order of ROW__ID related?
That should never have been set to descending in the first place; ROW__ID 
needs to be sorted ascending to work properly.  I suspect it was a fluke of 
the qfile tests that most of them passed with it set to descending.  (Thejas 
actually asked at the time why this was necessary, and rather than fixing it, 
which I should have done, I just said I didn't know.  Oops.)

bq.  ReduceSinkDeDuplication.java change is not needed
What change?  I don't see any changes to that file in the patch.

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8402:
-
Attachment: HIVE-8402.patch

A patch to change orc to not push sargs into the deltas.

And to answer my earlier open question: this only happened when a base was 
also present.  When there was no base file the SARG was not being written into 
the options passed to OrcRawRecordMerger (see OrcInputFormat.getReader, around 
line 1121). 

 Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
 -

 Key: HIVE-8402
 URL: https://issues.apache.org/jira/browse/HIVE-8402
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8402.patch


 ORC is in some instances pushing SARGs into delta files.  This is wrong 
 behavior in general as it may result in failing to pull the most recent 
 version of a row.  When the SARG is applied to a row that is deleted it 
 causes an ArrayOutOfBoundsException because there is no data in the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8402) Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8402:
-
Status: Patch Available  (was: Open)

 Orc pushing SARGs into delta files causing ArrayOutOfBoundsExceptions
 -

 Key: HIVE-8402
 URL: https://issues.apache.org/jira/browse/HIVE-8402
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8402.patch


 ORC is in some instances pushing SARGs into delta files.  This is wrong 
 behavior in general as it may result in failing to pull the most recent 
 version of a row.  When the SARG is applied to a row that is deleted it 
 causes an ArrayOutOfBoundsException because there is no data in the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Status: Open  (was: Patch Available)

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8368.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.
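A minimal sketch of the intended behavior (hypothetical types, not the 
compactor code):
{code}
import java.util.Arrays;
import java.util.List;

public class CompactorMergeSketch {
  static class Event {
    final long rowId; final boolean isDelete; final String row;
    Event(long rowId, boolean isDelete, String row) {
      this.rowId = rowId; this.isDelete = isDelete; this.row = row;
    }
  }

  // Writing the merged base+delta stream out as the new base: delete events
  // must be dropped here, not copied through.
  static void writeNewBase(List<Event> merged) {
    for (Event e : merged) {
      if (e.isDelete) {
        continue; // deleted rows never reach the new base
      }
      System.out.println("base rowid " + e.rowId + " -> " + e.row);
    }
  }

  public static void main(String[] args) {
    writeNewBase(Arrays.asList(
        new Event(1, false, "alice"),
        new Event(2, true, null),   // the record the bug was copying through
        new Event(3, false, "carol")));
  }
}
{code}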



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Attachment: HIVE-8368.2.patch

Rebased version of the patch.

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8368.2.patch, HIVE-8368.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Status: Patch Available  (was: Open)

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8368.2.patch, HIVE-8368.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Status: Open  (was: Patch Available)

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Status: Patch Available  (was: Open)

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.2.patch, HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Attachment: HIVE-8367.2.patch

Rebased version of the patch.

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.2.patch, HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Status: Patch Available  (was: Open)

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.
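A sketch of that scheme (method names hypothetical; assumes lock ids 
increase monotonically, so any lock with a higher id was obtained after the 
compaction finished):
{code}
public class CleanerGate {
  private final long highestLockIdAtCompaction;

  CleanerGate(long highestLockIdAtCompaction) {
    this.highestLockIdAtCompaction = highestLockIdAtCompaction;
  }

  // minLiveLockId is the smallest lock id still held on the table/partition,
  // or Long.MAX_VALUE if none are held. Locks obtained after the compaction
  // reference the new files, so they do not block cleaning the old ones.
  boolean safeToClean(long minLiveLockId) {
    return minLiveLockId > highestLockIdAtCompaction;
  }

  public static void main(String[] args) {
    CleanerGate gate = new CleanerGate(1234L);
    System.out.println(gate.safeToClean(1200L)); // false: pre-compaction reader
    System.out.println(gate.safeToClean(1300L)); // true: only newer locks left
  }
}
{code}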



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Attachment: HIVE-8258.5.patch

Rebased patch.

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.5.patch, HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql & oracle

2014-10-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-6669:
-
Attachment: HIVE-6669.2.patch

A new version of the patch that quotes postgres table and field names.

 sourcing txn-script from schema script results in failure for mysql & oracle
 

 Key: HIVE-6669
 URL: https://issues.apache.org/jira/browse/HIVE-6669
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Alan Gates
Priority: Blocker
 Attachments: HIVE-6669.2.patch, HIVE-6669.patch


 This issue is addressed in 0.13 by in-lining the transaction schema 
 statements in the schema initialization script (HIVE-6559).
 The 0.14 schema initialization is not yet fixed.  This is the follow-up 
 ticket to address the problem in 0.14. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8341) Transaction information in config file can grow excessively large

2014-10-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164472#comment-14164472
 ] 

Alan Gates commented on HIVE-8341:
--

Do you have a simple query with a transform in it that shows the issue with the 
process builder?

 Transaction information in config file can grow excessively large
 -

 Key: HIVE-8341
 URL: https://issues.apache.org/jira/browse/HIVE-8341
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8341.patch


 In our testing we have seen cases where the transaction list grows very 
 large.  We need a more efficient way of communicating the list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8258:
-
Status: Patch Available  (was: Open)

Ignore that last comment, the issue was just pilot error.

 Compactor cleaners can be starved on a busy table or partition.
 ---

 Key: HIVE-8258
 URL: https://issues.apache.org/jira/browse/HIVE-8258
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8258.2.patch, HIVE-8258.3.patch, HIVE-8258.4.patch, 
 HIVE-8258.patch


 Currently the cleaning thread in the compactor does not run on a table or 
 partition while any locks are held on this partition.  This leaves it open to 
 starvation in the case of a busy table or partition.  It only needs to wait 
 until all locks on the table/partition at the time of the compaction have 
 expired.  Any jobs initiated after that (and thus any locks obtained) will be 
 for the new versions of the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6669) sourcing txn-script from schema script results in failure for mysql & oracle

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-6669:
-
Status: Patch Available  (was: Open)

NO PRECOMMIT TESTS

 sourcing txn-script from schema script results in failure for mysql & oracle
 

 Key: HIVE-6669
 URL: https://issues.apache.org/jira/browse/HIVE-6669
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Alan Gates
Priority: Blocker
 Attachments: HIVE-6669.patch


 This issue is addressed in 0.13 by in-lining the transaction schema 
 statements in the schema initialization script (HIVE-6559).
 The 0.14 schema initialization is not yet fixed.  This is the follow-up 
 ticket to address the problem in 0.14. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Status: Patch Available  (was: Open)

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Attachment: HIVE-8367.patch

The issue shows up when the input is large enough to need more than one map 
task.  

This patch fixes it by turning on reduce deduplication in the optimizer (which 
was being turned off before) and dropping the minimum number of reducers to 1 
(instead of 4).  This has the side effect of halving the time it takes to do an 
update or delete.

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162040#comment-14162040
 ] 

Alan Gates commented on HIVE-8368:
--

Ignore the last comment, it was intended for a different JIRA.

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Attachment: HIVE-8367.patch

The issue shows up when the input is large enough to need more than one map 
task.
This patch fixes it by turning on reduce deduplication in the optimizer (which 
was being turned off before) and dropping the minimum number of reducers to 1 
(instead of 4). This has the side effect of halving the time it takes to do an 
update or delete.

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8367) delete writes records in wrong order in some cases

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8367:
-
Status: Patch Available  (was: Open)

 delete writes records in wrong order in some cases
 --

 Key: HIVE-8367
 URL: https://issues.apache.org/jira/browse/HIVE-8367
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8367.patch


 I have found one query with 10k records where you do:
 create table
 insert into table -- 10k records
 delete from table -- just some records
 The records in the delete delta are not ordered properly by rowid.
 I assume this applies to updates as well, but I haven't tested it yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Attachment: (was: HIVE-8367.patch)

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8368) compactor is improperly writing delete records in base file

2014-10-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8368:
-
Status: Open  (was: Patch Available)

 compactor is improperly writing delete records in base file
 ---

 Key: HIVE-8368
 URL: https://issues.apache.org/jira/browse/HIVE-8368
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0


 When the compactor reads records from the base and deltas, it is not properly 
 dropping delete records.  This leads to oversized base files, and possibly to 
 wrong query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

