[jira] [Updated] (HIVE-5300) MapredLocalTask logs success message twice
[ https://issues.apache.org/jira/browse/HIVE-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5300:
------------------------
    Status: Patch Available  (was: Open)

> MapredLocalTask logs success message twice
> ------------------------------------------
>
>                 Key: HIVE-5300
>                 URL: https://issues.apache.org/jira/browse/HIVE-5300
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logging
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-5300.1.patch.txt
>
> Something like this:
> {noformat}
> Execution completed successfully
> Mapred Local Task Succeeded . Convert the Join into MapJoin
> Mapred Local Task Succeeded . Convert the Join into MapJoin
> Launching Job 1 out of 1
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5300) MapredLocalTask logs success message twice
[ https://issues.apache.org/jira/browse/HIVE-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5300:
------------------------
    Attachment: HIVE-5300.1.patch.txt
[jira] [Commented] (HIVE-5172) TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769239#comment-13769239 ]

Ashutosh Chauhan commented on HIVE-5172:
----------------------------------------
As I mentioned in https://issues.apache.org/jira/browse/HIVE-3805?focusedCommentId=13533106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13533106 this TUGIContainingTransport is really a hack; the current way to do this is to use {{Plain Sasl Server}}, otherwise we keep running into problems like this. [~agateaaa], wondering if you would like to pursue the 'proper fix'? If not, then I need to think a bit about this current patch. Will get back to you soon.

> TUGIContainingTransport returning null transport, causing intermittent SocketTimeoutException on hive client and NullPointerException in TUGIBasedProcessor on the server
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5172
>                 URL: https://issues.apache.org/jira/browse/HIVE-5172
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.9.0, 0.10.0, 0.11.0
>            Reporter: agate
>         Attachments: HIVE-5172.1.patch.txt
>
> We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors on the client and a NullPointerException in TUGIBasedProcessor on the server.
> {code}
> hive client logs:
> =================
> org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
>         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         ... 31 more
> {code}
> {code}
> hive metastore server logs:
> ===========================
> 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
>         at
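The NullPointerException above indicates the processor received a null transport from the cache. As a hedged illustration only (this is not the HIVE-5172 patch, and all names below are hypothetical), one defensive pattern is to build the transport wrapper eagerly and publish it with ConcurrentMap.putIfAbsent, so a lookup can never hand back null even under concurrent access:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a cache keyed by an underlying transport that never
// returns null. Creating the wrapper first and publishing it atomically with
// putIfAbsent avoids the window where a weakly-held value could disappear
// between lookup and use.
public class TransportCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    /** Factory for building the wrapper when it is not cached yet. */
    public interface Factory<K, V> { V create(K key); }

    public V get(K key, Factory<K, V> factory) {
        V existing = cache.get(key);
        if (existing != null) {
            return existing;                        // cache hit
        }
        V created = factory.create(key);            // build eagerly
        V raced = cache.putIfAbsent(key, created);  // publish atomically
        return (raced != null) ? raced : created;   // never null
    }
}
```

If two threads race on the same key, putIfAbsent guarantees both end up with the same published wrapper.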
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-3764:
----------------------------------
    Attachment: (was: HIVE-3764.4.patch)

> Support metastore version consistency check
> -------------------------------------------
>
>                 Key: HIVE-3764
>                 URL: https://issues.apache.org/jira/browse/HIVE-3764
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: 0.12.0
>         Attachments: HIVE-3764.1.patch, HIVE-3764.4.patch
>
> Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release: it's not supported with all databases, and the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have a consistency check to make sure that hive is using the correct metastore, and that for production systems the schema is not changed automatically just by running hive.
> Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute those against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-3764:
----------------------------------
    Attachment: (was: HIVE-3764-0.13-addional-file.patch)
Re: Review Request 14120: HIVE-3764: Support metastore version consistency check
On Sept. 13, 2013, 1:35 p.m., Brock Noland wrote:
> Prasad, this looks really good! I just had two people email me directly yesterday, and both were using the incorrect metastore version. Have you run the new unit tests a couple of times? Have you done any other testing?

Addressed the comments. The MetaException doesn't support nesting, so changed that to HiveMetaException. Added more tests. Manually tested the init and upgrade operations using Derby and MySQL. As discussed on the ticket, I am going to split the patch into two separate tickets and will close this review.

- Prasad

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14120/#review26079
-----------------------------------------------------------

On Sept. 13, 2013, 7:53 a.m., Prasad Mujumdar wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14120/
-----------------------------------------------------------

(Updated Sept. 13, 2013, 7:53 a.m.)

Review request for hive.

Bugs: HIVE-3764
    https://issues.apache.org/jira/browse/HIVE-3764

Repository: hive-git

Description
-----------
- Added a new table in the metastore schema to store the Hive version in the metastore.
- The metastore handler compares the version stored in the schema with its own version. If there's a mismatch, it can either record the correct version or raise an error. The behavior is configurable via a new Hive config; when set, this config also restricts DataNucleus from auto-upgrading the schema.
- The new schema creation and upgrade scripts record the new version in the metastore version table.
- Added 0.12 upgrade scripts for all supported DBs to create the new version table in the 0.12 metastore schema.
- Added a new schemaTool that can perform schema initialization or upgrade based on the schema version and product version.
Diffs
-----
  beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java PRE-CREATION
  beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java PRE-CREATION
  beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java PRE-CREATION
  bin/ext/schemaTool.sh PRE-CREATION
  bin/schematool PRE-CREATION
  build-common.xml ad5ac23
  build.xml 3e87163
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22149e4
  conf/hive-default.xml.template 9a3fc1d
  metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql PRE-CREATION
  metastore/scripts/upgrade/derby/hive-schema-0.12.0.derby.sql cce544f
  metastore/scripts/upgrade/derby/upgrade-0.10.0-to-0.11.0.derby.sql cae7936
  metastore/scripts/upgrade/derby/upgrade-0.11.0-to-0.12.0.derby.sql 492cc93
  metastore/scripts/upgrade/derby/upgrade.order.derby PRE-CREATION
  metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql PRE-CREATION
  metastore/scripts/upgrade/mysql/hive-schema-0.12.0.mysql.sql 22a77fe
  metastore/scripts/upgrade/mysql/upgrade-0.11.0-to-0.12.0.mysql.sql 375a05f
  metastore/scripts/upgrade/mysql/upgrade.order.mysql PRE-CREATION
  metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql PRE-CREATION
  metastore/scripts/upgrade/oracle/hive-schema-0.12.0.oracle.sql 85a0178
  metastore/scripts/upgrade/oracle/upgrade-0.10.0-to-0.11.0.mysql.sql PRE-CREATION
  metastore/scripts/upgrade/oracle/upgrade-0.11.0-to-0.12.0.oracle.sql a2d0901
  metastore/scripts/upgrade/oracle/upgrade.order.oracle PRE-CREATION
  metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql PRE-CREATION
  metastore/scripts/upgrade/postgres/hive-schema-0.12.0.postgres.sql 7b319ba
  metastore/scripts/upgrade/postgres/upgrade-0.11.0-to-0.12.0.postgres.sql 9da0a1b
  metastore/scripts/upgrade/postgres/upgrade.order.postgres PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 39dda92
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a08c728
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e410c3a
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MVersionTable.java PRE-CREATION
  metastore/src/model/package.jdo c42b5b0
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 8066784
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 0f9b16c
  metastore/src/test/org/apache/hadoop/hive/metastore/TestMetastoreVersion.java PRE-CREATION

Diff: https://reviews.apache.org/r/14120/diff/

Testing
-------
Added new tests for schema verification and schemaTool.

Thanks,

Prasad Mujumdar
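The version check described in this review (compare the version recorded in the metastore with the binary's version, and either record it or fail depending on a config flag) can be sketched as follows. This is an illustrative sketch only, not the actual patch: the class name SchemaVersionChecker, the method verify, and the strict flag are hypothetical stand-ins for the real config-driven behavior.

```java
// Hypothetical sketch of the metastore schema version check described above.
public class SchemaVersionChecker {
    /**
     * @param storedVersion version recorded in the metastore version table,
     *                      or null if no version has been recorded yet
     * @param binaryVersion version compiled into the running Hive binary
     * @param strict        when true, any mismatch is an error; when false,
     *                      the binary's version is recorded instead
     * @return the version that should end up recorded in the schema
     */
    public static String verify(String storedVersion, String binaryVersion, boolean strict) {
        if (storedVersion == null || !storedVersion.equals(binaryVersion)) {
            if (strict) {
                throw new IllegalStateException("Metastore schema version " + storedVersion
                        + " does not match Hive binary version " + binaryVersion);
            }
            return binaryVersion; // lenient mode: record the binary's version
        }
        return storedVersion;     // versions agree, nothing to do
    }
}
```

In strict mode this is the "raise error" branch; in lenient mode it is the "record the correct version" branch described in the review.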
[jira] [Commented] (HIVE-5300) MapredLocalTask logs success message twice
[ https://issues.apache.org/jira/browse/HIVE-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769244#comment-13769244 ]

Ashutosh Chauhan commented on HIVE-5300:
----------------------------------------
+1
Review Request 14169: HIVE-3764: Support metastore version consistency check
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14169/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan and Brock Noland.

Bugs: HIVE-3764
    https://issues.apache.org/jira/browse/HIVE-3764

Repository: hive-git

Description
-----------
This is a 0.12-specific patch. The trunk patch will include additional metastore scripts, which I will attach separately to the ticket.

- Added a new table in the metastore schema to store the Hive version in the metastore.
- The metastore handler compares the version stored in the schema with its own version. If there's a mismatch, it can either record the correct version or raise an error. The behavior is configurable via a new Hive config; when set, this config also restricts DataNucleus from auto-upgrading the schema.
- The new schema creation and upgrade scripts record the new version in the metastore version table.
- Added 0.12 upgrade scripts for all supported DBs to create the new version table in the 0.12 metastore schema.

The current patch has the verification turned off by default. I would prefer to keep it enabled, though that would require any ad-hoc setup to explicitly disable it (or to create the metastore schema by running the scripts). The default can be changed or left as is, per the consensus.
Diffs
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22149e4
  conf/hive-default.xml.template 9a3fc1d
  metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql PRE-CREATION
  metastore/scripts/upgrade/derby/hive-schema-0.12.0.derby.sql cce544f
  metastore/scripts/upgrade/derby/upgrade-0.10.0-to-0.11.0.derby.sql cae7936
  metastore/scripts/upgrade/derby/upgrade-0.11.0-to-0.12.0.derby.sql 492cc93
  metastore/scripts/upgrade/derby/upgrade.order.derby PRE-CREATION
  metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql PRE-CREATION
  metastore/scripts/upgrade/mysql/hive-schema-0.12.0.mysql.sql 22a77fe
  metastore/scripts/upgrade/mysql/upgrade-0.11.0-to-0.12.0.mysql.sql 375a05f
  metastore/scripts/upgrade/mysql/upgrade.order.mysql PRE-CREATION
  metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql PRE-CREATION
  metastore/scripts/upgrade/oracle/hive-schema-0.12.0.oracle.sql 85a0178
  metastore/scripts/upgrade/oracle/upgrade-0.10.0-to-0.11.0.mysql.sql PRE-CREATION
  metastore/scripts/upgrade/oracle/upgrade-0.11.0-to-0.12.0.oracle.sql a2d0901
  metastore/scripts/upgrade/oracle/upgrade.order.oracle PRE-CREATION
  metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql PRE-CREATION
  metastore/scripts/upgrade/postgres/hive-schema-0.12.0.postgres.sql 7b319ba
  metastore/scripts/upgrade/postgres/upgrade-0.11.0-to-0.12.0.postgres.sql 9da0a1b
  metastore/scripts/upgrade/postgres/upgrade.order.postgres PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 39dda92
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a27243d
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e410c3a
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MVersionTable.java PRE-CREATION
  metastore/src/model/package.jdo c42b5b0
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 8066784
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 0f9b16c
  metastore/src/test/org/apache/hadoop/hive/metastore/TestMetastoreVersion.java PRE-CREATION

Diff: https://reviews.apache.org/r/14169/diff/

Testing
-------
Added new tests for schema verification. Manually tested the upgrades using Derby and MySQL.

Thanks,

Prasad Mujumdar
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-3764:
----------------------------------
    Description:
Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release: it's not supported with all databases, and the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have a consistency check to make sure that hive is using the correct metastore, and that for production systems the schema is not changed automatically just by running hive.

  was:
Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release: it's not supported with all databases, and the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have a consistency check to make sure that hive is using the correct metastore, and that for production systems the schema is not changed automatically just by running hive.
Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute those against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769247#comment-13769247 ]

Prasad Mujumdar commented on HIVE-3764:
---------------------------------------
New review request for the updated patch at https://reviews.apache.org/r/14169/
[jira] [Created] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
Prasad Mujumdar created HIVE-5301:
-------------------------------------

             Summary: Add a schema tool for offline metastore schema upgrade
                 Key: HIVE-5301
                 URL: https://issues.apache.org/jira/browse/HIVE-5301
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.11.0
            Reporter: Prasad Mujumdar
            Assignee: Prasad Mujumdar
             Fix For: 0.12.0

HIVE-3764 is addressing metastore version consistency. Besides that, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute those against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Updated] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-5301:
----------------------------------
    Attachment: HIVE-5301-with-HIVE-3764.0.patch

Combined HIVE-3764 + HIVE-5301 patch, for testing.
[jira] [Updated] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-5301:
----------------------------------
    Attachment: HIVE-5301.1.patch

Patch attached; requires the HIVE-3764 patch.
Review Request 14170: HIVE-5301: Add a schema tool for offline metastore schema upgrade
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14170/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan, Brock Noland, and Thejas Nair.

Bugs: HIVE-5301
    https://issues.apache.org/jira/browse/HIVE-5301

Repository: hive-git

Description
-----------
Schema tool to initialize and migrate the hive metastore schema:
- Extract the metastore connection details from the hive configuration.
- The target version is extracted from the binary, and from the metastore if possible; optionally it can be specified as an argument.
- Determine the scripts that need to be executed for the initialization or upgrade.
- Handle DB nested scripts.
- Execute the required scripts using beeline.

Diffs
-----
  beeline/src/java/org/apache/hive/beeline/BeeLine.java 4c6eb9b
  beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java PRE-CREATION
  beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java PRE-CREATION
  beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java PRE-CREATION
  bin/ext/schemaTool.sh PRE-CREATION
  bin/schematool PRE-CREATION
  build-common.xml ad5ac23
  build.xml 3e87163

Diff: https://reviews.apache.org/r/14170/diff/

Testing
-------
Added unit tests. Manually tested various options using Derby and MySQL.

Thanks,

Prasad Mujumdar
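The "determine the scripts that need to be executed" step can be illustrated with a small sketch. The upgrade.order files listed in the HIVE-3764 diffs suggest an ordered sequence of upgrade steps; assuming entries of the form "0.11.0-to-0.12.0", selecting the scripts for an upgrade might look like this (the class name, method, and script file-name convention are hypothetical, not the actual HiveSchemaTool code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of upgrade-script selection: walk the ordered upgrade
// steps, start collecting at the source version, and stop once the target
// version is reached.
public class UpgradeScriptSelector {
    public static List<String> scriptsFor(List<String> upgradeOrder,
                                          String fromVersion, String toVersion) {
        List<String> scripts = new ArrayList<>();
        boolean collecting = false;
        for (String step : upgradeOrder) {          // e.g. "0.11.0-to-0.12.0"
            String[] parts = step.split("-to-");
            if (parts[0].equals(fromVersion)) {
                collecting = true;                  // start at our version
            }
            if (collecting) {
                scripts.add("upgrade-" + step + ".sql");
                if (parts[1].equals(toVersion)) {
                    break;                          // reached the target
                }
            }
        }
        return scripts;
    }
}
```

Each selected script would then be handed to beeline for execution, per the description above.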
[jira] [Updated] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-5301:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769252#comment-13769252 ]

Prasad Mujumdar commented on HIVE-3764:
---------------------------------------
The schema tool part is addressed via HIVE-5301.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769275#comment-13769275 ]

Hive QA commented on HIVE-4732:
-------------------------------
{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603500/HIVE-4732.5.patch

{color:green}SUCCESS:{color} +1 3126 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/774/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/774/console
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> ---------------------------------------------------------------------
>
>                 Key: HIVE-4732
>                 URL: https://issues.apache.org/jira/browse/HIVE-4732
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch
>
> The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once and then reused) will improve performance.
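The optimization described in this ticket, comparing cached hashcodes before falling back to a full equality check, can be sketched as below. This is not the actual AvroSerde patch: CachedSchema is a hypothetical wrapper, and a plain String stands in for an Avro Schema. Note that a matching hashcode alone does not prove equality, so a deep comparison is still required on a hash hit; the win is that a hash mismatch rules out equality cheaply.

```java
// Hypothetical sketch of the optimization: cache each schema's hashcode
// once, and use it to rule out equality cheaply before doing the expensive
// deep equals().
public class CachedSchema {
    private final String schema;   // stand-in for an Avro Schema
    private final int hash;        // computed once, reused on every compare

    public CachedSchema(String schema) {
        this.schema = schema;
        this.hash = schema.hashCode();
    }

    @Override
    public int hashCode() { return hash; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedSchema)) return false;
        CachedSchema other = (CachedSchema) o;
        if (hash != other.hash) return false;   // cheap negative check
        return schema.equals(other.schema);     // expensive deep comparison
    }
}
```

The deep comparison now runs only when the cached hashes collide, which for distinct schemas is rare.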
[jira] [Updated] (HIVE-5211) ALTER TABLE does not change the type of column for a table with AVRO data
[ https://issues.apache.org/jira/browse/HIVE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HIVE-5211: -- Labels: avro (was: ) ALTER TABLE does not change the type of column for a table with AVRO data - Key: HIVE-5211 URL: https://issues.apache.org/jira/browse/HIVE-5211 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Neha Tomar Labels: avro 1. Created a table in Hive with AVRO data. hive> CREATE EXTERNAL TABLE sample ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/home/neha/test_data/avrodata' TBLPROPERTIES ('avro.schema.literal'='{"type": "record","name": "TUPLE_3","fields": [ { "name": "sample_id","type": [ "null", "int" ],"doc": "autogenerated from Pig Field Schema"} ]}' ); OK Time taken: 0.16 seconds hive> describe sample; OK sample_id int from deserializer Time taken: 0.516 seconds, Fetched: 1 row(s) 2. Alter the type of the column from int to bigint. It displays OK as the result of DDL execution. However, describing the table still shows the previous data type. hive> alter table sample change sample_id int bigint; OK Time taken: 0.614 seconds hive> describe sample; OK sample_id int from deserializer Time taken: 0.4 seconds, Fetched: 1 row(s) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4116) Can't use views using map datatype.
[ https://issues.apache.org/jira/browse/HIVE-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-4116: --- Assignee: Navis Can't use views using map datatype. --- Key: HIVE-4116 URL: https://issues.apache.org/jira/browse/HIVE-4116 Project: Hive Issue Type: Bug Affects Versions: 0.8.1, 0.10.0 Reporter: Karel Vervaeke Assignee: Navis Executing the following {noformat} DROP TABLE IF EXISTS `items`; CREATE TABLE IF NOT EXISTS `items` (id INT, name STRING, info MAP<STRING,STRING>) PARTITIONED BY (ds STRING); DROP VIEW IF EXISTS `priceview`; CREATE VIEW `priceview` AS SELECT `items`.`id`, `items`.info['price'] FROM `items` ; select * from `priceview`; {noformat} Produces the following error: {noformat} karel@tomato:~/tmp$ $HIVE_HOME/bin/hive -f hivebug.sql WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. Logging initialized using configuration in jar:file:/home/karel/opt/hive-0.10.0-bin/lib/hive-common-0.10.0.jar!/hive-log4j.properties Hive history file=/tmp/karel/hive_job_log_karel_201303051117_945318761.txt SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/karel/opt/hadoop-2.0.0-mr1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/karel/opt/hive-0.10.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. OK Time taken: 5.449 seconds OK Time taken: 0.303 seconds OK Time taken: 0.131 seconds OK Time taken: 0.206 seconds FAILED: SemanticException line 3:22 mismatched input '.' 
expecting FROM near '`items`' in from clause in definition of VIEW priceview [ SELECT `items`.`id`, `items``items`.`info`info['price'] FROM `default`.`items` ] used as priceview at Line 3:14 {noformat} Unless I'm not using the right syntax, I would expect this simple example to work. I have tried some variations (quotes, no quotes, ...), to no avail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4116) Can't use views using map datatype.
[ https://issues.apache.org/jira/browse/HIVE-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4116: Status: Patch Available (was: Open) Can't use views using map datatype. --- Key: HIVE-4116 URL: https://issues.apache.org/jira/browse/HIVE-4116 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.10.0, 0.8.1 Reporter: Karel Vervaeke Assignee: Navis -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4116) Can't use views using map datatype.
[ https://issues.apache.org/jira/browse/HIVE-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4116: Affects Version/s: 0.11.0 Can't use views using map datatype. --- Key: HIVE-4116 URL: https://issues.apache.org/jira/browse/HIVE-4116 Project: Hive Issue Type: Bug Affects Versions: 0.8.1, 0.10.0, 0.11.0 Reporter: Karel Vervaeke Assignee: Navis -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4116) Can't use views using map datatype.
[ https://issues.apache.org/jira/browse/HIVE-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4116: -- Attachment: D12975.1.patch navis requested code review of HIVE-4116 [jira] Can't use views using map datatype. Reviewers: JIRA TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12975 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java ql/src/test/queries/clientpositive/create_view_translate.q ql/src/test/results/clientpositive/create_view_translate.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/31011/ To: JIRA, navis Can't use views using map datatype. --- Key: HIVE-4116 URL: https://issues.apache.org/jira/browse/HIVE-4116 Project: Hive Issue Type: Bug Affects Versions: 0.8.1, 0.10.0, 0.11.0 Reporter: Karel Vervaeke Assignee: Navis Attachments: D12975.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA,
[jira] [Resolved] (HIVE-4173) Hive Ignoring where clause for multitable insert
[ https://issues.apache.org/jira/browse/HIVE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis resolved HIVE-4173. - Resolution: Duplicate Hive Ignoring where clause for multitable insert - Key: HIVE-4173 URL: https://issues.apache.org/jira/browse/HIVE-4173 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.1, 0.9.0 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago), Reporter: hussain Priority: Critical Hive is ignoring filter conditions given in a multi-insert select statement when a filter is also given on the source query. To highlight this issue, see the example below: the where clause (status!='C') on the employee12 source query causes the per-insert filters (batch_id='12' and batch_id!='12') to stop working, dumping all the data coming from the source into both tables. I have checked the hive execution plan and didn't find filter predicates for filtering records per insert statement from (from employee12 select * where status!='C') t insert into table employee1 select status, field1, 'T' as field2, 'P' as field3, 'C' as field4 where batch_id='12' insert into table employee2 select status, field1, 'D' as field2, 'P' as field3, 'C' as field4 where batch_id!='12'; It works fine with a single insert; Hive generates the plan properly. I am able to reproduce this issue with the 0.8.1 and 0.9.0 versions of Hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions
[ https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769302#comment-13769302 ] Thejas M Nair commented on HIVE-5122: - Looks good. +1 Add partition for multiple partition ignores locations for non-first partitions --- Key: HIVE-5122 URL: https://issues.apache.org/jira/browse/HIVE-5122 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D12411.3.patch, D12411.4.patch, HIVE-5122.D12411.1.patch, HIVE-5122.D12411.2.patch http://www.mail-archive.com/user@hive.apache.org/msg09151.html When multiple partitions are added in a single alter table statement, the location given for the first partition is used as the location of all partitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769320#comment-13769320 ] Douglas commented on HIVE-5296: --- Hi Kousuke, Thanks for your interest. Here are the answers to your questions: 1) This is most definitely the Hiveserver2 process. I validated this by tracking the heap space utilised for the hiveserver2 process over time, as connections were opened and closed. 2) The queries that were being executed were for the most part LOAD DATA INPATH: {code} int returnCode = hc.update("LOAD DATA INPATH '"+fileName+"' OVERWRITE INTO TABLE "+targetTable+" partition (dt='"+cal.getTimeInMillis()+"')"); logger.info(this.getClass()+" returned with value "+returnCode); {code} These were a mix of successes and exceptions. I've yet to validate whether the leak occurs in all instances, or only in those cases where the hiveserver throws Exceptions. 3) I've not had the time to dig into the hiveserver code as yet to find the offending object, but if I do get the chance, I will certainly post my findings and a patch. Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Original Estimate: 168h Remaining Estimate: 168h This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. Multiple connections to Hiveserver2, all of which are closed and disposed of properly, show the Java heap size growing extremely quickly. 
This issue can be recreated using the following code {code} import java.sql.DriverManager; import java.sql.Connection; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.util.Properties; import org.apache.hive.service.cli.HiveSQLException; import org.apache.log4j.Logger; /* * Class which encapsulates the lifecycle of a query or statement. * Provides functionality which allows you to create a connection */ public class HiveClient { Connection con; Logger logger; private static String driverName = "org.apache.hive.jdbc.HiveDriver"; private String db; public HiveClient(String db) { logger = Logger.getLogger(HiveClient.class); this.db=db; try{ Class.forName(driverName); }catch(ClassNotFoundException e){ logger.info("Can't find Hive driver"); } String hiveHost = GlimmerServer.config.getString("hive/host"); String hivePort = GlimmerServer.config.getString("hive/port"); String connectionString = "jdbc:hive2://"+hiveHost+":"+hivePort+"/default"; logger.info(String.format("Attempting to connect to %s",connectionString)); try{ con = DriverManager.getConnection(connectionString,"",""); }catch(Exception e){ logger.error("Problem instantiating the connection "+e.getMessage()); } } public int update(String query) { Integer res = 0; Statement stmt = null; try{ stmt = con.createStatement(); String switchdb = "USE "+db; logger.info(switchdb); stmt.executeUpdate(switchdb); logger.info(query); res = stmt.executeUpdate(query); logger.info("Query passed to server"); stmt.close(); }catch(HiveSQLException e){ logger.info(String.format("HiveSQLException thrown, this can be valid, but check the error: %s from the query %s",e.toString(),query)); }catch(SQLException e){ logger.error(String.format("Unable to execute query SQLException %s. Error: %s",query,e)); }catch(Exception e){
[jira] [Created] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data
Sean Busbey created HIVE-5302: - Summary: PartitionPruner fails on Avro non-partitioned data Key: HIVE-5302 URL: https://issues.apache.org/jira/browse/HIVE-5302 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.11.0 Reporter: Sean Busbey Priority: Blocker While updating HIVE-3585 I found a test case that causes the failure in the MetaStoreUtils partition retrieval from back in HIVE-4789. in this case, the failure is triggered when the partition pruner is handed a non-partitioned table and has to construct a pseudo-partition. e.g. {code} INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col FROM non_partitioned_table WHERE col = 9; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769334#comment-13769334 ] Hive QA commented on HIVE-5297: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603502/HIVE-5297.2.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 3128 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_table_add_partition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure5 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/775/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/775/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. 
We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
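The check proposed above — rejecting a partition value that does not parse as the declared partition column type, at the time the partition is added — could look roughly like the following. This is an illustrative sketch only, not Hive's actual code; the class name and the set of handled types are invented for the example.

```java
// Illustrative sketch: validate a partition value against the declared
// partition column type before accepting it, so that e.g. day='second'
// is rejected for a column declared as "day int".
public class PartitionTypeCheck {
    public static void validate(String columnType, String value) {
        try {
            switch (columnType) {
                case "int":    Integer.parseInt(value); break;
                case "bigint": Long.parseLong(value); break;
                case "double": Double.parseDouble(value); break;
                case "string": break; // any value is a valid string
                default:       break; // other types omitted in this sketch
            }
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Partition value '" + value + "' is not a valid " + columnType);
        }
    }
}
```

With this check in place, the example from the description (month='June', day='second' against "day int") would fail at add-partition time instead of silently producing nulls on insert.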
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Status: Open (was: Patch Available) Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Attachment: HIVE-5297.3.patch Fix failing tests. The type of error changed for the negative tests. Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14155/ --- (Updated Sept. 17, 2013, 9:08 a.m.) Review request for hive and Ashutosh Chauhan. Changes --- Updated test results. Bugs: HIVE-5297 https://issues.apache.org/jira/browse/HIVE-5297 Repository: hive-git Description --- Hive does not consider the type of the partition column while writing partitions. Consider for example the query: create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java fb79823 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION ql/src/test/results/clientnegative/alter_table_add_partition.q.out bd9c148 ql/src/test/results/clientnegative/alter_view_failure5.q.out 4edb82c ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION Diff: https://reviews.apache.org/r/14155/diff/ Testing 
--- Ran all tests. Thanks, Vikram Dixit Kumaraswamy
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Status: Patch Available (was: Open) Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
HBase Tables Join in Hive
Hi, I have 2 tables in HBase. Table1: data, userID (this is a very big table). Table2: userID, userDetails (this is a smaller table). We need to join both tables on the userID column and perform some queries. Our idea is to map both Table1 and Table2 in Hive using HBaseStorageHandler. Does Hive also support JOINs on these HBase-mapped tables? Regards, Kiran
[jira] [Commented] (HIVE-4998) support jdbc documented table types in default configuration
[ https://issues.apache.org/jira/browse/HIVE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769432#comment-13769432 ] Hudson commented on HIVE-4998: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #101 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/101/]) HIVE-4998 support jdbc documented table types in default configuration (Thejas Nair via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1523741) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java support jdbc documented table types in default configuration Key: HIVE-4998 URL: https://issues.apache.org/jira/browse/HIVE-4998 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4998.1.patch The jdbc table types supported by hive server2 are not the documented typical types [1] in jdbc, they are hive specific types (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). HIVE-4573 added support for the jdbc documented typical types, but the HS2 default configuration is to return the hive types The default configuration should result in the expected jdbc typical behavior. [1] http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4998) support jdbc documented table types in default configuration
[ https://issues.apache.org/jira/browse/HIVE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769467#comment-13769467 ] Hudson commented on HIVE-4998: -- ABORTED: Integrated in Hive-trunk-hadoop2 #435 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/435/]) HIVE-4998 support jdbc documented table types in default configuration (Thejas Nair via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1523741) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java support jdbc documented table types in default configuration Key: HIVE-4998 URL: https://issues.apache.org/jira/browse/HIVE-4998 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4998.1.patch The jdbc table types supported by hive server2 are not the documented typical types [1] in jdbc, they are hive specific types (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). HIVE-4573 added support for the jdbc documented typical types, but the HS2 default configuration is to return the hive types The default configuration should result in the expected jdbc typical behavior. [1] http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4605) hive job fails when insert overwrite ORC table
[ https://issues.apache.org/jira/browse/HIVE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769472#comment-13769472 ] Joe Travaglini commented on HIVE-4605: -- Brock, I've also been seeing this same symptom several times over the past month, but with RCFile and not ORC. I also cannot reliably reproduce it, but it's happening. See also http://mail-archives.apache.org/mod_mbox/hive-user/201306.mbox/%3ccansfgrkr0jy5w3ey3z8awtwpyphgu5yedicybvjbnwr_o_5...@mail.gmail.com%3E which seems like the same symptom. Hive 0.10 in CDH4.3.1 hive job fails when insert overwrite ORC table -- Key: HIVE-4605 URL: https://issues.apache.org/jira/browse/HIVE-4605 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Environment: OS: 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux Hadoop 1.1.2 Reporter: Link Qian Assignee: Brock Noland 1, create a table with ORC storage model: create table iparea_analysis_orc (network int, ip string, ) stored as ORC; 2, insert table iparea_analysis_orc select network, ip, , the script succeeds, but it fails after adding the *OVERWRITE* keyword. The main error log is listed here. 
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309) ... 7 more -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
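The "Unable to rename output" failure above comes from the commit step that renames a task's temp file to its final name. A minimal JDK-only sketch of that commit pattern, with illustrative file names (this is not Hive's actual FileSinkOperator code, which renames on HDFS, not a local filesystem):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitRenameDemo {
    /** Commits a task's temp output by renaming it; fails loudly when the rename cannot be done. */
    public static void commit(Path tmp, Path dest) throws IOException {
        try {
            Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Mirrors Hive's "Unable to rename output from: ... to: ..." wrapping.
            throw new IOException("Unable to rename output from: " + tmp + " to: " + dest, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-demo");
        Path tmp = Files.createFile(dir.resolve("_tmp.000000_0"));
        commit(tmp, dir.resolve("000000_0"));
        System.out.println("committed: " + Files.exists(dir.resolve("000000_0")));
    }
}
```

On HDFS the rename can fail for reasons a local filesystem never shows (missing parent directory, stale lease, permissions), which is why the wrapped exception that preserves both paths is the useful part of this pattern.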
did you always have to log in to phabricator
I never remember having to log into phabricator to view a patch. Has this changed recently? I believe that having to create an external account to view a patch in progress is not something we should be doing.
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769546#comment-13769546 ] Brock Noland commented on HIVE-3764: FYI, it looks like you tried to delete HIVE-3764.4.patch but it's still there? Anyway, based on the date, it looks like HIVE-3764.1.patch is the current patch. Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764.1.patch, HIVE-3764.4.patch Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that, for production systems, the schema is not automatically updated by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
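The core of the consistency check HIVE-3764 proposes can be sketched in a few lines. The method names and version strings below are illustrative, not Hive's actual schema-tool API: compare the schema version recorded in the metastore with the version this Hive build expects, and refuse to proceed on a mismatch instead of letting datanucleus auto-create tables.

```java
public class SchemaVersionCheck {
    /** Returns true when the recorded metastore schema version matches the expected one. */
    public static boolean isCompatible(String expected, String recorded) {
        return expected != null && expected.equals(recorded);
    }

    /** Fails fast on mismatch instead of silently running against a wrong schema. */
    public static void verify(String expected, String recorded) {
        if (!isCompatible(expected, recorded)) {
            throw new IllegalStateException("Metastore schema version " + recorded
                + " does not match expected version " + expected
                + "; run the upgrade scripts before starting Hive");
        }
    }

    public static void main(String[] args) {
        verify("0.12.0", "0.12.0");  // matching versions: no exception
        System.out.println(isCompatible("0.12.0", "0.11.0"));
    }
}
```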
[jira] [Updated] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5292: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! [~thejas] I will recommend inclusion of this bug fix in 0.12 as well. Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Reporter: Sergio Lob Assignee: Navis Fix For: 0.13.0 Attachments: D12969.1.patch Join on matching decimal columns returns 0 rows To reproduce (I used beeline): 1. create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
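The thread does not state HIVE-5292's root cause, but a classic pitfall with decimal join keys is easy to demonstrate with the JDK's BigDecimal: values that are numerically equal can differ in scale, so equals()/hashCode() disagree with compareTo(), and hash-based key matching silently drops rows unless keys are normalized first.

```java
import java.math.BigDecimal;

public class DecimalKeyDemo {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("0.98");
        BigDecimal b = new BigDecimal("0.980");  // same value, different scale

        System.out.println(a.compareTo(b) == 0); // numerically equal
        System.out.println(a.equals(b));         // NOT equal: scale differs

        // A hash-join key should therefore be normalized, e.g.:
        System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros()));
    }
}
```

Because equals() and hashCode() both incorporate the scale, two such values land in different hash buckets, which is exactly the shape of failure the reporter describes (0 rows instead of 2).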
Re: did you always have to log in to phabricator
Personally I prefer Review Board. On Tue, Sep 17, 2013 at 8:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I never remember having to log into phabricator to view a patch. Has this changed recently? I believe that having to create an external account to view a patch in progress is not something we should be doing. -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5285: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Hari! [~thejas] Will recommend inclusion of this and HIVE-5199 in 0.12 branch. Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 0.13.0 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach for HIVE-5199 fix is correct.However, the fix for HIVE-5199 is incomplete. Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspectorNonSettableStructObjectInspector } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
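The defensive pattern behind the HIVE-5285 fix can be sketched without Hive's classes. The interfaces below are stand-ins, not the real ObjectInspector hierarchy: never cast to the settable variant blindly, because a custom SerDe may hand back a non-settable inspector anywhere inside a nested structure; check instanceof and fall back instead of throwing ClassCastException.

```java
public class InspectorCastDemo {
    public interface Inspector {}
    public interface SettableInspector extends Inspector { void set(Object v); }

    /** A custom SerDe's inspector: valid, but not settable. */
    public static class CustomInspector implements Inspector {}

    /** A standard inspector that supports in-place mutation. */
    public static class StandardInspector implements SettableInspector {
        private Object value;
        public void set(Object v) { value = v; }
    }

    /** Chooses a conversion strategy instead of casting unconditionally. */
    public static String chooseConversion(Inspector oi) {
        if (oi instanceof SettableInspector) {
            return "settable";  // convert in place
        }
        return "copy";          // fall back: convert into a fresh settable object
    }

    public static void main(String[] args) {
        System.out.println(chooseConversion(new StandardInspector()));
        System.out.println(chooseConversion(new CustomInspector()));
    }
}
```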
[jira] [Commented] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769559#comment-13769559 ] Brock Noland commented on HIVE-5294: This looks good! I wander about the aggregation buffer constructor, specifically: Log.error(buffer type was null); Won't this lead to a NPE later? If so, should we just throw a RuntimeException? Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5294.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5088) Fix udf_translate.q on Windows
[ https://issues.apache.org/jira/browse/HIVE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769562#comment-13769562 ] Ashutosh Chauhan commented on HIVE-5088: Just to mention, the fix for the underlying problem in Hadoop has already been committed via HADOOP-9801 and will be available in 1.3.0 and 2.1.1-beta, so we may not need to put a workaround in Hive for this. Fix udf_translate.q on Windows -- Key: HIVE-5088 URL: https://issues.apache.org/jira/browse/HIVE-5088 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5088-1.patch Test failed with message: [junit] Begin query: udf_translate.q [junit] 13/08/14 03:23:57 FATAL conf.Configuration: error parsing conf file: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. [junit] Exception in thread main java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
[junit] at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1255) [junit] at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1117) [junit] at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1053) [junit] at org.apache.hadoop.conf.Configuration.get(Configuration.java:397) [junit] at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:594) [junit] at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:1015) [junit] at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:659) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:160) [junit] Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
[junit] at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684) [junit] at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554) [junit] at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742) [junit] at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:487) [junit] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2688) [junit] at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647) [junit] at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140) [junit] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) [junit] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) [junit] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) [junit] at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) [junit] at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:232) [junit] at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) [junit] at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124) [junit] at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1181) [junit] ... 11 more [junit] Exception: Client Execution failed with error code = 1 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 1 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. 
[junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:122) [junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_translate(TestCliDriver.java:104) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at
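The MalformedByteSequenceException above is the XML parser strictly decoding the conf file as UTF-8, so a single byte that is legal in some local encoding aborts parsing. A small JDK sketch of the same check, using a REPORT-mode CharsetDecoder to surface the bad byte the way Xerces does:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8CheckDemo {
    /** Returns true when the bytes form well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of replacing
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;  // the parser's "Invalid byte ... UTF-8 sequence" case
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8("plain ascii".getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(new byte[] { (byte) 0xFF }));  // never valid in UTF-8
    }
}
```

Running a check like this over generated hive-site.xml files would pinpoint the offending byte before Hadoop's Configuration loader trips over it.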
[jira] [Commented] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769586#comment-13769586 ] Edward Capriolo commented on HIVE-5294: --- Yes that should throw at runtime. That was something left over from testing. Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5294.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
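The review point in this exchange, log-and-continue versus fail-fast, is worth a tiny sketch. Names are illustrative, not the patch's actual code: logging an unexpected null and carrying on defers the failure to a confusing NPE at some later use site, while throwing immediately names the real cause.

```java
public class FailFastDemo {
    /** Logs and continues: the caller still receives something unusable. */
    public static Object logOnly(Object bufferType) {
        if (bufferType == null) {
            System.err.println("buffer type was null");  // execution continues anyway
        }
        return bufferType;  // may be null; NPE happens later, far from the cause
    }

    /** Fails fast: the exception points directly at the broken invariant. */
    public static Object failFast(Object bufferType) {
        if (bufferType == null) {
            throw new RuntimeException("buffer type was null");
        }
        return bufferType;
    }

    public static void main(String[] args) {
        try {
            failFast(null);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());  // the failure names its cause
        }
    }
}
```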
[jira] [Updated] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5246: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thank you for your contribution! Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar For a Hive query started by Oozie Hive action, the local task submitted for Mapjoin fails. The HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets env variable HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config property. However this doesn't get passed down to the child JVM which causes the problem. This is similar issue addressed by HIVE-4343 which address the problem HiveServer2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
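The idea behind the HIVE-5246 fix can be sketched with plain JDK code. Everything here is illustrative rather than Hive's actual MapredLocalTask launcher: when spawning the child JVM for the local task, explicitly propagate the delegation-token location (HADOOP_TOKEN_FILE_LOCATION) instead of assuming the child inherits it.

```java
import java.util.HashMap;
import java.util.Map;

public class ChildEnvDemo {
    /** Builds the child JVM's environment, copying the token location only if set. */
    public static Map<String, String> childEnv(Map<String, String> parentEnv) {
        Map<String, String> env = new HashMap<>();
        String tokenFile = parentEnv.get("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null) {
            env.put("HADOOP_TOKEN_FILE_LOCATION", tokenFile);
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> parent = new HashMap<>();
        parent.put("HADOOP_TOKEN_FILE_LOCATION", "/tmp/container_tokens");  // made-up path
        // With ProcessBuilder, the returned map would be installed via pb.environment().
        System.out.println(childEnv(parent).get("HADOOP_TOKEN_FILE_LOCATION"));
    }
}
```

The same propagation is needed for the mapreduce.job.credentials.binary config property, which travels through the job configuration rather than the environment.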
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769582#comment-13769582 ] Ashutosh Chauhan commented on HIVE-5279: With the latest patch, tests in TestCliDriver and TestContribCliDriver are no longer failing, so that's progress. However, 4 tests in TestParse are still failing, as previously, namely {{groupby1.q}}, {{groupby2.q}}, {{groupby3.q}} and {{groupby5.q}} Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at 
org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
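Kryo itself is not available here, so the constraint named in the error message, "Class cannot be created (missing no-arg constructor)", can be shown with plain reflection: instantiating a class through its declared zero-argument constructor fails for any class, like many ObjectInspector implementations, that only defines parameterized constructors.

```java
public class NoArgCtorDemo {
    public static class WithNoArg {
        public WithNoArg() {}
    }

    public static class WithoutNoArg {
        public WithoutNoArg(String required) {}  // no zero-arg constructor exists
    }

    /** Returns true when the class can be created reflectively with no arguments. */
    public static boolean instantiable(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;  // same situation Kryo reports for StandardListObjectInspector
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiable(WithNoArg.class));
        System.out.println(instantiable(WithoutNoArg.class));
    }
}
```

Kryo offers ways around this (custom serializers, instantiator strategies), which is presumably the design space the patch revisions above are exploring.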
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769634#comment-13769634 ] Brock Noland commented on HIVE-3764: It was my fault! I removed it. Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764.1.patch Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that, for production systems, the schema is not automatically updated by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769630#comment-13769630 ] Prasad Mujumdar commented on HIVE-3764: --- [~brocknoland] yes, HIVE-3764.1.patch is the latest. The .4.patch left in there was the one that you added (to refresh the correct patch for a test run), hence I couldn't remove it. Sorry about the confusion. Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764.1.patch, HIVE-3764.4.patch Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that, for production systems, the schema is not automatically updated by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-3764: --- Attachment: (was: HIVE-3764.4.patch) Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764.1.patch Today there's no version/compatibility information stored in the hive metastore. Also, the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive, or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade the metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that, for production systems, the schema is not automatically updated by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5294: -- Attachment: HIVE-5294.1.patch.txt Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5294.1.patch.txt, HIVE-5294.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769621#comment-13769621 ] Edward Capriolo commented on HIVE-5294: --- The .1 patch throws a RuntimeException (which we should never hit anyway). Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5294.1.patch.txt, HIVE-5294.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4998) support jdbc documented table types in default configuration
[ https://issues.apache.org/jira/browse/HIVE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769639#comment-13769639 ] Hudson commented on HIVE-4998: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2337 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2337/]) HIVE-4998 support jdbc documented table types in default configuration (Thejas Nair via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1523741) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java support jdbc documented table types in default configuration Key: HIVE-4998 URL: https://issues.apache.org/jira/browse/HIVE-4998 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4998.1.patch The jdbc table types supported by hive server2 are not the documented typical types [1] in jdbc, they are hive specific types (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). HIVE-4573 added support for the jdbc documented typical types, but the HS2 default configuration is to return the hive types The default configuration should result in the expected jdbc typical behavior. [1] http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: did you always have to log in to phabricator
Yeah. I used to be able to view w/o login, but now I am not. On Tue, Sep 17, 2013 at 7:27 AM, Brock Noland br...@cloudera.com wrote: Personally I prefer Review Board. On Tue, Sep 17, 2013 at 8:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I never remember having to log into phabricator to view a patch. Has this changed recently? I believe that having to create an external account to view a patch in progress is not something we should be doing. -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769654#comment-13769654 ] Brock Noland commented on HIVE-5294: Agreed. This looks good to me. I plan on committing it if tests pass. Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5294.1.patch.txt, HIVE-5294.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769688#comment-13769688 ] Ashutosh Chauhan commented on HIVE-4113: In addition to what [~yhuai] suggested for RCFile, a similar enhancement exists for ORC as well, as ORC stores stats (including counts) per stripe, which would allow us to do almost no IO. But I will say that those enhancements will likely require changes in query-processing code, so I consider them out of scope for this jira. Let's get this one in and take up the enhancements in a follow-up. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
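The TODO in the RCFile snippet above marks the gap: when the query references no columns at all (count(1)), the reader could skip every column, but the current default is "skip nothing", which is why count(1) reads all 234 MB. A simplified sketch of the intended skip logic (not the real RCFile reader; the null/empty convention here is illustrative):

```java
import java.util.Arrays;

public class ColumnSkipDemo {
    /**
     * Computes which columns to skip.
     * readColIDs == null  means "select *": read everything.
     * readColIDs empty    means "no columns needed" (count(1)): read nothing.
     */
    public static boolean[] skippedColIDs(int numCols, int[] readColIDs) {
        boolean[] skipped = new boolean[numCols];
        if (readColIDs == null) {
            Arrays.fill(skipped, false);      // select *
        } else if (readColIDs.length == 0) {
            Arrays.fill(skipped, true);       // count(1): skip all columns
        } else {
            Arrays.fill(skipped, true);       // skip by default...
            for (int id : readColIDs) {
                skipped[id] = false;          // ...and read only what is referenced
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(skippedColIDs(3, new int[0])));
        System.out.println(Arrays.toString(skippedColIDs(3, new int[] { 1 })));
    }
}
```

The key design point is distinguishing "no columns listed because all are wanted" from "no columns listed because none are wanted", exactly the two cases the TODO comment says are currently conflated.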
[jira] [Updated] (HIVE-4173) Hive Ignoring where clause for multitable insert
[ https://issues.apache.org/jira/browse/HIVE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-4173: -- Summary: Hive Ignoring where clause for multitable insert (was: Hive Ingnoring where clause for multitable insert) Hive Ignoring where clause for multitable insert Key: HIVE-4173 URL: https://issues.apache.org/jira/browse/HIVE-4173 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.1, 0.9.0 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago), Reporter: hussain Priority: Critical Hive ignores filter conditions given in a multi-insert select statement when a filter is also given on the source query. To highlight this issue, see the example below: the where clause (status!='C') on the employee12 table causes the issue, due to which the per-insert filters (batch_id='12' and batch_id!='12') do not work, dumping all the data coming from the source into both tables. I checked the hive execution plan and did not find Filter predicates for filtering records per insert statement. from (from employee12 select * where status!='C') t insert into table employee1 select status, field1, 'T' as field2, 'P' as field3, 'C' as field4 where batch_id='12' insert into table employee2 select status, field1, 'D' as field2, 'P' as field3, 'C' as field4 where batch_id!='12'; It works fine with a single insert; Hive generates the plan properly. I am able to reproduce this issue with the 8.1 and 9.0 versions of Hive.
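For clarity, the semantics the reporter expects can be modeled with a small Python sketch (a toy model, not Hive's planner; `multi_insert` is a hypothetical name): rows first pass the source query's filter, then each insert branch applies its own predicate independently.

```python
# Toy model of multi-table insert semantics: source filter first,
# then one independent predicate per insert branch.

def multi_insert(rows, source_pred, branch_preds):
    filtered = [r for r in rows if source_pred(r)]
    return [[r for r in filtered if pred(r)] for pred in branch_preds]

rows = [
    {"status": "C", "batch_id": "12"},   # dropped by the source filter
    {"status": "A", "batch_id": "12"},   # should land only in employee1
    {"status": "A", "batch_id": "13"},   # should land only in employee2
]
t1, t2 = multi_insert(
    rows,
    lambda r: r["status"] != "C",
    [lambda r: r["batch_id"] == "12", lambda r: r["batch_id"] != "12"],
)
assert t1 == [{"status": "A", "batch_id": "12"}]
assert t2 == [{"status": "A", "batch_id": "13"}]
```

The reported bug corresponds to the branch predicates being silently dropped, so every surviving source row lands in both tables.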
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769693#comment-13769693 ] Brock Noland commented on HIVE-4113: Agreed. Unfortunately I won't have time to take this up in the next few days, so if someone has time and would like to see this in soon I'd be more than willing to hand it off. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113
[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5298: -- Status: Patch Available (was: Open) AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5298.1.patch, HIVE-5298.patch HIVE-3833 fixed the targeted problem and made Hive use partition-level metadata to initialize the object inspector. In doing that, however, it goes through every file under the table to access the partition metadata, which is very inefficient, especially in the case of multiple files per partition. This causes more problems for AvroSerde, because AvroSerde initialization accesses the schema, which is located on the file system. As a result, before Hive can process any data, it needs to access every file of a table, which can take long enough to cause job failure for lack of job progress. The improvement to be made is that partition metadata is accessed only once per partition.
[jira] [Commented] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769709#comment-13769709 ] Hive QA commented on HIVE-5294: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603607/HIVE-5294.1.patch.txt {color:green}SUCCESS:{color} +1 3126 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/783/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/783/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Create collect UDF and make evaluator reusable Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294
[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5298: -- Attachment: HIVE-5298.1.patch Updated the patch based on the test result. Note that no test is added for this due to the nature of the issue. However, I will do manual testing and will update with the result. AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298
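The improvement described in HIVE-5298 can be sketched as follows (an assumed design in Python, not the actual patch; `MetadataStore` and `init_inspectors` are hypothetical names): fetch partition metadata once per partition and reuse it for every file, instead of refetching it per file.

```python
# Sketch: cache partition metadata so it is fetched once per partition,
# not once per file (the expensive call models reading an Avro schema).

class MetadataStore:
    def __init__(self):
        self.accesses = 0
    def partition_metadata(self, partition):
        self.accesses += 1          # e.g. reads the schema file from HDFS
        return {"partition": partition, "schema": "..."}

def init_inspectors(files_with_partition, store):
    cache = {}
    for path, partition in files_with_partition:
        if partition not in cache:
            cache[partition] = store.partition_metadata(partition)
        meta = cache[partition]     # per-file init reuses the cached metadata

files = [("a", "p1"), ("b", "p1"), ("c", "p1"), ("d", "p2"), ("e", "p2")]
store = MetadataStore()
init_inspectors(files, store)
assert store.accesses == 2          # 2 partitions, not 5 files
```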
Re: did you always have to log in to phabricator
I do not like this. It is inconvenient when using a mobile device, but more importantly it does not seem very transparent to our end users. For example, a user browsing jira may want to review the code only on review board (not yet attached to the issue); they should not be forced to sign up to help in the process. Would anyone from facebook care to chime in here? I think we all like Phabricator for the most part. Our docs suggest that Phabricator is our de-facto review system. As an ASF project I do not think requiring a login on some external service even to review a jira is correct. On Tue, Sep 17, 2013 at 12:27 PM, Xuefu Zhang xzh...@cloudera.com wrote: Yeah. I used to be able to view w/o login, but now I am not. On Tue, Sep 17, 2013 at 7:27 AM, Brock Noland br...@cloudera.com wrote: Personally I prefer Review Board. On Tue, Sep 17, 2013 at 8:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I never remember having to log into phabricator to view a patch. Has this changed recently? I believe that having to create an external account to view a patch in progress is not something we should be doing. -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769748#comment-13769748 ] Hudson commented on HIVE-5292: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #102 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/102/]) HIVE-5292 : Join on decimal columns fails to return rows (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524062) * /hive/trunk/common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java * /hive/trunk/ql/src/test/queries/clientpositive/decimal_join.q * /hive/trunk/ql/src/test/results/clientpositive/decimal_join.q.out Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Reporter: Sergio Lob Assignee: Navis Fix For: 0.13.0 Attachments: D12969.1.patch A join on matching decimal columns returns 0 rows. To reproduce (I used beeline): 1. create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform the join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ;
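The commit above touches HiveDecimal.java; one plausible way to illustrate the class of bug (an illustration only, not the actual HiveDecimal fix; `join_key` is a hypothetical name) is that join keys built from decimals must compare and hash consistently regardless of the textual form in the data file, e.g. ".98" vs "0.98" or trailing zeros.

```python
from decimal import Decimal

# Illustration: normalize decimal join keys so equal values produce
# identical keys whatever their textual representation.

def join_key(s):
    return Decimal(s).normalize()   # canonical exponent, trailing zeros gone

left  = {".98": 10, "1234567890.1234": 20}
right = {"0.98": 10, "1234567890.12340": 20}

matches = [
    (lv, rv)
    for ls, lv in left.items()
    for rs, rv in right.items()
    if join_key(ls) == join_key(rs)
]
assert len(matches) == 2            # both rows join, as the query expects
```

Without normalization, a hash join keyed on the raw representations would find no matches, which is exactly the "0 rows instead of 2" symptom reported.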
[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769747#comment-13769747 ] Hudson commented on HIVE-5246: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #102 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/102/]) HIVE-5246 - Local task for map join submitted via oozie job fails on a secure HDFS (Prasad Mujumdar via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524074) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SecureCmdDoAs.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar For a Hive query started by an Oozie Hive action, the local task submitted for a map join fails: the HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets the env variable HADOOP_TOKEN_FILE_LOCATION as well as the mapreduce.job.credentials.binary config property. However, this doesn't get passed down to the child JVM, which causes the problem. This is similar to the issue addressed by HIVE-4343, which addresses the problem for HiveServer2.
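The general idea of the fix can be sketched in Python (a hedged illustration, not Hive's actual Java code; `spawn_child` is a hypothetical name): a child process only sees the delegation token if the parent explicitly forwards HADOOP_TOKEN_FILE_LOCATION into the child's environment.

```python
import os
import subprocess
import sys

# Sketch: forward the token-file location into the child's environment,
# then have the child echo it back so we can verify it arrived.

def spawn_child(cmd, token_path):
    env = dict(os.environ)
    if token_path:
        env["HADOOP_TOKEN_FILE_LOCATION"] = token_path
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

child = [sys.executable, "-c",
         "import os; print(os.environ.get('HADOOP_TOKEN_FILE_LOCATION', ''))"]
result = spawn_child(child, "/tmp/token-cache")
assert result.stdout.strip() == "/tmp/token-cache"
```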
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769729#comment-13769729 ] Sergey Shelukhin commented on HIVE-5297: There are open comments remaining... one should be a straightforward code change Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However, if you try to select from this table and insert into another expecting a schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens.
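The validation the JIRA asks for can be sketched in Python (an assumed API for illustration, not Hive's code; `validate_partition_spec` and `PARSERS` are hypothetical names): reject a partition spec whose value cannot be parsed as the declared column type, instead of silently writing NULLs later.

```python
# Sketch: validate partition values against declared column types
# at ADD PARTITION time.

PARSERS = {"int": int, "string": str}

def validate_partition_spec(schema, spec):
    for col, value in spec.items():
        col_type = schema[col]
        try:
            PARSERS[col_type](value)
        except ValueError:
            raise ValueError(
                f"partition column '{col}' is {col_type}, got {value!r}")

schema = {"month": "string", "day": "int"}
validate_partition_spec(schema, {"month": "June", "day": "2"})    # accepted
try:
    validate_partition_spec(schema, {"month": "June", "day": "second"})
    raised = False
except ValueError:
    raised = True
assert raised    # day='second' is rejected up front, not turned into NULLs
```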
[jira] [Updated] (HIVE-5271) Convert join op to a map join op in the planning phase
[ https://issues.apache.org/jira/browse/HIVE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5271: - Attachment: HIVE-5271.WIP.patch WIP patch. Convert join op to a map join op in the planning phase -- Key: HIVE-5271 URL: https://issues.apache.org/jira/browse/HIVE-5271 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5271.WIP.patch This captures the planning changes required in hive to support hash joins. We need to convert the join operator to a map join operator. This is hooked into the infrastructure provided by HIVE-5095.
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769730#comment-13769730 ] Yin Huai commented on HIVE-4113: Let me take a look. It seems only a few minor changes are needed for Brock's patch. One thing I need to make sure of is whether we populate all columns in the list of needed columns for select *. If so, we will not need hive.io.file.read.all.columns. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113
[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769763#comment-13769763 ] Eugene Koifman commented on HIVE-4531: -- Could you open a review board so we can embed comments next to the code they refer to? [WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog, WebHCat Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does.
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769740#comment-13769740 ] Ashutosh Chauhan commented on HIVE-4113: Thanks [~yhuai] for volunteering. Assigning it to you. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113
[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4113: --- Assignee: Yin Huai (was: Brock Noland) Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113
[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769742#comment-13769742 ] Eugene Koifman commented on HIVE-4531: -- I realized I missed LogRetriever in the review: 1. It opens a URLConnection in several places but doesn't close them. 2. Is this class meant to be used anywhere other than TempletonControllerJob? If not, can it be moved to the same package and made package private (to reduce the public API footprint)? Similarly, could all member variables/methods be made as private as possible? 3. I think it would be really useful to add some higher-level documentation about the design: why does this class exist, why does it parse JSPs, where does it write the result, etc. I think 1 or 2 paragraphs would be sufficient. [WebHCat] Collecting task logs to hdfs Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531
[jira] [Commented] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769776#comment-13769776 ] Hudson commented on HIVE-5292: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #169 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/169/]) HIVE-5292 : Join on decimal columns fails to return rows (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524062) * /hive/trunk/common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java * /hive/trunk/ql/src/test/queries/clientpositive/decimal_join.q * /hive/trunk/ql/src/test/results/clientpositive/decimal_join.q.out Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292
[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769775#comment-13769775 ] Hudson commented on HIVE-5246: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #169 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/169/]) HIVE-5246 - Local task for map join submitted via oozie job fails on a secure HDFS (Prasad Mujumdar via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524074) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SecureCmdDoAs.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java Local task for map join submitted via oozie job fails on a secure HDFS Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246
[jira] [Commented] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769774#comment-13769774 ] Hudson commented on HIVE-5285: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #169 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/169/]) HIVE-5285 : Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. (Hari Sankar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524067) * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/CustomSerDe3.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat17.q * /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat17.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 0.13.0 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach of the HIVE-5199 fix is correct. However, the fix for HIVE-5199 is incomplete. 
Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspectorNonSettableStructObjectInspector } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more
Re: Interesting claims that seem untrue
Carter, what you are doing essentially contradicts the ASF policy of community over code. Perhaps your intentions are good. However, LOC calculations or other silly contests essentially drive a wedge between developers who happen to draw their paycheck from different commercial entities. The Hadoop community passed through this already, and it caused nothing but despair and bitterness between the people. Unlike some other popular contests, the number of lines contributed doesn't matter for most. Seriously. Regards, Cos On Mon, Sep 16, 2013 at 01:58PM, Carter Shanklin wrote: Ed, If nothing else I'm glad it was interesting enough to generate some discussion. These sorts of stats are always subjects of a lot of controversy. I have seen a lot of these sorts of charts float around in confidential slide decks and I think it's good to have them out in the open where anyone can critique and correct them. In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing the analysis again I found that previously, due to a bug in my scripts, JIRAs that didn't have Hudson comments in them were not counted (this was one way of identifying SVN commit IDs, which I have since removed due to flakiness). Brock's patch was the single largest victim of this bug but not the only one; there were some from Cloudera, NexR, Hortonworks, Facebook, even 2 from you, Ed. The interested can see a full list of exclusions here: https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0. I apologize to those under-represented; there wasn't any intent on my part to minimize anyone's work. The impact on the final totals is Cloudera +5.4%, NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog later today with the relevant corrections. There is going to be continued interest in seeing charts like these, for example when Hive 12 is officially done. Sanjay suggested that LoC counts may not be the best way to represent true contribution. 
I agree that not all lines of code are created equal, for example a few monster patches recently went in re-arranging HCatalog namespaces and I think also indentation style. This (hopefully) mechanical work is not on the same footing as adding new query language features. Still it is work and wouldn't be fair to pretend it didn't happen. If anyone has ideas on better ways to fairly capture contribution I'm open to suggestions. On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo edlinuxg...@gmail.comwrote: I was reading the horton-works blog and found an interesting article. http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753 There is a very interesting graphic which attempts to demonstrate lines of code in the 12 release. http://hortonworks.com/wp-content/uploads/2013/09/hive4.png Although I do not know how they are calculated, they are probably counting code generated by tests output, but besides that they are wrong. One claim is that Cloudera contributed 4,244 lines of code. So to debunk that claim: In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from cloudera, created the ptest2 testing framework. He did all the work for ptest2 in hive 12, and it is clearly more then 4,244 This consists of 84 java files [edward@desksandra ptest2]$ find . -name *.java | wc -l 84 and by itself is 8001 lines of code. [edward@desksandra ptest2]$ find . -name *.java | xargs cat | wc -l 8001 [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch 7902 HIVE-4675.patch This is not the only feature from cloudera in hive 12. There is also a section of the article that talks of a ROAD MAP for hive features. I did not know we (hive) had a road map. I have advocated switching to feature based release and having a road map before, but it was suggested that might limit people from itch-scratching. 
-- Carter Shanklin Director, Product Management Hortonworks (M): +1.650.644.8795 (T): @cshanklin http://twitter.com/cshanklin -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769784#comment-13769784 ] Eugene Koifman commented on HIVE-5138: -- OK, makes sense. It would be useful to add some javadoc about concurrency (or rather why it's not an issue) Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore, WebHCat Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769746#comment-13769746 ] Hudson commented on HIVE-5285: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #102 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/102/]) HIVE-5285 : Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. (Hari Sankar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524067) * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/CustomSerDe3.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat17.q * /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat17.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 0.13.0 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach of the HIVE-5199 fix is correct. However, the fix for HIVE-5199 is incomplete.
Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspectorNonSettableStructObjectInspector } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
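The pattern behind a fix for a failure like the one above can be sketched with toy types (hypothetical stand-ins, not Hive's actual ObjectInspector hierarchy): test for the settable interface before casting, and fall back to copying into a standard settable inspector when the custom inspector is read-only.

```java
// Toy illustration of guarding the cast that fails in the stack trace above.
// These interfaces are stand-ins, not Hive's real classes.
interface MapObjectInspector {}
interface SettableMapObjectInspector extends MapObjectInspector {}

class CustomProtoMapInspector implements MapObjectInspector {}        // non-settable, like a custom SerDe's
class StandardMapInspector implements SettableMapObjectInspector {}   // settable

public class ConverterSketch {
    // Decide how to convert instead of blindly casting to the settable type.
    static String chooseConverter(MapObjectInspector output) {
        if (output instanceof SettableMapObjectInspector) {
            return "in-place";          // safe: write fields directly into the output inspector
        }
        return "copy-to-standard";      // fall back: build a standard settable inspector and copy
    }

    public static void main(String[] args) {
        System.out.println(chooseConverter(new CustomProtoMapInspector())); // copy-to-standard
        System.out.println(chooseConverter(new StandardMapInspector()));    // in-place
    }
}
```

The instanceof check is the whole point: the ClassCastException occurs precisely because the nested converter assumes every inspector it meets is settable.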
[jira] [Commented] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769727#comment-13769727 ] Edward Capriolo commented on HIVE-4568: --- Sorry I have not had time to review this. I am not a good person to do this ATM because I am slightly clueless as to how beeline works. The code looks clean, but I would need to understand a bit more before I give it a +1. Beeline needs to support resolving variables Key: HIVE-4568 URL: https://issues.apache.org/jira/browse/HIVE-4568 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-4568-1.patch, HIVE-4568-2.patch, HIVE-4568.3.patch, HIVE-4568.4.patch, HIVE-4568.5.patch, HIVE-4568.6.patch, HIVE-4568.7.patch, HIVE-4568.patch Previous Hive CLI allows user to specify hive variables at the command line using option --hivevar. In user's script, reference to a hive variable will be substituted with the value of the variable. In such way, user can parameterize his/her script and invoke the script with different hive variable values. The following script is one usage: {code} hive --hivevar INPUT=/user/jenkins/oozie.1371538916178/examples/input-data/table --hivevar OUTPUT=/user/jenkins/oozie.1371538916178/examples/output-data/hive -f script.q {code} script.q makes use of hive variables: {code} CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}'; INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test; {code} However, after upgrade to hiveserver2 and beeline, this functionality is missing. Beeline doesn't take --hivevar option, and any hive variable isn't passed to server so it cannot be used for substitution. This JIRA is to address this issue, providing a backward compatible behavior at Beeline. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
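The substitution being requested can be sketched client-side (a hypothetical helper, not the actual HIVE-4568 patch): expand ${NAME} references in the script text from the --hivevar map before statements are sent to the server.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of --hivevar substitution for Beeline:
// replaces ${NAME} with the value supplied on the command line,
// leaving undefined variables untouched.
public class HiveVarSubst {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    public static String substitute(String script, Map<String, String> hivevars) {
        Matcher m = VAR.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String val = hivevars.get(m.group(1));
            // unknown variables are left as-is rather than replaced with null
            m.appendReplacement(out, Matcher.quoteReplacement(val != null ? val : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String q = "CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';";
        System.out.println(substitute(q, Map.of("INPUT", "/user/jenkins/input")));
    }
}
```

Doing the expansion in the client keeps the behavior backward compatible with the old Hive CLI, which is what the JIRA asks for.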
[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4531: - Attachment: HIVE-4531-9.patch bq. o Should this include e2e tests in addition (or instead of unit tests)? If/when Hadoop changes the log file format this will break, but unit tests won't catch this since the data that the tests parse is static. There are e2e test cases in a separate ticket: HIVE-5078 bq. Here is a bunch of little things/nits: bq. o Server.java has "if (enablelog == true && !TempletonUtils.isset(statusdir)) throw new BadParam("enablelog is only applicable when statusdir is set");" in 4 different places. Can this be a method? done bq. o What is the purpose of Server#misc()? Should not be there; removed bq. o TempletonControllerJob: import org.apache.hive.hcatalog.templeton.Main; - unused import done bq. oo Line 173 - indentation is off? done bq. oo Line 295 - writer.close() - This writer is connected to System.err. What are the implications of closing this? What if something tries to write to it later? No one after this point is writing to writer. We opened writer, so we need to close it in our code. bq. o TempletonUtils has unused imports - checkstyle needs to be run on the whole patch. done bq. o TestJobIDParser mixes JUnit3 and JUnit4. It should either not extend TestCase (I vote for this) or not use @Test annotations. done bq. o Can JobIDParser (and all subclasses) be made package scoped since they are not used outside the templeton package? Similarly, can methods be made as private as possible? done bq. o JobIDParser#parseJobID() has "fname" param which is not used. What is the intent? Should it be used in the openStatusFile() call? If not, better to remove it. We shall use it in openStatusFile(). Fixed. bq. o JobIDParser#openStatusFile() creates a Reader. Where/when is it being closed? It should be closed in parseJobID. Fixed. bq. o Could the 2 member variables in JobIDParser be made private (even final)?
I can make them protected, but since they will be used in subclasses, I cannot make them private/final. bq. o Why is TestJobIDParser using findJobID() directly? Could it not use parseJobID()? Because parseJobID is hardcoded to the standard output file for that parser, which is stderr in the current directory. In the test, I want to override it to test the input file in the test directory. bq. o Can JobIDParser have 1 line of class-level javadoc about the purpose of this class? done [WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog, WebHCat Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 11334: HIVE-4568 Beeline needs to support resolving variables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/#review26181 --- beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java https://reviews.apache.org/r/11334/#comment51141 This approach of setting the arguments is going to be hard to read and maintain. Can you do something like this? - replace the use of private final String[] args with a function List<String> getBaseArgs(String jdbcUrl); then add (-f, scriptFileName) to the list it returns? Similarly add params to the list in testBeelineCommandLineHiveVariable? Everything else looks good. - Thejas Nair On Sept. 10, 2013, 9:45 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/ --- (Updated Sept. 10, 2013, 9:45 p.m.) Review request for hive and Ashutosh Chauhan. Bugs: HIVE-4568 https://issues.apache.org/jira/browse/HIVE-4568 Repository: hive-git Description --- 1. Added command variable substitution 2. Added test case Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 4c6eb9b beeline/src/java/org/apache/hive/beeline/BeeLine.properties b6650cf beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 61bdeee beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java c70003d beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java 4280449 Diff: https://reviews.apache.org/r/11334/diff/ Testing --- Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769798#comment-13769798 ] Thejas M Nair commented on HIVE-4568: - Xuefu, I have added a comment on review board. Beeline needs to support resolving variables Key: HIVE-4568 URL: https://issues.apache.org/jira/browse/HIVE-4568 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-4568-1.patch, HIVE-4568-2.patch, HIVE-4568.3.patch, HIVE-4568.4.patch, HIVE-4568.5.patch, HIVE-4568.6.patch, HIVE-4568.7.patch, HIVE-4568.patch Previous Hive CLI allows user to specify hive variables at the command line using option --hivevar. In user's script, reference to a hive variable will be substituted with the value of the variable. In such way, user can parameterize his/her script and invoke the script with different hive variable values. The following script is one usage: {code} hive --hivevar INPUT=/user/jenkins/oozie.1371538916178/examples/input-data/table --hivevar OUTPUT=/user/jenkins/oozie.1371538916178/examples/output-data/hive -f script.q {code} script.q makes use of hive variables: {code} CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}'; INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test; {code} However, after upgrade to hiveserver2 and beeline, this functionality is missing. Beeline doesn't take --hivevar option, and any hive variable isn't passed to server so it cannot be used for substitution. This JIRA is to address this issue, providing a backward compatible behavior at Beeline. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem
[ https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-5070: -- Fix Version/s: (was: 0.11.1) 0.13.0 Affects Version/s: (was: 0.11.0) 0.12.0 Status: Patch Available (was: Open) Need to implement listLocatedStatus() in ProxyFileSystem Key: HIVE-5070 URL: https://issues.apache.org/jira/browse/HIVE-5070 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch MAPREDUCE-1981 introduced a new API for FileSystem - listLocatedStatus. It is used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class needs to implement this API in order to make Hive unit test work. Otherwise, you'll see these exceptions when running TestCliDriver test case, e.g. results of running allcolref_in_udf.q: [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: allcolref_in_udf.q [junit] java.lang.IllegalArgumentException: Wrong FS: pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, expected: file:/// [junit] at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) [junit] at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69) [junit] at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375) [junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482) [junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522) [junit] at org.apache.hadoop.fs.FileSystem$4.init(FileSystem.java:1798) [junit] at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797) [junit] at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579) [junit] at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235) [junit] at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235) [junit] at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264) [junit] at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217) [junit] at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69) [junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385) [junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351) [junit] at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390) [junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) [junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481) [junit] at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) [junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) [junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481) [junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552) [junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448) [junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597)
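The "Wrong FS" failure above happens because the proxied pfile: scheme reaches the raw local file system unchanged. The shape of the fix can be sketched in isolation (assumed and simplified; not the actual ProxyFileSystem code): the proxy must override listLocatedStatus() and, as it already does for the other list methods, swap its own scheme back to the underlying one before delegating.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Toy model of the scheme-swapping a proxy file system has to do before
// delegating listLocatedStatus() to the wrapped file system. The "pfile"
// scheme here mirrors Hive's test proxy; the helper itself is illustrative.
public class ProxySketch {
    static URI swapScheme(URI path, String from, String to) throws URISyntaxException {
        if (!from.equals(path.getScheme())) {
            return path; // not ours: pass through untouched
        }
        // rebuild the URI with the underlying scheme so the real FS's checkPath() accepts it
        return new URI(to, path.getAuthority(), path.getPath(), path.getQuery(), path.getFragment());
    }

    public static void main(String[] args) throws Exception {
        URI proxied = new URI("pfile:/build/ql/test/data/warehouse/src");
        System.out.println(swapScheme(proxied, "pfile", "file"));
    }
}
```

Without the override, the default FileSystem.listLocatedStatus() inherited from the base class sees the pfile: path and fails the checkPath() comparison shown in the stack trace.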
Re: Interesting claims that seem untrue
Whatever you count, you get more of :)

On Tue, Sep 17, 2013 at 1:57 PM, Konstantin Boudnik c...@apache.org wrote: Carter, what you are doing essentially contradicts the ASF policy of community over code. Perhaps your intentions are good. However, LOC calculations and other silly contests essentially drive a wedge between developers who happen to draw their paychecks from different commercial entities. The Hadoop community has passed through this already, and it caused nothing but despair and bitterness between people. Unlike some other popular contests, the number of lines contributed doesn't matter to most. Seriously. Regards, Cos

On Mon, Sep 16, 2013 at 01:58 PM, Carter Shanklin wrote: Ed, if nothing else I'm glad it was interesting enough to generate some discussion. These sorts of stats are always the subject of a lot of controversy. I have seen a lot of these sorts of charts float around in confidential slide decks, and I think it's good to have them out in the open where anyone can critique and correct them. In this case, Ed, you've pointed out a legitimate flaw in my analysis. Doing the analysis again, I found that previously, due to a bug in my scripts, JIRAs that didn't have Hudson comments in them were not counted (this was one way it was identifying SVN commit IDs, which I have since removed due to flakiness). Brock's patch was the single largest victim of this bug but not the only one; there were some from Cloudera, NexR, Hortonworks, Facebook, even 2 from you, Ed. The interested can see a full list of exclusions here: https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0. I apologize to those under-represented; there wasn't any intent on my part to minimize anyone's work. The impact on the final totals is Cloudera +5.4%, NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog later today with the relevant corrections.
There is going to be continued interest in seeing charts like these, for example when Hive 12 is officially done. Sanjay suggested that LoC counts may not be the best way to represent true contribution. I agree that not all lines of code are created equal; for example, a few monster patches recently went in re-arranging HCatalog namespaces and, I think, also indentation style. This (hopefully) mechanical work is not on the same footing as adding new query language features. Still, it is work, and it wouldn't be fair to pretend it didn't happen. If anyone has ideas on better ways to fairly capture contribution, I'm open to suggestions.

On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I was reading the Hortonworks blog and found an interesting article. http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753 There is a very interesting graphic which attempts to demonstrate lines of code in the 12 release. http://hortonworks.com/wp-content/uploads/2013/09/hive4.png Although I do not know how they are calculated, they are probably counting code generated by test output, but besides that they are wrong. One claim is that Cloudera contributed 4,244 lines of code. So to debunk that claim: in https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from Cloudera created the ptest2 testing framework. He did all the work for ptest2 in Hive 12, and it is clearly more than 4,244. This consists of 84 Java files:

[edward@desksandra ptest2]$ find . -name '*.java' | wc -l
84

and by itself is 8001 lines of code:

[edward@desksandra ptest2]$ find . -name '*.java' | xargs cat | wc -l
8001
[edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
7902 HIVE-4675.patch

This is not the only feature from Cloudera in Hive 12. There is also a section of the article that talks of a ROAD MAP for Hive features. I did not know we (Hive) had a road map.
I have advocated switching to feature-based releases and having a road map before, but it was suggested that might limit people from itch-scratching. -- Carter Shanklin Director, Product Management Hortonworks (M): +1.650.644.8795 (T): @cshanklin http://twitter.com/cshanklin -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769712#comment-13769712 ] Prasanth J commented on HIVE-4113: -- HIVE-4340 will expose ORC stats through reader interfaces which can be used for optimizing count(*). Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Where as, select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
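The TODO in the quoted RCFile code can be resolved by distinguishing an absent column list ("select *") from an explicitly empty one (count(1)). A toy sketch of that distinction (assumed behavior, not the committed patch; the helper name is illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy sketch: compute which columns a columnar reader can skip.
//   readColumnIds == null  -> no projection given ("select *"): read everything
//   readColumnIds empty    -> query references no columns (count(1)): skip everything
public class ColumnSkipSketch {
    static boolean[] skippedColIDs(int totalColumns, List<Integer> readColumnIds) {
        boolean[] skipped = new boolean[totalColumns];
        if (readColumnIds == null) {
            return skipped;             // all false: read every column
        }
        Arrays.fill(skipped, true);     // default to skipping, then re-enable referenced columns
        for (int id : readColumnIds) {
            skipped[id] = false;
        }
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(skippedColIDs(3, null)));                    // select *
        System.out.println(Arrays.toString(skippedColIDs(3, Collections.emptyList()))); // count(1)
        System.out.println(Arrays.toString(skippedColIDs(3, List.of(1))));              // one column
    }
}
```

Treating the empty list as "skip all" is exactly what would shrink the 234 MB HDFS read above toward the 28 MB single-column case.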
[jira] [Commented] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769718#comment-13769718 ] Xuefu Zhang commented on HIVE-4568: --- [~thejas] [~appodictic] [~ashutoshc] I'm wondering if any of you have cycle to review the patch. It has been pending for quite some time. Let me know if you have any questions. Thanks. Beeline needs to support resolving variables Key: HIVE-4568 URL: https://issues.apache.org/jira/browse/HIVE-4568 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-4568-1.patch, HIVE-4568-2.patch, HIVE-4568.3.patch, HIVE-4568.4.patch, HIVE-4568.5.patch, HIVE-4568.6.patch, HIVE-4568.7.patch, HIVE-4568.patch Previous Hive CLI allows user to specify hive variables at the command line using option --hivevar. In user's script, reference to a hive variable will be substituted with the value of the variable. In such way, user can parameterize his/her script and invoke the script with different hive variable values. The following script is one usage: {code} hive --hivevar INPUT=/user/jenkins/oozie.1371538916178/examples/input-data/table --hivevar OUTPUT=/user/jenkins/oozie.1371538916178/examples/output-data/hive -f script.q {code} script.q makes use of hive variables: {code} CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}'; INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test; {code} However, after upgrade to hiveserver2 and beeline, this functionality is missing. Beeline doesn't take --hivevar option, and any hive variable isn't passed to server so it cannot be used for substitution. This JIRA is to address this issue, providing a backward compatible behavior at Beeline. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4531: - Component/s: WebHCat [WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog, WebHCat Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz It would be nice we collect task logs after job finish. This is similar to what Amazon EMR does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769716#comment-13769716 ] Prasanth J commented on HIVE-4113: -- Sorry. Please ignore that comment. Row count interface already exists in ORC reader. HIVE-4340 is not relevant for this JIRA. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Where as, select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4961: -- Attachment: HIVE-4961.4-vectorization.patch Refactor packages per request from Ashutosh. Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4961.1-vectorization.patch, HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, HIVE-4961.4-vectorization.patch, vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5156) HiveServer2 jdbc ResultSet.close should free up resources on server side
[ https://issues.apache.org/jira/browse/HIVE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5156: --- Attachment: HIVE-5156.D12837.3.patch HiveServer2 jdbc ResultSet.close should free up resources on server side Key: HIVE-5156 URL: https://issues.apache.org/jira/browse/HIVE-5156 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Minor Attachments: HIVE-5156.D12837.3.patch ResultSet.close does not free up any resources (tmp files etc) on hive server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5206) Support parameterized primitive types
[ https://issues.apache.org/jira/browse/HIVE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769889#comment-13769889 ] Thejas M Nair commented on HIVE-5206: - Patch committed to 0.12 branch Support parameterized primitive types - Key: HIVE-5206 URL: https://issues.apache.org/jira/browse/HIVE-5206 Project: Hive Issue Type: Improvement Components: Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-5206.1.patch, HIVE-5206.2.patch, HIVE-5206.3.patch, HIVE-5206.4.patch, HIVE-5206.D12693.1.patch, HIVE-5206.v12.1.patch Support for parameterized types is needed for char/varchar/decimal support. This adds a type parameters value to the PrimitiveTypeEntry/PrimitiveTypeInfo/PrimitiveObjectInspector objects. NO PRECOMMIT TESTS - dependent on HIVE-5203/HIVE-5204 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5206) Support parameterized primitive types
[ https://issues.apache.org/jira/browse/HIVE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5206: Fix Version/s: (was: 0.13.0) 0.12.0 Support parameterized primitive types - Key: HIVE-5206 URL: https://issues.apache.org/jira/browse/HIVE-5206 Project: Hive Issue Type: Improvement Components: Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-5206.1.patch, HIVE-5206.2.patch, HIVE-5206.3.patch, HIVE-5206.4.patch, HIVE-5206.D12693.1.patch, HIVE-5206.v12.1.patch Support for parameterized types is needed for char/varchar/decimal support. This adds a type parameters value to the PrimitiveTypeEntry/PrimitiveTypeInfo/PrimitiveObjectInspector objects. NO PRECOMMIT TESTS - dependent on HIVE-5203/HIVE-5204 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support
[ https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769892#comment-13769892 ] Thejas M Nair commented on HIVE-5278: - Patch committed to 0.12 branch. Move some string UDFs to GenericUDFs, for better varchar support Key: HIVE-5278 URL: https://issues.apache.org/jira/browse/HIVE-5278 Project: Hive Issue Type: Improvement Components: Types, UDF Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch, HIVE-5278.v12.1.patch To better support varchar/char types in string UDFs, select UDFs should be converted to GenericUDFs. This allows the UDF to return the resulting char/varchar length in the type metadata. This work is being split off as a separate task from HIVE-4844. The initial UDFs as part of this work are concat/lower/upper. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support
[ https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5278: Fix Version/s: (was: 0.13.0) 0.12.0 Move some string UDFs to GenericUDFs, for better varchar support Key: HIVE-5278 URL: https://issues.apache.org/jira/browse/HIVE-5278 Project: Hive Issue Type: Improvement Components: Types, UDF Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch, HIVE-5278.v12.1.patch To better support varchar/char types in string UDFs, select UDFs should be converted to GenericUDFs. This allows the UDF to return the resulting char/varchar length in the type metadata. This work is being split off as a separate task from HIVE-4844. The initial UDFs as part of this work are concat/lower/upper. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5161: Fix Version/s: (was: 0.13.0) 0.12.0 Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: D12897.1.patch, HIVE-5161.1.patch, HIVE-5161.2.patch, HIVE-5161.3.patch, HIVE-5161.v12.1.patch Breaking out support for varchar for the various SerDes as an additional task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Interesting claims that seem untrue
Whatever you count, you get more of :) Then let's count lines of documentation! ;) -- Lefty On Tue, Sep 17, 2013 at 12:15 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Whatever you count, you get more of :) On Tue, Sep 17, 2013 at 1:57 PM, Konstantin Boudnik c...@apache.org wrote: Carter, what you are doing essentially contradicts the ASF policy of community over code. Perhaps your intentions are good. However, LOC calculations or other silly contests are essentially driving a wedge between developers who happen to draw their paycheck from different commercial entities. The Hadoop community passed through this already and it caused nothing but despair and bitterness between the people. Unlike some other popular contests, the number of lines contributed doesn't matter for most. Seriously. Regards, Cos On Mon, Sep 16, 2013 at 01:58PM, Carter Shanklin wrote: Ed, If nothing else I'm glad it was interesting enough to generate some discussion. These sorts of stats are always subjects of a lot of controversy. I have seen a lot of these sorts of charts float around in confidential slide decks and I think it's good to have them out in the open where anyone can critique and correct them. In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing the analysis again I found that previously, due to a bug in my scripts, JIRAs that didn't have Hudson comments in them were not counted (this was one way it was identifying SVN commit IDs, which I have since removed due to flakiness). Brock's patch was the single largest victim of this bug but not the only one; there were some from Cloudera, NexR, Hortonworks, Facebook, even 2 from you Ed. The interested can see a full list of exclusions here: https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0 . I apologize to those under-represented; there wasn't any intent on my part to minimize anyone's work. 
The impact in final totals is Cloudera +5.4%, NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog later today with relevant corrections. There is going to be continued interest in seeing charts like these, for example when Hive 12 is officially done. Sanjay suggested that LoC counts may not be the best way to represent true contribution. I agree that not all lines of code are created equal; for example, a few monster patches recently went in re-arranging HCatalog namespaces and I think also indentation style. This (hopefully) mechanical work is not on the same footing as adding new query language features. Still it is work and it wouldn't be fair to pretend it didn't happen. If anyone has ideas on better ways to fairly capture contribution I'm open to suggestions. On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I was reading the Hortonworks blog and found an interesting article. http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753 There is a very interesting graphic which attempts to demonstrate lines of code in the 12 release. http://hortonworks.com/wp-content/uploads/2013/09/hive4.png Although I do not know how they are calculated, they are probably counting code generated by tests output, but besides that they are wrong. One claim is that Cloudera contributed 4,244 lines of code. So to debunk that claim: In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from Cloudera created the ptest2 testing framework. He did all the work for ptest2 in Hive 12, and it is clearly more than 4,244. This consists of 84 java files [edward@desksandra ptest2]$ find . -name *.java | wc -l 84 and by itself is 8001 lines of code. [edward@desksandra ptest2]$ find . -name *.java | xargs cat | wc -l 8001 [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch 7902 HIVE-4675.patch This is not the only feature from Cloudera in Hive 12. 
There is also a section of the article that talks of a ROAD MAP for hive features. I did not know we (hive) had a road map. I have advocated switching to feature based release and having a road map before, but it was suggested that might limit people from itch-scratching. -- Carter Shanklin Director, Product Management Hortonworks (M): +1.650.644.8795 (T): @cshanklin http://twitter.com/cshanklin -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have
[jira] [Updated] (HIVE-5086) Fix scriptfile1.q on Windows
[ https://issues.apache.org/jira/browse/HIVE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-5086: - Attachment: HIVE-5086-2.patch Fixed unit test failure. Fix scriptfile1.q on Windows Key: HIVE-5086 URL: https://issues.apache.org/jira/browse/HIVE-5086 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5086-1.patch, HIVE-5086-2.patch Test failed with error message: [junit] Task with the most failures(4): [junit] - [junit] Task ID: [junit] task_20130814023904691_0001_m_00 [junit] [junit] URL: [junit] http://localhost:50030/taskdetails.jsp?jobid=job_20130814023904691_0001tipid=task_20130814023904691_0001_m_00 [junit] - [junit] Diagnostic Messages for this Task: [junit] java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {key:238,value:val_238} [junit] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175) [junit] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) [junit] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) [junit] at org.apache.hadoop.mapred.Child$4.run(Child.java:271) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) [junit] at org.apache.hadoop.mapred.Child.main(Child.java:265) [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {key:238,value:val_238} [junit] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:538) [junit] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) [junit] ... 
8 more [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 2]: Unable to initialize custom script. [junit] at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:357) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:528) [junit] ... 9 more [junit] Caused by: java.io.IOException: Cannot run program D:\tmp\hadoop-Administrator\mapred\local\3_0\taskTracker\Administrator\jobcache\job_20130814023904691_0001\attempt_20130814023904691_0001_m_00_3\work\.\testgrep: CreateProcess error=193, %1 is not a valid Win32 application [junit] at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) [junit] at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:316) [junit] ... 18 more [junit] Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application [junit] at java.lang.ProcessImpl.create(Native Method) [junit] at java.lang.ProcessImpl.init(ProcessImpl.java:81) [junit] at java.lang.ProcessImpl.start(ProcessImpl.java:30) [junit] at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) [junit] ... 19 more [junit] [junit] [junit] Exception: Client Execution failed with error code = 2 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. 
[junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 2 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestMinimrCliDriver.runTest(TestMinimrCliDriver.java:122) [junit] at org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:104) [junit] at
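The root cause in the trace above is that Windows CreateProcess only launches native executables, so handing it a plain script file fails with error=193 ("%1 is not a valid Win32 application"). One common workaround, sketched below with invented helper names (this is not the actual HIVE-5086 patch), is to prefix non-executable scripts with an explicit interpreter on Windows:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the interpreter-prefix workaround for
// CreateProcess error=193; names are invented, not the HIVE-5086 fix.
public class ScriptCommandBuilder {
    // On Windows, CreateProcess can only start real executables, so a
    // plain script must be routed through an interpreter (cmd /c here).
    // On Unix the script is executable directly via its shebang line.
    static List<String> buildCommand(String script, boolean isWindows) {
        List<String> cmd = new ArrayList<>();
        if (isWindows && !script.toLowerCase().endsWith(".exe")) {
            cmd.add("cmd");
            cmd.add("/c");
        }
        cmd.add(script);
        return cmd;
    }
}
```

The resulting list would then be handed to ProcessBuilder instead of the bare script path that failed in the trace.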
[jira] [Updated] (HIVE-4844) Add varchar data type
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4844: Fix Version/s: (was: 0.13.0) 0.12.0 Add varchar data type - Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.12.0 Attachments: HIVE-4844.10.patch, HIVE-4844.11.patch, HIVE-4844.12.patch, HIVE-4844.13.patch, HIVE-4844.14.patch, HIVE-4844.15.patch, HIVE-4844.16.patch, HIVE-4844.17.patch, HIVE-4844.18.patch, HIVE-4844.19.patch, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, HIVE-4844.D12699.1.patch, HIVE-4844.D12891.1.patch, HIVE-4844.v12.1.patch, screenshot.png Add new varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. Char type will be added as another task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5084) Fix newline.q on Windows
[ https://issues.apache.org/jira/browse/HIVE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769914#comment-13769914 ] Thejas M Nair commented on HIVE-5084: - Patch committed to 0.12 branch. Fix newline.q on Windows Key: HIVE-5084 URL: https://issues.apache.org/jira/browse/HIVE-5084 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5084-1.patch Test failed with vague error message: [junit] Error during job, obtaining debugging information... [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 2 hive.log doesn't show anything interesting either: 2013-08-14 00:47:29,411 DEBUG zookeeper.ClientCnxn (ClientCnxn.java:readResponse(723)) - Got ping response for sessionid: 0x1407a49fc1e0003 after 1ms 2013-08-14 00:47:31,391 ERROR exec.Task (SessionState.java:printError(416)) - Execution failed with exit status: 2 2013-08-14 00:47:31,391 ERROR exec.Task (SessionState.java:printError(416)) - Obtaining error information 2013-08-14 00:47:31,392 ERROR exec.Task (SessionState.java:printError(416)) - Task failed! Task ID: Stage-1 Logs: -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769919#comment-13769919 ] Ashutosh Chauhan commented on HIVE-5301: [~prasadm] Can you create RB or phabricator link for this? Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-5301.1.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts, and execute those against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14169: HIVE-3764: Support metastore version consistency check
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14169/#review26182 --- Mostly looks good. Some comments. metastore/scripts/upgrade/derby/hive-schema-0.12.0.derby.sql https://reviews.apache.org/r/14169/#comment51142 Name 'comment' has caused problems previously. I will suggest to name it VERSION_COMMENT, VCOMMENT or any other variation of it. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java https://reviews.apache.org/r/14169/#comment51143 Looks like this line can be removed. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java https://reviews.apache.org/r/14169/#comment51144 typo metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java https://reviews.apache.org/r/14169/#comment51145 Can you name this variable version. I got confused thinking curVersion implies current version of jars (which was incorrect) metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java https://reviews.apache.org/r/14169/#comment51146 Will be good to do currVersion.trim() here. metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/14169/#comment51147 Can you add a comment why we need to do a recheck? Seems like its not necessary. metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/14169/#comment51148 Should this be if(strictValidation ... ) - Ashutosh Chauhan On Sept. 17, 2013, 6:13 a.m., Prasad Mujumdar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14169/ --- (Updated Sept. 17, 2013, 6:13 a.m.) Review request for hive, Ashutosh Chauhan and Brock Noland. Bugs: HIVE-3764 https://issues.apache.org/jira/browse/HIVE-3764 Repository: hive-git Description --- This is a 0.12 specific patch. The trunk patch will include additional metastore scripts which I will attach separately to the ticket. 
- Added a new table in the metastore schema to store the Hive version in the metastore. - The metastore handler compares the version stored in the schema with its own version. If there's a mismatch, then it can either record the correct version or raise an error. The behavior is configurable via a new Hive config. This config, when set, also restricts DataNucleus from auto-upgrading the schema. - The new schema creation and upgrade scripts record the new version in the metastore version table. - Added 0.12 upgrade scripts for all supported DBs to create the new version tables in the 0.12 metastore schema. The current patch has the verification turned off by default. I would prefer to keep it enabled, though that requires any ad-hoc setup to explicitly disable it (or create the metastore schema by running scripts). The default can be changed or left as is per the consensus. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22149e4 conf/hive-default.xml.template 9a3fc1d metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql PRE-CREATION metastore/scripts/upgrade/derby/hive-schema-0.12.0.derby.sql cce544f metastore/scripts/upgrade/derby/upgrade-0.10.0-to-0.11.0.derby.sql cae7936 metastore/scripts/upgrade/derby/upgrade-0.11.0-to-0.12.0.derby.sql 492cc93 metastore/scripts/upgrade/derby/upgrade.order.derby PRE-CREATION metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql PRE-CREATION metastore/scripts/upgrade/mysql/hive-schema-0.12.0.mysql.sql 22a77fe metastore/scripts/upgrade/mysql/upgrade-0.11.0-to-0.12.0.mysql.sql 375a05f metastore/scripts/upgrade/mysql/upgrade.order.mysql PRE-CREATION metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql PRE-CREATION metastore/scripts/upgrade/oracle/hive-schema-0.12.0.oracle.sql 85a0178 metastore/scripts/upgrade/oracle/upgrade-0.10.0-to-0.11.0.mysql.sql PRE-CREATION metastore/scripts/upgrade/oracle/upgrade-0.11.0-to-0.12.0.oracle.sql a2d0901 metastore/scripts/upgrade/oracle/upgrade.order.oracle PRE-CREATION 
metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql PRE-CREATION metastore/scripts/upgrade/postgres/hive-schema-0.12.0.postgres.sql 7b319ba metastore/scripts/upgrade/postgres/upgrade-0.11.0-to-0.12.0.postgres.sql 9da0a1b metastore/scripts/upgrade/postgres/upgrade.order.postgres PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 39dda92 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a27243d
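The review comments above (trim the recorded version before comparing, gate the failure on strict validation) suggest verification logic along these lines. This is an illustrative sketch with invented class, method, and exception choices, not the actual HIVE-3764 code:

```java
// Illustrative sketch of metastore schema-version verification reflecting
// the review feedback above; names are invented, not the HIVE-3764 patch.
public class SchemaVersionCheck {
    // Compare the version recorded in the metastore's version table with
    // the version the running jars expect; whitespace in the stored value
    // is tolerated via trim(), per the review comment.
    static boolean versionMatches(String recorded, String expected) {
        return recorded != null && expected.equalsIgnoreCase(recorded.trim());
    }

    // In strict mode a mismatch is fatal; otherwise the caller may record
    // the correct version and continue.
    static void verify(String recorded, String expected, boolean strictValidation) {
        if (!versionMatches(recorded, expected) && strictValidation) {
            throw new IllegalStateException(
                "Metastore schema version " + recorded
                + " does not match expected version " + expected);
        }
    }
}
```

Splitting the comparison from the policy decision keeps the strict/lenient behavior (the new Hive config described in the patch summary) in one place.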
[jira] [Updated] (HIVE-5084) Fix newline.q on Windows
[ https://issues.apache.org/jira/browse/HIVE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5084: Fix Version/s: (was: 0.13.0) 0.12.0 Fix newline.q on Windows Key: HIVE-5084 URL: https://issues.apache.org/jira/browse/HIVE-5084 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5084-1.patch Test failed with vague error message: [junit] Error during job, obtaining debugging information... [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 2 hive.log doesn't show anything interesting either: 2013-08-14 00:47:29,411 DEBUG zookeeper.ClientCnxn (ClientCnxn.java:readResponse(723)) - Got ping response for sessionid: 0x1407a49fc1e0003 after 1ms 2013-08-14 00:47:31,391 ERROR exec.Task (SessionState.java:printError(416)) - Execution failed with exit status: 2 2013-08-14 00:47:31,391 ERROR exec.Task (SessionState.java:printError(416)) - Obtaining error information 2013-08-14 00:47:31,392 ERROR exec.Task (SessionState.java:printError(416)) - Task failed! Task ID: Stage-1 Logs: -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5304) JDO and SQL filters can both return different results for string compares depending on underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769931#comment-13769931 ] Sergey Shelukhin commented on HIVE-5304: Actually, all names - I see show indexes order also changes in some queries, etc. JDO and SQL filters can both return different results for string compares depending on underlying datastore --- Key: HIVE-5304 URL: https://issues.apache.org/jira/browse/HIVE-5304 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Hive uses JDOQL filters to optimize partition retrieval; recently direct SQL was added to optimize it further. Both of these methods may end up pushing StringCol op 'SomeString' to the underlying SQL datastore. Many paths also push order by-s, although these are not as problematic. The problem is that different datastores handle string compares differently. While testing on Postgres, I see that this results in different things, from innocent ones like order changes in show partitions, to more serious ones like {code} alter table ptestfilter drop partition (c='US', d='2') {code} in drop_partitions_filter.q - in Derby, with which the .q.out file was generated, it drops c=Uganda/d=2; this also passes on MySQL (I ran tests with autocreated db); on Postgres with a db from the script it doesn't. Looks like we need to enforce collation in partition names and part_key_values-es; both in the create scripts, as well as during autocreate (via package.jdo?) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
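The ordering differences described above are easy to reproduce without any database: a binary comparison (Derby-like collation) sorts by raw code points, while a case-insensitive comparison (the default collation in many MySQL/Postgres setups) folds case first, and the two can disagree on direction for mixed-case partition values. A self-contained illustration:

```java
// Demonstrates why partition ordering can differ across datastore
// collations: binary vs case-insensitive comparison disagree on
// mixed-case values like the partition names in the bug report.
public class CollationDemo {
    public static void main(String[] args) {
        // Binary: 'S' (83) < 'g' (103), so "US" sorts before "Uganda".
        int binary = "US".compareTo("Uganda");
        // Case-folded: 's' (115) > 'g' (103), so "US" sorts after "Uganda".
        int caseInsensitive = "US".compareToIgnoreCase("Uganda");
        System.out.println(binary < 0);          // true
        System.out.println(caseInsensitive > 0); // true
    }
}
```

This is exactly the kind of divergence that makes a .q.out file generated against Derby mismatch on Postgres, and why the report suggests pinning the collation of partition-name columns in the create scripts.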