[jira] [Commented] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION in a better way.

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771625#comment-13771625
 ] 

Hive QA commented on HIVE-5315:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603915/HIVE-5315.patch

{color:red}ERROR:{color} -1 due to 91 failed/errored test(s), 129 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hive.beeline.src.test.TestBeeLineWithArgs.testPositiveScriptFile
org.apache.hive.jdbc.TestJdbcDriver2.testBadURL
org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes2
org.apache.hive.jdbc.TestJdbcDriver2.testDatabaseMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testDescribeTable
org.apache.hive.jdbc.TestJdbcDriver2.testDriverProperties
org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder
org.apache.hive.jdbc.TestJdbcDriver2.testErrorDiag
org.apache.hive.jdbc.TestJdbcDriver2.testErrorMessages
org.apache.hive.jdbc.TestJdbcDriver2.testExecutePreparedStatement
org.apache.hive.jdbc.TestJdbcDriver2.testExecuteQueryException
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hive.jdbc.TestJdbcDriver2.testExprCol
org.apache.hive.jdbc.TestJdbcDriver2.testImportedKeys

[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION in a better way.

2013-09-19 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Status: Open  (was: Patch Available)

 bin/hive should retrieve HADOOP_VERSION in a better way.
 --

 Key: HIVE-5315
 URL: https://issues.apache.org/jira/browse/HIVE-5315
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Kousuke Saruta
 Fix For: 0.11.1

 Attachments: HIVE-5315.patch


 In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
 {code}
 HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
 {code}
 But sometimes "hadoop version" doesn't show the version information on the 
 first line.
 If HADOOP_VERSION is not retrieved correctly, Hive and related processes will 
 not start.
 I faced this situation when I tried to debug HiveServer2 with debug options 
 such as 
 {code}
 -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
 {code}
 Then "hadoop version" shows -Xdebug... on the first line.
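 A more defensive extraction would anchor on the line that actually starts 
 with "Hadoop" instead of assuming it is the first line. A minimal sketch, 
 not the attached patch:
 {code}
 # print the second field of the first line beginning with "Hadoop", e.g. "Hadoop 1.0.4"
 HADOOP_VERSION=$($HADOOP version 2>/dev/null | awk '/^Hadoop /{print $2; exit}');
 {code}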

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5310) commit futurama_episodes

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771628#comment-13771628
 ] 

Hudson commented on HIVE-5310:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2340/])
HIVE-5310 futurama-episodes (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524448)
* /hive/trunk/data/files/futurama_episodes.avro


 commit futurama_episodes
 ---

 Key: HIVE-5310
 URL: https://issues.apache.org/jira/browse/HIVE-5310
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 This is a small binary file that will be used for Trevni. We can run the 
 pre-commit build once this is committed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing intermittently on trunk

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771627#comment-13771627
 ] 

Hudson commented on HIVE-5166:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2340/])
HIVE-5166 : TestWebHCatE2e is failing intermittently on trunk (Eugene Koifman 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524441)
* 
/hive/trunk/hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java


 TestWebHCatE2e is failing intermittently on trunk
 -

 Key: HIVE-5166
 URL: https://issues.apache.org/jira/browse/HIVE-5166
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Eugene Koifman
 Fix For: 0.13.0

 Attachments: HIVE-5166.patch


 I observed these while running full test suite last couple of times.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-19 Thread Brad Ruderman (JIRA)
Brad Ruderman created HIVE-5318:
---

 Summary: Import Throws Error when Importing from a table export 
Hive 0.9 to Hive 0.10
 Key: HIVE-5318
 URL: https://issues.apache.org/jira/browse/HIVE-5318
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 0.10.0, 0.9.0
Reporter: Brad Ruderman
Priority: Critical


When exporting Hive tables from Hive 0.9 using EXPORT TABLE ... TO 'hdfs_path' 
and then importing into another Hive 0.10 instance using IMPORT FROM 
'hdfs_path', Hive throws this error:


13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
processing
org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
at 
org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:131)
at 
org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
at 
org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
... 16 more

13/09/18 13:14:02 INFO ql.Driver: </PERFLOG method=compile start=1379535241411 
end=1379535242332 duration=921>
13/09/18 13:14:02 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/09/18 13:14:02 INFO ql.Driver: </PERFLOG method=releaseLocks 
start=1379535242332 end=1379535242332 duration=0>
13/09/18 13:14:02 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/09/18 13:14:02 INFO ql.Driver: </PERFLOG method=releaseLocks 
start=1379535242333 end=1379535242333 duration=0>


This is probably a critical blocker for people who are trying to test Hive 0.10 
in their staging environments prior to upgrading from 0.9.
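For reference, the failing sequence amounts to the following (a sketch; the 
table name and HDFS path are placeholders):

  # on the Hive 0.9 instance
  hive -e "EXPORT TABLE my_table TO '/user/brad/my_table_export';"
  # on the Hive 0.10 instance, pointing at the same (or copied) export path
  hive -e "IMPORT FROM '/user/brad/my_table_export';"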

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How long will we support Hadoop 0.20.2?

2013-09-19 Thread Thejas Nair
How is 0.20.2 easier to set up and develop against than the stable 1.x line?


On Wed, Sep 18, 2013 at 7:18 PM, Xuefu Zhang xzh...@cloudera.com wrote:
 Even if not for production, I think 0.20.2 is very useful for development as
 well. It's simple and easy to set up, avoiding a lot of hassles we would
 otherwise have to deal with during development. Thus, I think it makes sense
 to keep supporting it, especially when there isn't much cost involved.

 --Xuefu


 On Wed, Sep 18, 2013 at 7:06 PM, Ashish Thusoo athu...@qubole.com wrote:

 +1 on what Ed said.

 I think 0.20.2 is still very real. Would be a bummer if we do not support
 it as a lot of companies are still on that version.

 Ashish

 Ashish Thusoo http://www.linkedin.com/pub/ashish-thusoo/0/5a8/50
 CEO and Co-founder,
 Qubole http://www.qubole.com - a cloud based service that makes big data
 easy for analysts and data engineers



 On Wed, Sep 18, 2013 at 5:57 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

  BTW: I am very likely to install Hive 0.12 on Hadoop 0.20.2 clusters. I
  have been running Hive since version 0.2 and Hadoop since version 0.17.2.
  After 0.17.2 I moved to 0.20.2. Since then Hadoop has seemingly had tens of
  releases: 0.21, 0.21-append (dead on arrival), Cloudera this, Cloudera that,
  the Yahoo Hadoop distribution (dead on arrival), 0.20.203, 0.20.205,
  1? 2.0? 2.1. None of them really has much shelf life or a very clear
  upgrade path.
 
  The only thing that has remained constant for our environment is Hive and
  Hadoop 0.20.2. I have been happily just upgrading Hive on these clusters
  for years now.
 
  So in a nutshell: I'm a long-time committer, and I actively support and
  develop Hive on Hadoop 0.20.2 clusters. I do not see supporting the shims
  as complicated or difficult.
 
 
 
  On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley omal...@apache.org
 wrote:
 
   On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo 
 edlinuxg...@gmail.com
   wrote:
  
I am not fine with dropping it. I still run it in several places.
   
  
   The question is not whether you run Hadoop 0.20.2, but whether you are
   likely to install Hive 0.12 on those very old clusters.
  
  
   
 Believe it or not, many people still run 0.20.2. I believe (correct me if
 I am wrong) Facebook is still running a heavily patched 0.20.2.
   
  
   It is more accurate to say that Facebook is running a fork of Hadoop where
   the last common point was Hadoop 0.20.1. I haven't heard anyone (other than
   you in this thread) say they are running 0.20.2 in years.
  
  
 I could see dropping 0.20.2 if it were a huge burden, but I do not see it
 that way: it works, it is reliable, and it is a known quantity.
   
  
   It is a large burden in that we have relatively complicated shims and a
   lack of testing. Unless you are signing up to test every release on 0.20.2,
   we don't have anyone doing the relevant testing.
  
   -- Owen
  
 




[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771652#comment-13771652
 ] 

Ashutosh Chauhan commented on HIVE-4113:


Thanks, [~yhuai], for taking this one up.
It's a known problem that predicate pushdown doesn't happen for 
HCatalog today. I would say that if it is getting burdensome, we can tackle it 
in a separate JIRA. 
I am fine with removing the flag for column pruning. It's been around for a long 
time (HIVE-279) and I haven't come across a case where a user has run into 
problems with it.
I didn't get your comment about READ_ALL_COLUMNS_DEFAULT. If we set it to true, 
will that imply that this optimization is off by default? That seems like a bad 
choice. In HCatInputFormat, we can probably set the config so that it always 
selects all columns for now. That way Hive will still get the benefit of the 
optimization and HCatalog will continue with what it is doing today. 

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less:
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 which is 11% of the data size read by the COUNT(1).
 This was tracked down to the following code in RCFile.java:
 {code}
   } else {
 // TODO: if no column name is specified, e.g. in select count(1) from tt;
 // skip all columns; this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5209) JDBC support for varchar

2013-09-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771659#comment-13771659
 ] 

Phabricator commented on HIVE-5209:
---

thejas has commented on the revision HIVE-5209 [jira] JDBC support for 
varchar.

INLINE COMMENTS
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 TableSchema was 
used prior to this patch, but since the classes it uses have changed, it is 
possible that there is a new dependency.
  But if this patch isn't changing the situation, we can fix that separately.
  I think the best next step would be to test what the Hive client 
dependencies are with and without the patch.
  In a separate JIRA, I think we should start looking at creating a 
service-core module that the JDBC classes use, instead of having the whole 
service module, which includes the server pieces, as a dependency.

REVISION DETAIL
  https://reviews.facebook.net/D12999

To: JIRA, jdere
Cc: cwsteinbach, thejas


 JDBC support for varchar
 

 Key: HIVE-5209
 URL: https://issues.apache.org/jira/browse/HIVE-5209
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
 HIVE-5209.4.patch, HIVE-5209.D12705.1.patch


 Support returning varchar length in result set metadata

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: HIVE-3764-12.2.patch

Rebased patch for 0.12

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764-12.2.patch, HIVE-3764.1.patch, 
 HIVE-3764.2.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.
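 As an illustration of what such a check could look like, the schema version 
 could be recorded in a table inside the metastore database and inspected at 
 startup. A minimal sketch, assuming a MySQL-backed metastore and a VERSION 
 table (the actual table and column names should be taken from the patch):
 {code}
 # print the schema version recorded in the metastore database
 mysql -u hive -p metastore -e "SELECT SCHEMA_VERSION FROM VERSION;"
 {code}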

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: HIVE-3764-trunk.2.patch

Rebased patch for trunk

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764-12.2.patch, HIVE-3764.1.patch, 
 HIVE-3764.2.patch, HIVE-3764-trunk.2.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade

2013-09-19 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771664#comment-13771664
 ] 

Prasad Mujumdar commented on HIVE-5301:
---

[~ashutoshc] Agreed.
I have tested extensively with Derby and MySQL on 0.10, and did test the upgrade 
of an empty schema (generated using this tool) as well. But the 0.7 to 0.8 
upgrade is more complex due to the data move, which that does not cover. I will 
set up 0.7 with test data, verify the upgrade options, and attach the output of 
the tests.


 Add a schema tool for offline metastore schema upgrade
 --

 Key: HIVE-5301
 URL: https://issues.apache.org/jira/browse/HIVE-5301
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-5301.1.patch, HIVE-5301-with-HIVE-3764.0.patch


 HIVE-3764 is addressing metastore version consistency.
 Beyond that, it would be helpful to add a tool that can leverage this version 
 information to figure out the required set of upgrade scripts and execute 
 them against the configured metastore. Now that Hive includes the Beeline 
 client, it can be used to execute the scripts.
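 A hypothetical invocation of such a tool might look like this (the tool and 
 flag names are illustrative, not necessarily the committed interface):
 {code}
 # report the schema version recorded in the configured metastore
 schematool -dbType mysql -info
 # run the upgrade scripts needed to move from a known starting version
 schematool -dbType mysql -upgradeSchemaFrom 0.7.0
 {code}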

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771672#comment-13771672
 ] 

Hudson commented on HIVE-5198:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #440 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/440/])
HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman 
via Thejas Nair) (thejas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524617)
* 
/hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java


 WebHCat returns exitcode 143 (w/o an explanation)
 -

 Key: HIVE-5198
 URL: https://issues.apache.org/jira/browse/HIVE-5198
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.11.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5198.patch


 The message might look like this:
 {"statement":"use default; show table extended like xyz;","error":"unable to 
 show table: xyz","exec":{"stdout":"","stderr":"","exitcode":143}}
 WebHCat has a templeton.exec.timeout property which kills an HCat request 
 (i.e. something like a DDL statement that gets routed to the HCat CLI) if it 
 takes longer than this timeout.
 Since WebHCat does a fork/exec of the 'hcat' script, the timeout is implemented 
 as a SIGTERM sent to the subprocess. The SIGTERM value is 15, so the exit code 
 is reported as 128 + 15 = 143.
 Error logging/reporting should be improved in this case.
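 The 128 + signal convention is ordinary shell behavior and easy to reproduce 
 outside WebHCat (a quick sketch):
 {code}
 sleep 60 &
 kill -TERM $!   # send signal 15 (SIGTERM) to the child
 wait $!
 echo $?         # prints 143, i.e. 128 + 15
 {code}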

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771678#comment-13771678
 ] 

Thejas M Nair commented on HIVE-4487:
-

I am seeing several intermittent precommit test failures in the last few builds, 
which seem to be caused by permission errors. I am wondering if they might be 
related to this change. I also saw this on my Linux machine, but not in another 
run on my Mac.

The tests have errors like this:
Copying data from 
file:/home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/data/files/kv1.txt
Failed with exception Failed to set permissions of path: 
/home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/build/ql/scratchdir/hive_2013-09-18_19-22-30_852_73877859563099-1/-ext-1
 to 0777

For example in - 
https://builds.apache.org/job/PreCommit-HIVE-Build/813/testReport/org.apache.hadoop.hive.ql.parse/TestParseNegative/testParseNegative_ambiguous_join_col/


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive 
 creates this directory it doesn't set any explicit permissions on it. This 
 means that if you have the default HDFS umask setting of 022, these 
 directories end up being world-readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world-readable.
 This can cause a potential leak of data, especially when operating on a 
 Kerberos-enabled cluster. Hive should probably default these directories to 
 being readable only by the owner.
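 For example, under the default umask of 022 a new directory ends up 
 world-readable unless its permissions are set explicitly. A minimal sketch of 
 the general idea (the path is just the default pattern, not the committed fix):
 {code}
 hadoop fs -mkdir /tmp/hive-alice        # typically created as 755 under umask 022
 hadoop fs -chmod 700 /tmp/hive-alice    # explicit owner-only permissions
 {code}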

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: (was: HIVE-3764-trunk.2.patch)

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764.1.patch, HIVE-3764.2.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: (was: HIVE-3764-12.2.patch)

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764.1.patch, HIVE-3764.2.patch, 
 HIVE-3764-trunk.2.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: HIVE-3764-12.3.patch

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764-12.3.patch, HIVE-3764.1.patch, 
 HIVE-3764.2.patch, HIVE-3764-trunk.3.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3764) Support metastore version consistency check

2013-09-19 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-3764:
--

Attachment: HIVE-3764-trunk.3.patch

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-3764-12.3.patch, HIVE-3764.1.patch, 
 HIVE-3764.2.patch, HIVE-3764-trunk.3.patch


 Today there's no version/compatibility information stored in the Hive metastore. 
 Also, the DataNucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer Hive, 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema option is not always sufficient to 
 upgrade the metastore when migrating to a newer release: it's not supported with 
 all databases, and the migration often involves altering existing tables, 
 changing or moving data, etc.
 Hence it's very useful to have a consistency check to make sure that Hive 
 is using the correct metastore, and that for production systems the schema is 
 not modified automatically by running Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5311) TestHCatPartitionPublish can fail randomly

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771712#comment-13771712
 ] 

Hudson commented on HIVE-5311:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2341/])
HIVE-5311 : TestHCatPartitionPublish can fail randomly (Brock Noland via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524515)
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitionPublish.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java


 TestHCatPartitionPublish can fail randomly
 --

 Key: HIVE-5311
 URL: https://issues.apache.org/jira/browse/HIVE-5311
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5311.patch


 {noformat}
 org.apache.thrift.TApplicationException: 
 create_table_with_environment_context failed: out of sequence response
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
   at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:793)
   at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:779)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471)
   at 
 org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.createTable(TestHCatPartitionPublish.java:241)
   at 
 org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:133)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771713#comment-13771713
 ] 

Hudson commented on HIVE-4487:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2341/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java
HIVE-4487 - Hive does not set explicit permissions on hive.exec.scratchdir 
(Chaoyu Tang via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524509)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive 
 creates this directory it doesn't set any explicit permissions on it. This 
 means that if you have the default HDFS umask setting of 022, these 
 directories end up being world-readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world-readable.
 This can cause a potential leak of data, especially when operating on a 
 Kerberos-enabled cluster. Hive should probably default these directories to 
 being readable only by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771711#comment-13771711
 ] 

Hudson commented on HIVE-5313:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2341/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
 -

 Key: HIVE-5313
 URL: https://issues.apache.org/jira/browse/HIVE-5313
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-5313.patch


 As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
 to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771710#comment-13771710
 ] 

Hudson commented on HIVE-5198:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2341/])
HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman 
via Thejas Nair) (thejas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524617)
* 
/hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java


 WebHCat returns exitcode 143 (w/o an explanation)
 -

 Key: HIVE-5198
 URL: https://issues.apache.org/jira/browse/HIVE-5198
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.11.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5198.patch


 The message might look like this:
 {"statement":"use default; show table extended like xyz;","error":"unable to 
 show table: xyz","exec":{"stdout":"","stderr":"","exitcode":143}}
 WebHCat has a templeton.exec.timeout property which kills an HCat request 
 (i.e. something like a DDL statement that gets routed to the HCat CLI) if it 
 takes longer than this timeout.
 Since WebHCat does a fork/exec of the 'hcat' script, the timeout is implemented 
 as a SIGTERM sent to the subprocess. The SIGTERM value is 15, so the exit code 
 is reported as 128 + 15 = 143.
 Error logging/reporting should be improved in this case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5032) Enable hive creating external table at the root directory of DFS

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771763#comment-13771763
 ] 

Hive QA commented on HIVE-5032:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603946/HIVE-5032.2.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1241 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2

[jira] [Updated] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column

2013-09-19 Thread Neha Tomar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Tomar updated HIVE-5319:
-

Labels: avro  (was: )

 Executing SELECT on an AVRO table fails after executing ALTER to modify type 
 of an existing column
 --

 Key: HIVE-5319
 URL: https://issues.apache.org/jira/browse/HIVE-5319
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
 Environment: Linux Ubuntu
Reporter: Neha Tomar
  Labels: avro

 1. Created a table in Hive with Avro data.
   CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp 
 bigint)
 COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS'
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION '/home/neha/test_data/avro_create_data'
 TBLPROPERTIES 
 ('avro.schema.literal'='{"namespace":"com.miguno.avro","name":"Tweet","type":"record","fields":[
  {"name" : "username","type" : "string","doc" : "Name of the user account on 
 Twitter.com"},{"name" : "tweet","type":"string","doc" : "The content of the 
 Twitter message"}, {"name" : "timestamp", "type" : "long", "doc" : "Unix 
 epoch time in seconds"}]}');
 2. Altered the type of a column (to a compatible type) using ALTER TABLE. In 
 this example, altered the type of column timestamp from long to int.
   ALTER TABLE tweets SET TBLPROPERTIES 
 ('avro.schema.literal'='{"namespace":"com.miguno.avro","name":"Tweet","type":"record","fields":[
  {"name" : "username","type" : "string","doc" : "Name of the user account on 
 Twitter.com"},{"name" : "tweet","type":"string","doc" : "The content of the 
 Twitter message"}, {"name" : "timestamp", "type" : "int", "doc" : "Unix epoch 
 time in seconds"}]}');
 3. Now, a select query on this table fails with the following error.
 hive> select * from tweets;
 OK
 Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: 
 Found long, expecting int
 Time taken: 4.514 seconds
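 Avro's schema resolution allows promoting int to long but not long to int, so 
 reading data written as long with a reader schema of int fails exactly like 
 this. One way to confirm what was actually written (a sketch; the avro-tools 
 version and data file name are placeholders):
 {code}
 # print the writer schema embedded in the Avro data file
 java -jar avro-tools-1.7.4.jar getschema /home/neha/test_data/avro_create_data/tweets.avro
 {code}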

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column

2013-09-19 Thread Neha Tomar (JIRA)
Neha Tomar created HIVE-5319:


 Summary: Executing SELECT on an AVRO table fails after executing 
ALTER to modify type of an existing column
 Key: HIVE-5319
 URL: https://issues.apache.org/jira/browse/HIVE-5319
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
 Environment: Linux Ubuntu
Reporter: Neha Tomar


1. Created a table in Hive with Avro data.
CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp 
bigint)
COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/home/neha/test_data/avro_create_data'
TBLPROPERTIES 
('avro.schema.literal'='{"namespace":"com.miguno.avro","name":"Tweet","type":"record","fields":[
 {"name" : "username","type" : "string","doc" : "Name of the user account on 
Twitter.com"},{"name" : "tweet","type":"string","doc" : "The content of the 
Twitter message"}, {"name" : "timestamp", "type" : "long", "doc" : "Unix epoch 
time in seconds"}]}');

2. Altered the type of a column (to a compatible type) using ALTER TABLE. In this 
example, altered the type of column timestamp from long to int.

ALTER TABLE tweets SET TBLPROPERTIES 
('avro.schema.literal'='{"namespace":"com.miguno.avro","name":"Tweet","type":"record","fields":[
 {"name" : "username","type" : "string","doc" : "Name of the user account on 
Twitter.com"},{"name" : "tweet","type":"string","doc" : "The content of the 
Twitter message"}, {"name" : "timestamp", "type" : "int", "doc" : "Unix epoch 
time in seconds"}]}');

3. Now, a select query on this table fails with the following error.

hive> select * from tweets;
OK
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: 
Found long, expecting int
Time taken: 4.514 seconds







--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column

2013-09-19 Thread Neha Tomar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771780#comment-13771780
 ] 

Neha Tomar commented on HIVE-5319:
--

Pasting the exception trace below.

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: 
Found long, expecting int
13/09/19 16:03:46 ERROR CliDriver: Failed with exception 
java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting 
int
java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting 
int
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.avro.AvroTypeException: Found long, expecting int
at 
org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at 
org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:82)
at 
org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:341)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:146)
at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
at 
org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:140)
at 
org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:49)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:514)
... 13 more


 Executing SELECT on an AVRO table fails after executing ALTER to modify type 
 of an existing column
 --

 Key: HIVE-5319
 URL: https://issues.apache.org/jira/browse/HIVE-5319
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
 Environment: Linux Ubuntu
Reporter: Neha Tomar
  Labels: avro

 1. Created a table in Hive with Avro data:
   CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp 
 bigint)
 COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS'
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION '/home/neha/test_data/avro_create_data'
 TBLPROPERTIES 
 ('avro.schema.literal'='{"namespace": "com.miguno.avro", "name": "Tweet", 
 "type": "record", "fields": [
  {"name": "username", "type": "string", "doc": "Name of the user account on 
 Twitter.com"}, {"name": "tweet", "type": "string", "doc": "The content of the 
 Twitter message"}, {"name": "timestamp", "type": "long", "doc": "Unix 
 epoch time in seconds"}]}');
 2. Altered the type of a column (to a compatible type) using ALTER TABLE. In 
 this example, altered the type of column timestamp from long to int:
   ALTER TABLE tweets SET TBLPROPERTIES 
 ('avro.schema.literal'='{"namespace": "com.miguno.avro", "name": "Tweet", 
 "type": "record", "fields": [
  {"name": "username", "type": "string", "doc": "Name of the user account on 
 Twitter.com"}, {"name": "tweet", "type": "string", "doc": "The content of the 
 Twitter message"}, {"name": "timestamp", "type": "int", "doc": "Unix epoch 
 time in seconds"}]}');
 3. Now a select query on this table fails with the following error:
 hive> select * from tweets;
 OK
 Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: 
 Found long, expecting int
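
A standalone sketch that reproduces the same failure outside Hive (this 
illustrates Avro's schema-resolution rules, not Hive's actual read path; the 
class name and one-field schema are invented for the demo). Writing a long 
field and then reading it with an int reader schema is not a legal Avro 
promotion (only int to long is), so the read throws exactly this exception:
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroNarrowingDemo {
  public static void main(String[] args) throws Exception {
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":"
            + "[{\"name\":\"timestamp\",\"type\":\"long\"}]}");
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":"
            + "[{\"name\":\"timestamp\",\"type\":\"int\"}]}");

    // Write one record with the original (long) schema.
    GenericRecord rec = new GenericData.Record(writerSchema);
    rec.put("timestamp", 1379548800L);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(rec, enc);
    enc.flush();

    // Read it back with the altered (int) schema. long -> int is a
    // narrowing, which Avro rejects during schema resolution:
    // org.apache.avro.AvroTypeException: Found long, expecting int
    BinaryDecoder dec = DecoderFactory.get()
        .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
    new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
        .read(null, dec);
  }
}
{code}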
 

[jira] [Commented] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771809#comment-13771809
 ] 

Hive QA commented on HIVE-5309:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603954/HIVE-5309.1-vectorization.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 3955 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hcatalog.listener.TestMsgBusConnection.testConnection
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
org.apache.hive.hcatalog.mapreduce.TestHCatExternalPartitioned.testHCatPartitionedTable
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/820/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/820/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

 Update hive-default.xml.template for vectorization flag; remove unused 
 imports from MetaStoreUtils.java
 ---

 Key: HIVE-5309
 URL: https://issues.apache.org/jira/browse/HIVE-5309
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5309.1-vectorization.patch, 
 HIVE-5309.1.vectorization.patch


 This jira provides fixes for some of the review comments on HIVE-5283:
 1) Update hive-default.xml.template for the vectorization flag.
 2) Remove unused imports from MetaStoreUtils.
 3) Add a test that runs vectorization with a non-ORC format. The test must 
 still pass because the vectorization optimization should fall back to 
 non-vector mode.
 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Operators && and || do not work

2013-09-19 Thread amareshwari sriramdasu
Hello,

Though the documentation at
https://cwiki.apache.org/Hive/languagemanual-udf.html says they are the same as
AND and OR, they do not even get parsed; users get parse errors when they are
used. Was that intentional, or is it a regression?

hive> select key from src where key=a || key =b;
FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in
expression specification

hive> select key from src where key=a && key =b;
FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in
expression specification

Thanks
Amareshwari


[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support

2013-09-19 Thread Fusheng Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771817#comment-13771817
 ] 

Fusheng Wang commented on HIVE-5168:


A draft design document has been uploaded:
https://cwiki.apache.org/confluence/display/Hive/Spatial+queries
 

 Extend Hive for spatial query support
 -

 Key: HIVE-5168
 URL: https://issues.apache.org/jira/browse/HIVE-5168
 Project: Hive
  Issue Type: New Feature
Reporter: Fusheng Wang
  Labels: Hadoop-GIS, Spatial,

 I would like to propose incorporating a newly developed spatial querying 
 component into Hive.
 We have recently developed Hadoop-GIS, a high-performance MapReduce-based 
 spatial querying system for large-scale spatial queries and analytics. 
 Hadoop-GIS is a scalable, high-performance spatial data warehousing system 
 for running large-scale spatial queries on Hadoop. Hadoop-GIS supports 
 multiple types of spatial queries on MapReduce through space partitioning, 
 the customizable spatial query engine RESQUE, implicit parallel spatial query 
 execution on MapReduce, and effective methods for amending query results 
 through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of 
 global partition indexing and customizable on-demand local spatial indexing 
 to achieve efficient query processing. Hadoop-GIS is integrated into Hive to 
 support declarative spatial queries with an integrated architecture. 
 We have an alpha release and look forward to contributions from the Hive 
 community. 
 github: https://github.com/hadoop-gis
 Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS
 References:
 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong 
 Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing 
 System Over MapReduce. In Proceedings of the 39th International Conference on 
 Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
 http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf
 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High 
 Performance Spatial Query System for Large Scale Medical Imaging Data. In 
 Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances 
 in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, 
 California, USA, November 6-9, 2012. 
 http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Status: Open  (was: Patch Available)

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less:
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 That is about 12% of the data read by the COUNT(1).
 This was tracked down to the following code in RCFile.java:
 {code}
   } else {
 // TODO: if no column name is specified, e.g. in select count(1) from tt;
 // skip all columns; this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771847#comment-13771847
 ] 

Yin Huai commented on HIVE-4113:


READ_ALL_COLUMNS and READ_ALL_COLUMNS_DEFAULT are mainly created for HCat, 
because I think it is a burden on users if they have to be aware of 
ColumnProjectionUtils and use it every time. So, through HCat, if users do not 
use ColumnProjectionUtils to set the needed columns, we will read all columns. 
If we set READ_ALL_COLUMNS_DEFAULT=false, no column will be read when a user 
does not use ColumnProjectionUtils.

In Hive, if we get rid of the column-pruning flag, the list of 
neededColumnIDs in TS will not be null. Thus, in Hive, we will always set 
READ_ALL_COLUMNS to false (the .2 patch has an issue with this... I will fix it 
later).

In summary, in Hive, we use neededColumnIDs in TS as the only way to tell an 
underlying record reader what to read. If neededColumnIDs is an empty list, we 
know that no column is needed. Otherwise, we will read the columns specified in 
neededColumnIDs (if we have select * in a sub-query, neededColumnIDs should be 
populated to include all columns).

In HCat, if a user wants to use the MapReduce interface, he or she has two ways 
to tell us what columns are needed, as sketched below. 1) The user does 
nothing; in this case, we will read all columns. 2) The user uses the utility 
functions in ColumnProjectionUtils (e.g. setReadColumnIDs) to specify the 
needed columns. In this case, READ_ALL_COLUMNS will be set to false and we will 
only read the columns specified in READ_COLUMN_IDS_CONF_STR.

I hope what I am proposing makes sense. Any suggestions are welcome :)
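
A minimal sketch of case 2, assuming the setReadColumnIDs(Configuration, 
List<Integer>) helper as it exists before this patch (exact helper names may 
change with the .2 patch):
{code}
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;

public class ProjectionExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Ask the underlying record reader to materialize only columns 0 and 2.
    // This populates READ_COLUMN_IDS_CONF_STR; a reader that honors it
    // skips every other column.
    ColumnProjectionUtils.setReadColumnIDs(conf, Arrays.asList(0, 2));
    System.out.println(conf.get("hive.io.file.readcolumn.ids"));
  }
}
{code}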

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less:
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 That is about 12% of the data read by the COUNT(1).
 This was tracked down to the following code in RCFile.java:
 {code}
   } else {
 // TODO: if no column name is specified, e.g. in select count(1) from tt;
 // skip all columns; this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771887#comment-13771887
 ] 

Hive QA commented on HIVE-5306:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603955/HIVE-5306.4.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1245 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2

[jira] [Created] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-19 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-5320:
-

 Summary: Querying a table with nested struct type over JSON data 
results in errors
 Key: HIVE-5320
 URL: https://issues.apache.org/jira/browse/HIVE-5320
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Chaoyu Tang


Querying a table with a nested struct datatype like
==
create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 
'org.openx.data.jsonserde.JsonSerDe'; 
==
over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
corrupted data. 
The JsonSerDe used is 
json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.

The cause is that the method
public List<Object> getStructFieldsDataAsList(Object o) 
in JsonStructObjectInspector.java 
returns a list referencing a static ArrayList 'values'.
So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe 
class holds the same reference across its recursive calls, and its element 
values keep being overwritten in the STRUCT case.

Solutions:
1. Fix in JsonSerDe: change the field 'values' in 
java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
to instance scope.
Filed a ticket with JsonSerDe 
(https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
2. Ideally, in the serialize method of LazySimpleSerDe, we should 
defensively save a copy of the list returned by list = 
soi.getStructFieldsDataAsList(obj), where soi is an instance of 
JsonStructObjectInspector, so that the recursive calls of serialize can work 
properly regardless of the extended SerDe implementation.
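
A minimal sketch of the defensive copy described in solution 2 (the class and 
method names below are hypothetical illustrations, not the actual Hive patch):
{code}
import java.util.ArrayList;
import java.util.List;

final class StructFieldSnapshot {
  // Snapshot the field data before recursing: the list handed back by the
  // ObjectInspector may be backed by a buffer that is reused (and thus
  // overwritten) on every subsequent getStructFieldsDataAsList() call.
  static List<Object> snapshot(List<Object> fields) {
    return fields == null ? null : new ArrayList<Object>(fields);
  }
}
{code}
Inside serialize, this would amount to list = new 
ArrayList<Object>(soi.getStructFieldsDataAsList(obj)) before recursing into 
the struct's fields.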

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771894#comment-13771894
 ] 

Alan Gates commented on HIVE-5317:
--

The only requirement is that the file format must be able to support a rowid.  
With things like text and sequence files, this can be done via a byte offset 
(see the sketch below).

I'm not seeing why this falls apart with file-based authorization.  Are you 
worried that different users will own the base and delta files?  It's no 
different from the current case where different users may own different 
partitions.  We will need to make sure the compactions can still happen in 
this case, that is, that the compaction can run as the user who owns the 
table, not as Hive.
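
A hypothetical illustration of such a synthetic row id (the class is invented 
here, not part of Hive): the containing file plus the record's byte offset is 
enough to identify a row in a text or sequence file.
{code}
// Hypothetical value class: identifies a row in a format without a
// native rowid by (file, byte offset).
public final class SyntheticRowId {
  public final String file;  // base or delta file containing the row
  public final long offset;  // byte offset of the record in that file

  public SyntheticRowId(String file, long offset) {
    this.file = file;
    this.offset = offset;
  }

  @Override
  public String toString() {
    return file + ":" + offset;
  }
}
{code}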

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from an RDBMS and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table

2013-09-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771904#comment-13771904
 ] 

Chaoyu Tang commented on HIVE-4223:
---

I was able to reproduce a similar issue with JsonSerDe 1.1.4 
(json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar). See HIVE-5320 for 
details.


 LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of 
 hive table
 

 Key: HIVE-4223
 URL: https://issues.apache.org/jira/browse/HIVE-4223
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
 Environment: Hive 0.9.0
Reporter: Yong Zhang
 Attachments: nest_struct.data


 The LazySimpleSerDe will throw IndexOutOfBoundsException if the column 
 structure is a struct containing an array of structs. 
 I have a table with one column defined like this:
 columnA
 array<
  struct<
col1:primiType,
col2:primiType,
col3:primiType,
col4:primiType,
col5:primiType,
col6:primiType,
col7:primiType,
col8:array<
  struct<
col1:primiType,
col2:primiType,
col3:primiType,
col4:primiType,
col5:primiType,
col6:primiType,
col7:primiType,
col8:primiType,
col9:primiType
  >
>
  >
 >
 In this example, the outer struct has 8 columns (including the array), and 
 the inner struct has 9 columns. As long as the outer struct has FEWER 
 columns than the inner struct, I think we will get the following 
 exception and stack trace in LazySimpleSerDe when it tries to serialize a row:
 Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
 ... 9 more
 I am not very sure about the exact reason for this problem. I believe that 
 public static void serialize(ByteStream.Output out, Object obj, 
 ObjectInspector objInspector, byte[] separators, int level, Text 
 nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) 
 recursively invokes itself when facing a nested structure. But for the nested 
 struct structure, the list reference gets messed up, and size() returns 
 wrong data.
 In the example case I faced, for these 2 lines:
   List<? extends StructField> fields = soi.getAllStructFieldRefs();
   list = soi.getStructFieldsDataAsList(obj);
 my StructObjectInspector (soi) returns the CORRECT data from the 
 getAllStructFieldRefs() and getStructFieldsDataAsList() methods. For example, 
 for one row with the outer 8-column struct, I have 2 elements in the 
 inner array of structs, and each element has 9 columns (as there are 9 
 columns in the inner struct). At runtime, after I added more logging to 
 LazySimpleSerDe, I see the following behavior in the logging:
 for 8 outside columns, loop
 for 9 inside columns, loop for serialize
 for 9 inside columns, loop for serialize
 code breaks here: in the outer loop, it tries to access the 9th 
 element, which does not exist 
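
A tiny self-contained illustration (hypothetical class, not Hive code) of why 
a shared buffer breaks recursive serialization: both calls below return the 
same list, so the outer caller's view is silently clobbered and any index 
bookkeeping based on the stale size can throw IndexOutOfBoundsException.
{code}
import java.util.ArrayList;
import java.util.List;

public class ReusedBufferDemo {
  // One shared buffer, mimicking the static 'values' list described above.
  static final List<Object> VALUES = new ArrayList<Object>();

  static List<Object> getStructFieldsDataAsList(Object... fields) {
    VALUES.clear();
    for (Object f : fields) {
      VALUES.add(f);
    }
    return VALUES; // caller receives a reference to the shared buffer
  }

  public static void main(String[] args) {
    List<Object> outer = getStructFieldsDataAsList("a", "b");
    List<Object> inner = getStructFieldsDataAsList(1, 2, 3);
    // 'outer' now also sees [1, 2, 3]: its size changed from 2 to 3
    // behind the caller's back.
    System.out.println(outer == inner); // true
    System.out.println(outer);          // [1, 2, 3]
  }
}
{code}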

[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771905#comment-13771905
 ] 

Ashutosh Chauhan commented on HIVE-4113:


Sounds good to me. Go ahead and make the changes.

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less:
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 That is about 12% of the data read by the COUNT(1).
 This was tracked down to the following code in RCFile.java:
 {code}
   } else {
 // TODO: if no column name is specified, e.g. in select count(1) from tt;
 // skip all columns; this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5271) Convert join op to a map join op in the planning phase

2013-09-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771914#comment-13771914
 ] 

Ashutosh Chauhan commented on HIVE-5271:


Couple of suggestions:
* Seems like the changes in MapRedTask are unintentional.
* Instead of modifying existing test files, I recommend creating new test 
cases with names like tez_*

 Convert join op to a map join op in the planning phase
 --

 Key: HIVE-5271
 URL: https://issues.apache.org/jira/browse/HIVE-5271
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5271.WIP.patch


 This captures the planning changes required in hive to support hash joins. We 
 need to convert the join operator to a map join operator. This is hooked into 
 the infrastructure provided by HIVE-5095.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-19 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-5320:
--

Attachment: HIVE-5320.patch

 Querying a table with nested struct type over JSON data results in errors
 -

 Key: HIVE-5320
 URL: https://issues.apache.org/jira/browse/HIVE-5320
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Chaoyu Tang
 Attachments: HIVE-5320.patch


 Querying a table with a nested struct datatype like
 ==
 create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
 a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 
 'org.openx.data.jsonserde.JsonSerDe'; 
 ==
 over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
 corrupted data. 
 The JsonSerDe used is 
 json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
 The cause is that the method
 public List<Object> getStructFieldsDataAsList(Object o) 
 in JsonStructObjectInspector.java 
 returns a list referencing a static ArrayList 'values'.
 So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe 
 class holds the same reference across its recursive calls, and its element 
 values keep being overwritten in the STRUCT case.
 Solutions:
 1. Fix in JsonSerDe: change the field 'values' in 
 java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
 to instance scope.
 Filed a ticket with JsonSerDe 
 (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
 2. Ideally, in the serialize method of LazySimpleSerDe, we should 
 defensively save a copy of the list returned by list = 
 soi.getStructFieldsDataAsList(obj), where soi is an instance of 
 JsonStructObjectInspector, so that the recursive calls of serialize can work 
 properly regardless of the extended SerDe implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Operators && and || do not work

2013-09-19 Thread Ashutosh Chauhan
I have not tested it on historical versions, so I don't know on which
versions it used to work (if ever), but possibly the antlr upgrade [1]
impacted this.

[1] : https://issues.apache.org/jira/browse/HIVE-2439

Ashutosh


On Thu, Sep 19, 2013 at 4:52 AM, amareshwari sriramdasu 
amareshw...@gmail.com wrote:

 Hello,

 Though the documentation at
 https://cwiki.apache.org/Hive/languagemanual-udf.html says they are the same
 as
 AND and OR, they do not even get parsed; users get parse errors when they are
 used. Was that intentional, or is it a regression?

 hive> select key from src where key=a || key =b;
 FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in
 expression specification

 hive> select key from src where key=a && key =b;
 FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in
 expression specification

 Thanks
 Amareshwari



[jira] [Assigned] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-19 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang reassigned HIVE-5320:
-

Assignee: Chaoyu Tang

 Querying a table with nested struct type over JSON data results in errors
 -

 Key: HIVE-5320
 URL: https://issues.apache.org/jira/browse/HIVE-5320
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-5320.patch


 Querying a table with a nested struct datatype like
 ==
 create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
 a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 
 'org.openx.data.jsonserde.JsonSerDe'; 
 ==
 over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
 corrupted data. 
 The JsonSerDe used is 
 json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
 The cause is that the method
 public List<Object> getStructFieldsDataAsList(Object o) 
 in JsonStructObjectInspector.java 
 returns a list referencing a static ArrayList 'values'.
 So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe 
 class holds the same reference across its recursive calls, and its element 
 values keep being overwritten in the STRUCT case.
 Solutions:
 1. Fix in JsonSerDe: change the field 'values' in 
 java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
 to instance scope.
 Filed a ticket with JsonSerDe 
 (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
 2. Ideally, in the serialize method of LazySimpleSerDe, we should 
 defensively save a copy of the list returned by list = 
 soi.getStructFieldsDataAsList(obj), where soi is an instance of 
 JsonStructObjectInspector, so that the recursive calls of serialize can work 
 properly regardless of the extended SerDe implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771916#comment-13771916
 ] 

Chaoyu Tang commented on HIVE-5320:
---

Please review the attached patch for the fix.

 Querying a table with nested struct type over JSON data results in errors
 -

 Key: HIVE-5320
 URL: https://issues.apache.org/jira/browse/HIVE-5320
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-5320.patch


 Querying a table with a nested struct datatype like
 ==
 create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
 a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 
 'org.openx.data.jsonserde.JsonSerDe'; 
 ==
 over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
 corrupted data. 
 The JsonSerDe used is 
 json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
 The cause is that the method
 public List<Object> getStructFieldsDataAsList(Object o) 
 in JsonStructObjectInspector.java 
 returns a list referencing a static ArrayList 'values'.
 So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe 
 class holds the same reference across its recursive calls, and its element 
 values keep being overwritten in the STRUCT case.
 Solutions:
 1. Fix in JsonSerDe: change the field 'values' in 
 java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
 to instance scope.
 Filed a ticket with JsonSerDe 
 (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
 2. Ideally, in the serialize method of LazySimpleSerDe, we should 
 defensively save a copy of the list returned by list = 
 soi.getStructFieldsDataAsList(obj), where soi is an instance of 
 JsonStructObjectInspector, so that the recursive calls of serialize can work 
 properly regardless of the extended SerDe implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-09-19 Thread Johndee Burks (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771940#comment-13771940
 ] 

Johndee Burks commented on HIVE-2615:
-

An example of what the cast would look like: 

create table new_table as select column, cast(null as type) column_name 
from table_name;

create table null_test as select user, cast(null as bigint) test from a;

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-2615.1.patch


 Create the table with a column that always contains NULL:
 {quote}
 hive> create table bad as select 1 x, null z from dual; 
 {quote}
 Because there's no type, Hive gives it the VOID type:
 {quote}
 hive> describe bad;
 OK
 x int 
 z void
 {quote}
 This seems weird, because AFAIK, there is no normal way to create a column of 
 type VOID.  The problem is that the table can't be queried:
 {quote}
 hive> select * from bad;
 OK
 Failed with exception java.io.IOException:java.lang.RuntimeException: 
 Internal error: no LazyObject for VOID
 {quote}
 Worse, even if you don't select that field, the query fails at runtime:
 {quote}
 hive> select x from bad;
 ...
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data

2013-09-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771970#comment-13771970
 ] 

Edward Capriolo commented on HIVE-5302:
---

[~mwagner] [~busbey] Can the two of you come to a consensus as to whether the 
bug still exists? 

[~ashutoshc] I understand your concern about bloating the plan; however, the 
plan is fairly ephemeral and changes quite often. If we can confirm the issue, 
this is surely a 0.12 blocker. You have mentioned that you would like to see 
this issue resolved a different way. Without a concrete suggestion as to what 
the better way might be, we are at a standstill.

I do not think we want to hold up 0.12 longer than we need to, and I do not 
think we want Avro broken. Does anyone want to add anything? If not, I am +1 on 
this patch.

 PartitionPruner fails on Avro non-partitioned data
 --

 Key: HIVE-5302
 URL: https://issues.apache.org/jira/browse/HIVE-5302
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
  Labels: avro
 Attachments: HIVE-5302.1-branch-0.12.patch.txt, 
 HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt


 While updating HIVE-3585, I found a test case that causes the failure in the 
 MetaStoreUtils partition retrieval from back in HIVE-4789.
 In this case, the failure is triggered when the partition pruner is handed a 
 non-partitioned table and has to construct a pseudo-partition,
 e.g.
 {code}
   INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col 
 FROM non_partitioned_table WHERE col = 9;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771954#comment-13771954
 ] 

Brock Noland commented on HIVE-4487:


Full error message from: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-813/failed/TestParseNegative/hive.log

{noformat}
20_510_7475863120290716577-1/-ext-1 to 0777
java.io.IOException: Failed to set permissions of path: 
/home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/build/ql/scratchdir/hive_2013-09-18_19-22-20_510_7475863120290716577-1/-ext-1
 to 0777
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at 
org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at 
org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:217)
at 
org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
at org.apache.hadoop.hive.ql.exec.CopyTask.execute(CopyTask.java:74)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1415)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1193)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1021)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
at org.apache.hadoop.hive.ql.QTestUtil.runLoadCmd(QTestUtil.java:539)
at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:586)
at org.apache.hadoop.hive.ql.QTestUtil.init(QTestUtil.java:678)
at 
org.apache.hadoop.hive.ql.parse.TestParseNegative.runTest(TestParseNegative.java:248)
at 
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_ambiguous_join_col(TestParseNegative.java:117)
{noformat}

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.
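
A minimal sketch of the kind of explicit tightening described here, using only 
long-standing Hadoop FileSystem calls (an illustration of the idea, not the 
exact HIVE-4487 patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirPermissions {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path scratch = new Path("/tmp/hive-" + System.getProperty("user.name"));
    // Create the scratch dir, then set owner-only permissions explicitly
    // instead of inheriting whatever the 022 umask leaves world-readable.
    fs.mkdirs(scratch);
    fs.setPermission(scratch, new FsPermission((short) 0700));
  }
}
{code}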

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771971#comment-13771971
 ] 

Hive QA commented on HIVE-5202:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603964/HIVE-5202.2.patch.txt

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1241 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2

[jira] [Updated] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-5317:


Attachment: InsertUpdatesinHive.pdf

Here are my thoughts about how it can be approached.

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from an RDBMS and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How long will we support Hadoop 0.20.2?

2013-09-19 Thread Owen O'Malley
+1 to dropping Hadoop 0.20.2 support in Hive 0.13. Given that Hive
0.12 has just branched, Hive 0.13 isn't likely to come out
in the next 6 months.

-- Owen


On Thu, Sep 19, 2013 at 8:35 AM, Brock Noland br...@cloudera.com wrote:

 First off, I have to apologize; I didn't know there would be such
 passion on both sides of the 0.20.2 argument!

 On Thu, Sep 19, 2013 at 10:11 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  That rant being done,

 No worries man, Hadoop versions are something worth ranting about.
 IMHO Hadoop has a history of changing APIs and breaking end users.
 However, I feel this is improving.

  we can not and should not support hadoop 0.20.2
  forever. Discontinuing hadoop 0.20.2 in say 6 months might be reasonable,
  but I think dropping it on the floor due to a one line change for a
 missing
  convenience constructor is a bit knee-jerk.

 Very sorry if I came across with the opinion that we should drop
 0.20.2 now because of the constructor issue. The issue brought up
 0.20.2's age in my mind and the logical next step is to ask how long
 we plan on supporting it! :) I like the time bounding idea and I feel
 6 months is reasonable. FWIW, the 1.X series is stable for my needs.

 Brock



[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771952#comment-13771952
 ] 

Brock Noland commented on HIVE-4487:


Very strange. I don't see why this would be occurring, since hiveptest owns 
everything in /home/hiveptest/. It's not a privileged user, so it cannot 
change ownership. The only way I can see that happening is if 
hive_2013-09-18_19-22-30_852_73877859563099-1 somehow got created with 
000 (or anything but 700).

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5311) TestHCatPartitionPublish can fail randomly

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772065#comment-13772065
 ] 

Hudson commented on HIVE-5311:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/])
HIVE-5311 : TestHCatPartitionPublish can fail randomly (Brock Noland via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524515)
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitionPublish.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java


 TestHCatPartitionPublish can fail randomly
 --

 Key: HIVE-5311
 URL: https://issues.apache.org/jira/browse/HIVE-5311
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5311.patch


 {noformat}
 org.apache.thrift.TApplicationException: 
 create_table_with_environment_context failed: out of sequence response
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
   at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:793)
   at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:779)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471)
   at 
 org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.createTable(TestHCatPartitionPublish.java:241)
   at 
 org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:133)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772064#comment-13772064
 ] 

Hudson commented on HIVE-5313:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
 -

 Key: HIVE-5313
 URL: https://issues.apache.org/jira/browse/HIVE-5313
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-5313.patch


 As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
 to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772086#comment-13772086
 ] 

Bikas Saha commented on HIVE-5317:
--

Some questions which I am sure have been considered but are not clear in the 
document.
Should the metastore heartbeat come from the job itself rather than the client, 
since the job is the source of truth and the client can disappear? What happens 
if the client disappears but the job completes successfully and manages to 
promote the output files?
Is the transaction id per file or per metastore? Where does the metastore 
recover the last transaction id(s) from after a restart?

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772066#comment-13772066
 ] 

Hudson commented on HIVE-4487:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java
HIVE-4487 - Hive does not set explicit permissions on hive.exec.scratchdir 
(Chaoyu Tang via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524509)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.
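
For illustration, a minimal sketch (not the committed HIVE-4487 patch; the class 
name and path handling are assumptions) of giving the scratch directory an 
explicit owner-only permission instead of inheriting the filesystem umask:

{code}
// Minimal sketch: tighten the scratch dir to 0700. Uses the short-based
// FsPermission constructor, which exists on Hadoop 0.20.2 (per HIVE-5313,
// the String-based constructor does not).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirPermissionSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path scratch = new Path("/tmp/hive-" + System.getProperty("user.name"));
    FileSystem fs = scratch.getFileSystem(conf);
    if (!fs.exists(scratch)) {
      fs.mkdirs(scratch);
    }
    // setPermission applies the mode verbatim, unlike mkdirs(path, perm),
    // which is still subject to the server-side umask.
    fs.setPermission(scratch, new FsPermission((short) 0700)); // rwx------
  }
}
{code}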

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772046#comment-13772046
 ] 

stack commented on HIVE-5317:
-

[~alangates]

Looks like a bunch of hbase primitives done as mapreduce jobs.

At first blush, on 1., Percolator would be a bunch of work, but it looks like 
less than what is proposed here (would you need Percolator given you write the 
transaction id into the row?).  On 2., if hbase were made to write ORC, couldn't 
you MR the files hbase writes after asking hbase to snapshot?

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772063#comment-13772063
 ] 

Hudson commented on HIVE-5198:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/])
HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman 
via Thejas Nair) (thejas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524617)
* 
/hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java


 WebHCat returns exitcode 143 (w/o an explanation)
 -

 Key: HIVE-5198
 URL: https://issues.apache.org/jira/browse/HIVE-5198
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.11.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5198.patch


 The message might look like this:
 {"statement":"use default; show table extended like xyz;","error":"unable to 
 show table: xyz","exec":{"stdout":"","stderr":"","exitcode":"143"}} 
 WebHCat has a templeton.exec.timeout property which kills an HCat request 
 (i.e. something like a DDL statement that gets routed to HCat CLI) if it 
 takes longer than this timeout.
 Since WebHCat does a fork/exec of the 'hcat' script, the timeout is implemented 
 as a SIGTERM sent to the subprocess.  SIGTERM's value is 15, so it's reported as 
 128 + 15 = 143.
 Error logging/reporting should be improved in this case.
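
As a hedged illustration of the 128 + signal convention above (a sketch, not 
WebHCat's actual error handling), a caller could decode such an exit code like 
this:

{code}
// Minimal sketch: decode an exit status where values above 128 conventionally
// mean the child was killed by signal number (status - 128).
public class ExitCodeSketch {
  static String describe(int exitCode) {
    if (exitCode > 128) {
      int signal = exitCode - 128; // 143 - 128 = 15 = SIGTERM
      return "killed by signal " + signal
          + (signal == 15 ? " (SIGTERM, e.g. templeton.exec.timeout)" : "");
    }
    return exitCode == 0 ? "success" : "exited with code " + exitCode;
  }

  public static void main(String[] args) {
    System.out.println(describe(143)); // killed by signal 15 (SIGTERM, ...)
  }
}
{code}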

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim

2013-09-19 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HIVE-5070:
--

Summary: Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 
shim  (was: Need to implement listLocatedStatus() in ProxyFileSystem)

 Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim
 --

 Key: HIVE-5070
 URL: https://issues.apache.org/jira/browse/HIVE-5070
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.12.0
Reporter: shanyu zhao
 Fix For: 0.13.0

 Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch, 
 HIVE-5070-v3.patch


 MAPREDUCE-1981 introduced a new API for FileSystem - listLocatedStatus. It is 
 used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class 
 needs to implement this API in order to make the Hive unit tests work.
 Otherwise, you'll see exceptions like these when running the TestCliDriver test 
 case, e.g. when running allcolref_in_udf.q:
 [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
 [junit] Begin query: allcolref_in_udf.q
 [junit] java.lang.IllegalArgumentException: Wrong FS: 
 pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, 
 expected: file:///
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
 [junit]   at 
 org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
 [junit]   at 
 org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
 [junit]   at 
  org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1798)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797)
 [junit]   at 
 org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579)
 [junit]   at 
 org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
 [junit]   at 
 org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
 [junit]   at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
 [junit]   at 
 org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
 [junit]   at 
 org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
 [junit]   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385)
 [junit]   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351)
 [junit]   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
 [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
 [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
 [junit]   at java.security.AccessController.doPrivileged(Native Method)
 [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
 [junit]   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
 [junit]   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
 [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552)
 [junit]   at java.security.AccessController.doPrivileged(Native Method)
 [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
 [junit]   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
 [junit]   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552)
 [junit]   at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688)
 [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 [junit]   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 [junit]   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [junit]   at 

[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772077#comment-13772077
 ] 

stack commented on HIVE-5317:
-

bq. The HBase scan rate is much lower than HDFS, especially with short-circuit 
reads.

What kind of numbers are you talking about, Owen?  Would be interested in 
knowing what they are.  Is the implication also that it cannot be improved?  Or 
wouldn't scanning the files written by hbase offline from a snapshot work for 
you? (Snapshots are cheap in hbase, and going by your use cases you'd be doing 
these runs infrequently enough.)

bq. HBase is tuned for write-heavy workloads.

Funny.  Often we're accused of the other extreme.

bq. HBase doesn't have a columnar format and can't support column projection.

It doesn't. Too much work to add a storage engine that wrote columnar?

bq. HBase doesn't have the equivalent of partitions or buckets.

In hbase we call them 'Regions'.


 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772115#comment-13772115
 ] 

Owen O'Malley commented on HIVE-5317:
-

Bikas,
  In Hive, if the client disappears, the query fails, because the final work 
(output promotion, display to the user) is done by the client. Also don't 
forget that a single query may be composed of many MR jobs, although obviously 
that changes on Tez. 

The transaction id is global for all of the tasks working on the same query. 

The metastore's data is stored in an underlying SQL database, so the 
transaction information will need to be there also.

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-19 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Attachment: HIVE-4732.6.patch

Incorporating [~appodictic]'s comments.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.6.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow

2013-09-19 Thread Doug Sedlak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772142#comment-13772142
 ] 

Doug Sedlak commented on HIVE-4051:
---

I've noticed that the more partitions a Hive table has, the slower the following 
operations come back.  With thousands of partitions they become painfully 
slow:
SELECT * FROM TABNAME
SHOW TABLE EXTENDED LIKE `TABNAME`

Do you know if this fix takes care of these issues?  If not, is it something you 
could test?
If not, I'll enter a new case.

Thanks, Doug  doug.sed...@sas.com

 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, 
 HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, 
 HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch, 
 HIVE-4051.D11805.9.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the mysql database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
 `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
 `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
 `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
 (`A0`.`STRING_LIST_ID_KID` IS NULL)
 {code}
 This data is not detached or cached, so this operation is performed during 
 every query plan for the partitions, even in the same hive client.
 The queries are automatically generated by JDO/DataNucleus which makes it 
 nearly impossible to rewrite it into a single denormalized join operation & 
 process it locally.
 Attempts to optimize this with JDO fetch-groups did not bear fruit in 
 improving the query count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-19 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772149#comment-13772149
 ] 

Mohammad Kamrul Islam commented on HIVE-5306:
-


[~jdere]: 
2. How about float/string? I think those can be converted to double. 
Were string and float supported in the original case?

I will apply your other comments.

 Use new GenericUDF instead of basic UDF for UDFAbs class
 

 Key: HIVE-5306
 URL: https://issues.apache.org/jira/browse/HIVE-5306
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, 
 HIVE-5306.4.patch


 GenericUDF class is the latest and recommended base class for any UDFs.
 This JIRA is to change the current UDFAbs class to extend GenericUDF.
 The general benefit of GenericUDF is described in its comments as:
  * The GenericUDF are superior to normal UDFs in the following ways: 1. It can
  * accept arguments of complex types, and return complex types. 2. It can accept
  * variable length of arguments. 3. It can accept an infinite number of function
  * signature - for example, it's easy to write a GenericUDF that accepts
  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
  * can do short-circuit evaluations using DeferedObject.
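
As a hedged illustration of the GenericUDF shape (a sketch under the simplifying 
assumption of double-only input, not the actual HIVE-5306 patch):

{code}
// Sketch of a GenericUDF-based abs(). Assumes the argument is already a
// writable double; the real patch resolves types via ObjectInspectors.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericUDFAbsSketch extends GenericUDF {
  private final DoubleWritable result = new DoubleWritable();

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentException("abs() takes exactly one argument");
    }
    return PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object value = arguments[0].get(); // DeferredObject: evaluated lazily
    if (value == null) {
      return null;
    }
    result.set(Math.abs(((DoubleWritable) value).get()));
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "abs(" + children[0] + ")";
  }
}
{code}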

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 14232: HIVE-5070: Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim.

2013-09-19 Thread shanyu zhao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14232/
---

Review request for hive, Jason Dere and Thejas Nair.


Repository: hive-git


Description
---

Please see HIVE-5070 for a detailed description of the problem:
https://issues.apache.org/jira/browse/HIVE-5070

This patch creates a new shim method: createProxyFileSystem(). In the 0.20 and 
0.20S shims, it simply creates a ProxyFileSystem object. In the 0.23 shim, it 
creates a ProxyFileSystem23 that derives from ProxyFileSystem and implements the 
listLocatedStatus() method to handle the proxy correctly.
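
A hedged sketch of that idea follows (illustrative only; the helper names 
toUnderlyingPath/toProxyStatus are hypothetical stand-ins for the patch's actual 
path-swizzling logic):

{code}
// Sketch: a 0.23-only subclass that forwards listLocatedStatus() to the
// wrapped filesystem, translating paths between the proxy scheme (pfile:/)
// and the underlying scheme on the way in and out.
import java.io.IOException;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ProxyFileSystem23Sketch extends FilterFileSystem {
  @Override
  public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f)
      throws IOException {
    final RemoteIterator<LocatedFileStatus> it =
        super.listLocatedStatus(toUnderlyingPath(f));
    return new RemoteIterator<LocatedFileStatus>() {
      public boolean hasNext() throws IOException { return it.hasNext(); }
      public LocatedFileStatus next() throws IOException {
        return toProxyStatus(it.next()); // re-apply the proxy scheme
      }
    };
  }

  // Hypothetical placeholders for the real swizzling logic.
  private Path toUnderlyingPath(Path p) { return p; }
  private LocatedFileStatus toProxyStatus(LocatedFileStatus s) { return s; }
}
{code}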


Diffs
-

  shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java cf5c175 
  shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 9351411 
  
shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
 28843e0 
  shims/src/common/java/org/apache/hadoop/fs/ProxyFileSystem.java 28a18f6 
  shims/src/common/java/org/apache/hadoop/fs/ProxyLocalFileSystem.java 9f35769 
  shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 5b91267 

Diff: https://reviews.apache.org/r/14232/diff/


Testing
---

Ran the Hive unit tests against Hadoop 2.1.1-beta; they now pass.


Thanks,

shanyu zhao



[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772021#comment-13772021
 ] 

Alan Gates commented on HIVE-5317:
--

Brock, we did look at that.  We didn't go that route for a couple of reasons:
# Adding transactions to HBase is a fair amount of work.  See Google's 
Percolator paper for one approach to that.
# HBase can't offer the same scan speed as HDFS.  Since we're choosing to focus 
this on updates done in OLAP-style workloads, HBase isn't going to be a 
great storage mechanism for the data.  I agree it might make sense to have 
transactions on HBase for a more OLTP-style workload.

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/
---

(Updated Sept. 19, 2013, 5:48 p.m.)


Review request for hive.


Bugs: HIVE-4113
https://issues.apache.org/jira/browse/HIVE-4113


Repository: hive-git


Description
---

Modifies ColumnProjectionUtils such that there are two flags: one for the column 
ids and one indicating whether all columns should be read. Additionally, the 
patch updates all locations which used the old convention of an empty string 
indicating that all columns should be read.

The automatic formatter generated by ant eclipse-files is fairly aggressive, so 
there is some unrelated import/whitespace cleanup.

This one is based on https://reviews.apache.org/r/11770/ and has been rebased 
to the latest trunk.
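
A hedged sketch of the two-flag scheme described above (names and conf keys are 
illustrative assumptions, not the patch's actual constants):

{code}
// Illustrative sketch of the two-flag scheme: an explicit "read all columns"
// flag plus a list of column ids, replacing the old empty-string convention.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class ColumnProjectionSketch {
  // Hypothetical conf keys; the real patch defines its own constants.
  static final String READ_ALL_COLUMNS = "example.read.all.columns";
  static final String READ_COLUMN_IDS = "example.read.column.ids";

  static void setReadAllColumns(Configuration conf) {
    conf.setBoolean(READ_ALL_COLUMNS, true);
  }

  static void appendReadColumnIds(Configuration conf, List<Integer> ids) {
    StringBuilder sb = new StringBuilder(conf.get(READ_COLUMN_IDS, ""));
    for (int id : ids) {
      if (sb.length() > 0) sb.append(',');
      sb.append(id);
    }
    conf.set(READ_COLUMN_IDS, sb.toString());
    conf.setBoolean(READ_ALL_COLUMNS, false);
  }

  static boolean isReadAllColumns(Configuration conf) {
    return conf.getBoolean(READ_ALL_COLUMNS, true); // default: read everything
  }

  static List<Integer> getReadColumnIds(Configuration conf) {
    List<Integer> ids = new ArrayList<Integer>();
    String s = conf.get(READ_COLUMN_IDS, "");
    if (!s.isEmpty()) {
      for (String part : s.split(",")) ids.add(Integer.parseInt(part));
    }
    // An empty list with the flag false means "read no columns", e.g. count(1).
    return ids;
  }
}
{code}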


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f37d0c 
  conf/hive-default.xml.template 545026d 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 766056b 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
 553446a 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java
 3ee6157 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 1980ef5 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
 577e06d 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 d38bb8d 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 31a52ba 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java a5a8943 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e 
  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
49145b7 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 50c5093 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 cbdc2db 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
ed14e82 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java b97d869 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java
 0550bf6 
  ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java 
fb9fca1 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java dd1276d 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
83c5c38 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
0b3ef7b 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 
11f5f07 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 
1335446 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 
e1270cc 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
 b717278 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
 0317024 
  serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/14221/diff/


Testing
---


Thanks,

Yin Huai



[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-19 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772158#comment-13772158
 ] 

Jason Dere commented on HIVE-5306:
--

As mentioned in the first comment, for non-generic UDFs it does attempt to see 
if the input argument can be mapped to one of the supported argument types. So 
it should work for float/string:

hive> create view view1 as select abs('1'), abs(cast(1.0 as float)) from src 
limit 1; 
OK
Time taken: 0.099 seconds
hive> describe view1;
OK
_c0 double  None
_c1 double  None
Time taken: 0.055 seconds, Fetched: 2 row(s)


 Use new GenericUDF instead of basic UDF for UDFAbs class
 

 Key: HIVE-5306
 URL: https://issues.apache.org/jira/browse/HIVE-5306
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, 
 HIVE-5306.4.patch


 GenericUDF class is the latest and recommended base class for any UDFs.
 This JIRA is to change the current UDFAbs class to extend GenericUDF.
 The general benefit of GenericUDF is described in its comments as:
  * The GenericUDF are superior to normal UDFs in the following ways: 1. It can
  * accept arguments of complex types, and return complex types. 2. It can accept
  * variable length of arguments. 3. It can accept an infinite number of function
  * signature - for example, it's easy to write a GenericUDF that accepts
  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
  * can do short-circuit evaluations using DeferedObject.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5321) Join filters do not work correctly with outer joins again

2013-09-19 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created HIVE-5321:
-

 Summary: Join filters do not work correctly with outer joins again
 Key: HIVE-5321
 URL: https://issues.apache.org/jira/browse/HIVE-5321
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.9.0
Reporter: Alexander Pivovarov


SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
do not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4
$ vi tt2
1
2
8
9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL
correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2
alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772192#comment-13772192
 ] 

Eric Hanson commented on HIVE-5317:
---

Overall this looks like a workable approach given the use cases described 
(mostly coarse-grained updates with a low transaction rate), and it has the 
benefit that it doesn't take a dependency on another large piece of software 
like an update-aware DBMS or NoSQL store.

Regarding use cases, it appears that this design won't be able to have fast 
performance for fine-grained inserts. E.g. there might be scenarios where you 
want to insert one row into a fact table every 10 milliseconds in a separate 
transaction and have the rows immediately visible to readers. Are you willing 
to forgo that use case? It sounds like yes. This may be reasonable. If you want 
to handle it then a different design for the delta insert file information is 
probably needed, i.e. a store that's optimized for short write transactions.

I didn't see any obvious problem, due to the versioned scans, but is this 
design safe from the Halloween problem? That's the problem where an update scan 
sees its own updates again, causing an infinite loop or incorrect update. An 
argument that the design is safe from this would be good.

You mention that you will have one type of delta file that encodes updates 
directly, for sorted files. Is this really necessary, or can you make updates 
illegal for sorted files? If updates can always be modelled as insert plus 
delete, that simplifies things.

How do you ensure that the delta files are fully written (committed) to the 
storage system before the metastore treats the transaction that created the 
delta file as committed?

It's not completely clear why you need exactly the transaction ID information 
specified in the delta file names. E.g. would just the transaction ID (start 
timestamp) be enough? A precise specification of how they are used would be 
useful.

Explicitly explaining what happens when a transaction aborts and how its delta 
files get ignored and then cleaned up would be useful.

Is there any issue with correctness of task retry in the presence of updates if 
a task fails? It appears that it is safe due to the snapshot isolation. 
Explicitly addressing this in the specification would be good.
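
For intuition about the snapshot-isolation questions above, a hedged sketch (my 
reading of the design, not the proposal's actual code) of deciding whether a 
delta is visible to a reader's snapshot:

{code}
// Sketch: a reader with snapshot transaction id S sees a delta only if the
// delta's transaction committed at or before S and is not aborted. This is
// also the property that protects an update scan from re-reading its own
// writes (the Halloween problem): deltas from the running transaction are
// newer than its snapshot, so the scan never sees them.
import java.util.Collections;
import java.util.Set;

public class DeltaVisibilitySketch {
  static boolean isVisible(long deltaTxnId, long snapshotTxnId,
                           Set<Long> abortedTxns) {
    return deltaTxnId <= snapshotTxnId && !abortedTxns.contains(deltaTxnId);
  }

  public static void main(String[] args) {
    Set<Long> aborted = Collections.singleton(7L);
    System.out.println(isVisible(5L, 10L, aborted));  // true: before snapshot
    System.out.println(isVisible(12L, 10L, aborted)); // false: after snapshot
    System.out.println(isVisible(7L, 10L, aborted));  // false: aborted, awaiting cleanup
  }
}
{code}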



 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
 regulatory compliance.
 * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again

2013-09-19 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-5321:
--

Description: 
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
do not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2

hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0

  was:
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
do not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4
$ vi tt2
1
2
8
9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL
correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2
alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0


 Join filters do not work correctly with outer joins again
 -

 Key: HIVE-5321
 URL: https://issues.apache.org/jira/browse/HIVE-5321
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.11.0
Reporter: Alexander Pivovarov

 SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.
 to reproduce:
 hive> create table tt1 (c1 int);
 hive> create table tt2 (c1 int);
 $ vi tt1
 1
 2
 3
 4
 $ vi tt2
 1
 2
 8
 9
 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/
 $ hadoop fs -put tt2 /user/hive/warehouse/tt2/
 wrong result:
 hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
 2);
 1 1
 2 2
 3 NULL
 4 NULL
 correct result:
 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
 1 1
 2 2
 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
 Release Notes - Hive - Version 0.11.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1534) Join filters do not work correctly with outer joins

2013-09-19 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772169#comment-13772169
 ] 

Alexander Pivovarov commented on HIVE-1534:
---

I still see this issue in hive-0.11.0

 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534-4.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again

2013-09-19 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-5321:
--

Description: 
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
does not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2

hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0

  was:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
do not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2

hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0


 Join filters do not work correctly with outer joins again
 -

 Key: HIVE-5321
 URL: https://issues.apache.org/jira/browse/HIVE-5321
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.11.0
Reporter: Alexander Pivovarov

 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
 does not give correct results.
 to reproduce:
 hive> create table tt1 (c1 int);
 hive> create table tt2 (c1 int);
 $ vi tt1
 1
 2
 3
 4
 $ vi tt2
 1
 2
 8
 9
 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/
 $ hadoop fs -put tt2 /user/hive/warehouse/tt2/
 wrong result:
 hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
 2);
 1 1
 2 2
 3 NULL
 4 NULL
 correct result:
 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
 1 1
 2 2
 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
 Release Notes - Hive - Version 0.11.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772106#comment-13772106
 ] 

Yin Huai commented on HIVE-4487:


{code}
drwxrwxrwt  22 root root  56K Sep 19 13:36 tmp
{code}

{code}
drwxrwxrwx 2 yhuai   yhuai   4.0K Sep 19 13:50 yhuai
{code}

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772186#comment-13772186
 ] 

Chaoyu Tang commented on HIVE-4487:
---

[~yhuai] It works in my eclipse. The log tells me that it failed at the line 
outStream = fs.create(resFile) in DDLTask.
Could you debug and check, before this line is executed, what the permission and 
owner of the dir (e.g. /tmp/yhuai/hive_2013-09-19_/, one level up from 
-local-1) are? What Hadoop version are you using?

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1534) Join filters do not work correctly with outer joins

2013-09-19 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772179#comment-13772179
 ] 

Alexander Pivovarov commented on HIVE-1534:
---

to reproduce:

hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);

$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 
2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2


alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0


 Join filters do not work correctly with outer joins
 ---

 Key: HIVE-1534
 URL: https://issues.apache.org/jira/browse/HIVE-1534
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
 patch-1534-4.txt, patch-1534.txt


  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
 and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
 do not give correct results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772183#comment-13772183
 ] 

Hudson commented on HIVE-5313:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
 -

 Key: HIVE-5313
 URL: https://issues.apache.org/jira/browse/HIVE-5313
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.12.0

 Attachments: HIVE-5313.patch


 As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
 to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772184#comment-13772184
 ] 

Hudson commented on HIVE-4487:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Attachment: HIVE-4113.3.patch

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 Which is 11% of the data size read by the COUNT(1).
 This was tracked down to the following code in RCFile.java
 {code}
   } else {
 // TODO: if no column name is specified e.g, in select count(1) from 
 tt;
 // skip all columns, this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}
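
 A hedged sketch of what the fix needs to distinguish (illustrative, not the 
 actual HIVE-4113 patch): with no referenced columns, every column should be 
 skipped, unlike the select * case:

{code}
// Sketch: distinguish "no columns referenced" (count(1): skip everything)
// from "all columns referenced" (select *: skip nothing).
import java.util.Arrays;
import java.util.List;

public class SkippedColumnsSketch {
  static boolean[] computeSkippedColIDs(int numColumns,
                                        List<Integer> readColumnIds,
                                        boolean readAllColumns) {
    boolean[] skipped = new boolean[numColumns];
    if (readAllColumns) {
      return skipped;             // all false: read every column (select *)
    }
    Arrays.fill(skipped, true);   // default: skip all, e.g. count(1)
    for (int id : readColumnIds) {
      skipped[id] = false;        // un-skip only the referenced columns
    }
    return skipped;
  }
}
{code}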

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772103#comment-13772103
 ] 

Yin Huai commented on HIVE-4487:


I meant when I ran "show tables" in the Hive CLI launched in Eclipse.

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772101#comment-13772101
 ] 

Brock Noland commented on HIVE-4487:


Can you share the file permissions on each directory in the tree?

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Status: Patch Available  (was: Open)

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 Which is 11% of the data size read by the COUNT(1).
 This was tracked down to the following code in RCFile.java
 {code}
   } else {
 // TODO: if no column name is specified e.g, in select count(1) from 
 tt;
 // skip all columns, this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again

2013-09-19 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-5321:
--

Description: 
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
does not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2

hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0

  was:
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
do not give correct results.

to reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1
2
3
4

$ vi tt2
1
2
8
9

$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/

wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1   1
2   2
3   NULL
4   NULL

correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1   1
2   2

hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
Release Notes - Hive - Version 0.11.0


 Join filters do not work correctly with outer joins again
 -

 Key: HIVE-5321
 URL: https://issues.apache.org/jira/browse/HIVE-5321
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.11.0
Reporter: Alexander Pivovarov

 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
 does not give correct results.
 to reproduce:
 hive> create table tt1 (c1 int);
 hive> create table tt2 (c1 int);
 $ vi tt1
 1
 2
 3
 4
 $ vi tt2
 1
 2
 8
 9
 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/
 $ hadoop fs -put tt2 /user/hive/warehouse/tt2/
 wrong result:
 hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
 1 1
 2 2
 3 NULL
 4 NULL
 correct result:
 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
 1 1
 2 2
 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
 Release Notes - Hive - Version 0.11.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772182#comment-13772182
 ] 

Hudson commented on HIVE-5198:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/])
HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman 
via Thejas Nair) (thejas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524617)
* 
/hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java


 WebHCat returns exitcode 143 (w/o an explanation)
 -

 Key: HIVE-5198
 URL: https://issues.apache.org/jira/browse/HIVE-5198
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.11.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5198.patch


 The message might look like this:
 {"statement":"use default; show table extended like xyz;","error":"unable to 
 show table: xyz","exec":{"stdout":"","stderr":"","exitcode":"143"}} 
 WebHCat has a templeton.exec.timeout property which kills an HCat request 
 (i.e. something like a DDL statement that gets routed to HCat CLI) if it 
 takes longer than this timeout.
 Since WebHCat does a fork/exec to 'hcat' script, the timeout is implemented 
 as SIGTERM sent to the subprocess.  SIGTERM value is 15.  So it's reported as 
 128 + 15 = 143.
 Error logging/reporting should be improved in this case.
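 A minimal Java sketch of the exit-code arithmetic above (class name
 hypothetical):
{code}
// Sketch: a process killed by a signal is conventionally reported
// by the shell as exit code 128 + signal number.
public class ExitCodeDemo {
  public static void main(String[] args) {
    final int SIGTERM = 15;        // POSIX signal number for SIGTERM
    int exitCode = 128 + SIGTERM;  // shell convention for death-by-signal
    System.out.println(exitCode);  // prints 143
  }
}
{code}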

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772093#comment-13772093
 ] 

Yin Huai commented on HIVE-4487:


Here is my error log when I am launching hive cli through eclipse.
{code}
Caused by: java.io.FileNotFoundException: 
/tmp/yhuai/hive_2013-09-19_13-43-12_206_2528583202954923226-1/-local-1 
(Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:180)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:176)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:234)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:364)
at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2252)
... 13 more
{code}

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5321) Join filters do not work correctly with outer joins again

2013-09-19 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772259#comment-13772259
 ] 

Alexander Pivovarov commented on HIVE-5321:
---

Most probably the fix should validate the query and prevent executing queries 
that have a filter predicate in the join ON clause.

 Join filters do not work correctly with outer joins again
 -

 Key: HIVE-5321
 URL: https://issues.apache.org/jira/browse/HIVE-5321
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.11.0
Reporter: Alexander Pivovarov

 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
 does not give correct results.
 to reproduce:
 hive> create table tt1 (c1 int);
 hive> create table tt2 (c1 int);
 $ vi tt1
 1
 2
 3
 4
 $ vi tt2
 1
 2
 8
 9
 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/
 $ hadoop fs -put tt2 /user/hive/warehouse/tt2/
 wrong result:
 hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
 1 1
 2 2
 3 NULL
 4 NULL
 correct result:
 select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
 1 1
 2 2
 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt 
 Release Notes - Hive - Version 0.11.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772258#comment-13772258
 ] 

Thejas M Nair commented on HIVE-4487:
-

I think the problem might have to do with the HIVE-5313 change. It converts the 
octal string into a short using Short.parseShort(scratchDirPermission), but that 
function expects decimal. So "700" gets parsed as the decimal value 700 instead 
of 448 (0700 octal).
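
A minimal pure-JDK sketch of the parsing difference (class name hypothetical):
{code}
// Sketch: Short.parseShort defaults to radix 10, so an octal permission
// string must be parsed with an explicit radix of 8.
public class OctalParseDemo {
  public static void main(String[] args) {
    String perm = "700";
    short asDecimal = Short.parseShort(perm);    // 700 (radix 10)
    short asOctal   = Short.parseShort(perm, 8); // 448 (0700 octal)
    System.out.println(asDecimal + " vs " + asOctal);
  }
}
{code}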


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513

2013-09-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772274#comment-13772274
 ] 

Thejas M Nair commented on HIVE-5322:
-

I won't be able to get to fixing this today, so it would be great if anyone can 
take a stab at it.

Supporting 0.20.2 is not trivial work! (cc [~appodictic])



 FsPermission is initialized incorrectly in HIVE 5513
 

 Key: HIVE-5322
 URL: https://issues.apache.org/jira/browse/HIVE-5322
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Priority: Blocker
 Fix For: 0.12.0


 The change in HIVE-5313 converts the octal string into a short using 
 Short.parseShort(scratchDirPermission), but Short.parseShort expects decimal. 
 So "700" gets parsed as the decimal value 700 instead of 448 (0700 octal).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772278#comment-13772278
 ] 

Thejas M Nair commented on HIVE-4487:
-

I have created HIVE-5322 to track the permission issue.


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Fix For: 0.12.0

 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/#review26274
---


A few comments, mostly around unnecessary null checks, which I think are no 
longer required now that column pruning will always be happening. 
Secondly, I think we should be representing the column list as a LinkedHashSet 
instead of a List.


ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
https://reviews.apache.org/r/14221/#comment51336

Seems like this null check is now redundant. List can never be null.



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
https://reviews.apache.org/r/14221/#comment51341

If the above is true, this else can also be removed.



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
https://reviews.apache.org/r/14221/#comment51339

Seems like this null check is now redundant. Can we remove this?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
https://reviews.apache.org/r/14221/#comment51340

If the above is true, then this can also be removed.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51337

Seems like this method is called only from tests; no one actually uses it. 
I suggest removing this method altogether to keep the public surface minimal. 
appendReadColumnIds() can readily be used in its place, and that is what 
Hive uses everywhere.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51338

This method is also called only from the previous method and from tests. 
Once we remove the previous one, I don't think we need to introduce this new method.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51355

ids should be of type LinkedHashSet<Integer> instead of List.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51343

Seems like this null check is no longer needed.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51344

I don't think old can be null at this point either. We should remove this 
null check.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51356

Similarly, here cols should be of type LinkedHashSet<String>.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51345

This null check is not required anymore.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51342

Seems like there is no way that ids could be null now. Let's remove this 
null check. If someone is indeed passing null, then we are just masking that 
bug, which should be fixed at the caller site.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51358

This should return LinkedHashSet<Integer> instead.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51346

The caller should never pass a null conf; returning an empty list is dangerous. 
Better to let it throw an NPE at the caller than this.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51347

It doesn't seem like this method is needed.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51348

It should be the caller's responsibility not to pass in null here. We should 
not do this null check.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51349

Remove the call to this new public method and just inline that logic in this 
method. We should keep our public methods to a minimum.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51350

Again, no null check please : )


- Ashutosh Chauhan


On Sept. 19, 2013, 5:48 p.m., Yin Huai wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/14221/
 ---
 
 (Updated Sept. 19, 2013, 5:48 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-4113
 https://issues.apache.org/jira/browse/HIVE-4113
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  Modifies ColumnProjectionUtils such that there are two flags: one for the column 
  ids and one indicating whether all columns should be read. 

[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772163#comment-13772163
 ] 

Ashutosh Chauhan commented on HIVE-4113:


[~yhuai] I left some comments on RB. But it seems like you updated the patch 
in the meantime, so some of those you may have already addressed.

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 Which is 11% of the data size read by the COUNT(1).
 This was tracked down to the following code in RCFile.java
 {code}
   } else {
 // TODO: if no column name is specified e.g, in select count(1) from 
 tt;
 // skip all columns, this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513

2013-09-19 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772286#comment-13772286
 ] 

Mark Wagner commented on HIVE-5322:
---

I'll take a stab at it.

 FsPermission is initialized incorrectly in HIVE 5513
 

 Key: HIVE-5322
 URL: https://issues.apache.org/jira/browse/HIVE-5322
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Priority: Blocker
 Fix For: 0.12.0


 The change in HIVE-5313 converts the octal string into a short using 
 Short.parseShort(scratchDirPermission), but Short.parseShort expects decimal. 
 So "700" gets parsed as the decimal value 700 instead of 448 (0700 octal).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513

2013-09-19 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner reassigned HIVE-5322:
-

Assignee: Mark Wagner

 FsPermission is initialized incorrectly in HIVE 5513
 

 Key: HIVE-5322
 URL: https://issues.apache.org/jira/browse/HIVE-5322
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Mark Wagner
Priority: Blocker
 Fix For: 0.12.0


 The change in HIVE-5313 converts the octal string into a short using 
 Short.parseShort(scratchDirPermission), but Short.parseShort expects decimal. 
 So "700" gets parsed as the decimal value 700 instead of 448 (0700 octal).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513

2013-09-19 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-5322:
---

 Summary: FsPermission is initialized incorrectly in HIVE 5513
 Key: HIVE-5322
 URL: https://issues.apache.org/jira/browse/HIVE-5322
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Priority: Blocker
 Fix For: 0.12.0


The change in HIVE-5313 converts the octal string into a short using 
Short.parseShort(scratchDirPermission), but Short.parseShort expects decimal. 
So "700" gets parsed as the decimal value 700 instead of 448 (0700 octal).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5086) Fix scriptfile1.q on Windows

2013-09-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5086:
---

   Resolution: Fixed
Fix Version/s: (was: 0.12.0)
   0.13.0
   Status: Resolved  (was: Patch Available)

Never mind, I found the missing file in the original patch. Committed to trunk. 
Thanks, Daniel!

 Fix scriptfile1.q on Windows
 

 Key: HIVE-5086
 URL: https://issues.apache.org/jira/browse/HIVE-5086
 Project: Hive
  Issue Type: Bug
  Components: Tests, Windows
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0

 Attachments: HIVE-5086-1.patch, HIVE-5086-2.patch


 Test failed with error message:
 [junit] Task with the most failures(4): 
 [junit] -
 [junit] Task ID:
 [junit]   task_20130814023904691_0001_m_00
 [junit] 
 [junit] URL:
 [junit]   
 http://localhost:50030/taskdetails.jsp?jobid=job_20130814023904691_0001tipid=task_20130814023904691_0001_m_00
 [junit] -
 [junit] Diagnostic Messages for this Task:
 [junit] java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row {key:238,value:val_238}
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
 [junit]   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 [junit]   at 
 org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
 [junit]   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
 [junit]   at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
 [junit]   at java.security.AccessController.doPrivileged(Native Method)
 [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
 [junit]   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
 [junit]   at org.apache.hadoop.mapred.Child.main(Child.java:265)
 [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
 Runtime Error while processing row {key:238,value:val_238}
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:538)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
 [junit]   ... 8 more
 [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 [Error 2]: Unable to initialize custom script.
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:357)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:528)
 [junit]   ... 9 more
 [junit] Caused by: java.io.IOException: Cannot run program 
 D:\tmp\hadoop-Administrator\mapred\local\3_0\taskTracker\Administrator\jobcache\job_20130814023904691_0001\attempt_20130814023904691_0001_m_00_3\work\.\testgrep:
  CreateProcess error=193, %1 is not a valid Win32 application
 [junit]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:316)
 [junit]   ... 18 more
 [junit] Caused by: java.io.IOException: CreateProcess error=193, %1 is 
 not a valid Win32 application
 [junit]   at java.lang.ProcessImpl.create(Native Method)
 [junit]   at java.lang.ProcessImpl.<init>(ProcessImpl.java:81)
 [junit]   at java.lang.ProcessImpl.start(ProcessImpl.java:30)
 [junit]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
 [junit]   ... 19 more
 [junit] 
 [junit] 
 [junit] Exception: Client Execution failed with error code = 2
 [junit] See build/ql/tmp/hive.log, or try ant test ... 
 -Dtest.silent=false to get more logs.
 [junit] junit.framework.AssertionFailedError: Client Execution failed 
 with error code = 2
 [junit] See build/ql/tmp/hive.log, or try ant test ... 
 -Dtest.silent=false to get more logs.
 [junit]   at junit.framework.Assert.fail(Assert.java:47)
 [junit]   at 
 

[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772005#comment-13772005
 ] 

Brock Noland commented on HIVE-5317:


Just curious: I was surprised I didn't see adding transactions to HBase, plus 
support in the HBase storage handler, as a potential alternative implementation. 
Could you speak to why your approach is superior to that approach? Also, it'd 
be great if you posted the design document in the design documents section of the 
wiki: https://cwiki.apache.org/confluence/display/Hive/DesignDocs

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from an RDBMS and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772028#comment-13772028
 ] 

Owen O'Malley commented on HIVE-5317:
-

Expanding on Alan's comments:

* The HBase scan rate is much lower than HDFS's, especially with short-circuit 
reads.
* HBase is tuned for write-heavy workloads.
* HBase doesn't have a columnar format and can't support column projection.
* HBase doesn't have predicate pushdown into the file format.
* HBase doesn't have the equivalent of partitions or buckets.

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from an RDBMS and the fact 
 tables need to be updated (up to 1m rows) to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5209) JDBC support for varchar

2013-09-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772335#comment-13772335
 ] 

Phabricator commented on HIVE-5209:
---

jdere has commented on the revision HIVE-5209 [jira] JDBC support for varchar.

INLINE COMMENTS
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 Tried the 
sample JDBC client from 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients; the 
example still works fine with just the hive-jdbc and hive-service 
JARs in my classpath. So it actually looks like we are still OK here, even 
without having to pull out those methods above to a separate utility class.

REVISION DETAIL
  https://reviews.facebook.net/D12999

To: JIRA, jdere
Cc: cwsteinbach, thejas


 JDBC support for varchar
 

 Key: HIVE-5209
 URL: https://issues.apache.org/jira/browse/HIVE-5209
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
 HIVE-5209.4.patch, HIVE-5209.5.patch, HIVE-5209.D12705.1.patch


 Support returning varchar length in result set metadata

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772337#comment-13772337
 ] 

Hive QA commented on HIVE-4732:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604084/HIVE-4732.6.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1242 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2

[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-09-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772336#comment-13772336
 ] 

Phabricator commented on HIVE-2206:
---

yhuai has closed the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  Closed by commit rHIVE1504395 (authored by hashutosh).

CHANGED PRIOR TO COMMIT
  https://reviews.facebook.net/D11097?vs=39099id=40161#toc

REVISION DETAIL
  https://reviews.facebook.net/D11097

COMMIT
  https://reviews.facebook.net/rHIVE1504395

To: JIRA, ashutoshc, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, 
 HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, 
 HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 operations which need to shuffle the data may involve the correlations 
 explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of the correlation optimizer only detects correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map-only aggregation). A query will be optimized if it 
 satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 There is further work that can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries only involve TC;
 # Support queries in which input tables of correlated MR jobs involves 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

--
This message is 

[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Status: Open  (was: Patch Available)

 Optimize select count(1) with RCFile and Orc
 

 Key: HIVE-4113
 URL: https://issues.apache.org/jira/browse/HIVE-4113
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Gopal V
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
 HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch


 select count(1) loads up every column & every row when used with RCFile.
 select count(1) from store_sales_10_rc gives
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
 HDFS Write: 8 SUCCESS
 {code}
 Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far 
 less
 {code}
 Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
 HDFS Write: 8 SUCCESS
 {code}
 Which is 11% of the data size read by the COUNT(1).
 This was tracked down to the following code in RCFile.java
 {code}
   } else {
 // TODO: if no column name is specified e.g, in select count(1) from 
 tt;
 // skip all columns, this should be distinguished from the case:
 // select * from tt;
 for (int i = 0; i < skippedColIDs.length; i++) {
   skippedColIDs[i] = false;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction

2013-09-19 Thread Jonathan Sharley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772313#comment-13772313
 ] 

Jonathan Sharley commented on HIVE-4996:


We have seen similar issues in an environment with a lot of concurrent Hive 
access from multiple machines, all running version 0.11. Our first thought was 
that we needed to turn on hive.support.concurrency. However, after doing this 
on all of our hosts, including the ones running HiveServer, we still see an 
intermittent issue. I've not been able to reliably reproduce it, but it does 
happen at least a few times a day as part of our regularly scheduled jobs.

 unbalanced calls to openTransaction/commitTransaction
 -

 Key: HIVE-4996
 URL: https://issues.apache.org/jira/browse/HIVE-4996
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
 Environment: hiveserver1  Java HotSpot(TM) 64-Bit Server VM (build 
 20.6-b01, mixed mode)
Reporter: wangfeng
Priority: Critical
  Labels: hive, metastore
   Original Estimate: 504h
  Remaining Estimate: 504h

 When we used hiveserver1 based on hive-0.10.0, we found the following exception 
 thrown:
 FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: 
 commitTransaction was called but openTransactionCalls = 0. This probably 
 indicates that there are unbalanced calls to openTransaction/commitTransaction)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 help

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5209) JDBC support for varchar

2013-09-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5209:
-

Attachment: HIVE-5209.5.patch

Uploading patch v5, with modifications based on Thejas' feedback. I tried a 
JDBC client example and it did not require any additional JARs other than 
hive-jdbc and hive-service, so it looks like we do not need to rework the 
TypeDescriptor/TypeQualifier changes to avoid dependency on serde classes. 

 JDBC support for varchar
 

 Key: HIVE-5209
 URL: https://issues.apache.org/jira/browse/HIVE-5209
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
 HIVE-5209.4.patch, HIVE-5209.5.patch, HIVE-5209.D12705.1.patch


 Support returning varchar length in result set metadata

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-09-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772311#comment-13772311
 ] 

Sushanth Sowmyan commented on HIVE-4388:


As an update, I've been working on updating this patch and finally have all my 
tests succeeding, but I'm now told that the KV/Cell changes are being rolled 
back in 0.96.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, 
 HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, 
 HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc

2013-09-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/#review26284
---



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
https://reviews.apache.org/r/14221/#comment51360

removed in diff3



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
https://reviews.apache.org/r/14221/#comment51361

From the context, I think op should be a TS. So I will remove this else. 



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
https://reviews.apache.org/r/14221/#comment51362

removed in diff3



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
https://reviews.apache.org/r/14221/#comment51363

will remove it.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51364

it is only used in tests. I will remove it.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51365

will remove it.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51366

will remove it



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51367

If we use LinkedHashSet, there will be no duplicates in ids. But we also 
need to check whether there are any duplicates in the read-column string we are 
appending to (that happens on the node running the compiler). The current 
version leaves the deduplication work to getReadColumnIDs (which happens 
in every task). I think your suggestion is better. Will change it.
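
A hedged sketch of the LinkedHashSet approach being discussed (not the actual 
patch; class and method names hypothetical):
{code}
// Sketch: deduplicating column ids once, at append time, via LinkedHashSet,
// instead of deduplicating in getReadColumnIDs in every task.
import java.util.LinkedHashSet;
import java.util.List;

public class ReadColumnIdsSketch {
  private final LinkedHashSet<Integer> ids = new LinkedHashSet<>();

  // Re-appending an id is a no-op, so the serialized string stays minimal.
  public void appendReadColumnIds(List<Integer> newIds) {
    ids.addAll(newIds);
  }

  // Serialized once by the compiler; tasks can parse it without dedup work.
  public String toConfString() {
    StringBuilder sb = new StringBuilder();
    for (int id : ids) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(id);
    }
    return sb.toString();
  }
}
{code}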



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51370

Right now, we do the deduplication work in getReadColumnIDs (in every 
task). I think your suggestion is better. Will change it.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51369

I will check its usage and make sure ids will not be null. Since ids is an 
input parameter, is it better to add an annotation or have an assertion?
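
For the annotation-vs-assertion question, one common JDK idiom is to fail fast 
at the boundary (a sketch only; names hypothetical):
{code}
// Sketch: making the non-null contract explicit instead of silently
// tolerating null input.
import java.util.List;
import java.util.Objects;

public final class NullContractSketch {
  public static void appendReadColumnIds(List<Integer> ids) {
    Objects.requireNonNull(ids, "ids must not be null");
    // ... append logic would go here ...
  }
}
{code}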



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51371

ok will remove it.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
https://reviews.apache.org/r/14221/#comment51372

ok, will also remove other null check


- Yin Huai


On Sept. 19, 2013, 5:48 p.m., Yin Huai wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/14221/
 ---
 
 (Updated Sept. 19, 2013, 5:48 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-4113
 https://issues.apache.org/jira/browse/HIVE-4113
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  Modifies ColumnProjectionUtils such that there are two flags: one for the column 
  ids and one indicating whether all columns should be read. Additionally the 
  patch updates all locations which use the old method of an empty string 
  indicating that all columns should be read.
 
  The automatic formatter generated by ant eclipse-files is fairly aggressive, 
  so there is some unrelated import/whitespace cleanup.
 
 This one is based on https://reviews.apache.org/r/11770/ and has been rebased 
 to the latest trunk.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f37d0c 
   conf/hive-default.xml.template 545026d 
   
 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
  766056b 
   
 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
  553446a 
   
 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java
  3ee6157 
   
 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
  1980ef5 
   
 hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
  577e06d 
   
 hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
  d38bb8d 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 31a52ba 
   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e 
   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java a5a8943 
   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e 
   ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
 49145b7 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b 
   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 
   ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060 
 
