[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean
[ https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez updated HIVE-2307:
------------------------------------
    Attachment: HIVE-2307.1.patch.txt

Schema creation scripts for PostgreSQL use bit(1) instead of boolean
--------------------------------------------------------------------

                 Key: HIVE-2307
                 URL: https://issues.apache.org/jira/browse/HIVE-2307
             Project: Hive
          Issue Type: Bug
          Components: Configuration, Metastore
    Affects Versions: 0.5.0, 0.6.0, 0.7.0, 0.7.1
            Reporter: Esteban Gutierrez
            Assignee: Esteban Gutierrez
              Labels: metastore, postgres
         Attachments: HIVE-2307.1.patch.txt

The DEFERRED_REBUILD (IDXS) and IS_COMPRESSED (SDS) columns in the metastore schema are defined as bit(1), a type the PostgreSQL JDBC driver does not accept for boolean values:

{code}
hive> create table test (id int);
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4f1adeb7" using statement
"INSERT INTO SDS (SD_ID,INPUT_FORMAT,OUTPUT_FORMAT,LOCATION,SERDE_ID,NUM_BUCKETS,IS_COMPRESSED)
VALUES (?,?,?,?,?,?,?)" failed : ERROR: column "IS_COMPRESSED" is of type bit
but expression is of type boolean
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
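The fix amounts to declaring these columns as boolean in the PostgreSQL schema scripts. For an existing metastore, a migration along these lines would be needed (a hedged sketch only: the table and column names come from the error above, but the USING expressions are an assumption, not taken from the attached patch):

```sql
-- Convert the bit(1) columns to boolean so the PostgreSQL JDBC driver
-- can bind Java booleans directly (sketch; verify against the real patch).
ALTER TABLE "SDS"  ALTER COLUMN "IS_COMPRESSED"    TYPE boolean USING ("IS_COMPRESSED" = B'1');
ALTER TABLE "IDXS" ALTER COLUMN "DEFERRED_REBUILD" TYPE boolean USING ("DEFERRED_REBUILD" = B'1');
```

For fresh installs, the schema creation script would simply declare the columns as `boolean` from the start.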
[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean
[ https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez updated HIVE-2307:
------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu reassigned HIVE-1850:
---------------------------------------------
    Assignee: Amareshwari Sriramadasu

alter table set serdeproperties bypasses regexp checks (leaves table in a non-recoverable state?)
-------------------------------------------------------------------------------------------------

                 Key: HIVE-1850
                 URL: https://issues.apache.org/jira/browse/HIVE-1850
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.7.0
         Environment: Trunk build from a few days ago, but seen once before with an older version as well.
            Reporter: Terje Marthinussen
            Assignee: Amareshwari Sriramadasu

{code}
create table aa ( test STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "[^\\](.*)", "output.format.string" = "$1s");
{code}

This will fail. Great!

{code}
create table aa ( test STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.*)", "output.format.string" = "$1s");
{code}

Works, no problem there.

{code}
alter table aa set serdeproperties ("input.regex" = "[^\\](.*)", "output.format.string" = "$1s");
{code}

Wups... I can set that without any problems!

{code}
alter table aa set serdeproperties ("input.regex" = "(.*)", "output.format.string" = "$1s");
FAILED: Hive Internal Error: java.util.regex.PatternSyntaxException(Unclosed character class near index 7
[^\](.*)
       ^)
java.util.regex.PatternSyntaxException: Unclosed character class near index 7
[^\](.*)
       ^
	at java.util.regex.Pattern.error(Pattern.java:1713)
	at java.util.regex.Pattern.clazz(Pattern.java:2254)
	at java.util.regex.Pattern.sequence(Pattern.java:1818)
	at java.util.regex.Pattern.expr(Pattern.java:1752)
	at java.util.regex.Pattern.compile(Pattern.java:1460)
	at java.util.regex.Pattern.<init>(Pattern.java:1133)
	at java.util.regex.Pattern.compile(Pattern.java:847)
	at org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:101)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:199)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:484)
	at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:161)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:803)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerdeProps(DDLSemanticAnalyzer.java:558)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:232)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:142)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:370)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

After this, all further commands on the table fail, including drop table :)

1. The alter table command should probably check the regexp just like the create table command does.
2. Even though the regexp is bad, it should be possible to do things like set the regexp again or drop the table.
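Point 1 is cheap to implement, because java.util.regex.Pattern reports a bad pattern eagerly at compile time. A minimal sketch of the kind of check the alter-table path could run before committing the new serde properties (class and method names here are illustrative, not Hive's):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SerdePropertyCheck {
    // Compile the proposed input.regex up front, as create-table effectively
    // does via RegexSerDe.initialize, so a bad pattern is rejected before
    // it is persisted to the metastore.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(.*)"));       // true
        // The pattern from the report: unclosed character class.
        System.out.println(isValidRegex("[^\\](.*)"));  // false
    }
}
```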
[jira] [Commented] (HIVE-956) Add support of columnar binary serde
[ https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071016#comment-13071016 ]

Hudson commented on HIVE-956:
-----------------------------
Integrated in Hive-trunk-h0.21 #849 (See [https://builds.apache.org/job/Hive-trunk-h0.21/849/])
HIVE-956: add support of columnar binary serde (Krishna Kumar via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150978
Files :
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDeBase.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java

Add support of columnar binary serde
------------------------------------

                 Key: HIVE-956
                 URL: https://issues.apache.org/jira/browse/HIVE-956
             Project: Hive
          Issue Type: New Feature
            Reporter: He Yongqiang
            Assignee: Krishna Kumar
         Attachments: HIVE-956v3.patch, HIVE-956v4.patch, HIVE.956.patch.0, HIVE.956.patch.1, HIVE.956.patch.2
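The committed class names suggest how the feature is used from HiveQL; a hedged sketch (the table and columns are made up, and only the serde class name is taken from the commit above):

```sql
-- Store a table in RCFile with the new binary columnar serde
-- (sketch; everything except the serde class name is illustrative).
CREATE TABLE events_rc (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE;
```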
[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1850:
------------------------------------------
    Attachment: patch-1850.txt

Although DDLTask.alterTable() calls checkValidity() on the table after all the alterations, it did not catch this problem, because getDeserializer() was not reading the table back from the Metastore with the modified properties. The patch makes the required change and adds a regression test.
[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1850:
------------------------------------------
    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071130#comment-13071130 ]

jirapos...@reviews.apache.org commented on HIVE-1694:
-----------------------------------------------------
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/

Review request for hive and John Sichi.

Summary
-------
This patch defines a new AggregateIndexHandler which is used to optimize the query plan for group-by queries.

This addresses bug HIVE-1694.
    https://issues.apache.org/jira/browse/HIVE-1694

Diffs
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1194/diff

Testing
-------

Thanks,
Prajakta

Accelerate GROUP BY execution using indexes
-------------------------------------------

                 Key: HIVE-1694
                 URL: https://issues.apache.org/jira/browse/HIVE-1694
             Project: Hive
          Issue Type: New Feature
          Components: Indexing, Query Processor
    Affects Versions: 0.7.0
            Reporter: Nikhil Deshpande
            Assignee: Prajakta Kalmegh
         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql

The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks support for indexes in the Hive compiler/execution engine for SELECT queries. This is in reference to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in optimizer/query execution.

The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries), e.g.:
- Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
- Joins (index-based joins)
- Group By, Order By, and other misc cases

The proposal is multi-step:
1. Building index-based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. a cost-based optimizer to compare and choose between index scans, full table scans, etc.)

This JIRA initially focuses on the first step, and is expected to hold the information about index-based plans and operator implementations for the above cases.
[jira] [Updated] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prajakta Kalmegh updated HIVE-1694:
-----------------------------------
    Attachment: HIVE-1694.4.patch

Review changes made after the last review, plus new functionality (see the post for more details).
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135 ]

Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------
Hi John,

Please find attached the latest patch (HIVE-1694.4.patch).

The patch contains:
1. Support for multiple aggregates in index creation using the AggregateIndexHandler. The column names for the index schema are constructed dynamically from the aggregates: for 'aggregateFunction(columnName)', the index column name is `_aggregateFunction_of_columnName`. For example, for count(l_shipdate) the column name is `_count_of_l_shipdate`; for 'count(*)', the column name is `_count_of_all`.
2. Fixed the bug with duplicates in the group-by removal cases. We no longer remove the group-by in any case, which has made the query-rewrite logic much simpler than before. We removed four classes from the previous patch (RewriteIndexSubqueryCtx.java, RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, RewriteRemoveGroupbyProcFactory.java) and added two new, simpler classes instead (RewriteQueryUsingAggregateIndex.java, RewriteQueryUsingAggregateIndexCtx.java).
3. Added a new query (with 'UNION ALL') to the same ql_rewrite_gbtoidx.q file to demonstrate the requirement from your last post. Please note that the query is not a valid real-world use case, but it suffices to show that rewriting one branch does not corrupt the other.
4. The rewrite optimization now happens after PredicatePushdown, PartitionPruner, and PartitionConditionRemover.

This patch does not contain:
1. Optimization for cases with multiple aggregates in the selection
2. Optimization for any aggregate function other than count
3. Optimization for queries involving multiple tables (even if they are in different branches). Since we are not optimizing joins, this constraint also filters out union queries over different tables.
4. Optimization for indexes with multiple columns in the key

Here is the review board link for the patch: https://reviews.apache.org/r/1194/. Please let me know if you have any questions.
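The naming scheme in point 1 can be illustrated with a hedged HiveQL sketch (the table, column, and index names are made up, and the IDXPROPERTIES syntax is an assumption, not quoted from the patch):

```sql
-- Create an aggregate index over l_shipdate; per the scheme above, the
-- handler would materialize an index column named _count_of_l_shipdate.
-- (Sketch only; verify the exact DDL against the attached patch.)
CREATE INDEX lineitem_shipdate_idx ON TABLE lineitem (l_shipdate)
AS 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ("AGGREGATES" = "count(l_shipdate)");

-- A query of the shape the optimizer could then rewrite to scan the
-- (much smaller) index table instead of the base table:
SELECT l_shipdate, count(l_shipdate)
FROM lineitem
GROUP BY l_shipdate;
```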
[jira] [Work started] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-1694 started by Prajakta Kalmegh.
Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/

Review request for hive and John Sichi.

Summary
-------
This patch defines a new AggregateIndexHandler which is used to optimize the query plan for group-by queries.

This addresses bug HIVE-1694.
    https://issues.apache.org/jira/browse/HIVE-1694

Diff: https://reviews.apache.org/r/1194/diff

Testing
-------

Thanks,
Prajakta
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071291#comment-13071291 ]

Paul Yang commented on HIVE-2226:
---------------------------------
Committed. Thanks Sohan!

Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
---------------------------------------------------------------------------------------------------

                 Key: HIVE-2226
                 URL: https://issues.apache.org/jira/browse/HIVE-2226
             Project: Hive
          Issue Type: Improvement
          Components: Metastore
            Reporter: Sohan Jain
            Assignee: Sohan Jain
             Fix For: 0.8.0
         Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch

Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed.
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-2226:
----------------------------
       Resolution: Fixed
    Fix Version/s: 0.8.0
           Status: Resolved  (was: Patch Available)
[jira] [Updated] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu updated HIVE-2272: -- Attachment: hive-2272.6.patch rebase add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch Add a TIMESTAMP type to serde2 that supports the Unix timestamp range (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored as JDBC-compliant strings parsable by java.sql.Timestamp.
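For the LazySimple text form described above, parsing amounts to splitting the optional fractional part off the JDBC-style base format and right-padding it to nanoseconds. A minimal Python sketch, illustrative only and not Hive's serde2 code:

```python
from datetime import datetime, timezone

def parse_jdbc_timestamp(s):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fff...]' into (epoch seconds, nanoseconds)."""
    if "." in s:
        base, frac = s.split(".", 1)
        # Right-pad the fractional digits to 9 places to get nanoseconds.
        nanos = int(frac.ljust(9, "0")[:9])
    else:
        base, nanos = s, 0
    dt = datetime.strptime(base, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()), nanos
```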
Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/#review1188 --- ql/src/java/org/apache/hadoop/hive/ql/Driver.java https://reviews.apache.org/r/1137/#comment2597 java.util.Stack is deprecated since it adds unnecessary synchronization. We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList. Also, instead of typecasting to/from Object, use a static inner class for holding the record of state variables. - John On 2011-07-25 23:03:22, Syed Albiz wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-25 23:03:22) Review request for hive, John Sichi and Ning Zhang. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/index_compact_entry_limit.q.out fcb2673 ql/src/test/results/clientnegative/index_compact_size_limit.q.out fcb2673 ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 
ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 9003eb4 ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 ql/src/test/results/clientpositive/index_compact.q.out c339ec9 ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 ql/src/test/results/clientpositive/index_compact_3.q.out d39556d ql/src/test/results/clientpositive/index_creation.q.out 532f07e Diff: https://reviews.apache.org/r/1137/diff Testing --- Added new testcase to TestCliDriver: index_auth.q Thanks, Syed
[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071315#comment-13071315 ] jirapos...@reviews.apache.org commented on HIVE-2286: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/#review1188 --- ql/src/java/org/apache/hadoop/hive/ql/Driver.java https://reviews.apache.org/r/1137/#comment2597 java.util.Stack is deprecated since it adds unnecessary synchronization. We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList. Also, instead of typecasting to/from Object, use a static inner class for holding the record of state variables. - John On 2011-07-25 23:03:22, Syed Albiz wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1137/ bq. --- bq. bq. (Updated 2011-07-25 23:03:22) bq. bq. bq. Review request for hive, John Sichi and Ning Zhang. bq. bq. bq. Summary bq. --- bq. bq. Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. bq. bq. bq. This addresses bug HIVE-2286. bq. https://issues.apache.org/jira/browse/HIVE-2286 bq. bq. bq. Diffs bq. - bq. 
bq.ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe bq.ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION bq.ql/src/test/results/clientnegative/index_compact_entry_limit.q.out fcb2673 bq.ql/src/test/results/clientnegative/index_compact_size_limit.q.out fcb2673 bq.ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION bq.ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 bq.ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e bq.ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 bq.ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db bq.ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc bq.ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 bq.ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 bq.ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 bq.ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 bq.ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e bq.ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 9003eb4 bq.ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 bq.ql/src/test/results/clientpositive/index_compact.q.out c339ec9 bq.ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca bq.ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 bq.ql/src/test/results/clientpositive/index_compact_3.q.out d39556d bq.ql/src/test/results/clientpositive/index_creation.q.out 532f07e bq. bq. Diff: https://reviews.apache.org/r/1137/diff bq. bq. bq. Testing bq. --- bq. bq. Added new testcase to TestCliDriver: index_auth.q bq. bq. bq. Thanks, bq. bq. Syed bq. bq. ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. 
Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071318#comment-13071318 ] Vaibhav Aggarwal commented on HIVE-2020: I propose using -d, --define to define Hive variables. Amazon Elastic MapReduce already uses this notation for Hive variables and variable substitution. This approach would also clearly separate the use of -hiveconf from -d or --define, which would be used purely to set Hive variables. It would also maintain consistency for Hive users. Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'var x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'
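The namespace separation proposed in this ticket can be illustrated with a toy substitution pass in which hivevar and hiveconf resolve from separate maps, and a bare `${name}` falls back to the hivevar namespace. Illustrative Python, not Hive's implementation:

```python
import re

def substitute(query, hivevars, hiveconf):
    """Expand ${hivevar:name} and ${name} from hivevars, and
    ${hiveconf:name} from hiveconf; unresolved references are left as-is."""
    pattern = re.compile(r"\$\{(?:(?P<ns>hivevar|hiveconf):)?(?P<name>[\w.]+)\}")

    def repl(m):
        ns, name = m.group("ns"), m.group("name")
        if ns == "hiveconf":
            return hiveconf.get(name, m.group(0))
        # Bare ${name} and ${hivevar:name} both resolve against hivevars.
        return hivevars.get(name, m.group(0))

    return pattern.sub(repl, query)
```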
[jira] [Updated] (HIVE-2305) UNION ALL on different types throws runtime exception
[ https://issues.apache.org/jira/browse/HIVE-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu updated HIVE-2305: -- Attachment: hive-2305.2.patch fix upstream input file change propagation UNION ALL on different types throws runtime exception - Key: HIVE-2305 URL: https://issues.apache.org/jira/browse/HIVE-2305 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2305.1.patch, hive-2305.2.patch Ex: SELECT * FROM (SELECT 123 FROM ... UNION ALL SELECT '123' FROM ..) t; Unioning columns of different types currently throws runtime exceptions.
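One way to avoid such a runtime failure is to resolve a common type for corresponding UNION ALL columns at plan time. The widening rule below is a hypothetical sketch, not necessarily what hive-2305.2.patch implements:

```python
def common_type(t1, t2):
    """Toy type-widening rule for UNION ALL branches: equal types pass
    through, numerics widen to the larger numeric, anything else falls
    back to string."""
    numeric = ["int", "bigint", "float", "double"]  # widening order
    if t1 == t2:
        return t1
    if t1 in numeric and t2 in numeric:
        return numeric[max(numeric.index(t1), numeric.index(t2))]
    return "string"
```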
[jira] [Assigned] (HIVE-1143) CREATE VIEW followup: updatable views
[ https://issues.apache.org/jira/browse/HIVE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1143: Assignee: Charles Chen (was: Carl Steinbach) CREATE VIEW followup: updatable views -- Key: HIVE-1143 URL: https://issues.apache.org/jira/browse/HIVE-1143 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen For HIVE-972, we only implemented read-only views. Updatable views are difficult in general, but for simple cases where views are being used to impose a rename layer on existing tables/columns, update support would be high value (for consistent read/write access) and not a lot of work.
[jira] [Assigned] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1989: Assignee: Charles Chen recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Given
{noformat}
set hive.mapred.mode=strict;
create table invites (foo int, bar string) partitioned by (ds string);
create table invites2 (foo int, bar string) partitioned by (ds string);
select count(*) from invites join invites2 on invites.ds=invites2.ds and invites.ds='2011-01-01';
{noformat}
currently an error occurs:
{noformat}
Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2
{noformat}
The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily).
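The inference the optimizer needs here is a closure over join-key equalities: any constant predicate on one column propagates to every column equated with it. A small sketch, illustrative and not Hive's optimizer code:

```python
def infer_transitive(equalities, constants):
    """Propagate constant predicates across join-key equalities.
    equalities: pairs of columns equated in join conditions.
    constants: dict column -> literal from explicit predicates."""
    inferred = dict(constants)
    changed = True
    while changed:  # iterate to a fixed point over chains of equalities
        changed = False
        for a, b in equalities:
            for x, y in ((a, b), (b, a)):
                if x in inferred and y not in inferred:
                    inferred[y] = inferred[x]
                    changed = True
    return inferred
```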
[jira] [Commented] (HIVE-2123) CommandNeedRetryException needs release locks
[ https://issues.apache.org/jira/browse/HIVE-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071327#comment-13071327 ] John Sichi commented on HIVE-2123: -- This one has been sitting in the Patch Available queue for a while...anything holding it up? CommandNeedRetryException needs release locks - Key: HIVE-2123 URL: https://issues.apache.org/jira/browse/HIVE-2123 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2123.1.patch, HIVE-2123.2.patch, HIVE-2123.3.patch, HIVE-2123.4.patch Now when CommandNeedRetryException is thrown, locks are not released. Not sure whether it will cause a problem, since the same locks will be acquired when retrying it. It is anyway something we need to fix. Also we can do a little code cleanup to make future mistakes less likely.
[jira] [Commented] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071328#comment-13071328 ] John Sichi commented on HIVE-2242: -- This one has been sitting in the Patch Available queue for a while...anything holding it up? DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2242.1.patch Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get WriteEntity objects with the partitions p1=1/p2=1 and p1=1/p2=2.
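The expected matching behavior for a partial specification is simply that every key present in the spec agrees with the partition's value, and keys absent from the spec match anything. An illustrative Python sketch:

```python
def matches_partial_spec(partition, spec):
    """True if the partition's key/values agree with every key in the
    (possibly partial) drop specification."""
    return all(partition.get(k) == v for k, v in spec.items())

def partitions_to_drop(partitions, spec):
    """All partitions that should reach the pre-execution hooks."""
    return [p for p in partitions if matches_partial_spec(p, spec)]
```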
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071330#comment-13071330 ] John Sichi commented on HIVE-2065: -- This one has been sitting in the Patch Available queue for a while...are there issues that still need to be resolved? RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, Slide1.png, proposal.png Some potential issues with RCFile:
1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.
2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the next field to record length, not the uncompressed key length.
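Point 2 above can be restated as: the record length should be computed from the compressed key that is actually written. A toy writer illustrating the corrected ordering; zlib stands in for the configured codec, and this is not the real RCFile wire format:

```python
import io
import struct
import zlib

def write_record(out, key, value):
    """Compress the key first, then write a record length that reflects
    the compressed size actually stored (big-endian ints, as in Java)."""
    compressed_key = zlib.compress(key)
    out.write(struct.pack(">i", len(compressed_key) + len(value)))  # total record length
    out.write(struct.pack(">i", len(compressed_key)))               # key portion length
    out.write(compressed_key)
    out.write(value)

buf = io.BytesIO()
write_record(buf, b"key" * 50, b"value")
```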
Re: Review Request: HIVE-2272: add TIMESTAMP data type
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1135/ --- (Updated 2011-07-26 21:11:35.218104) Review request for hive. Changes --- Rebase Summary --- Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary (LazyBinary) serialization. Supports SQL-style JDBC timestamps of the format YYYY-MM-DD HH:MM:SS[.fff...] with nanosecond precision. This addresses bug HIVE-2272. https://issues.apache.org/jira/browse/HIVE-2272 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java PRE-CREATION trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION 
trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out 1151189 trunk/ql/src/test/results/clientnegative/wrong_column_type.q.out 1151189 trunk/ql/src/test/results/clientpositive/show_functions.q.out 1151189 trunk/ql/src/test/results/clientpositive/timestamp_1.q.out PRE-CREATION
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071368#comment-13071368 ] jirapos...@reviews.apache.org commented on HIVE-2272: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1135/ --- (Updated 2011-07-26 21:11:35.218104) Review request for hive. Changes --- Rebase Summary --- Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary (LazyBinary) serialization. Supports SQL-style JDBC timestamps of the format YYYY-MM-DD HH:MM:SS[.fff...] with nanosecond precision. This addresses bug HIVE-2272. https://issues.apache.org/jira/browse/HIVE-2272 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java PRE-CREATION trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION 
trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out
[jira] [Created] (HIVE-2308) Throw an error if user specifies unsupported FS in LOCATION clause of CREATE TABLE
Throw an error if user specifies unsupported FS in LOCATION clause of CREATE TABLE -- Key: HIVE-2308 URL: https://issues.apache.org/jira/browse/HIVE-2308 Project: Hive Issue Type: Bug Components: SQL Reporter: Carl Steinbach
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.1.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071409#comment-13071409 ] Siying Dong commented on HIVE-2309: --- Can we limit the number of digits for the attempt ID? Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
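The bug and one possible fix can be demonstrated directly in Python. The single-digit group `(_[0-9])?` cannot absorb a two-digit attempt suffix, so the task-id group ends up capturing the attempt number instead. Widening the suffix group to `(_[0-9]+)?` is a hypothetical fix, not necessarily what HIVE-2309.1.patch does:

```python
import re

BUGGY = r"^.*?([0-9]+)(_[0-9])?(\..*)?$"
# Hypothetical fix: allow the attempt suffix to be more than one digit.
FIXED = r"^.*?([0-9]+)(_[0-9]+)?(\..*)?$"

def task_id(filename, pattern):
    """Extract the task id (first capture group) from an attempt filename."""
    return re.match(pattern, filename).group(1)
```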
[jira] [Commented] (HIVE-2231) Column aliases
[ https://issues.apache.org/jira/browse/HIVE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071412#comment-13071412 ] Adam Kramer commented on HIVE-2231: --- The use case here is basically providing backwards compatibility. So for many users of a table, and many new users of a table, they are using the same table and want to refer to it as such; it is the canonical table. But sometimes the table was originally named with crummy names, and it'd be better and cleaner to document and train new people on the appropriate names. Views eat up the namespace and provide a level of misdirection that is not always desirable, but here are the two biggest limitations of views: * SELECT * is not fast. I can't SELECT * on a view and get data immediately in the same way that I would upon writing the same query. This is true even when the schema are exactly the same. * Partitions are not see-through. I can't use show partitions on a view or write any automated system based on the view to identify when new partitions land, which forces reference to the original table, and then all is lost. Column aliases -- Key: HIVE-2231 URL: https://issues.apache.org/jira/browse/HIVE-2231 Project: Hive Issue Type: Wish Components: Query Processor Reporter: Adam Kramer Priority: Trivial It would be nice in several cases to be able to alias column names. Say someone in your company CREATEd a TABLE called important_but_named_poorly (alvin BIGINT, theodore BIGINT, simon STRING) PARTITIONED BY (dave STRING), that indexes the relationship between an actor (alvin), a target (theodore), and the interaction between them (simon), partitioned based on the date string (dave). Renaming the columns would break a million pipelines that are important but ownerless. 
It would be awesome to define an aliasing system as such: ALTER TABLE important_but_named_poorly REPLACE COLUMNS (actor BIGINT AKA alvin, target BIGINT AKA theodore, ixn STRING AKA simon) PARTITIONED BY (ds STRING AKA dave); ...which would mean that any user could, e.g., use the term dave to refer to ds if they really wanted to.
[jira] [Updated] (HIVE-1955) Support non-constant expressions for array indexes.
[ https://issues.apache.org/jira/browse/HIVE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1955: -- Description: FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Let foo have these rows: arr i [1,2,3] 1 [3,4,5] 2 [5,4,3] 2 [0,0,1] 0 Then, SELECT arr[i] FROM foo should return: 2 5 3 1 Similarly, for the same table, SELECT 3 IN arr FROM foo should return: true true true false ...these use cases are needless limitations of functionality. We shouldn't need UDFs to accomplish these goals. was: FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Support non-constant expressions for array indexes. --- Key: HIVE-1955 URL: https://issues.apache.org/jira/browse/HIVE-1955 Project: Hive Issue Type: Improvement Reporter: Adam Kramer FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Let foo have these rows: arr i [1,2,3] 1 [3,4,5] 2 [5,4,3] 2 [0,0,1] 0 Then, SELECT arr[i] FROM foo should return: 2 5 3 1 Similarly, for the same table, SELECT 3 IN arr FROM foo should return: true true true false ...these use cases are needless limitations of functionality. We shouldn't need UDFs to accomplish these goals.
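A rough Python analogue of the requested per-row semantics, assuming zero-based array indexing as in Hive's constant-index array access:

```python
# (arr, i) pairs from the example table foo in the issue description.
rows = [([1, 2, 3], 1), ([3, 4, 5], 2), ([5, 4, 3], 2), ([0, 0, 1], 0)]

# SELECT arr[i] FROM foo -- index each row's array by that row's column i.
indexed = [arr[i] for arr, i in rows]

# SELECT 3 IN arr FROM foo -- per-row membership test.
membership = [3 in arr for arr, _ in rows]
```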
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1466: -- Description: NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. was: I just updated the Hive wiki to clarify what some would consider an oddity: When NULL values are exported to a script via TRANSFORM, they are converted to the string \N, and then when the script's output is read, any cell that contains only \N is treated as a NULL value. I believe that there are very VERY few reasons why anyone would need cells that contain only a backslash and then a capital N to be distinguished from NULL cells, but for complete generality, we should allow this. The way to do that is probably by adding a specification in the ROW FORMAT for a table that would allow any string to be treated as a NULL if it is the only string in a cell. Some may prefer the empty string, others the word NULL in caps, etc. I vote for keeping \N as the default because I am used to it, but also for allowing this to be customized. Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
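The configurable null marker HIVE-1466 proposes can be sketched as a tiny serializer (names hypothetical; the default marker is the literal two characters backslash-N, which is what Hive's TRANSFORM serialization historically emits for NULL):

```python
# Sketch of ROW FORMAT NULL DEFINED AS '...' applied uniformly to
# serialization and deserialization, per HIVE-1466. Hypothetical helper names.

def serialize_row(row, null_marker="\\N", delim="\t"):
    """Render None cells with the configured marker, as a TRANSFORM feed would."""
    return delim.join(null_marker if cell is None else str(cell) for cell in row)

def deserialize_row(line, null_marker="\\N", delim="\t"):
    """Read the marker back as NULL (None), symmetric with serialize_row."""
    return [None if cell == null_marker else cell for cell in line.split(delim)]

line = serialize_row(["a", None, "c"])
print(repr(line))                  # 'a\t\\N\tc'
print(deserialize_row(line))       # ['a', None, 'c']
```

The point of the issue is exactly this symmetry: whatever marker the table declares should apply to every export and save path, not just TRANSFORM.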
[jira] [Created] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.
TRANSFORM statements should come with their own ROW FORMATs. Key: HIVE-2311 URL: https://issues.apache.org/jira/browse/HIVE-2311 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Adam Kramer Sometimes Hive tables contain tabs and/or other characters that could easily be misinterpreted by a transformer as a delimiter. This can break many TRANSFORM queries. The solution is to have a ROW FORMAT semantics that can be attached to an individual TRANSFORM instance. It would have the same semantics as table creation, but during serialization it would ensure that any formal delimiter characters that did not indicate an actual break between columns would be escaped. At the very least, it is a bug that TRANSFORM statement deserialization does not backslash out literal tabs in the current implementation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.
[ https://issues.apache.org/jira/browse/HIVE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-2311: -- Priority: Minor (was: Major) Issue Type: Bug (was: Improvement) TRANSFORM statements should come with their own ROW FORMATs. Key: HIVE-2311 URL: https://issues.apache.org/jira/browse/HIVE-2311 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Adam Kramer Priority: Minor Sometimes Hive tables contain tabs and/or other characters that could easily be misinterpreted by a transformer as a delimiter. This can break many TRANSFORM queries. The solution is to have a ROW FORMAT semantics that can be attached to an individual TRANSFORM instance. It would have the same semantics as table creation, but during serialization it would ensure that any formal delimiter characters that did not indicate an actual break between columns would be escaped. At the very least, it is a bug that TRANSFORM statement deserialization does not backslash out literal tabs in the current implementation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
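The delimiter escaping HIVE-2311 asks for can be sketched as follows (hypothetical — current Hive does not do this; a real implementation would live in the serializer):

```python
import re

# Cell values containing the field delimiter are backslash-escaped before the
# row is fed to a TRANSFORM script, so only real column boundaries look like
# delimiters. Simplified sketch.

def escape_cell(cell, delim="\t"):
    # Escape the escape character first, then the delimiter itself.
    return cell.replace("\\", "\\\\").replace(delim, "\\" + delim)

def unescape_cell(cell, delim="\t"):
    return cell.replace("\\" + delim, delim).replace("\\\\", "\\")

def split_escaped(line, delim="\t"):
    # Split only on delimiters not preceded by a backslash (simplified; a full
    # implementation would count consecutive backslashes).
    return [unescape_cell(p) for p in re.split(r"(?<!\\)" + re.escape(delim), line)]

row = ["id1", "free text\twith a tab"]
encoded = "\t".join(escape_cell(c) for c in row)
# encoded still has exactly one unescaped tab: the real column boundary.
print(split_escaped(encoded))  # ['id1', 'free text\twith a tab']
```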
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.2.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
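The report's snippets are Python, so the bug reproduces directly with the `re` module. The optional `(_[0-9])` group accepts only a single digit, so for attempt numbers >= 10 the attempt number itself is captured as the "task id":

```python
import re

# The extraction pattern from Utilities.java, transliterated as in the report.
pattern = r"^.*?([0-9]+)(_[0-9])?(\..*)?$"
print(re.match(pattern, "attempt_201107090429_64965_m_001210_10").group(1))  # 10 -- wrong
print(re.match(pattern, "attempt_201107090429_64965_m_001210_9").group(1))   # 001210 -- intended

# One possible fix (not necessarily what the attached patch does): allow the
# attempt-number suffix to be multi-digit, so it is consumed by group 2.
fixed = r"^.*?([0-9]+)(_[0-9]+)?(\..*)?$"
print(re.match(fixed, "attempt_201107090429_64965_m_001210_10").group(1))    # 001210
```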
[jira] [Created] (HIVE-2312) Make CLI variables available to UDFs
Make CLI variables available to UDFs Key: HIVE-2312 URL: https://issues.apache.org/jira/browse/HIVE-2312 Project: Hive Issue Type: Improvement Components: CLI, Clients, UDF Reporter: Adam Kramer Straightforward use case: My UDFs should be able to condition on whether hive.mapred.mode=strict or nonstrict. But these things could also be useful for certain optimizations. For example, a UDAF knowing that there is only one reduce phase could avoid a lot of pushing data around unnecessarily. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420 ] Siying Dong commented on HIVE-2309: --- +1, will commit after tests pass Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-89) avg() min() max() will get error message
[ https://issues.apache.org/jira/browse/HIVE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-89: --- Fix Version/s: 0.3.0 avg() min() max() will get error message Key: HIVE-89 URL: https://issues.apache.org/jira/browse/HIVE-89 Project: Hive Issue Type: Bug Components: Query Processor Environment: hadoop 0.17.2.1 hive 0.17.0 Reporter: YihueyChyi Assignee: Zheng Shao Fix For: 0.3.0 When I run select min() , max() or avg() ,I will get error message Test table : data rows: 15835023 error message: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver Hadoop web:50030 message From reduce process java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:173) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:243) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:168) ... 2 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:210) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:297) at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:240) ... 
3 more Caused by: java.lang.NumberFormatException: For input string: 2004-12-22 at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224) at java.lang.Double.parseDouble(Double.java:510) at org.apache.hadoop.hive.ql.udf.UDAFAvg.aggregate(UDAFAvg.java:42) ... 10 more -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
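The bottom of the stack trace shows the root cause: UDAFAvg calls Double.parseDouble on each cell, and the column contains date strings like '2004-12-22'. A minimal Python analogue of that failure (helper name hypothetical):

```python
# avg() over a column that holds date strings fails exactly where the trace
# says: numeric parsing of the cell value.
def aggregate_avg(cells):
    total, count = 0.0, 0
    for cell in cells:
        total += float(cell)  # raises ValueError on '2004-12-22', the Python
        count += 1            # counterpart of Java's NumberFormatException
    return total / count

try:
    aggregate_avg(["1.5", "2004-12-22"])
except ValueError as e:
    print("aggregation failed:", e)
```

The fix on the query side is to aggregate a numeric column (or cast first); the date column here was evidently selected into the aggregate.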
[jira] [Updated] (HIVE-1251) TRANSFORM should allow piping or allow cross-subquery assumptions.
[ https://issues.apache.org/jira/browse/HIVE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1251: -- Description: Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM (SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, col FROM table) ...the problem here is that for the above to work (and it should, indeed, work in a map-only MR task), I must assume that the data output from one subquery will be passed in EXACTLY THE SAME FORMAT to the outer query--i.e., I must assume that Hive will not cut a map or reduce phase in between, or fan out data from the inner query into different mappers in the outer query. As a user, *I should not be allowed to assume* that data coming out of a subquery goes into the nodes for a superquery in the same order...ESPECIALLY in the map phase. was: Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. 
One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM (SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, col FROM table) ...the problem here is that as a user, *I should not be allowed to assume* that the output from the inner query will be passed DIRECTLY to the outer query (i.e., the outer query should not assume that it gets the inner query's output on the same box and in the same order). I know as a programmer that this works fine as a pipe, but when writing Hive code I always wonder--what if Hive decides to run the inner query in a reduce step, and the outer query in a subsequent map step? Broadly, my understanding is that the goal of Hive is to abstract the mapreduce process away from users. To this end, we have syntax (CLUSTER BY) that allows users to assume that a reduce task will occur (but see also https://issues.apache.org/jira/browse/HIVE-835 ), but there is no formal way to force or syntactically assume that the data will NOT be copied or sorted or transformed. I argue that the only case where this would be necessary or desirable would be in the instance of a pipe within a transform...ergo a desire for | to work as expected. An alternative would be for the HQL language definition to explicitly state all conditions that would cause a task boundary to be crossed (so I can make the strong assumption that if none of those conditions obtains, my query will be supported in the future)...but that seems potentially restrictive as the language and Hadoop evolves. 
Summary: TRANSFORM should allow piping or allow cross-subquery assumptions. (was: TRANSFORM should allow pipes in some form) TRANSFORM should allow piping or allow cross-subquery assumptions. -- Key: HIVE-1251 URL: https://issues.apache.org/jira/browse/HIVE-1251 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING
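The stacker/reducer pipeline the issue describes can be modeled in-process (stacker.py and reducer.py are the issue's own hypothetical scripts; this sketches their behavior, not Hive's execution plan, and assumes "reduce to one item per key" means keeping the first value seen):

```python
# In-process sketch of 'python stacker.py | python reducer.py' from HIVE-1251.
def stacker(rows):
    # (key, col0, col1, col2) -> one (key, value) pair per column.
    for key, *cols in rows:
        for col in cols:
            yield key, col

def reducer(pairs):
    # Keep one value per key (first seen), then emit sorted by key.
    seen = {}
    for key, value in pairs:
        seen.setdefault(key, value)
    return sorted(seen.items())

rows = [("k1", "a", "b", "c"), ("k2", "d", "e", "f")]
print(reducer(stacker(rows)))  # [('k1', 'a'), ('k2', 'd')]
```

A shell pipe gives this chaining for free within one process; the issue's point is that the subquery workaround only behaves the same if Hive never inserts a task boundary or reshuffle between the two TRANSFORMs, which users cannot safely assume.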
[jira] [Updated] (HIVE-10) [Hive] filter is executed after the join
[ https://issues.apache.org/jira/browse/HIVE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-10: --- Fix Version/s: 0.3.0 [Hive] filter is executed after the join Key: HIVE-10 URL: https://issues.apache.org/jira/browse/HIVE-10 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 Filter is not pushed above the join in Hive currently. This can be pretty expensive if the filter is highly selective. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-39) Hive: we should be able to specify a column without a table/alias name
[ https://issues.apache.org/jira/browse/HIVE-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-39: --- Fix Version/s: 0.3.0 Hive: we should be able to specify a column without a table/alias name -- Key: HIVE-39 URL: https://issues.apache.org/jira/browse/HIVE-39 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Zheng Shao Assignee: Ashish Thusoo Fix For: 0.3.0 SELECT field1, field2 from table1 should work, just as SELECT table1.field1, table1.field2 from table1 For join, the situation will be a bit more complicated. If the 2 join operands have columns of the same name, then we should output an ambiguity error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-58) [hive] join condition does not allow a simple filter
[ https://issues.apache.org/jira/browse/HIVE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-58: --- Fix Version/s: 0.3.0 [hive] join condition does not allow a simple filter Key: HIVE-58 URL: https://issues.apache.org/jira/browse/HIVE-58 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 In the join condition, a simple filter condition cannot be specified. For example, select from A join B ON (A.a = B.b and A.x = 10); is not supported. This can be very useful specially in case of outer joins. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
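Why a filter inside the join condition matters "specially in case of outer joins" can be shown with sqlite3 from the Python standard library (table and column names mirror the issue's example but the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER, x INTEGER);
    CREATE TABLE B (b INTEGER);
    INSERT INTO A VALUES (1, 10), (2, 20);
    INSERT INTO B VALUES (1), (2);
""")

# Filter inside ON: unmatched left rows survive with NULLs on the right.
on_rows = conn.execute(
    "SELECT A.a, B.b FROM A LEFT OUTER JOIN B ON (A.a = B.b AND A.x = 10)"
).fetchall()

# Same predicate in WHERE: it runs after the join and drops the a=2 row.
where_rows = conn.execute(
    "SELECT A.a, B.b FROM A LEFT OUTER JOIN B ON (A.a = B.b) WHERE A.x = 10"
).fetchall()

print(on_rows)     # [(1, 1), (2, None)]
print(where_rows)  # [(1, 1)]
```

For inner joins the two placements are equivalent, which is why the restriction mostly hurts outer-join queries.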
[jira] [Updated] (HIVE-26) [Hive] uppercase alias with a join not working
[ https://issues.apache.org/jira/browse/HIVE-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-26: --- Fix Version/s: 0.3.0 [Hive] uppercase alias with a join not working -- Key: HIVE-26 URL: https://issues.apache.org/jira/browse/HIVE-26 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 EXPLAIN FROM (SELECT src.* FROM src) x JOIN (SELECT src.* FROM src) Y ON (x.key = Y.key) SELECT Y.*; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-836) Add syntax to force a new mapreduce job / transform subquery in mapper
[ https://issues.apache.org/jira/browse/HIVE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-836: - Description: Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: {code:title=foo.sql} SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a {code} ...now, assume that the join step is very easy and 'niftyscript' is really processor intensive. The ideal format for this is a MR task with few mappers and few reducers, and then a second MR task with lots of mappers. Currently, there is no way to even require the outer TRANSFORM statement occur in a separate map phase. Implementing a hint such as /* +MAP */, akin to /* +MAPJOIN(x) */, would be awesome. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. was: Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a ...in this syntax b and c will be joined (in the reducer, of course), and then the rows that pass the join clause will be passed to niftyscript _in the reducer._ However, when niftyscript is high-computation and there is a lot of data coming out of the join but very few reducers, there's a huge hold-up. 
It would be awesome if I could somehow force a new mapreduce step after the subquery, so that ./niftyscript is run in the mappers rather than the prior step's reducers. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. SUGGESTED FIX: Either cause MAP and REDUCE to force map/reduce steps (c.f. https://issues.apache.org/jira/browse/HIVE-835 ), or add a query element to specify that the job ends here. For example, in the above query, FROM a SELF-CONTAINED or PRECOMPUTE a or START JOB AFTER a or something like that. Add syntax to force a new mapreduce job / transform subquery in mapper -- Key: HIVE-836 URL: https://issues.apache.org/jira/browse/HIVE-836 Project: Hive Issue Type: Wish Reporter: Adam Kramer Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: {code:title=foo.sql} SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a {code} ...now, assume that the join step is very easy and 'niftyscript' is really processor intensive. The ideal format for this is a MR task with few mappers and few reducers, and then a second MR task with lots of mappers. Currently, there is no way to even require the outer TRANSFORM statement occur in a separate map phase. Implementing a hint such as /* +MAP */, akin to /* +MAPJOIN(x) */, would be awesome. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-141) drop table partition behaving oddly - does not create subdirectories
[ https://issues.apache.org/jira/browse/HIVE-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-141: Fix Version/s: 0.3.0 drop table partition behaving oddly - does not create subdirectories Key: HIVE-141 URL: https://issues.apache.org/jira/browse/HIVE-141 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hao Liu Assignee: Prasad Chakka Priority: Critical Fix For: 0.3.0 Original Estimate: 4h Remaining Estimate: 4h for example, I have a table, which has two partitions: tmp_table_name/dt=2008-11-01 tmp_table_name/dt=2008-11-02 When we use hive metastore to drop the first partition (as root), I expect the data file will be moved to user/root/.Trash/081103/tmp_table_name/dt=2008-11-01 by default. However, directory tmp_table_name was not created, the data was moved to user/root/.Trash/081103/dt=2008-11-01, which makes data recovery a very difficult task. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-66) Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe
[ https://issues.apache.org/jira/browse/HIVE-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-66: --- Fix Version/s: 0.3.0 Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe Key: HIVE-66 URL: https://issues.apache.org/jira/browse/HIVE-66 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Priority: Critical Fix For: 0.3.0 Fails with column mismatch error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-106) Join operation fails for some queries
[ https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-106: Fix Version/s: 0.8.0 Join operation fails for some queries - Key: HIVE-106 URL: https://issues.apache.org/jira/browse/HIVE-106 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Josh Ferguson Assignee: Namit Jain Priority: Critical Fix For: 0.8.0 The Tables Are CREATE TABLE activities (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null), FieldSchema(name:actee_id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) CREATE TABLE users (id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (id) INTO 32 BUCKETS ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) A working query is SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; A non working query is SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; The Exception Is java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262) at 
org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430) at
[jira] [Updated] (HIVE-145) Hive wiki provides incorrect download and setup instructions
[ https://issues.apache.org/jira/browse/HIVE-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-145: Fix Version/s: 0.3.0 Hive wiki provides incorrect download and setup instructions Key: HIVE-145 URL: https://issues.apache.org/jira/browse/HIVE-145 Project: Hive Issue Type: Task Components: Documentation Reporter: Aaron Kimball Assignee: Raghotham Murthy Fix For: 0.3.0 The Getting Started instructions at http://wiki.apache.org/hadoop/Hive/GettingStarted are incorrect. They claim that you should download a dist-17.tar.gz file from a Facebook mirror. This link is 404, and Facebook does not seem to maintain a publicly available Hive package at any other location I can find. Thus, the wiki should be updated to instruct users to checkout/export files from SVN. (This page is locked, so I can't change it myself) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.
[ https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-835: - Summary: Deprecate, remove, or fix MAP and REDUCE syntax. (was: Make MAP and REDUCE work as expected or add warnings) Deprecate, remove, or fix MAP and REDUCE syntax. Key: HIVE-835 URL: https://issues.apache.org/jira/browse/HIVE-835 Project: Hive Issue Type: Improvement Reporter: Adam Kramer There are syntactic elements MAP and REDUCE which function as syntactic sugar for SELECT TRANSFORM. This behavior is not at all intuitive, because no checking or verification is done to ensure that the user's intention is met. Specifically, Hive may see a MAP query and simply tack the transform script on to the end of a reduce job (so, the user says MAP but hive does a REDUCE), or (more dangerously) vice-versa. Given that Hive's whole point is to sit on top of a mapreduce framework and allow transformations in the mapper or reducer, it seems very inappropriate for Hive to ignore a clear command from the user to MAP or to REDUCE the data using a script, and then simply ignore it. Better behavior would be for hive to see a MAP command and to start a new mapreduce step and run the command in the mapper (even if it otherwise would be run in the reducer), and for REDUCE to begin a reduce step if necessary (so, tack the REDUCE script on to the end of a REDUCE job if the current system would do so, or if not, treat the 0th column as the reduce key, throw a warning saying this has been done, and force a reduce job). Acceptable behavior would be to throw an error or warning when the user's clearly-stated desire is going to be ignored. Warning: User used MAP keyword, but transformation will occur in the reduce phase / Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY / CLUSTER BY column. Transformation will occur in the map phase. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-211) Add metastore_db to svn ignore
[ https://issues.apache.org/jira/browse/HIVE-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-211: Fix Version/s: 0.3.0 Add metastore_db to svn ignore -- Key: HIVE-211 URL: https://issues.apache.org/jira/browse/HIVE-211 Project: Hive Issue Type: Task Reporter: Johan Oskarsson Assignee: Zheng Shao Priority: Trivial Fix For: 0.3.0 As per HIVE-101 add the metastore_db directory to svn ignore since it shouldn't be committed or added to any patches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.
[ https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-835: Component/s: SQL Deprecate, remove, or fix MAP and REDUCE syntax. Key: HIVE-835 URL: https://issues.apache.org/jira/browse/HIVE-835 Project: Hive Issue Type: Improvement Components: SQL Reporter: Adam Kramer There are syntactic elements MAP and REDUCE which function as syntactic sugar for SELECT TRANSFORM. This behavior is not at all intuitive, because no checking or verification is done to ensure that the user's intention is met. Specifically, Hive may see a MAP query and simply tack the transform script on to the end of a reduce job (so, the user says MAP but hive does a REDUCE), or (more dangerously) vice-versa. Given that Hive's whole point is to sit on top of a mapreduce framework and allow transformations in the mapper or reducer, it seems very inappropriate for Hive to ignore a clear command from the user to MAP or to REDUCE the data using a script, and then simply ignore it. Better behavior would be for hive to see a MAP command and to start a new mapreduce step and run the command in the mapper (even if it otherwise would be run in the reducer), and for REDUCE to begin a reduce step if necessary (so, tack the REDUCE script on to the end of a REDUCE job if the current system would do so, or if not, treat the 0th column as the reduce key, throw a warning saying this has been done, and force a reduce job). Acceptable behavior would be to throw an error or warning when the user's clearly-stated desire is going to be ignored. Warning: User used MAP keyword, but transformation will occur in the reduce phase / Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY / CLUSTER BY column. Transformation will occur in the map phase. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071426#comment-13071426 ] Hudson commented on HIVE-2226: -- Integrated in Hive-trunk-h0.21 #851 (See [https://builds.apache.org/job/Hive-trunk-h0.21/851/]) HIVE-2226. Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. (Sohan Jain via pauly) pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151213 Files : * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java * 
/hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-150) group by count(1) will get error
[ https://issues.apache.org/jira/browse/HIVE-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-150: Fix Version/s: 0.3.0 group by count(1) will get error Key: HIVE-150 URL: https://issues.apache.org/jira/browse/HIVE-150 Project: Hive Issue Type: Bug Components: Build Infrastructure Environment: HADOOP 0.17.2.1 Reporter: YihueyChyi Fix For: 0.3.0 Attachments: hive-150.1.patch HIVEQL: select l.http_user_agent,count(1) from log_resume_all l group by l.http_user_agent Maybe I'll get error in the second stage: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver The second stage : map error java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) syslog logs 2008-12-10 15:41:15,209 DEBUG org.apache.hadoop.mapred.TaskTracker: Child starting 2008-12-10 15:41:15,717 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2008-12-10 15:41:15,805 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 64 2008-12-10 15:41:16,252 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2008-12-10 15:41:16,253 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded initialized native-zlib library 2008-12-10 15:41:16,424 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 2008-12-10 15:41:16,428 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias /tmp/hive-root/462573742/46102483.10002 to work list for file 
/tmp/hive-root/462573742/46102483.10002/0015_r_29_0 2008-12-10 15:41:16,438 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Got partitions: null 2008-12-10 15:41:16,438 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing Self 2008-12-10 15:41:16,443 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1 2008-12-10 15:41:16,460 INFO org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 2008-12-10 15:41:16,460 INFO org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 2008-12-10 15:41:16,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0 2008-12-10 15:41:16,495 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-26 23:28:13.279889) Review request for hive, John Sichi and Ning Zhang. Changes --- refactor patch to dump query state into an inner class rather than a Stack. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 8ae1f9d ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 ql/src/test/results/clientnegative/altern1.q.out c52ca04 ql/src/test/results/clientnegative/analyze_view.q.out 99def40 ql/src/test/results/clientnegative/archive1.q.out 0927686 ql/src/test/results/clientnegative/archive2.q.out 25baefa ql/src/test/results/clientnegative/authorization_fail_1.q.out ab1abe2 ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a 
ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c ql/src/test/results/clientnegative/authorization_part.q.out 625d60c ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 ql/src/test/results/clientnegative/database_create_already_exists.q.out 08c04f9 ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out b297a99 ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 8b5674d ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 
ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out d40ff27 ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out adff0f8 ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out b84e954 ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 96f8452 ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 25deaa3 ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out f9c3d5a ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out 12c737a ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 77afe3a
[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2286: Attachment: HIVE-2286.6.patch ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2286: Status: Patch Available (was: Open) ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071428#comment-13071428 ] jirapos...@reviews.apache.org commented on HIVE-2286: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-26 23:28:13.279889) Review request for hive, John Sichi and Ning Zhang. Changes --- refactor patch to dump query state into an inner class rather than a Stack. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 8ae1f9d ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 ql/src/test/results/clientnegative/altern1.q.out c52ca04 ql/src/test/results/clientnegative/analyze_view.q.out 99def40 ql/src/test/results/clientnegative/archive1.q.out 0927686 ql/src/test/results/clientnegative/archive2.q.out 25baefa ql/src/test/results/clientnegative/authorization_fail_1.q.out 
ab1abe2 ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c ql/src/test/results/clientnegative/authorization_part.q.out 625d60c ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 ql/src/test/results/clientnegative/database_create_already_exists.q.out 08c04f9 ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out b297a99 ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 8b5674d ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 ql/src/test/results/clientnegative/dyn_part3.q.out 
5f4df65 ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out d40ff27 ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out adff0f8 ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out b84e954 ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 96f8452 ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 25deaa3
[jira] [Reopened] (HIVE-401) Reduce the ant test time to under 15 minutes
[ https://issues.apache.org/jira/browse/HIVE-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-401: - Yesterday it took me 4 hours to run the tests on trunk. Reduce the ant test time to under 15 minutes Key: HIVE-401 URL: https://issues.apache.org/jira/browse/HIVE-401 Project: Hive Issue Type: Wish Reporter: Zheng Shao Assignee: Zheng Shao Attachments: hive_parallel_test.sh ant test is taking too long. This is a big overhead for development since we need to do context switching all the time. We should bring the time back to under 15 minutes.
[jira] [Updated] (HIVE-494) Select columns by index instead of name
[ https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-494: - Description: SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for names specifically is kind of silly when they just get translated into numbers anyway. was: In a very real sense, tables are like arrays or matrices with rows and columns. It would be fantastic if I could refer to columns in my select statement by their index, rather than by their name. SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...which would then get the first and third column from mytable. We already have syntax like this for array data types, which I think would translate nicely: SELECT mytable[0][3], etc. Or maybe I just spend too much time coding in R... Priority: Minor (was: Major) Summary: Select columns by index instead of name (was: Select columns by number instead of name) Select columns by index instead of name --- Key: HIVE-494 URL: https://issues.apache.org/jira/browse/HIVE-494 Project: Hive Issue Type: Wish Components: Clients, Query Processor Reporter: Adam Kramer Priority: Minor Labels: SQL SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for names specifically is kind of silly when they just get translated into numbers anyway.
[jira] [Updated] (HIVE-2204) unable to get column names for a specific table that has '_' as part of its table name
[ https://issues.apache.org/jira/browse/HIVE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2204: - Fix Version/s: 0.8.0 unable to get column names for a specific table that has '_' as part of its table name -- Key: HIVE-2204 URL: https://issues.apache.org/jira/browse/HIVE-2204 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Mythili Gopalakrishnan Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2204.patch I have a table age_group and I am trying to get list of columns for this table name. As underscore and '%' have special meaning in table search pattern according to JDBC searchPattern string specification, I escape the '_' in my table name when I call getColumns for this single table. But HIVE does not return any columns. My call to getColumns is as follows catalog null schemaPattern % tableNamePattern age\_group columnNamePattern % If I don't escape the '_' in my tableNamePattern, I am able to get the list of columns.
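The JDBC metadata search-pattern rules the reporter refers to treat '_' (any single character) and '%' (any sequence) as wildcards, so a literal table name containing an underscore must be escaped before it is passed as tableNamePattern. A minimal sketch of that escaping follows; the helper name is hypothetical and not part of Hive or the JDBC API:

```java
public class SearchPatternEscaper {
    // Hypothetical helper: prefixes the JDBC LIKE wildcards '_' and '%'
    // with the driver's escape string so they are matched literally.
    public static String escapeLiteral(String name, String escape) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '_' || c == '%') {
                sb.append(escape); // escape the wildcard character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "age_group" becomes "age\_group", as in the reporter's call
        System.out.println(escapeLiteral("age_group", "\\"));
    }
}
```

In a real client the escape string should come from DatabaseMetaData.getSearchStringEscape() rather than being hard-coded, since drivers may use different escape characters.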
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Summary: Comparison Operators convert number types to common type instead of double if possible (was: Comparison Operators convert number types to common type instead of double if necessary) Comparison Operators convert number types to common type instead of double if possible -- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2248.1.patch Currently, if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you do WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix.
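The common-type behavior the summary alludes to can be sketched as a walk up the numeric type hierarchy: compare in the wider of the two operand types instead of always in double. The type names and ordering below follow Hive's numeric types, but the helper itself is illustrative and not the actual FunctionRegistry code:

```java
import java.util.Arrays;
import java.util.List;

public class NumericCommonType {
    // Hive's numeric widening order, narrowest to widest.
    private static final List<String> ORDER = Arrays.asList(
        "tinyint", "smallint", "int", "bigint", "float", "double");

    // Illustrative only: picks the wider of two numeric types so the
    // comparison can run in that type rather than always in double.
    public static String commonType(String a, String b) {
        int ia = ORDER.indexOf(a);
        int ib = ORDER.indexOf(b);
        if (ia < 0 || ib < 0) {
            return "double"; // fall back for unknown/non-numeric types
        }
        return ia >= ib ? a : b;
    }
}
```

Under this scheme WHERE BIGINT_COLUMN = 0 compares in bigint, avoiding the per-row double conversion the description calls out.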
[jira] [Reopened] (HIVE-2046) In error scenario some opened streams may not be closed in Utilities.java
[ https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-2046: -- In error scenario some opened streams may not be closed in Utilities.java -- Key: HIVE-2046 URL: https://issues.apache.org/jira/browse/HIVE-2046 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2046.Patch 1) In error scenarios the XMLDecoder and XMLEncoder may not be closed in serializeMapRedWork() and deserializeMapRedWork() in Utilities.java 2) The BufferedReader is not closed in Utilities.StreamPrinter
[jira] [Resolved] (HIVE-2046) In error scenario some opened streams may not be closed in Utilities.java
[ https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-2046. -- Resolution: Duplicate In error scenario some opened streams may not be closed in Utilities.java -- Key: HIVE-2046 URL: https://issues.apache.org/jira/browse/HIVE-2046 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2046.Patch 1) In error scenarios the XMLDecoder and XMLEncoder may not be closed in serializeMapRedWork() and deserializeMapRedWork() in Utilities.java 2) The BufferedReader is not closed in Utilities.StreamPrinter
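The fix pattern for both reported spots is the same: release the stream in a finally block so the error path cannot leak it (Hive of this era targets Java 6, so try-with-resources is not available). The sketch below shows the pattern for the BufferedReader case; it is illustrative, not the actual Utilities.java code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamCloseSketch {
    // Reads everything from the reader, closing it on success and on error.
    public static String readAll(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            br.close(); // runs even if readLine() throws
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("a\nb")));
    }
}
```

The same shape applies to the XMLDecoder/XMLEncoder and DataOutputStream cases: construct, use inside try, close in finally.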
[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2051: - Fix Version/s: 0.8.0 getInputSummary() to call FileSystem.getContentSummary() in parallel Key: HIVE-2051 URL: https://issues.apache.org/jira/browse/HIVE-2051 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch, HIVE-2051.5.patch getInputSummary() currently calls FileSystem.getContentSummary() one path at a time, which can be extremely slow when the number of input paths is huge. By issuing those calls in parallel, we can cut the latency in most cases.
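The parallelization described above can be sketched with a fixed thread pool that issues one summary call per path and sums the resulting futures. Here summarize() is a stand-in for the remote FileSystem.getContentSummary() round trip, and the class is illustrative rather than the actual getInputSummary() patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSummary {
    // Stand-in for FileSystem.getContentSummary(); the real call is a
    // remote round trip per path, which is why parallelism pays off.
    static long summarize(String path) {
        return path.length(); // placeholder "size" for the sketch
    }

    // Submit one summary call per path to a fixed pool, then sum results.
    public static long totalSize(List<String> paths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (final String p : paths) {
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        return summarize(p);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // propagate any per-path failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With N paths and T threads the wall-clock cost drops from roughly N round trips to roughly N/T, which is the latency win the description claims.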
[jira] [Reopened] (HIVE-2044) In error scenario opened streams may not be closed in TypedBytesWritableOutput.java
[ https://issues.apache.org/jira/browse/HIVE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-2044: -- In error scenario opened streams may not be closed in TypedBytesWritableOutput.java Key: HIVE-2044 URL: https://issues.apache.org/jira/browse/HIVE-2044 Project: Hive Issue Type: Bug Components: Contrib Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2044.Patch 1) In error scenarios the DataOutputStream may not be closed in writeWritable of TypedBytesWritableOutput.java
[jira] [Updated] (HIVE-1937) DDLSemanticAnalyzer won't take newly set Hive parameters
[ https://issues.apache.org/jira/browse/HIVE-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1937: - Component/s: Query Processor Fix Version/s: 0.8.0 DDLSemanticAnalyzer won't take newly set Hive parameters Key: HIVE-1937 URL: https://issues.apache.org/jira/browse/HIVE-1937 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-1937.2.patch, HIVE-1937.3.patch, HIVE-1937.patch Hive's DDLSemanticAnalyzer maintains a static reservedPartitionValue set whose values come from several Hive parameters. However, even if these parameters are set to new values, the reservedPartitionValue set is not updated.
[jira] [Reopened] (HIVE-1890) Optimize privilege checking for authorization
[ https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-1890: -- Optimize privilege checking for authorization - Key: HIVE-1890 URL: https://issues.apache.org/jira/browse/HIVE-1890 Project: Hive Issue Type: Improvement Components: Security Reporter: Namit Jain Assignee: He Yongqiang Follow-up of HIVE-78. Many queries have a large number of input partitions from the same input table. If the table under consideration has the same privilege for all of its partitions, there is no need to check permissions for every partition: you can find the common tables and skip the per-partition checks altogether.
[jira] [Resolved] (HIVE-1890) Optimize privilege checking for authorization
[ https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1890. -- Resolution: Duplicate Optimize privilege checking for authorization - Key: HIVE-1890 URL: https://issues.apache.org/jira/browse/HIVE-1890 Project: Hive Issue Type: Improvement Components: Security Reporter: Namit Jain Assignee: He Yongqiang Follow-up of HIVE-78. Many queries have a large number of input partitions from the same input table. If the table under consideration has the same privilege for all of its partitions, there is no need to check permissions for every partition: you can find the common tables and skip the per-partition checks altogether.
[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1644: - Fix Version/s: 0.8.0 use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: John Sichi Assignee: Russell Melick Fix For: 0.8.0 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, HIVE-1644.17.patch, HIVE-1644.18.patch, HIVE-1644.19.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch, hive.log HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1595: - Fix Version/s: 0.8.0 job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: Hive-1595.1.patch, Hive-1595.2.patch For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes the job difficult to identify.
[jira] [Reopened] (HIVE-1490) More implicit type conversion: UNION ALL and COALESCE
[ https://issues.apache.org/jira/browse/HIVE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-1490: -- More implicit type conversion: UNION ALL and COALESCE - Key: HIVE-1490 URL: https://issues.apache.org/jira/browse/HIVE-1490 Project: Hive Issue Type: Bug Components: Query Processor, Server Infrastructure Reporter: Adam Kramer Assignee: Syed S. Albiz This is a use case that frequently annoys me: SELECT TRANSFORM(stuff) USING 'script' AS thing1, thing2 FROM some_table UNION ALL SELECT a.thing1, a.thing2 FROM some_other_table a ...this fails when a.thing1 and a.thing2 are anything but STRING, because all output of TRANSFORM is STRING. In this case, a.thing1 and a.thing2 should be implicitly converted to string. COALESCE(a.thing1, a.thing2, a.thing3) should similarly do implicit type conversion among the arguments. If two are INT and one is BIGINT, upgrade the INTs, etc. At the very least, it would be nice to have syntax like SELECT TRANSFORM(stuff) USING 'script' AS thing1 INT, thing2 INT ...which would effectively cast the output columns to the specified types. But really, type conversion should work.
[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose
[ https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2199: - Fix Version/s: 0.8.0 incorrect success flag passed to jobClose - Key: HIVE-2199 URL: https://issues.apache.org/jira/browse/HIVE-2199 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2199.1.patch For block-level merging of RCFiles, jobClose is passed the incorrect variable as the success flag.
[jira] [Updated] (HIVE-2024) In Driver.execute(), mapred.job.tracker is not restored if one of the task fails.
[ https://issues.apache.org/jira/browse/HIVE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2024: - Component/s: Query Processor Fix Version/s: 0.8.0 In Driver.execute(), mapred.job.tracker is not restored if one of the task fails. - Key: HIVE-2024 URL: https://issues.apache.org/jira/browse/HIVE-2024 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2024.1.patch If a job is automatically determined to run in local mode and one of its tasks fails with a non-zero error code, mapred.job.tracker will remain set to local, which might cause further problems.
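The fix pattern for this class of bug can be sketched with a plain map standing in for Hive's JobConf. This is an illustration of the restore-in-finally idiom, not the actual patch; all names below are hypothetical.

```java
import java.util.Map;

public class RestoreTracker {
    // Remember the original mapred.job.tracker, and restore it in a finally
    // block so a failing local-mode task cannot leave the session pointing
    // at "local". The conf map stands in for the real JobConf.
    static boolean runLocally(Map<String, String> conf, Runnable task) {
        String original = conf.get("mapred.job.tracker");
        conf.put("mapred.job.tracker", "local");
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            return false; // analogue of a task exiting with a non-zero code
        } finally {
            conf.put("mapred.job.tracker", original); // restored on all paths
        }
    }
}
```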
[jira] [Updated] (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary
[ https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2052: - Fix Version/s: 0.8.0 PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary -- Key: HIVE-2052 URL: https://issues.apache.org/jira/browse/HIVE-2052 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2051.3.patch, HIVE-2052.1.patch, HIVE-2052.2.patch, HIVE-2052.3.patch This will allow hooks to share information more effectively and reduce their latency.
[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job
[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2082: - Component/s: Query Processor Fix Version/s: 0.8.0 Reduce memory consumption in preparing MapReduce job Key: HIVE-2082 URL: https://issues.apache.org/jira/browse/HIVE-2082 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch The Hive client side consumes a lot of memory when the number of input partitions is large. One reason is that each partition maintains a list of FieldSchema objects intended to deal with schema evolution. However, they are not currently used, and Hive uses the table-level schema for all partitions. This will be fixed in HIVE-2050; the memory consumption from this part will be reduced by almost half (1.2GB to 700MB for 20k partitions). Another large chunk of memory is consumed in the MapReduce job setup phase, when a PartitionDesc is created from each Partition object. A Properties object maintained in PartitionDesc contains a full list of columns and types, and for the same reason these should be the same as the table-level schema. The deserializer initialization also takes a large amount of memory, which should be avoided. My initial testing of these optimizations cut the memory consumption in half (700MB to 300MB for 20k partitions).
[jira] [Updated] (HIVE-178) SELECT without FROM should assume a one-row table with no columns.
[ https://issues.apache.org/jira/browse/HIVE-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-178: - Component/s: Testing Infrastructure Description: SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. This is especially useful for testing UDFs. Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore. was: SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. 
Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore. SELECT without FROM should assume a one-row table with no columns. -- Key: HIVE-178 URL: https://issues.apache.org/jira/browse/HIVE-178 Project: Hive Issue Type: Wish Components: Query Processor, Testing Infrastructure Reporter: Adam Kramer Priority: Minor Labels: SQL SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. This is especially useful for testing UDFs. Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore.
[jira] [Updated] (HIVE-2096) throw a error if the input is larger than a threshold for index input format
[ https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2096: - Component/s: Query Processor Diagnosability Fix Version/s: 0.8.0 throw a error if the input is larger than a threshold for index input format Key: HIVE-2096 URL: https://issues.apache.org/jira/browse/HIVE-2096 Project: Hive Issue Type: Bug Components: Diagnosability, Query Processor Affects Versions: 0.8.0 Reporter: Namit Jain Fix For: 0.8.0 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt This can hang forever.
[jira] [Assigned] (HIVE-2096) throw a error if the input is larger than a threshold for index input format
[ https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reassigned HIVE-2096: Assignee: Wojciech Galuba throw a error if the input is larger than a threshold for index input format Key: HIVE-2096 URL: https://issues.apache.org/jira/browse/HIVE-2096 Project: Hive Issue Type: Bug Components: Diagnosability, Query Processor Affects Versions: 0.8.0 Reporter: Namit Jain Assignee: Wojciech Galuba Fix For: 0.8.0 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt This can hang forever.
[jira] [Updated] (HIVE-2106) Increase the number of operator counter
[ https://issues.apache.org/jira/browse/HIVE-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2106: - Fix Version/s: 0.8.0 Increase the number of operator counter Key: HIVE-2106 URL: https://issues.apache.org/jira/browse/HIVE-2106 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2106.patch Currently, Hadoop counters have to be defined as an enum (hardcoded), and we support up to 400 counters now. This limits the number of operators to 100 (each operator has 4 counters). We need to increase the number of Hadoop counters or change the Hive code to use the Hadoop 0.20 API.
[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2186: - Fix Version/s: 0.8.0 Dynamic Partitioning Failing because of characters not supported globStatus --- Key: HIVE-2186 URL: https://issues.apache.org/jira/browse/HIVE-2186 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, hive-2186.4.patch, hive-2186.5.patch Some dynamic partition queries fail at the partition-loading stage if the dynamic partition columns contain special characters. We need to escape all of them.
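The escaping the report calls for can be sketched as follows. This is illustrative only: the metacharacter list below is an assumption covering common glob syntax, not necessarily the exact set FileSystem.globStatus uses, and the class name is hypothetical.

```java
public class GlobEscape {
    // Assumed glob metacharacters (braces, brackets, wildcards, etc.);
    // the real fix would use whatever set globStatus actually interprets.
    private static final String GLOB_SPECIALS = "{}[]*?\\^$()+,";

    // Backslash-escape each metacharacter so a literal partition path
    // value is matched verbatim rather than interpreted as a pattern.
    static String escapeGlob(String path) {
        StringBuilder sb = new StringBuilder();
        for (char c : path.toCharArray()) {
            if (GLOB_SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```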
[jira] [Updated] (HIVE-2159) TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered.
[ https://issues.apache.org/jira/browse/HIVE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2159: - Component/s: Query Processor Fix Version/s: 0.8.0 TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered. Key: HIVE-2159 URL: https://issues.apache.org/jira/browse/HIVE-2159 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2159.1.patch
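The overflow shape HIVE-2159 describes can be reproduced in a few lines. The method names are illustrative, not Hive's; the point is that narrowing a byte-count intermediate to int wraps for inputs over 2GB, so the computed sample target is wrong and the sampling condition never triggers.

```java
public class SampleSize {
    // Bug shape: the intermediate is narrowed to int, which wraps for
    // large inputs (here it goes negative for a 100GB total).
    static long sampledBytesBuggy(long totalBytes, int percent) {
        int intermediate = (int) (totalBytes * percent / 100); // truncates
        return intermediate;
    }

    // Fix: keep every intermediate in long.
    static long sampledBytes(long totalBytes, int percent) {
        return totalBytes * percent / 100;
    }
}
```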
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2121: - Component/s: Query Processor Fix Version/s: 0.8.0 Input Sampling By Splits Key: HIVE-2121 URL: https://issues.apache.org/jira/browse/HIVE-2121 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch, HIVE-2121.8.patch We need better input sampling to serve at least two purposes: 1. testing queries against a smaller data set, and 2. understanding what the data looks like without scanning the whole table. A simple function that returns a subset of the splits will help in those cases. It doesn't have to be strict sampling.
[jira] [Updated] (HIVE-2157) NPE in MapJoinObjectKey
[ https://issues.apache.org/jira/browse/HIVE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2157: - Component/s: Query Processor Fix Version/s: 0.8.0 NPE in MapJoinObjectKey --- Key: HIVE-2157 URL: https://issues.apache.org/jira/browse/HIVE-2157 Project: Hive Issue Type: Bug Components: Query Processor Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.8.0 Attachments: HIVE-2157.1.patch
[jira] [Updated] (HIVE-2262) mapjoin followed by union all, groupby does not work
[ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2262: - Fix Version/s: (was: 0.7.1) mapjoin followed by union all, groupby does not work Key: HIVE-2262 URL: https://issues.apache.org/jira/browse/HIVE-2262 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: yu xiang Priority: Trivial SQL: CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; explain select int_data2,count(1) from (select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1) union all select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1)) mapjointable group by int_data2; Exception: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125) at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76) at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) Analysis: 1. When mapjoin, union, and groupby are used together, UnionProcFactory.MapJoinUnion() (in the optimizer) sets MapJoinSubq to true and sets up the UnionParseContext. 2. In GenMRUnion1, Hive calls mergeMapJoinUnion and also sets the task plan.
3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls GenMRRedSink1().process() to initialize the plan. But the utask's plan has already been set, so it only needs its reducer set. Also, because the utask is processing a temporary table, there is no topOp mapping to the table, so we get the null pointer exception. Solutions: 1. SQL solution: use a subquery to modify the SQL. 2. Code solution: in mergeMapJoinUnion, after the task plan has been set, set a setTaskPlan flag to true to indicate that the plan for this utask has already been set. Then in GenMRRedSink3, if this flag is true, don't call GenMRRedSink1().process() to reinitialize the plan, i.e. guard it with something like if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan()). I don't know whether the code solution is suitable. Is there any better solution? Thanks.
[jira] [Updated] (HIVE-2306) Hbase's timestamp attribute to be mapped for read or write, and then import data of timestamp to hbase's table from hive
[ https://issues.apache.org/jira/browse/HIVE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianyi Zhang updated HIVE-2306: --- Description: The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. Would it be possible to allow the timestamp to be mapped to Hive (just like Get in the HBase API), or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)? was: The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. This would allow the timestamp to be mapped to Hive (just like Get in the HBase API) or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)? Hbase's timestamp attribute to be mapped for read or write, and then import data of timestamp to hbase's table from hive Key: HIVE-2306 URL: https://issues.apache.org/jira/browse/HIVE-2306 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Jianyi Zhang Original Estimate: 96h Remaining Estimate: 96h The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. 
And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. Would it be possible to allow the timestamp to be mapped to Hive (just like Get in the HBase API), or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)?