[jira] [Created] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4115:
-

 Summary: Introduce cube abstraction in hive
 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


We would like to define a cube abstraction so that users can query at the cube 
layer without needing to know anything about storage and rollups. 

I will describe the model in more detail in the following comments.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4116) Can't use views using map datatype.

2013-03-05 Thread Karel Vervaeke (JIRA)
Karel Vervaeke created HIVE-4116:


 Summary: Can't use views using map datatype.
 Key: HIVE-4116
 URL: https://issues.apache.org/jira/browse/HIVE-4116
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.8.1
Reporter: Karel Vervaeke


Executing the following 

{noformat}
DROP TABLE IF EXISTS `items`;
CREATE TABLE IF NOT EXISTS `items` (id INT, name STRING, info 
MAP<STRING,STRING>) PARTITIONED BY (ds STRING);

DROP VIEW IF EXISTS `priceview`;
CREATE VIEW `priceview` AS
SELECT
`items`.`id`,
`items`.info['price']
FROM
`items`
;

select * from `priceview`;
{noformat}

Produces the following error:
{noformat}
karel@tomato:~/tmp$ $HIVE_HOME/bin/hive -f hivebug.sql
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in 
jar:file:/home/karel/opt/hive-0.10.0-bin/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/karel/hive_job_log_karel_201303051117_945318761.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/karel/opt/hadoop-2.0.0-mr1-cdh4.0.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/karel/opt/hive-0.10.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
Time taken: 5.449 seconds
OK
Time taken: 0.303 seconds
OK
Time taken: 0.131 seconds
OK
Time taken: 0.206 seconds
FAILED: SemanticException line 3:22 mismatched input '.' expecting FROM near 
'`items`' in from clause
 in definition of VIEW priceview [
SELECT
`items`.`id`,
`items``items`.`info`info['price']
FROM
`default`.`items`
] used as priceview at Line 3:14
{noformat}

Unless I'm not using the right syntax, I would expect this simple example to 
work. I have tried some variations (quotes, no quotes, ...), to no avail.



[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593296#comment-13593296
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---

Logical model :
-
*Cube* :
* A cube is a set of dimensions and measures in a particular subject. 
* A measure is a quantity that you are interested in measuring.
* A dimension is an attribute, or set of attributes, by which you can divide 
measures into sub-categories. 

*Fact Tables* :
* A cube will have fact tables associated with it.
* A fact table would have a subset of the cube's measures and dimensions.
* Fact tables can be rolled up on any dimension and on time.

*Dimensions* :
* A cube dimension can refer to a dimension table.
* A cube dimension can have a hierarchy of elements.

*Dimension tables* :
* A table with a list of columns.
* The table can have references to other dimension tables.
* Dimension tables can be shared across cubes.

*Storage*:
* A fact or dimension table can have storages associated with it.

Storage Model :
-
A physical table will be created in the Hive metastore for each fact, per 
storage, per rollup.




[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593316#comment-13593316
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---


Illustrating the above model with an example :
* Define a SALES_CUBE cube with measures : Sales, Discount and dimensions : 
CustomerID, Location, Transaction-time 

* Dimensions:
** CustomerID is a simple dimension which refers to the customer table on 
column ID. CustomerTable has the schema : ID, Age, Gender
** Location is a hierarchical dimension with the hierarchy : Zipcode, CityID, 
StateID, CountryID, RegionID
*** Zipcode refers to ZipTable on column code. ZipTable schema : code, 
street-name, cityID, stateID
*** CityID refers to CityTable on column ID. CityTable schema : ID, name, 
stateID
*** StateID refers to StateTable on column ID. StateTable schema : ID, name, 
capital, countryID
*** CountryID refers to CountryTable on column ID. CountryTable schema : ID, 
name, capital, Region
*** Region is an inline dimension with values 'APAC', 'EMEA', 'USA'
** Transaction-time is a simple dimension with a timestamp field.

* Facts : SALES_CUBE can have the following fact tables :
## RawFact with columns Sales, Discount, CustomerId, ZipCode, Transaction-time
## CountryFact with columns Sales, Discount, CountryID


Physical storage tables :

In the example described above, say that RawFact is rolled up hourly on cluster 
C1, and daily and monthly on cluster C2; CountryFact is rolled up daily, 
monthly, quarterly and yearly on cluster C2; the Customer table is available in 
HBase cluster H1; and all the location tables are available in HDFS cluster C2.

The physical tables would be :
* C1_Rawfact_hourly - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_Rawfact_daily - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_Rawfact_monthly - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_CountryFact_daily - Schema : Sales, Discount, CountryID Partitioned by dt
* C2_CountryFact_monthly - Schema : Sales, Discount, CountryID Partitioned by 
dt
* C2_CountryFact_quarterly - Schema : Sales, Discount, CountryID Partitioned 
by dt
* C2_CountryFact_yearly - Schema : Sales, Discount, CountryID Partitioned by 
dt
* H1_CustomerTable - schema :  ID, Age, Gender
* C2_ZipTable - schema : code, street-name, cityID, stateID
* C2_CityTable - schema : ID, name, stateID
* C2_StateTable - schema : ID, name, capital, countryID
* C2_CountryTable - schema : ID, name, capital, Region
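
The naming pattern in the fact-table entries above (one physical table per fact, per storage, per rollup) can be sketched as follows. This is only an illustration in Python: the function name `physical_table_names` and the input layout are hypothetical and are not part of any Hive API.

```python
def physical_table_names(rollups_by_storage):
    """Return one table name per (storage, fact, rollup) combination.

    rollups_by_storage maps (storage, fact) -> list of rollup periods.
    The <storage>_<fact>_<rollup> pattern follows the example in this comment.
    """
    names = []
    for (storage, fact), rollups in rollups_by_storage.items():
        for rollup in rollups:
            names.append(f"{storage}_{fact}_{rollup}")
    return names


# Rollups as described in the example (hypothetical input layout).
example = {
    ("C1", "Rawfact"): ["hourly"],
    ("C2", "Rawfact"): ["daily", "monthly"],
    ("C2", "CountryFact"): ["daily", "monthly", "quarterly", "yearly"],
}

for name in physical_table_names(example):
    print(name)
```

Dimension tables such as H1_CustomerTable have no rollup, so they are left out of this sketch.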


If a user queries the cube with a query like the following :
* Select sales from SALES_CUBE where region = 'APAC' and 
time_range_in(09/01/2012, 12/31/2012)  // Q4 -2012.

The cube abstraction would be smart enough to figure out which table to read 
and return the result. In this case the query translates to :

* Select sales from C2_CountryFact_quarterly join C2_CountryTable on 
C2_CountryFact_quarterly.CountryID = C2_CountryTable.ID where dt = 'Q4-2012' 
and C2_CountryTable.region = 'APAC';
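
One way such a resolution step might work is sketched below in Python. It is a guess at the idea, not the proposed implementation: the catalogue `TABLES`, the ranking in `GRANULARITY`, and the rule "pick the coarsest rollup that still covers the query" are all assumptions made for illustration.

```python
# Rollup periods ordered from finest to coarsest (assumed ranking).
GRANULARITY = ["hourly", "daily", "monthly", "quarterly", "yearly"]

# Columns reachable from each fact, including attributes reached through
# dimension-table joins (for example region via the country table).
RAW = {"sales", "discount", "customerid", "zipcode", "region"}
COUNTRY = {"sales", "discount", "countryid", "region"}

# Hypothetical catalogue: physical table -> (reachable columns, rollup).
TABLES = {
    "C1_Rawfact_hourly": (RAW, "hourly"),
    "C2_Rawfact_daily": (RAW, "daily"),
    "C2_Rawfact_monthly": (RAW, "monthly"),
    "C2_CountryFact_daily": (COUNTRY, "daily"),
    "C2_CountryFact_monthly": (COUNTRY, "monthly"),
    "C2_CountryFact_quarterly": (COUNTRY, "quarterly"),
    "C2_CountryFact_yearly": (COUNTRY, "yearly"),
}


def pick_table(needed_columns, range_granularity):
    """Pick the coarsest table that covers the columns and the time range."""
    limit = GRANULARITY.index(range_granularity)
    best = None
    for name, (columns, rollup) in TABLES.items():
        rank = GRANULARITY.index(rollup)
        # Skip tables missing a column or rolled up more coarsely than the range.
        if not needed_columns <= columns or rank > limit:
            continue
        if best is None or rank > best[0]:
            best = (rank, name)
    return None if best is None else best[1]


print(pick_table({"sales", "region"}, "quarterly"))
```

A real resolver would also have to weigh storage cost and partition availability; this sketch only ranks by rollup granularity.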





[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593337#comment-13593337
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---

bq. In the example described above say that RawFact is rolled hourly in Cluster 
c1, is rolled daily and monthly on Cluster C2; CountryFact is rolled daily, 
monthly, quarterly and yearly on Cluster C2; Also, Customer table is available 
in HBase cluster H1; All the location tables are available in HDFS cluster C2.

I forgot to mention that, in addition to the time-based rollups, RawFact is 
also rolled up on the state dimension. 



[jira] [Created] (HIVE-4117) Extract schema from avro files when creating external hive table on existing avro file/dir

2013-03-05 Thread Shashwat Agarwal (JIRA)
Shashwat Agarwal created HIVE-4117:
--

 Summary: Extract schema from avro files when creating external 
hive table on existing avro file/dir
 Key: HIVE-4117
 URL: https://issues.apache.org/jira/browse/HIVE-4117
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Shashwat Agarwal
Priority: Minor


We can extract the schema from the Avro file itself when creating an external 
table over existing Avro files. 



[jira] [Updated] (HIVE-4117) Extract schema from avro files when creating external hive table on existing avro file/dir

2013-03-05 Thread Shashwat Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashwat Agarwal updated HIVE-4117:
---

Release Note: Read schema from avro file if available as location property
  Status: Patch Available  (was: Open)

Read schema from avro file if available as location property. 'location' can be 
either a directory or a file.

 Extract schema from avro files when creating external hive table on existing 
 avro file/dir
 --

 Key: HIVE-4117
 URL: https://issues.apache.org/jira/browse/HIVE-4117
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Shashwat Agarwal
Priority: Minor
  Labels: patch

 We can extract schema from Avro file itself when creating an external table 
 over existing avro files. 



[jira] [Updated] (HIVE-4117) Extract schema from avro files when creating external hive table on existing avro file/dir

2013-03-05 Thread Shashwat Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashwat Agarwal updated HIVE-4117:
---

Attachment: avro-read-schema.patch

Read schema from avro file if available as location property

 Extract schema from avro files when creating external hive table on existing 
 avro file/dir
 --

 Key: HIVE-4117
 URL: https://issues.apache.org/jira/browse/HIVE-4117
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Shashwat Agarwal
Priority: Minor
  Labels: patch
 Attachments: avro-read-schema.patch


 We can extract schema from Avro file itself when creating an external table 
 over existing avro files. 



[jira] [Commented] (HIVE-707) add group_concat

2013-03-05 Thread Svetozar Misljencevic (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593428#comment-13593428
 ] 

Svetozar Misljencevic commented on HIVE-707:


You could use the currently undocumented union_map function as a workaround... 
Try :
concat_ws(' ', map_keys(UNION_MAP(MAP(your_column, 'dummy'))))

 add group_concat
 

 Key: HIVE-707
 URL: https://issues.apache.org/jira/browse/HIVE-707
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou

 Moving the discussion to a new jira:
 I've implemented group_cat() in a rush, and found some things difficult to 
 solve:
 1. function group_cat() has an internal order by clause; currently, we can't 
 implement such an aggregation in hive.
 2. when the strings to be group-concatenated are too large, in other words, 
 if data skew appears, there is often not enough memory to store such a big 
 result.



[jira] [Created] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2013-03-05 Thread Lenni Kuff (JIRA)
Lenni Kuff created HIVE-4118:


 Summary: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails 
when using fully qualified table name
 Key: HIVE-4118
 URL: https://issues.apache.org/jira/browse/HIVE-4118
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff


Computing column stats fails when using fully qualified table name. Issuing a 
USE db and using only the table name succeeds.


{code}
hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS int_col"

org.apache.hadoop.hive.ql.metadata.HiveException: 
NoSuchObjectException(message:Table somedb.some_table for which stats is 
gathered doesn't exist.)
at 
org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
at 
org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
at 
org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy9.updateTableColumnStatistics(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at $Proxy10.update_table_column_statistics(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
at $Proxy11.updateTableColumnStatistics(Unknown Source)
at 
org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
... 18 more

{code}



[jira] [Created] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-05 Thread Lenni Kuff (JIRA)
Lenni Kuff created HIVE-4119:


 Summary: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails 
with NPE if the table is empty
 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff


ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is 
empty


{code}
hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"


java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
... 15 more
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
at 

[jira] [Created] (HIVE-4121) ORC should have optional dictionaries for both strings and numeric types

2013-03-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4121:
---

 Summary: ORC should have optional dictionaries for both strings 
and numeric types
 Key: HIVE-4121
 URL: https://issues.apache.org/jira/browse/HIVE-4121
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley






[jira] [Created] (HIVE-4120) Implement decimal encoding for ORC

2013-03-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4120:
---

 Summary: Implement decimal encoding for ORC
 Key: HIVE-4120
 URL: https://issues.apache.org/jira/browse/HIVE-4120
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, ORC does not have an encoder for decimal.



[jira] [Created] (HIVE-4122) Queries fail if timestamp data not in expected format

2013-03-05 Thread Lenni Kuff (JIRA)
Lenni Kuff created HIVE-4122:


 Summary: Queries fail if timestamp data not in expected format
 Key: HIVE-4122
 URL: https://issues.apache.org/jira/browse/HIVE-4122
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Lenni Kuff


Queries fail if timestamp data is not in the expected format. The expected 
behavior is to return NULL for these invalid values.

{code}
# Not all timestamps in correct format:
echo "1999-10-10
1999-10-10 90:10:10
-01-01 00:00:00" > table.data
hive -e "create table timestamp_tbl (t timestamp)"
hadoop fs -put ./table.data HIVE_WAREHOUSE_DIR/timestamp_tbl/
hive -e "select t from timestamp_tbl"

Execution failed with exit status: 2
13/03/05 09:47:05 ERROR exec.Task: Execution failed with exit status: 2
Obtaining error information
13/03/05 09:47:05 ERROR exec.Task: Obtaining error information

Task failed!
Task ID:
  Stage-1

Logs:

13/03/05 09:47:05 ERROR exec.Task: 
Task failed!
Task ID:
  Stage-1

Logs:
{code}
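
The expected behavior (NULL for values that do not parse, rather than a failed task) can be sketched as below. This is a Python illustration, not Hive code: the function name `parse_timestamp_or_null` and the single `%Y-%m-%d %H:%M:%S` format are assumptions made for the example.

```python
from datetime import datetime


def parse_timestamp_or_null(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a datetime, or None when the text does not match fmt."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError:
        # Malformed or out-of-range input becomes NULL instead of an error.
        return None


rows = ["1999-10-10 10:10:10", "1999-10-10", "1999-10-10 90:10:10"]
for row in rows:
    print(repr(row), "->", parse_timestamp_or_null(row))
```

Only the first row parses; the date-only row and the row with hour 90 both come back as None.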




[jira] [Updated] (HIVE-4121) ORC should have optional dictionaries for both strings and numeric types

2013-03-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4121:


Description: Currently string columns always have dictionaries and numerics 
are always directly encoded. It would be better to make the encoding depend on 
a sample of the data. Perhaps the first 100k values should be evaluated for 
repeated values and the encoding picked for the stripe.
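
A rough Python sketch of the sampling idea follows. The function name `choose_encoding` and the 0.5 distinct-ratio threshold are assumptions chosen for illustration; the description only suggests sampling roughly the first 100k values and does not fix a threshold.

```python
def choose_encoding(values, sample_size=100_000, max_distinct_ratio=0.5):
    """Pick 'dictionary' or 'direct' from a prefix sample of a column."""
    sample = values[:sample_size]
    if not sample:
        return "direct"
    # Many repeated values means few distinct ones relative to the sample size.
    distinct_ratio = len(set(sample)) / len(sample)
    return "dictionary" if distinct_ratio <= max_distinct_ratio else "direct"


print(choose_encoding(["US", "IN", "US", "US", "IN", "FR"] * 1000))
print(choose_encoding(list(range(5000))))
```

The same check works for strings and numbers, which is the point of making the dictionary optional for both.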



[jira] [Created] (HIVE-4123) The RLE encoding for ORC can be improved

2013-03-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4123:
---

 Summary: The RLE encoding for ORC can be improved
 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The run length encoding of integers can be improved:
* tighter bit packing
* allow delta encoding
* allow longer runs
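
To make the delta-encoding and longer-run ideas concrete, here is a small Python sketch that stores each run as (start, delta, length). It is not ORC's actual format: the function names are hypothetical, run length is uncapped, and bit packing is not shown.

```python
def rle_delta_encode(values):
    """Encode integers as (start, delta, length) runs."""
    runs = []
    i = 0
    n = len(values)
    while i < n:
        start = values[i]
        if i + 1 == n:
            # A single trailing value becomes a run of length one.
            runs.append((start, 0, 1))
            break
        delta = values[i + 1] - start
        length = 2
        # Extend the run while consecutive differences stay equal to delta.
        while i + length < n and values[i + length] - values[i + length - 1] == delta:
            length += 1
        runs.append((start, delta, length))
        i += length
    return runs


def rle_delta_decode(runs):
    """Expand (start, delta, length) runs back into the original integers."""
    out = []
    for start, delta, length in runs:
        out.extend(start + delta * k for k in range(length))
    return out


data = [7, 7, 7, 7, 10, 12, 14, 16, 3]
encoded = rle_delta_encode(data)
print(encoded)
print(rle_delta_decode(encoded) == data)
```

A delta of zero covers plain repeated values, so constant runs and arithmetic sequences share one representation.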



[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-05 Thread Lenni Kuff (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lenni Kuff updated HIVE-4119:
-

Priority: Critical  (was: Major)

This is especially bad because, when executed via a Hive Server, it causes the 
service process to crash. 

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Priority: Critical


[jira] [Commented] (HIVE-4042) ignore mapjoin hint

2013-03-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593723#comment-13593723
 ] 

Namit Jain commented on HIVE-4042:
--

[~ashutoshc], too risky. HIVE-3891 only takes care of the sort-merge join case, 
not the bucketed join case.
Even if we do that, this is brand new code, and may have issues.

This patch really eases large deployments with lots of queries, where it is not 
feasible to change the queries manually (there are simply too many of them). I 
completely agree that eventually we should always ignore the mapjoin hint, but 
we need some time to get there.

 ignore mapjoin hint
 ---

 Key: HIVE-4042
 URL: https://issues.apache.org/jira/browse/HIVE-4042
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4042.1.patch, hive.4042.2.patch, hive.4042.3.patch, 
 hive.4042.4.patch, hive.4042.5.patch, hive.4042.6.patch, hive.4042.7.patch, 
 hive.4042.8.patch


 After HIVE-3784, in a production environment, it can become difficult to
 deploy since a lot of production queries can break.



[jira] [Created] (HIVE-4124) Add more tests for windowing

2013-03-05 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-4124:
--

 Summary: Add more tests for windowing
 Key: HIVE-4124
 URL: https://issues.apache.org/jira/browse/HIVE-4124
 Project: Hive
  Issue Type: Test
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


It would be good to add tests that exercise different data types in 
different scenarios.



[jira] [Updated] (HIVE-4124) Add more tests for windowing

2013-03-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4124:
---

Attachment: tests.patch

 Add more tests for windowing
 

 Key: HIVE-4124
 URL: https://issues.apache.org/jira/browse/HIVE-4124
 Project: Hive
  Issue Type: Test
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: tests.patch


 It would be good to add tests that exercise different data types in 
 different scenarios.



[jira] [Commented] (HIVE-4124) Add more tests for windowing

2013-03-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593763#comment-13593763
 ] 

Ashutosh Chauhan commented on HIVE-4124:


https://reviews.facebook.net/D9099

 Add more tests for windowing
 

 Key: HIVE-4124
 URL: https://issues.apache.org/jira/browse/HIVE-4124
 Project: Hive
  Issue Type: Test
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: tests.patch


 It would be good to add tests that exercise different data types in 
 different scenarios.



[jira] [Assigned] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-05 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4119:


Assignee: Shreepadma Venugopalan

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 {code}
 hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   ... 15 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   at 

[jira] [Commented] (HIVE-4106) SMB joins fail in multi-way joins

2013-03-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593767#comment-13593767
 ] 

Namit Jain commented on HIVE-4106:
--

  // check if the join operator encountered is a candidate for being converted
  // to a sort-merge join
  private NodeProcessor getCheckCandidateJoin() {
    return new NodeProcessor() {
      @Override
      public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
          Object... nodeOutputs) throws SemanticException {
        SortBucketJoinProcCtx smbJoinContext = (SortBucketJoinProcCtx)procCtx;
        JoinOperator joinOperator = (JoinOperator)nd;
        int size = stack.size();
        if (!(stack.get(size-1) instanceof JoinOperator) ||
            !(stack.get(size-2) instanceof ReduceSinkOperator)) {
          smbJoinContext.getRejectedJoinOps().add(joinOperator);
          return null;
        }

        // If any operator in the stack does not support a auto-conversion, this join should
        // not be converted.
        for (int pos = size -3; pos >= 0; pos--) {
          Operator<? extends OperatorDesc> op = (Operator<? extends OperatorDesc>)stack.get(pos);
          if (!op.supportAutomaticSortMergeJoin()) {
            smbJoinContext.getRejectedJoinOps().add(joinOperator);
            return null;
          }
        }

        return null;
      }
    };
  }


It should be done above, in SortedMergeBucketMapJoinOptimizer.java.
Can you try your testcase and see whether the join is being added to the 
rejected join ops (getRejectedJoinOps())?
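
For readers following along, the candidate check quoted above can be sketched in isolation. This is only an illustration, not Hive's code: `Op`, `JoinOp`, `ReduceSinkOp`, `ScriptOp`, and `SmbCandidateCheck` are hypothetical stand-ins, the method returns a boolean instead of recording the join in a context object, and the `size < 2` guard is an added assumption rather than something present in the quoted snippet.

```java
import java.util.List;

// Hypothetical stand-ins for the operator classes; only the shape matters here.
class Op {
    // Whether this operator tolerates automatic sort-merge join conversion.
    boolean supportsAutoSmb() { return true; }
}

class JoinOp extends Op {}

class ReduceSinkOp extends Op {}

class ScriptOp extends Op {
    @Override
    boolean supportsAutoSmb() { return false; }
}

final class SmbCandidateCheck {
    // Returns true when the join on top of the stack should be rejected.
    static boolean isRejected(List<Op> stack) {
        int size = stack.size();
        // Assumption: guard short stacks before indexing size-1 and size-2.
        if (size < 2
                || !(stack.get(size - 1) instanceof JoinOp)
                || !(stack.get(size - 2) instanceof ReduceSinkOp)) {
            return true;
        }
        // Walk the remaining ancestors, nearest first, down to index 0.
        for (int pos = size - 3; pos >= 0; pos--) {
            if (!stack.get(pos).supportsAutoSmb()) {
                return true;
            }
        }
        return false;
    }
}
```

A stack of `[Op, ReduceSinkOp, JoinOp]` passes the check, while replacing the first element with a `ScriptOp` causes a rejection.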

 SMB joins fail in multi-way joins
 -

 Key: HIVE-4106
 URL: https://issues.apache.org/jira/browse/HIVE-4106
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4106.patch


 I see an array-out-of-bounds exception with multi-way SMB joins. This is 
 related to changes that went in as part of HIVE-3403. This issue has been 
 discussed in HIVE-3891.



[jira] [Assigned] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2013-03-05 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4118:


Assignee: Shreepadma Venugopalan

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully 
 qualified table name
 

 Key: HIVE-4118
 URL: https://issues.apache.org/jira/browse/HIVE-4118
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan

 Computing column stats fails when using fully qualified table name. Issuing a 
 USE db and using only the table name succeeds.
 {code}
 hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS int_col"
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Table somedb.some_table for which stats is 
 gathered doesn't exist.)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
   at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
   at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
   at $Proxy9.updateTableColumnStatistics(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
   at $Proxy10.update_table_column_statistics(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
   at $Proxy11.updateTableColumnStatistics(Unknown Source)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
   ... 18 more
 {code}
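
The error message above suggests that the whole string `somedb.some_table` was looked up as a table name, although the report does not confirm the root cause. Under that assumption, a fix would need to split the qualified name before the lookup. The sketch below shows one way to do that; `QualifiedTableName` and its `parse` method are hypothetical names for illustration and are not part of Hive.

```java
// Hypothetical helper: splits "db.table" into its two parts.
final class QualifiedTableName {
    final String db;
    final String table;

    private QualifiedTableName(String db, String table) {
        this.db = db;
        this.table = table;
    }

    // Falls back to currentDb when the name carries no database prefix.
    static QualifiedTableName parse(String name, String currentDb) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new QualifiedTableName(currentDb, name);
        }
        return new QualifiedTableName(name.substring(0, dot), name.substring(dot + 1));
    }
}
```

With this helper, `parse("somedb.some_table", "default")` yields the database `somedb` and the table `some_table`, while an unqualified `some_table` keeps the session's current database.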



[jira] [Updated] (HIVE-4078) Delay the serialize-deserialize pair in CommonJoinResolver

2013-03-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4078:
--

Summary: Delay the serialize-deserialize pair in CommonJoinResolver  (was: 
Remove the serialize-deserialize pair in CommonJoinResolver)

 Delay the serialize-deserialize pair in CommonJoinResolver
 --

 Key: HIVE-4078
 URL: https://issues.apache.org/jira/browse/HIVE-4078
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-4078-20130227.patch, HIVE-4078.patch


 CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
 to a map-join
 {code}
   // deep copy a new mapred work from xml
   InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
   MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
 {code}
 which is a very heavy operation, both memory-wise and CPU-wise.
 Instead of cloning via XMLEncoder, it is faster to use BeanUtils.cloneBean(), 
 which follows the same data paths (get/set bean methods).



[jira] [Updated] (HIVE-4078) Delay the serialize-deserialize pair in CommonJoinResolver

2013-03-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4078:
--

Attachment: HIVE-4078-20130305.patch

Use serialization/deserialization of the MapredWork only for cases which 
require conditional work.
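
The idea of delaying the copy can be sketched roughly as follows. This is not the attached patch: `LazyCopy`, `deepCopy`, and `workFor` are hypothetical names, and plain Java serialization stands in for the XML round trip quoted in the issue description, purely to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class LazyCopy {
    // Deep copy via a serialize-deserialize round trip. Java serialization is
    // used here as a stand-in for the XML encoding mentioned in the issue.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }

    // Pay for the copy only when an independent instance is actually needed.
    static <T extends Serializable> T workFor(T original, boolean needsConditionalTask) {
        return needsConditionalTask ? deepCopy(original) : original;
    }
}
```

When `needsConditionalTask` is false the original object is returned untouched, so the expensive round trip is skipped entirely.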

 Delay the serialize-deserialize pair in CommonJoinResolver
 --

 Key: HIVE-4078
 URL: https://issues.apache.org/jira/browse/HIVE-4078
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-4078-20130227.patch, HIVE-4078-20130305.patch, 
 HIVE-4078.patch


 CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
 to a map-join
 {code}
   // deep copy a new mapred work from xml
   InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
   MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
 {code}
 which is a very heavy operation, both memory-wise and CPU-wise.
 Instead of cloning via XMLEncoder, it is faster to use BeanUtils.cloneBean(), 
 which follows the same data paths (get/set bean methods).



[jira] [Updated] (HIVE-4078) Delay the serialize-deserialize pair in CommonJoinResolver

2013-03-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4078:
--

Description: 
CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
to a map-join

{code}
  // deep copy a new mapred work from xml
  InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
  MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
{code}

which is a very heavy operation, both memory-wise and CPU-wise.

It would be better to do this only if a conditional task is required, resulting 
in a copy of the task.

  was:
CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
to a map-join

{code}
  // deep copy a new mapred work from xml
  InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
  MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
{code}

which is a very heavy operation, both memory-wise and CPU-wise.

Instead of cloning via XMLEncoder, it is faster to use BeanUtils.cloneBean(), 
which follows the same data paths (get/set bean methods).


 Delay the serialize-deserialize pair in CommonJoinResolver
 --

 Key: HIVE-4078
 URL: https://issues.apache.org/jira/browse/HIVE-4078
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-4078-20130227.patch, HIVE-4078-20130305.patch, 
 HIVE-4078.patch


 CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
 to a map-join
 {code}
   // deep copy a new mapred work from xml
   InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
   MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
 {code}
 which is a very heavy operation, both memory-wise and CPU-wise.
 It would be better to do this only if a conditional task is required, 
 resulting in a copy of the task.



[jira] [Updated] (HIVE-4078) Delay the serialize-deserialize pair in CommonJoinResolver

2013-03-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4078:
--

Release Note: Create a copy of the MapredWork only if a conditional task is 
involved and avoid the copy if the task is non-conditional  (was: Clone map-red 
work using a faster BeanUtils.cloneBean() during mapjoin conversion)

 Delay the serialize-deserialize pair in CommonJoinResolver
 --

 Key: HIVE-4078
 URL: https://issues.apache.org/jira/browse/HIVE-4078
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-4078-20130227.patch, HIVE-4078-20130305.patch, 
 HIVE-4078.patch


 CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
 to a map-join
 {code}
   // deep copy a new mapred work from xml
   InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
   MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
 {code}
 which is a very heavy operation, both memory-wise and CPU-wise.
 It would be better to do this only if a conditional task is required, 
 resulting in a copy of the task.



[jira] [Created] (HIVE-4125) Expose metastore JMX metrics

2013-03-05 Thread Samuel Yuan (JIRA)
Samuel Yuan created HIVE-4125:
-

 Summary: Expose metastore JMX metrics
 Key: HIVE-4125
 URL: https://issues.apache.org/jira/browse/HIVE-4125
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Trivial


Add a safe way to access the metrics stored for each MetricsScope, so that they 
can be used outside of JMX.
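
One plausible reading of "a safe way to access the metrics" is a read-only snapshot that callers cannot use to mutate the live counters. The sketch below illustrates that idea only; `MetricsRegistry`, `increment`, and `snapshot` are hypothetical names and do not reflect the actual Hive metrics classes or the eventual patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: counters are updated concurrently, read via snapshots.
final class MetricsRegistry {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    // Atomically adds one to the named counter, creating it on first use.
    void increment(String name) {
        counters.merge(name, 1L, Long::sum);
    }

    // Returns an unmodifiable copy, so callers neither mutate the live map
    // nor observe later updates through the returned view.
    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(counters));
    }
}
```

Copying on read costs an allocation per call, which seems an acceptable trade for monitoring code that reads far less often than the counters are updated.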



[jira] [Commented] (HIVE-2038) Metastore listener

2013-03-05 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593844#comment-13593844
 ] 

Arup Malakar commented on HIVE-2038:


[~ashutoshc] is the MetaStoreListener implementation supposed to be threadsafe? 
I am seeing issues related to that in HCatalog. The javadoc of the listener 
interface doesn't mention anything.

 Metastore listener
 --

 Key: HIVE-2038
 URL: https://issues.apache.org/jira/browse/HIVE-2038
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2038_3.patch, hive_2038_4.patch, hive-2038.patch, 
 metastore_listener.patch, metastore_listener.patch, metastore_listener.patch


 Provide a way to observe changes happening on the Metastore



[jira] [Resolved] (HIVE-3752) Add a non-sql API in hive to access data.

2013-03-05 Thread Nitay Joffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitay Joffe resolved HIVE-3752.
---

Resolution: Not A Problem

I've done this work in a separate library.

 Add a non-sql API in hive to access data.
 -

 Key: HIVE-3752
 URL: https://issues.apache.org/jira/browse/HIVE-3752
 Project: Hive
  Issue Type: Improvement
Reporter: Nitay Joffe
Assignee: Nitay Joffe

 We would like to add an input/output format for accessing Hive data in Hadoop 
 directly without having to use e.g. a transform. Using a transform
 means having to do a whole map-reduce step with its own disk accesses and its 
 imposed structure. It also means needing to have Hive be the base 
 infrastructure for the entire system being developed which is not the right 
 fit as we only need a small part of it (access to the data).
 So we propose adding an API level InputFormat and OutputFormat to Hive that 
 will make it trivially easy to select a table with partition spec and read 
 from / write to it. We chose this design to make it compatible with Hadoop so 
 that existing systems that work with Hadoop's IO API will just work out of 
 the box.
 We need this system for the Giraph graph processing system 
 (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
 is a common use case.
 [~namitjain] [~aching] [~kevinwilfong] [~apresta]
 Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401



[jira] [Updated] (HIVE-4078) Delay the serialize-deserialize pair in CommonJoinResolver

2013-03-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4078:
--

Attachment: HIVE-4078-20130305.2.patch

with more comments & dead code removed

 Delay the serialize-deserialize pair in CommonJoinResolver
 --

 Key: HIVE-4078
 URL: https://issues.apache.org/jira/browse/HIVE-4078
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-4078-20130227.patch, HIVE-4078-20130305.2.patch, 
 HIVE-4078-20130305.patch, HIVE-4078.patch


 CommonJoinProcessor tries to clone a MapredWork while attempting a conversion 
 to a map-join
 {code}
   // deep copy a new mapred work from xml
   InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
   MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
 {code}
 which is a very heavy operation, both memory-wise and CPU-wise.
 It would be better to do this only if a conditional task is required, 
 resulting in a copy of the task.



[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-03-05 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3874:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Owen!

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 HIVE-3874.D8529.2.patch, HIVE-3874.D8529.3.patch, HIVE-3874.D8529.4.patch, 
 HIVE-3874.D8871.1.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows isn't stored in the file
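
To make the lightweight-index item in the list above concrete, here is a minimal sketch of how per-row-group minimum and maximum values could let a reader skip whole row groups for an equality filter. The issue does not describe the actual index layout, so `RowGroupIndex`, `Stats`, and `groupsToRead` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RowGroupIndex {
    // Minimum and maximum of one column within a single row group.
    static final class Stats {
        final long min;
        final long max;

        Stats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Returns the positions of row groups that may contain a value equal to
    // target; every other group can be skipped without being read.
    static List<Integer> groupsToRead(List<Stats> groups, long target) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            Stats s = groups.get(i);
            if (target >= s.min && target <= s.max) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

The check is conservative: a group whose range covers the target may still hold no matching row, but a group whose range excludes it never does.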



[jira] [Updated] (HIVE-4015) Add ORC file to the grammar as a file format

2013-03-05 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4015:
-

Status: Patch Available  (was: Open)

 Add ORC file to the grammar as a file format
 

 Key: HIVE-4015
 URL: https://issues.apache.org/jira/browse/HIVE-4015
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Gunther Hagleitner
 Attachments: HIVE-4015.1.patch, HIVE-4015.2.patch, HIVE-4015.3.patch, 
 HIVE-4015.4.patch


 It would be much more convenient for users if we enable them to use ORC as a 
 file format in the HQL grammar. 



[jira] [Updated] (HIVE-4110) Aggregation functions must have aliases when multiple functions are used

2013-03-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4110:
---

Status: Open  (was: Patch Available)

 Aggregation functions must have aliases when multiple functions are used
 

 Key: HIVE-4110
 URL: https://issues.apache.org/jira/browse/HIVE-4110
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4110-0.patch


 The following query fails:
 {noformat}
 select p_mfgr, p_retailprice, p_size,
 lead(p_retailprice) over(partition by p_mfgr order by p_size),
 lag(p_retailprice) over(partition by p_mfgr order by p_size)
 from part;
 {noformat}
 with the error below:
 {noformat}
 2013-03-02 16:10:47,126 ERROR ql.Driver (SessionState.java:printError(401)) - 
 FAILED: SemanticException [Error 10011]: Line 2:38 Invalid function 'p_mfgr'
 org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:38 Invalid function 
 'p_mfgr'
   at 
 org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:678)
   at 
 org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:908)
   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:166)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:8895)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2634)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2433)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:7234)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:7200)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:7978)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8651)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:259)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:898)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {noformat}



[jira] [Resolved] (HIVE-4110) Aggregation functions must have aliases when multiple functions are used

2013-03-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-4110.


Resolution: Duplicate

 Aggregation functions must have aliases when multiple functions are used
 

 Key: HIVE-4110
 URL: https://issues.apache.org/jira/browse/HIVE-4110
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4110-0.patch


 The following query fails:
 {noformat}
 select p_mfgr, p_retailprice, p_size,
 lead(p_retailprice) over(partition by p_mfgr order by p_size),
 lag(p_retailprice) over(partition by p_mfgr order by p_size)
 from part;
 {noformat}
 with the error below:
 {noformat}
 2013-03-02 16:10:47,126 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException [Error 10011]: Line 2:38 Invalid function 'p_mfgr'
 org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:38 Invalid function 'p_mfgr'
   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:678)
   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:908)
   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:166)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:8895)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2634)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2433)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:7234)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:7200)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:7978)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8651)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:259)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:898)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {noformat}



[jira] [Commented] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593917#comment-13593917
 ] 

Phabricator commented on HIVE-4081:
---

brock has commented on the revision HIVE-4081 [jira] allow expressions with 
over clause.

  The change looks good from my perspective, just one minor nit below. I'll 
mark HIVE-4110 as a duplicate.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:406 Is 
there a better variable name than aggregationTrees2?

REVISION DETAIL
  https://reviews.facebook.net/D9063

To: JIRA, ashutoshc, hbutani
Cc: brock


 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Commented] (HIVE-4097) ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids

2013-03-05 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593926#comment-13593926
 ] 

Gunther Hagleitner commented on HIVE-4097:
--

Looks good; I verified that it fixes (at least in part) the issue in the tests in 
HIVE-4015. +1 (non-committer)

 ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
 -

 Key: HIVE-4097
 URL: https://issues.apache.org/jira/browse/HIVE-4097
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4097.D9015.1.patch


 Hive assumes that an empty string in hive.io.file.readcolumn.ids means all 
 columns. The ORC reader currently assumes it means no columns.
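 The mismatch described above can be sketched as follows. This is only an 
 illustration in Python, not the ORC reader's code: the function name 
 parse_read_column_ids is hypothetical, and the comma-separated id format is 
 an assumption about how hive.io.file.readcolumn.ids is written.

```python
from typing import List


def parse_read_column_ids(conf_value: str, num_columns: int) -> List[bool]:
    """Return one include flag per column for a readcolumn.ids-style string.

    Hypothetical helper: assumes a comma-separated list of column indexes.
    """
    ids = [part.strip() for part in conf_value.split(",") if part.strip()]
    if not ids:
        # Convention stated in the issue: an empty string means all columns.
        return [True] * num_columns
    include = [False] * num_columns
    for part in ids:
        index = int(part)
        if 0 <= index < num_columns:
            include[index] = True
    return include
```

 A reader that skipped the empty-string branch would return no columns for 
 an empty setting, which is the behavior the issue reports.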



[jira] [Commented] (HIVE-4097) ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids

2013-03-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593928#comment-13593928
 ] 

Phabricator commented on HIVE-4097:
---

hagleitn has accepted the revision HIVE-4097 [jira] ORC file doesn't properly 
interpret empty hive.io.file.readcolumn.ids.

  Looks good to me.

REVISION DETAIL
  https://reviews.facebook.net/D9015

BRANCH
  tmp

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, omalley


 ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
 -

 Key: HIVE-4097
 URL: https://issues.apache.org/jira/browse/HIVE-4097
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4097.D9015.1.patch


 Hive assumes that an empty string in hive.io.file.readcolumn.ids means all 
 columns. The ORC reader currently assumes it means no columns.



Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #84

2013-03-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/84/

--
[...truncated 42432 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2013-03-05 13:35:40,791 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-03-05_13-35-38_019_2164532041772232386/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201303051335_1203792916.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-03-05_13-35-41_899_4532520662305030135/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-03-05_13-35-41_899_4532520662305030135/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201303051335_384717533.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable

[jira] [Created] (HIVE-4126) remove support for lead/lag UDFs outside of UDAF args

2013-03-05 Thread Harish Butani (JIRA)
Harish Butani created HIVE-4126:
---

 Summary: remove support for lead/lag UDFs outside of UDAF args
 Key: HIVE-4126
 URL: https://issues.apache.org/jira/browse/HIVE-4126
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani


Select expressions such as 
p_size - lead(p_size,1)
are currently handled as non-aggregation expressions that are evaluated after all 
over clauses have been evaluated.
Once we allow different partitions in a single select list (Jira 4041), these 
become ambiguous. 

There are two equivalent ways to write such expressions:
- use lead/lag UDAFs with expressions (support added with Jira 4081)
- stack windowing clauses with inline queries: select lead(r,1).. from (select 
rank() as r)...
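
The ambiguity can be illustrated with a small simulation. This is a sketch 
in Python rather than Hive code: the lead helper, the sample rows and the 
p_brand column are assumptions made for illustration. It shows that 
p_size - lead(p_size,1) gives different answers depending on which 
partitioning the lead is evaluated against.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def lead(rows: List[Row], value_col: str, partition_col: str,
         order_col: str, offset: int = 1) -> List[Optional[Any]]:
    """For each input row, return value_col of the row `offset` positions
    later within the same partition (ordered by order_col), or None."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[partition_col]].append(i)
    result: List[Optional[Any]] = [None] * len(rows)
    for indexes in partitions.values():
        ordered = sorted(indexes, key=lambda i: rows[i][order_col])
        for pos, i in enumerate(ordered):
            if pos + offset < len(ordered):
                result[i] = rows[ordered[pos + offset]][value_col]
    return result


def size_minus_lead(rows: List[Row], partition_col: str) -> List[Optional[int]]:
    """Evaluate p_size - lead(p_size, 1) against one chosen partitioning."""
    leads = lead(rows, "p_size", partition_col, "p_size")
    return [None if nxt is None else row["p_size"] - nxt
            for row, nxt in zip(rows, leads)]


# Hypothetical sample data; p_brand is an assumed second partition column.
PARTS = [
    {"p_mfgr": "m1", "p_brand": "b1", "p_size": 2},
    {"p_mfgr": "m1", "p_brand": "b2", "p_size": 5},
    {"p_mfgr": "m2", "p_brand": "b1", "p_size": 9},
]
```

With these rows, partitioning by p_mfgr and partitioning by p_brand yield 
different values for the first row, so a bare lead() outside an over clause 
has no single meaning once a select list may carry several partitionings.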



[jira] [Updated] (HIVE-4126) remove support for lead/lag UDFs outside of UDAF args

2013-03-05 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4126:
--

Attachment: HIVE-4126.D9105.1.patch

hbutani requested code review of HIVE-4126 [jira] remove support for lead/lag 
UDFs outside of UDAF args.

Reviewers: JIRA, ashutoshc

remove support for lead/lag UDFs outside of UDAF args

Select expressions such as
p_size - lead(p_size,1)
are currently handled as non-aggregation expressions that are evaluated after all 
over clauses have been evaluated.
Once we allow different partitions in a single select list (Jira 4041), these 
become ambiguous.

There are two equivalent ways to write such expressions:
- use lead/lag UDAFs with expressions (support added with Jira 4081)
- stack windowing clauses with inline queries: select lead(r,1).. from 
(select rank() as r)...

TEST PLAN
  existing tests

REVISION DETAIL
  https://reviews.facebook.net/D9105

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/leadlag.q
  ql/src/test/queries/clientpositive/leadlag_queries.q
  ql/src/test/queries/clientpositive/ptf.q
  ql/src/test/queries/clientpositive/windowing.q
  ql/src/test/results/clientpositive/leadlag_queries.q.out
  ql/src/test/results/clientpositive/ptf.q.out
  ql/src/test/results/clientpositive/windowing.q.out

To: JIRA, ashutoshc, hbutani


 remove support for lead/lag UDFs outside of UDAF args
 -

 Key: HIVE-4126
 URL: https://issues.apache.org/jira/browse/HIVE-4126
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4126.D9105.1.patch


 Select expressions such as 
 p_size - lead(p_size,1)
 are currently handled as non-aggregation expressions that are evaluated after all 
 over clauses have been evaluated.
 Once we allow different partitions in a single select list (Jira 4041), these 
 become ambiguous. 
 There are two equivalent ways to write such expressions:
 - use lead/lag UDAFs with expressions (support added with Jira 4081)
 - stack windowing clauses with inline queries: select lead(r,1).. from 
 (select rank() as r)...



[jira] [Commented] (HIVE-4098) OrcInputFormat assumes Hive always calls createValue

2013-03-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593961#comment-13593961
 ] 

Phabricator commented on HIVE-4098:
---

hagleitn has accepted the revision HIVE-4098 [jira] OrcInputFormat assumes 
Hive always calls createValue.

  Looks good. There's a test for this in HIVE-4015.

REVISION DETAIL
  https://reviews.facebook.net/D9021

BRANCH
  hive-3874

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, omalley


 OrcInputFormat assumes Hive always calls createValue
 

 Key: HIVE-4098
 URL: https://issues.apache.org/jira/browse/HIVE-4098
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4098.D9021.1.patch


 Hive's HiveContextAwareRecordReader doesn't create a new value for each 
 InputFormat and instead reuses the same row between input formats. That 
 causes the first record of the second (and third, etc.) partition to be dropped 
 and replaced with the last row of the previous partition.



[jira] [Resolved] (HIVE-2899) Remove dependency on sun's jdk.

2013-03-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-2899.
-

Resolution: Invalid

I'm closing this.

 Remove dependency on sun's jdk.
 ---

 Key: HIVE-2899
 URL: https://issues.apache.org/jira/browse/HIVE-2899
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 When the signal handlers were added, they introduced a dependency on 
 sun.misc.Signal and sun.misc.SignalHandler. We can look these classes up by 
 reflection and avoid the warning and also provide a soft-fail for non-sun 
 jvms.



[jira] [Commented] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593969#comment-13593969
 ] 

Phabricator commented on HIVE-4081:
---

ashutoshc has accepted the revision HIVE-4081 [jira] allow expressions with 
over clause.

  A minor comment. Also, it would be good to fix the problems reported by Lint.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java:522 Do 
we also need to add TOK_PARTITIONBY, KW_ROWS, KW_RANGE, TOK_ORDERBY to this set?

REVISION DETAIL
  https://reviews.facebook.net/D9063

BRANCH
  HIVE-4081

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani
Cc: brock


 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Created] (HIVE-4127) Testing with Hadoop 2.x causes test failure for ORC's TestFileDump

2013-03-05 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4127:
---

 Summary: Testing with Hadoop 2.x causes test failure for ORC's 
TestFileDump
 Key: HIVE-4127
 URL: https://issues.apache.org/jira/browse/HIVE-4127
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Hadoop 2 uses a newer version of junit, which changes the behavior of 
TestFileDump. 



[jira] [Resolved] (HIVE-4124) Add more tests for windowing

2013-03-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4124.


Resolution: Fixed

Committed to branch. Thanks, Harish, for the review.

 Add more tests for windowing
 

 Key: HIVE-4124
 URL: https://issues.apache.org/jira/browse/HIVE-4124
 Project: Hive
  Issue Type: Test
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: tests.patch


 It would be good to add tests that exercise different data types in 
 different scenarios.



[jira] [Commented] (HIVE-4015) Add ORC file to the grammar as a file format

2013-03-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593987#comment-13593987
 ] 

Owen O'Malley commented on HIVE-4015:
-

+1 looks good to me.

 Add ORC file to the grammar as a file format
 

 Key: HIVE-4015
 URL: https://issues.apache.org/jira/browse/HIVE-4015
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Gunther Hagleitner
 Attachments: HIVE-4015.1.patch, HIVE-4015.2.patch, HIVE-4015.3.patch, 
 HIVE-4015.4.patch


 It would be much more convenient for users if we enable them to use ORC as a 
 file format in the HQL grammar. 



[jira] [Updated] (HIVE-4127) Testing with Hadoop 2.x causes test failure for ORC's TestFileDump

2013-03-05 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4127:
--

Attachment: HIVE-4127.D9111.1.patch

omalley requested code review of HIVE-4127 [jira] Testing with Hadoop 2.x 
causes test failure for ORC's TestFileDump.

Reviewers: JIRA

make TestFileDump less sensitive to the version of junit

Hadoop 2 uses a newer version of junit, which changes the behavior of 
TestFileDump.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D9111

AFFECTED FILES
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley


 Testing with Hadoop 2.x causes test failure for ORC's TestFileDump
 --

 Key: HIVE-4127
 URL: https://issues.apache.org/jira/browse/HIVE-4127
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4127.D9111.1.patch


 Hadoop 2 uses a newer version of junit, which changes the behavior of 
 TestFileDump. 



[jira] [Updated] (HIVE-4127) Testing with Hadoop 2.x causes test failure for ORC's TestFileDump

2013-03-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4127:


Status: Patch Available  (was: Open)

 Testing with Hadoop 2.x causes test failure for ORC's TestFileDump
 --

 Key: HIVE-4127
 URL: https://issues.apache.org/jira/browse/HIVE-4127
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4127.D9111.1.patch


 Hadoop 2 uses a newer version of junit, which changes the behavior of 
 TestFileDump. 



[jira] [Created] (HIVE-4128) Support avg(decimal)

2013-03-05 Thread Brock Noland (JIRA)
Brock Noland created HIVE-4128:
--

 Summary: Support avg(decimal)
 Key: HIVE-4128
 URL: https://issues.apache.org/jira/browse/HIVE-4128
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor






[jira] [Updated] (HIVE-4128) Support avg(decimal)

2013-03-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4128:
---

Description: 
Currently the following query:

{noformat}
hive> select p_mfgr, avg(p_retailprice) from part group by p_mfgr;
FAILED: UDFArgumentTypeException Only numeric or string type arguments are 
accepted but decimal is passed

 Support avg(decimal)
 

 Key: HIVE-4128
 URL: https://issues.apache.org/jira/browse/HIVE-4128
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor

 Currently the following query:
 {noformat}
 hive> select p_mfgr, avg(p_retailprice) from part group by p_mfgr;
 FAILED: UDFArgumentTypeException Only numeric or string type arguments are 
 accepted but decimal is passed



[jira] [Updated] (HIVE-4128) Support avg(decimal)

2013-03-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4128:
---

Description: 
Currently the following query:

{noformat}
hive> select p_mfgr, avg(p_retailprice) from part group by p_mfgr;
FAILED: UDFArgumentTypeException Only numeric or string type arguments are 
accepted but decimal is passed
{noformat}

is not supported by Hive, although PostgreSQL supports it.

  was:
Currently the following query:

{noformat}
hive> select p_mfgr, avg(p_retailprice) from part group by p_mfgr;
FAILED: UDFArgumentTypeException Only numeric or string type arguments are 
accepted but decimal is passed


 Support avg(decimal)
 

 Key: HIVE-4128
 URL: https://issues.apache.org/jira/browse/HIVE-4128
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor

 Currently the following query:
 {noformat}
 hive> select p_mfgr, avg(p_retailprice) from part group by p_mfgr;
 FAILED: UDFArgumentTypeException Only numeric or string type arguments are 
 accepted but decimal is passed
 {noformat}
 is not supported by Hive, although PostgreSQL supports it.
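
 For reference, the semantics being requested can be sketched outside Hive. 
 The Python below is illustrative only (avg_by_group is a hypothetical 
 helper, not a Hive UDAF): it averages decimal values per group while 
 keeping exact decimal arithmetic instead of converting to floating point.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def avg_by_group(rows: Iterable[Tuple[str, Decimal]]) -> Dict[str, Decimal]:
    """Average a decimal column per group key, staying in Decimal throughout."""
    sums: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is Decimal(0)
    counts: Dict[str, int] = defaultdict(int)
    for key, price in rows:
        sums[key] += price
        counts[key] += 1
    # Decimal / int stays a Decimal, so no binary floating point is involved.
    return {key: sums[key] / counts[key] for key in sums}
```

 The input pairs stand in for (p_mfgr, p_retailprice) from the query above; 
 the sample values used to try it out would be made up.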



[jira] [Updated] (HIVE-4122) Queries fail if timestamp data not in expected format

2013-03-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4122:
--

Attachment: HIVE-4122-1.patch

 Queries fail if timestamp data not in expected format
 -

 Key: HIVE-4122
 URL: https://issues.apache.org/jira/browse/HIVE-4122
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Prasad Mujumdar
 Attachments: HIVE-4122-1.patch


 Queries fail if timestamp data is not in the expected format. The expected 
 behavior is to return NULL for these invalid values.
 {code}
 # Not all timestamps in correct format:
 echo "1999-10-10
 1999-10-10 90:10:10
 -01-01 00:00:00" > table.data
 hive -e "create table timestamp_tbl (t timestamp)"
 hadoop fs -put ./table.data HIVE_WAREHOUSE_DIR/timestamp_tbl/
 hive -e "select t from timestamp_tbl"
 Execution failed with exit status: 2
 13/03/05 09:47:05 ERROR exec.Task: Execution failed with exit status: 2
 Obtaining error information
 13/03/05 09:47:05 ERROR exec.Task: Obtaining error information
 Task failed!
 Task ID:
   Stage-1
 Logs:
 13/03/05 09:47:05 ERROR exec.Task: 
 Task failed!
 Task ID:
   Stage-1
 Logs:
 {code}
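
 The expected behavior (NULL instead of a failed task) can be sketched as 
 below. This is a Python illustration, not the attached patch: the function 
 name parse_timestamp_or_null is hypothetical, and the accepted layout 
 (yyyy-mm-dd hh:mm:ss with an optional fractional part) is an assumption.

```python
from datetime import datetime
from typing import Optional

# Assumed accepted layouts; anything else is treated as invalid.
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_timestamp_or_null(text: str) -> Optional[datetime]:
    """Return the parsed timestamp, or None (standing in for NULL) if invalid."""
    candidate = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            # Wrong layout or out-of-range field: try the next format.
            continue
    return None
```

 Under that assumption, all three rows written to table.data above would 
 come back as NULL rather than aborting the query.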



[jira] [Updated] (HIVE-4122) Queries fail if timestamp data not in expected format

2013-03-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4122:
--

Status: Patch Available  (was: Open)

Patch attached

 Queries fail if timestamp data not in expected format
 -

 Key: HIVE-4122
 URL: https://issues.apache.org/jira/browse/HIVE-4122
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Prasad Mujumdar
 Attachments: HIVE-4122-1.patch


 Queries fail if timestamp data is not in the expected format. The expected 
 behavior is to return NULL for these invalid values.
 {code}
 # Not all timestamps in correct format:
 echo "1999-10-10
 1999-10-10 90:10:10
 -01-01 00:00:00" > table.data
 hive -e "create table timestamp_tbl (t timestamp)"
 hadoop fs -put ./table.data HIVE_WAREHOUSE_DIR/timestamp_tbl/
 hive -e "select t from timestamp_tbl"
 Execution failed with exit status: 2
 13/03/05 09:47:05 ERROR exec.Task: Execution failed with exit status: 2
 Obtaining error information
 13/03/05 09:47:05 ERROR exec.Task: Obtaining error information
 Task failed!
 Task ID:
   Stage-1
 Logs:
 13/03/05 09:47:05 ERROR exec.Task: 
 Task failed!
 Task ID:
   Stage-1
 Logs:
 {code}



[jira] [Commented] (HIVE-4122) Queries fail if timestamp data not in expected format

2013-03-05 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594060#comment-13594060
 ] 

Prasad Mujumdar commented on HIVE-4122:
---

Review request on https://reviews.facebook.net/D9117

 Queries fail if timestamp data not in expected format
 -

 Key: HIVE-4122
 URL: https://issues.apache.org/jira/browse/HIVE-4122
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Prasad Mujumdar
 Attachments: HIVE-4122-1.patch


 Queries fail if timestamp data is not in the expected format. The expected 
 behavior is to return NULL for these invalid values.
 {code}
 # Not all timestamps in correct format:
 echo "1999-10-10
 1999-10-10 90:10:10
 -01-01 00:00:00" > table.data
 hive -e "create table timestamp_tbl (t timestamp)"
 hadoop fs -put ./table.data HIVE_WAREHOUSE_DIR/timestamp_tbl/
 hive -e "select t from timestamp_tbl"
 Execution failed with exit status: 2
 13/03/05 09:47:05 ERROR exec.Task: Execution failed with exit status: 2
 Obtaining error information
 13/03/05 09:47:05 ERROR exec.Task: Obtaining error information
 Task failed!
 Task ID:
   Stage-1
 Logs:
 13/03/05 09:47:05 ERROR exec.Task: 
 Task failed!
 Task ID:
   Stage-1
 Logs:
 {code}



[jira] [Updated] (HIVE-4125) Expose metastore JMX metrics

2013-03-05 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4125:
--

Attachment: HIVE-4125.HIVE-4125.HIVE-4125.D9123.1.patch

sxyuan requested code review of HIVE-4125 [jira] Expose metastore JMX metrics.

Reviewers: kevinwilfong

Add a safe way to access the metrics stored for each MetricsScope, so that they 
can be used outside of JMX.

TEST PLAN
  Builds, metastore can run and log metrics.

REVISION DETAIL
  https://reviews.facebook.net/D9123

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java

To: kevinwilfong, sxyuan
Cc: JIRA


 Expose metastore JMX metrics
 

 Key: HIVE-4125
 URL: https://issues.apache.org/jira/browse/HIVE-4125
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Trivial
 Attachments: HIVE-4125.HIVE-4125.HIVE-4125.D9123.1.patch


 Add a safe way to access the metrics stored for each MetricsScope, so that 
 they can be used outside of JMX.



[jira] [Commented] (HIVE-3835) Add an option to run tests where testfiles can be specified as a regular expression

2013-03-05 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594107#comment-13594107
 ] 

Kevin Wilfong commented on HIVE-3835:
-

To do what you want, run something like:

ant test -Dtestcase=TestCliDriver -Dqfile_regex='list_bucket_dml.*'
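
The pattern in the command above is a regular expression, not a shell glob, which is why it is written `list_bucket_dml.*` rather than `list_bucket_dml*.q`. The Python sketch below illustrates the difference. It assumes, purely for illustration, that the pattern must match the whole test file name with the `.q` suffix removed; the file names and the helper `select_qfiles` are hypothetical.

```python
import re

# Hypothetical query file names used only for this illustration.
qfiles = [
    "list_bucket_dml_1.q",
    "list_bucket_dml_10.q",
    "list_bucket_query_oneskew_1.q",
    "join1.q",
]


def select_qfiles(pattern, names):
    """Return the names whose base name (without '.q') fully matches the regex."""
    regex = re.compile(pattern)
    selected = []
    for name in names:
        base = name[:-2] if name.endswith(".q") else name
        if regex.fullmatch(base):
            selected.append(name)
    return selected


# Regex form: '.*' means "any characters", so both list_bucket_dml tests match.
regex_selected = select_qfiles(r"list_bucket_dml.*", qfiles)

# Glob form read as a regex: 'l*' means "zero or more l", then any character, then 'q'.
glob_selected = select_qfiles(r"list_bucket_dml*.q", qfiles)
```

Under these assumptions the regex form selects both `list_bucket_dml` tests, while the glob-style pattern read as a regex selects nothing.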

 Add an option to run tests where testfiles can be specified as a regular 
 expression
 ---

 Key: HIVE-3835
 URL: https://issues.apache.org/jira/browse/HIVE-3835
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Namit Jain

 For example, if I want to run all list bucketing tests, I should be able to say:
  ant test -Dtestcase=TestCliDriver -Dqfile=list_bucket_dml*.q
 or something like that



[jira] [Resolved] (HIVE-3835) Add an option to run tests where testfiles can be specified as a regular expression

2013-03-05 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong resolved HIVE-3835.
-

Resolution: Not A Problem

 Add an option to run tests where testfiles can be specified as a regular 
 expression
 ---

 Key: HIVE-3835
 URL: https://issues.apache.org/jira/browse/HIVE-3835
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Namit Jain

 For example, if I want to run all list bucketing tests, I should be able to say:
  ant test -Dtestcase=TestCliDriver -Dqfile=list_bucket_dml*.q
 or something like that



[jira] [Commented] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594147#comment-13594147
 ] 

Ashutosh Chauhan commented on HIVE-4081:


It would also be good to add the following tests using the newly added over10k dataset.
{noformat}
select s, si - lead(f, 3) over (partition by t order by bo desc) from over10k limit 100;
select s, i - lead(i, 3, 0) over (partition by si order by i) from over10k limit 100;
select s, si - lag(d, 3) over (partition by b order by si) from over10k limit 100;
select s, lag(s, 3, 'fred') over (partition by f order by b) from over10k limit 100;
{noformat}

 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



Hive-trunk-h0.21 - Build # 2001 - Failure

2013-03-05 Thread Apache Jenkins Server
Changes for Build #2001



1 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try ant test ... 
-Dtest.silent=false to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs.
at junit.framework.Assert.fail(Assert.java:47)
at org.apache.hadoop.hive.cli.TestNegativeCliDriver.runTest(TestNegativeCliDriver.java:2381)
at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1(TestNegativeCliDriver.java:1867)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2001)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2001/ to 
view the results.

[jira] [Updated] (HIVE-4106) SMB joins fail in multi-way joins

2013-03-05 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4106:
-

Attachment: auto_sortmerge_join_12.q

Hi Namit,

I am able to reproduce this issue using this query. I see that the particular 
join (last join) is not getting added to the reject list because it does not 
match the conditions in the stack. In fact it is inverted. Let me know how to 
go about fixing this issue.

Thanks
Vikram.

 SMB joins fail in multi-way joins
 -

 Key: HIVE-4106
 URL: https://issues.apache.org/jira/browse/HIVE-4106
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: auto_sortmerge_join_12.q, HIVE-4106.patch


 I see an array-out-of-bounds exception in multi-way SMB joins. This is 
 related to changes that went in as part of HIVE-3403. This issue has been 
 discussed in HIVE-3891.



[jira] [Commented] (HIVE-4084) Generated aliases for windowing expressions is broken

2013-03-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594267#comment-13594267
 ] 

Harish Butani commented on HIVE-4084:
-

Fixed in HIVE-4081.

 Generated aliases for windowing expressions is broken
 -

 Key: HIVE-4084
 URL: https://issues.apache.org/jira/browse/HIVE-4084
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Prajakta Kalmegh

 All expressions without an alias get the alias 'null-1'.



[jira] [Assigned] (HIVE-4084) Generated aliases for windowing expressions is broken

2013-03-05 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani reassigned HIVE-4084:
---

Assignee: Harish Butani  (was: Prajakta Kalmegh)

 Generated aliases for windowing expressions is broken
 -

 Key: HIVE-4084
 URL: https://issues.apache.org/jira/browse/HIVE-4084
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani

 All expressions without an alias get the alias 'null-1'.



[jira] [Assigned] (HIVE-4111) Default value in lag is not handled correctly

2013-03-05 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani reassigned HIVE-4111:
---

Assignee: Harish Butani

 Default value in lag is not handled correctly
 -

 Key: HIVE-4111
 URL: https://issues.apache.org/jira/browse/HIVE-4111
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Harish Butani

 select s, lag(s, 3, 'fred') over (partition by f order by b) from over100k;
 results in runtime exception.



[jira] [Commented] (HIVE-4111) Default value in lag is not handled correctly

2013-03-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594270#comment-13594270
 ] 

Harish Butani commented on HIVE-4111:
-

This is not a default-value issue, but an issue that occurs when the lag amount 
is larger than the partition size. The fix is in HIVE-4081.
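
The case described in this comment can be illustrated with a small Python model of `lag` over a single, already ordered partition. This is a sketch of the intended semantics only, not the GenericUDAFLag code; the function signature and the sample values are assumptions made for the example.

```python
def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier in the partition.

    Rows that have no such predecessor receive `default`.
    """
    result = []
    for index in range(len(values)):
        source = index - offset
        result.append(values[source] if source >= 0 else default)
    return result


# Partition smaller than the lag amount: every row falls back to the default.
small_partition = lag(["alice", "bob"], 3, "fred")

# Partition larger than the lag amount: only the first three rows use the default.
large_partition = lag(["a", "b", "c", "d"], 3, "fred")
```

The bounds check on `source` is the important part: without it, a partition shorter than the lag amount would index outside the partition.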

 Default value in lag is not handled correctly
 -

 Key: HIVE-4111
 URL: https://issues.apache.org/jira/browse/HIVE-4111
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan

 select s, lag(s, 3, 'fred') over (partition by f order by b) from over100k;
 results in runtime exception.



[jira] [Updated] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4081:
--

Attachment: HIVE-4081.D9135.1.patch

hbutani requested code review of HIVE-4081 [jira] allow expressions with over 
clause.

Reviewers: JIRA, ashutoshc

Fix the issue where the lag amount is larger than the partition size; add windowing expression tests.

remove current restriction where only a UDAF invocation is allowed with a 
windowing specification

TEST PLAN
  included

REVISION DETAIL
  https://reviews.facebook.net/D9135

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java
  ql/src/test/queries/clientpositive/windowing_expressions.q
  ql/src/test/results/clientpositive/windowing_expressions.q.out

To: JIRA, ashutoshc, hbutani
Cc: brock


 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch, HIVE-4081.D9135.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Commented] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594283#comment-13594283
 ] 

Harish Butani commented on HIVE-4081:
-

The changes are as specified in the previous review. I also fixed the issue in 
HIVE-4111, but I didn't do an arc diff against the previous commit, which may be 
why this is a new review.

 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch, HIVE-4081.D9135.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Commented] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594288#comment-13594288
 ] 

Phabricator commented on HIVE-4081:
---

ashutoshc has accepted the revision HIVE-4081 [jira] allow expressions with 
over clause.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D9135

BRANCH
  HIVE-4081

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani
Cc: brock


 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch, HIVE-4081.D9135.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Resolved] (HIVE-4081) allow expressions with over clause

2013-03-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4081.


Resolution: Fixed

Committed to branch. Thanks, Harish!

 allow expressions with over clause
 --

 Key: HIVE-4081
 URL: https://issues.apache.org/jira/browse/HIVE-4081
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4081.D9063.1.patch, HIVE-4081.D9135.1.patch


 remove current restriction where only a UDAF invocation is allowed with a 
 windowing specification



[jira] [Resolved] (HIVE-4111) Default value in lag is not handled correctly

2013-03-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4111.


Resolution: Fixed

Fixed via HIVE-4081

 Default value in lag is not handled correctly
 -

 Key: HIVE-4111
 URL: https://issues.apache.org/jira/browse/HIVE-4111
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Harish Butani

 select s, lag(s, 3, 'fred') over (partition by f order by b) from over100k;
 results in runtime exception.



[jira] [Resolved] (HIVE-4084) Generated aliases for windowing expressions is broken

2013-03-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4084.


Resolution: Fixed

Resolving since HIVE-4081 is checked-in.

 Generated aliases for windowing expressions is broken
 -

 Key: HIVE-4084
 URL: https://issues.apache.org/jira/browse/HIVE-4084
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani

 All expressions without an alias get the alias 'null-1'.



[jira] [Created] (HIVE-4129) Window handling dumps debug info on console, instead should use logger.

2013-03-05 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-4129:
--

 Summary: Window handling dumps debug info on console, instead 
should use logger.
 Key: HIVE-4129
 URL: https://issues.apache.org/jira/browse/HIVE-4129
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






[jira] [Commented] (HIVE-4129) Window handling dumps debug info on console, instead should use logger.

2013-03-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594304#comment-13594304
 ] 

Ashutosh Chauhan commented on HIVE-4129:


 https://reviews.facebook.net/D9141

 Window handling dumps debug info on console, instead should use logger.
 ---

 Key: HIVE-4129
 URL: https://issues.apache.org/jira/browse/HIVE-4129
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan





[jira] [Commented] (HIVE-4106) SMB joins fail in multi-way joins

2013-03-05 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594311#comment-13594311
 ] 

Vikram Dixit K commented on HIVE-4106:
--

This test somehow feels awkward to me. It produces the issue but the test I 
have is slightly different. I am trying to come up with a better test for this.

 SMB joins fail in multi-way joins
 -

 Key: HIVE-4106
 URL: https://issues.apache.org/jira/browse/HIVE-4106
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: auto_sortmerge_join_12.q, HIVE-4106.patch


 I see an array-out-of-bounds exception in multi-way SMB joins. This is 
 related to changes that went in as part of HIVE-3403. This issue has been 
 discussed in HIVE-3891.



[jira] [Commented] (HIVE-4041) Support multiple partitionings in a single Query

2013-03-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594355#comment-13594355
 ] 

Harish Butani commented on HIVE-4041:
-

Attached design notes.

 Support multiple partitionings in a single Query
 

 Key: HIVE-4041
 URL: https://issues.apache.org/jira/browse/HIVE-4041
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: WindowingComponentization.pdf


 Currently we disallow queries if the partition specifications of all windowing 
 functions are not the same. We can relax this by generating multiple PTFOps based 
 on the unique partitionings in a query. For partitionings that differ only in 
 sort order, we can introduce a sort step between PTFOps, which can happen in the 
 same Reduce task.
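
The grouping idea in this description can be sketched in Python as a two-level grouping: first by partition columns, then by sort columns within each partition. This is a hypothetical planning model, not the design in the attached notes; the tuple layout `(function, partition_columns, order_columns)` and the function name `group_window_functions` are assumptions made for the example.

```python
def group_window_functions(window_fns):
    """Group window functions by partition columns, then by order columns.

    Each distinct partition key would need its own shuffle; the distinct
    order keys under one partition key would only need a re-sort between
    operators.
    """
    by_partition = {}
    for name, partition_cols, order_cols in window_fns:
        by_order = by_partition.setdefault(tuple(partition_cols), {})
        by_order.setdefault(tuple(order_cols), []).append(name)
    return by_partition


# Hypothetical query with four window functions over two partitionings.
window_fns = [
    ("rank", ["a"], ["b"]),
    ("sum", ["a"], ["b"]),
    ("lag", ["a"], ["c"]),
    ("avg", ["d"], ["e"]),
]
plan = group_window_functions(window_fns)
```

Here `rank` and `sum` share one operator, `lag` differs only in sort order and so could follow after a re-sort, and `avg` uses a different partitioning altogether.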



[jira] [Updated] (HIVE-4041) Support multiple partitionings in a single Query

2013-03-05 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-4041:


Attachment: WindowingComponentization.pdf

 Support multiple partitionings in a single Query
 

 Key: HIVE-4041
 URL: https://issues.apache.org/jira/browse/HIVE-4041
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: WindowingComponentization.pdf


 Currently we disallow queries if the partition specifications of all windowing 
 functions are not the same. We can relax this by generating multiple PTFOps based 
 on the unique partitionings in a query. For partitionings that differ only in 
 sort order, we can introduce a sort step between PTFOps, which can happen in the 
 same Reduce task.



[jira] [Commented] (HIVE-4108) Allow over() clause to contain an order by with no partition by

2013-03-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594369#comment-13594369
 ] 

Harish Butani commented on HIVE-4108:
-

So this will be the behavior:
{noformat}
| has Part | has Order | has Window | note                  |
| y        | y         | y          | everything specified  |
| y        | y         | n          | no window             |
| y        | n         | y          | order = partition     |
| y        | n         | n          | same as above         |
| n        | y         | y          | partition on constant |
| n        | y         | n          | same as above         |
| n        | n         | y          | same as above, o = p  |
| n        | n         | n          | the over() case       |
{noformat}
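
The table above can be read as a small set of defaulting rules, sketched here in Python. This is an interpretation of the table, not the SemanticAnalyzer or PTFTranslator code; the dictionary layout, the `<constant>` placeholder, and the sample column names are assumptions made for the example.

```python
# Hypothetical stand-in for "partition on constant".
CONSTANT_PARTITION = ["<constant>"]


def effective_spec(partition, order, window):
    """Fill in the missing parts of an over() clause following the table above."""
    if not partition and not order and window is None:
        # Last row of the table: the bare over() case, listed separately there.
        return {"case": "over()"}
    effective_partition = list(partition) if partition else list(CONSTANT_PARTITION)
    # When no order is given, the order defaults to the partition columns.
    effective_order = list(order) if order else list(effective_partition)
    return {
        "partition": effective_partition,
        "order": effective_order,
        "window": window,
    }


only_partition = effective_spec(["p_mfgr"], [], None)
only_order = effective_spec([], ["p_name"], None)
only_window = effective_spec([], [], "rows between 2 preceding and current row")
bare_over = effective_spec([], [], None)
```

The `only_order` call corresponds to the case this issue asks for: an order by with no partition by, handled by partitioning on a constant.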


 Allow over() clause to contain an order by with no partition by
 ---

 Key: HIVE-4108
 URL: https://issues.apache.org/jira/browse/HIVE-4108
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Brock Noland

 HIVE-4073 allows over() to be called with no partition by and no order by. We 
 should allow only an order by.
 From the review of HIVE-4073:
 Ashutosh
 {noformat}
 Can you also add following test. This should also work.
 select p_name, p_retailprice,
 avg(p_retailprice) over(order by p_name)
 from part
 partition by p_name;
 {noformat}
 Harish
 {noformat}
 This test will not work (:
 The grammar needs to be changed so:
 partitioningSpec
 @init { msgs.push("partitioningSpec clause"); }
 @after { msgs.pop(); } 
 :
 partitionByClause orderByClause? -> ^(TOK_PARTITIONINGSPEC partitionByClause orderByClause?) |
 orderByClause -> ^(TOK_PARTITIONINGSPEC orderByClause) |
 distributeByClause sortByClause? -> ^(TOK_PARTITIONINGSPEC distributeByClause sortByClause?) |
 sortByClause? -> ^(TOK_PARTITIONINGSPEC sortByClause) |
 clusterByClause -> ^(TOK_PARTITIONINGSPEC clusterByClause)
 ;
 And the SemanticAnalyzer::processPTFPartitionSpec has to handle this shape of 
 the AST Tree. The PTFTranslator also needs changes. Do this as another Jira
 {noformat}



[jira] [Created] (HIVE-4130) Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs

2013-03-05 Thread Harish Butani (JIRA)
Harish Butani created HIVE-4130:
---

 Summary: Bring the Lead/Lag UDFs interface in line with Lead/Lag 
UDAFs
 Key: HIVE-4130
 URL: https://issues.apache.org/jira/browse/HIVE-4130
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani


- support a default value arg
- both amt and defaultValue args can be optional




[jira] [Created] (HIVE-4131) Fix eclipse template classpath to include new packages added by ORC file patch

2013-03-05 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-4131:
-

 Summary: Fix eclipse template classpath to include new packages 
added by ORC file patch
 Key: HIVE-4131
 URL: https://issues.apache.org/jira/browse/HIVE-4131
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.11.0


The ORC file feature (HIVE-3874) added the protobuf and snappy libraries, as well 
as generated protobuf code. All of these need to be included in the Eclipse 
classpath template. The Eclipse project generated on the latest trunk has build 
errors due to the missing jars/classes.



[jira] [Updated] (HIVE-4131) Fix eclipse template classpath to include new packages added by ORC file patch

2013-03-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4131:
--

Attachment: HIVE-4131-1.patch

 Fix eclipse template classpath to include new packages added by ORC file patch
 --

 Key: HIVE-4131
 URL: https://issues.apache.org/jira/browse/HIVE-4131
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.11.0

 Attachments: HIVE-4131-1.patch


 The ORC file feature (HIVE-3874) added the protobuf and snappy libraries, as well 
 as generated protobuf code. All of these need to be included in the Eclipse 
 classpath template. The Eclipse project generated on the latest trunk has build 
 errors due to the missing jars/classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4131) Fix eclipse template classpath to include new packages added by ORC file patch

2013-03-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4131:
--

Status: Patch Available  (was: Open)

Patch attached

 Fix eclipse template classpath to include new packages added by ORC file patch
 --

 Key: HIVE-4131
 URL: https://issues.apache.org/jira/browse/HIVE-4131
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.11.0

 Attachments: HIVE-4131-1.patch


 The ORC file feature (HIVE-3874) added the protobuf and snappy libraries, as well 
 as generated protobuf code. All of these need to be included in the Eclipse 
 classpath template. The Eclipse project generated on the latest trunk has build 
 errors due to the missing jars/classes.
