[jira] [Created] (HIVE-10471) Derive column definitions from a raw Parquet data file

2015-04-23 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-10471:


 Summary: Derive column definitions from a raw Parquet data file
 Key: HIVE-10471
 URL: https://issues.apache.org/jira/browse/HIVE-10471
 Project: Hive
  Issue Type: Improvement
Reporter: Mariano Dominguez


This feature will allow Hive to create Parquet-backed tables the same way 
Cloudera's Impala[1] does:

CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET 
'/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';

CREATE TABLE columns_from_data_file LIKE PARQUET 
'/user/etl/destination/datafile1.dat'
  STORED AS PARQUET;

[1] 
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_parquet.html#parquet_ddl_unique_1
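
To illustrate what deriving column definitions from a data file could involve, here is a minimal sketch. It assumes the Parquet field list has already been read from the file footer; the field list, the type mapping, and the function name derive_create_table are hypothetical and do not reflect how Hive or Impala actually implement this.

```python
# Simplified, illustrative mapping from Parquet primitive types to Hive types.
# Assumption: this table is not taken from Hive or Impala source code.
PARQUET_TO_HIVE = {
    "BOOLEAN": "BOOLEAN",
    "INT32": "INT",
    "INT64": "BIGINT",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "BYTE_ARRAY": "BINARY",
}


def derive_create_table(table_name, parquet_fields):
    """Build a CREATE TABLE statement from (name, parquet_type, is_utf8) tuples."""
    columns = []
    for name, parquet_type, is_utf8 in parquet_fields:
        if parquet_type == "BYTE_ARRAY" and is_utf8:
            hive_type = "STRING"  # UTF8-annotated byte arrays are text
        else:
            hive_type = PARQUET_TO_HIVE[parquet_type]
        columns.append("  `{}` {}".format(name, hive_type))
    return "CREATE TABLE {} (\n{}\n)\nSTORED AS PARQUET;".format(
        table_name, ",\n".join(columns)
    )


# Hypothetical schema, standing in for what a Parquet footer would provide.
fields = [("id", "INT64", False), ("name", "BYTE_ARRAY", True), ("score", "DOUBLE", False)]
ddl = derive_create_table("columns_from_data_file", fields)
print(ddl)
```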





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9228) Problem with subquery using windowing functions

2014-12-30 Thread Mariano Dominguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariano Dominguez updated HIVE-9228:

Affects Version/s: 0.13.1

 Problem with subquery using windowing functions
 ---

 Key: HIVE-9228
 URL: https://issues.apache.org/jira/browse/HIVE-9228
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Aihua Xu
Assignee: Aihua Xu
   Original Estimate: 96h
  Remaining Estimate: 96h

 The following query, which wraps a subquery that uses window functions, 
 fails. The inner query works fine on its own.
 select st_fips_cd, zip_cd_5, hh_surr_key
 from
 (
 select st_fips_cd, zip_cd_5, hh_surr_key,
 count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by 
 st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
 row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key 
 asc) as analytic_row_number3
 from hh_agg
 where analytic_row_number2 = 1
 ) t;





[jira] [Created] (HIVE-7903) [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section

2014-08-28 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-7903:
---

 Summary: [Documentation] Remove hive.metastore.warehouse.dir from 
Client Configuration Parameters list in Remote Metastore section
 Key: HIVE-7903
 URL: https://issues.apache.org/jira/browse/HIVE-7903
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


Source: 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastore

In Remote Metastore deployment mode, neither the Hive CLI nor the Beeline can 
change the value of the ‘hive.metastore.warehouse.dir’ property because it is a 
“server-side” property.

Changing the value can be accomplished, however, by running in Local Metastore 
mode (that is, bypassing the Hive Metastore Server and directly accessing the 
Metastore database):

1) At runtime
$ hive --hiveconf hive.metastore.warehouse.dir=path -e "query"
$ beeline --hiveconf hive.metastore.warehouse.dir=path -n username -p password -u connection_string -e "query"

2) In the shell
hive> SET hive.metastore.warehouse.dir=path;
beeline> SET hive.metastore.warehouse.dir=path;

3) From configuration file: hive-site.xml

This property gets cached once a table/database is created; therefore, 
subsequent value changes will not take effect. You will need to start a new 
session to re-set the property.






[jira] [Updated] (HIVE-7903) [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section

2014-08-28 Thread Mariano Dominguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariano Dominguez updated HIVE-7903:


Description: 
Source: 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastore

In Remote Metastore deployment mode, neither the Hive CLI nor the Beeline can 
change the value of the ‘hive.metastore.warehouse.dir’ property because it is a 
“server-side” property.

Changing the value can be accomplished, however, by running in Local Metastore 
mode (that is, bypassing the Hive Metastore Server and directly accessing the 
Metastore database):

1) At runtime
$ hive --hiveconf hive.metastore.warehouse.dir=path -e "query"
$ beeline --hiveconf hive.metastore.warehouse.dir=path -n username -p password -u connection_string -e "query"

2) In the shell
hive> SET hive.metastore.warehouse.dir=path;
beeline> SET hive.metastore.warehouse.dir=path;

3) From configuration file: hive-site.xml

This property gets cached once a table/database is created; therefore, 
subsequent value changes will not take effect. You will need to start a new 
session to re-set the property.



 [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration 
 Parameters list in Remote Metastore section
 -

 Key: HIVE-7903
 URL: https://issues.apache.org/jira/browse/HIVE-7903
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

 Source: 
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastore
 In Remote Metastore deployment mode, neither the Hive CLI nor the Beeline can 
 change the value of the ‘hive.metastore.warehouse.dir’ property because it is 
 a “server-side” property.
 Changing the value can be accomplished, however, by running in Local 
 Metastore mode (that is, bypassing the Hive Metastore Server and directly 
 accessing the Metastore database):
 1) At runtime
 $ hive --hiveconf hive.metastore.warehouse.dir=path -e "query"
 $ beeline --hiveconf hive.metastore.warehouse.dir=path -n username -p password -u connection_string -e "query"
 2) In the shell
 hive> SET hive.metastore.warehouse.dir=path;
 beeline> SET hive.metastore.warehouse.dir=path;
 3) From configuration file: hive-site.xml
 This property gets cached once a table/database is created; therefore, 
 subsequent value changes will not take effect. You will need to start a new 
 session to re-set the property.





[jira] [Created] (HIVE-7631) Add support for NOT operator in column regex

2014-08-06 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-7631:
---

 Summary: Add support for NOT operator in column regex
 Key: HIVE-7631
 URL: https://issues.apache.org/jira/browse/HIVE-7631
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


Given the following table:
0: jdbc:hive2://localhost:1/default> DESCRIBE regex_column_tb;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| stage_c1  | int        | None     |
| stage_c2  | int        | None     |
| c1        | int        | None     |
| c2        | int        | None     |
+-----------+------------+----------+

A simple regex can be used to select certain columns:
0: jdbc:hive2://localhost:1/default> SELECT `stage_.*` FROM regex_column_tb;
+-----------+-----------+
| stage_c1  | stage_c2  |
+-----------+-----------+
+-----------+-----------+

The regex “(?!stage_).*”, which uses the NOT operator (a negative lookahead) to 
exclude columns starting with “stage_”, is a valid Java regex, but it does not 
seem to be supported in Hive:
0: jdbc:hive2://localhost:1/default> SELECT `(?!stage_).*` FROM regex_column_tb;
Error: Error while compiling statement: FAILED: ParseException line 1:17 
mismatched input ')' expecting FROM near 'stage_' in from clause 
(state=42000,code=4)

The following regex is supported (HIVE-420), but it is not as intuitive:
0: jdbc:hive2://localhost:1/default> SELECT `(stage_.*)?+.+` FROM regex_column_tb;
+-----+-----+
| c1  | c2  |
+-----+-----+
+-----+-----+
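
The intended column selection can be sketched outside Hive. The snippet below uses Python's re module as a stand-in for Java regex and assumes that a column is selected only when the pattern matches the whole column name; the helper select_columns is hypothetical. The possessive form `(stage_.*)?+.+` is left out because Python only accepts possessive quantifiers from version 3.11 onward.

```python
import re

# Column names from the regex_column_tb table described above.
columns = ["stage_c1", "stage_c2", "c1", "c2"]


def select_columns(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]


# Plain prefix pattern: keeps only the stage_ columns.
staged = select_columns(r"stage_.*", columns)

# Negative lookahead: keeps every column that does not start with stage_.
not_staged = select_columns(r"(?!stage_).*", columns)

print(staged)
print(not_staged)
```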






[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema

2014-05-15 Thread Mariano Dominguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariano Dominguez updated HIVE-7046:


Affects Version/s: 0.11.0
   0.13.0

 Propagate addition of new columns to partition schema
 -

 Key: HIVE-7046
 URL: https://issues.apache.org/jira/browse/HIVE-7046
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Mariano Dominguez

 Hive reads data according to the partition schema, not the table schema 
 (because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
 changes are not propagated to partitions. Thus, the schema of a partition 
 will differ from that of the table after altering the table schema; this is 
 done to preserve the ability to read existing data, particularly when using 
 binary formats such as RCFile. Binary formats do not allow changing the type 
 of a field because of the way serialization works; a field serialized as a 
 string will be displayed incorrectly if read as an integer.
 Unfortunately, as a side effect, this behavior limits the ability to add new 
 columns to already existing partitions using ALTER TABLE ADD COLUMNS. A 
 possible workaround is to manually recreate the partitions, but this process 
 could be unnecessarily cumbersome if the number of partitions is high. New 
 columns should be propagated to existing partitions automatically instead.
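
The remark about a string field being read as an integer can be shown with a generic byte-level sketch. This is not RCFile's actual on-disk layout; it only assumes that a reader reinterprets the same four bytes under a different type.

```python
import struct

# The value "1234" serialized as ASCII text occupies four bytes.
serialized = "1234".encode("ascii")

# Reading those same four bytes as a big-endian 32-bit integer
# yields an unrelated number rather than 1234.
(as_int,) = struct.unpack(">i", serialized)

print(serialized)
print(as_int)
```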





[jira] [Created] (HIVE-7046) Propagate addition of new columns to partition schema

2014-05-13 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-7046:
---

 Summary: Propagate addition of new columns to partition schema
 Key: HIVE-7046
 URL: https://issues.apache.org/jira/browse/HIVE-7046
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.
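
The difference between the current and the requested behavior can be modeled with a toy sketch. The Table class and its methods are hypothetical and are not Hive's metastore code; the sketch only assumes that each partition keeps its own copy of the schema.

```python
class Table:
    """Toy model of a partitioned table; not Hive's metastore code."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.partitions = {}  # partition spec -> that partition's own column list

    def add_partition(self, spec):
        # A partition records a copy of the table schema at creation time.
        self.partitions[spec] = list(self.columns)

    def add_columns(self, new_columns):
        # Mirrors the reported behavior: only the table schema changes.
        self.columns.extend(new_columns)

    def propagate_new_columns(self):
        # The requested behavior: append the columns each partition is missing.
        for schema in self.partitions.values():
            known = {name for name, _ in schema}
            schema.extend(col for col in self.columns if col[0] not in known)


table = Table([("id", "int")])
table.add_partition("dt=2014-05-01")
table.add_columns([("note", "string")])

before = list(table.partitions["dt=2014-05-01"])  # still the old schema
table.propagate_new_columns()
after = table.partitions["dt=2014-05-01"]  # now includes the new column

print(before)
print(after)
```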






[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema

2014-05-12 Thread Mariano Dominguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariano Dominguez updated HIVE-7046:


Description: 
Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to manually recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.


  was:
Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.



 Propagate addition of new columns to partition schema
 -

 Key: HIVE-7046
 URL: https://issues.apache.org/jira/browse/HIVE-7046
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

 Hive reads data according to the partition schema, not the table schema 
 (because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
 changes are not propagated to partitions. Thus, the schema of a partition 
 will differ from that of the table after altering the table schema; this is 
 done to preserve the ability to read existing data, particularly when using 
 binary formats such as RCFile. Binary formats do not allow changing the type 
 of a field because of the way serialization works; a field serialized as a 
 string will be displayed incorrectly if read as an integer.
 Unfortunately, as a side effect, this behavior limits the ability to add new 
 columns to already existing partitions using ALTER TABLE ADD COLUMNS. A 
 possible workaround is to manually recreate the partitions, but this process 
 could be unnecessarily cumbersome if the number of partitions is high. New 
 columns should be propagated to existing partitions automatically instead.





[jira] [Created] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2013-12-10 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-5999:
---

 Summary: Allow other characters for LINES TERMINATED BY 
 Key: HIVE-5999
 URL: https://issues.apache.org/jira/browse/HIVE-5999
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


LINES TERMINATED BY currently supports only the newline character ('\n').

It would be nice to loosen this constraint and allow other characters.

This limitation seems to be hardcoded here:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
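
As a rough illustration of the requested behavior, the sketch below splits raw text into records on a configurable line terminator. The function split_records and its parameters are hypothetical; this is not how Hive's SerDe layer reads rows.

```python
def split_records(data, line_terminator="\n", field_terminator=","):
    """Split raw text into rows of fields using configurable terminators."""
    records = data.split(line_terminator)
    # A trailing terminator leaves one empty record at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return [record.split(field_terminator) for record in records]


# Default behavior: rows end with a newline.
newline_rows = split_records("1,a\n2,b\n")

# Requested behavior: rows end with another character, here '|'.
pipe_rows = split_records("1,a|2,b|3,c|", line_terminator="|")

print(newline_rows)
print(pipe_rows)
```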





[jira] [Created] (HIVE-5823) Support for DECIMAL primitive type in AvroSerDe

2013-11-14 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-5823:
---

 Summary: Support for DECIMAL primitive type in AvroSerDe
 Key: HIVE-5823
 URL: https://issues.apache.org/jira/browse/HIVE-5823
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Mariano Dominguez
Priority: Minor


This new feature request would be tied to AVRO-1402.

Adding DECIMAL support would be particularly useful when converting types 
from Avro to Hive, since DECIMAL is already a supported data type in Hive.
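
For context, a conversion could look roughly like the sketch below. It assumes the encoding used by the decimal logical type in the Avro specification (a big-endian two's-complement unscaled integer stored as bytes, with the scale kept in the schema), and it assumes that this is the representation AVRO-1402 leads to. The function names are hypothetical and this is not AvroSerDe code.

```python
from decimal import Decimal


def decode_avro_decimal(raw, scale):
    """Turn big-endian two's-complement bytes plus a scale into a Decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def encode_avro_decimal(value, scale):
    """Inverse of decode_avro_decimal for values that fit the given scale."""
    unscaled = int(value.scaleb(scale))
    # Enough bytes for the magnitude plus a sign bit (not always minimal).
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


encoded = encode_avro_decimal(Decimal("123.45"), scale=2)
decoded = decode_avro_decimal(encoded, scale=2)
negative = decode_avro_decimal(b"\xff", scale=1)

print(encoded)
print(decoded)
print(negative)
```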


