[jira] [Created] (HIVE-10471) Derive column definitions from a raw Parquet data file
Mariano Dominguez created HIVE-10471:

    Summary: Derive column definitions from a raw Parquet data file
    Key: HIVE-10471
    URL: https://issues.apache.org/jira/browse/HIVE-10471
    Project: Hive
    Issue Type: Improvement
    Reporter: Mariano Dominguez

This feature will allow Hive to create Parquet-backed tables the same way Cloudera's Impala [1] does:

    CREATE EXTERNAL TABLE ingest_existing_files
      LIKE PARQUET '/user/etl/destination/datafile1.dat'
      STORED AS PARQUET
      LOCATION '/user/etl/destination';

    CREATE TABLE columns_from_data_file
      LIKE PARQUET '/user/etl/destination/datafile1.dat'
      STORED AS PARQUET;

[1] http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_parquet.html#parquet_ddl_unique_1

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariano Dominguez updated HIVE-9228:

    Affects Version/s: 0.13.1

Problem with subquery using windowing functions

    Key: HIVE-9228
    URL: https://issues.apache.org/jira/browse/HIVE-9228
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.13.1
    Reporter: Aihua Xu
    Assignee: Aihua Xu
    Original Estimate: 96h
    Remaining Estimate: 96h

The following query with window functions failed. The internal query works fine.

    select st_fips_cd, zip_cd_5, hh_surr_key
    from (
      select st_fips_cd, zip_cd_5, hh_surr_key,
        count( case when advtg_len_rsdnc_cd = '1' then 1 end )
          over (partition by st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
        row_number()
          over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
      from hh_agg
      where analytic_row_number2 = 1
    ) t;
[jira] [Created] (HIVE-7903) [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section
Mariano Dominguez created HIVE-7903:

    Summary: [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section
    Key: HIVE-7903
    URL: https://issues.apache.org/jira/browse/HIVE-7903
    Project: Hive
    Issue Type: Bug
    Components: Documentation
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez

Source: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastore

In Remote Metastore deployment mode, neither the Hive CLI nor Beeline can change the value of the 'hive.metastore.warehouse.dir' property because it is a "server-side" property.

Changing the value can be accomplished, however, by running in Local Metastore mode (that is, bypassing the Hive Metastore Server and directly accessing the Metastore database):

1) At runtime

    $ hive --hiveconf hive.metastore.warehouse.dir=path -e "query"
    $ beeline --hiveconf hive.metastore.warehouse.dir=path -n username -p password -u connection_string -e query

2) In the shell

    hive> SET hive.metastore.warehouse.dir=path;
    beeline> SET hive.metastore.warehouse.dir=path;

3) From configuration file: hive-site.xml

This property gets cached once a table/database is created; therefore, subsequent value changes will not take effect. You will need to start a new session to reset the property.
[jira] [Updated] (HIVE-7903) [Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section
[ https://issues.apache.org/jira/browse/HIVE-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariano Dominguez updated HIVE-7903:

    Description:

Source: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastore

In Remote Metastore deployment mode, neither the Hive CLI nor Beeline can change the value of the 'hive.metastore.warehouse.dir' property because it is a "server-side" property.

Changing the value can be accomplished, however, by running in Local Metastore mode (that is, bypassing the Hive Metastore Server and directly accessing the Metastore database):

1) At runtime

    $ hive --hiveconf hive.metastore.warehouse.dir=path -e "query"
    $ beeline --hiveconf hive.metastore.warehouse.dir=path -n username -p password -u connection_string -e query

2) In the shell

    hive> SET hive.metastore.warehouse.dir=path;
    beeline> SET hive.metastore.warehouse.dir=path;

3) From configuration file: hive-site.xml

This property gets cached once a table/database is created; therefore, subsequent value changes will not take effect. You will need to start a new session to reset the property.

[Documentation] Remove hive.metastore.warehouse.dir from Client Configuration Parameters list in Remote Metastore section

    Key: HIVE-7903
    URL: https://issues.apache.org/jira/browse/HIVE-7903
    Project: Hive
    Issue Type: Bug
    Components: Documentation
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez
[jira] [Created] (HIVE-7631) Add support for NOT operator in column regex
Mariano Dominguez created HIVE-7631:

    Summary: Add support for NOT operator in column regex
    Key: HIVE-7631
    URL: https://issues.apache.org/jira/browse/HIVE-7631
    Project: Hive
    Issue Type: Improvement
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez

Given the following table:

    0: jdbc:hive2://localhost:1/default> DESCRIBE regex_column_tb;
    +-----------+------------+----------+
    | col_name  | data_type  | comment  |
    +-----------+------------+----------+
    | stage_c1  | int        | None     |
    | stage_c2  | int        | None     |
    | c1        | int        | None     |
    | c2        | int        | None     |
    +-----------+------------+----------+

A simple regex can be used to select certain columns:

    0: jdbc:hive2://localhost:1/default> SELECT `stage_.*` FROM regex_column_tb;
    +-----------+-----------+
    | stage_c1  | stage_c2  |
    +-----------+-----------+
    +-----------+-----------+

The regex "(?!stage_).*", which uses a negative lookahead as a NOT operator, is a valid Java regex, but it does not seem to be supported in Hive:

    0: jdbc:hive2://localhost:1/default> SELECT `(?!stage_).*` FROM regex_column_tb;
    Error: Error while compiling statement: FAILED: ParseException line 1:17 mismatched input ')' expecting FROM near 'stage_' in from clause (state=42000,code=4)

The following regex is supported (HIVE-420), but it is not as intuitive:

    0: jdbc:hive2://localhost:1/default> SELECT `(stage_.*)?+.+` FROM regex_column_tb;
    +-----+-----+
    | c1  | c2  |
    +-----+-----+
    +-----+-----+
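To make the comparison between the two exclusion patterns concrete, here is a minimal Java sketch that applies the three regexes from the report to the four column names of regex_column_tb. It only exercises java.util.regex; the class name ColumnRegexDemo, the helper select, and the choice to match each column name in full are assumptions made for illustration and are not taken from Hive's source.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ColumnRegexDemo {
    // Column names from the regex_column_tb example in the report.
    static final List<String> COLUMNS = List.of("stage_c1", "stage_c2", "c1", "c2");

    // Hypothetical helper: keep the columns whose entire name matches the regex.
    // Whole-name matching is an assumption about how the pattern is applied.
    static List<String> select(String regex, List<String> columns) {
        Pattern pattern = Pattern.compile(regex);
        return columns.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Plain inclusion pattern.
        System.out.println(select("stage_.*", COLUMNS));        // [stage_c1, stage_c2]
        // Negative lookahead: reject any name that starts with "stage_".
        System.out.println(select("(?!stage_).*", COLUMNS));    // [c1, c2]
        // Possessive optional group: it consumes a whole "stage_..." name and
        // never gives it back, so the trailing ".+" has nothing left to match.
        System.out.println(select("(stage_.*)?+.+", COLUMNS));  // [c1, c2]
    }
}
```

Under these assumptions both exclusion patterns pick the same columns, which is why the lookahead form reads as the more direct way to express "everything except stage_*".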
[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariano Dominguez updated HIVE-7046:

    Affects Version/s: 0.11.0
                       0.13.0

Propagate addition of new columns to partition schema

    Key: HIVE-7046
    URL: https://issues.apache.org/jira/browse/HIVE-7046
    Project: Hive
    Issue Type: Improvement
    Components: Database/Schema
    Affects Versions: 0.11.0, 0.12.0, 0.13.0
    Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.
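The behavior described in this issue can be pictured with a small toy model in Java. It is not Hive code: the class PartitionSchemaDemo and its methods are hypothetical, and the model assumes that each partition keeps its own copy of the column list taken when the partition is created. It contrasts a table-only ADD COLUMNS with the propagating variant the issue asks for.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionSchemaDemo {
    // Toy model, not Hive code: one table schema plus one schema copy per partition.
    final List<String> tableSchema = new ArrayList<>();
    final Map<String, List<String>> partitionSchemas = new LinkedHashMap<>();

    PartitionSchemaDemo(List<String> columns) {
        tableSchema.addAll(columns);
    }

    // Assumption of the model: a new partition snapshots the current table schema.
    void addPartition(String name) {
        partitionSchemas.put(name, new ArrayList<>(tableSchema));
    }

    // Models ALTER TABLE ADD COLUMNS as described: only the table schema changes.
    void addColumn(String column) {
        tableSchema.add(column);
    }

    // Models the requested behavior: existing partitions receive the column too.
    void addColumnPropagating(String column) {
        tableSchema.add(column);
        for (List<String> schema : partitionSchemas.values()) {
            schema.add(column);
        }
    }

    // Reads follow the partition schema, so these are the columns a query can see.
    List<String> readableColumns(String partition) {
        return partitionSchemas.get(partition);
    }

    public static void main(String[] args) {
        PartitionSchemaDemo table = new PartitionSchemaDemo(List.of("id", "name"));
        table.addPartition("p1");
        table.addColumn("email");            // table-only change
        table.addPartition("p2");
        System.out.println(table.readableColumns("p1")); // [id, name]
        System.out.println(table.readableColumns("p2")); // [id, name, email]
        table.addColumnPropagating("phone"); // reaches existing partitions
        System.out.println(table.readableColumns("p1")); // [id, name, phone]
    }
}
```

In this model the older partition p1 never sees "email" unless it is recreated, which mirrors the workaround mentioned in the description.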
[jira] [Created] (HIVE-7046) Propagate addition of new columns to partition schema
Mariano Dominguez created HIVE-7046:

    Summary: Propagate addition of new columns to partition schema
    Key: HIVE-7046
    URL: https://issues.apache.org/jira/browse/HIVE-7046
    Project: Hive
    Issue Type: Improvement
    Components: Database/Schema
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.
[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariano Dominguez updated HIVE-7046:

    Description:

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

    was:

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

Propagate addition of new columns to partition schema

    Key: HIVE-7046
    URL: https://issues.apache.org/jira/browse/HIVE-7046
    Project: Hive
    Issue Type: Improvement
    Components: Database/Schema
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez
[jira] [Created] (HIVE-5999) Allow other characters for LINES TERMINATED BY
Mariano Dominguez created HIVE-5999:

    Summary: Allow other characters for LINES TERMINATED BY
    Key: HIVE-5999
    URL: https://issues.apache.org/jira/browse/HIVE-5999
    Project: Hive
    Issue Type: Improvement
    Components: Database/Schema
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez

LINES TERMINATED BY only supports newline '\n' right now. It would be nice to loosen this constraint and allow other characters.

This limitation seems to be hardcoded here:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
[jira] [Created] (HIVE-5823) Support for DECIMAL primitive type in AvroSerDe
Mariano Dominguez created HIVE-5823:

    Summary: Support for DECIMAL primitive type in AvroSerDe
    Key: HIVE-5823
    URL: https://issues.apache.org/jira/browse/HIVE-5823
    Project: Hive
    Issue Type: New Feature
    Components: Serializers/Deserializers
    Affects Versions: 0.12.0
    Reporter: Mariano Dominguez
    Priority: Minor

This new feature request would be tied to AVRO-1402.

Adding DECIMAL support would be particularly interesting when converting types from Avro to Hive, since DECIMAL is already a supported data type in Hive.