[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-20 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869021#comment-16869021
 ] 

Miklos Gergely commented on HIVE-21897:
---

You are right [~mithun], my mistake. Closing jira.

> Setting serde / serde properties for partitions
> ---
>
> Key: HIVE-21897
> URL: https://issues.apache.org/jira/browse/HIVE-21897
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.1.1
> Reporter: Miklos Gergely
> Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties]
> the SerDe and the SerDe properties can be set for a partition too, so
>  
> {code:sql}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET SERDE 'serde.class.name';{code}
> is a valid statement. In fact it is not rejected, but it does nothing at all. 
> The execution succeeds and everything remains the same. 
> The same is true for setting the SerDe properties:
> {code:sql}
> ALTER TABLE table PARTITION (partition_col='partition_value') SET SERDEPROPERTIES ('property_name'='property_value');{code}
> is also a valid statement that does nothing.
> I suggest modifying the parser to reject these statements. SerDe is for a 
> table, and not for a partition.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-20 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868949#comment-16868949
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

bq. Mithun Radhakrishnan just for sure I've inserted two rows, one with dt=1, 
one with dt=2, and checked the files in HDFS. They are both ORC files.

Pardon me, but this does not sound right. How exactly did you insert values 
into {{dt=1}} and {{dt=2}}? At what point did you write to each of the 
partitions? If you {{INSERT OVERWRITE}} *after* the partitions were created, 
then I can understand how you see what you see. But, consider this sequence:

{code:sql}
-- Create the table.
CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY (dt STRING) STORED AS TEXTFILE;
-- 1.
INSERT OVERWRITE TABLE foobar PARTITION( dt='1' ) VALUES ( "foo1", "value1" ); -- SerDe == LazySimpleSerDe.
-- Describe the partition, to confirm.
DESC FORMATTED foobar PARTITION (dt='1');

-- Alter format.
ALTER TABLE foobar SET FILEFORMAT ORCFILE; -- (No CASCADE)

-- 2.
INSERT OVERWRITE TABLE foobar PARTITION( dt='2' ) VALUES ( "foo2", "value2" ); -- SerDe == OrcSerDe.
-- Describe the partition, to confirm.
DESC FORMATTED foobar PARTITION (dt='2');
{code}

In this case, if {{dt='1'}} doesn't retain its SerDe setting, it will be 
rendered unreadable after the table-format is changed. Please correct me if I'm 
wrong.
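
One way to test this, as a rough sketch: after running the sequence above, read each partition back. The queries assume only the {{foobar}} table from that sequence; what they return will depend on the Hive version and on whether the table is transactional.

{code:sql}
-- Read the partition that was written while the table was TEXTFILE.
SELECT foo, bar FROM foobar WHERE dt = '1';
-- Read the partition that was written after the switch to ORC.
SELECT foo, bar FROM foobar WHERE dt = '2';
{code}

If both queries return their rows, that would suggest each partition is read with the format recorded for that partition rather than the table's current one.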



[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-20 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868359#comment-16868359
 ] 

Miklos Gergely commented on HIVE-21897:
---

As discussed with [~kgyrtkirk], this is not ok. We also checked the sys db, 
which shows the same SerDe for the whole table and all of its partitions. For 
now, let's reject such statements. A future release may add such a feature.
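
A query along these lines is one way to make that check per partition. It is only a sketch: it assumes the sys database exposes the metastore tables {{TBLS}}, {{PARTITIONS}}, {{SDS}} and {{SERDES}} with the column names used below, and that the table is named {{foobar}}.

{code:sql}
-- Assumption: sys.tbls, sys.partitions, sys.sds and sys.serdes exist
-- with these columns; adjust the names if your sys schema differs.
SELECT p.part_name,
       s.input_format,
       sd.slib AS serde_lib
FROM sys.tbls t
JOIN sys.partitions p ON p.tbl_id = t.tbl_id
JOIN sys.sds s        ON s.sd_id = p.sd_id       -- the partition's own storage descriptor
JOIN sys.serdes sd    ON sd.serde_id = s.serde_id
WHERE t.tbl_name = 'foobar';
{code}

The join goes through each partition's storage descriptor rather than the table's, so any per-partition difference in SerDe would show up as different {{serde_lib}} values.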



[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868108#comment-16868108
 ] 

Miklos Gergely commented on HIVE-21897:
---

[~mithun] just to be sure, I inserted two rows, one with dt=1 and one with 
dt=2, and checked the files in HDFS. They are both ORC files.



[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868104#comment-16868104
 ] 

Miklos Gergely commented on HIVE-21897:
---

[~mithun] after executing those commands:

DESCRIBE EXTENDED foobar;

 
{code:java}
+-----------------------------+------------+----------+
|          col_name           | data_type  | comment  |
+-----------------------------+------------+----------+
| foo                         | string     |          |
| bar                         | string     |          |
| dt                          | string     |          |
|                             | NULL       | NULL     |
| # Partition Information     | NULL       | NULL     |
| # col_name                  | data_type  | comment  |
| dt                          | string     |          |
|                             | NULL       | NULL     |
| Detailed Table Information  | Table(tableName:foobar, dbName:default, owner:hive, createTime:1560986681, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:foo, type:string, comment:null), FieldSchema(name:bar, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null)], location:hdfs://hive-on-tezt-1.vpc.cloudera.com:8020/warehouse/tablespace/managed/hive/foobar, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{last_modified_time=1560986885, totalSize=0, numRows=0, rawDataSize=0, transactional_properties=insert_only, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\"}, numFiles=0, numPartitions=2, transient_lastDdlTime=1560986885, bucketing_version=2, last_modified_by=hive, transactional=true}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER, writeId:0) |          |
{code}
SHOW CREATE TABLE foobar;

 
{code:java}
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `foobar`(                             |
|   `foo` string,                                    |
|   `bar` string)                                    |
| PARTITIONED BY (                                   |
|   `dt` string)                                     |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION                                           |
|   'hdfs://hive-on-tezt-1.vpc.cloudera.com:8020/warehouse/tablespace/managed/hive/foobar' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'last_modified_by'='hive',                       |
|   'last_modified_time'='1560986885',               |
|   'transactional'='true',                          |
|   'transactional_properties'='insert_only',        |
|   'transient_lastDdlTime'='1560986885')            |
+----------------------------------------------------+
{code}
So it seems the table has only one SerDe, not one per partition. Do we want to 
allow a different SerDe per partition? If we do, it needs planning and code 
changes. Otherwise, for now we can stick to one SerDe per table.
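
Note that both commands above describe the table as a whole. To look at what is stored for an individual partition, the partition has to be named explicitly. A minimal sketch, assuming the {{foobar}} table and the {{dt}} values used earlier in this thread:

{code:sql}
-- Partition-level metadata, including the SerDe library and
-- input/output formats stored for that partition.
DESCRIBE FORMATTED foobar PARTITION (dt='1');
DESCRIBE FORMATTED foobar PARTITION (dt='2');
{code}

Comparing the "SerDe Library" and "InputFormat" lines of the two outputs would show whether the partitions differ from each other or from the table.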


[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868092#comment-16868092
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

At the risk of muddying the waters, I'd consider {{AvroSerDe}} which relies on 
the table/serde settings for {{"avro.schema.url"}} and 
{{"avro.schema.literal"}}.
When an Avro table's schema changes, old partitions might link to an older 
schema-literal SerDe-parameter value than newer partitions.

I could be wrong, but we might want to reevaluate the assumption that SerDe 
settings should apply uniformly across all partitions in a table.
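
As a sketch of the Avro case: the table name, partition value and schema path below are hypothetical, and only the {{avro.schema.url}} property comes from the comment above. Pinning an older partition to the schema it was written with might look like this:

{code:sql}
-- Hypothetical table and path, for illustration only.
-- Keep the old partition pointing at the schema version it was written with.
ALTER TABLE avro_events PARTITION (dt='2019-01-01')
  SET SERDEPROPERTIES ('avro.schema.url'='hdfs:///schemas/avro_events_v1.avsc');
{code}

For a statement like this to be useful, the SerDe properties have to be stored and honoured per partition, which is the assumption under discussion.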



[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Mithun Radhakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868085#comment-16868085
 ] 

Mithun Radhakrishnan commented on HIVE-21897:
-

bq. SerDe is for a table, and not for a partition.

Pardon me, but wouldn't a SerDe be exercised per partition?

{code:sql}
CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY (dt STRING) 
STORED AS TEXTFILE;
ALTER TABLE foobar ADD PARTITION ( dt='1' ); -- SerDe == LazySimpleSerDe.
ALTER TABLE foobar SET FILEFORMAT ORCFILE;
ALTER TABLE foobar ADD PARTITION ( dt='2' ); -- SerDe == OrcSerDe.
{code}

{{foobar(dt='1')}} should use {{LazySimpleSerDe}}, while {{foobar(dt='2')}} 
would use {{OrcSerDe}}, when each is read.



[jira] [Commented] (HIVE-21897) Setting serde / serde properties for partitions

2019-06-19 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868066#comment-16868066
 ] 

Miklos Gergely commented on HIVE-21897:
---

[~ashutoshc] please either approve this modification, or let me know what 
should happen if a user wants to set the SerDe / SerDe properties of a 
partition, and I'll implement it.

[~muleho...@gmail.com], likely we'll need to modify the documentation.
