subject:"\[jira\] \[Updated\] \(HIVE\-21940\) Metastore\: Postgres text <\-> clob mismatch for PARTITION_PARAMS\/PARAM

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-15 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21940:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

pushed to master. Thank you Laszlo!

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.01.patch, 
> HIVE-21940.02.patch, HIVE-21940.repro.patch
>
>
> Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
> fixed in the same way as HIVE-16667 earlier
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field.
> In case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-12 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: HIVE-21940.02.patch

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.01.patch, 
> HIVE-21940.02.patch, HIVE-21940.repro.patch
>
>
> Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
> fixed in the same way as HIVE-16667 earlier
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field.
> In case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-04 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
fixed in the same way as HIVE-16667 earlier

This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractSqlClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field.

In case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
fixed in the same way as HIVE-16667 earlier

This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractSqlClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: HIVE-21940.01.patch

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.01.patch, 
> HIVE-21940.repro.patch
>
>
> Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
> fixed in the same way as HIVE-16667 earlier
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Status: Patch Available  (was: Open)

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.repro.patch
>
>
> Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
> fixed in the same way as HIVE-16667 earlier
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221, 
fixed in the same way as HIVE-16667 earlier

This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractSqlClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractSqlClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 |

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: HIVE-21940.01.patch

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.repro.patch
>
>
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: (was: HIVE-21940.01.patch)

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.repro.patch
>
>
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: HIVE-21940.01.patch

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.repro.patch
>
>
> This issue was found while running sqoop/hive tests on a cluster with hive 
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is 
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractSqlClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 |

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
This issue was found while running sqoop/hive tests on a cluster with hive with 
postgres metastore, and it turned out the problem is that PARAM_VALUE is 
handled as it was CLOB but it's a text, so extractClob returns it as is.

It's reproducible on cluster by the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-03 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Attachment: HIVE-21940.repro.patch

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.repro.patch
>
>
> The issue is reproducible on a cluster with postgres metastore db by the 
> following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> sys query results in:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case of a direct metastore query (from hive's sys schema, but the same 
> result for direct postgres), it shows the result above (see sys query 
> output). This is an issue when hive treats these ids as they were real 
> values, but they are obviously not correct, and this causes various failures 
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-02 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case of a direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above (see sys query output). 
This is an issue when hive treats these ids as they were real values, but they 
are obviously not correct, and this causes various failures (e.g. using serde 
parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

sys query results in:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case for direct metastore query (from hive's sys schema, but the same 
result for direct postgres), it shows the result above. This is an issue when 
hive treats these ids as is, but they are obviously not correct, and this 
causes various failures (e.g. using serde parameter serialization.format=28392)



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with postgres metastore db by the 
following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case for direct metastore query (from hive's sys schema, but the same 
result for direct postgres) query:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case for direct metastore query (from hive's sys schema, but the same 
result for direct postgres) query:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}



param_value values above are large object ids, according to pg_dump
| 151 | COLUMN_STATS_ACCURATE  | 28626   |

{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

{code}
decoded large object value:
{code}
{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
{code}


  was:
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Fix Version/s: 4.0.0

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
>
> The issue is reproducible on a cluster with the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table_for_sqoop partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case for direct metastore query (from hive's sys schema, but the same 
> result for direct postgres) query:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Affects Version/s: 3.2.0

> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
>
> The issue is reproducible on a cluster with the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids into this 
> field. More interesting is that the large object can be resolved in some 
> codepaths. In case of a describe for partition it works correctly:
> {code}
> describe formatted my_table_for_sqoop partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize 
>| 6  |
> |   | totalSize   
>| 7  |
> |   | transient_lastDdlTime   
>| 1561976024 |
> |   | NULL
>| NULL   |
> {code}
> But in case for direct metastore query (from hive's sys schema, but the same 
> result for direct postgres) query:
> {code}
> +-++-+
> | pp.part_id  |  pp.param_key  | pp.param_value  |
> +-++-+
> | 151 | rawDataSize| 28629   |
> | 151 | numRows| 28628   |
> | 151 | transient_lastDdlTime  | 28627   |
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> | 151 | numFiles   | 28625   |
> | 151 | totalSize  | 28622   |
> +-++-+
> {code}
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE  | 28626   |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0, 
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids into this field. More 
interesting is that the large object can be resolved in some codepaths. In case 
of a describe for partition it works correctly:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case for direct metastore query (from hive's sys schema, but the same 
result for direct postgres) query:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}

param_value values above are large object ids, according to pg_dump
{code}
SELECT pg_catalog.lo_open('28626', 131072);
SELECT pg_catalog.lowrite(0, 
'\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
SELECT pg_catalog.lo_close(0);

```| 151 | COLUMN_STATS_ACCURATE  | 28626   |```
{code}
decoded large object value:
{code}
```{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}```
{code}


  was:
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids in this field. More 
interesting is that the large object can be resolved in some codepath. In case 
of a describe for partition:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

2019-07-01 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21940:

Description: 
The issue is reproducible on a cluster with the following statements:
{code}
USE default;
drop table if exists my_table;
create external table my_table (col1 int, col3 int) partitioned by (col2 
string) STORED AS TEXTFILE;
insert into my_table VALUES(11,201,"F");
SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = "my_table";
{code}

Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no such 
type as CLOB, and metastore simply saves large object ids in this field. More 
interesting is that the large object can be resolved in some codepath. In case 
of a describe for partition:
{code}
describe formatted my_table_for_sqoop partition (col2='F');

...

| Partition Parameters: | NULL  
 | NULL   |
|   | COLUMN_STATS_ACCURATE 
 | 
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
 |
|   | numFiles  
 | 1  |
|   | numRows   
 | 1  |
|   | rawDataSize   
 | 6  |
|   | totalSize 
 | 7  |
|   | transient_lastDdlTime 
 | 1561976024 |
|   | NULL  
 | NULL   |

{code}

But in case for direct metastore query (from hive's sys schema, but the same 
result for direct postgres) query:
{code}
+-++-+
| pp.part_id  |  pp.param_key  | pp.param_value  |
+-++-+
| 151 | rawDataSize| 28629   |
| 151 | numRows| 28628   |
| 151 | transient_lastDdlTime  | 28627   |
| 151 | COLUMN_STATS_ACCURATE  | 28626   |
| 151 | numFiles   | 28625   |
| 151 | totalSize  | 28622   |
+-++-+
{code}


> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
>
> The issue is reproducible on a cluster with the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2 
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = 
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = 
> "my_table";
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience 
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no 
> such type as CLOB, and metastore simply saves large object ids in this field. 
> More interesting is that the large object can be resolved in some codepath. 
> In case of a describe for partition:
> {code}
> describe formatted my_table_for_sqoop partition (col2='F');
> ...
> | Partition Parameters: | NULL
>| NULL   |
> |   | COLUMN_STATS_ACCURATE   
>| 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}}
>  |
> |   | numFiles
>| 1  |
> |   | numRows 
>| 1  |
> |   | rawDataSize

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

[jira] [Updated] (HIVE-21940) Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE

21 matches

Site Navigation

Mail list logo

Footer information