Hive 3.1.1
Spark 3.1.1

Regarding your Stack Overflow question, and I quote:

"I have a need to generate DDL statements for Hive tables & views
programmatically. I tried using Spark and Beeline for this task. Beeline
takes around 5-10 seconds for each of the statements whereas Spark
completes the same thing in a few milliseconds. I am planning to use Spark
since it is faster compared to beeline. One downside of using spark for
getting DDL statements from the hive is, it treats CHAR, VARCHAR characters
as String and it doesn't preserve the length information that goes with
CHAR,VARCHAR data types. At the same time beeline preserves the data type
and the length information for CHAR,VARCHAR data types. *I am using Spark
2.4.1 and Beeline 2.1.1.*

Given below the sample create table command and its show create table
output."
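For the programmatic side, one common shape for this job is a loop over table names that captures each SHOW CREATE TABLE result. A minimal sketch is below; `run_sql` is a hypothetical stand-in for the real call (in a live session you would pass something like `lambda q: spark.sql(q).collect()[0][0]`), stubbed here with canned output so the shape of the result is visible:

```python
# Sketch: batch-generate DDL statements for a list of Hive tables.
# `run_sql` is a stand-in for spark.sql(...).collect()[0][0] in a real session.
def generate_ddls(tables, run_sql):
    """Return {table_name: CREATE TABLE statement} for each table."""
    return {t: run_sql(f"SHOW CREATE TABLE {t}") for t in tables}

# Stubbed usage with canned output, just to show the result shape:
canned = {
    "SHOW CREATE TABLE test.etc":
        "CREATE TABLE `test`.`etc` (\n"
        "  `id` BIGINT,\n"
        "  `col1` VARCHAR(30),\n"
        "  `col2` STRING)\n"
        "USING text"
}
ddls = generate_ddls(["test.etc"], canned.get)
print(ddls["test.etc"].splitlines()[0])  # CREATE TABLE `test`.`etc` (
```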

First, create a simple table in Hive in the test database:

hive> use test;
OK
hive> create table etc(ID BIGINT, col1 VARCHAR(30), col2 STRING);
OK
hive> desc formatted etc;
# col_name              data_type               comment
id                      bigint
col1                    varchar(30)
col2                    string

# Detailed Table Information
Database:               test
OwnerType:              USER
Owner:                  hduser
CreateTime:             Fri Mar 11 18:29:34 GMT 2022
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://rhes75:9000/user/hive/warehouse/test.db/etc
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col2":"true","id":"true"}}
        bucketing_version       2
        numFiles                0
        numRows                 0
        rawDataSize             0
        totalSize               0
        transient_lastDdlTime   1647023374

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
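As an aside, the COLUMN_STATS_ACCURATE table parameter shown above is a small JSON document, so if you are reading these table parameters programmatically, the standard json module decodes it directly. A minimal sketch using the value from the output above:

```python
import json

# The COLUMN_STATS_ACCURATE value from the desc formatted output above
raw = '{"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col2":"true","id":"true"}}'
stats = json.loads(raw)

print(stats["BASIC_STATS"])           # true
print(sorted(stats["COLUMN_STATS"]))  # ['col1', 'col2', 'id']
```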

Now let's go to spark-shell:
scala> spark.sql("show create table test.etc").show(false)
+------------------------------------------------------------+
|createtab_stmt                                              |
+------------------------------------------------------------+
|CREATE TABLE `test`.`etc` (
  `id` BIGINT,
  `col1` VARCHAR(30),
  `col2` STRING)
USING text
TBLPROPERTIES (
  'bucketing_version' = '2',
  'transient_lastDdlTime' = '1647023374')
|
+------------------------------------------------------------+

You can see that Spark shows the column types correctly, preserving the VARCHAR(30) length information.
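If you want to assert that in code rather than by eye, a simple regex over the returned DDL string does it. The DDL text below is copied from the spark-shell output above; in a live session you would take it from spark.sql("show create table test.etc").collect()[0][0]:

```python
import re

# DDL string as returned by SHOW CREATE TABLE (copied from the output above)
ddl = """CREATE TABLE `test`.`etc` (
  `id` BIGINT,
  `col1` VARCHAR(30),
  `col2` STRING)
USING text"""

# Find CHAR/VARCHAR columns together with their length specifier
typed = re.findall(r"`(\w+)`\s+((?:VAR)?CHAR\(\d+\))", ddl)
print(typed)  # [('col1', 'VARCHAR(30)')]
```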

Now let us create the same table in Hive through Beeline:


0: jdbc:hive2://rhes75:10099/default> use test;
No rows affected (0.019 seconds)

0: jdbc:hive2://rhes75:10099/default> create table etc(ID BIGINT, col1 VARCHAR(30), col2 STRING);
No rows affected (0.304 seconds)

0: jdbc:hive2://rhes75:10099/default> desc formatted etc;
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
|           col_name            |                     data_type                      |                      comment                       |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
| # col_name                    | data_type                                          | comment                                            |
| id                            | bigint                                             |                                                    |
| col1                          | varchar(30)                                        |                                                    |
| col2                          | string                                             |                                                    |
|                               | NULL                                               | NULL                                               |
| # Detailed Table Information  | NULL                                               | NULL                                               |
| Database:                     | test                                               | NULL                                               |
| OwnerType:                    | USER                                               | NULL                                               |
| Owner:                        | hduser                                             | NULL                                               |
| CreateTime:                   | Fri Mar 11 18:51:00 GMT 2022                       | NULL                                               |
| LastAccessTime:               | UNKNOWN                                            | NULL                                               |
| Retention:                    | 0                                                  | NULL                                               |
| Location:                     | hdfs://rhes75:9000/user/hive/warehouse/test.db/etc | NULL                                               |
| Table Type:                   | MANAGED_TABLE                                      | NULL                                               |
| Table Parameters:             | NULL                                               | NULL                                               |
|                               | COLUMN_STATS_ACCURATE                              | {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col2":"true","id":"true"}} |
|                               | bucketing_version                                  | 2                                                  |
|                               | numFiles                                           | 0                                                  |
|                               | numRows                                            | 0                                                  |
|                               | rawDataSize                                        | 0                                                  |
|                               | totalSize                                          | 0                                                  |
|                               | transient_lastDdlTime                              | 1647024660                                         |
|                               | NULL                                               | NULL                                               |
| # Storage Information         | NULL                                               | NULL                                               |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                                               |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                                               |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                                       |
| Compressed:                   | No                                                 | NULL                                               |
| Num Buckets:                  | -1                                                 | NULL                                               |
| Bucket Columns:               | []                                                 | NULL                                               |
| Sort Columns:                 | []                                                 | NULL                                               |
| Storage Desc Params:          | NULL                                               | NULL                                               |
|                               | serialization.format                               | 1                                                  |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+

33 rows selected (0.159 seconds)

Now check that in spark-shell again:


scala> spark.sql("show create table test.etc").show(false)
+------------------------------------------------------------+
|createtab_stmt                                              |
+------------------------------------------------------------+
|CREATE TABLE `test`.`etc` (
  `id` BIGINT,
  `col1` VARCHAR(30),
  `col2` STRING)
USING text
TBLPROPERTIES (
  'bucketing_version' = '2',
  'transient_lastDdlTime' = '1647024660')
|
+------------------------------------------------------------+


It shows OK. So, in summary, you get the column definitions in Spark exactly as you defined them in Hive, with the CHAR/VARCHAR length information preserved.

Your statement above, and I quote, "I am using Spark 2.4.1 and Beeline 2.1.1", refers to older versions of Spark and Hive, which may have had this issue.


HTH









On Fri, 11 Mar 2022 at 18:19, Venkatesan Muniappan <m.venkatbe...@gmail.com>
wrote:

> hi Spark Team,
>
> I have raised a question on Spark through Stackoverflow. When you get a
> chance, can you please take a look and help me ?.
>
> https://stackoverflow.com/q/71431757/5927843
>
> Thanks,
> Venkat
> 2016173438
>
