Hi John,

as always, I would start by asking what it is that you are trying to achieve
here. What is the exact security requirement?

We can then start looking at the options available.

Regards,
Gourav Sengupta

On Thu, Jan 21, 2021 at 1:59 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Most enterprise databases provide data encryption in some form; see, for
> example, Oracle's Introduction to Transparent Data Encryption
> <https://docs.oracle.com/database/121/ASOAG/introduction-to-transparent-data-encryption.htm#ASOAG10272>
>
> As far as I know, Hive supports column-level encryption for text and
> sequence files, which in turn relies on HDFS data encryption; see here
> <https://support.huawei.com/enterprise/en/doc/EDOC1100020163/742cbdb6/using-the-hive-column-encryption-function#:~:text=Hive%20supports%20encryption%20of%20one,the%20related%20columns%20are%20encrypted.>
>
>
> In general this seems to be left to the underlying storage. Most customers
> rely on tokenization solutions such as Protegrity
> <https://www.protegrity.com/> before data is stored in a data warehouse
> like Hive or in cloud databases.
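>
> As a rough sketch of that encrypt-before-it-lands approach (purely an
> illustration, not anything from the Protegrity product: the key handling,
> column names and target table are all assumptions), one could do it with a
> plain PySpark UDF:
>
> from cryptography.fernet import Fernet
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf, col
> from pyspark.sql.types import StringType
>
> spark = SparkSession.builder.enableHiveSupport().getOrCreate()
>
> # in practice the key comes from a KMS/secret store, not generated ad hoc
> key = Fernet.generate_key()
>
> def encrypt_value(v):
>     # turn any value into an opaque string token; keep NULLs as NULLs
>     return None if v is None else Fernet(key).encrypt(str(v).encode()).decode()
>
> encrypt_udf = udf(encrypt_value, StringType())
>
> df = spark.table("test.randomDataPy")  # hypothetical source table
> df_enc = (df.withColumn("ID", encrypt_udf(col("ID")))
>             .withColumn("CLUSTERED", encrypt_udf(col("CLUSTERED"))))
> df_enc.write.mode("overwrite").saveAsTable("test.randomDataPy_encrypted")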
>
> There should be no reason why Spark could not support it, at least in its
> simplest form. For example, within PySpark one can create the table
> explicitly in Hive, attempting to encrypt the columns ID and CLUSTERED as
> below:
>
> sqltext = ""
> # fullyQualifiedTableName is assumed to be set earlier, e.g. "test.randomDataPy"
> if (spark.sql("SHOW TABLES IN test like 'randomDataPy'").count() == 1):
>   rows = spark.sql(f"""SELECT COUNT(1) FROM {fullyQualifiedTableName}""").collect()[0][0]
>   print("number of rows is ", rows)
> else:
>   print("\nTable test.randomDataPy does not exist, creating table ")
>   sqltext = """
>      CREATE TABLE test.randomDataPy(
>        ID INT
>      , CLUSTERED INT
>      , SCATTERED INT
>      , RANDOMISED INT
>      , RANDOM_STRING VARCHAR(50)
>      , SMALL_VC VARCHAR(50)
>      , PADDING VARCHAR(4000)
>     )
>     ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>       WITH SERDEPROPERTIES ('column.encode.columns'='ID, CLUSTERED',
>                             'column.encode.classname'='org.apache.hadoop.hive.serde2.AESRewriter')
>       STORED AS TEXTFILE
>     """
>   spark.sql(sqltext)
>
> Disclaimer: I have not tried it myself, but it is worth trying to see if it
> works.
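>
> A quick way to check whether it actually worked (again just a sketch; the
> INSERT values and the warehouse path below are assumptions based on the
> defaults) would be to write one row and then look at the raw text file that
> Hive stores:
>
> # write one row through Spark/Hive
> spark.sql("""INSERT INTO test.randomDataPy
>              VALUES (1, 1, 1, 1, 'abc', 'abc', 'padding')""")
>
> # reading back through the SerDe should return plaintext values ...
> spark.sql("SELECT ID, CLUSTERED FROM test.randomDataPy").show()
>
> # ... while the file on disk should show ID and CLUSTERED garbled if the
> # column encryption kicked in (default warehouse location, adjust to suit):
> # hdfs dfs -cat /user/hive/warehouse/test.db/randomdatapy/* | head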
>
> HTH
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 21 Jan 2021 at 11:44, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Never heard of it (and I was once tasked with exploring a similar use
>> case). I'm curious how you'd like it to work? (No idea how Hive does this
>> either.)
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>>
>>
>> On Sat, Dec 19, 2020 at 2:38 AM john washington <
>> allpurpose95...@gmail.com> wrote:
>>
>>> Dear Spark team members,
>>>
>>> Can you please advise whether column-level encryption is available in
>>> Spark SQL?
>>> I am aware that Hive supports column-level encryption.
>>>
>>> Appreciate your response.
>>>
>>> Thanks,
>>> John
>>>
>>
