[CVE-2017-12625] Apache Hive information disclosure vulnerability for column masking

2017-10-31 Thread Jesus Camacho Rodriguez
CVE-2017-12625: Apache Hive information disclosure vulnerability for column masking

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: Hive 2.1.0 to 2.3.0

Description:
Hive exposes an interface through which masking policies can be defined on
tables or views, e.g., using Apache Ranger. When a view is created over a
table, the masking policy is not enforced correctly on that table's masked
columns.

Mitigation:
2.3.0 users should upgrade to 2.3.1.
2.2.0 users should upgrade to 2.3.1, obtain the latest source from git for
branch-2.2, or apply this patch, which will be included from 2.2.1:
https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=0e795debddf261b0ac6ace90e2d774f9a99b7f4b
2.1.x users should upgrade to 2.3.1, obtain the latest source from git for
branch-2.1, or apply this patch, which will be included from 2.1.2:
https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=6db9fd6e43f6eef3c9d1ca8e324b2edaa54fb0d3

To mitigate this vulnerability until Hive is upgraded to a new version, there
are two possible options. Both must be carried out manually in Ranger / Hive.
1) Restrict users from creating views on tables with column masking rules
defined. To do this, in the Ranger Hive policy:
 - Users should not have SELECT permission on table columns with masking
rules defined.
 - Give SELECT permission only on columns without masking rules defined.
2) Review the Hive column masking policies maintained in Ranger for the
tables. Then check in Hive whether views that read those tables have been
defined. If so, either change the view definition so that the masked columns
are not selected, or drop those views.
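
The review in option 2 can be partly scripted. The following is a minimal
sketch in Python, not part of the advisory. It assumes the definition text of
each view has already been exported (for example from the output of
SHOW CREATE TABLE on the view) and that the masked columns per table have been
copied out of Ranger by hand. The function name
views_touching_masked_columns and both input dictionaries are hypothetical.
Matching is a plain word search, so false positives are possible and the
result is only a shortlist for manual review.

```python
import re


def views_touching_masked_columns(view_definitions, masked_columns):
    """Return {view_name: [(table, column), ...]} for views to review.

    view_definitions: dict of view name -> SQL text of the view
                      (hypothetical input, exported beforehand).
    masked_columns:   dict of table name -> set of masked column names
                      (hypothetical input, taken from the Ranger policies).
    """
    flagged = {}
    for view, sql in view_definitions.items():
        text = sql.lower()
        hits = []
        for table, columns in masked_columns.items():
            # Skip views whose text never mentions the table as a whole word.
            if not re.search(r"\b" + re.escape(table.lower()) + r"\b", text):
                continue
            for column in sorted(columns):
                if re.search(r"\b" + re.escape(column.lower()) + r"\b", text):
                    hits.append((table, column))
            # SELECT * (or SELECT t.*) pulls in every column, masked or not.
            if re.search(r"select\s+(?:\w+\.)?\*", text):
                hits.append((table, "*"))
        if hits:
            flagged[view] = hits
    return flagged


if __name__ == "__main__":
    # Illustrative data only; table, column, and view names are made up.
    views = {
        "v_safe": "SELECT name FROM customers",
        "v_leak": "SELECT name, ssn FROM customers",
    }
    print(views_touching_masked_columns(views, {"customers": {"ssn"}}))
```

Each flagged view still has to be inspected by hand before it is changed or
dropped, since a column name can appear in the text without being selected.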

Credit:
This issue was reported by Suja Santhosh of Hortonworks.


If you have any questions, please reach out to us on the Hive dev list.

Regards,

The Apache Hive Team




Re: partitioned hive table

2017-10-31 Thread Furcy Pin
Hi,

If you want to load pre-existing records, instead of inserting data in this
partition, you should use the ADD PARTITION statement
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions

or simply the MSCK REPAIR TABLE statement
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)


Something that new Hive users often miss is that Hive does not automatically
detect external data, and that a table's data and metadata can sometimes get
out of sync.

What happened in your case is that Hive keeps a record count for the
partition in its metadata. When you insert data into the partition, Hive
updates that count on the fly with the number of rows you have inserted
(here: 1).

So when you do a SELECT *, all your json files are read, but when you do a
SELECT COUNT(*), Hive will just fetch that number to respond faster.

By running COMPUTE STATISTICS this number is updated, but the correct way
is to use MSCK REPAIR TABLE to tell Hive to update its partition metadata.
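
If you prefer to register partitions explicitly rather than through MSCK
REPAIR TABLE, the ADD PARTITION statements can be generated from the
directory names. Below is a minimal sketch in Python, assuming the partition
directories follow Hive's key=value layout (as in your S3 paths), that all
partition columns are strings (as in your CREATE TABLE), and that you supply
the list of directories yourself, e.g. from an S3 listing. The function name
add_partition_statements is hypothetical; it only builds the statement text
and does not talk to Hive.

```python
def add_partition_statements(table, table_location, partition_dirs):
    """Build one ALTER TABLE ... ADD PARTITION statement per directory.

    partition_dirs holds paths relative to the table location, written in
    Hive's key=value layout, e.g. "dt=2017-10-28/bar=hello".
    Partition values are quoted as strings (an assumption of this sketch).
    """
    statements = []
    for rel_dir in partition_dirs:
        parts = [p for p in rel_dir.strip("/").split("/") if p]
        spec = []
        for part in parts:
            key, sep, value = part.partition("=")
            if not sep or not key:
                raise ValueError("not a key=value directory: %r" % part)
            # Escape single quotes so the value stays a valid string literal.
            escaped = value.replace("'", "\\'")
            spec.append("%s='%s'" % (key, escaped))
        location = table_location.rstrip("/") + "/" + "/".join(parts)
        statements.append(
            "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) LOCATION '%s';"
            % (table, ", ".join(spec), location)
        )
    return statements


if __name__ == "__main__":
    for stmt in add_partition_statements(
        "table1", "s3://bucket/table1", ["dt=2017-10-28/bar=hello"]
    ):
        print(stmt)
```

Since the directories already use the key=value layout, MSCK REPAIR TABLE
table1 should pick them up in one step; generating explicit statements is
mostly useful when you only want to register a chosen subset of partitions.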

Regards,

Furcy









2017-10-31 1:25 GMT+01:00 Jiewen Shao :

> Thanks Mich,
> ANALYZE TABLE PARTITION(dt='2017-08-20', bar='hello') COMPUTE STATISTICS
> indeed makes count(*) return the correct value (for the partition only).
>
> But my Hive table could not read data from those pre-existing JSON
> files unless I inserted one record into the partition AND ran ANALYZE TABLE
> ... COMPUTE STATISTICS for the partition. I must have missed something.
>
> How can I make those pre-existing JSON files visible in the Hive table?
>
> On Mon, Oct 30, 2017 at 4:53 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> have you analyzed table for the partition?
>>
>> ANALYZE TABLE test_table PARTITION(dt='2017-08-20', bar='hello') COMPUTE
>> STATISTICS;
>>
>> and do count(*) from table
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 30 October 2017 at 22:29, Jiewen Shao  wrote:
>>
>>> Hi, I have persisted lots of JSON files on S3 under partitioned
>>> directories such as /bucket/table1/dt=2017-10-28/bar=hello/*
>>>
>>> 1. Now I created a hive table:
>>> CREATE EXTERNAL TABLE table1 ( )
>>> PARTITIONED BY (dt string, bar string) ROW FORMAT serde
>>> 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION 's3://bucket/table1';
>>>
>>> 2. Under hive commandline
>>> select * from table1;// return nothing
>>>
>>> 3. INSERT INTO TABLE  table1 PARTITION (dt='2017-08-28', bar='hello')
>>> select ;
>>>
>>> 4. now select * from table1;// return all the data from that
>>> partition
>>>
>>> 5. select count(*) from table1;  // returns 1
>>>
>>> Can someone explain what I missed?
>>>
>>> Thanks a lot!
>>>
>>
>>
>