Re: Why SQL Standards Based Authorization is implemented on HiveServer2 side only?

2015-05-06 Thread Sergey Tryuber
Thanks, Alan

Of course, I do not suggest moving SSBA completely to the Metastore. As you
pointed out, it depends on the query processing engine in some (but not all)
cases. I was thinking about moving as much functionality as can be moved
and, in the end, having two different SSBA implementations covering both
the Metastore and HS2.

Re HDFS files - yes, the group has to be defined by HDFS admins, and that is
very simple. Again, by having SSBA in both the Metastore and HS2 we avoid
the need for tiresome Storage Based Authorization in secured production
deployments. Just having a group of trusted system users for the different
frameworks is enough.
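
To illustrate the group idea, here is a minimal Python sketch of a
simplified owner/group/other permission check. It assumes plain POSIX-style
mode bits (no extended ACLs, no superuser), and the group name
`hive-trusted`, the path owner, and the user names are all hypothetical.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class PathStatus:
    """Owner, group and mode bits of a file or directory (simplified)."""
    owner: str
    group: str
    mode: int  # e.g. 0o770


def can_access(user: str, user_groups: set, status: PathStatus, wanted: int) -> bool:
    """Return True if `user` holds all `wanted` bits on the path.

    Simplified model: exactly one permission class applies, checked in the
    order owner, group, other. Extended ACLs and superusers are ignored.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


# Hypothetical warehouse directory: only the owner and the trusted group get in.
warehouse = PathStatus(owner="hive", group="hive-trusted", mode=0o770)

# Hypothetical group membership, as an HDFS admin might define it.
groups = {
    "hive": {"hive-trusted"},
    "impala": {"hive-trusted"},
    "spark-sql": {"hive-trusted"},
    "alice": {"analysts"},
}

for user in ("hive", "impala", "alice"):
    allowed = can_access(user, groups[user], warehouse, READ | WRITE)
    print(f"{user}: {'allowed' if allowed else 'denied'}")
```

In this model an end user such as `alice` never touches the files directly;
she would only reach the data through a framework running as one of the
trusted system users.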

Re backward compatibility and having only one place for granting access -
that's only an additional option. So let's keep it out of the discussion
for now.

Most probably I'll have to examine the source code to strengthen my points.
I'll do it in the second part of May.




Re: Why SQL Standards Based Authorization is implemented on HiveServer2 side only?

2015-05-05 Thread Alan Gates
The issue is that security checks are done in the client.  In order to 
fully do them in the metastore we would have been forced to move 
significant amounts of functionality out of the client and into the 
metastore.  Query parsing and planning would have had to be moved to the 
metastore, basically making it HS2.  A few more comments inline:

Sergey Tryuber stryu...@gmail.com
May 4, 2015 at 12:56
Hi Guys,

My understanding is that there are two safe ways to use SQL Standards
Based Authorization (SSBA):

1. Hide the Hive Metastore from the world by embedding it into HiveServer2.
The *MetaStoreAuthzAPIAuthorizerEmbedOnly* configuration for the Metastore
is only half-protection, since everyone can still change table-specific
metadata.
2. Have two Metastores, with the public one additionally protected by
Storage Based Authorization.

Option #2 is in much greater demand, since many frameworks in the Hadoop
ecosystem use the Hive Metastore. But the need to keep both SQL and HDFS
ACLs in sync is an administration nightmare (especially taking into account
that the doAs option is false in SSBA mode).

*Why isn't it possible to add an SSBA-like authorizer to the Hive Metastore
as well?* The authorizer could check whether a user has permission to update
table-specific metadata according to their role and username. I could even
imagine the following layout:

1. All the files in Hive tables can be accessed only by a few system users
(hive, spark-sql, impala, etc.)

How would you accomplish this? Hive's files are stored in HDFS and thus
must work with HDFS file permissions. You could construct a group that
contained those users and make the files accessible to that group, but
each cluster admin would have to do that.

2. There is only a single place for granting permissions - through SQL
standards - and all SQL-like frameworks around the metastore should use it

We can't break backwards compatibility, so we could make this an option
but we couldn't enforce it.

3. Additional HDFS permissions configuration would be needed only for
rare cases of data access from non-impersonated execution pipelines (Spark
Core, etc.)
4. No need to embed the metastore into HiveServer2, no strange
configuration options, easier to understand and document

Maybe I've missed something in my understanding... If so, please point out
my mistake.



Why SQL Standards Based Authorization is implemented on HiveServer2 side only?

2015-05-04 Thread Sergey Tryuber
Hi Guys,

My understanding is that there are two safe ways to use SQL Standards
Based Authorization (SSBA):

   1. Hide the Hive Metastore from the world by embedding it into HiveServer2.
   The *MetaStoreAuthzAPIAuthorizerEmbedOnly* configuration for the Metastore
   is only half-protection, since everyone can still change table-specific
   metadata.
   2. Have two Metastores, with the public one additionally protected by
   Storage Based Authorization.

Option #2 is in much greater demand, since many frameworks in the Hadoop
ecosystem use the Hive Metastore. But the need to keep both SQL and HDFS
ACLs in sync is an administration nightmare (especially taking into account
that the doAs option is false in SSBA mode).

*Why isn't it possible to add an SSBA-like authorizer to the Hive Metastore
as well?* The authorizer could check whether a user has permission to update
table-specific metadata according to their role and username. I could even
imagine the following layout:

   1. All the files in Hive tables can be accessed only by a few system users
   (hive, spark-sql, impala, etc.)
   2. There is only a single place for granting permissions - through SQL
   standards - and all SQL-like frameworks around the metastore should use it
   3. Additional HDFS permissions configuration would be needed only for
   rare cases of data access from non-impersonated execution pipelines (Spark
   Core, etc.)
   4. No need to embed the metastore into HiveServer2, no strange
   configuration options, easier to understand and document
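
The metastore-side check described above could look roughly like the
following Python sketch. It is not Hive code: the class names, the operation
names, and the privilege names `READ_METADATA` and `WRITE_METADATA` are all
hypothetical and chosen only for illustration. Users and roles share one
principal namespace here, which is a simplification.

```python
from collections import defaultdict


class AuthorizationError(Exception):
    """Raised when a user lacks the privilege an operation needs."""


class MetastoreAuthorizer:
    """Toy role-and-username based authorizer for metadata operations."""

    # Hypothetical mapping from metastore operation to required privilege.
    REQUIRED = {
        "get_table": "READ_METADATA",
        "alter_table": "WRITE_METADATA",
        "drop_table": "WRITE_METADATA",
    }

    def __init__(self):
        self._roles = defaultdict(set)   # user -> role names
        self._grants = defaultdict(set)  # (principal, table) -> privileges

    def add_role(self, user, role):
        self._roles[user].add(role)

    def grant(self, principal, table, privilege):
        self._grants[(principal, table)].add(privilege)

    def authorize(self, user, operation, table):
        """Raise AuthorizationError unless the user or one of their roles
        holds the privilege required for `operation` on `table`."""
        needed = self.REQUIRED.get(operation)
        if needed is None:
            # Deny operations the authorizer does not know about.
            raise AuthorizationError(f"unknown operation: {operation}")
        principals = {user} | self._roles.get(user, set())
        for principal in principals:
            if needed in self._grants.get((principal, table), ()):
                return
        raise AuthorizationError(
            f"{user} lacks {needed} on {table} for {operation}"
        )


authz = MetastoreAuthorizer()
authz.add_role("alice", "analyst")
authz.grant("analyst", "sales.orders", "READ_METADATA")
authz.grant("bob", "sales.orders", "WRITE_METADATA")

authz.authorize("alice", "get_table", "sales.orders")  # allowed via role
try:
    authz.authorize("alice", "alter_table", "sales.orders")
except AuthorizationError as err:
    print("denied:", err)
```

A check of this shape only needs the caller's identity, the operation, and
the target table, all of which the metastore already sees on each API call.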

Maybe I've missed something in my understanding... If so, please point out
my mistake.