Hello,

Thanks Madhan and Bosco for your answers.

I am using HDP 2.3 and installed Ranger from Ambari. I suppose Ambari does
run enable-hive-plugin, as Ranger works correctly with Hive when I use
Hive through HiveServer2. It is only when I try to use it from Spark
(using SparkSQL) that it does not work.

SparkSQL does not use HiveServer2, but it does not use the Hive CLI either
(at least not directly). The Hive engine is not used at all. SparkSQL is a
standalone SQL engine that is part of Spark; it reads Hive tables directly
from where they are stored, using metadata it gets from HCatalog. At least
that is my understanding.
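
Concretely, I access the tables with something along these lines (a
simplified sketch; the database and table names are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("sparksql-hive-test"))
// HiveContext picks up hive-site.xml from the classpath, asks the Hive
// metastore for the table metadata, then reads the underlying files
// directly; HiveServer2 and the Hive execution engine are not involved.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SELECT count(*) FROM some_db.some_table")
  .collect().foreach(println)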

Until recently, SparkSQL was ignoring Ranger, just like the Hive CLI, and
it was working (I could access Hive data from Spark on a cluster with
Ranger up, but of course Ranger rules were ignored). But since a recent
update, SparkSQL now clearly does interact with Ranger, as I get Ranger
exceptions when I use SparkSQL. I think that it reads the value of
hive.security.authorization.manager (which on my system is a Ranger class)
and instantiates this class in order to enforce the security rules defined
by that class. I am no expert in Spark internals or Ranger, so these are
just assumptions.
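
One way to check this assumption from Spark itself (a sketch, reusing the
hiveContext from the snippet above):

// Should print the Ranger authorizer factory class if Spark has picked up
// my hive-site.xml.
hiveContext.sql("SET hive.security.authorization.manager")
  .collect().foreach(println)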

I have solved multiple classpath (Ranger jar not found) and configuration
file (xa-secure.xml ?) issues in order to reach the point where I am now. I
no longer get missing-class or missing-file exceptions, but it still does
not work, and I get the issue described in my previous mail (see below).
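
For the record, the kind of workaround I used looks roughly like this (the
jar names and paths are only illustrative, not the exact ones from my
cluster):

# Put the Ranger plugin jars and the ranger-*.xml configuration files on
# the classpath of the Spark driver and executors.
spark-submit \
  --jars /usr/hdp/current/ranger-hive-plugin/lib/ranger-hive-plugin.jar,/usr/hdp/current/ranger-hive-plugin/lib/ranger-plugins-common.jar \
  --files /etc/hive/conf/ranger-hive-security.xml,/etc/hive/conf/ranger-hive-audit.xml \
  --driver-class-path /etc/hive/conf \
  my-sparksql-job.jar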

I will try to continue my investigations. If I make progress I will post it
here. But any additional help would be appreciated.

Best regards,

Julien


2016-01-18 22:24 GMT+01:00 Don Bosco Durai <[email protected]>:

> Ideally, Ranger shouldn’t be in play when HiveCLI is used. If I am not
> wrong, Spark is using the HiveCLI API.
>
> To avoid this issue, I thought we only update hiveserver2.properties.
> Julien, I assume you are using the standard enable plugin scripts.
>
> Thanks
>
> Bosco
>
>
> From: Madhan Neethiraj <[email protected]> on behalf of Madhan
> Neethiraj <[email protected]>
> Reply-To: <[email protected]>
> Date: Monday, January 18, 2016 at 9:54 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Spark + Hive + Ranger
>
> Julien,
>
> Ranger Hive plugin requires additional configuration, like the location of
> Ranger Admin, the name of the service containing policies for Hive, etc.
> Such configurations (in files named ranger-*.xml) are created when the
> enable-hive-plugin.sh script is run with appropriate values in
> install.properties. This script also updates hive-site.xml with necessary
> changes – like registering Ranger as the authorizer in
> hive.security.authorization.manager. If you haven’t installed the plugin
> using enable-hive-plugin.sh, please do so and let us know the result.
>
> Hope this helps.
>
> Madhan
>
>
> From: Julien Carme <[email protected]>
> Reply-To: "[email protected]" <
> [email protected]>
> Date: Monday, January 18, 2016 at 9:27 AM
> To: "[email protected]" <[email protected]>
> Subject: Spark + Hive + Ranger
>
> Hello,
>
> I am trying to access Hive from Spark on a Hadoop cluster where I use
> Ranger to control Hive access.
>
> As Ranger is installed, I have set up Hive accordingly:
>
> hive.security.authorization.manager=
> org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
>
> When I run Spark and ask it to access a Hive table, it uses this class but
> I get several errors:
>
> 16/01/18 17:51:50 INFO provider.AuditProviderFactory: No v3 audit
> configuration found. Trying v2 audit configurations
> 16/01/18 17:51:50 ERROR util.PolicyRefresher:
> PolicyRefresher(serviceName=null): failed to refresh policies. Will
> continue to use last known version of policies (-1)
> com.sun.jersey.api.client.ClientHandlerException:
> java.lang.IllegalArgumentException: URI is not absolute
>         at
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>         at com.sun.jersey.api.client.Client.handle(Client.java:648)
>         at
> com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>         at
> com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>         at
> com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
>         at
> org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:71)
>         at
> org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:205)
>
>
>
> --
>
> And then (though it is not at all clear that the two errors are connected):
>
> 16/01/18 17:51:50 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
> mode
> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
> filterListCmdObjects: Internal error: null RangerAccessResult object
> received back from isAccessAllowed()!
> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
> filterListCmdObjects: Internal error: null RangerAccessResult object
> received back from isAccessAllowed()!
> 16/01/18 17:51:50 ERROR authorizer.RangerHiveAuthorizer:
> filterListCmdObjects: Internal error: null RangerAccessResult object
> received back from isAccessAllowed()!
> --
>
> And then the access to Hive tables fails.
>
> I am not sure where to go from there. Any help would be appreciated.
>
> Best Regards,
>
> Julien
>
>
>
>
