The problem is mostly with libhdfs, as documented in HADOOP-12953: on a kerberized setup, the service principal gets picked up. There are workarounds in the Java HDFS API, but the C-based API in libhdfs has this issue. Of course, caching HDFS will be trickier in Impala as well, but first this one API in libhdfs needs to be enhanced.
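Here is a rough illustration of what the Java-side workaround buys. In Hadoop's Java API, a service can run code on behalf of another user with UserGroupInformation.createProxyUser(...) and doAs(...). Anything that asks for the current user outside such a call gets the service's own login identity. The sketch below models only that behavior, using the JDK alone. IdentityContext is a hypothetical class, a ThreadLocal stands in for the JAAS Subject that Hadoop actually consults, and "impala" is a placeholder login user. It is not the Hadoop or libhdfs API.

```java
import java.util.function.Supplier;

/**
 * Hypothetical, stdlib-only model of doAs-style impersonation.
 * Real Hadoop code would use UserGroupInformation.createProxyUser(...) and doAs(...);
 * here a ThreadLocal stands in for the JAAS Subject that Hadoop consults.
 */
final class IdentityContext {
    // Placeholder for the identity the daemon logged in with from its keytab.
    static final String LOGIN_USER = "impala";

    // Effective user for the current thread; set only while inside doAs().
    private static final ThreadLocal<String> EFFECTIVE_USER = new ThreadLocal<>();

    private IdentityContext() {}

    /** Modeled on UGI.getCurrentUser(): the doAs user if one is active, else the login user. */
    static String currentUser() {
        String user = EFFECTIVE_USER.get();
        return user != null ? user : LOGIN_USER;
    }

    /** Runs action with user as the effective identity, then restores the previous identity. */
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = EFFECTIVE_USER.get();
        EFFECTIVE_USER.set(user);
        try {
            return action.get();
        } finally {
            if (previous == null) {
                EFFECTIVE_USER.remove();
            } else {
                EFFECTIVE_USER.set(previous);
            }
        }
    }
}
```

In this model, any code path that never enters doAs (for example, a metadata load that nobody wraps on the end user's behalf) reports the login user. That matches the symptom described in this thread.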
Also, in general, having database authorization at the file level may not be a good idea or a clean design; using Sentry and extending its authorization mechanisms would be cleaner.

-Shant

On Wed, Jan 2, 2019, 12:21 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:

> Thanks for the further info. Not sure if our Product Management is OK, at this
> point, with us patching the Impala server to get our solution working. Our
> product is supposed to work with already installed servers.
>
> Any plans to address the gap (making requesting_user visible inside the
> catalog server) in a future release?
>
> On Wed, Jan 2, 2019 at 11:50 AM Bharath Vissapragada <bhara...@cloudera.com> wrote:
>
>> I was poking around in the code and it looks like we have most of the code
>> in place
>> <https://github.com/apache/impala/blob/27577dd652554dda5a03016e2d1e3ab66fe6b1f5/common/thrift/CatalogService.thrift#L47>
>>
>> // Common header included in all CatalogService requests.
>> // TODO: The CatalogServiceVersion/protocol version should be part of the header.
>> // This would require changes in BDR and break their compatibility story. We should
>> // coordinate a joint change somewhere down the line.
>> struct TCatalogServiceRequestHeader {
>>   // The effective user who submitted this request.
>>   1: optional string requesting_user
>> }
>>
>> That header is included in all the RPCs. However, it is an optional
>> field and may not be set in a few places (since we don't actually rely on it
>> currently). So you could start by making it a "required" field and see
>> what all breaks. HTH.
>>
>> On Wed, Jan 2, 2019 at 11:35 AM Bharath Vissapragada <bhara...@cloudera.com> wrote:
>>
>>> I think we expose it via the UDF effective_user() (the effective user could be
>>> different from the connected user if delegation/doAs is enabled). You can run a
>>> query like "select effective_user()" in a session.
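To make the optional-versus-required point concrete, here is a small Java sketch of how catalog-side code might consume the header. CatalogRequestHeader is a hypothetical hand-written stand-in for the Thrift-generated TCatalogServiceRequestHeader, and lenient/strict are hypothetical helpers. The lenient variant shows how an unset optional field quietly turns into the service identity. The strict variant approximates the "make it required and see what breaks" experiment by failing fast.

```java
import java.util.Optional;

/** Hypothetical stand-in for the Thrift struct TCatalogServiceRequestHeader. */
final class CatalogRequestHeader {
    // null models an unset optional Thrift field.
    private final String requestingUser;

    CatalogRequestHeader(String requestingUser) {
        this.requestingUser = requestingUser;
    }

    /** Returns the requesting user, treating null and empty as "not set". */
    Optional<String> requestingUser() {
        return Optional.ofNullable(requestingUser).filter(u -> !u.isEmpty());
    }
}

/** Hypothetical catalog-side helpers contrasting lenient and strict handling of the header. */
final class RequestingUserResolver {
    private RequestingUserResolver() {}

    /** Lenient: an unset requesting_user silently falls back to the service identity. */
    static String lenient(CatalogRequestHeader header, String serviceUser) {
        return header.requestingUser().orElse(serviceUser);
    }

    /** Strict: treats the field as required, so call sites that never set it fail loudly. */
    static String strict(CatalogRequestHeader header) {
        return header.requestingUser().orElseThrow(
                () -> new IllegalStateException("requesting_user not set in catalog request header"));
    }
}
```

Running existing call paths through something like strict() would list the RPCs that never populate the field, which is roughly the information a "required" Thrift field would surface.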
>>>
>>> You can also look it up on the /sessions page of the coordinator web UI
>>> (<coordinator>:25000/sessions?json), which gives you a JSON-formatted string
>>> containing the connected and delegated user for each session.
>>>
>>> If you want it on the Catalog side, you probably have to plumb it
>>> through the RPC calls (change the Thrift spec and pass it along from the
>>> coordinator session-handling code to the Catalog RPC code).
>>>
>>> On Wed, Jan 2, 2019 at 11:19 AM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>
>>>> Is there any Impala/Sentry-specific API we can use inside our code to
>>>> figure out who the current user is?
>>>>
>>>> On Wed, Jan 2, 2019 at 11:12 AM Bharath Vissapragada <bhara...@cloudera.com> wrote:
>>>>
>>>>> Yes. I think Jeszy is right. Per my understanding too, we don't
>>>>> impersonate the client user on the Catalog server. Instead, we enforce
>>>>> authorization via Sentry during query planning.
>>>>>
>>>>> On Wed, Jan 2, 2019 at 7:06 AM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>
>>>>>> IMPALA-2177 sounds like the correct issue.
>>>>>> Here are the log messages from authentication.cc for impalad and catalogd,
>>>>>> respectively:
>>>>>>
>>>>>>> I0102 14:15:06.722666 28195 authentication.cc:478] Successfully authenticated client user *"ad...@example.com"*
>>>>>>> I0102 03:40:07.972348 27948 authentication.cc:445] Successfully authenticated principal *"impala/cdh-...@example.com"* on an internal connection
>>>>>>
>>>>>> As you can see from the messages above, impalad is able to identify
>>>>>> the currently connected user correctly. However, catalogd always
>>>>>> authenticates as impala, which causes the problem.
>>>>>>
>>>>>> On Wed, Jan 2, 2019 at 4:19 AM Jeszy <jes...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> IIUC, this is a limitation. IMPALA-2177 looks
>>>>>>> to be the appropriate jira.
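The plumbing Bharath describes has a coordinator half: work out the session's effective user and copy it into the header before each catalog RPC. The following is a rough Java sketch. It assumes, following the thread, that the effective user is the delegated user when delegation/doAs is in use and the connected user otherwise. SessionInfo, OutgoingCatalogHeader, and headerFor are hypothetical names, not Impala's actual classes.

```java
/** Hypothetical model of a coordinator session, as listed on the /sessions page. */
final class SessionInfo {
    final String connectedUser;
    // null or empty when delegation/doAs is not in use.
    final String delegatedUser;

    SessionInfo(String connectedUser, String delegatedUser) {
        this.connectedUser = connectedUser;
        this.delegatedUser = delegatedUser;
    }

    /** Effective user: the delegated user if present, otherwise the connected user. */
    String effectiveUser() {
        return (delegatedUser == null || delegatedUser.isEmpty()) ? connectedUser : delegatedUser;
    }
}

/** Hypothetical outgoing header mirroring the requesting_user field of the Thrift struct. */
final class OutgoingCatalogHeader {
    final String requestingUser;

    OutgoingCatalogHeader(String requestingUser) {
        this.requestingUser = requestingUser;
    }
}

/** Hypothetical coordinator-side helper that fills the header from the session. */
final class CatalogHeaderPlumbing {
    private CatalogHeaderPlumbing() {}

    /** Builds the header the coordinator would attach to each catalog RPC for this session. */
    static OutgoingCatalogHeader headerFor(SessionInfo session) {
        return new OutgoingCatalogHeader(session.effectiveUser());
    }
}
```

The value has to be captured on the coordinator, where the session lives. Per the log messages in this thread, catalogd only sees impalad's internal connection authenticated as the impala principal.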
>>>>>>>
>>>>>>> Most users use Impala together with Sentry, where the recommended
>>>>>>> approach is to disable impersonation (even in services that allow it,
>>>>>>> like Hive).
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> On Wed, 2 Jan 2019 at 05:55, Bharath Vissapragada <bhara...@cloudera.com> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Can you add the stack trace here if possible? It is not super
>>>>>>> > clear where exactly the problem is.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Bharath
>>>>>>> >
>>>>>>> > On Tue, Jan 1, 2019 at 6:34 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> We have our own implementation of Hadoop FileSystem, which relies
>>>>>>> >> on the current user in a kerberized environment to locate user-specific
>>>>>>> >> files in HDFS. This custom file system works fine inside Hive to create
>>>>>>> >> external tables and query them. However, trying to access the same tables
>>>>>>> >> via Impala (JDBC driver) fails. Watching the log messages, it seems that
>>>>>>> >> when impalad sends requests to catalogd to get the metadata of a given
>>>>>>> >> table, the current user returned by UserGroupInformation is the service
>>>>>>> >> account running the server (impala/hostn...@example.com) instead of the
>>>>>>> >> currently connected user.
>>>>>>> >>
>>>>>>> >> Is this a known issue or limitation of Impala?
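The original symptom can be illustrated with a hypothetical path helper of the kind a user-aware FileSystem might use. UserPathResolver, the /user/<name> layout, and the sample principals are assumptions for illustration. Real Hadoop maps principals to short names with the configurable auth_to_local rules (KerberosName); this sketch simply strips the /host and @REALM parts.

```java
/**
 * Hypothetical helper: derives a per-user location from a Kerberos principal.
 * Simplification: real Hadoop applies auth_to_local rules; this only strips
 * the "/host" instance part and the "@REALM" part.
 */
final class UserPathResolver {
    private UserPathResolver() {}

    /** "alice@EXAMPLE.COM" -> "alice"; "impala/host1.example.com@EXAMPLE.COM" -> "impala". */
    static String shortName(String principal) {
        int end = principal.length();
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = at;
        }
        int slash = principal.indexOf('/');
        if (slash >= 0 && slash < end) {
            end = slash;
        }
        return principal.substring(0, end);
    }

    /** Builds a per-user location such as /user/alice/<relative> (layout is hypothetical). */
    static String userPath(String principal, String relative) {
        return "/user/" + shortName(principal) + "/" + relative;
    }
}
```

If the helper receives the service principal that UserGroupInformation reports inside catalogd, it resolves to /user/impala/... rather than the connected user's directory. That would be consistent with the tables working in Hive but failing through Impala.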