The problem is mostly with libhdfs, as documented in HADOOP-12953.

On a kerberized setup the service principal gets picked up. There are
workarounds in the Java HDFS API, but the C-based one in libhdfs has this
issue. Of course caching HDFS will be trickier in Impala as well, but first
this one API in libhdfs needs to be enhanced.
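
For reference, one mechanism the Java API offers is proxy-user
impersonation through UserGroupInformation. Whether this is the exact
workaround that would apply here is an assumption. The class and method
names ProxyUserSketch, proxyFor and listAs below are made up for
illustration, and the sketch assumes the cluster allows the service
principal to impersonate users (hadoop.proxyuser.<service>.hosts and
hadoop.proxyuser.<service>.groups):

```java
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical sketch: run HDFS calls as the end user instead of the
// service principal that the process logged in with.
public class ProxyUserSketch {

    // Build a proxy UGI for endUser, backed by the service login
    // (for example the impala/... principal from the keytab).
    static UserGroupInformation proxyFor(String endUser) throws IOException {
        UserGroupInformation service = UserGroupInformation.getLoginUser();
        return UserGroupInformation.createProxyUser(endUser, service);
    }

    // List a directory while impersonating endUser.
    static FileStatus[] listAs(String endUser, String dir) throws Exception {
        // The cast picks the PrivilegedExceptionAction overload of doAs.
        return proxyFor(endUser).doAs((PrivilegedExceptionAction<FileStatus[]>) () -> {
            // FileSystem.get() inside doAs sees the proxy user as the
            // current user, so a custom FileSystem would too.
            FileSystem fs = FileSystem.get(new Configuration());
            return fs.listStatus(new Path(dir));
        });
    }

    public static void main(String[] args) throws Exception {
        // args[0] = end user name, args[1] = directory to list
        for (FileStatus status : listAs(args[0], args[1])) {
            System.out.println(status.getPath());
        }
    }
}
```

Nothing comparable is shown here for the C side, since that is the gap
HADOOP-12953 describes.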

Also, in general, having database authorization at the file level may not be
a good idea or a clean design; using Sentry and extending its
authorization mechanisms would be cleaner.

-Shant

On Wed, Jan 2, 2019, 12:21 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:

> Thanks for further info. Not sure if our Product Management is OK, at this
> point, with us patching Impala server to get our solution working. Our
> product is supposed to work with already installed servers.
>
> Any plans to address the gap (making requesting_user visible inside
> the catalog server) in a future release?
>
>
>
> On Wed, Jan 2, 2019 at 11:50 AM Bharath Vissapragada <
> bhara...@cloudera.com> wrote:
>
>> I was poking around in the code and it looks like we have most of the code
>> in place
>> <https://github.com/apache/impala/blob/27577dd652554dda5a03016e2d1e3ab66fe6b1f5/common/thrift/CatalogService.thrift#L47>
>>
>> // Common header included in all CatalogService requests.
>> // TODO: The CatalogServiceVersion/protocol version should be part of the header.
>> // This would require changes in BDR and break their compatibility story. We should
>> // coordinate a joint change somewhere down the line.
>> struct TCatalogServiceRequestHeader {
>>   // The effective user who submitted this request.
>>   1: optional string requesting_user
>> }
>>
>> That header is included in all the RPCs. However, it is an optional
>> field and may not be set in a few places (since we don't actually rely on
>> it currently). So you could start by making it a "required" field and see
>> what breaks. HTH.
>>
>> On Wed, Jan 2, 2019 at 11:35 AM Bharath Vissapragada <
>> bhara...@cloudera.com> wrote:
>>
>>> I think we expose it via the UDF effective_user() (the effective user could
>>> be different from the connected user if delegation/doAs is enabled). You
>>> can run a query like "select effective_user()" in a session.
>>>
>>> You can also look it up in the /sessions page on the coordinator web UI
>>> (<coordinator>:25000/sessions?json) and you can get a json formatted string
>>> containing the connected and delegate user for each session.
>>>
>>> If you want it on the Catalog side, you probably have to plumb it
>>> through the RPC calls (change the thrift spec and pass it along from the
>>> coordinator session handling code to the Catalog RPC code).
>>>
>>> On Wed, Jan 2, 2019 at 11:19 AM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>
>>>> Is there any Impala/Sentry specific API we can use inside our code to
>>>> figure out who current user is?
>>>>
>>>> On Wed, Jan 2, 2019 at 11:12 AM Bharath Vissapragada <
>>>> bhara...@cloudera.com> wrote:
>>>>
>>>>> Yes. I think Jeszy is right. Per my understanding too, we don't
>>>>> impersonate the client user on the Catalog server. Instead, we enforce the
>>>>> authorization via Sentry during query planning.
>>>>>
>>>>> On Wed, Jan 2, 2019 at 7:06 AM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>
>>>>>> IMPALA-2177 sounds like the correct issue.
>>>>>> Here are log messages from authentication.cc for impalad and catalogd
>>>>>> respectively:
>>>>>>
>>>>>> I0102 14:15:06.722666 28195 authentication.cc:478] Successfully
>>>>>>> authenticated client user *"ad...@example.com <ad...@example.com>"*
>>>>>>> I0102 03:40:07.972348 27948 authentication.cc:445] Successfully
>>>>>>> authenticated principal *"impala/cdh-...@example.com
>>>>>>> <cdh-...@example.com>"* on an internal connection
>>>>>>
>>>>>>
>>>>>> As you can see from the messages above, impalad is able to identify
>>>>>> the currently connected user correctly. However catalogd always
>>>>>> authenticates as impala which causes the problem.
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 2, 2019 at 4:19 AM Jeszy <jes...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> If I understand your question correctly, this is a limitation.
>>>>>>> IMPALA-2177 looks to be the appropriate jira.
>>>>>>> Most users use Impala together with Sentry, where the recommended
>>>>>>> approach is to disable impersonation (even in services that allow it,
>>>>>>> like Hive).
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> On Wed, 2 Jan 2019 at 05:55, Bharath Vissapragada <
>>>>>>> bhara...@cloudera.com> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Can you add the stack trace here if possible? It is not super
>>>>>>> > clear where exactly the problem is.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Bharath
>>>>>>> >
>>>>>>> > On Tue, Jan 1, 2019 at 6:34 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> We have our own implementation of Hadoop FileSystem which relies
>>>>>>> >> on the current user in a kerberized environment to locate
>>>>>>> >> user-specific files in HDFS. This custom file system works fine
>>>>>>> >> inside Hive to create external tables and query them. However,
>>>>>>> >> trying to access the same tables via Impala (JDBC driver) fails.
>>>>>>> >> Watching the log messages, it seems that when impalad sends requests
>>>>>>> >> to catalogd to get the metadata of a given table, the current user
>>>>>>> >> returned by UserGroupInformation is the service account running the
>>>>>>> >> server (impala/hostn...@example.com) instead of the currently
>>>>>>> >> connected user.
>>>>>>> >>
>>>>>>> >> Is this a known issue or limitation of Impala?
>>>>>>>
>>>>>>
