Thanks for the link. So the final answer is that even if the libhdfs bug
gets fixed there won't be any changes to Impala to expose requesting_user
in Catalog Service, right?

On Thu, Jan 3, 2019 at 9:46 AM Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> >  catalog server ignores file system authorization model
> The catalog daemon does this by design - the idea is that the catalog
> server can load and cache metadata on behalf of multiple users. It requires
> that the catalogd user (usually "impala") has permissions to read
> filesystem metadata.
>
> The "user account requirements" section in our docs explains this:
> https://impala.apache.org/docs/build/html/topics/impala_prereqs.html#prereqs
> and
> https://impala.apache.org/docs/build/html/topics/impala_security_files.html
>
> On Wed, Jan 2, 2019 at 5:52 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>
>> It's more about enforcing Hadoop file system authorisation. All we have
>> done is implement a custom Hadoop file system
>> (org.apache.hadoop.fs.FileSystem), and we are now trying to use Impala to
>> query files hosted on that file system. It fails because the catalog
>> server ignores the file system authorization model. The same file system
>> works nicely with HDFS commands (e.g. hdfs dfs -ls ...) as well as
>> HiveServer.
>>
>> Our clients expect us to enforce authorization at all levels (HDFS,
>> Accumulo, Hive, Impala, and so on).
>>
>> On Wed, Jan 2, 2019 at 4:56 PM Tim Armstrong <tarmstr...@cloudera.com>
>> wrote:
>>
>>> Stepping back for a second, doesn't what you're trying to do assume that
>>> each user will load metadata for each table separately? The whole point of
>>> the catalog server is that we load the metadata once and then share it
>>> between queries and users.
>>>
>>> I don't think we want to have the catalog server load different versions
>>> of a table depending on which user initially loaded the table. That would
>>> cause all sorts of issues.
>>>
>>> On Wed, Jan 2, 2019 at 12:36 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>
>>>> I see. I was wondering how it works inside HiveServer. Basically this
>>>> is an HDFS C API issue. Thanks for the elaborate explanation.
>>>>
>>>> On Wed, Jan 2, 2019 at 12:27 PM Shant Hovsepian <sh...@arcadiadata.com>
>>>> wrote:
>>>>
>>>>> The problem is mostly with libhdfs, as documented in HADOOP-12953.
>>>>>
>>>>> On a kerberized setup the service principal gets picked up. There are
>>>>> workarounds in the Java HDFS API, but the C-based one in libhdfs has
>>>>> this issue. Of course caching HDFS will be trickier in Impala as well,
>>>>> but first this one API in libhdfs needs to be enhanced.
>>>>>
>>>>> Also, in general, having database authorization at the file level may
>>>>> not be a good idea or a clean design; using Sentry and extending its
>>>>> authorization mechanisms would be cleaner.
>>>>>
>>>>> -Shant
>>>>>
>>>>> On Wed, Jan 2, 2019, 12:21 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the further info. Not sure if our Product Management is OK,
>>>>>> at this point, with us patching the Impala server to get our solution
>>>>>> working. Our product is supposed to work with already installed
>>>>>> servers.
>>>>>>
>>>>>> Any plans to address the gap (making requesting_user visible inside
>>>>>> the catalog server) in a future release?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 2, 2019 at 11:50 AM Bharath Vissapragada <
>>>>>> bhara...@cloudera.com> wrote:
>>>>>>
>>>>>>> I was poking around in the code and it looks like we have most of
>>>>>>> the code in place
>>>>>>> <https://github.com/apache/impala/blob/27577dd652554dda5a03016e2d1e3ab66fe6b1f5/common/thrift/CatalogService.thrift#L47>
>>>>>>>
>>>>>>> // Common header included in all CatalogService requests.
>>>>>>> // TODO: The CatalogServiceVersion/protocol version should be part of
>>>>>>> // the header. This would require changes in BDR and break their
>>>>>>> // compatibility story. We should coordinate a joint change somewhere
>>>>>>> // down the line.
>>>>>>> struct TCatalogServiceRequestHeader {
>>>>>>>   // The effective user who submitted this request.
>>>>>>>   1: optional string requesting_user
>>>>>>> }
>>>>>>>
>>>>>>> That header is included in all the RPCs. However, requesting_user is
>>>>>>> an optional field and may not be set in a few places (since we don't
>>>>>>> actually rely on it currently). So you could start by making it a
>>>>>>> "required" field and see what breaks. HTH.
>>>>>>>
>>>>>>> On Wed, Jan 2, 2019 at 11:35 AM Bharath Vissapragada <
>>>>>>> bhara...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I think we expose it via the UDF effective_user() (the effective user
>>>>>>>> could be different from the connected user if delegation/doas is
>>>>>>>> enabled). You can run a query like "select effective_user()" in a
>>>>>>>> session.
>>>>>>>>
>>>>>>>> You can also look it up in the /sessions page on the coordinator
>>>>>>>> web UI (<coordinator>:25000/sessions?json), which returns a
>>>>>>>> JSON-formatted string containing the connected and delegate user for
>>>>>>>> each session.
>>>>>>>>
>>>>>>>> If you want it on the Catalog side, you probably have to plumb it
>>>>>>>> through the RPC calls (change the thrift spec and pass it along from
>>>>>>>> the coordinator session handling code to the Catalog RPC code).
>>>>>>>>
>>>>>>>> On Wed, Jan 2, 2019 at 11:19 AM mhd wrk <mhdwrkoff...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is there any Impala/Sentry specific API we can use inside our code
>>>>>>>>> to figure out who the current user is?
>>>>>>>>>
>>>>>>>>> On Wed, Jan 2, 2019 at 11:12 AM Bharath Vissapragada <
>>>>>>>>> bhara...@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes. I think Jeszy is right. Per my understanding too, we don't
>>>>>>>>>> impersonate the client user on the Catalog server. Instead, we
>>>>>>>>>> enforce the authorization via Sentry during query planning.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 2, 2019 at 7:06 AM mhd wrk <mhdwrkoff...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> IMPALA-2177 sounds like the correct issue.
>>>>>>>>>>> Here are log messages from authentication.cc for impalad and
>>>>>>>>>>> catalogd respectively:
>>>>>>>>>>>
>>>>>>>>>>> I0102 14:15:06.722666 28195 authentication.cc:478] Successfully authenticated client user "ad...@example.com"
>>>>>>>>>>> I0102 03:40:07.972348 27948 authentication.cc:445] Successfully authenticated principal "impala/cdh-...@example.com" on an internal connection
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As you can see from the messages above, impalad is able to
>>>>>>>>>>> identify the currently connected user correctly. However, catalogd
>>>>>>>>>>> always authenticates as impala, which causes the problem.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 2, 2019 at 4:19 AM Jeszy <jes...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey,
>>>>>>>>>>>>
>>>>>>>>>>>> If I understand your question correctly, this is a limitation.
>>>>>>>>>>>> IMPALA-2177 looks to be the appropriate JIRA.
>>>>>>>>>>>> Most users use Impala together with Sentry, where the recommended
>>>>>>>>>>>> approach is to disable impersonation (even in services that allow
>>>>>>>>>>>> it, like Hive).
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 2 Jan 2019 at 05:55, Bharath Vissapragada <
>>>>>>>>>>>> bhara...@cloudera.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Can you add the stack trace here if possible? It is not super
>>>>>>>>>>>> > clear where exactly the problem is.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> > Bharath
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Tue, Jan 1, 2019 at 6:34 PM mhd wrk <mhdwrkoff...@gmail.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> We have our own implementation of the Hadoop FileSystem, which
>>>>>>>>>>>> >> relies on the current user in a Kerberized environment to locate
>>>>>>>>>>>> >> user-specific files in HDFS. This custom file system works fine
>>>>>>>>>>>> >> inside Hive to create external tables and query them. However,
>>>>>>>>>>>> >> trying to access the same tables via Impala (JDBC driver) fails.
>>>>>>>>>>>> >> Judging from the log messages, it seems that when impalad sends
>>>>>>>>>>>> >> requests to catalogd to get the metadata of a given table, the
>>>>>>>>>>>> >> current user returned by UserGroupInformation is the service
>>>>>>>>>>>> >> account running the server (impala/hostn...@example.com) instead
>>>>>>>>>>>> >> of the currently connected user.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Is this a known issue or limitation of Impala?
>>>>>>>>>>>>
>>>>>>>>>>>
