Should we do a JIRA on this? It seems important...

On Wed, Nov 11, 2015 at 5:15 PM, John Omernik <[email protected]> wrote:

> For me it's very strange. If I delete all the .drill.parquet_metadata
> files, I can create and then run a query.  I can wait 5 minutes, come
> back and run the same query, and then I get the permission denied. If I try
> to run the REFRESH METADATA again, then it too fails with permission denied
> until I erase all the files.
>
> What is strange here is the .drill.parquet_metadata file is owned by the
> drillbit user, and has rwxr-xr-x.  Thus, based on those permissions, the
> nondrillbit user STILL should be able to read the file with no issues.
>  (This is not something that your last bullet describes, instead it's
> restricting others from writing, not reading)
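
For reference, the mode-bit reasoning in the paragraph above can be checked
with a minimal Python sketch. It assumes plain POSIX owner/group/other
semantics only; any filesystem-specific access controls or Drill
impersonation logic are not modelled, and applicable_bits is a hypothetical
helper name used purely for illustration.

```python
import stat


def applicable_bits(mode: int, is_owner: bool, in_group: bool) -> str:
    """Return the rwx triple that plain POSIX rules apply to a caller."""
    if is_owner:
        bits = (mode >> 6) & 0o7   # owner triple
    elif in_group:
        bits = (mode >> 3) & 0o7   # group triple
    else:
        bits = mode & 0o7          # "other" triple
    return "".join(
        flag if bits & mask else "-"
        for flag, mask in (("r", 4), ("w", 2), ("x", 1))
    )


# rwxr-xr-x as reported for .drill.parquet_metadata in this thread
mode = 0o755
listing = stat.filemode(stat.S_IFREG | mode)

# A user who is neither the owner nor in the owning group
other_access = applicable_bits(mode, is_owner=False, in_group=False)
print(listing, other_access)
```

Under those assumptions a non-owner, non-group user gets read and execute
but not write, which matches the observation that reading should succeed
while a create or overwrite should not.
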
>
> In addition, when I try to run the query, it appears that the non-drillbit
> user is trying to issue a file create, and per Keys, it's already there
> (and they don't have permissions to write).
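
To illustrate how a create attempt can come back with EEXIST, here is a
small Python sketch run against a local temporary directory. It is only an
assumption that Drill's create behaves like an exclusive create; the thread
does not confirm which call Drill makes, and exclusive_create_status is a
hypothetical helper, not part of Drill or MapR.

```python
import errno
import os
import tempfile


def exclusive_create_status(path: str) -> int:
    """Try an exclusive create; return 0 on success or the errno on failure."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    # Same file name as in the thread, but in a throwaway local directory
    target = os.path.join(tmp, ".drill.parquet_metadata")
    first = exclusive_create_status(target)    # file does not exist yet
    second = exclusive_create_status(target)   # file already exists

print(first, errno.errorcode.get(second))
```

The second attempt fails because the file is already there, which is the
same condition the audit status in this thread points to.
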
>
> There are a number of things that are not happening correctly then based
> on your understanding/description of what's happening
>
> 1. The file that is created is not limited in reading to the drillbit user
> 2. When a query is run, the file is not accessed by the drillbit user,
> it's not even accessed by the authenticated user, instead the authenticated
> user tries to overwrite the file (which makes very little sense to me on a
> select query)
>
> The only thing that is (apparently) happening correctly is the initial
> REFRESH command is creating the files as the drillbit user, however,
> subsequent operations don't seem to be working right... so I am not sure if
> that is a 3rd bullet in the "things that appear broken" list.
>
> Using the Drill Audit logs was very helpful here, if there is anything
> else I can do to help test/troubleshoot this, let me know.
>
>
>
>
> On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez <[email protected]>
> wrote:
>
>> Ok, I'm seeing the behavior you describe except for the last bullet - the
>> permissions on the file would allow for anyone to read the cache file.
>>
>> $ ls -la
>> total 3499
>> drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
>> drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
>> -rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
>> -rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
>> *-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
>>
>> On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <[email protected]> wrote:
>>
>> > John, Vince
>> > I am a little confused by this email thread.
>> > From the original description by John, I thought the issue was that the
>> > refresh metadata command runs successfully (and the cache is created with
>> > the Drillbit user as owner), but at query time it fails for any user (even
>> > though the user has permissions on the directory/dataset).
>> >
>> > Per the latest discussion, it seems like you are hitting permission denied
>> > when running the 'refresh metadata' command itself.
>> >
>> > Just wanted to share what I think the right behavior here is. Feel free to
>> > comment.
>> >
>> > - When the Refresh metadata command is run, the cache files get created with
>> > the drillbit user as the owner (irrespective of who is running the command,
>> > and even if impersonation is turned on)
>> > - When a select query comes in on the table, the corresponding cache file
>> > is always accessed as the drillbit user (irrespective of who is running the
>> > command, and even if impersonation is turned on)
>> > - The cache file created through the refresh metadata command should restrict
>> > access to any users other than the drillbit user (so there is no
>> > leakage of metadata to someone going to the file system and opening the file,
>> > i.e. the cache is for Drill's internal planning purposes and is not meant as a
>> > user-level cache).
>> >
>> > If the above is not happening, it seems like a bug.
>> >
>> > thanks
>> > Neeraja
>> >
>> > On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <[email protected]> wrote:
>> >
>> > > MapR audit records print the errno value to indicate success/failure. Thus
>> > > status 17 means errno 17, which means EEXIST. Looks like Drill is trying to
>> > > create a file that already exists.
>> > >
>> > > I’ll defer to others as to why Drill might do that.
>> > >
>> > > Keys
>> > > _______________________________
>> > > Keys Botzum
>> > > Senior Principal Technologist
>> > > [email protected]
>> > > 443-718-0098
>> > > MapR Technologies
>> > > http://www.mapr.com
>> > >
>> > >
>> > >
>> > > On Nov 11, 2015, at 4:09 PM, John Omernik <[email protected]> wrote:
>> > >
>> > > > I turned on MapR Auditing (this is a handy feature) and found that when I
>> > > > run a query that gives me access denied (my query is select * from
>> > > > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
>> > > > do a create operation on the .drill.parquet_metadata file, and I am
>> > > > guessing it's failing with status: 17 (not sure what this means; successes
>> > > > appear to be "0"). What was interesting was the "CREATE" being attempted
>> > > > three times. Any thoughts on why a select * from table limit 1 would try
>> > > > to initiate a create operation on the .drill.parquet_metadata file?
>> > > >
>> > > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]>
>> > wrote:
>> > > >
>> > > >> I take it back.
>> > > >>
>> > > >> I went to run a query, in the same session that had worked, and now I am
>> > > >> getting permission denied.
>> > > >>
>> > > >> I do have a query running that creates new directories every 5 minutes;
>> > > >> however, these aren't the directories that are giving me permission
>> > > >> denied. Did you try running an aggregate query across all data? This is an
>> > > >> interesting one to track down; I'm not sure why I am getting the access
>> > > >> denied now.
>> > > >>
>> > > >> The .drill.parquet_metadata file in the directory that I am getting the
>> > > >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
>> > > >> me that both the user of the drillbits (mapr) and the user I am logged into
>> > > >> sqlline as (mapradm) should be able to read the file... so why do I get an
>> > > >> access denied when running a query? Any assistance would be valuable here,
>> > > >> in that there are some great performance increases with the metadata
>> > > >> caching, and I don't want to miss out on that.
>> > > >>
>> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]>
>> > wrote:
>> > > >>
>> > > >>> All files are owned by mapr:mapr?
>> > > >>>
>> > > >>> I have a setup where mapr is the user running the drillbit, but then I
>> > > >>> have a directory that is owned by another user: mapradm:mapradm on all
>> > > >>> files. (Permissions on directories and files appear to be rwxr-xr-x.) When
>> > > >>> I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
>> > > >>> created as mapr:mapr with rwxr-xr-x.
>> > > >>>
>> > > >>> So
>> > > >>> Drillbit user: mapr
>> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
>> > > >>> Directory permissions (all files and folders under main directory):
>> > > >>> rwxr-xr-x
>> > > >>>
>> > > >>> I authenticated to drill via sqlline as user mapradm (this user
>> > should
>> > > be
>> > > >>> able to read and write just fine to all directories).
>> > > >>>
>> > > >>> Now, one thing I did notice is my mapr user was not in the mapradm group,
>> > > >>> and therefore didn't have write permissions anywhere... when I fixed that on
>> > > >>> all nodes and then manually deleted the metadata files, things seem to be
>> > > >>> working. I wonder if that was my issue?
>> > > >>>
>> > > >>> Basically, the user running the drillbits needs to be able to write files
>> > > >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
>> > > >>> more testing. This may be a good candidate for some documentation work to
>> > > >>> understand what permissions are required to be able to query these.
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]> wrote:
>> > > >>>
>> > > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
>> > > >>>> follow your reproduction faithfully.
>> > > >>>>
>> > > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
>> > > >>>> apache drill 1.2.0
>> > > >>>> "drill baby drill"
>> > > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> |  ok   |                       summary                        |
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
>> > > >>>> +-------+------------------------------------------------------+
>> > > >>>> 1 row selected (32.27 seconds)
>> > > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
>> > > >>>> +---------------+---------------+
>> > > >>>> |     srcIP     |     dstIP     |
>> > > >>>> +---------------+---------------+
>> > > >>>> | 172.16.2.152  | 172.16.1.58   |
>> > > >>>> | 172.16.1.58   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.73   | 172.16.2.152  |
>> > > >>>> | 172.16.2.152  | 172.16.2.73   |
>> > > >>>> +---------------+---------------+
>> > > >>>> 12 rows selected (5.654 seconds)
>> > > >>>>
>> > > >>>> And here's what my table structure looks like (as seen via MapR NFS):
>> > > >>>>
>> > > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
>> > > >>>> /mapr/vgonzalez.drill/tmp/flows/
>> > > >>>> └── 2015
>> > > >>>>    └── 11
>> > > >>>>        ├── 10
>> > > >>>>        │   ├── 21
>> > > >>>>        │   │   ├── 39
>> > > >>>>        │   │   │   ├── 03
>> > > >>>>        │   │   │   │   ├── _common_metadata
>> > > >>>>        │   │   │   │   ├── _metadata
>> > > >>>>        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>> > > >>>>        │   │   │   │   └── _SUCCESS
>> > > >>>>        │   │   │   └── 20
>> > > >>>>        │   │   │       ├── _common_metadata
>> > > >>>>        │   │   │       ├── _metadata
>> > > >>>>        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>> > > >>>>
>> > > >>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
>> > > >>>>
>> > > >>>> I have authentication and impersonation turned on, and the files are
>> > > >>>> owned by mapr:mapr. Here's my drill-override.conf:
>> > > >>>>
>> > > >>>> drill.exec: {
>> > > >>>>   cluster-id: "vgonzalez_drill-drillbits",
>> > > >>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
>> > > >>>> }
>> > > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
>> > > >>>> drill.exec {
>> > > >>>>   security.user.auth {
>> > > >>>>     enabled: true,
>> > > >>>>     packages += "org.apache.drill.exec.rpc.user.security",
>> > > >>>>     impl: "pam",
>> > > >>>>     pam_profiles: [ "login", "sudo", "sshd", "password-auth" ]
>> > > >>>>   }
>> > > >>>> }
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]>
>> > > wrote:
>> > > >>>>
>> > > >>>>> Cool, looking forward to it.
>> > > >>>>>
>> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <[email protected]> wrote:
>> > > >>>>>
>> > > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try this out
>> > > >>>>>> and report back.
>> > > >>>>>>
>> > > >>>>>> On Monday, November 9, 2015, John Omernik <[email protected]>
>> > wrote:
>> > > >>>>>>
>> > > >>>>>>> Has anyone been able to try/test this? I am curious if it's a me-only
>> > > >>>>>>> issue or something more of a bug, so I can open a JIRA if needed.
>> > > >>>>>>>
>> > > >>>>>>> John
>> > > >>>>>>>
>> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
>> > > >>>>>>>>
>> > > >>>>>>>> Have a Parquet table with directories underneath the main (I have
>> > > >>>>>>>> directories per day)
>> > > >>>>>>>>
>> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table, running as an
>> > > >>>>>>>> authenticated user other than the drill bit user. (I am using mapr, I used
>> > > >>>>>>>> my user to run the query, and yes I have access to the data)
>> > > >>>>>>>>
>> > > >>>>>>>> Then run a normal query and see what the result is.
>> > > >>>>>>>>
>> > > >>>>>>>> John
>> > > >>>>>>>>
>> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <[email protected]> wrote:
>> > > >>>>>>>>
>> > > >>>>>>>>> This doesn't make sense and seems like a bug.
>> > > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache as
>> > > >>>>>>>>> Drillbit user at the query time (there is no user level metadata cache in
>> > > >>>>>>>>> Drill at this point).
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, it completed successfully.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission Denied
>> > > >>>>>>>>>> on .drill.parquet_metadata.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> I am running drill with authentication.  I ran the REFRESH TABLE METADATA
>> > > >>>>>>>>>> as user X; it appears the .drill.parquet_metadata was created and is owned
>> > > >>>>>>>>>> by the user the drill bits are running as, and is created with -rwxr-xr-x.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> My question is this: I can see why the file is owned by the drill bit
>> > > >>>>>>>>>> user, and the file is created with all-can-read permissions, but why am I
>> > > >>>>>>>>>> getting a permission denied when user X is trying to run a query?
>> > > >>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > >
>> > >
>> >
>>
>
>
