OK, I'm seeing the behavior you describe except for the last bullet: the
permissions on the file allow anyone to read the cache file.

$ ls -la
total 3499
drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
*-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata*
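
As a side note, the same check can be scripted instead of read off the ls output. This is only a minimal sketch, assuming Python 3 on a POSIX-style file system; `world_readable` and `mode_string` are hypothetical helper names for illustration and are not part of Drill or MapR.

```python
import os
import stat
import sys


def world_readable(path):
    """Return True if the 'other' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def mode_string(path):
    """Return the ls-style mode string for path, e.g. '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # The default path is an assumption for illustration; pass your own
    # cache file path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else ".drill.parquet_metadata"
    if os.path.exists(target):
        print(mode_string(target), "world-readable:", world_readable(target))
    else:
        print("no such file:", target)
```

Run against the cache file in the listing above, this would be expected to report the file as world-readable, since the "other" read bit is set in rwxr-xr-x.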

On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <[email protected]> wrote:

> John, Vince
> I am a little confused by this email thread.
> From the original description by John, I thought the issue was that the
> refresh metadata command runs successfully (and the cache is created with
> the Drillbit user as owner), but at query time it fails for any user (even
> though the user has permissions on the directory/dataset).
>
> Per the latest discussion, it seems like you are hitting permission denied
> when running the 'refresh metadata' command itself.
>
> Just wanted to share what I think the right behavior here is. Feel free to
> comment.
>
> - When the refresh metadata command is run, the cache files get created with
> the drillbit user as the owner (irrespective of who is running the command
> and whether impersonation is turned on).
> - When a select query comes in on the table, the corresponding cache file
> is always accessed as the drillbit user (irrespective of who is running the
> command and whether impersonation is turned on).
> - The cache file created through the refresh metadata command should deny
> access to all users other than the drillbit user (so there is no leakage of
> metadata to someone opening the file directly on the file system; i.e., the
> cache is for Drill's internal planning purposes and is not meant as a
> user-level cache).
>
> If the above is not happening, it seems like a bug.
>
> thanks
> Neeraja
>
> On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <[email protected]> wrote:
>
> > MapR audit records print the errno value to indicate success/failure. Thus
> > status 17 means errno 17, which means EEXIST. Looks like Drill is trying to
> > create a file that already exists.
> >
> > I’ll defer to others as to why Drill might do that.
> >
> > Keys
> > _______________________________
> > Keys Botzum
> > Senior Principal Technologist
> > [email protected]
> > 443-718-0098
> > MapR Technologies
> > http://www.mapr.com
> >
> >
> >
> > On Nov 11, 2015, at 4:09 PM, John Omernik <[email protected]> wrote:
> >
> > > I turned on MapR Auditing (this is a handy feature) and found that when I
> > > run a query that gives me access denied (my query is select * from
> > > table limit 1), per MapR the user I am logged in as (mapradm) is trying to
> > > do a create operation on the .drill.parquet_metadata file, and I am
> > > guessing it's failing with status: 17 (not sure what this means; successes
> > > appear to be "0"). What was interesting was the "CREATE" being attempted
> > > three times. Any thoughts on why a select * from table limit 1 would try
> > > to initiate a create operation on the .drill.parquet_metadata file?
> > >
> > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]> wrote:
> > >
> > >> I take it back.
> > >>
> > > >> I went to run a query, in the same session that had worked, and now I am
> > > >> getting permission denied.
> > > >>
> > > >> I do have a query running that creates new directories every 5 minutes;
> > > >> however, these aren't the directories that are giving me permission
> > > >> denied. Did you try running an aggregate query across all data? This is
> > > >> an interesting one to track down; I'm not sure why I am getting the
> > > >> access denied now.
> > > >>
> > > >> The .drill.parquet_metadata file in the directory that I am getting the
> > > >> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > > >> me that both the user of the drillbits (mapr) and the user I am logged
> > > >> in as in sqlline (mapradm) should be able to read the file... so why do I
> > > >> get an access denied when running a query? Any assistance would be
> > > >> valuable here, in that there are some great performance increases with
> > > >> the metadata caching, and I don't want to miss out on that.
> > > >>
> > > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]> wrote:
> > >>
> > > >>> All files are owned by mapr:mapr?
> > > >>>
> > > >>> I have a setup where mapr is the user running the drillbit, but then I
> > > >>> have a directory that is owned by another user: mapradm:mapradm on all
> > > >>> files. (Permissions on directories and files appear to be rwxr-x-r-x.)
> > > >>> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> > > >>> created as mapr:mapr with rwxr-xr-x.
> > > >>>
> > > >>> So:
> > > >>> Drillbit user: mapr
> > > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > > >>> Directory permissions (all files and folders under the main directory):
> > > >>> rwxr-x-r-x
> > > >>>
> > > >>> I authenticated to Drill via sqlline as user mapradm (this user should be
> > > >>> able to read and write just fine to all directories).
> > > >>>
> > > >>> Now, one thing I did notice is that my mapr user was not in the mapradm
> > > >>> group and therefore didn't have write permissions anywhere... When I fixed
> > > >>> that on all nodes and then manually deleted the metadata files, things
> > > >>> seemed to be working. I wonder if that was my issue?
> > > >>>
> > > >>> Basically, the user running the drillbits needs to be able to write files
> > > >>> (the .drill.parquet_metadata) or something bad will happen :) I will do
> > > >>> more testing. This may be a good candidate for some documentation work to
> > > >>> explain what permissions are required to be able to query these.
> > >>>
> > >>>
> > >>>
> > >>>
> > > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]> wrote:
> > >>>
> > > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
> > > >>>> follow your reproduction faithfully.
> > >>>>
> > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>>> apache drill 1.2.0
> > >>>> "drill baby drill"
> > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>>> +-------+------------------------------------------------------+
> > >>>> |  ok   |                       summary                        |
> > >>>> +-------+------------------------------------------------------+
> > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>>> +-------+------------------------------------------------------+
> > >>>> 1 row selected (32.27 seconds)
> > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>>> +---------------+---------------+
> > >>>> |     srcIP     |     dstIP     |
> > >>>> +---------------+---------------+
> > >>>> | 172.16.2.152  | 172.16.1.58   |
> > >>>> | 172.16.1.58   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> +---------------+---------------+
> > >>>> 12 rows selected (5.654 seconds)
> > >>>>
> > > >>>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>>
> > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > >>>> └── 2015
> > >>>>    └── 11
> > >>>>        ├── 10
> > >>>>        │   ├── 21
> > >>>>        │   │   ├── 39
> > >>>>        │   │   │   ├── 03
> > >>>>        │   │   │   │   ├── _common_metadata
> > >>>>        │   │   │   │   ├── _metadata
> > >>>>        │   │   │   │   ├──
> > >>>> part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>>        │   │   │   │   └── _SUCCESS
> > >>>>        │   │   │   └── 20
> > >>>>        │   │   │       ├── _common_metadata
> > >>>>        │   │   │       ├── _metadata
> > >>>>        │   │   │       ├──
> > >>>> part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>>
> > > >>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> > > >>>>
> > > >>>> I have authentication and impersonation turned on, and the files are
> > > >>>> owned by mapr:mapr. Here's my drill-override.conf:
> > >>>>
> > > >>>> drill.exec: {
> > > >>>>   cluster-id: "vgonzalez_drill-drillbits",
> > > >>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > > >>>> }
> > > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > > >>>> drill.exec { security.user.auth { enabled: true,
> > > >>>>   packages += "org.apache.drill.exec.rpc.user.security", impl: "pam",
> > > >>>>   pam_profiles: [ "login","sudo","sshd","password-auth" ] } }
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]> wrote:
> > >>>>
> > >>>>> Cool, looking forward to it.
> > >>>>>
> > > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try this
> > > >>>>>> out and report back.
> > >>>>>>
> > > >>>>>> On Monday, November 9, 2015, John Omernik <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> Has anyone been able to try/test this? I am curious whether it's an
> > > >>>>>>> issue only for me or more of a bug, so I can open a JIRA if needed.
> > >>>>>>>
> > >>>>>>> John
> > >>>>>>>
> > > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
> > >>>>>>>
> > > >>>>>>>> If someone has authorization/authentication set up, to reproduce:
> > > >>>>>>>>
> > > >>>>>>>> Have a Parquet table with directories underneath the main directory (I
> > > >>>>>>>> have directories per day).
> > > >>>>>>>>
> > > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table as an
> > > >>>>>>>> authenticated user other than the drillbit user. (I am using mapr, I
> > > >>>>>>>> used my user to run the query, and yes, I have access to the data.)
> > > >>>>>>>>
> > > >>>>>>>> Then run a normal query and see what the result is.
> > >>>>>>>>
> > >>>>>>>> John
> > >>>>>>>>
> > > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <
> > > >>>>>>>> [email protected]> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> This doesn't make sense and seems like a bug.
> > > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache
> > > >>>>>>>>> as the Drillbit user at query time (there is no user-level metadata
> > > >>>>>>>>> cache in Drill at this point).
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table; it completed successfully.
> > > >>>>>>>>>>
> > > >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission
> > > >>>>>>>>>> Denied on .drill.parquet_metadata.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I am running Drill with authentication. I ran the REFRESH TABLE
> > > >>>>>>>>>> METADATA as user X; it appears the .drill.parquet_metadata was
> > > >>>>>>>>>> created and is owned by the user the drillbits are running as, and
> > > >>>>>>>>>> is created with -rwxr-x-r-x.
> > > >>>>>>>>>>
> > > >>>>>>>>>> My question is this: I can see why the file is owned by the drillbit
> > > >>>>>>>>>> user, and the file is created with world-readable permissions, but
> > > >>>>>>>>>> why am I getting a permission denied when user X is trying to run a
> > > >>>>>>>>>> query?
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>
