OK, I'm seeing the behavior you describe, except for the last bullet: the permissions on the cache file allow anyone to read it.
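As a quick cross-check on the listing below, here is a minimal Python sketch that reports whether a file's POSIX mode grants read access to users outside the owner and group. The helper names are hypothetical and nothing here comes from Drill. It assumes plain POSIX mode bits as seen through a local or NFS mount, and it says nothing about impersonation or any filesystem-level access controls.

```python
import os
import stat


def mode_string(path):
    """Return the ls-style mode string for path, e.g. '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)


def others_can_read(path):
    """Return True if the 'other' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)
```

Run against a file with mode -rwxr-xr-x, such as the .drill.parquet_metadata entry in the listing, `others_can_read` returns True, which matches the observation that the cache file is world-readable.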
$ ls -la
total 3499
drwxr-xr-x 2 ec2-user ec2-user 5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user 2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user 789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user 952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user 755805 Nov 11 21:18 1_3_0.parquet
*-rwxr-xr-x 1 mapr mapr 14033 Nov 11 21:18 .drill.parquet_metadata*

On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala <[email protected]> wrote:

> John, Vince
> I am a little confused by this email thread.
> From the original description by John, I thought the issue was that the refresh metadata command runs successfully (and the cache is created with the Drillbit user as owner), but at query time it fails for any user (even though the user has permissions on the directory/dataset).
>
> Per the latest discussion, it seems like you are hitting permission denied when running the 'refresh metadata' command itself.
>
> Just wanted to share what I think the right behavior here is. Feel free to comment.
>
> - When the Refresh metadata command is run, the cache files get created with the drillbit user as the owner (irrespective of who is running the command and whether impersonation is turned on)
> - When a select query comes in on the table, the corresponding cache file is always accessed as the drillbit user (irrespective of who is running the command and whether impersonation is turned on)
> - The cache file created through the refresh metadata command should restrict access to any user other than the drillbit user (so there is no leakage of metadata to someone opening the file directly on the file system, i.e. the cache is for Drill's internal planning purposes and is not meant as a user-level cache).
>
> If the above is not happening, it seems like a bug.
>
> thanks
> Neeraja
>
> On Wed, Nov 11, 2015 at 2:07 PM, kbotzum <[email protected]> wrote:
>
> > MapR audit records print the errno value to indicate success/failure. Thus status 17 means errno 17 which means EEXIST. Looks like Drill is trying to create a file that already exists.
> >
> > I’ll defer to others as to why Drill might do that.
> >
> > Keys
> > _______________________________
> > Keys Botzum
> > Senior Principal Technologist
> > [email protected]
> > 443-718-0098
> > MapR Technologies
> > http://www.mapr.com
> >
> > On Nov 11, 2015, at 4:09 PM, John Omernik <[email protected]> wrote:
> >
> > > I turned on MapR Auditing (this is a handy feature) and found that when I run a query that gives me access denied (my query is select * from table limit 1), per MapR the user I am logged in as (mapradm) is trying to do a create operation on the .drill.parquet_metadata file, and I'm guessing it's failing with status: 17 (not sure what this means; successes appear to be "0"). What was interesting was the "CREATE" being attempted three times. Any thoughts on why a select * from table limit 1 would try to initiate a create operation on the .drill.parquet_metadata file?
> > >
> > > On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]> wrote:
> > >
> > >> I take it back.
> > >>
> > >> I went to run a query, in the same session that had worked, and now I am getting permission denied.
> > >>
> > >> I do have a query running that creates new directories every 5 minutes; however, these aren't the directories that are giving me permission denied. Did you try running an aggregate query across all data?
> > >> This is an interesting one to track down; not sure why I am getting the access denied now.
> > >>
> > >> The .drill.parquet_metadata file in the directory that I am getting the error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells me that both the user of the drillbits (mapr) and the user I am logged in as in sqlline (mapradm) should be able to read the file... so why do I get an access denied when running a query? Any assistance would be valuable here, in that there are some great performance increases with the metadata caching, and I don't want to miss out on that.
> > >>
> > >> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]> wrote:
> > >>
> > >>> All files are owned by mapr:mapr?
> > >>>
> > >>> I have a setup where mapr is the user running the drillbit, but then I have a directory that is owned by another user, mapradm:mapradm on all files. (Permissions on directories and files appear to be rwxr-x-r-x.) When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets created as mapr:mapr with rwxr-xr-x.
> > >>>
> > >>> So
> > >>> Drillbit User: mapr
> > >>> Directory (and subdirectories/files) owner: mapradm:mapradm
> > >>> Directory permissions (all files and folders under main directory): rwxr-x-r-x
> > >>>
> > >>> I authenticated to drill via sqlline as user mapradm (this user should be able to read and write just fine to all directories).
> > >>>
> > >>> Now, one thing I did notice is my mapr user was not in the mapradm group, and therefore didn't have write permissions anywhere... When I fixed that on all nodes and then manually deleted the metadata files, things seem to be working. I wonder if that was my issue?
> > >>>
> > >>> Basically, the user running the drillbits need to be able to write files (the .drill.parquet_metadata) or something bad will happen :) I will do more testing. This may be a good candidate for some documentation work to understand what permissions are required to be able to query these.
> > >>>
> > >>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]> wrote:
> > >>>
> > >>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't follow your reproduction faithfully.
> > >>>>
> > >>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> > >>>> apache drill 1.2.0
> > >>>> "drill baby drill"
> > >>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> > >>>> +-------+------------------------------------------------------+
> > >>>> |  ok   |                       summary                        |
> > >>>> +-------+------------------------------------------------------+
> > >>>> | true  | Successfully updated metadata for table /tmp/flows.  |
> > >>>> +-------+------------------------------------------------------+
> > >>>> 1 row selected (32.27 seconds)
> > >>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> > >>>> +---------------+---------------+
> > >>>> |     srcIP     |     dstIP     |
> > >>>> +---------------+---------------+
> > >>>> | 172.16.2.152  | 172.16.1.58   |
> > >>>> | 172.16.1.58   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.73   | 172.16.2.152  |
> > >>>> | 172.16.2.152  | 172.16.2.73   |
> > >>>> +---------------+---------------+
> > >>>> 12 rows selected (5.654 seconds)
> > >>>>
> > >>>> And here's what my table structure looks like (as seen via MapR NFS):
> > >>>>
> > >>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> > >>>> /mapr/vgonzalez.drill/tmp/flows/
> > >>>> └── 2015
> > >>>>     └── 11
> > >>>>         ├── 10
> > >>>>         │   ├── 21
> > >>>>         │   │   ├── 39
> > >>>>         │   │   │   ├── 03
> > >>>>         │   │   │   │   ├── _common_metadata
> > >>>>         │   │   │   │   ├── _metadata
> > >>>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> > >>>>         │   │   │   │   └── _SUCCESS
> > >>>>         │   │   │   └── 20
> > >>>>         │   │   │       ├── _common_metadata
> > >>>>         │   │   │       ├── _metadata
> > >>>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> > >>>>
> > >>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> > >>>>
> > >>>> I have authentication and impersonation turned on, and the files are owned by mapr:mapr.
> > >>>> Here's my drill-override.conf:
> > >>>>
> > >>>> drill.exec: {
> > >>>>   cluster-id: "vgonzalez_drill-drillbits",
> > >>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> > >>>> }
> > >>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> > >>>> drill.exec { security.user.auth { enabled: true, packages += "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [ "login","sudo","sshd","password-auth" ] } }
> > >>>>
> > >>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]> wrote:
> > >>>>
> > >>>>> Cool, looking forward to it.
> > >>>>>
> > >>>>> On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hey John, I have a secure cluster and some parquet files, I'll try this out and report back.
> > >>>>>>
> > >>>>>> On Monday, November 9, 2015, John Omernik <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Has anyone been able to try/test this? I am curious if it's me only issue or something more of bug so I can open a JIRA if needed.
> > >>>>>>>
> > >>>>>>> John
> > >>>>>>>
> > >>>>>>> On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> If someone has authorization/authentication setup, to reproduce:
> > >>>>>>>>
> > >>>>>>>> Have a Parquet table with directories underneath the main (I have directories per day)
> > >>>>>>>>
> > >>>>>>>> Then issue REFRESH TABLE METADATA on the root of the table running an authenticated user other than the drill bit user.
> > >>>>>>>> (I am using mapr, I used my user to run the query, and yes I have access to the data)
> > >>>>>>>>
> > >>>>>>>> Then run a normal query and see what the result is.
> > >>>>>>>>
> > >>>>>>>> John
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> This doesn't make sense and seems like a bug.
> > >>>>>>>>> I think the right behavior is for the Drillbit to access the cache as the Drillbit user at query time (there is no user-level metadata cache in Drill at this point).
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I ran REFRESH TABLE METADATA on a table, and it completed successfully.
> > >>>>>>>>>>
> > >>>>>>>>>> When I tried a subsequent query, I got an IOException: Permission Denied on .drill.parquet_metadata.
> > >>>>>>>>>>
> > >>>>>>>>>> I am running drill with authentication. I ran the REFRESH TABLE METADATA as user X; it appears the .drill.parquet_metadata was created and owned by the user the drill bits are running as, and is created with -rwxr-x-r-x
> > >>>>>>>>>>
> > >>>>>>>>>> My question is this: I can see why the file is owned by the drill bit user, and the file is created with all-can-read permissions, but why am I getting a permission denied when user X is trying to run a query?
