Should we do a JIRA on this? It seems important...

On Wed, Nov 11, 2015 at 5:15 PM, John Omernik wrote:
For me it's very strange. If I delete all the .drill.parquet_metadata files, I can create the metadata and then run a query. I can wait 5 minutes, come back, and run the same query, and then I get permission denied. If I try to run the REFRESH METADATA again, it too fails with permission denied until I erase all the files.

What is strange here is that the .drill.parquet_metadata file is owned by the drillbit user and has rwxr-xr-x. Thus, based on those permissions, the non-drillbit user STILL should be able to read the file with no issues. (This is not what your last bullet describes: these permissions restrict others from writing, not reading.)

In addition, when I try to run the query, it appears that the non-drillbit user is trying to issue a file create, and per Keys, the file is already there (and they don't have permission to write).

So, based on your understanding/description of what should happen, a number of things are not happening correctly:

1. The file that is created is not limited to being read by the drillbit user.
2. When a query is run, the file is not accessed by the drillbit user. It's not even accessed by the authenticated user; instead, the authenticated user tries to overwrite the file (which makes very little sense to me on a select query).

The only thing that is (apparently) happening correctly is that the initial REFRESH command creates the files as the drillbit user. However, subsequent operations don't seem to be working right, so I am not sure if that is a 3rd bullet in the "things that appear broken" list.

Using the Drill Audit logs was very helpful here. If there is anything else I can do to help test/troubleshoot this, let me know.

On Wed, Nov 11, 2015 at 4:54 PM, Vince Gonzalez wrote:

Ok, I'm seeing the behavior you describe except for the last bullet: the permissions on the file would allow anyone to read the cache file.
$ ls -la
total 3499
drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata

On Wed, Nov 11, 2015 at 5:29 PM, Neeraja Rentachintala wrote:

John, Vince,
I am a little confused by this email thread.
From John's original description, I thought the issue was that the refresh metadata command runs successfully (and the cache is created with the Drillbit user as owner), but at query time the query fails for any user (even though the user has permissions on the directory/dataset).

Per the latest discussion, it seems like you are hitting permission denied when running the 'refresh metadata' command itself.

Just wanted to share what I think the right behavior here is. Feel free to comment.

- When the refresh metadata command is run, the cache files get created with the drillbit user as the owner (irrespective of who runs the command, even with impersonation turned on).
- When a select query comes in on the table, the corresponding cache file is always accessed as the drillbit user (irrespective of who runs the query, even with impersonation turned on).
- The cache file created through the refresh metadata command should restrict access for all users other than the drillbit user, so there is no leakage of metadata to someone who goes to the file system and opens the file (i.e. the cache is for Drill's internal planning purposes and is not meant as a user-level cache).

If the above is not happening, it seems like a bug.
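Whether mapradm should be able to read the cache file comes down to how the owner/group/other bits in Vince's listing are evaluated. The sketch below assumes plain POSIX semantics only: exactly one permission class applies (owner, else group, else other). It does not model superuser overrides or any MapR-FS-specific access controls. `parse_mode` and `can_access` are hypothetical helpers written for illustration, not part of Drill or MapR, and the group memberships passed in are assumed.

```python
import stat

# Permission bits in ls order: owner rwx, group rwx, other rwx.
_BITS = (
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
)


def parse_mode(mode_str: str) -> int:
    """Convert an ls-style string such as 'rwxr-xr-x' into permission bits."""
    if len(mode_str) != 9:
        raise ValueError(f"expected 9 permission characters, got {mode_str!r}")
    mode = 0
    for ch, bit, allowed in zip(mode_str, _BITS, "rwxrwxrwx"):
        if ch == allowed:
            mode |= bit
        elif ch != "-":
            raise ValueError(f"unexpected {ch!r} in {mode_str!r}")
    return mode


def can_access(user: str, user_groups: set, owner: str, group: str,
               mode: int, want: str) -> bool:
    """Plain POSIX check (no root override): only one class is consulted."""
    shift = {"r": 2, "w": 1, "x": 0}[want]
    if user == owner:
        base = 6  # owner bits
    elif group in user_groups:
        base = 3  # group bits
    else:
        base = 0  # "other" bits
    return bool(mode & (1 << (base + shift)))


# Cache file from Vince's listing: owner mapr, group mapr, rwxr-xr-x.
# Assumption: mapradm is not a member of the mapr group.
cache_mode = parse_mode("rwxr-xr-x")
print(can_access("mapradm", {"mapradm"}, "mapr", "mapr", cache_mode, "r"))  # True
print(can_access("mapradm", {"mapradm"}, "mapr", "mapr", cache_mode, "w"))  # False

# A restrictive mode of the kind Neeraja's third bullet implies.
private_mode = parse_mode("rwx------")
print(can_access("mapradm", {"mapradm"}, "mapr", "mapr", private_mode, "r"))  # False
```

Under this model, rwxr-xr-x on the mapr-owned cache file lets mapradm read it but not write it. A mode such as rwx------ is one way to get the restriction described in Neeraja's third bullet.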
thanks
Neeraja

On Wed, Nov 11, 2015 at 2:07 PM, kbotzum wrote:

MapR audit records print the errno value to indicate success/failure. Thus status 17 means errno 17, which is EEXIST. It looks like Drill is trying to create a file that already exists.

I'll defer to others as to why Drill might do that.

Keys
_______________________________
Keys Botzum
Senior Principal Technologist
443-718-0098
MapR Technologies
http://www.mapr.com

On Nov 11, 2015, at 4:09 PM, John Omernik wrote:

I turned on MapR Auditing (this is a handy feature) and found that when I run a query that is giving me access denied (my query is select * from table limit 1), per MapR the user I am logged in as (mapradm) is trying to do a create operation on the .drill.parquet_metadata file. I'm guessing it's failing with status: 17 (not sure what this means; successes appear to be "0"). What was interesting was the "CREATE" being attempted three times. Any thoughts on why a select * from table limit 1 would try to initiate a create operation on the .drill.parquet_metadata file?

On Wed, Nov 11, 2015 at 2:25 PM, John Omernik wrote:

I take it back.

I went to run a query in the same session that had worked, and now I am getting permission denied.

I do have a query running that creates new directories every 5 minutes; however, these aren't the directories that are giving me permission denied. Did you try running an aggregate query across all data?
This is an interesting one to track down; not sure why I am getting the access denied now.

The .drill.parquet_metadata file in the directory I am getting the error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells me that both the user running the drillbits (mapr) and the user I am logged in as in sqlline (mapradm) should be able to read the file... so why do I get an access denied when running a query? Any assistance would be valuable here, as there are some great performance increases with the metadata caching, and I don't want to miss out on that.

On Wed, Nov 11, 2015 at 2:18 PM, John Omernik wrote:

All files are owned by mapr:mapr?

I have a setup where mapr is the user running the drillbit, but I have a directory that is owned by another user: mapradm:mapradm on all files. (Permissions on directories and files appear to be rwxr-xr-x.) When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets created as mapr:mapr with rwxr-xr-x.

So:
Drillbit user: mapr
Directory (and subdirectories/files) owner: mapradm:mapradm
Directory permissions (all files and folders under the main directory): rwxr-xr-x

I authenticated to Drill via sqlline as user mapradm (this user should be able to read and write just fine to all directories).

Now, one thing I did notice is that my mapr user was not in the mapradm group and therefore didn't have write permissions anywhere...
When I fixed that on all nodes and then manually deleted the metadata files, things seem to be working. I wonder if that was my issue?

Basically, the user running the drillbits needs to be able to write files (the .drill.parquet_metadata) or something bad will happen :) I will do more testing. This may be a good candidate for some documentation work on what permissions are required to be able to query these.

On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez wrote:

Hi John, I tried this and didn't find any issues. Let me know if I didn't follow your reproduction faithfully.

$ sqlline -u jdbc:drill: -n ec2-user -p mapr
apache drill 1.2.0
"drill baby drill"
0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
+-------+------------------------------------------------------+
|  ok   |                       summary                        |
+-------+------------------------------------------------------+
| true  | Successfully updated metadata for table /tmp/flows.  |
+-------+------------------------------------------------------+
1 row selected (32.27 seconds)
0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
+---------------+---------------+
|     srcIP     |     dstIP     |
+---------------+---------------+
| 172.16.2.152  | 172.16.1.58   |
| 172.16.1.58   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.152  | 172.16.2.73   |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.73   | 172.16.2.152  |
| 172.16.2.152  | 172.16.2.73   |
+---------------+---------------+
12 rows selected (5.654 seconds)

And here's what my table structure looks like (as seen via MapR NFS):

$ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
/mapr/vgonzalez.drill/tmp/flows/
└── 2015
    └── 11
        ├── 10
        │   ├── 21
        │   │   ├── 39
        │   │   │   ├── 03
        │   │   │   │   ├── _common_metadata
        │   │   │   │   ├── _metadata
        │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
        │   │   │   │   └── _SUCCESS
        │   │   │   └── 20
        │   │   │       ├── _common_metadata
        │   │   │       ├── _metadata
        │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet

My parquet was created in Spark, not Drill. Not sure if that's relevant.

I have authentication and impersonation turned on, and the files are owned by mapr:mapr.
Here's my drill-override.conf:

drill.exec: {
  cluster-id: "vgonzalez_drill-drillbits",
  zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
}
drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
drill.exec {
  security.user.auth {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam",
    pam_profiles: [ "login", "sudo", "sshd", "password-auth" ]
  }
}

On Tue, Nov 10, 2015 at 1:17 PM, John Omernik wrote:

Cool, looking forward to it.

On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez wrote:

Hey John, I have a secure cluster and some parquet files; I'll try this out and report back.

On Monday, November 9, 2015, John Omernik wrote:

Has anyone been able to try/test this? I am curious whether it's a me-only issue or more of a bug, so I can open a JIRA if needed.

John

On Fri, Nov 6, 2015 at 11:06 AM, John Omernik wrote:

If someone has authorization/authentication set up, to reproduce:

Have a Parquet table with directories underneath the main directory (I have directories per day).

Then issue REFRESH TABLE METADATA on the root of the table, running as an authenticated user other than the drillbit user.
(I am using mapr; I used my own user to run the query, and yes, I have access to the data.)

Then run a normal query and see what the result is.

John

On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala wrote:

This doesn't make sense and seems like a bug.
I think the right behavior is for the Drillbit to access the cache as the Drillbit user at query time (there is no user-level metadata cache in Drill at this point).

On Fri, Nov 6, 2015 at 6:57 AM, John Omernik wrote:

I ran REFRESH TABLE METADATA on a table, and it completed successfully.

When I tried a subsequent query, I got an IOException: Permission Denied on .drill.parquet_metadata.

I am running Drill with authentication.
I ran the REFRESH TABLE METADATA as user X. It appears the .drill.parquet_metadata file was created and owned by the user the drillbits are running as, and it is created with -rwxr-xr-x.

My question is this: I can see why the file is owned by the drillbit user, and the file is created with all-can-read permissions, but why am I getting a permission denied when user X tries to run a query?
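Keys' reading of the audit record (status 17 is errno 17, EEXIST) can be checked with a plain exclusive create. The sketch below assumes a local Linux file system rather than MapR-FS and uses a temporary directory. `exclusive_create` is a hypothetical helper. It only shows that creating an existing path with O_CREAT | O_EXCL fails with errno 17. It does not show which call Drill actually issues, or why a SELECT triggers it.

```python
import errno
import os
import tempfile


def exclusive_create(path: str) -> int:
    """Create `path` with O_CREAT | O_EXCL; return 0 on success, else the errno."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the cache file; any path behaves the same way.
    cache = os.path.join(tmp, ".drill.parquet_metadata")
    first = exclusive_create(cache)   # file does not exist yet, so this succeeds
    second = exclusive_create(cache)  # file now exists, so O_EXCL fails
    print(first, second, errno.errorcode.get(second))  # on Linux: 0 17 EEXIST
```

The numeric value of an errno is platform-defined. On Linux, 17 is EEXIST, which matches the status Keys decoded from the audit log.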
