I turned on MapR Auditing (a handy feature) and looked at what happens when I run the query that gives me access denied (select * from table limit 1). Per the MapR audit log, the user I am logged in as (mapradm) is trying to do a CREATE operation on the .drill.parquet_metadata file, and I am guessing it's failing with status 17 (not sure what this means; successes appear to be "0"). What was interesting was that the CREATE was attempted three times. Any thoughts on why a select * from table limit 1 would try to initiate a create operation on the .drill.parquet_metadata file?
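
One possible way to read the status value: if the audit status follows POSIX errno numbering (an assumption, not confirmed against the MapR documentation), it can be looked up with Python's standard errno module. The helper name describe_status below is hypothetical. On Linux, errno 17 is EEXIST ("File exists"), which would be consistent with a create being attempted on a file that is already there.

```python
import errno
import os


def describe_status(status: int) -> str:
    """Translate a numeric status into an errno name and message.

    Assumption: the audit log's status field uses POSIX errno numbering.
    This is a guess, not something confirmed by the audit log itself.
    """
    if status == 0:
        return "0 (success)"
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(status, "UNKNOWN")
    return f"{status} ({name}: {os.strerror(status)})"


if __name__ == "__main__":
    # 17 is the status seen on the failed CREATE operations.
    print(describe_status(17))
```

If the numbering turns out to be something other than errno, the lookup above does not apply and the MapR audit documentation would be the place to check.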

On Wed, Nov 11, 2015 at 2:25 PM, John Omernik wrote:

> I take it back.
>
> I went to run a query, in the same session that had worked, and now I am
> getting permission denied.
>
> I do have a query running that creates new directories every 5 minutes;
> however, these aren't the directories that are giving me permission denied.
> Did you try running an aggregate query across all data? This is an
> interesting one to track down, not sure why I am getting the access denied
> now.
>
> The .drill.parquet_metadata file in the directory that I am getting the
> error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> me that both the user of the drillbits (mapr) and the user I am logged in
> as in sqlline (mapradm) should be able to read the file... so why do I get
> an access denied when running a query? Any assistance would be valuable
> here, in that there are some great performance increases with the metadata
> caching, and I don't want to miss out on that.
>
> On Wed, Nov 11, 2015 at 2:18 PM, John Omernik wrote:
>
>> All files are owned by mapr:mapr?
>>
>> I have a setup where mapr is the user running the drillbit, but then I
>> have a directory that is owned by another user, mapradm:mapradm on all
>> files. (Permissions on directories and files appear to be rwxr-x-r-x.)
>> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
>> created as mapr:mapr with rwxr-xr-x.
>>
>> So:
>> Drillbit user: mapr
>> Directory (and subdirectories/files) owner: mapradm:mapradm
>> Directory permissions (all files and folders under main directory):
>> rwxr-x-r-x
>>
>> I authenticated to Drill via sqlline as user mapradm (this user should be
>> able to read and write just fine to all directories).
>>
>> Now, one thing I did notice is my mapr user was not in the mapradm group,
>> therefore, didn't have write permissions anywhere...
>> when I fixed that on all nodes, and then manually deleted the metadata
>> files, things seem to be working. I wonder if that was my issue?
>>
>> Basically, the user running the drillbits needs to be able to write files
>> (the .drill.parquet_metadata) or something bad will happen :) I will do
>> more testing. This may be a good candidate for some documentation work to
>> understand what permissions are required to be able to query these.
>>
>> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez wrote:
>>
>>> Hi John, I tried this and didn't find any issues. Let me know if I didn't
>>> follow your reproduction faithfully.
>>>
>>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
>>> apache drill 1.2.0
>>> "drill baby drill"
>>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
>>> +-------+------------------------------------------------------+
>>> |  ok   |                       summary                        |
>>> +-------+------------------------------------------------------+
>>> | true  | Successfully updated metadata for table /tmp/flows.  |
>>> +-------+------------------------------------------------------+
>>> 1 row selected (32.27 seconds)
>>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
>>> +---------------+---------------+
>>> |     srcIP     |     dstIP     |
>>> +---------------+---------------+
>>> | 172.16.2.152  | 172.16.1.58   |
>>> | 172.16.1.58   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.73   | 172.16.2.152  |
>>> | 172.16.2.152  | 172.16.2.73   |
>>> +---------------+---------------+
>>> 12 rows selected (5.654 seconds)
>>>
>>> And here's what my table structure looks like (as seen via MapR NFS):
>>>
>>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
>>> /mapr/vgonzalez.drill/tmp/flows/
>>> └── 2015
>>>     └── 11
>>>         ├── 10
>>>         │   ├── 21
>>>         │   │   ├── 39
>>>         │   │   │   ├── 03
>>>         │   │   │   │   ├── _common_metadata
>>>         │   │   │   │   ├── _metadata
>>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
>>>         │   │   │   │   └── _SUCCESS
>>>         │   │   │   └── 20
>>>         │   │   │       ├── _common_metadata
>>>         │   │   │       ├── _metadata
>>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
>>>
>>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
>>>
>>> I have authentication and impersonation turned on, and the files are
>>> owned by mapr:mapr.
>>> Here's my drill-override.conf:
>>>
>>> drill.exec: {
>>>   cluster-id: "vgonzalez_drill-drillbits",
>>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
>>> }
>>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
>>> drill.exec { security.user.auth { enabled: true, packages +=
>>> "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [
>>> "login","sudo","sshd","password-auth" ] } }
>>>
>>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik wrote:
>>>
>>> > Cool, looking forward to it.
>>> >
>>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez wrote:
>>> >
>>> > > Hey John, I have a secure cluster and some parquet files, I'll try
>>> > > this out and report back.
>>> > >
>>> > > On Monday, November 9, 2015, John Omernik wrote:
>>> > >
>>> > > > Has anyone been able to try/test this? I am curious if it's only my
>>> > > > issue or more of a bug, so I can open a JIRA if needed.
>>> > > >
>>> > > > John
>>> > > >
>>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik wrote:
>>> > > >
>>> > > > > If someone has authorization/authentication set up, to reproduce:
>>> > > > >
>>> > > > > Have a Parquet table with directories underneath the main (I have
>>> > > > > directories per day).
>>> > > > >
>>> > > > > Then issue REFRESH TABLE METADATA on the root of the table, running
>>> > > > > as an authenticated user other than the drill bit user. (I am using
>>> > > > > mapr; I used my user to run the query, and yes, I have access to
>>> > > > > the data.)
>>> > > > >
>>> > > > > Then run a normal query and see what the result is.
>>> > > > >
>>> > > > > John
>>> > > > >
>>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala wrote:
>>> > > > >
>>> > > > >> This doesn't make sense and seems like a bug.
>>> > > > >> I think the right behavior is for the Drillbit to access the cache
>>> > > > >> as Drillbit user at the query time (there is no user level metadata
>>> > > > >> cache in Drill at this point).
>>> > > > >>
>>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik wrote:
>>> > > > >>
>>> > > > >> > I ran REFRESH TABLE METADATA on a table, it completed
>>> > > > >> > successfully.
>>> > > > >> >
>>> > > > >> > When I tried a subsequent query, I get an IOException: Permission
>>> > > > >> > Denied on .drill.parquet_metadata.
>>> > > > >> >
>>> > > > >> > I am running drill with authentication. I ran the REFRESH TABLE
>>> > > > >> > METADATA as user X; it appears the .drill.parquet_metadata was
>>> > > > >> > created and owned by the user the drill bits are running as, and
>>> > > > >> > is created with -rwxr-x-r-x.
>>> > > > >> >
>>> > > > >> > My question is this: I can see why the file is owned by the drill
>>> > > > >> > bit user, and the file is created with all-can-read permissions,
>>> > > > >> > but why am I getting a permission denied when user X is trying to
>>> > > > >> > run a query?
