My files were owned by mapr:mapr. I changed the ownership of everything to
ec2-user, and now get permission denied on the refresh table metadata
command, even though impersonation is on and I authenticated as ec2-user.
If impersonation were working correctly, I'd expect this to work. Is this
what you see?

It's also kinda weird in that both users involved should have write access
to the files - ec2-user is the owner, and mapr is the superuser on MFS.

[ec2-user@ip-172-16-2-36 tmp]$ sudo -u mapr chown -R ec2-user:ec2-user .
[ec2-user@ip-172-16-2-36 tmp]$ sqlline -u jdbc:drill: -n ec2-user -p mapr
apache drill 1.2.0
"a drill is a terrible thing to waste"
0: jdbc:drill:> select count(*) from dfs.`/tmp/flows`;
+---------+
| EXPR$0  |
+---------+
| 370280  |
+---------+
1 row selected (6.452 seconds)
0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
+--------+-----------------------------------------------------------------------------------------------------+
|   ok   |                                               summary                                               |
+--------+-----------------------------------------------------------------------------------------------------+
| false  | Error: 2050.6796.144654 /tmp/flows/2015/11/11/15/01/20/.drill.parquet_metadata (Permission denied)  |
+--------+-----------------------------------------------------------------------------------------------------+
1 row selected (3.253 seconds)

$ ls -la flows/2015/11/11/15/01/20/.drill.parquet_metadata
-rwxr-xr-x 1 ec2-user ec2-user 0 Nov 11 19:55 flows/2015/11/11/15/01/20/.drill.parquet_metadata


Then I tried to CTAS and it works, but apparently impersonation does not:

0: jdbc:drill:> create table dfs.tmp.flows2 as select * from dfs.`/tmp/flows`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_1       | 81222                      |
| 1_3       | 78255                      |
| 1_0       | 113624                     |
| 1_2       | 97179                      |
+-----------+----------------------------+
4 rows selected (22.591 seconds)
0: jdbc:drill:> refresh table metadata dfs.tmp.flows2;
+-------+--------------------------------------------------+
|  ok   |                     summary                      |
+-------+--------------------------------------------------+
| true  | Successfully updated metadata for table flows2.  |
+-------+--------------------------------------------------+
1 row selected (0.13 seconds)

$ ls -la flows2/
total 3499
drwxr-xr-x 2 ec2-user ec2-user       5 Nov 11 21:18 .
drwxrwxrwx 4 ec2-user ec2-user       2 Nov 11 21:18 ..
-rwxr-xr-x 1 ec2-user ec2-user 1068250 Nov 11 21:18 1_0_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  789341 Nov 11 21:18 1_1_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  952667 Nov 11 21:18 1_2_0.parquet
-rwxr-xr-x 1 ec2-user ec2-user  755805 Nov 11 21:18 1_3_0.parquet
-rwxr-xr-x 1 mapr     mapr       14033 Nov 11 21:18 .drill.parquet_metadata


Looks like a bug to me. Impersonation doesn't seem to be in force for
REFRESH TABLE METADATA.
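
A quick way to check for this on other tables is sketched below. It is only an illustration: the function name metadata_not_owned_by is made up, it assumes GNU stat (for stat -c '%U'), and it assumes the table directory is visible through a local mount such as MapR NFS.

```shell
#!/bin/sh
# Hypothetical helper (not part of Drill or MapR): print every
# .drill.parquet_metadata file under a directory tree whose owner is not the
# expected user. Assumes GNU stat, which supports `stat -c '%U'`.
metadata_not_owned_by() {
  expected="$1"   # user that should own the metadata cache files
  root="$2"       # table root, e.g. an NFS-mounted table directory
  find "$root" -type f -name '.drill.parquet_metadata' |
  while IFS= read -r f; do
    owner=$(stat -c '%U' "$f")
    # Report "owner path" for each file the expected user does not own.
    [ "$owner" = "$expected" ] || printf '%s %s\n' "$owner" "$f"
  done
}
```

Run as `metadata_not_owned_by ec2-user flows2` from the directory containing flows2, this would be expected to report the mapr-owned .drill.parquet_metadata from the listing above.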


On Wed, Nov 11, 2015 at 4:09 PM, John Omernik <[email protected]> wrote:

> I turned on MapR Auditing (a handy feature) and found that when I run a
> query that gives me access denied (my query is select * from table limit
> 1), MapR reports that the user I am logged in as (mapradm) is trying to do
> a create operation on the .drill.parquet_metadata file, and I'm guessing
> it's failing with status: 17. (Not sure what this means; successes appear
> to be "0".) What was interesting was the "CREATE" being attempted three
> times. Any thoughts on why a select * from table limit 1 would try to
> initiate a create operation on the .drill.parquet_metadata file?
>
> On Wed, Nov 11, 2015 at 2:25 PM, John Omernik <[email protected]> wrote:
>
> > I take it back.
> >
> > I went to run a query, in the same session that had worked, and now I am
> > getting permission denied.
> >
> > I do have a query running that creates new directories every 5 minutes;
> > however, these aren't the directories that are giving me permission
> > denied. Did you try running an aggregate query across all data? This is
> > an interesting one to track down; I'm not sure why I am getting the
> > access denied now.
> >
> > The .drill.parquet_metadata file in the directory that I am getting the
> > error on is owned by mapr:mapr and has rwxr-xr-x permissions. This tells
> > me that both the user of the drillbits (mapr) and the user I am logged
> > into sqlline as (mapradm) should be able to read the file... so why do I
> > get an access denied when running a query? Any assistance would be
> > valuable here, in that there are some great performance increases with
> > the metadata caching, and I don't want to miss out on that.
> >
> > On Wed, Nov 11, 2015 at 2:18 PM, John Omernik <[email protected]> wrote:
> >
> >> All files are owned by mapr:mapr?
> >>
> >> I have a setup where mapr is the user running the drillbit, but then I
> >> have a directory that is owned by another user, mapradm:mapradm on all
> >> files. (Permissions on directories and files appear to be rwxr-xr-x.)
> >> When I run REFRESH TABLE METADATA, the .drill.parquet_metadata file gets
> >> created as mapr:mapr with rwxr-xr-x.
> >>
> >> So:
> >> Drillbit user: mapr
> >> Directory (and subdirectories/files) owner: mapradm:mapradm
> >> Directory permissions (all files and folders under the main directory):
> >> rwxr-xr-x
> >>
> >> I authenticated to Drill via sqlline as user mapradm (this user should be
> >> able to read and write just fine to all directories).
> >>
> >> Now, one thing I did notice is that my mapr user was not in the mapradm
> >> group and therefore didn't have write permissions anywhere. When I fixed
> >> that on all nodes and then manually deleted the metadata files, things
> >> seem to be working. I wonder if that was my issue?
> >>
> >> Basically, the user running the drillbits needs to be able to write files
> >> (the .drill.parquet_metadata) or something bad will happen :) I will do
> >> more testing. This may be a good candidate for some documentation work to
> >> explain what permissions are required to be able to query these.
> >>
> >>
> >>
> >>
> >> On Wed, Nov 11, 2015 at 1:36 PM, Vince Gonzalez <[email protected]> wrote:
> >>
> >>> Hi John, I tried this and didn't find any issues. Let me know if I
> >>> didn't follow your reproduction faithfully.
> >>>
> >>> $ sqlline -u jdbc:drill: -n ec2-user -p mapr
> >>> apache drill 1.2.0
> >>> "drill baby drill"
> >>> 0: jdbc:drill:> refresh table metadata dfs.`/tmp/flows`;
> >>> +-------+------------------------------------------------------+
> >>> |  ok   |                       summary                        |
> >>> +-------+------------------------------------------------------+
> >>> | true  | Successfully updated metadata for table /tmp/flows.  |
> >>> +-------+------------------------------------------------------+
> >>> 1 row selected (32.27 seconds)
> >>> 0: jdbc:drill:> select srcIP,dstIP from dfs.`/tmp/flows` limit 12;
> >>> +---------------+---------------+
> >>> |     srcIP     |     dstIP     |
> >>> +---------------+---------------+
> >>> | 172.16.2.152  | 172.16.1.58   |
> >>> | 172.16.1.58   | 172.16.2.152  |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> | 172.16.2.73   | 172.16.2.152  |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> | 172.16.2.73   | 172.16.2.152  |
> >>> | 172.16.2.73   | 172.16.2.152  |
> >>> | 172.16.2.73   | 172.16.2.152  |
> >>> | 172.16.2.152  | 172.16.2.73   |
> >>> +---------------+---------------+
> >>> 12 rows selected (5.654 seconds)
> >>>
> >>> And here's what my table structure looks like (as seen via MapR NFS):
> >>>
> >>> $ tree /mapr/vgonzalez.drill/tmp/flows/ | head -15
> >>> /mapr/vgonzalez.drill/tmp/flows/
> >>> └── 2015
> >>>     └── 11
> >>>         ├── 10
> >>>         │   ├── 21
> >>>         │   │   ├── 39
> >>>         │   │   │   ├── 03
> >>>         │   │   │   │   ├── _common_metadata
> >>>         │   │   │   │   ├── _metadata
> >>>         │   │   │   │   ├── part-r-00000-853882bd-66d8-4505-96ba-f0a282e374de.gz.parquet
> >>>         │   │   │   │   └── _SUCCESS
> >>>         │   │   │   └── 20
> >>>         │   │   │       ├── _common_metadata
> >>>         │   │   │       ├── _metadata
> >>>         │   │   │       ├── part-r-00000-37a94549-8e56-46d5-be88-cb28e6d8bc35.gz.parquet
> >>>
> >>> My parquet was created in Spark, not Drill. Not sure if that's relevant.
> >>>
> >>> I have authentication and impersonation turned on, and the files are
> >>> owned by mapr:mapr. Here's my drill-override.conf:
> >>>
> >>> drill.exec: {
> >>>   cluster-id: "vgonzalez_drill-drillbits",
> >>>   zk.connect: "ip-172-16-2-36.ec2.internal:5181,ip-172-16-2-37.ec2.internal:5181,ip-172-16-2-38.ec2.internal:5181"
> >>> }
> >>> drill.exec.impersonation: { enabled: true, max_chained_user_hops: 3 }
> >>> drill.exec { security.user.auth { enabled: true, packages += "org.apache.drill.exec.rpc.user.security", impl: "pam", pam_profiles: [ "login","sudo","sshd","password-auth" ] } }
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Nov 10, 2015 at 1:17 PM, John Omernik <[email protected]> wrote:
> >>>
> >>> > Cool, looking forward to it.
> >>> >
> >>> > On Mon, Nov 9, 2015 at 7:21 PM, Vince Gonzalez <[email protected]> wrote:
> >>> >
> >>> > > Hey John, I have a secure cluster and some parquet files, I'll try
> >>> > > this out and report back.
> >>> > >
> >>> > > On Monday, November 9, 2015, John Omernik <[email protected]> wrote:
> >>> > >
> >>> > > > Has anyone been able to try/test this? I am curious whether it's
> >>> > > > only my issue or something more of a bug, so I can open a JIRA if
> >>> > > > needed.
> >>> > > >
> >>> > > > John
> >>> > > >
> >>> > > > On Fri, Nov 6, 2015 at 11:06 AM, John Omernik <[email protected]> wrote:
> >>> > > >
> >>> > > > > If someone has authorization/authentication set up, to reproduce:
> >>> > > > >
> >>> > > > > Have a Parquet table with directories underneath the main one (I
> >>> > > > > have directories per day).
> >>> > > > >
> >>> > > > > Then issue REFRESH TABLE METADATA on the root of the table while
> >>> > > > > running as an authenticated user other than the drillbit user. (I
> >>> > > > > am using mapr; I used my user to run the query, and yes, I have
> >>> > > > > access to the data.)
> >>> > > > >
> >>> > > > > Then run a normal query and see what the result is.
> >>> > > > >
> >>> > > > > John
> >>> > > > >
> >>> > > > > On Fri, Nov 6, 2015 at 10:22 AM, Neeraja Rentachintala <[email protected]> wrote:
> >>> > > > >
> >>> > > > >> This doesn't make sense and seems like a bug.
> >>> > > > >> I think the right behavior is for the Drillbit to access the
> >>> > > > >> cache as the Drillbit user at query time (there is no user-level
> >>> > > > >> metadata cache in Drill at this point).
> >>> > > > >>
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> On Fri, Nov 6, 2015 at 6:57 AM, John Omernik <[email protected]> wrote:
> >>> > > > >>
> >>> > > > >> > I ran REFRESH TABLE METADATA on a table, and it completed
> >>> > > > >> > successfully.
> >>> > > > >> >
> >>> > > > >> > When I tried a subsequent query, I got an IOException:
> >>> > > > >> > Permission Denied on .drill.parquet_metadata.
> >>> > > > >> >
> >>> > > > >> > I am running Drill with authentication. I ran the REFRESH
> >>> > > > >> > TABLE METADATA as user X; it appears the
> >>> > > > >> > .drill.parquet_metadata was created and is owned by the user
> >>> > > > >> > the drillbits are running as, and was created with -rwxr-xr-x.
> >>> > > > >> >
> >>> > > > >> > My question is this: I can see why the file is owned by the
> >>> > > > >> > drillbit user, and the file is created with world-readable
> >>> > > > >> > permissions, but why am I getting a permission denied when
> >>> > > > >> > user X is trying to run a query?
> >>> > > > >> >
> >>> > > > >>
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
