Margus,

I guess creating an external table that reads from a large number of HDFS files 
might demand more memory in HiveServer2.

When you disabled Ranger as the authorizer, did you configure SQLStdAuthorizer as 
the authorizer instead, or was authorization disabled in HiveServer2 altogether? If 
authorization was disabled, then HiveServer2 may not go out to the NameNode (to 
check the user's access to the underlying HDFS files) and hence may not trigger 
this condition. You could also try with increased memory for HiveServer2.
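
For reference, here is roughly where to look. This is only a sketch assuming the 
standard Hive property names and a stock hive-site.xml / hive-env.sh; the heap 
value is just an example:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <!-- SQLStdAuthorizer factory; with Ranger enabled, this instead
       points to Ranger's own authorizer factory -->
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

And for the heap, something like this in hive-env.sh:

# example value only -- size it to your workload
export HADOOP_HEAPSIZE=8192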

Thanks,
Madhan


From:  Margus Roo <mar...@roo.ee>
Reply-To:  "user@ranger.incubator.apache.org" <user@ranger.incubator.apache.org>
Date:  Thursday, May 19, 2016 at 1:51 AM
To:  "user@ranger.incubator.apache.org" <user@ranger.incubator.apache.org>
Subject:  Re: Can not create hive2 external table

Is there any use case where you have more than 100 000 files and you try to 
create an external table over them with Ranger enabled?

Is it possible at all?
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 17/05/16 09:05, Margus Roo wrote:

Hi

[margusja@bigdata29 ~]$ hdfs dfs -count /tmp/files_10k
           1       100000             588895 /tmp/files_10k

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k 
(i int) row format delimited fields terminated by '\t' location 
'/tmp/files_10k';
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
1: jdbc:hive2://bigdata29.webmedia.int:10000/>



In the NameNode log there are loads of lines like:

2016-05-17 01:57:55,408 INFO  ipc.Server (Server.java:saslProcess(1386)) - Auth 
successful for hive/bigdata29.webmedia....@testhadoop.com (auth:KERBEROS)
2016-05-17 01:57:55,409 INFO  authorize.ServiceAuthorizationManager 
(ServiceAuthorizationManager.java:authorize(135)) - Authorization successful 
for margusja (auth:PROXY) via hive/bigdata29.webmedia....@testhadoop.com 
(auth:KERBEROS) for protocol=interface 
org.apache.hadoop.hdfs.protocol.ClientProtocol

In hiveserver2.log there are loads of lines like:

2016-05-17 01:58:40,202 INFO  
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor 
(JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg 
GC): pause of approximately 1221ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1251ms

2016-05-17 02:00:12,021 INFO  
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor 
(JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg 
GC): pause of approximately 1455ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1946ms
2016-05-17 02:00:13,963 INFO  
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor 
(JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg 
GC): pause of approximately 1441ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1928ms



Now I disable Ranger and:

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
4: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k 
(i int) row format delimited fields terminated by '\t' location 
'/tmp/files_10k';
No rows affected (1.399 seconds)
4: jdbc:hive2://bigdata29.webmedia.int:10000/>
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 17/05/16 01:24, Don Bosco Durai wrote:
There is an implicit check done by HiveServer2 to make sure the user has access 
to the external files. You are correct: at the HDFS level, each file's permission 
is checked individually.
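
To illustrate (a minimal sketch, not Hive's actual code; the class name is 
invented and the path is the one from your test): each file in the location 
costs one authorization round trip to the NameNode, so 100 000 files mean 
100 000 RPCs, which matches the "Authorization successful" flood in your 
NameNode log.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.permission.FsAction;

// Sketch only -- not Hive's implementation.
public class PerFileAccessCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Recursively walk every file under the external table location.
    RemoteIterator<LocatedFileStatus> it =
        fs.listFiles(new Path("/tmp/files_10k"), true);
    while (it.hasNext()) {
      // Each access() call is a separate NameNode RPC; over 100 000
      // files, the per-file checks add up in HiveServer2.
      fs.access(it.next().getPath(), FsAction.READ);
    }
  }
}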

What sort of error are you getting in the Hive log file? And is there any error 
on the HDFS side?

Thanks

Bosco


From: Margus Roo <mar...@roo.ee>
Reply-To: <user@ranger.incubator.apache.org>
Date: Monday, May 16, 2016 at 7:36 AM
To: <user@ranger.incubator.apache.org>
Subject: Can not create hive2 external table

Hi

When I try to create an external table over a location containing, for example, 
100 000 files, and Ranger authorization is enabled, I get various errors in the 
Hive log, mainly GC timeouts.
In the HDFS NameNode log I can see loads of authorization rows. I think Ranger 
is doing a check for every single file?!
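
One rough way to count those per-file checks (a sketch only; the log path and 
message pattern below assume a default HDP-style layout and will differ per 
install):

grep -c "Authorization successful for margusja" \
    /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log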

When I disable Ranger authorization for Hive, the external table is created.

So - am I doing something wrong? 
-- 
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780


