Margus, I suspect that creating an external table over a large number of HDFS files may demand more memory in HiveServer2.

When you disabled Ranger as the authorizer, did you configure SQLStdAuthorizer as the authorizer, or was authorization disabled in HiveServer2 altogether? If authorization was disabled, HiveServer2 may not go out to the NameNode (to check the user's access to the underlying HDFS files) and hence may not trigger this condition. You can perhaps try with increased memory for HiveServer2.

Thanks,
Madhan
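As a rough illustration of the suggestion above, a minimal sketch of one way to raise the HiveServer2 heap, assuming an HDP-style hive-env.sh where the $SERVICE variable names the component being started; the 8192 is only an example value, not a recommendation:

  # hive-env.sh -- sketch only; size the value to your file counts and host memory
  if [ "$SERVICE" = "hiveserver2" ]; then
    export HADOOP_HEAPSIZE=8192   # MB
  fi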
From: Margus Roo <mar...@roo.ee>
Reply-To: "user@ranger.incubator.apache.org" <user@ranger.incubator.apache.org>
Date: Thursday, May 19, 2016 at 1:51 AM
To: "user@ranger.incubator.apache.org" <user@ranger.incubator.apache.org>
Subject: Re: Can not create hive2 external table

Is there any use case where you have more than 100 000 files and you try to create an external table over them with Ranger enabled? Is it possible at all?

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 17/05/16 09:05, Margus Roo wrote:

Hi

  [margusja@bigdata29 ~]$ hdfs dfs -count /tmp/files_10k
  1  100000  588895  /tmp/files_10k

  Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
  Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
  Transaction isolation: TRANSACTION_REPEATABLE_READ
  1: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
  Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
  1: jdbc:hive2://bigdata29.webmedia.int:10000/>

In the NameNode log there are loads of lines like:

  2016-05-17 01:57:55,408 INFO ipc.Server (Server.java:saslProcess(1386)) - Auth successful for hive/bigdata29.webmedia....@testhadoop.com (auth:KERBEROS)
  2016-05-17 01:57:55,409 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(135)) - Authorization successful for margusja (auth:PROXY) via hive/bigdata29.webmedia....@testhadoop.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

In hiveserver2.log there are loads of lines like:

  2016-05-17 01:58:40,202 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1221ms
  GC pool 'PS MarkSweep' had collection(s): count=1 time=1251ms
  2016-05-17 02:00:12,021 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1455ms
  GC pool 'PS MarkSweep' had collection(s): count=1 time=1946ms
  2016-05-17 02:00:13,963 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1441ms
  GC pool 'PS MarkSweep' had collection(s): count=1 time=1928ms

Now I disable Ranger and:

  Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
  Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
  Transaction isolation: TRANSACTION_REPEATABLE_READ
  4: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
  No rows affected (1.399 seconds)
  4: jdbc:hive2://bigdata29.webmedia.int:10000/>

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
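The JvmPauseMonitor lines above point at full GC pauses. A quick way to confirm heap pressure in HiveServer2 while the DDL runs is a sketch like this, assuming jstat from the same JDK as HiveServer2 and that pgrep -f can find the process:

  HS2_PID=$(pgrep -f hiveserver2 | head -1)
  jstat -gcutil "$HS2_PID" 1000
  # Watch the FGC/FGCT columns climb while the CREATE EXTERNAL TABLE statement runs.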
On 17/05/16 01:24, Don Bosco Durai wrote:

There is an implicit check done by HiveServer2 to make sure the user has access to the external files. You are correct: at the HDFS level, each file's permission is checked individually. What sort of error are you getting in the Hive log file? And is there any error on the HDFS side?

Thanks
Bosco

From: Margus Roo <mar...@roo.ee>
Reply-To: <user@ranger.incubator.apache.org>
Date: Monday, May 16, 2016 at 7:36 AM
To: <user@ranger.incubator.apache.org>
Subject: Can not create hive2 external table

Hi

When I try to create an external table over, for example, 100 000 files at the location given in the Hive DDL, and Ranger authorization is enabled, I get various errors in the Hive log, mainly GC timeouts. In the HDFS NameNode log I can see loads of authorization rows; I think Ranger is doing a check for every single file. If I disable Ranger authorization for Hive, the external table is created. So, am I doing something wrong?

--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
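One rough way to verify that the authorization really is per file: count the authorization lines the DDL generates on the NameNode. A sketch; the log path is an assumption for an HDP-style install, adjust for yours:

  grep -c 'Authorization successful for margusja' \
    /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
  # If the count grows by roughly the number of files under the table location,
  # each file is being checked individually.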