I fully disabled Hive authorization.

OK, I will try to increase memory and let you know how it goes.

Tnx

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 19/05/16 13:04, Madhan Neethiraj wrote:
Margus,

I guess creating an external table that reads from a large number of HDFS files might demand more memory in HiveServer2.

When you disabled Ranger as the authorizer, did you configure SQLStdAuthorizer as the authorizer, or was authorization disabled entirely in HiveServer2? If authorization was disabled, then HiveServer2 may not go out to the NameNode (to check the user's access to the underlying HDFS files) and hence may not trigger this condition. You can perhaps try with increased memory for HiveServer2.
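On HDP-style installs like the one in this thread, the HiveServer2 heap can typically be raised in hive-env.sh; a minimal sketch (the heap size is illustrative, not a recommendation, and the $SERVICE guard assumes the standard HDP hive-env template):

# hive-env.sh: raise the JVM heap (in MB) for the HiveServer2 service only
# ($SERVICE is set by the hive launcher script; 8192 is an illustrative value)
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_HEAPSIZE=8192
fi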

Thanks,
Madhan


From: Margus Roo <mar...@roo.ee>
Reply-To: <user@ranger.incubator.apache.org>
Date: Thursday, May 19, 2016 at 1:51 AM
To: <user@ranger.incubator.apache.org>
Subject: Re: Can not create hive2 external table

Is there any use case where you have more than 100,000 files and you try to create an external table over them with Ranger enabled?

Is it possible at all?

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 17/05/16 09:05, Margus Roo wrote:

Hi

[margusja@bigdata29 ~]$ hdfs dfs -count /tmp/files_10k
           1       100000             588895 /tmp/files_10k
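For context, a sketch of one way such a directory can be populated, so the setup is reproducible (file names and contents are arbitrary; only the file count matters here):

# create 100,000 tiny local files, then upload the whole directory to HDFS
mkdir files_10k && cd files_10k
for i in $(seq 1 100000); do echo "$i" > "f_$i"; done
cd .. && hdfs dfs -put files_10k /tmp/files_10k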

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
1: jdbc:hive2://bigdata29.webmedia.int:10000/>


In the NameNode log there are loads of lines like:

2016-05-17 01:57:55,408 INFO ipc.Server (Server.java:saslProcess(1386)) - Auth successful for hive/bigdata29.webmedia....@testhadoop.com (auth:KERBEROS)
2016-05-17 01:57:55,409 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(135)) - Authorization successful for margusja (auth:PROXY) via hive/bigdata29.webmedia....@testhadoop.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
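To confirm the one-authorization-check-per-file pattern, the matching proxy-user lines can simply be counted (the log path below is illustrative; adjust it for your install):

# count per-file authorization checks done on behalf of the proxied user
grep -c "Authorization successful for margusja (auth:PROXY)" \
    /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log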

In hiveserver2.log there are loads of lines like:

2016-05-17 01:58:40,202 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1221ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1251ms

2016-05-17 02:00:12,021 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1455ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1946ms
2016-05-17 02:00:13,963 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@6704df84]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1441ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1928ms
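GC pressure on the HiveServer2 JVM can also be watched live while the statement runs; a sketch (the pgrep pattern for finding the process is an assumption, adjust for your install):

# sample heap occupancy and accumulated GC time every 5 seconds
HS2_PID=$(pgrep -f 'org.apache.hive.service.server.HiveServer2' | head -n 1)
jstat -gcutil "$HS2_PID" 5000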


Now I disable Ranger and:

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
4: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
No rows affected (1.399 seconds)
4: jdbc:hive2://bigdata29.webmedia.int:10000/>

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 17/05/16 01:24, Don Bosco Durai wrote:
There is an implicit check done by HiveServer2 to make sure the user has access to the external files. You are correct: at the HDFS level, each file's permission is individually checked.

What sort of error are you getting in the Hive log file? And is there any error on the HDFS side?

Thanks

Bosco


From: Margus Roo <mar...@roo.ee>
Reply-To: <user@ranger.incubator.apache.org>
Date: Monday, May 16, 2016 at 7:36 AM
To: <user@ranger.incubator.apache.org>
Subject: Can not create hive2 external table

    Hi

    If I try to create an external table over a location that
    contains, for example, 100,000 files, and Ranger authorization
    is enabled, then I get various errors in the Hive log. Mainly
    they are GC timeouts.
    In the HDFS NameNode log I can see loads of authorization rows.
    I think Ranger is doing a check for every single file?!

    If I disable Ranger authorization for Hive, then it creates the
    external table.

    So - am I doing something wrong?

    --
    Margus (margusja) Roo
    http://margus.roo.ee
    skype: margusja
    +372 51 48 780



