[jira] [Created] (HDFS-13138) webhdfs of federated namenode does not work properly

2018-02-12 Thread KWON BYUNGCHANG (JIRA)
KWON BYUNGCHANG created HDFS-13138:
--

 Summary: webhdfs of federated namenode does not work properly
 Key: HDFS-13138
 URL: https://issues.apache.org/jira/browse/HDFS-13138
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.7.1
Reporter: KWON BYUNGCHANG


My cluster has multiple namenodes using HDFS Federation.

webhdfs against a namenode that is not the defaultFS does not work properly: when I uploaded a file to a non-defaultFS namenode using webhdfs, the file was found at the defaultFS namenode instead.

I think the root cause is that clientNamenodeAddress of a non-defaultFS namenode is always derived from fs.defaultFS:
[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L462]

 
{code:java}
/**
 * Set the namenode address that will be used by clients to access this
 * namenode or name service. This needs to be called before the config
 * is overriden.
 */
public void setClientNamenodeAddress(Configuration conf) {
  String nnAddr = conf.get(FS_DEFAULT_NAME_KEY);
  if (nnAddr == null) {
    // default fs is not set.
    clientNamenodeAddress = null;
    return;
  }

  LOG.info("{} is {}", FS_DEFAULT_NAME_KEY, nnAddr);
  URI nnUri = URI.create(nnAddr);

  String nnHost = nnUri.getHost();
  if (nnHost == null) {
    clientNamenodeAddress = null;
    return;
  }

  if (DFSUtilClient.getNameServiceIds(conf).contains(nnHost)) {
    // host name is logical
    clientNamenodeAddress = nnHost;
  } else if (nnUri.getPort() > 0) {
    // physical address with a valid port
    clientNamenodeAddress = nnUri.getAuthority();
  } else {
    // the port is missing or 0. Figure out real bind address later.
    clientNamenodeAddress = null;
    return;
  }
  LOG.info("Clients are to use {} to access"
      + " this namenode/service.", clientNamenodeAddress);
}
{code}
 

Since all namenodes typically share one configuration, a non-defaultFS namenode also sets its clientNamenodeAddress from fs.defaultFS. webhdfs therefore redirects the client to a datanode with the wrong namenoderpcaddress parameter, and the file finally ends up on the namenode of fs.defaultFS.

 

A workaround is to configure fs.defaultFS of each namenode to its own nameservice, e.g.

  hdfs://ns1  has fs.defaultFS=hdfs://ns1
  hdfs://ns2  has fs.defaultFS=hdfs://ns2
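
A minimal core-site.xml sketch of that workaround (ns1/ns2 are the example nameservices above; how each namenode gets its own copy of the file depends on your deployment):

{code:xml}
<!-- core-site.xml used by the namenode(s) of ns2; ns1 is configured analogously -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns2</value>
</property>
{code}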



[jira] [Created] (HDFS-13137) Ozone: Ozonefs read fails because ChunkGroupInputStream#read does not iterate through all the blocks in the key

2018-02-12 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDFS-13137:


 Summary: Ozone: Ozonefs read fails because 
ChunkGroupInputStream#read does not iterate through all the blocks in the key
 Key: HDFS-13137
 URL: https://issues.apache.org/jira/browse/HDFS-13137
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: HDFS-7240


OzoneFilesystem put is failing with the following exception. This happens 
because ChunkGroupInputStream#read does not iterate through all the blocks in 
the key.

{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.1.0-SNAPSHOT/bin/hdfs dfs -put test3 /test3a
18/02/12 13:36:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-12 13:36:22,211 [main] INFO   - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.

18/02/12 13:37:25 WARN util.ShutdownHookManager: ShutdownHook 'ClientFinalizer' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
{code}
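
A plausible shape for the fix is a read() that advances to the next per-block stream once the current one is exhausted. A minimal, self-contained sketch follows; the class, constructor, and fields are hypothetical stand-ins, not the actual ChunkGroupInputStream members:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Sketch: a grouped stream whose read() walks every per-block stream in the
// key instead of stopping after the first one.
class GroupedBlockInputStream extends InputStream {
  private final List<InputStream> blockStreams; // one stream per block in the key
  private int currentIndex = 0;

  GroupedBlockInputStream(List<InputStream> blockStreams) {
    this.blockStreams = blockStreams;
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int n = read(one, 0, 1);
    return n < 0 ? -1 : (one[0] & 0xff);
  }

  @Override
  public synchronized int read(byte[] b, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    int totalRead = 0;
    while (len > 0 && currentIndex < blockStreams.size()) {
      int numRead = blockStreams.get(currentIndex).read(b, off, len);
      if (numRead < 0) {
        // Current block exhausted; a real implementation would also close it.
        currentIndex++;
        continue;
      }
      totalRead += numRead;
      off += numRead;
      len -= numRead;
    }
    return totalRead == 0 ? -1 : totalRead;
  }
}
{code}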






HDFS Federation and HTTP Kerberos Properties

2018-02-12 Thread Grant Langlois
Hello,

While implementing a secure HDFS setup with federation, I was reviewing the
NameNode properties that can be set on a per-nameservice basis. Omitted from
those properties were dfs.web.authentication.kerberos.principal and
dfs.web.authentication.kerberos.keytab.

Is there a technical reason these must be common across all NameNodes, or is
it just that this hasn't been requested yet? In our case we're using
federation to achieve isolation, so it would make sense to separate the
Kerberos credentials for the HTTP server on a per-NameNode basis as well (a
hypothetical per-nameservice form is sketched below).
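
For illustration only, a per-nameservice variant following the suffix convention that federated properties like dfs.namenode.rpc-address.ns1 already use might look like this (the property name is hypothetical and NOT currently supported; that is exactly the question above):

{code:xml}
<!-- Hypothetical, unsupported today: a per-nameservice override of the HTTP
     principal, using the usual nameservice-suffix convention -->
<property>
  <name>dfs.web.authentication.kerberos.principal.ns1</name>
  <value>HTTP/ns1-nn.example.com@EXAMPLE.COM</value>
</property>
{code}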

Kind Regards,

Grant


[jira] [Created] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13136:
-

 Summary: Avoid taking FSN lock while doing group member lookup for 
FSD permission check
 Key: HDFS-13136
 URL: https://issues.apache.org/jira/browse/HDFS-13136
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The namenode has an FSN lock and an FSD lock. Most namenode operations take
the FSN lock first and then the FSD lock. The permission check is done via
FSPermissionChecker at the FSD layer, on the assumption that the FSN lock is
already held.

The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can
sometimes take seconds. There are external caching schemes such as SSSD and
an internal caching scheme for group lookup, but the delay can still occur
during a cache refresh, causing severe FSN lock contention and an
unresponsive namenode.

Checking the current code, we found that getBlockLocations(..) does this
correctly, but methods such as getFileInfo(..) and getContentSummary(..) do
not. This ticket is opened to ensure the group lookup for the permission
checker happens outside the FSN lock, as sketched below.
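
A minimal sketch of the intended ordering, following the pattern that getBlockLocations(..) already uses (the exact calls at each real call site may differ; treat this as a shape, not the actual patch):

{code:java}
// Build the permission checker BEFORE acquiring the FSN lock. Its constructor
// resolves the caller's groups (callerUgi.getGroups()), which may be slow, so
// doing it here keeps a slow group lookup from stalling every other operation.
FSPermissionChecker pc = fsd.getPermissionChecker();
readLock();
try {
  // ... permission check via pc and the FSD operation itself, under the lock ...
} finally {
  readUnlock();
}
{code}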
 







[jira] [Created] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Sean Mackrory (JIRA)
Sean Mackrory created HDFS-13135:


 Summary: Lease not deleted when deleting INodeReference
 Key: HDFS-13135
 URL: https://issues.apache.org/jira/browse/HDFS-13135
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sean Mackrory
Assignee: Sean Mackrory


While troubleshooting an occurrence of HDFS-13115, there appeared to be
another underlying root cause that should also be addressed: an
INodeReference was deleted, but the lease on it was not subsequently removed
because the inode was never added to the reclaim context.






[jira] [Created] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDFS-13134:
--

 Summary: Ozone: Format open containers on datanode restart
 Key: HDFS-13134
 URL: https://issues.apache.org/jira/browse/HDFS-13134
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Once a datanode is restarted, its open containers should be formatted. Only
the open containers whose pipeline has a replication factor of three need to
be formatted. The format command is sent by the SCM to the datanode after the
corresponding containers have been successfully replicated.






[jira] [Reopened] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reopened HDFS-8693:


> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA
> support,
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh the name nodes on the data nodes after I replaced one name node
> with a new one, so that I would not need to restart the data nodes. However,
> I got the following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code, and the error was thrown by the following code
> snippet, which led me to this JIRA:
> {code:java}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to reconfigure"
>         + " the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, picking up the new name node on a replacement instance is
> critical for auto-provisioning a hadoop cluster with HDFS HA support;
> without it, the HA feature cannot really be used. I also observed that the
> new standby name node on the replacement instance can get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes on a big cluster, for example
> one with 4000 data nodes, let alone that restarting DNs is far too intrusive
> and not a preferable operation in production. It also increases the chance
> of a double failure, because the standby name node is not really ready for a
> failover if the current active name node fails.


