suman kumari created HBASE-23122:
------------------------------------

             Summary: FSHDFSUtils#isSameHdfs doesn't handle Azure wasb filesystems correctly.
                 Key: HBASE-23122
                 URL: https://issues.apache.org/jira/browse/HBASE-23122
             Project: HBase
          Issue Type: Bug
          Components: Filesystem Integration
            Reporter: suman kumari


FSHDFSUtils#isSameHdfs retrieves the canonical service name from Hadoop to 
determine whether the source and destination are on the same filesystem. The 
method getCanonicalServiceName() returns an IP address for the filesystem, and 
two different filesystems (for example, two separate storage accounts) can 
resolve to the same IP address. In that case isSameHdfs incorrectly returns 
true even though the filesystems are different.
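
To make the failure mode concrete, here is a minimal, dependency-free Java sketch. It assumes, as the report describes, that the canonical service name reduces to a resolved IP address plus a port. The DNS table, hostnames, IP address and port are hypothetical, and the methods are simplified stand-ins, not the actual Hadoop or HBase code.

```java
import java.net.URI;
import java.util.Map;

class CanonicalNameCollision {
    // Hypothetical DNS table: two distinct storage accounts behind one shared front-end IP.
    static final Map<String, String> DNS = Map.of(
        "account1.blob.core.windows.net", "10.0.0.5",
        "account2.blob.core.windows.net", "10.0.0.5");

    // Simplified stand-in for a canonical service name: "ip:port" derived from the URI host.
    static String canonicalServiceName(URI fsUri, int defaultPort) {
        String ip = DNS.getOrDefault(fsUri.getHost(), fsUri.getHost());
        int port = fsUri.getPort() == -1 ? defaultPort : fsUri.getPort();
        return ip + ":" + port;
    }

    // Simplified model of the check described above: equal service names mean "same filesystem".
    static boolean isSameFsByServiceName(URI src, URI dst, int defaultPort) {
        return canonicalServiceName(src, defaultPort)
            .equals(canonicalServiceName(dst, defaultPort));
    }

    public static void main(String[] args) {
        URI src = URI.create("wasb://data@account1.blob.core.windows.net/");
        URI dst = URI.create("wasb://data@account2.blob.core.windows.net/");
        // Prints true, even though the two storage accounts are different.
        System.out.println(isSameFsByServiceName(src, dst, 443));
    }
}
```

Because the host is collapsed to an address before comparison, anything that distinguishes the two accounts (their hostnames) is lost.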

It appears this API should not be used to check whether the source and target 
are on the same filesystem. According to the Hadoop API 
[declaration|https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/fs/FileSystem.html#getCanonicalServiceName--], 
the token cache is the *only user* of the canonical service name, and uses it 
to look up this FileSystem's service tokens.
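
One possible direction, sketched here only as an illustration and not as the fix adopted for this issue, is to compare the filesystem URIs (scheme plus authority) instead of the token-cache service name. In Hadoop that URI would come from FileSystem#getUri(); the class and method names below are hypothetical.

```java
import java.net.URI;

class UriBasedSameFs {
    // Hypothetical alternative check: two filesystems match only if their URIs share
    // scheme and authority (for wasb, the authority is container@account-host[:port]).
    static boolean isSameFsByUri(URI src, URI dst) {
        return sameIgnoringCase(src.getScheme(), dst.getScheme())
            && sameIgnoringCase(src.getAuthority(), dst.getAuthority());
    }

    // Null-safe, case-insensitive string comparison.
    private static boolean sameIgnoringCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI a1 = URI.create("wasb://data@account1.blob.core.windows.net/");
        URI a2 = URI.create("wasb://data@account2.blob.core.windows.net/");
        URI a1Again = URI.create("WASB://data@Account1.blob.core.windows.net/hbase");
        System.out.println(isSameFsByUri(a1, a2));      // false: different storage accounts
        System.out.println(isSameFsByUri(a1, a1Again)); // true: same scheme and authority
    }
}
```

The sketch ignores the path and compares case-insensitively. It does not treat an explicit default port (host versus host:port) as equal, and it does not resolve aliases such as two names for the same HDFS cluster, which a real fix would need to consider.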

This error was found while running an HBase bulk load from one filesystem to 
another. Because getCanonicalServiceName() returned the same address for both 
storage accounts, the two filesystems were identified as the same filesystem. 
When the HBase bulk load command runs, it then looks for the file on the default 
filesystem and fails with a FileNotFoundException.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)