[jira] [Commented] (HDFS-16214) Lock optimization for large deleting, no locks on the collection block

2022-01-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478505#comment-17478505
 ] 

Yuxuan Wang commented on HDFS-16214:


[~zhuxiangyi] I haven't read it entirely, but it looks similar to HDFS-16043?

> Lock optimization for large deleting, no locks on the collection block
> ---
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The time spent in a deletion mainly comes from three pieces of logic: collecting blocks, deleting inodes from the InodeMap, and deleting blocks. The current deletion is divided into two major steps. Step 1 acquires the lock, collects the blocks and inodes, deletes the inodes, and releases the lock. Step 2 acquires the lock, deletes the blocks, and releases the lock.
> Step 2 currently deletes blocks in batches, which bounds the lock-holding time; the blocks could also be deleted asynchronously here.
> Step 1, however, still holds the lock for a long time.
> For stage 1, we can collect the blocks without holding the lock. The process is as follows: step 1 acquires the lock, calls parent.removeChild, writes to the editLog, and releases the lock. Step 2 collects the blocks with no lock held. Step 3 acquires the lock, updates the quota, releases the lease, and releases the lock. Step 4 acquires the lock, deletes the inodes from the InodeMap, and releases the lock. Step 5 acquires the lock, deletes the blocks, and releases the lock.
> There may be some problems with the above process:
> 1. Suppose the file /a/b/c is being written while the directory /a/b is deleted. If the deletion has reached the block-collection stage when the client completes or calls addBlock on /a/b/c, that stage is not locked and the delete of /a/b has already been written to the editLog. When the standby node replays the editLog, the /a/b/c file has already been deleted, so replaying complete /a/b/c fails.
> *The process is as follows:*
> *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
> *replay editLog order:* *delete /a/b/c ->* *delete /a/b ->* *complete /a/b/c {color:#ff0000}(not found){color}*
> 2. If a delete operation has reached the block-collection stage and the administrator then runs saveNamespace and restarts the NameNode, inodes that have already been removed from the parent's childList may remain in the InodeMap.
> To solve the above problems, in step 1 we add the inodes being deleted to a Set. When a file write op occurs (a logAllocateBlockId or logCloseFile editLog record), we check whether the file or one of its parent inodes is in the Set, and throw a FileNotFoundException if it is.
> In addition, saveNamespace must wait until all inodes in the Set have been removed before it executes.
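For illustration, a minimal sketch of the proposed five-step flow (hypothetical method names, assuming FSNamesystem-style writeLock/writeUnlock APIs; not the actual patch):

{code:java}
// Step 1: unlink the subtree and log the delete under the lock.
writeLock();
try {
  parent.removeChild(target);
  getEditLog().logDelete(src);     // hypothetical logging call
} finally {
  writeUnlock();
}

// Step 2: collect blocks with no lock held; concurrent writers are fenced
// off by the "being deleted" Set described above.
List<BlockInfo> blocks = collectBlocks(target);

// Step 3: quota and lease cleanup under the lock.
writeLock();
try {
  updateQuota(parent, target);
  releaseLeases(target);
} finally {
  writeUnlock();
}

// Step 4: drop the inodes from the InodeMap under the lock.
writeLock();
try {
  removeFromInodeMap(target);
} finally {
  writeUnlock();
}

// Step 5: delete the blocks in batches, re-acquiring the lock per batch.
deleteBlocksInBatches(blocks);
{code}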






[jira] [Updated] (HDFS-16284) Unboxing long value in old version's block report cause full GC

2021-10-26 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-16284:
---
Description: 
When upgrading the cluster from 2.6 to 3.1, the namenode fell into full-GC trouble.

The NN runs 3.1 and the DNs run 2.6, so the block report type is longs, not PB, introduced by https://issues.apache.org/jira/browse/HDFS-7435 . So it runs into LongsDecoder.

We found that changing the type of org.apache.hadoop.hdfs.protocol.BlockListAsLongs.LongsDecoder#values from List<Long> to long[] fixes the issue, but I can't say for sure that the Long unboxing causes the full-GC problem.

  was:
When upgrading the cluster from 2.6 to 3.1, the namenode fell into full-GC trouble.

The NN runs 3.1 and the DNs run 2.6, so the block report type is longs, not PB, in 


> Unboxing long value in old version's block report cause full GC
> ---
>
> Key: HDFS-16284
> URL: https://issues.apache.org/jira/browse/HDFS-16284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When upgrading the cluster from 2.6 to 3.1, the namenode fell into full-GC trouble.
> The NN runs 3.1 and the DNs run 2.6, so the block report type is longs, not PB, introduced by https://issues.apache.org/jira/browse/HDFS-7435 . So it runs into LongsDecoder.
> We found that changing the type of org.apache.hadoop.hdfs.protocol.BlockListAsLongs.LongsDecoder#values from List<Long> to long[] fixes the issue, but I can't say for sure that the Long unboxing causes the full-GC problem.
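For illustration, a minimal standalone sketch (not Hadoop code) of the boxing overhead at issue:

{code:java}
// Each element of a List<Long> is a separate heap object: building the list
// allocates one Long per entry, and every read unboxes it. A long[] stores
// the same values inline with no per-element allocation.
List<Long> boxed = new ArrayList<>();
for (long i = 0; i < 1_000_000; i++) {
  boxed.add(i);            // autoboxing allocates a Long
}

long[] primitive = new long[1_000_000];
for (int i = 0; i < primitive.length; i++) {
  primitive[i] = i;        // no allocation
}

long sum = 0;
for (long v : boxed) {     // each iteration unboxes a Long
  sum += v;
}
{code}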






[jira] [Updated] (HDFS-16284) Unboxing long value in old version's block report cause full GC

2021-10-26 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-16284:
---
Description: 
When upgrading the cluster from 2.6 to 3.1, the namenode fell into full-GC trouble.

The NN runs 3.1 and the DNs run 2.6, so the block report type is longs, not PB, in 

> Unboxing long value in old version's block report cause full GC
> ---
>
> Key: HDFS-16284
> URL: https://issues.apache.org/jira/browse/HDFS-16284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When upgrading the cluster from 2.6 to 3.1, the namenode fell into full-GC trouble.
> The NN runs 3.1 and the DNs run 2.6, so the block report type is longs, not PB, 
> in 






[jira] [Created] (HDFS-16284) Unboxing long value in old version's block report cause full GC

2021-10-26 Thread Yuxuan Wang (Jira)
Yuxuan Wang created HDFS-16284:
--

 Summary: Unboxing long value in old version's block report cause 
full GC
 Key: HDFS-16284
 URL: https://issues.apache.org/jira/browse/HDFS-16284
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yuxuan Wang
Assignee: Yuxuan Wang









[jira] [Commented] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2021-04-15 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322581#comment-17322581
 ] 

Yuxuan Wang commented on HDFS-15756:


[~hexiaoqiao] As for YARN, the RM renews the token immediately when a client submits a job, since it wants to verify the token. Maybe Spark is similar?

Our company's zookeeper cluster gained good performance after we backported 
https://issues.apache.org/jira/browse/HADOOP-16828 and 
https://issues.apache.org/jira/browse/HDFS-15383

And to avoid renew failures, I throw a StandbyException here to let the client retry and fail over to another router.

> RBF: Cannot get updated delegation token from zookeeper
> ---
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected versions: all versions with RBF
> When RBF works with the Spark 2.4 client mode, there is a chance that a token goes missing across different nodes in the RBF cluster. The root cause is that Spark renews the token (via the resource manager) immediately after getting one; since ZooKeeper does not give a strong consistency guarantee after an update across the cluster, a ZooKeeper client may read a stale value from a follower not yet synced with the other nodes.
>  
> We applied a patch in Spark, but it is still a problem for RBF. Is it possible for RBF to replace the delegation token store with some other datasource (Redis, for example)?






[jira] [Commented] (HDFS-15756) [RBF] Cannot get updated delegation token from zookeeper

2020-12-30 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256860#comment-17256860
 ] 

Yuxuan Wang commented on HDFS-15756:


I modified the code to make the router throw a StandbyException, so the client can retry and fail over to other routers.


{code:java|title=RouterSecurityManager.java}
renewDelegationToken(...) {
  try {
    ...
  } catch (SecretManager.InvalidToken e) {
    // Let the client retry and fail over to another router.
    throw new StandbyException(e.getMessage());
  }
}
{code}

You can give it a try.

> [RBF] Cannot get updated delegation token from zookeeper
> 
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected versions: all versions with RBF
> When RBF works with the Spark 2.4 client mode, there is a chance that a token goes missing across different nodes in the RBF cluster. The root cause is that Spark renews the token (via the resource manager) immediately after getting one; since ZooKeeper does not give a strong consistency guarantee after an update across the cluster, a ZooKeeper client may read a stale value from a follower not yet synced with the other nodes.
>  
> We applied a patch in Spark, but it is still a problem for RBF. Is it possible for RBF to replace the delegation token store with some other datasource (Redis, for example)?






[jira] [Commented] (HDFS-15735) NameNode memory Leak on frequent execution of fsck

2020-12-17 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251445#comment-17251445
 ] 

Yuxuan Wang commented on HDFS-15735:


[~ayushtkn] I don't find any usage of the tracer. Removing it seems more sensible to me.

> NameNode memory Leak on frequent execution of fsck  
> 
>
> Key: HDFS-15735
> URL: https://issues.apache.org/jira/browse/HDFS-15735
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-15735.001.patch
>
>
> The memory of the cluster NameNode continues to grow, and full GC eventually leads to failure of both the active and standby HDFS NameNodes.
> HTrace is used to track the processing time of fsck.
> Checking the code, we found that the tracer object in NamenodeFsck.java is only created, never closed; because of this, the memory footprint continues to grow.
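For illustration, a minimal sketch of the kind of fix implied above, assuming the HTrace Tracer is java.io.Closeable (as in org.apache.htrace.core):

{code:java}
// Sketch only: close the per-fsck Tracer when fsck finishes.
Tracer tracer = new Tracer.Builder("NamenodeFsck").
    conf(TraceUtils.wrapHadoopConf("namenode.fsck.htrace.", conf)).
    build();
try {
  // ... run the fsck checks under this tracer ...
} finally {
  tracer.close();  // previously missing, so every fsck leaked a Tracer
}
{code}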






[jira] [Created] (HDFS-15734) [READ] DirectoryScanner#scan need not check StorageType.PROVIDED

2020-12-17 Thread Yuxuan Wang (Jira)
Yuxuan Wang created HDFS-15734:
--

 Summary: [READ] DirectoryScanner#scan need not check 
StorageType.PROVIDED
 Key: HDFS-15734
 URL: https://issues.apache.org/jira/browse/HDFS-15734
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yuxuan Wang
Assignee: Yuxuan Wang


Since https://issues.apache.org/jira/browse/HDFS-12777 , there is no PROVIDED storage in the volume report, so we don't need to check for it in DirectoryScanner#scan.






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-10 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247113#comment-17247113
 ] 

Yuxuan Wang commented on HDFS-15383:


[~fengnanli] Thx for your reply. Very clear explanation.

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the total number of watches for tokens is the product of the number of routers and the number of tokens. This is because ZKDelegationTokenManager uses PathChildrenCache from Curator, which automatically sets watches so that ZK pushes sync information to each router. There are evaluations showing that a large number of watches in ZooKeeper has a negative performance impact on the ZooKeeper server.
> In our practice, when the number of watches exceeds 1.2 million on a single ZK server, there is significant ZK performance degradation. This ticket therefore rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the PathChildrenCache and have the Routers sync periodically from ZooKeeper. This has been working fine at the scale of 10 Routers with 2 million tokens.
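For illustration, a minimal sketch of the periodic-sync alternative (hypothetical helper names, assuming a Curator client):

{code:java}
// Sketch only: poll token znodes on a schedule instead of watching them.
ScheduledExecutorService syncer = Executors.newSingleThreadScheduledExecutor();
syncer.scheduleWithFixedDelay(() -> {
  try {
    // Listing children this way sets no watch; rebuild the local cache instead.
    for (String node : curator.getChildren().forPath(TOKENS_PATH)) {
      byte[] data = curator.getData().forPath(TOKENS_PATH + "/" + node);
      updateLocalTokenCache(node, data);  // hypothetical cache-update helper
    }
  } catch (Exception e) {
    LOG.warn("Periodic token sync from ZooKeeper failed", e);
  }
}, 0, syncIntervalMs, TimeUnit.MILLISECONDS);
{code}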






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246982#comment-17246982
 ] 

Yuxuan Wang commented on HDFS-15383:


Hi ~ [~fengnanli] [~elgoiri] [~hexiaoqiao]

After disabling the watcher, tokens in router memory can be stale, and a client may fail authentication if its token has been renewed but the router hasn't rebuilt its cache yet.

Or is there some misunderstanding on my side? Please point it out, thanks!

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the total number of watches for tokens is the product of the number of routers and the number of tokens. This is because ZKDelegationTokenManager uses PathChildrenCache from Curator, which automatically sets watches so that ZK pushes sync information to each router. There are evaluations showing that a large number of watches in ZooKeeper has a negative performance impact on the ZooKeeper server.
> In our practice, when the number of watches exceeds 1.2 million on a single ZK server, there is significant ZK performance degradation. This ticket therefore rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the PathChildrenCache and have the Routers sync periodically from ZooKeeper. This has been working fine at the scale of 10 Routers with 2 million tokens.






[jira] [Commented] (HDFS-15486) Costly sendResponse operation slows down async editlog handling

2020-07-22 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163224#comment-17163224
 ] 

Yuxuan Wang commented on HDFS-15486:



{code:java|title=FSEditLogAsync.java}
public void run() {
  ...
  while ((edit = syncWaitQ.poll()) != null) {
    // Could we parallelize this call?
    edit.logSyncNotify(syncEx);
  }
  ...
}
{code}

Looking forward to your patch.

If I'm wrong, please correct me.

> Costly sendResponse operation slows down async editlog handling
> ---
>
> Key: HDFS-15486
> URL: https://issues.apache.org/jira/browse/HDFS-15486
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Yiqun Lin
>Priority: Major
> Attachments: Async-profile-(2).jpg, async-profile-(1).jpg
>
>
> When our cluster's NameNode is under very high load, we find it often gets stuck in async-editlog handling.
> We used the async-profiler tool to get a flamegraph.
> !Async-profile-(2).jpg!
> This happens when the async editlog thread consumes an Edit from the queue and triggers the sendResponse call.
> But the sendResponse call here is a little expensive, since our cluster has the security env enabled and does some encoding when returning the response.
> We often catch moments of costly sendResponse operations when the RPC call queue is full.
> !async-profile-(1).jpg!
> Slowness in consuming Edits in the async editlog easily fills the Edit pending queue, which then blocks the enqueue operation invoked in writeLock-type methods of the FSNamesystem class.
> The enhancement here is to use multiple threads to execute the sendResponse call in parallel. sendResponse doesn't need the write lock for protection, so this change is safe.
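For illustration, a minimal sketch of the proposed parallelization (hypothetical pool size and wiring, not the actual patch):

{code:java}
// Sketch only: hand logSyncNotify/sendResponse off to a small pool so the
// single async-editlog thread never blocks on an expensive response.
ExecutorService notifyPool = Executors.newFixedThreadPool(4);

while ((edit = syncWaitQ.poll()) != null) {
  final Edit e = edit;
  final RuntimeException ex = syncEx;
  notifyPool.execute(() -> e.logSyncNotify(ex));  // needs no write lock
}
{code}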






[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140277#comment-17140277
 ] 

Yuxuan Wang commented on HDFS-15419:


[~ayushtkn] Thanks for your reply.
IIRC, the router currently retries not only when it catches a StandbyException, but also on some other exceptions like ConnectTimeoutException.
IMO, we can at least improve the retry policy in the router.
And I think adding more retries is not a good fit for this jira.

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  
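For illustration, a minimal sketch of a configurable retry policy of the kind requested (using Hadoop's org.apache.hadoop.io.retry API; the config key names here are made up for the example):

{code:java}
// Sketch only: retry up to N times with a fixed sleep between attempts.
RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
    conf.getInt("dfs.federation.router.connect.max.retries", 3),      // hypothetical key
    conf.getTimeDuration("dfs.federation.router.connect.retry.interval",
        1000, TimeUnit.MILLISECONDS),                                  // hypothetical key
    TimeUnit.MILLISECONDS);
{code}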






[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140249#comment-17140249
 ] 

Yuxuan Wang commented on HDFS-15419:


[~bhji123]
Well, I rather agree with [~ayushtkn], and I think we should remove the retry code currently in the router rather than add more retries to it.
I see [~elgoiri] reviewed the PR. What do you think of Saxena's comment?

> RBF: Router should retry communicate with NN when cluster is unavailable 
> using configurable time interval
> -
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  






[jira] [Commented] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139351#comment-17139351
 ] 

Yuxuan Wang commented on HDFS-15419:


[~ayushtkn] But the router will currently retry or fail over once in the code.
I don't know which patch introduced that. Should we file a jira to remove the logic?

> Router should retry communicate with NN when cluster is unavailable using 
> configurable time interval
> 
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  






[jira] [Commented] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139275#comment-17139275
 ] 

Yuxuan Wang commented on HDFS-15419:


Hi~ [~bhji123]
If the router retries more times and for longer, while the clients' timeouts and retry counts stay as they are, how does that work out?

If the router retries but the NN is still unavailable, the clients will time out and then retry anyway. In that case, what's the difference between letting the router retry and letting the clients retry?

> Router should retry communicate with NN when cluster is unavailable using 
> configurable time interval
> 
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  






[jira] [Assigned] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang reassigned HDFS-15419:
--

Assignee: (was: Yuxuan Wang)

> Router should retry communicate with NN when cluster is unavailable using 
> configurable time interval
> 
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  






[jira] [Assigned] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang reassigned HDFS-15419:
--

Assignee: Yuxuan Wang

> Router should retry communicate with NN when cluster is unavailable using 
> configurable time interval
> 
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration, hdfs-client, rbf
>Reporter: bhji123
>Assignee: Yuxuan Wang
>Priority: Major
>
> When the cluster is unavailable, router -> namenode communication only retries once, without any time interval, which is not reasonable.
> For example, in my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router only retries once without any interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN with a configurable time interval and maximum number of retries.
>  






[jira] [Commented] (HDFS-15021) RBF: Delegation Token can't remove correctly in absence of cancelToken and restart the router.

2019-11-28 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984716#comment-16984716
 ] 

Yuxuan Wang commented on HDFS-15021:


I think TestZKDelegationTokenSecretManager#testNodesLoadedAfterRestart() already covers the case.

> RBF:  Delegation Token can't remove correctly in absence of cancelToken and 
> restart the router.
> ---
>
> Key: HDFS-15021
> URL: https://issues.apache.org/jira/browse/HDFS-15021
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Weidong Duan
>Priority: Major
>
> The ZKDelegationTokenSecretManager can't remove expired DTs from ZooKeeper as expected when the Router is restarted without ZKDelegationTokenSecretManager#cancelToken having been invoked.
> This leaves many stale DTs on ZooKeeper, which may cause performance problems for the Router.
> I think this is a bug and should be resolved later. Is that right?






[jira] [Commented] (HDFS-14962) RBF: ConnectionPool#newConnection() error log wrong protocol class

2019-11-06 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968871#comment-16968871
 ] 

Yuxuan Wang commented on HDFS-14962:


Very *small* PR created.

> RBF: ConnectionPool#newConnection() error log wrong protocol class
> --
>
> Key: HDFS-14962
> URL: https://issues.apache.org/jira/browse/HDFS-14962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Minor
>  Labels: RBF
>
> ConnectionPool#newConnection() has the following code:
> {code:java}
> String msg = "Unsupported protocol for connection to NameNode: "
> + ((proto != null) ? proto.getClass().getName() : "null");
> {code}
> *proto.getClass().getName()* should be *proto.getName()*
> My IDE can figure out the issue.






[jira] [Created] (HDFS-14962) RBF: ConnectionPool#newConnection() error log wrong protocol class

2019-11-06 Thread Yuxuan Wang (Jira)
Yuxuan Wang created HDFS-14962:
--

 Summary: RBF: ConnectionPool#newConnection() error log wrong 
protocol class
 Key: HDFS-14962
 URL: https://issues.apache.org/jira/browse/HDFS-14962
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.3.0
Reporter: Yuxuan Wang
Assignee: Yuxuan Wang


ConnectionPool#newConnection() has the following code:
{code:java}
String msg = "Unsupported protocol for connection to NameNode: "
+ ((proto != null) ? proto.getClass().getName() : "null");
{code}
*proto.getClass().getName()* should be *proto.getName()*

My IDE can figure out the issue.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-10 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948370#comment-16948370
 ] 

Yuxuan Wang commented on HDFS-14509:


I uploaded the branch-2 patch, but I can't build it locally. Please check whether the failed UTs are related, thanks [~vagarychen].
In my practice, the *instanceof* check is always true, but I can't confirm it is true in all cases, so I add the if-condition here.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch, HDFS-14509-branch-2.001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.
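For illustration, a heavily hedged sketch of one direction consistent with the description above (not necessarily the committed patch): verify against the raw identifier bytes carried in the token, so trailing fields that the DN's BlockTokenIdentifier does not yet know about still contribute to the password:

{code:java}
// Sketch only: use token.getIdentifier() (the raw serialized bytes) rather
// than re-serializing the parsed identifier, which drops unknown fields.
public void checkAccess(Token<BlockTokenIdentifier> token, ...) throws InvalidToken {
  BlockTokenIdentifier id = new BlockTokenIdentifier();
  id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
  ...
  byte[] expected = createPassword(token.getIdentifier(), key.getKey());
  if (!Arrays.equals(expected, token.getPassword())) {
    throw new InvalidToken("Block token with " + id + " doesn't have the correct token password");
  }
}
{code}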






[jira] [Updated] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-10 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14509:
---
Attachment: HDFS-14509-branch-2.001.patch

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch, HDFS-14509-branch-2.001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-07 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946443#comment-16946443
 ] 

Yuxuan Wang commented on HDFS-14509:


Sorry for delaying it. I uploaded the 003 patch. Feel free to take over this jira.
[~shv] Thanks for your patch.
[~vagarychen] Thanks for your review.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Updated] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-10-07 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14509:
---
Attachment: HDFS-14509-003.patch

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch, HDFS-14509-002.patch, 
> HDFS-14509-003.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940327#comment-16940327
 ] 

Yuxuan Wang commented on HDFS-14509:


[~ferhui] Oh, that's what the other approach can do, not my PR.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940326#comment-16940326
 ] 

Yuxuan Wang commented on HDFS-14509:


{quote}
As for unit tests I think we need two

that verifies the upgrade from 2.x to 3.x is possible.
that verifies the upgrade from 2.x-1 to 2.x is still possible.
{quote}
I don't understand the comment's meaning. What is 2.x-1? A 2.x patched with this patch?
And isn't the upgrade from 2.x to 3.x currently impossible, rather than possible?
I'll improve the UTs and fix the checkstyle issues together once I figure out how.
I don't know your GitHub ID; feel free to comment on the PR.
Thanks for your review [~shv].

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940277#comment-16940277
 ] 

Yuxuan Wang commented on HDFS-14509:


[~John Smith] If we add more fields in the future, we will still need this patch, so trunk is better.
According to Hadoop's doc, we should upgrade the NN first. At that point the block token will have new fields attached, which DNs that haven't been upgraded yet can't recognize. So we have to backport this to the 2.x branch and upgrade the DNs before upgrading to 3.x.
Or am I missing one of [~shv]'s comments? Can you quote it? [~ferhui]

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940266#comment-16940266
 ] 

Yuxuan Wang commented on HDFS-14509:


[~ferhui] If it's just about resolving the upgrade from 2.x to 3.x, we would need to patch it in before HDFS-6708 and HDFS-9807.
But I think it's a general solution, so I suggest targeting trunk and backporting to the 2.x branch.


> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940253#comment-16940253
 ] 

Yuxuan Wang commented on HDFS-14509:


[~brahmareddy] Yes, existing clusters might need to apply this patch before upgrading.
I have reopened the PR; pending Yetus, we can go ahead.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-25 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938192#comment-16938192
 ] 

Yuxuan Wang commented on HDFS-14509:


OK, I'll update my PR later.
Thanks for [~shv]'s and [~vagarychen]'s comments.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-24 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937300#comment-16937300
 ] 

Yuxuan Wang commented on HDFS-14509:


[~shv] Thanks for your explanation! That's what I meant!
Once someone decides which approach to take, we can follow.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need to upgrade the NN first, so there will be an intermediate state where the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get a block token from the NN and then deliver the token to the DN, which verifies it. But the verification code is currently:
> {code:java|title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>   ...
>   id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
>   ...
>   if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>     throw new InvalidToken("Block token with " + id.toString()
>         + " doesn't have the correct token password");
>   }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code:java}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>   ...
>   return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields and compute the wrong password.






[jira] [Commented] (HDFS-14853) NPE in DFSNetworkTopology#chooseRandomWithStorageType() when the excludedNode is deleted

2019-09-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933463#comment-16933463
 ] 

Yuxuan Wang commented on HDFS-14853:


Should we import .* in TestDFSNetworkTopology.java?

> NPE in DFSNetworkTopology#chooseRandomWithStorageType() when the excludedNode 
> is deleted
> 
>
> Key: HDFS-14853
> URL: https://issues.apache.org/jira/browse/HDFS-14853
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ranith Sardar
>Assignee: Ranith Sardar
>Priority: Major
> Attachments: HDFS-14853.001.patch
>
>
>  
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageType(DFSNetworkTopology.java:229)
>   at org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageType(DFSNetworkTopology.java:77)
> {code}






[jira] [Commented] (HDFS-14117) RBF: We can only delete the files or dirs of one subcluster in a cluster with multiple subclusters when trash is enabled

2019-09-16 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930414#comment-16930414
 ] 

Yuxuan Wang commented on HDFS-14117:


Hi [~elgoiri], [~xuzq_zander], [~ayushtkn]. Are you still working on this? I think the patch is critical for clusters running in the real world.
After going through the related jiras, I can't figure out where we are blocked. I hope someone can remind me, thanks.

> RBF: We can only delete the files or dirs of one subcluster in a cluster with 
> multiple subclusters when trash is enabled
> 
>
> Key: HDFS-14117
> URL: https://issues.apache.org/jira/browse/HDFS-14117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: venkata ramkumar
>Assignee: venkata ramkumar
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14117-HDFS-13891.001.patch, 
> HDFS-14117-HDFS-13891.002.patch, HDFS-14117-HDFS-13891.003.patch, 
> HDFS-14117-HDFS-13891.004.patch, HDFS-14117-HDFS-13891.005.patch, 
> HDFS-14117-HDFS-13891.006.patch, HDFS-14117-HDFS-13891.007.patch, 
> HDFS-14117-HDFS-13891.008.patch, HDFS-14117-HDFS-13891.009.patch, 
> HDFS-14117-HDFS-13891.010.patch, HDFS-14117-HDFS-13891.011.patch, 
> HDFS-14117-HDFS-13891.012.patch, HDFS-14117-HDFS-13891.013.patch, 
> HDFS-14117-HDFS-13891.014.patch, HDFS-14117-HDFS-13891.015.patch, 
> HDFS-14117-HDFS-13891.016.patch, HDFS-14117-HDFS-13891.017.patch, 
> HDFS-14117-HDFS-13891.018.patch, HDFS-14117-HDFS-13891.019.patch, 
> HDFS-14117-HDFS-13891.020.patch, HDFS-14117.001.patch, HDFS-14117.002.patch, 
> HDFS-14117.003.patch, HDFS-14117.004.patch, HDFS-14117.005.patch
>
>
> When we delete files or dirs in HDFS, the deleted files or dirs are moved to 
> trash by default.
> But in the global path we can only mount one trash dir, /user. So we mount 
> the trash dir /user of the subcluster ns1 to the global path /user. Then we 
> can delete files or dirs of ns1, but when we delete the files or dirs of 
> another subcluster, such as hacluster, it fails.
> h1. Mount Table
> ||Global path||Target nameservice||Target path||Order||Read 
> only||Owner||Group||Permission||Quota/Usage||Date Modified||Date Created||
> |/test|hacluster2|/test| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:42|2018/11/29 14:37:42|
> |/tmp|hacluster1|/tmp| | |securedn|users|rwxr-xr-x|[NsQuota: -/-, SsQuota: 
> -/-]|2018/11/29 14:37:05|2018/11/29 14:37:05|
> |/user|hacluster2,hacluster1|/user|HASH| |securedn|users|rwxr-xr-x|[NsQuota: 
> -/-, SsQuota: -/-]|2018/11/29 14:42:37|2018/11/29 14:38:20|
> commands: 
> {noformat}
> 1./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /test/.
> 18/11/30 11:00:47 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 3 securedn supergroup 8081 2018-11-30 10:56 /test/hdfs.cmd
> 2./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -ls /tmp/.
> 18/11/30 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r--   3 securedn supergroup   6311 2018-11-30 10:57 /tmp/mapred.cmd
> 3../opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm 
> /tmp/mapred.cmd
> 18/11/30 11:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> rm: Failed to move to trash: hdfs://router/tmp/mapred.cmd: rename destination 
> parent /user/securedn/.Trash/Current/tmp/mapred.cmd not found.
> 4./opt/HAcluater_ram1/install/hadoop/router/bin> ./hdfs dfs -rm /test/hdfs.cmd
> 18/11/30 11:01:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/11/30 11:01:22 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://router/test/hdfs.cmd' to trash at: 
> hdfs://router/user/securedn/.Trash/Current/test/hdfs.cmd
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-15 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930209#comment-16930209
 ] 

Yuxuan Wang commented on HDFS-14509:


[~ferhui] The client's version is unimportant, since the client just 
"forwards" the block token without reading its fields.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-08-27 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917352#comment-16917352
 ] 

Yuxuan Wang commented on HDFS-14509:


[~shv], we don't have a conclusion here. But my solution needs both 3.x and 
2.x to apply a patch.

I'll attach my patch later.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Attachment: HDFS-14772.004.patch

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch, 
> HDFS-14772.003.patch, HDFS-14772.004.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.
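
The class-initialization rule behind this is easy to demonstrate in isolation. 
Below is a minimal sketch (the {{Keys}} and {{DEFAULTS}} names are invented; 
this is not the real Configuration/RBFConfigKeys API): the static block runs 
only on the class's first active use, which an empty init() method can force.

{code:title=StaticInitSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

public class StaticInitSketch {
  // Stand-in for Configuration's default-resource list.
  static final Map<String, String> DEFAULTS = new HashMap<>();

  // Stand-in for RBFConfigKeys.
  static class Keys {
    static {
      // Runs only when the JVM initializes Keys, i.e. on first active use.
      DEFAULTS.put("default-resource", "hdfs-rbf-site.xml");
    }
    static void init() {
      // Intentionally empty: calling it forces class initialization.
      // This is the HdfsConfiguration-style fix described above.
    }
  }

  public static void main(String[] args) {
    System.out.println(DEFAULTS); // {} -- Keys untouched, static block not run
    Keys.init();                  // first active use triggers the static block
    System.out.println(DEFAULTS); // {default-resource=hdfs-rbf-site.xml}
  }
}
{code}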



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916267#comment-16916267
 ] 

Yuxuan Wang commented on HDFS-14772:


Fixed checkstyle; pending Jenkins.

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch, 
> HDFS-14772.003.patch, HDFS-14772.004.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915679#comment-16915679
 ] 

Yuxuan Wang commented on HDFS-14772:


[~tasanuma] OK, I submitted a new patch.

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch, 
> HDFS-14772.003.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Attachment: HDFS-14772.003.patch

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch, 
> HDFS-14772.003.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915499#comment-16915499
 ] 

Yuxuan Wang commented on HDFS-14772:


[~surendrasingh] [~tasanuma] [~elgoiri] Thanks for your comments. Added a new 
patch.
 Do we need to remove the code here?
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-26 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Attachment: HDFS-14772.002.patch

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch, HDFS-14772.002.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14761:
---
Status: Patch Available  (was: In Progress)

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But *src* is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.
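
A minimal sketch of the argument swap (the helper below mimics the semantics 
that isParentEntry needs here; it is not the real Hadoop method): with the 
cached mount point "/" and the new entry "/foo", the buggy order asks the 
wrong question and leaves the stale cache entry in place.

{code:title=InvalidateCacheSketch.java|borderStyle=solid}
public class InvalidateCacheSketch {
  // True when "parent" is "path" itself or an ancestor directory of it.
  static boolean isParentEntry(String path, String parent) {
    if (!path.startsWith(parent)) {
      return false;
    }
    return path.length() == parent.length()
        || parent.equals("/")
        || path.charAt(parent.length()) == '/';
  }

  public static void main(String[] args) {
    String src = "/";     // cached location's mount point (entry 1->/)
    String path = "/foo"; // newly added mount entry

    // Buggy order from the snippet above: "is /foo an ancestor of /?"
    System.out.println(isParentEntry(src, path)); // false -> stale cache kept

    // Reversed order: "is / an ancestor of /foo?" -> cache gets invalidated
    System.out.println(isParentEntry(path, src)); // true
  }
}
{code}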



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14772 started by Yuxuan Wang.
--
> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Status: Patch Available  (was: In Progress)

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Attachment: HDFS-14772.001.patch

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-14772.001.patch
>
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Description: 
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a method
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public static void init() {
  }
{code}
and call it in other class.

  was:
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a public constructor
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public RBFConfigKeys() {
super();
  }
{code}
and call it in other class.


> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public static void init() {
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Description: 
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a public constructor
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public RBFConfigKeys() {
super();
  }
{code}
and call it in other class.

  was:
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a method
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public HdfsConfiguration() {
super();
  }
}
{code}
and call it in other class.


> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a public constructor
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public RBFConfigKeys() {
> super();
>   }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Description: 
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a method
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public HdfsConfiguration() {
super();
  }
}
{code}
and call it in other class.

  was:
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a method
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public HdfsConfiguration() {
super();
  }
}
and call it in other class.


> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public HdfsConfiguration() {
> super();
>   }
> }
> {code}
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14772) RBF: hdfs-rbf-site.xml can't be loaded automatically

2019-08-23 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14772:
---
Description: 
ISSUE:
hdfs-rbf-site.xml can't be loaded automatically
WHY:
Currently the code is 
{code:title=RBFConfigKeys.java|borderStyle=solid}
  static {
Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
  }
{code}
But it will never be executed unless we explicitly load the class.
HOW TO FIX:
Reference to class *HdfsConfiguration*, make a method
{code:title=RBFConfigKeys.java|borderStyle=solid}
  public HdfsConfiguration() {
super();
  }
}
and call it in other class.
Summary: RBF: hdfs-rbf-site.xml can't be loaded automatically  (was: 
RBF)

> RBF: hdfs-rbf-site.xml can't be loaded automatically
> 
>
> Key: HDFS-14772
> URL: https://issues.apache.org/jira/browse/HDFS-14772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
>
> ISSUE:
> hdfs-rbf-site.xml can't be loaded automatically
> WHY:
> Currently the code is 
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   static {
> Configuration.addDefaultResource(HDFS_RBF_SITE_XML);
>   }
> {code}
> But it will never be executed unless we explicitly load the class.
> HOW TO FIX:
> Following the pattern in class *HdfsConfiguration*, add a method
> {code:title=RBFConfigKeys.java|borderStyle=solid}
>   public HdfsConfiguration() {
> super();
>   }
> }
> and call it from another class.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14772) RBF

2019-08-23 Thread Yuxuan Wang (Jira)
Yuxuan Wang created HDFS-14772:
--

 Summary: RBF
 Key: HDFS-14772
 URL: https://issues.apache.org/jira/browse/HDFS-14772
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: Yuxuan Wang
Assignee: Yuxuan Wang






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-22 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14761 started by Yuxuan Wang.
--
> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But *src* is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-22 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913082#comment-16913082
 ] 

Yuxuan Wang commented on HDFS-14761:


Hi, [~wuweiwei], [~elgoiri]. I see you have worked on *LocationCache* before. 
Do you mind taking a look? I created a GitHub PR here.

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But *src* is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-21 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911964#comment-16911964
 ] 

Yuxuan Wang commented on HDFS-14761:


[~zhangchen] Thanks for your comment. My bad. I'll attach a patch to fix it 
later.

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But *src* is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-20 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14761:
---
Description: 
STEPS TO REPRODUCE:
add mount table entry 1->/
mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
correct
add mount table entry 2->/foo
mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
still return "1->/foo/a"
WHY:
{code:title=MountTableResolver.java|borderStyle=solid}
private void invalidateLocationCache(...)
{
...
String src = loc.getSourcePath();
if (src != null) {
if (isParentEntry(src, path)) {
  LOG.debug("Removing {}", src);
  it.remove();
}
}
...
}
{code}
*path* is the new entry, in our case is "/foo"
But *src* is the mount point path, in our case is "/", which isn't child of 
"/foo"
So, it can't invalidate the cache entry.
HOW TO FIX:
Just reverse the parameters of *isParentEntry* .
PS:
*PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
attached a comment about "Source path in global namespace.". But I think the 
field indeed denotes the mount point path after I review the code. I think it's 
confused.

  was:
STEPS TO REPRODUCE:
add mount table entry 1->/
mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
correct
add mount table entry 2->/foo
mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
still return "1->/foo/a"
WHY:
{code:title=MountTableResolver.java|borderStyle=solid}
private void invalidateLocationCache(...)
{
...
String src = loc.getSourcePath();
if (src != null) {
if (isParentEntry(src, path)) {
  LOG.debug("Removing {}", src);
  it.remove();
}
}
...
}
{code}
*path* is the new entry, in our case is "/foo"
But src is the mount point path, in our case is "/", which isn't child of "/foo"
So, it can't invalidate the cache entry.
HOW TO FIX:
Just reverse the parameters of *isParentEntry* .
PS:
*PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
attached a comment about "Source path in global namespace.". But I think the 
field indeed denotes the mount point path after I review the code. I think it's 
confused.


> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But *src* is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-20 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang reassigned HDFS-14761:
--

Assignee: Yuxuan Wang

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But src is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-20 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911912#comment-16911912
 ] 

Yuxuan Wang commented on HDFS-14761:


Attached a UT patch that reproduces the issue.

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But src is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-20 Thread Yuxuan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14761:
---
Attachment: draft-reproduce-patch-HDFS-14761.patch

> RBF: MountTableResolver cannot invalidate cache correctly
> -
>
> Key: HDFS-14761
> URL: https://issues.apache.org/jira/browse/HDFS-14761
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yuxuan Wang
>Priority: Major
>  Labels: RBF
> Attachments: draft-reproduce-patch-HDFS-14761.patch
>
>
> STEPS TO REPRODUCE:
> add mount table entry 1->/
> mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
> correct
> add mount table entry 2->/foo
> mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
> still returns "1->/foo/a"
> WHY:
> {code:title=MountTableResolver.java|borderStyle=solid}
> private void invalidateLocationCache(...)
> {
> ...
> String src = loc.getSourcePath();
> if (src != null) {
> if (isParentEntry(src, path)) {
>   LOG.debug("Removing {}", src);
>   it.remove();
> }
> }
> ...
> }
> {code}
> *path* is the new entry, in our case "/foo"
> But src is the mount point path, in our case "/", which isn't a child of 
> "/foo"
> So, it can't invalidate the cache entry.
> HOW TO FIX:
> Just reverse the parameters of *isParentEntry*.
> PS:
> *PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
> carries a comment saying "Source path in global namespace.". But after 
> reviewing the code, I think the field actually denotes the mount point path. 
> I find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14761) RBF: MountTableResolver cannot invalidate cache correctly

2019-08-20 Thread Yuxuan Wang (Jira)
Yuxuan Wang created HDFS-14761:
--

 Summary: RBF: MountTableResolver cannot invalidate cache correctly
 Key: HDFS-14761
 URL: https://issues.apache.org/jira/browse/HDFS-14761
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: Yuxuan Wang


STEPS TO REPRODUCE:
add mount table entry 1->/
mountTable.getDestinationForPath("/foo/a") will return "1->/foo/a", that's 
correct
add mount table entry 2->/foo
mountTable.getDestinationForPath("/foo/a") should return "2->/foo/a", but it 
still returns "1->/foo/a"
WHY:
{code:title=MountTableResolver.java|borderStyle=solid}
private void invalidateLocationCache(...)
{
...
String src = loc.getSourcePath();
if (src != null) {
if (isParentEntry(src, path)) {
  LOG.debug("Removing {}", src);
  it.remove();
}
}
...
}
{code}
*path* is the new entry, in our case "/foo"
But src is the mount point path, in our case "/", which isn't a child of "/foo"
So, it can't invalidate the cache entry.
HOW TO FIX:
Just reverse the parameters of *isParentEntry*.
PS:
*PathLocation#getSourcePath()* will return *PathLocation#sourcePath*, which 
carries a comment saying "Source path in global namespace.". But after 
reviewing the code, I think the field actually denotes the mount point path. I 
find it confusing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14756) RBF: getQuotaUsage may ignore some folders

2019-08-20 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911890#comment-16911890
 ] 

Yuxuan Wang edited comment on HDFS-14756 at 8/21/19 2:24 AM:
-

I find that other *startsWith()* usages exist; maybe we can file another jira 
to fix them all at once?


was (Author: john smith):
I find it other *startWith()* exists, maybe we can file another jira to fix 
them once?

> RBF: getQuotaUsage may ignore some folders
> --
>
> Key: HDFS-14756
> URL: https://issues.apache.org/jira/browse/HDFS-14756
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14756.001.patch
>
>
> {{getValidQuotaLocations}} wants to filter duplicate subfolders, but it used 
> the wrong method to determine the parent folder. With this logic, if we have 
> 2 mountpoints like /miui and /miuiads, then /miuiads will be ignored.
> {code:java}
> private List<RemoteLocation> getValidQuotaLocations(String path)
> throws IOException {
>   final List<RemoteLocation> locations = getQuotaRemoteLocations(path);
>   // NameService -> Locations
>   ListMultimap<String, RemoteLocation> validLocations =
>   ArrayListMultimap.create();
>   for (RemoteLocation loc : locations) {
> final String nsId = loc.getNameserviceId();
> final Collection<RemoteLocation> dests = validLocations.get(nsId);
> // Ensure the paths in the same nameservice is different.
> // Do not include parent-child paths.
> boolean isChildPath = false;
> for (RemoteLocation d : dests) {
>   if (StringUtils.startsWith(loc.getDest(), d.getDest())) {
> isChildPath = true;
> break;
>   }
> }
> if (!isChildPath) {
>   validLocations.put(nsId, loc);
> }
>   }
>   return Collections
>   .unmodifiableList(new ArrayList<>(validLocations.values()));
> }
> {code}
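
A minimal sketch of the prefix pitfall with the /miui vs /miuiads pair from 
the description. naiveIsChild reproduces the quoted logic; pathIsChild is one 
possible separator-aware check, not necessarily the fix that was committed.

{code:title=PrefixCheckSketch.java|borderStyle=solid}
public class PrefixCheckSketch {
  // The quoted logic: a bare prefix test on the destination paths.
  static boolean naiveIsChild(String dest, String parent) {
    return dest.startsWith(parent);
  }

  // Separator-aware variant: only true past a path separator.
  static boolean pathIsChild(String dest, String parent) {
    return dest.startsWith(parent + "/");
  }

  public static void main(String[] args) {
    System.out.println(naiveIsChild("/miuiads", "/miui")); // true  -> wrongly filtered
    System.out.println(pathIsChild("/miuiads", "/miui"));  // false -> kept
    System.out.println(pathIsChild("/miui/sub", "/miui")); // true  -> real child
  }
}
{code}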



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-08-20 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911105#comment-16911105
 ] 

Yuxuan Wang commented on HDFS-13596:


[~aajisaka] Additionally, you can check HDFS-8432

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch
>
>
> After a rolling upgrade of the NN from 2.x to 3.x, if the NN is restarted, it 
> fails while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> 
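
The misalignment described in the bullets above can be shown with a toy record 
format (field names invented for illustration; this is not the real 
FSEditLogOp encoding): the writer appends a new erasure-coding byte, and a 
reader that trusts the old layout version misreads every field after it.

{code:title=LayoutSkewSketch.java|borderStyle=solid}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LayoutSkewSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    // Writer uses the *current* (new) format: txid, EC bit, then a path.
    out.writeLong(16389L); // txid
    out.writeByte(1);      // erasure-coding bit, new in "3.x"
    out.writeShort(4);     // length of the path field that follows
    out.writeBytes("/foo");

    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(buf.toByteArray()));
    // Reader trusts the old layout version stamped on the file, so it does
    // not know the EC byte exists and misreads every later field.
    long txid = in.readLong();    // 16389, still correct
    int pathLen = in.readShort(); // reads bytes 0x01 0x00 = 256, not 4
    System.out.println(txid + " " + pathLen); // garbage cascades from here
  }
}
{code}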

[jira] [Commented] (HDFS-14396) Failed to load image from FSImageFile when downgrade from 3.x to 2.x

2019-08-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910978#comment-16910978
 ] 

Yuxuan Wang commented on HDFS-14396:


[~jojochuang] With HDFS-13596, the NN will not accept EC operations and also 
should not persist them until finalization is done.

> Failed to load image from FSImageFile when downgrade from 3.x to 2.x
> 
>
> Key: HDFS-14396
> URL: https://issues.apache.org/jira/browse/HDFS-14396
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14396.001.patch, HDFS-14396.002.patch
>
>
> After fixing HDFS-13596, we tried to downgrade from 3.x to 2.x, but the 
> namenode can't start because an exception occurs. The message follows:
> {code:java}
> 2019-01-23 17:22:18,730 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/data1/hadoopdata/hadoop-namenode/current/fsimage_0025310,
>  cpktTxId=00
> 25310)
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:179)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:885)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:869)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:742)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:998)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:612)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:672)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:839)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1517)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1583)
> 2019-01-23 17:22:19,023 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: Failed to load FSImage file, see error(s) above for more 
> info.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:998)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:612)
> {code}
> This issue occurs because the 3.x namenode saves the image with EC fields 
> during upgrade.
> This patch tries to fix it.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14551) NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x

2019-08-19 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang resolved HDFS-14551.

Resolution: Duplicate

> NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x
> 
>
> Key: HDFS-14551
> URL: https://issues.apache.org/jira/browse/HDFS-14551
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>
> We can downgrade the NN during a rolling upgrade (having run "hdfs dfsadmin 
> -rollingUpgrade prepare") with HDFS-8432 involved. But with HDFS-14172, if the 
> image has any unrecognized section, it will throw an IOException at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> SectionName sectionName = SectionName.fromString(n);
> if (sectionName == null) {
>   throw new IOException("Unrecognized section " + n);
> }
> ..
> }
> {code}
> and an NPE on Hadoop 2.x at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> switch (sectionName)
> ..
> }
> {code}
> When we downgrade the NN from 3.x to 2.x, the NN may load an image saved by a 
> 3.x NN. Then the lack of {{SectionName.ERASURE_CODING}} can break the 2.x NN.
> We should just skip the unrecognized section instead of throwing an exception.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14710) RBF: Improve some RPC performances

2019-08-12 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905170#comment-16905170
 ] 

Yuxuan Wang commented on HDFS-14710:


Before the patch, if I change/add a mount table entry which involves some 
open files, clients may get some unexpected exceptions. E.g. the client calls 
*complete* but the destination has changed; we should resolve to the original 
path, not the new one.
And here, *abandonBlock*, *updateBlockForPipeline* and *updatePipeline* just 
use the parameter *extendedBlock* to determine the destination, as in the 
sketch below.
I mean, I think this is not just improving some RPC performance.
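A minimal sketch of that resolution logic, with hypothetical helper names (not 
the actual router code):
{code:java}
// Prefer the block pool id carried by the extendedBlock over re-resolving the
// (possibly re-mounted) path, so in-flight writes keep their original destination.
RemoteLocation resolveDestination(ExtendedBlock extendedBlock, String srcPath) {
  if (extendedBlock != null) {
    // hypothetical lookup keyed by block pool id
    return resolveByBlockPoolId(extendedBlock.getBlockPoolId());
  }
  // fall back to mount-table resolution of the path
  return resolveByPath(srcPath);
}
{code}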

> RBF: Improve some RPC performances
> --
>
> Key: HDFS-14710
> URL: https://issues.apache.org/jira/browse/HDFS-14710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: xuzq
>Assignee: xuzq
>Priority: Minor
> Attachments: HDFS-14710-trunk-001.patch, HDFS-14710-trunk-002.patch, 
> HDFS-14710-trunk-003.patch
>
>
> We can improve the performance of some RPCs, such as addBlock, 
> getAdditionalDatanode and complete, when the extendedBlock is not null.
> Since HDFS encourages users to write large files, the extendedBlock is not 
> null in most cases.
> In the scenario of multiple destinations and large files, the effect is more 
> obvious.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14654) RBF: TestRouterRpc tests are flaky

2019-08-12 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905153#comment-16905153
 ] 

Yuxuan Wang commented on HDFS-14654:


{code:title=MockResolver.java|borderStyle=solid}
public synchronized Set<FederationNamespaceInfo> getNamespaces()
{code}
Do we really need *synchronized* here?
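If the goal is to drop the lock on the read path, one alternative is 
copy-on-write publication; a sketch under that assumption (not the actual 
MockResolver code):
{code:java}
// Readers take a volatile snapshot, so getNamespaces() needs no synchronization;
// writers copy, mutate, and republish under the lock.
private volatile Set<FederationNamespaceInfo> namespaces = Collections.emptySet();

public Set<FederationNamespaceInfo> getNamespaces() {
  return namespaces;                                 // plain volatile read
}

public synchronized void addNamespace(FederationNamespaceInfo ns) {
  Set<FederationNamespaceInfo> copy = new HashSet<>(namespaces);
  copy.add(ns);
  namespaces = Collections.unmodifiableSet(copy);    // publish a new snapshot
}
{code}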

> RBF: TestRouterRpc tests are flaky
> --
>
> Key: HDFS-14654
> URL: https://issues.apache.org/jira/browse/HDFS-14654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takanobu Asanuma
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14654.001.patch, HDFS-14654.002.patch, error.log
>
>
> They sometimes pass and sometimes fail.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14674) Got an unexpected txid when tail editlog

2019-07-31 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896844#comment-16896844
 ] 

Yuxuan Wang commented on HDFS-14674:


I hit the same problem after HDFS-12978. The fix looks good to me.

> Got an unexpected txid when tail editlog
> 
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Major
> Attachments: HDFS-14674-001.patch, image-2019-07-26-11-34-23-405.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> // code placeholder
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003=232056426162=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232056426162=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232056426162=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003=232077264498=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> / SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> /
> {code}
>  
> If the dfs.ha.tail-edits.max-txns-per-lock value is 500, the namenode loads 
> the edit log until 500 transactions are read and then loads the next edit 
> log, but the current edit log has more than 500 transactions. So the 
> namenode gets an unexpected txid when tailing the edit log.
>  
>  
> {code:java}
> // code placeholder [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003=232056426162=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003=232056426162=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> 

[jira] [Commented] (HDFS-14396) Failed to load image from FSImageFile when downgrade from 3.x to 2.x

2019-07-30 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895893#comment-16895893
 ] 

Yuxuan Wang commented on HDFS-14396:


It looks great! I created HDFS-14551 to do the same thing as you, but you do 
it better, because my patch needs to be backported to 2.x.
Hi, [~xkrogen]. I guess you may be interested in this.

> Failed to load image from FSImageFile when downgrade from 3.x to 2.x
> 
>
> Key: HDFS-14396
> URL: https://issues.apache.org/jira/browse/HDFS-14396
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14396.001.patch, HDFS-14396.002.patch
>
>
> After fixing HDFS-13596, we tried to downgrade from 3.x to 2.x, but the 
> namenode can't start because an exception occurs. The message follows:
> {code:java}
> 2019-01-23 17:22:18,730 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/data1/hadoopdata/hadoop-namenode/current/fsimage_0025310,
>  cpktTxId=00
> 25310)
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:179)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:885)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:869)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:742)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:998)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:612)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:672)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:839)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1517)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1583)
> 2019-01-23 17:22:19,023 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: Failed to load FSImage file, see error(s) above for more 
> info.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:998)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:612)
> {code}
> This issue occurs because the 3.x namenode saves the image with EC fields 
> during upgrade.
> This patch tries to fix it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-07-22 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890619#comment-16890619
 ] 

Yuxuan Wang commented on HDFS-14509:


[~xkrogen], I've tested upgrading/downgrading between 2.x and 3.x in my 
company. It's ok if we patch HDFS-13596, HDFS-14551 and this jira.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. And there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which can verify the token. But the verification in the 
> code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose the fields and 
> compute a wrong password.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-07-18 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888533#comment-16888533
 ] 

Yuxuan Wang commented on HDFS-14509:


I've created a PR reflecting my thoughts.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. And there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which can verify the token. But the verification in the 
> code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose the fields and 
> compute a wrong password.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14551) NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x

2019-07-18 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888494#comment-16888494
 ] 

Yuxuan Wang commented on HDFS-14551:


I have created a PR. And we need to backport it to 2.x, where we support 
downgrading.

Hi, [~brahmareddy]. I see you work on upgrade compatibility. Are you 
interested in this issue?

> NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x
> 
>
> Key: HDFS-14551
> URL: https://issues.apache.org/jira/browse/HDFS-14551
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
>
> We can downgrade the NN during a rolling upgrade (having run "hdfs dfsadmin 
> -rollingUpgrade prepare") with HDFS-8432 involved. But with HDFS-14172, if the 
> image has any unrecognized section, it will throw an IOException at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> SectionName sectionName = SectionName.fromString(n);
> if (sectionName == null) {
>   throw new IOException("Unrecognized section " + n);
> }
> ..
> }
> {code}
> and an NPE on Hadoop 2.x at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> switch (sectionName)
> ..
> }
> {code}
> When we downgrade the NN from 3.x to 2.x, the NN may load an image saved by a 
> 3.x NN. Then the lack of {{SectionName.ERASURE_CODING}} can break the 2.x NN.
> We should just skip the unrecognized section instead of throwing an exception.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14562) The behaviour of getContentSummaryInt() in getQuotaUsage() should be configurable.

2019-06-13 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862769#comment-16862769
 ] 

Yuxuan Wang commented on HDFS-14562:


Hi, [~LiJinglun]
 I think it shouldn't be removed for compatibility.
{code:java|title=PBHelperClient.java|borderStyle=solid}
public static QuotaUsage convert(QuotaUsageProto qu) {
  ..
  if (qu.hasTypeQuotaInfos()) {
addStorageTypes(qu.getTypeQuotaInfos(), builder);
  }
  ..
}
{code}
*I agree with avoiding counting a big directory.* But the newly added config 
item changes {{FileSystem#getQuotaUsage}}, which is a public interface, and 
also changes the semantics of {{dfsadmin -count -u}} and {{-q}} with storage 
types. Will the community accept these changes? Can anyone take a look?
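For illustration, a sketch of why that guard exists on the client side, 
reconstructed from the snippet above (the builder calls are assumptions):
{code:java}
public static QuotaUsage convert(QuotaUsageProto qu) {
  QuotaUsage.Builder builder = new QuotaUsage.Builder()
      .fileAndDirectoryCount(qu.getFileAndDirectoryCount())
      .quota(qu.getQuota())
      .spaceConsumed(qu.getSpaceConsumed())
      .spaceQuota(qu.getSpaceQuota());
  // typeQuotaInfos is optional on the wire: an old server may simply omit it,
  // so the client must guard before reading it.
  if (qu.hasTypeQuotaInfos()) {
    addStorageTypes(qu.getTypeQuotaInfos(), builder);
  }
  return builder.build();
}
{code}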

> The behaviour of getContentSummaryInt() in getQuotaUsage() should be 
> configurable.
> --
>
> Key: HDFS-14562
> URL: https://issues.apache.org/jira/browse/HDFS-14562
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14562.001.patch
>
>
> Our XiaoMi HDFS is considering upgrading from 2.6 to 3.1. There is a problem 
> with the getQuotaUsage rpc. In FSDirStatAndListingOp.getQuotaUsage(), if 
> there isn't any quota on the dir, it will automatically count the dir to get 
> the usage info. But counting big dirs is quite dangerous; it can slow down 
> the NameNode and even cause a failover. We've encountered a case where 10 
> concurrent count rpcs on a big dir caused a NameNode failover.
> In our cluster we always need to check whether the dir has a quota or not, 
> and the automatic count makes things dangerous. Making the behavior 
> configurable seems a good idea. The administrator can decide to fall back to 
> a count or to fill the consumed values with -1 when there is no quota on the 
> dir.
> When I tried to make it configurable, I found another problem. When we 
> convert between QuotaUsageProto and QuotaUsage in PBHelperClient.class, 
> there are checks for qu.hasTypeQuotaInfos() and qu.isTypeQuotaSet() || 
> qu.isTypeConsumedAvailable(). Supposing we want to return a QuotaUsage with 
> \{fileAndDirectoryCount=-1, spaceConsumed=-1, typeConsumed={-1,-1,-1,-1,-1}} 
> from the Namenode to the Client, because of the check, the value got by the 
> Client will be \{fileAndDirectoryCount=-1, spaceConsumed=-1, 
> typeConsumed={0,0,0,0,0}}. It's inconsistent, and I can't see any good 
> reason why spaceConsumed could return -1 while typeConsumed must be 0. In 
> fact we don't need the checks; checking all the assignment statements, we'll 
> find that QuotaUsage.typeConsumed and typeQuota will never be null. And it's 
> not right for the convert layer to tamper with the returned value. Since -1 
> represents undefined in quota and usage, we should remove the check and let 
> the Namenode return -1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14551) NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x

2019-06-06 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857603#comment-16857603
 ] 

Yuxuan Wang commented on HDFS-14551:


I can attach a patch if anyone agrees with me.

> NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x
> 
>
> Key: HDFS-14551
> URL: https://issues.apache.org/jira/browse/HDFS-14551
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
>
> We can downgrade the NN during a rolling upgrade (having run "hdfs dfsadmin 
> -rollingUpgrade prepare") with HDFS-8432 involved. But with HDFS-14172, if the 
> image has any unrecognized section, it will throw an IOException at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> SectionName sectionName = SectionName.fromString(n);
> if (sectionName == null) {
>   throw new IOException("Unrecognized section " + n);
> }
> ..
> }
> {code}
> and an NPE on Hadoop 2.x at
> {code:title=FSImageFormatProtobuf.java|borderStyle=solid}
> private void loadInternal(..) {
> ..
> String n = s.getName();
> switch (sectionName)
> ..
> }
> {code}
> When we downgrade the NN from 3.x to 2.x, the NN may load an image saved by a 
> 3.x NN. Then the lack of {{SectionName.ERASURE_CODING}} can break the 2.x NN.
> We should just skip the unrecognized section instead of throwing an exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14551) NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x

2019-06-06 Thread Yuxuan Wang (JIRA)
Yuxuan Wang created HDFS-14551:
--

 Summary: NN throws NPE if downgrade it during rolling upgrade from 
3.x to 2.x
 Key: HDFS-14551
 URL: https://issues.apache.org/jira/browse/HDFS-14551
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yuxuan Wang


We can downgrade the NN during a rolling upgrade (having run "hdfs dfsadmin 
-rollingUpgrade prepare") with HDFS-8432 involved. But with HDFS-14172, if the 
image has any unrecognized section, it will throw an IOException at
{code:title=FSImageFormatProtobuf.java|borderStyle=solid}
private void loadInternal(..) {
..
String n = s.getName();
SectionName sectionName = SectionName.fromString(n);
if (sectionName == null) {
  throw new IOException("Unrecognized section " + n);
}
..
}
{code}
and an NPE on Hadoop 2.x at
{code:title=FSImageFormatProtobuf.java|borderStyle=solid}
private void loadInternal(..) {
..
String n = s.getName();
switch (sectionName)
..
}
{code}
When we downgrade the NN from 3.x to 2.x, the NN may load an image saved by a 
3.x NN. Then the lack of {{SectionName.ERASURE_CODING}} can break the 2.x NN.
We should just skip the unrecognized section instead of throwing an exception.
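A minimal sketch of that skip, assuming the section header carries its length 
(the stream and field names are assumptions):
{code:java}
SectionName sectionName = SectionName.fromString(n);
if (sectionName == null) {
  LOG.warn("Skipping unrecognized section " + n);
  // advance past the unknown section instead of failing the whole load
  IOUtils.skipFully(in, s.getLength());
  continue;
}
{code}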



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-06-06 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857500#comment-16857500
 ] 

Yuxuan Wang commented on HDFS-14509:


Hi, [~brahmareddy].
If I add code to {{BlockTokenIdentifier#readFields()}} like:
{code:title=BlockTokenIdentifier.java|borderStyle=solid}
public void readFields(DataInput in) throws IOException {
...
this.cache = IOUtils.readFullyToByteArray(dis);
dis.reset();
...
}
{code}
then it works for me.
In my opinion, it doesn't break the token's security. What do you think? Am I 
misunderstanding anything?



> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. And there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which can verify the token. But the verification in the 
> code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose the fields and 
> compute a wrong password.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-05-23 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847190#comment-16847190
 ] 

Yuxuan Wang commented on HDFS-14509:


Hi~, [~kihwal] [~brahmareddy], thanks for the comments. After taking a look, 
it's the same issue as HDFS-6708.
I wonder if we can use {{token.getIdentifier()}} instead of the following when 
computing the password:
{code}
public byte[] retrievePassword(BlockTokenIdentifier identifier)
{
...
return createPassword(identifier.getBytes(), key.getKey());
}
{code}
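A sketch of that variant, not the committed fix: the password is computed over 
the raw identifier bytes the client presented, so fields unknown to the DN 
cannot change the result.
{code:java}
byte[] expected = createPassword(token.getIdentifier(), key.getKey());
if (!Arrays.equals(expected, token.getPassword())) {
  throw new InvalidToken("Block token with " + id.toString()
      + " doesn't have the correct token password");
}
{code}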

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. And there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which can verify the token. But the verification in the 
> code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose the fields and 
> compute a wrong password.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-05-23 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846697#comment-16846697
 ] 

Yuxuan Wang commented on HDFS-14509:


Can anyone take a look? Is it a bug, or am I just missing something?

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Major
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. And there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which can verify the token. But the verification in the 
> code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So, if the NN's identifier adds new fields, the DN will lose the fields and 
> compute a wrong password.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-05-23 Thread Yuxuan Wang (JIRA)
Yuxuan Wang created HDFS-14509:
--

 Summary: DN throws InvalidToken due to inequality of password when 
upgrade NN 2.x to 3.x
 Key: HDFS-14509
 URL: https://issues.apache.org/jira/browse/HDFS-14509
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yuxuan Wang


According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need 
to upgrade the NN first. And there will be an intermediate state where the NN 
is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a 
block, it will get a block token from the NN and then deliver the token to the 
DN, which can verify the token. But the verification in the code now is:
{code:title=BlockTokenSecretManager.java|borderStyle=solid}
public void checkAccess(...)
{
...
id.readFields(new DataInputStream(
    new ByteArrayInputStream(token.getIdentifier())));
...
if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
  throw new InvalidToken("Block token with " + id.toString()
  + " doesn't have the correct token password");
}
}
{code} 
And {{retrievePassword(id)}} is:
{code} 
public byte[] retrievePassword(BlockTokenIdentifier identifier)
{
...
return createPassword(identifier.getBytes(), key.getKey());
}
{code} 
So, if the NN's identifier adds new fields, the DN will lose the fields and 
compute a wrong password.
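For illustration, a sketch of the failure mode, reconstructed from the 
snippets above:
{code:java}
// The 2.x DN deserializes only the fields it knows, then re-serializes them,
// so it signs different bytes than the 3.x NN did.
BlockTokenIdentifier id = new BlockTokenIdentifier();      // 2.x identifier
id.readFields(new DataInputStream(
    new ByteArrayInputStream(token.getIdentifier())));     // extra 3.x fields lost
byte[] dnPassword = createPassword(id.getBytes(), key.getKey());
// dnPassword != token.getPassword() -> InvalidToken
{code}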



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-05-06 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834323#comment-16834323
 ] 

Yuxuan Wang commented on HDFS-14134:


Hi [~lukmajercak]. No. I just figured out that {{StandbyException}} always 
triggers {{FAILOVER_AND_RETRY}}. And with the hedging proxy, the 
{{FAILOVER_AND_RETRY}} action will always be returned and will cover all 
actions that have a lower order than {{FAILOVER_AND_RETRY}}.
 I mean that the order of {{enum RetryDecision}} should perhaps be 
{{FAILOVER_AND_RETRY < RETRY < FAIL}} in patch 007. 
Or in short, {{FAILOVER_AND_RETRY}}'s order should be the lowest; see the 
sketch below.
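A sketch of the precedence issue, assuming decisions are merged by taking the 
highest-ordered one (the enum shape here is an assumption, not the patch 
itself):
{code:java}
// With this ordering, RETRY and FAIL can override FAILOVER_AND_RETRY when
// the merged decision is the maximum ordinal.
enum RetryDecision { FAILOVER_AND_RETRY, RETRY, FAIL }

static RetryDecision merge(RetryDecision a, RetryDecision b) {
  return a.compareTo(b) >= 0 ? a : b;  // higher ordinal wins
}
{code}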

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file 
> does not have the attribute, the NN throws an IOException with the message 
> "could not find attr". The current client retry policy determines the action 
> for that to be FAILOVER_AND_RETRY. The client then fails over and retries 
> until it reaches the maximum number of retries. Supposedly, the client 
> should be able to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over the FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-04-28 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828886#comment-16828886
 ] 

Yuxuan Wang commented on HDFS-14134:


Hello, is anyone working on this? I found a bug in 
{{org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider}}, 
just like [~lukmajercak] said:
{quote}Also note that previously, if a hedging request got FAILOVER_RETRY and 
some request got SocketExc on nonidempotent operation (e.g. FAIL), the client 
would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue 
here as well.
{quote}
But more than this, the standby namenode will always throw back a 
StandbyException, which causes the {{FAILOVER_AND_RETRY}} action. It will 
cover all actions that have a lower order than {{FAILOVER_AND_RETRY}}, such as 
{{RETRY}} in [^HDFS-14134.007.patch].
I mean, the correct order should be {{FAILOVER_AND_RETRY < RETRY < FAIL}}, 
right?

 

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file 
> does not have the attribute, the NN throws an IOException with the message 
> "could not find attr". The current client retry policy determines the action 
> for that to be FAILOVER_AND_RETRY. The client then fails over and retries 
> until it reaches the maximum number of retries. Supposedly, the client 
> should be able to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over the FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-12 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719732#comment-16719732
 ] 

Yuxuan Wang commented on HDFS-14088:


HDFS-14088.006.patch fixes whitespace.

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch, HDFS-14088.005.patch, 
> HDFS-14088.006.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-12 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.006.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch, HDFS-14088.005.patch, 
> HDFS-14088.006.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-12 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719652#comment-16719652
 ] 

Yuxuan Wang commented on HDFS-14088:


Thanks [~elgoiri] for reviewing.
A new patch, HDFS-14088.005.patch, is attached.
Have I written the javadoc comment in the right format?

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch, HDFS-14088.005.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-12 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.005.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch, HDFS-14088.005.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-11 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716799#comment-16716799
 ] 

Yuxuan Wang commented on HDFS-14088:


A new patch, HDFS-14088.004.patch, is attached.
It uses {{LambdaTestUtils#intercept()}} for the test and adds comments to the 
test.
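For reference, the general {{LambdaTestUtils#intercept()}} idiom (the invoked 
call here is hypothetical, not the patch's actual test):
{code:java}
import org.apache.hadoop.test.LambdaTestUtils;

// Fails the test unless the lambda throws the expected exception type.
LambdaTestUtils.intercept(NullPointerException.class,
    () -> proxyProvider.getProxy().proxy.getStats());
{code}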

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-12-11 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.004.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch, HDFS-14088.004.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-26 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698660#comment-16698660
 ] 

Yuxuan Wang commented on HDFS-14088:


Hi, [~elgoiri], thanks for the comment.
 I modified the UT and added the comments you suggested.
 There was some confusion: in my patch, the variable {{currentUsedHandler}} is 
a field of {{RequestHedgingProxyProvider}}, and {{currentUsedProxy}} is a 
field of {{RequestHedgingInvocationHandler}}.

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-26 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.003.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch, 
> HDFS-14088.003.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-20 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694217#comment-16694217
 ] 

Yuxuan Wang commented on HDFS-14088:


Thanks Íñigo Goiri for reviewing.
Here's a new patch, HDFS-14088.002.patch, for trunk.

I'm sorry about not explaining clearly. The patch avoids synchronizing the 
whole thing by using double-checked locking.

The original problem is that, after performFailover() and getProxy() are 
called, the former RequestHedgingInvocationHandler instances, which should be 
deprecated, can still access currentUsedProxy. A handler can pass the 
if-condition
{code:java}
currentUsedProxy != null
{code} 
but immediately afterwards the field turns to null because somebody called 
performFailover().

My idea is to let the RequestHedgingInvocationHandler hold the 
currentUsedProxy and the RequestHedgingProxyProvider hold the 
currentUsedHandler, which wraps the RequestHedgingInvocationHandler into a 
proxy as before. A failover sets currentUsedHandler to null, and getProxy() 
assigns a new RequestHedgingInvocationHandler to it and returns it. So a 
deprecated handler can no longer access a null currentUsedProxy after 
performFailover() is called.

For the unit test, I added some metric checks. The test's idea is to mock a 
call that sleeps, then call performFailover(). In the original code 
{code:java}
if (currentUsedProxy != null) {
  try {
    Object retVal = method.invoke(currentUsedProxy.proxy, args);
    LOG.debug("Invocation successful on [{}]",
        currentUsedProxy.proxyInfo);
    return retVal;
  } 
{code}
the debug log can throw a NullPointerException.
I know the idea for the unit test is a little tricky, but I can't figure out a 
better one.

Also reformatted the code.
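A minimal sketch of the scheme described above, using the names from this 
comment (a reconstruction, not the patch itself):
{code:java}
private volatile RequestHedgingInvocationHandler currentUsedHandler;

public synchronized void performFailover(T currentProxy) {
  currentUsedHandler = null;            // force getProxy() to build a fresh handler
}

public ProxyInfo<T> getProxy() {
  RequestHedgingInvocationHandler handler = currentUsedHandler;
  if (handler == null) {                // first check, no lock
    synchronized (this) {
      handler = currentUsedHandler;
      if (handler == null) {            // second check, under the lock
        handler = new RequestHedgingInvocationHandler(targetProxies); // hypothetical ctor
        currentUsedHandler = handler;
      }
    }
  }
  return handler.getProxyInfo();        // hypothetical accessor
}
{code}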

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
> try {
>   Object retVal = method.invoke(currentUsedProxy.proxy, args);
>   LOG.debug("Invocation successful on [{}]",
>   currentUsedProxy.proxyInfo);
> {code}
> If a thread runs the try block and then another thread triggers a failover by 
> calling the method
> {code:java}
> @Override
>   public synchronized void performFailover(T currentProxy) {
> toIgnore = this.currentUsedProxy.proxyInfo;
> this.currentUsedProxy = null;
>   }
> {code}
> It will set currentUsedProxy to null, and the first thread can throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-20 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.002.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch, HDFS-14088.002.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
>   try {
>     Object retVal = method.invoke(currentUsedProxy.proxy, args);
>     LOG.debug("Invocation successful on [{}]",
>         currentUsedProxy.proxyInfo);
> {code}
> If one thread is running the try block and another thread then triggers a 
> failover by calling this method
> {code:java}
> @Override
> public synchronized void performFailover(T currentProxy) {
>   toIgnore = this.currentUsedProxy.proxyInfo;
>   this.currentUsedProxy = null;
> }
> {code}
> it will set currentUsedProxy to null, and the first thread can then throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-20 Thread Yuxuan Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxuan Wang updated HDFS-14088:
---
Attachment: HDFS-14088.001.patch

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Major
> Attachments: HDFS-14088.001.patch
>
>
> {code:java}
> if (currentUsedProxy != null) {
>   try {
>     Object retVal = method.invoke(currentUsedProxy.proxy, args);
>     LOG.debug("Invocation successful on [{}]",
>         currentUsedProxy.proxyInfo);
> {code}
> If one thread is running the try block and another thread then triggers a 
> failover by calling this method
> {code:java}
> @Override
> public synchronized void performFailover(T currentProxy) {
>   toIgnore = this.currentUsedProxy.proxyInfo;
>   this.currentUsedProxy = null;
> }
> {code}
> it will set currentUsedProxy to null, and the first thread can then throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-19 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691488#comment-16691488
 ] 

Yuxuan Wang commented on HDFS-14088:


I'll attach a patch later.

> RequestHedgingProxyProvider can throw NullPointerException when failover due 
> to no lock on currentUsedProxy
> ---
>
> Key: HDFS-14088
> URL: https://issues.apache.org/jira/browse/HDFS-14088
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Yuxuan Wang
>Priority: Major
>
> {code:java}
> if (currentUsedProxy != null) {
>   try {
>     Object retVal = method.invoke(currentUsedProxy.proxy, args);
>     LOG.debug("Invocation successful on [{}]",
>         currentUsedProxy.proxyInfo);
> {code}
> If one thread is running the try block and another thread then triggers a 
> failover by calling this method
> {code:java}
> @Override
> public synchronized void performFailover(T currentProxy) {
>   toIgnore = this.currentUsedProxy.proxyInfo;
>   this.currentUsedProxy = null;
> }
> {code}
> it will set currentUsedProxy to null, and the first thread can then throw a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14088) RequestHedgingProxyProvider can throw NullPointerException when failover due to no lock on currentUsedProxy

2018-11-19 Thread Yuxuan Wang (JIRA)
Yuxuan Wang created HDFS-14088:
--

 Summary: RequestHedgingProxyProvider can throw 
NullPointerException when failover due to no lock on currentUsedProxy
 Key: HDFS-14088
 URL: https://issues.apache.org/jira/browse/HDFS-14088
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Yuxuan Wang



{code:java}
if (currentUsedProxy != null) {
  try {
    Object retVal = method.invoke(currentUsedProxy.proxy, args);
    LOG.debug("Invocation successful on [{}]",
        currentUsedProxy.proxyInfo);
{code}
If one thread is running the try block and another thread then triggers a 
failover by calling this method
{code:java}
@Override
public synchronized void performFailover(T currentProxy) {
  toIgnore = this.currentUsedProxy.proxyInfo;
  this.currentUsedProxy = null;
}
{code}
it will set currentUsedProxy to null, and the first thread can then throw a 
NullPointerException.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org