[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Status: Patch Available  (was: In Progress)

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
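For context, the zero-copy primitive HDFS reaches from the JVM today is FileChannel.transferTo, which maps to sendfile(2) on Linux; TransmitFile would be the Windows-native analogue at the same call site. A minimal, self-contained sketch of the transferTo pattern (class and helper names here are illustrative, not from the patch):

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {

    // Push a file's contents to a channel without staging the bytes in a
    // user-space buffer. On Linux the JVM implements this with sendfile(2);
    // a Windows port could route the same call site to TransmitFile.
    static long transfer(Path file, WritableByteChannel out) throws Exception {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = in.size();
            // transferTo may move fewer bytes than asked, so loop until done.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("blk", ".dat");
        Files.write(tmp, "hello datanode".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = transfer(tmp, Channels.newChannel(sink));
        System.out.println(n + " bytes: " + sink.toString("UTF-8"));
        Files.delete(tmp);
    }
}
```

The loop matters: like sendfile, transferTo is allowed to return a short count, and a TransmitFile-backed implementation would need the same retry discipline.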
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Component/s: datanode

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Labels: NIO Windows datanode  (was: )

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile
[jira] [Work started] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDFS-15151 started by Lukas Majercak.
---------------------------------------------

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Affects Version/s: 3.4.0
                       3.3.1
                       3.2.2
                       3.1.4
                       3.3.0

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Description: 
Proposing to give an option to use TransmitFile Windows function for file to socket data transfer.

https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile

  was:
Proposing to give an option to use TransmitFile Windows function for file to socket data transfer.

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Description: Proposing to give an option to use TransmitFile

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
>
> Proposing to give an option to use TransmitFile
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15151:
----------------------------------
    Description: Proposing to give an option to use TransmitFile Windows function for file to socket data transfer.

  was:Proposing to give an option to use TransmitFile

> Use TransmitFile for file to socket data transfer
> -------------------------------------------------
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to
> socket data transfer.
[jira] [Created] (HDFS-15151) Use TransmitFile for file to socket data transfer
Lukas Majercak created HDFS-15151:
-------------------------------------

             Summary: Use TransmitFile for file to socket data transfer
                 Key: HDFS-15151
                 URL: https://issues.apache.org/jira/browse/HDFS-15151
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Lukas Majercak
            Assignee: Lukas Majercak
[jira] [Updated] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15055:
----------------------------------
    Priority: Minor  (was: Major)

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Minor
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
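To make the cloning concern concrete, here is a small illustrative sketch (not HDFS code; class and method names are invented) contrasting the worst case the ticket describes, copying the caller's whole buffer per hedged request, with a scratch buffer sized only to the requested length, which the closing comments say DFSInputStream actually allocates:

```java
import java.nio.ByteBuffer;

public class HedgedBufferDemo {

    // Worst case the ticket worries about: duplicate the caller's entire
    // buffer (which could be hundreds of MB) for every hedged request.
    static ByteBuffer cloneWhole(ByteBuffer user) {
        ByteBuffer copy = ByteBuffer.allocate(user.capacity());
        ByteBuffer src = user.duplicate(); // don't disturb the caller's position/limit
        src.clear();                       // cover the full capacity
        copy.put(src);
        copy.flip();
        return copy;
    }

    // What the resolution comment describes: a scratch buffer sized only to
    // the length requested from the datanode, independent of the caller's
    // buffer size.
    static ByteBuffer forRequest(int lenRequested) {
        return ByteBuffer.allocate(lenRequested);
    }

    public static void main(String[] args) {
        ByteBuffer user = ByteBuffer.allocate(128 * 1024 * 1024); // 128 MB caller buffer
        System.out.println(cloneWhole(user).capacity());   // full-size copy
        System.out.println(forRequest(64 * 1024).capacity()); // 64 KB scratch only
    }
}
```

The follow-up comment's residual concern still shows up in this model: even the length-sized allocation can be up to a block size per hedged request.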
[jira] [Reopened] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak reopened HDFS-15055:
-----------------------------------

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
[jira] [Commented] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995256#comment-16995256 ]

Lukas Majercak commented on HDFS-15055:
---------------------------------------

Although I feel like this could still be an issue: potentially we'll create up to a blocksize-sized buffer for every single hedged request.

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
[jira] [Commented] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995254#comment-16995254 ]

Lukas Majercak commented on HDFS-15055:
---------------------------------------

Closing, as we actually create a separate buffer for the length requested.

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
[jira] [Resolved] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak resolved HDFS-15055.
-----------------------------------
    Resolution: Not A Problem

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
[jira] [Updated] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15055:
----------------------------------
    Description: Currently, DFSInputStream clones the buffer passed from the caller for every request, this can have severe impact on the performance.

  was: Currently, DFSInputStream clones the buffer passed from the caller for every request, this can have severe impact on the performance (imagine cloning a 1GB buffer).

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance.
[jira] [Updated] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15055:
----------------------------------
    Priority: Major  (was: Critical)

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Major
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance (imagine cloning a
> 1GB buffer).
[jira] [Updated] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15055:
----------------------------------
    Description: Currently, DFSInputStream clones the buffer passed from the caller for every request, this can have severe impact on the performance (imagine cloning a 1GB buffer).

  was: _emphasized text_

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Critical
>
> Currently, DFSInputStream clones the buffer passed from the caller for every
> request, this can have severe impact on the performance (imagine cloning a
> 1GB buffer).
[jira] [Updated] (HDFS-15055) Hedging clones client's buffer
[ https://issues.apache.org/jira/browse/HDFS-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Majercak updated HDFS-15055:
----------------------------------
    Description: _emphasized text_

> Hedging clones client's buffer
> ------------------------------
>
> Key: HDFS-15055
> URL: https://issues.apache.org/jira/browse/HDFS-15055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.9.2, 3.3.0, 3.2.1, 2.9.3, 3.2.2
> Reporter: Lukas Majercak
> Priority: Critical
>
> _emphasized text_
[jira] [Created] (HDFS-15055) Hedging clones client's buffer
Lukas Majercak created HDFS-15055:
-------------------------------------

             Summary: Hedging clones client's buffer
                 Key: HDFS-15055
                 URL: https://issues.apache.org/jira/browse/HDFS-15055
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client
    Affects Versions: 3.2.1, 2.9.2, 3.3.0, 2.9.3, 3.2.2
            Reporter: Lukas Majercak
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941143#comment-16941143 ]

Lukas Majercak commented on HDFS-14882:
---------------------------------------

Overall looks okay to me; seems like an improvement of dfs.namenode.avoid.read.highload.datanode (+ .threshold). I only wish we could also use some sort of an estimate of load that we've already scheduled on each DN, not just xceivers reported by them.

> Consider DataNode load when #getBlockLocation
> ---------------------------------------------
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-14882.001.patch
>
> Currently, we consider load of datanode when #chooseTarget for writer,
> however not consider it for reader. Thus, the process slot of datanode could
> be occupied by #BlockSender for reader, and disk/network will be busy
> workload, then meet some slow node exception. IIRC same case is reported
> times. Based on the fact, I propose to consider load for reader same as it
> did #chooseTarget for writer.
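The idea under discussion, avoiding overloaded datanodes on the read path the way #chooseTarget already does for writes, can be sketched as below. This is a hypothetical toy model, not NameNode code; the class, method, and threshold semantics are invented for illustration, loosely echoing dfs.namenode.avoid.read.highload.datanode (+ .threshold):

```java
import java.util.ArrayList;
import java.util.List;

public class ReadLoadFilterDemo {

    // Minimal stand-in for a datanode descriptor: just a name and the
    // xceiver count it reported in its last heartbeat.
    static class DataNode {
        final String name;
        final int xceiverCount;
        DataNode(String name, int xceiverCount) {
            this.name = name;
            this.xceiverCount = xceiverCount;
        }
    }

    // When handing block locations to a reader, skip any node whose reported
    // load exceeds threshold * average load across the candidates.
    static List<DataNode> filterForRead(List<DataNode> nodes, double threshold) {
        double avg = nodes.stream()
                .mapToInt(d -> d.xceiverCount)
                .average()
                .orElse(0);
        List<DataNode> usable = new ArrayList<>();
        for (DataNode d : nodes) {
            if (d.xceiverCount <= avg * threshold) {
                usable.add(d);
            }
        }
        return usable;
    }
}
```

The comment's caveat maps directly onto this model: xceiverCount here is only what the DN last reported, so load the NameNode has scheduled since that heartbeat is invisible to the filter.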
[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
[ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925976#comment-16925976 ]

Lukas Majercak commented on HDFS-12288:
---------------------------------------

[~zhangchen] I'm not working on this right now, feel free to pick it up.

> Fix DataNode's xceiver count calculation
> ----------------------------------------
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch
>
> The problem with the ThreadGroup.activeCount() method is that the method is
> only a very rough estimate, and in reality returns the total number of
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this return 50~ for a long time, even though the
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value
> which only accounts for the actual number of DataXceiver threads currently
> running and thus represents the load on the DN much better.
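The counting difference the ticket describes can be sketched as a toy model (this is not DataNode code; the worker helper is invented): ThreadGroup.activeCount() is documented as only an estimate and counts every live thread in the group, while an explicit counter maintained at thread entry and exit, the dataNodeActiveXceiversCount approach, tracks the running transfer threads exactly:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class XceiverCountDemo {

    // Explicit load counter: incremented when a transfer thread starts real
    // work and decremented when it finishes, like
    // DataNodeMetrics.dataNodeActiveXceiversCount. Unlike
    // ThreadGroup.activeCount(), it never counts idle or unrelated threads
    // that merely live in the same thread group.
    static final AtomicInteger active = new AtomicInteger();

    static Thread startWorker(Runnable work) {
        Thread t = new Thread(() -> {
            active.incrementAndGet();
            try {
                work.run(); // simulated DataXceiver transfer
            } finally {
                active.decrementAndGet(); // always runs, even on failure
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startWorker(() -> { /* pretend to stream a block */ });
        t.join();
        // After join, the explicit counter is exactly zero; a
        // ThreadGroup-based estimate could still report stale threads.
        System.out.println(active.get());
    }
}
```

The try/finally placement is the important part: the decrement must be unconditional, or the counter drifts upward exactly the way the mis-counted xceivers did.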
[jira] [Commented] (HDFS-14545) RBF: Router should support GetUserMappingsProtocol
[ https://issues.apache.org/jira/browse/HDFS-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863822#comment-16863822 ]

Lukas Majercak commented on HDFS-14545:
---------------------------------------

Thanks [~ayushtkn], LGTM.

> RBF: Router should support GetUserMappingsProtocol
> --------------------------------------------------
>
> Key: HDFS-14545
> URL: https://issues.apache.org/jira/browse/HDFS-14545
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14545-HDFS-13891-01.patch, HDFS-14545-HDFS-13891-02.patch,
> HDFS-14545-HDFS-13891-03.patch, HDFS-14545-HDFS-13891-04.patch,
> HDFS-14545-HDFS-13891-05.patch, HDFS-14545-HDFS-13891-06.patch,
> HDFS-14545-HDFS-13891-07.patch, HDFS-14545-HDFS-13891-08.patch,
> HDFS-14545-HDFS-13891-09.patch, HDFS-14545-HDFS-13891-10.patch,
> HDFS-14545-HDFS-13891.000.patch
>
> We should be able to check the groups for a user from a Router.
[jira] [Commented] (HDFS-14545) RBF: Router should support GetUserMappingsProtocol
[ https://issues.apache.org/jira/browse/HDFS-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858912#comment-16858912 ]

Lukas Majercak commented on HDFS-14545:
---------------------------------------

ConnectionPool lines 410, 411: would be nice to either change "clazz0" to something like "clazzProtoPb" or remove these variables altogether.

Nitpicks:
- RouterRpcServer line 361: missing space before "="
- RouterUserProtocol line 45: you don't need .getName() there? Also, maybe separate static and non-static members visually.
- TestRouterUserMappings line 295: the assert length == 2 seems kinda vague; can we pass in the actual groups?

> RBF: Router should support GetUserMappingsProtocol
> --------------------------------------------------
>
> Key: HDFS-14545
> URL: https://issues.apache.org/jira/browse/HDFS-14545
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14545-HDFS-13891-01.patch, HDFS-14545-HDFS-13891-02.patch,
> HDFS-14545-HDFS-13891-03.patch, HDFS-14545-HDFS-13891-04.patch,
> HDFS-14545-HDFS-13891-05.patch, HDFS-14545-HDFS-13891-06.patch,
> HDFS-14545-HDFS-13891-07.patch, HDFS-14545-HDFS-13891-08.patch,
> HDFS-14545-HDFS-13891-09.patch, HDFS-14545-HDFS-13891.000.patch
>
> We should be able to check the groups for a user from a Router.
[jira] [Commented] (HDFS-14447) RBF: Router should support RefreshUserMappingsProtocol
[ https://issues.apache.org/jira/browse/HDFS-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841554#comment-16841554 ]

Lukas Majercak commented on HDFS-14447:
---------------------------------------

patch09 LGTM.

> RBF: Router should support RefreshUserMappingsProtocol
> ------------------------------------------------------
>
> Key: HDFS-14447
> URL: https://issues.apache.org/jira/browse/HDFS-14447
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Affects Versions: 3.1.0
> Reporter: Shen Yinjie
> Assignee: Shen Yinjie
> Priority: Major
> Fix For: HDFS-13891
> Attachments: HDFS-14447-HDFS-13891.01.patch, HDFS-14447-HDFS-13891.02.patch,
> HDFS-14447-HDFS-13891.03.patch, HDFS-14447-HDFS-13891.04.patch,
> HDFS-14447-HDFS-13891.05.patch, HDFS-14447-HDFS-13891.06.patch,
> HDFS-14447-HDFS-13891.07.patch, HDFS-14447-HDFS-13891.08.patch,
> HDFS-14447-HDFS-13891.09.patch, error.png
>
> HDFS with RBF
> We configure hadoop.proxyuser.xx.yy, then execute hdfs dfsadmin
> -Dfs.defaultFS=hdfs://router-fed -refreshSuperUserGroupsConfiguration,
> it throws "Unknown protocol: ...RefreshUserMappingProtocol".
> RouterAdminServer should support RefreshUserMappingsProtocol, or a proxyuser
> client would be refused to impersonate. As shown in the screenshot.
[jira] [Commented] (HDFS-14447) RBF: Router should support RefreshUserMappingsProtocol
[ https://issues.apache.org/jira/browse/HDFS-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839720#comment-16839720 ]

Lukas Majercak commented on HDFS-14447:
---------------------------------------

There are a couple of syntax inconsistencies in 06.patch, such as lines 303, 317, 369, 370, 375, 380 in TestRefreshUserMappingsWithRouters. But other than that, the patch LGTM.

> RBF: Router should support RefreshUserMappingsProtocol
> ------------------------------------------------------
>
> Key: HDFS-14447
> URL: https://issues.apache.org/jira/browse/HDFS-14447
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Affects Versions: 3.1.0
> Reporter: Shen Yinjie
> Assignee: Shen Yinjie
> Priority: Major
> Fix For: HDFS-13891
> Attachments: HDFS-14447-HDFS-13891.01.patch, HDFS-14447-HDFS-13891.02.patch,
> HDFS-14447-HDFS-13891.03.patch, HDFS-14447-HDFS-13891.04.patch,
> HDFS-14447-HDFS-13891.05.patch, HDFS-14447-HDFS-13891.06.patch, error.png
>
> HDFS with RBF
> We configure hadoop.proxyuser.xx.yy, then execute hdfs dfsadmin
> -Dfs.defaultFS=hdfs://router-fed -refreshSuperUserGroupsConfiguration,
> it throws "Unknown protocol: ...RefreshUserMappingProtocol".
> RouterAdminServer should support RefreshUserMappingsProtocol, or a proxyuser
> client would be refused to impersonate. As shown in the screenshot.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834936#comment-16834936 ]

Lukas Majercak commented on HDFS-14134:
---------------------------------------

Hi [~John Smith]. I'm not sure if it's that simple. Say you have 2 NNs, where nn1 is active and nn2 is standby. If your current target is nn2, but we send a request to both, you can get responses like:

nn1 - RETRY
nn2 - RETRY_AND_FAILOVER

In this case, the ideal scenario would be to failover to nn1, but what you're proposing would not do that. I think this obviously has room for improvement, as in some cases RETRY > FAILOVER is reasonable, but I'd like to refrain from increasing the scope of this JIRA. Maybe we can create a new one and have the discussion there?

> Idempotent operations throwing RemoteException should not be retried by the
> client
> --------------------------------------------------------------------------
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, HDFS-14134.003.patch,
> HDFS-14134.004.patch, HDFS-14134.005.patch, HDFS-14134.006.patch,
> HDFS-14134.007.patch, HDFS-14134_retrypolicy_change_proposal.pdf,
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", "file") where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Supposedly, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over FAIL action.
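The precedence problem described in the issue, where one proxy's FAILOVER_AND_RETRY overrides another's FAIL, can be sketched as a toy model. This is illustrative only, not Hadoop's RetryInvocationHandler; the enum names echo the actions discussed here, and the combiner is an invented stand-in:

```java
public class RetryPrecedenceDemo {

    // Ordinal order doubles as precedence in this toy model: later constants
    // win when decisions from multiple proxies are merged.
    enum Action { FAIL, RETRY, FAILOVER_AND_RETRY }

    // The aggregation the ticket complains about: across the decisions
    // returned for all in-flight requests, the highest-precedence action
    // wins, so a single FAILOVER_AND_RETRY overrides a FAIL that should
    // have ended the call.
    static Action combine(Action a, Action b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    public static void main(String[] args) {
        // One proxy says fail fast, the other says failover; the failover
        // wins, so the client keeps retrying instead of surfacing the error.
        System.out.println(combine(Action.FAIL, Action.FAILOVER_AND_RETRY));
    }
}
```

In this model the fix under discussion amounts to letting a fail-fast verdict for a known-benign RemoteException short-circuit the merge rather than being out-ranked.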
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832730#comment-16832730 ]

Lukas Majercak commented on HDFS-14134:
---------------------------------------

Hi [~John Smith]. I'm not sure I'm following; what's the concern with StandbyException triggering FAILOVER_AND_RETRY with the logic in patch 007?

> Idempotent operations throwing RemoteException should not be retried by the
> client
> --------------------------------------------------------------------------
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, HDFS-14134.003.patch,
> HDFS-14134.004.patch, HDFS-14134.005.patch, HDFS-14134.006.patch,
> HDFS-14134.007.patch, HDFS-14134_retrypolicy_change_proposal.pdf,
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", "file") where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Supposedly, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over FAIL action.
[jira] [Commented] (HDFS-14326) Add CorruptFilesCount to JMX
[ https://issues.apache.org/jira/browse/HDFS-14326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780939#comment-16780939 ]

Lukas Majercak commented on HDFS-14326:
---------------------------------------

Can we add more test coverage? We could just assert the expected length throughout the tests in TestListCorruptFileBlocks.

> Add CorruptFilesCount to JMX
> ----------------------------
>
> Key: HDFS-14326
> URL: https://issues.apache.org/jira/browse/HDFS-14326
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: fs, metrics, namenode
> Reporter: Danny Becker
> Assignee: Danny Becker
> Priority: Minor
> Attachments: HDFS-14326.000.patch
>
> Add CorruptFilesCount to JMX
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740810#comment-16740810 ]

Lukas Majercak commented on HDFS-14134:
---------------------------------------

+ more people [~szetszwo], [~jingzhao].

> Idempotent operations throwing RemoteException should not be retried by the
> client
> --------------------------------------------------------------------------
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, HDFS-14134.003.patch,
> HDFS-14134.004.patch, HDFS-14134.005.patch, HDFS-14134.006.patch,
> HDFS-14134.007.patch, HDFS-14134_retrypolicy_change_proposal.pdf,
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", "file") where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Supposedly, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over FAIL action.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740802#comment-16740802 ] Lukas Majercak commented on HDFS-14134: --- Thanks [~knanasi]. [~atm], [~eli], [~sureshms], [~sanjay.radia], [~xgong], [~jianhe]; I see you guys worked on this part of the codebase before, anyone available to review this? Thanks!
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739821#comment-16739821 ] Lukas Majercak commented on HDFS-14134: --- Patch 007 to fix checkstyle + whitespace warnings.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.007.patch
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.006.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739719#comment-16739719 ] Lukas Majercak commented on HDFS-14134: --- Added patch 006 together with HDFS-14134_retrypolicy_change_proposal_1.pdf to explain the changes. After the discussion, it seems that just changing the logic for remote IOExceptions, together with the priority for retry actions, will be enough. Could you review this, [~knanasi]? Thanks!
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: (was: HDFS-14134.006.patch)
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134_retrypolicy_change_proposal_1.pdf
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.006.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738702#comment-16738702 ] Lukas Majercak commented on HDFS-14134: --- Also note that previously, if one hedged request got FAILOVER_AND_RETRY and another request got a SocketException on a non-idempotent operation (which maps to FAIL), the client would still pick FAILOVER_AND_RETRY over FAIL, so I think we are fixing an existing issue here as well.
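The precedence problem above can be sketched as a small model. This is an illustrative Python sketch, not RetryInvocationHandler's actual API (the names and priority tables are assumptions): aggregating hedged-request outcomes with FAILOVER_AND_RETRY outranking FAIL lets one retriable outcome mask a definitive failure, whereas giving FAIL precedence stops the client as soon as any request proves the call cannot succeed.

```python
# Hypothetical sketch of aggregating retry actions across hedged
# requests; names are illustrative, not Hadoop's RetryInvocationHandler.

FAIL, RETRY, FAILOVER_AND_RETRY = "FAIL", "RETRY", "FAILOVER_AND_RETRY"

# Precedence as described for the current behavior: FAILOVER_AND_RETRY
# outranks FAIL, so a retriable outcome masks a definitive failure.
CURRENT_PRIORITY = {FAIL: 0, RETRY: 1, FAILOVER_AND_RETRY: 2}

# Proposed precedence: a definitive FAIL wins, so the client fails fast
# once any request shows the call cannot succeed.
PROPOSED_PRIORITY = {FAILOVER_AND_RETRY: 0, RETRY: 1, FAIL: 2}


def aggregate(actions, priority):
    """Pick the single action the client acts on, by precedence."""
    return max(actions, key=priority.__getitem__)


# One hedged request failed over; another hit a SocketException on a
# non-idempotent operation, which maps to FAIL.
actions = [FAILOVER_AND_RETRY, FAIL]
```

With the current table the client keeps failing over; with the proposed table the definitive FAIL is surfaced instead.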
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738700#comment-16738700 ] Lukas Majercak commented on HDFS-14134: --- I see, that makes sense. I'm happy to change SocketException (non-idempotent) and IOException (non-idempotent) back to being FAIL.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720564#comment-16720564 ] Lukas Majercak commented on HDFS-14134: --- I'll go through that discussion.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720506#comment-16720506 ] Lukas Majercak commented on HDFS-14134: --- I agree that non-remote IOExceptions could be network related, but this is covered, right? Non-remote IOExceptions are retried with this change, regardless of whether the operation is idempotent.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720507#comment-16720507 ] Lukas Majercak commented on HDFS-14134: --- I'd argue that this change is even safer, because previously the retry action would already be FAIL for SocketExceptions (non-idempotent) and non-remote IOExceptions (non-idempotent).
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719444#comment-16719444 ] Lukas Majercak commented on HDFS-14134: --- Retrying failed idempotent operations might be safe, but it surely is wasteful. Check the test I wrote for getXAttr: the client just retries the same exception over and over for no reason. We need a concept of non-retriable exceptions in HDFS, and a RemoteException from an idempotent operation is a very good start. The previous design was very strange: it chose to FAIL when the operation was not idempotent and the exception was not remote, which does not make much sense to me.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717754#comment-16717754 ] Lukas Majercak commented on HDFS-14134: --- Why should we retry if the operation is idempotent and the exception is a RemoteException? The definition of an idempotent operation is that it will have the same outcome next time as well, right? In that case, we should just fail fast. See TestRequestHedgingProxyProvider.testIdempotentOperationShouldNotGetStuckInRetries.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.005.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715863#comment-16715863 ] Lukas Majercak commented on HDFS-14134: --- Patch 005 to fix a minor checkstyle issue in UnreliableImplementation.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.004.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715705#comment-16715705 ] Lukas Majercak commented on HDFS-14134: --- Added tests to cover all cases (SocketExc, IOException, RemoteException, non/idempotent) in patch004. Anyone available to review?
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: (was: HDFS-14134.003.patch)
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715682#comment-16715682 ] Lukas Majercak commented on HDFS-14134: --- Fixed TestFailoverProxy as well; still might need to add more tests to cover all the exceptions.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.003.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715579#comment-16715579 ] Lukas Majercak commented on HDFS-14134: --- I realized I needed to change the mock expectations to fix TestLoadBalancingKMSClientProvider. Added patch003 to fix that.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715545#comment-16715545 ] Lukas Majercak commented on HDFS-14134: --- Thanks for the review [~knanasi]. I've uploaded patch002 to include non-remote IOException handling + fix TestDefaultRetryPolicy. Seems like this should also fix TestLoadBalancingKMSClientProvider. I'll then fix and add more tests in TestFailoverProxy.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.002.patch
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713440#comment-16713440 ] Lukas Majercak commented on HDFS-14134: --- The unit tests are expected to fail; I can fix them once we agree on how the retry policy should behave.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713358#comment-16713358 ] Lukas Majercak commented on HDFS-14134: --- For the retry policy changes, maybe it would make sense to just RETRY when the exception is a RemoteException and the operation is not idempotent/atmostonce.
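The decision proposed in this comment can be sketched as a small table. The class and method names below are hypothetical stand-ins, not the real FailoverOnNetworkExceptionRetry implementation, and the local RemoteException class merely stands in for org.apache.hadoop.ipc.RemoteException: it marks an error that the NameNode itself returned, so the call reached a live server and failing over cannot help.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch of the proposed retry decision, not Hadoop's code.
public class RetryDecisionSketch {

    enum Action { FAIL, RETRY, FAILOVER_AND_RETRY }

    // Stand-in for org.apache.hadoop.ipc.RemoteException: a server-side
    // error, i.e. the request definitely reached the NameNode.
    static class RemoteException extends IOException {
        RemoteException(String msg) { super(msg); }
    }

    static Action decide(IOException e, boolean idempotentOrAtMostOnce) {
        if (e instanceof RemoteException) {
            // Server-side error: an idempotent call would fail the same way
            // on every NameNode, so fail fast; a non-idempotent call may be
            // retried on the same node, per the proposal above.
            return idempotentOrAtMostOnce ? Action.FAIL : Action.RETRY;
        }
        // Network-level error: the request may never have arrived, so it is
        // only safe to fail over and retry when the call is idempotent.
        return idempotentOrAtMostOnce ? Action.FAILOVER_AND_RETRY : Action.FAIL;
    }

    public static void main(String[] args) {
        // The getXAttr case from the issue description: idempotent call, but
        // the NameNode answered with an error -> fail fast, no failover.
        System.out.println(decide(new RemoteException("could not find attr"), true));
        System.out.println(decide(new ConnectException("connection refused"), true));
    }
}
```

Under these assumptions, the getXAttr example from the description maps to FAIL rather than FAILOVER_AND_RETRY, while genuine connectivity failures still fail over for idempotent calls.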
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.001.patch
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: (was: HDFS-14134.001.patch)
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713326#comment-16713326 ] Lukas Majercak commented on HDFS-14134: --- Reuploaded the patch because Yetus picked up the pdf file as the patch.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713314#comment-16713314 ] Lukas Majercak commented on HDFS-14134: --- Added HDFS-14134_retrypolicy_change_proposal.pdf to illustrate the proposed changes in the FailoverOnNetworkExceptionRetry retry policy.
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134_retrypolicy_change_proposal.pdf
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.001.patch
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: HDFS-14134.001.patch
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Attachment: (was: HDFS-14134.001.patch)
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Description: Currently, some operations that throw IOException on the NameNode are evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail fast. For example, when calling getXAttr("user.some_attr", file") where the file does not have the attribute, NN throws an IOException with message "could not find attr". The current client retry policy determines the action for that to be FAILOVER_AND_RETRY. The client then fails over and retries until it reaches the maximum number of retries. Supposedly, the client should be able to tell that this exception is normal and fail fast. Moreover, even if the action was FAIL, the RetryInvocationHandler looks at all the retry actions from all requests, and FAILOVER_AND_RETRY takes precedence over FAIL action. was: Currently, some operations that throw IOException on the NameNode are evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail fast. For example, when calling getXAttr("user.some_attr", file") where file does not have the attribute, NN throws an IOException with message "could not find attr". The current client retry policy determines the action for that to be FAILOVER_AND_RETRY. The client then fails over and retries until it reaches the maximum number of retries. Supposedly, the client should be able to tell that this exception is normal and fail fast. Moreover, even if the action was FAIL, the RetryInvocationHandler looks at all the retry actions from all requests, and FAILOVER_AND_RETRY takes precedence over FAIL action. 
> Idempotent operations throwing RemoteException should not be retried by the > client > -- > > Key: HDFS-14134 > URL: https://issues.apache.org/jira/browse/HDFS-14134 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client, ipc >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-14134.001.patch > > > Currently, some operations that throw IOException on the NameNode are > evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail > fast. > For example, when calling getXAttr("user.some_attr", file") where the file > does not have the attribute, NN throws an IOException with message "could not > find attr". The current client retry policy determines the action for that to > be FAILOVER_AND_RETRY. The client then fails over and retries until it > reaches the maximum number of retries. Supposedly, the client should be able > to tell that this exception is normal and fail fast. > Moreover, even if the action was FAIL, the RetryInvocationHandler looks at > all the retry actions from all requests, and FAILOVER_AND_RETRY takes > precedence over FAIL action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713233#comment-16713233 ] Lukas Majercak commented on HDFS-14134: --- Added a patch to demonstrate the issue. > Idempotent operations throwing RemoteException should not be retried by the > client > -- > > Key: HDFS-14134 > URL: https://issues.apache.org/jira/browse/HDFS-14134 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client, ipc >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > Attachments: HDFS-14134.001.patch > > > Currently, some operations that throw IOException on the NameNode are > evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail > fast. > For example, when calling getXAttr("user.some_attr", file") where file does > not have the attribute, NN throws an IOException with message "could not find > attr". The current client retry policy determines the action for that to be > FAILOVER_AND_RETRY. The client then fails over and retries until it reaches > the maximum number of retries. Supposedly, the client should be able to tell > that this exception is normal and fail fast. > Moreover, even if the action was FAIL, the RetryInvocationHandler looks at > all the retry actions from all requests, and FAILOVER_AND_RETRY takes > precedence over FAIL action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14134: -- Description: Currently, some operations that throw IOException on the NameNode are evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail fast. For example, when calling getXAttr("user.some_attr", file") where file does not have the attribute, NN throws an IOException with message "could not find attr". The current client retry policy determines the action for that to be FAILOVER_AND_RETRY. The client then fails over and retries until it reaches the maximum number of retries. Supposedly, the client should be able to tell that this exception is normal and fail fast. Moreover, even if the action was FAIL, the RetryInvocationHandler looks at all the retry actions from all requests, and FAILOVER_AND_RETRY takes precedence over FAIL action. > Idempotent operations throwing RemoteException should not be retried by the > client > -- > > Key: HDFS-14134 > URL: https://issues.apache.org/jira/browse/HDFS-14134 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client, ipc >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Critical > > Currently, some operations that throw IOException on the NameNode are > evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail > fast. > For example, when calling getXAttr("user.some_attr", file") where file does > not have the attribute, NN throws an IOException with message "could not find > attr". The current client retry policy determines the action for that to be > FAILOVER_AND_RETRY. The client then fails over and retries until it reaches > the maximum number of retries. Supposedly, the client should be able to tell > that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at > all the retry actions from all requests, and FAILOVER_AND_RETRY takes > precedence over FAIL action. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
Lukas Majercak created HDFS-14134: - Summary: Idempotent operations throwing RemoteException should not be retried by the client Key: HDFS-14134 URL: https://issues.apache.org/jira/browse/HDFS-14134 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, hdfs-client, ipc Reporter: Lukas Majercak Assignee: Lukas Majercak -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
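[Editor's note] The fail-fast behaviour argued for in HDFS-14134 above can be sketched as follows. This is a hedged illustration only — the class, method, and exception-name list below are hypothetical and simplified, not Hadoop's actual `RetryPolicy`/`RetryInvocationHandler` API: the idea is that an idempotent call failing with a deterministic, application-level RemoteException should map to FAIL rather than FAILOVER_AND_RETRY.

```java
import java.util.Set;

public class RetryDecisionSketch {
    enum Action { FAIL, RETRY, FAILOVER_AND_RETRY }

    // Illustrative list only: exception classes whose failures are
    // deterministic, so failing over to another NameNode cannot help.
    static final Set<String> NON_RETRIABLE = Set.of(
        "java.io.FileNotFoundException",
        "org.apache.hadoop.HadoopIllegalArgumentException");

    static Action decide(boolean idempotent, String remoteExceptionClass) {
        if (idempotent && NON_RETRIABLE.contains(remoteExceptionClass)) {
            return Action.FAIL; // fail fast: retrying will not change the answer
        }
        return Action.FAILOVER_AND_RETRY; // default HA behaviour
    }

    public static void main(String[] args) {
        // getXAttr on a file without the attribute: deterministic failure
        if (decide(true, "java.io.FileNotFoundException") != Action.FAIL)
            throw new AssertionError();
        // a connection-level failure should still fail over
        if (decide(true, "java.net.ConnectException") != Action.FAILOVER_AND_RETRY)
            throw new AssertionError();
        System.out.println("fail-fast sketch ok");
    }
}
```

The second point in the issue — FAILOVER_AND_RETRY taking precedence over FAIL when actions are aggregated — would additionally require changing how the handler combines per-request actions, which this sketch does not attempt.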
[jira] [Commented] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675934#comment-16675934 ] Lukas Majercak commented on HDFS-14043: --- Yes, I ran the TestSaveNamespace in my local trunk version + we ran all the hdfs tests with this patch applied on top of our internal 2.9 version. > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch, HDFS-14043.002.patch, > HDFS-14043.003.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675766#comment-16675766 ] Lukas Majercak commented on HDFS-14043: --- Added patch003 to fix checkstyle errors. > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch, HDFS-14043.002.patch, > HDFS-14043.003.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14043: -- Attachment: HDFS-14043.003.patch > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch, HDFS-14043.002.patch, > HDFS-14043.003.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14043: -- Attachment: HDFS-14043.002.patch > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch, HDFS-14043.002.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675677#comment-16675677 ] Lukas Majercak commented on HDFS-14043: --- Added patch002 that should apply to trunk. > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch, HDFS-14043.002.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672077#comment-16672077 ] Lukas Majercak edited comment on HDFS-14043 at 11/1/18 7:38 PM: [~cmccabe] could you review this, as this seems to be related to HDFS-3004. was (Author: lukmajercak): [~cmccabe] could you review this, as this seems to be related to https://issues.apache.org/jira/browse/HDFS-3004 > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672077#comment-16672077 ] Lukas Majercak commented on HDFS-14043: --- [~cmccabe] could you review this, as this seems to be related to https://issues.apache.org/jira/browse/HDFS-3004 > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14043: -- Description: We already tolerate IOExceptions when reading seen_txid file from namenode's dirs. So we take the maximum txid of all the *readable* namenode dirs. We should extend this to when the file is corrupted. Currently, PersistentLongFile.readFile throws NumberFormatException in this case and the whole NN crashes. > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch > > > We already tolerate IOExceptions when reading seen_txid file from namenode's > dirs. So we take the maximum txid of all the *readable* namenode dirs. We > should extend this to when the file is corrupted. Currently, > PersistentLongFile.readFile throws NumberFormatException in this case and the > whole NN crashes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14043) Tolerate corrupted seen_txid file
[ https://issues.apache.org/jira/browse/HDFS-14043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14043: -- Attachment: HDFS-14043.001.patch > Tolerate corrupted seen_txid file > - > > Key: HDFS-14043 > URL: https://issues.apache.org/jira/browse/HDFS-14043 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2, 3.1.2, 2.9.3 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-14043.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14043) Tolerate corrupted seen_txid file
Lukas Majercak created HDFS-14043: - Summary: Tolerate corrupted seen_txid file Key: HDFS-14043 URL: https://issues.apache.org/jira/browse/HDFS-14043 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, namenode Affects Versions: 2.9.2, 3.1.2, 2.9.3 Reporter: Lukas Majercak Assignee: Lukas Majercak -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
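[Editor's note] The behaviour proposed in HDFS-14043 above can be sketched as follows. This is a hedged illustration, not the actual `PersistentLongFile.readFile` code: given the raw contents of each NameNode directory's seen_txid file (with null standing in for an IOException while reading, which is already tolerated today), a corrupted file that would raise NumberFormatException is skipped the same way, and the maximum parseable txid wins instead of the whole NameNode crashing.

```java
import java.util.Arrays;
import java.util.List;

public class SeenTxidSketch {
    // Hypothetical helper: rawContents holds the seen_txid text read from
    // each NameNode dir; null means the read itself failed (IOException).
    static long maxSeenTxid(List<String> rawContents) {
        long max = -1;
        for (String raw : rawContents) {
            if (raw == null) continue; // unreadable dir: already tolerated
            try {
                max = Math.max(max, Long.parseLong(raw.trim()));
            } catch (NumberFormatException e) {
                // corrupted file: skip this dir instead of crashing the NN
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // one good dir, one corrupted dir, one unreadable dir -> NN survives
        long txid = maxSeenTxid(Arrays.asList("42", "\u0000garbage", null));
        System.out.println(txid); // prints 42
    }
}
```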
[jira] [Commented] (HDFS-12284) RBF: Support for Kerberos authentication
[ https://issues.apache.org/jira/browse/HDFS-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661362#comment-16661362 ] Lukas Majercak commented on HDFS-12284: --- [~daryn], I feel like we should distinguish between ServicePrincipalNames and UserPrincipalNames for all services in HDFS, or at least give the admin an option to override the user principal. The _HOST solution is okay, but it relies on DNS giving consistent results. This inconsistency is fine for SPNs, as you can have as many as you want in your keytab, but is not okay for client principals. Say you have a NN running on HOSTNAME, and set it up using hdfs/_HOST@DOMAIN as the principal name. Now, one day, when your NN starts up and tries to resolve itself using _HOST, your DNS server decides to return back HOSTNAME.domain instead of the usual HOSTNAME. Your NN then uses that as the client principal to log in, and will fail. Maybe something like {{dfs.federation.router.kerberos.user.principal}} would be better than {{dfs.federation.router.hostname}} > RBF: Support for Kerberos authentication > > > Key: HDFS-12284 > URL: https://issues.apache.org/jira/browse/HDFS-12284 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Zhe Zhang >Assignee: Sherwood Zheng >Priority: Major > Attachments: HDFS-12284-HDFS-13532.004.patch, > HDFS-12284-HDFS-13532.005.patch, HDFS-12284-HDFS-13532.006.patch, > HDFS-12284-HDFS-13532.007.patch, HDFS-12284-HDFS-13532.008.patch, > HDFS-12284-HDFS-13532.009.patch, HDFS-12284-HDFS-13532.010.patch, > HDFS-12284-HDFS-13532.011.patch, HDFS-12284.000.patch, HDFS-12284.001.patch, > HDFS-12284.002.patch, HDFS-12284.003.patch > > > HDFS Router should support Kerberos authentication and issuing / managing > HDFS delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
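[Editor's note] The DNS-inconsistency hazard described in the comment above can be sketched as follows. This is a hedged, simplified illustration of `_HOST` substitution (the real logic lives in Hadoop's `SecurityUtil`; the class and method names here are hypothetical): the token is replaced with whatever hostname DNS resolved, lower-cased, so two different resolutions of the same machine yield two different principals — harmless for service principals, since a keytab can hold several, but fatal when the result is used as the client login principal.

```java
public class HostSubstitutionSketch {
    // Replace the _HOST token in a principal pattern with the locally
    // resolved hostname, lower-cased (simplified sketch of the real rule).
    static String substituteHost(String pattern, String resolvedHost) {
        return pattern.replace("_HOST", resolvedHost.toLowerCase());
    }

    public static void main(String[] args) {
        // Same machine, two DNS answers on two days:
        String day1 = substituteHost("hdfs/_HOST@DOMAIN", "HOSTNAME");
        String day2 = substituteHost("hdfs/_HOST@DOMAIN", "HOSTNAME.domain");
        System.out.println(day1); // hdfs/hostname@DOMAIN
        System.out.println(day2); // hdfs/hostname.domain@DOMAIN
        // Different principals -> login fails if only one is in the keytab
        // and the result is used as the client (user) principal.
        if (day1.equals(day2)) throw new AssertionError();
    }
}
```

This is why the comment suggests an explicit override such as {{dfs.federation.router.kerberos.user.principal}} for the client principal, leaving `_HOST` patterns to the service principals.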
[jira] [Commented] (HDFS-14010) Pass correct DF usage to ReservedSpaceCalculator builder
[ https://issues.apache.org/jira/browse/HDFS-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657347#comment-16657347 ] Lukas Majercak commented on HDFS-14010: --- 002.patch LGTM. Maybe some warn log when usage==null, and a comment for the unit test. > Pass correct DF usage to ReservedSpaceCalculator builder > > > Key: HDFS-14010 > URL: https://issues.apache.org/jira/browse/HDFS-14010 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Minor > Attachments: HDFS-14010.001.patch, HDFS-14010.002.patch > > > In FsVolumeImpl's constructor, we currently pass the DF usage that was passed > to the constructor to ReservedSpaceCalculator.Builder. This can cause issues > if the usage is changed in the constructor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14010) Pass correct DF usage to ReservedSpaceCalculator builder
[ https://issues.apache.org/jira/browse/HDFS-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14010: -- Affects Version/s: 2.9.2 > Pass correct DF usage to ReservedSpaceCalculator builder > > > Key: HDFS-14010 > URL: https://issues.apache.org/jira/browse/HDFS-14010 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Minor > Attachments: HDFS-14010.001.patch > > > In FsVolumeImpl's constructor, we currently pass the DF usage that was passed > to the constructor to ReservedSpaceCalculator.Builder. This can cause issues > if the usage is changed in the constructor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14010) Pass correct DF usage to ReservedSpaceCalculator builder
[ https://issues.apache.org/jira/browse/HDFS-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14010: -- Attachment: HDFS-14010.001.patch > Pass correct DF usage to ReservedSpaceCalculator builder > > > Key: HDFS-14010 > URL: https://issues.apache.org/jira/browse/HDFS-14010 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Minor > Attachments: HDFS-14010.001.patch > > > In FsVolumeImpl's constructor, we currently pass the DF usage that was passed > to the constructor to ReservedSpaceCalculator.Builder. This can cause issues > if the usage is changed in the constructor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14010) Pass correct DF usage to ReservedSpaceCalculator builder
[ https://issues.apache.org/jira/browse/HDFS-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-14010: -- Description: In FsVolumeImpl's constructor, we currently pass the DF usage that was passed to the constructor to ReservedSpaceCalculator.Builder. This can cause issues if the usage is changed in the constructor. > Pass correct DF usage to ReservedSpaceCalculator builder > > > Key: HDFS-14010 > URL: https://issues.apache.org/jira/browse/HDFS-14010 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Minor > > In FsVolumeImpl's constructor, we currently pass the DF usage that was passed > to the constructor to ReservedSpaceCalculator.Builder. This can cause issues > if the usage is changed in the constructor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14010) Pass correct DF usage to ReservedSpaceCalculator builder
Lukas Majercak created HDFS-14010: - Summary: Pass correct DF usage to ReservedSpaceCalculator builder Key: HDFS-14010 URL: https://issues.apache.org/jira/browse/HDFS-14010 Project: Hadoop HDFS Issue Type: Bug Reporter: Lukas Majercak Assignee: Lukas Majercak -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
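[Editor's note] The bug pattern fixed in HDFS-14010 above can be sketched as follows. This is a hedged illustration with simplified, hypothetical names — not the actual `FsVolumeImpl`/`ReservedSpaceCalculator` code: when a constructor may replace the DF usage object it was handed, anything built afterwards must be given the final field, not the original constructor argument, or it holds a stale reference.

```java
public class FsVolumeSketch {
    // Minimal stand-in for the DF usage object.
    static class Usage {
        final String source;
        Usage(String s) { source = s; }
    }

    final Usage usage;
    final Usage usedByCalculator; // stand-in for what the builder receives

    FsVolumeSketch(Usage usageArg) {
        // The constructor swaps in a different instance (e.g. a fresh or
        // cached one), so usageArg is no longer the usage this volume uses.
        this.usage = new Usage("replaced");
        // buggy: this.usedByCalculator = usageArg;   // stale reference
        this.usedByCalculator = this.usage;           // fixed: use the field
    }
}
```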
[jira] [Commented] (HDFS-12284) RBF: Support for Kerberos authentication
[ https://issues.apache.org/jira/browse/HDFS-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655772#comment-16655772 ] Lukas Majercak commented on HDFS-12284: --- There are star imports in every test class import static org.apache.hadoop.fs.contract.router.SecurityConfUtil.*; > RBF: Support for Kerberos authentication > > > Key: HDFS-12284 > URL: https://issues.apache.org/jira/browse/HDFS-12284 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Zhe Zhang >Assignee: Sherwood Zheng >Priority: Major > Attachments: HDFS-12284-HDFS-13532.004.patch, > HDFS-12284-HDFS-13532.005.patch, HDFS-12284-HDFS-13532.006.patch, > HDFS-12284-HDFS-13532.007.patch, HDFS-12284-HDFS-13532.008.patch, > HDFS-12284.000.patch, HDFS-12284.001.patch, HDFS-12284.002.patch, > HDFS-12284.003.patch > > > HDFS Router should support Kerberos authentication and issuing / managing > HDFS delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13976) Backport HDFS-12813 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13976: -- Attachment: TestRequestHedgingProxyProvider.png > Backport HDFS-12813 to branch-2.9 > - > > Key: HDFS-13976 > URL: https://issues.apache.org/jira/browse/HDFS-13976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Fix For: 2.9.2 > > Attachments: HDFS-12813.branch-2.001.patch, > HDFS-12813.branch-2.9.001.patch, TestRequestHedgingProxyProvider.png > > > 2.9 also shows the issue from HDFS-12813: > HDFS-11395 fixed the problem where the MultiException thrown by > RequestHedgingProxyProvider was hidden. However when the target proxy size is > 1, then unwrapping is not done for the InvocationTargetException. for target > proxy size of 1, the unwrapping should be done till first level where as for > multiple proxy size, it should be done at 2 levels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13976) Backport HDFS-12813 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645401#comment-16645401 ] Lukas Majercak commented on HDFS-13976: --- Ran the tests on branch-2.9 and it seems fine, TestRequestHedgingProxyProvider passes and everything else seems intact: !TestRequestHedgingProxyProvider.png! > Backport HDFS-12813 to branch-2.9 > - > > Key: HDFS-13976 > URL: https://issues.apache.org/jira/browse/HDFS-13976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Fix For: 2.9.2 > > Attachments: HDFS-12813.branch-2.001.patch, > HDFS-12813.branch-2.9.001.patch, TestRequestHedgingProxyProvider.png > > > 2.9 also shows the issue from HDFS-12813: > HDFS-11395 fixed the problem where the MultiException thrown by > RequestHedgingProxyProvider was hidden. However when the target proxy size is > 1, then unwrapping is not done for the InvocationTargetException. for target > proxy size of 1, the unwrapping should be done till first level where as for > multiple proxy size, it should be done at 2 levels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13976) Backport HDFS-12813 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13976: -- Attachment: HDFS-12813.branch-2.9.001.patch > Backport HDFS-12813 to branch-2.9 > - > > Key: HDFS-13976 > URL: https://issues.apache.org/jira/browse/HDFS-13976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Fix For: 2.9.2 > > Attachments: HDFS-12813.branch-2.001.patch, > HDFS-12813.branch-2.9.001.patch > > > 2.9 also shows the issue from HDFS-12813: > HDFS-11395 fixed the problem where the MultiException thrown by > RequestHedgingProxyProvider was hidden. However when the target proxy size is > 1, then unwrapping is not done for the InvocationTargetException. for target > proxy size of 1, the unwrapping should be done till first level where as for > multiple proxy size, it should be done at 2 levels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13976) Backport HDFS-12813 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13976: -- Attachment: HDFS-12813.branch-2.001.patch > Backport HDFS-12813 to branch-2.9 > - > > Key: HDFS-13976 > URL: https://issues.apache.org/jira/browse/HDFS-13976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Fix For: 2.9.2 > > Attachments: HDFS-12813.branch-2.001.patch > > > 2.9 also shows the issue from HDFS-12813: > HDFS-11395 fixed the problem where the MultiException thrown by > RequestHedgingProxyProvider was hidden. However when the target proxy size is > 1, then unwrapping is not done for the InvocationTargetException. for target > proxy size of 1, the unwrapping should be done till first level where as for > multiple proxy size, it should be done at 2 levels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13976) Backport HDFS-12813 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13976: -- Fix Version/s: 2.9.2 > Backport HDFS-12813 to branch-2.9 > - > > Key: HDFS-13976 > URL: https://issues.apache.org/jira/browse/HDFS-13976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: Lukas Majercak >Priority: Major > Fix For: 2.9.2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13976) Backport HDFS-12813 to branch-2.9
Lukas Majercak created HDFS-13976: - Summary: Backport HDFS-12813 to branch-2.9 Key: HDFS-13976 URL: https://issues.apache.org/jira/browse/HDFS-13976 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, hdfs-client Reporter: Lukas Majercak -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
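The unwrapping rule described in the backported issue above (one level of InvocationTargetException for a single target proxy, two levels when there are multiple proxies) can be sketched as follows. The class and method names are illustrative only, not the actual RequestHedgingProxyProvider code:

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

public class UnwrapSketch {
    // Hedged sketch: with one target proxy the real exception is wrapped
    // once; with multiple proxies the hedging layer wraps it a second
    // time, so unwrapping must go one level deeper.
    static Throwable unwrap(Throwable t, int targetProxies) {
        if (t instanceof InvocationTargetException) {
            t = t.getCause(); // first level
            if (targetProxies > 1 && t instanceof InvocationTargetException) {
                t = t.getCause(); // second level for multiple proxies
            }
        }
        return t;
    }

    public static void main(String[] args) {
        IOException root = new IOException("standby exception");
        Throwable single = new InvocationTargetException(root);
        Throwable multi =
            new InvocationTargetException(new InvocationTargetException(root));
        System.out.println(unwrap(single, 1) == root);
        System.out.println(unwrap(multi, 2) == root);
    }
}
```

Without the second-level unwrap, callers with multiple proxies would still see the wrapper instead of the underlying cause, which is the symptom HDFS-12813 fixed.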
[jira] [Commented] (HDFS-13792) Fix FSN read/write lock metrics name
[ https://issues.apache.org/jira/browse/HDFS-13792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570761#comment-16570761 ] Lukas Majercak commented on HDFS-13792: --- Thanks for this [~csun], patch001 LGTM. > Fix FSN read/write lock metrics name > > > Key: HDFS-13792 > URL: https://issues.apache.org/jira/browse/HDFS-13792 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, metrics >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Trivial > Attachments: HDFS-13792.000.patch, HDFS-13792.001.patch > > > The metrics name for FSN read/write lock should be in the format: > {code} > FSN(Read|Write)Lock`*OperationName*`NanosNumOps > {code} > not > {code} > FSN(Read|Write)Lock`*OperationName*`NumOps > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
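The corrected metric name format from the issue above can be illustrated with a tiny sketch; the operation name used here is only an example, and this helper is not part of the actual FSNamesystem code:

```java
public class FsnLockMetricName {
    // Hedged sketch: build the corrected name format from HDFS-13792,
    // FSN(Read|Write)Lock<OperationName>NanosNumOps (note the "Nanos"
    // segment that was missing before the fix).
    static String metricName(String lockType, String operationName) {
        return "FSN" + lockType + "Lock" + operationName + "NanosNumOps";
    }

    public static void main(String[] args) {
        System.out.println(metricName("Read", "GetBlockLocations"));
    }
}
```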
[jira] [Commented] (HDFS-13757) After HDFS-12886, close() can throw AssertionError "Negative replicas!"
[ https://issues.apache.org/jira/browse/HDFS-13757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551420#comment-16551420 ] Lukas Majercak commented on HDFS-13757: --- Surely the test should fail if you disable IBR? > After HDFS-12886, close() can throw AssertionError "Negative replicas!" > --- > > Key: HDFS-13757 > URL: https://issues.apache.org/jira/browse/HDFS-13757 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3 >Reporter: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13757.test.02.patch, HDFS-13757.test.patch > > > While investigating a data corruption bug caused by concurrent recoverLease() > and close(), I found HDFS-12886 may cause close() to throw AssertionError > under a corner case, because the block has zero live replicas, and the client > calls recoverLease() immediately followed by close(). > {noformat} > org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Negative > replicas!
> at > org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.getPriority(LowRedundancyBlocks.java:197) > at > org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.update(LowRedundancyBlocks.java:422) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReconstructions(BlockManager.java:4274) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitOrCompleteLastBlock(BlockManager.java:1001) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3471) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:713) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2854) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {noformat} > I have a test case to reproduce it. > [~lukmajercak] [~elgoiri] would you please take a look at it?
I think we > should add a check to reject completeFile() if the block is under recovery, > similar to what's proposed in HDFS-10240. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13757) After HDFS-12886, close() can throw AssertionError "Negative replicas!"
[ https://issues.apache.org/jira/browse/HDFS-13757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551418#comment-16551418 ] Lukas Majercak commented on HDFS-13757: --- Hi [~jojochuang]. Unfortunately, I haven't been able to reproduce this on trunk (ran the test ~150 times). I could reproduce something on 2.9, but the exception was not the same and the failed run did not even touch the code added in HDFS-12886. > After HDFS-12886, close() can throw AssertionError "Negative replicas!" > --- > > Key: HDFS-13757 > URL: https://issues.apache.org/jira/browse/HDFS-13757 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3 >Reporter: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13757.test.02.patch, HDFS-13757.test.patch > > > While investigating a data corruption bug caused by concurrent recoverLease() > and close(), I found HDFS-12886 may cause close() to throw AssertionError > under a corner case, because the block has zero live replicas, and the client > calls recoverLease() immediately followed by close(). > {noformat} > org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Negative > replicas!
> at > org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.getPriority(LowRedundancyBlocks.java:197) > at > org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks.update(LowRedundancyBlocks.java:422) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReconstructions(BlockManager.java:4274) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitOrCompleteLastBlock(BlockManager.java:1001) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3471) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:713) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2854) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {noformat} > I have a test case to reproduce it. > [~lukmajercak] [~elgoiri] would you please take a look at it?
I think we > should add a check to reject completeFile() if the block is under recovery, > similar to what's proposed in HDFS-10240. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
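The guard proposed above (rejecting completeFile() while the last block is still under recovery, as in HDFS-10240) could be sketched roughly as below. The enum and method are simplified stand-ins, not the actual NameNode block-state machinery:

```java
public class CompleteFileGuard {
    // Hedged sketch: a block under recovery has an unsettled replica
    // count, and completing the file at that point can drive the
    // live-replica bookkeeping negative ("Negative replicas!").
    enum BlockState { COMPLETE, COMMITTED, UNDER_RECOVERY }

    // Reject completeFile() until lease recovery has finished.
    static boolean canCompleteFile(BlockState lastBlockState) {
        return lastBlockState != BlockState.UNDER_RECOVERY;
    }

    public static void main(String[] args) {
        System.out.println(canCompleteFile(BlockState.COMMITTED));
        System.out.println(canCompleteFile(BlockState.UNDER_RECOVERY));
    }
}
```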
[jira] [Comment Edited] (HDFS-13714) Fix TestNameNodePrunesMissingStorages test failures on Windows
[ https://issues.apache.org/jira/browse/HDFS-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530423#comment-16530423 ] Lukas Majercak edited comment on HDFS-13714 at 7/2/18 8:44 PM: --- The rename used is standard Java. From docs: {code:java} Renames the file denoted by this abstract pathname. * * Many aspects of the behavior of this method are inherently * platform-dependent: The rename operation might not be able to move a * file from one filesystem to another, it might not be atomic, and it * might not succeed if a file with the destination abstract pathname * already exists. The return value should always be checked to make sure * that the rename operation was successful.{code} Seems like this fails on Windows. Proposing to change to delete() followed by renameTo() was (Author: lukmajercak): The rename used is standard Java. From docs: {code:java} Renames the file denoted by this abstract pathname. * * Many aspects of the behavior of this method are inherently * platform-dependent: The rename operation might not be able to move a * file from one filesystem to another, it might not be atomic, and it * might not succeed if a file with the destination abstract pathname * already exists. The return value should always be checked to make sure * that the rename operation was successful.{code} Seems like this fails on Windows. 
Proposing to change to delete() followed by renameTo() > Fix TestNameNodePrunesMissingStorages test failures on Windows > -- > > Key: HDFS-13714 > URL: https://issues.apache.org/jira/browse/HDFS-13714 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode, test >Affects Versions: 3.1.0, 2.9.1, 3.2.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: windows > Attachments: HDFS-13714.000.patch > > > Failed here: > https://builds.apache.org/job/hadoop-trunk-win/508/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestNameNodePrunesMissingStorages/testRenamingStorageIds/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13714) Fix TestNameNodePrunesMissingStorages test failures on Windows
[ https://issues.apache.org/jira/browse/HDFS-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530423#comment-16530423 ] Lukas Majercak commented on HDFS-13714: --- The rename used is standard Java. From docs: {code:java} Renames the file denoted by this abstract pathname. * * Many aspects of the behavior of this method are inherently * platform-dependent: The rename operation might not be able to move a * file from one filesystem to another, it might not be atomic, and it * might not succeed if a file with the destination abstract pathname * already exists. The return value should always be checked to make sure * that the rename operation was successful.{code} Seems like this fails on Windows. Proposing to change to delete() followed by renameTo() > Fix TestNameNodePrunesMissingStorages test failures on Windows > -- > > Key: HDFS-13714 > URL: https://issues.apache.org/jira/browse/HDFS-13714 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode, test >Affects Versions: 3.1.0, 2.9.1, 3.2.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: windows > Attachments: HDFS-13714.000.patch > > > Failed here: > https://builds.apache.org/job/hadoop-trunk-win/508/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestNameNodePrunesMissingStorages/testRenamingStorageIds/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
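The delete()-then-renameTo() workaround proposed above might look like the following sketch, using plain java.io.File as quoted in the comment. This is an illustration of the approach, not the actual HDFS-13714 patch:

```java
import java.io.File;
import java.io.IOException;

public class SafeRename {
    // Hedged sketch: File.renameTo() may fail on Windows when the
    // destination already exists, so delete the destination first and
    // check both return values, as the javadoc advises.
    static boolean renameOverwrite(File src, File dst) {
        if (dst.exists() && !dst.delete()) {
            return false; // could not clear the destination
        }
        return src.renameTo(dst);
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".tmp");
        File dst = File.createTempFile("dst", ".tmp"); // destination exists
        System.out.println(renameOverwrite(src, dst));
    }
}
```

Note that delete-then-rename is not atomic; that trade-off is acceptable in a test fixture like TestNameNodePrunesMissingStorages but would need care in production paths.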
[jira] [Updated] (HDFS-13714) Fix TestNameNodePrunesMissingStorages test failures on Windows
[ https://issues.apache.org/jira/browse/HDFS-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13714: -- Affects Version/s: 3.2.0 > Fix TestNameNodePrunesMissingStorages test failures on Windows > -- > > Key: HDFS-13714 > URL: https://issues.apache.org/jira/browse/HDFS-13714 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode, test >Affects Versions: 3.1.0, 2.9.1, 3.2.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: windows > Attachments: HDFS-13714.000.patch > > > Failed here: > https://builds.apache.org/job/hadoop-trunk-win/508/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestNameNodePrunesMissingStorages/testRenamingStorageIds/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-13714) Fix TestNameNodePrunesMissingStorages test failures on Windows
[ https://issues.apache.org/jira/browse/HDFS-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-13714 started by Lukas Majercak. - > Fix TestNameNodePrunesMissingStorages test failures on Windows > -- > > Key: HDFS-13714 > URL: https://issues.apache.org/jira/browse/HDFS-13714 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode, test >Affects Versions: 3.1.0, 2.9.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: windows > Attachments: HDFS-13714.000.patch > > > Failed here: > https://builds.apache.org/job/hadoop-trunk-win/508/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestNameNodePrunesMissingStorages/testRenamingStorageIds/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13714) Fix TestNameNodePrunesMissingStorages test failures on Windows
[ https://issues.apache.org/jira/browse/HDFS-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-13714: -- Attachment: HDFS-13714.000.patch > Fix TestNameNodePrunesMissingStorages test failures on Windows > -- > > Key: HDFS-13714 > URL: https://issues.apache.org/jira/browse/HDFS-13714 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode, test >Affects Versions: 3.1.0, 2.9.1 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: windows > Attachments: HDFS-13714.000.patch > > > Failed here: > https://builds.apache.org/job/hadoop-trunk-win/508/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestNameNodePrunesMissingStorages/testRenamingStorageIds/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org