[jira] [Resolved] (HDFS-15687) allowSnapshot fails when directory already has a Trash sub directory

2020-12-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDFS-15687.

Fix Version/s: 3.4.0
   Resolution: Duplicate

> allowSnapshot fails when directory already has a Trash sub directory
> 
>
> Key: HDFS-15687
> URL: https://issues.apache.org/jira/browse/HDFS-15687
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> Steps:
> 1. Create an encryption zone; a Trash directory is created inside the EZ 
> directory.
> /opt/cloudera/parcels/CDH/bin/hdfs crypto -createZone -keyName 
> testkeysnapshot1605613314 -path 
> /user/hrt_6/test_dir1/snap_encrypt_dir1605613504
> 2. Try to make the EZ directory snapshottable.
> /opt/cloudera/parcels/CDH/bin/hdfs dfsadmin -allowSnapshot 
> /user/hrt_6/test_dir1/snap_encrypt_dir1605613504
> It fails with the error:
> {noformat}
> /opt/cloudera/parcels/CDH/bin/hdfs dfsadmin -allowSnapshot 
> /user/hrt_6/test_dir1/snap_encrypt_dir1605613504
> 2020-11-17 11:45:16,598|INFO|MainThread|machine.py:180 - 
> run()||GUID=b35fc918-ed08-4c5d-92c1-c5aab449fb10|allowSnapshot: Can't 
> provision trash for snapshottable directory 
> /user/hrt_6/test_dir1/snap_encrypt_dir1605613504 because trash path 
> /user/hrt_6/test_dir1/snap_encrypt_dir1605613504/.Trash already exists.
> 2020-11-17 11:45:16,956|INFO|MainThread|machine.py:209 - 
> run()||GUID=b35fc918-ed08-4c5d-92c1-c5aab449fb10|Exit Code: 255{noformat}
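
For context, here is a minimal sketch of the kind of guard that produces this 
error. It is illustrative only, not the actual SnapshotManager/DFSAdmin code, 
and the helper method name is an assumption:

{code:java}
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrashProvisionSketch {
  // Illustrative guard: trash provisioning for a snapshottable directory
  // refuses to proceed when a ".Trash" child already exists.
  static void provisionSnapshotTrash(FileSystem fs, Path dir) throws Exception {
    Path trashPath = new Path(dir, FileSystem.TRASH_PREFIX); // ".Trash"
    if (fs.exists(trashPath)) {
      throw new FileAlreadyExistsException("Can't provision trash for "
          + "snapshottable directory " + dir + " because trash path "
          + trashPath + " already exists.");
    }
    fs.mkdirs(trashPath);
  }
}
{code}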






[jira] [Commented] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2020-12-15 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250099#comment-17250099
 ] 

Akira Ajisaka commented on HDFS-15719:
--

FYI: In our environment, we set a 60-second idle timeout for HttpFS.

In Hadoop 2.x, HttpFS ran on Tomcat 6:
https://www.slideshare.net/techblogyahoo/hdfs-migration-from-27-to-33-and-enabling-router-based-federation-rbf-in-production-acah2020/31

> [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket 
> timeout
> -
>
> Key: HDFS-15719
> URL: https://issues.apache.org/jira/browse/HDFS-15719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In Hadoop 3, we migrated from Jetty 6 to Jetty 9. This was implemented in 
> HADOOP-10075.
> However, HADOOP-10075 erroneously set the HttpServer2 socket idle timeout too 
> low: we replaced SelectChannelConnector.setLowResourceMaxIdleTime() with 
> ServerConnector.setIdleTimeout(), but they aren't the same.
> Essentially, HttpServer2's idle timeout used to be the default set by Jetty 
> 6, which is 200 seconds. In Hadoop 3, the idle timeout is set to 10 seconds, 
> which is unreasonable for the JN. If the NameNodes try to download a big edit 
> log from the JournalNodes (say a few hundred MB), it is likely to exceed 10 
> seconds. When that happens, both NNs crash, and there is no way to work 
> around it unless you apply the patch in HADOOP-15696 to add a config switch 
> for the idle timeout. Fortunately, it doesn't happen often.
> Proposal: bump the default idle timeout to 200 seconds to match the behavior 
> of Jetty 6. (Jetty 9 reduces the default idle timeout to 30 seconds, which is 
> not suitable for the JN.)
> Other things to consider:
> 1. The fsck servlet? (Somehow I suspect this is related to the socket timeout 
> reported in HDFS-7175.)
> 2. webhdfs and httpfs? --> We've also received reports that webhdfs can time 
> out, so having a longer timeout makes sense there too.
> 3. kms? Will the longer timeout cause more lingering sockets?
> Thanks [~zhenshan.wen] for the discussion.
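
As a minimal sketch of the proposal (plain Jetty 9 APIs, not the actual 
HttpServer2 code; the port number is arbitrary), the idle timeout would be 
raised on the ServerConnector:

{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class IdleTimeoutSketch {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    ServerConnector connector = new ServerConnector(server);
    // Jetty 9's ServerConnector defaults to a 30-second idle timeout;
    // bump it to 200 seconds to match Jetty 6's old default behavior.
    connector.setIdleTimeout(200_000L); // milliseconds
    connector.setPort(8480);
    server.addConnector(connector);
    server.start();
    server.join();
  }
}
{code}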






[jira] [Work logged] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15719?focusedWorklogId=524828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524828
 ]

ASF GitHub Bot logged work on HDFS-15719:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 04:41
Start Date: 16/Dec/20 04:41
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2533:
URL: https://github.com/apache/hadoop/pull/2533#issuecomment-745759540


   Thanx @jojochuang for the work here.
   The PR seems to include two Jiras; I think the timeout change is sufficient 
on its own. If so, can you update the PR?





Issue Time Tracking
---

Worklog Id: (was: 524828)
Time Spent: 0.5h  (was: 20m)







[jira] [Commented] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client

2020-12-15 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249994#comment-17249994
 ] 

Daniel Howard commented on HDFS-14737:
--

On 3.2.1 I'm seeing the NFS server wedge up with messages like these:

{{2020-12-15 14:35:49,558 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2661285888-2662334464) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,609 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}}
{{2020-12-15 14:35:49,701 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2662334464-2663383040) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,753 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}}

> Writing data through the HDFS nfs3 service is very slow, and timeout occurs 
> while mount directories on the nfs client
> -
>
> Key: HDFS-14737
> URL: https://issues.apache.org/jira/browse/HDFS-14737
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: liying
>Assignee: liying
>Priority: Major
>
> We know the NFS Gateway supports NFSv3 and allows HDFS to be mounted as part 
> of the client's local file system. I started the portmap and nfs3server on 
> the Hadoop node and used the command (mount -t nfs -o vers=3,proto=tcp,nolock 
> nfs3serverIP:/ /hdfs) to mount. It worked well. I then used the same command 
> to mount on many clients, and used Linux cp to copy files to the mounted 
> directory (/hdfs), but found the speed was very slow. The log contains a lot 
> of messages like the following:
> 2019-08-14 15:36:28,347 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:03,093 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:03,850 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:14,780 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:14,928 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:28,410 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:38:09,310 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:38:10,069 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:38:14,856 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:38:14,957 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:38:28,475 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:39:14,923 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:39:14,987 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:39:15,541 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:39:16,287 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:39:28,530 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:40:15,015 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:40:15,024 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:40:15,028 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:40:21,757 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:40:22,508 INFO 

[jira] [Issue Comment Deleted] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client

2020-12-15 Thread Daniel Howard (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Howard updated HDFS-14737:
-
Comment: was deleted

(was: On 3.2.1 I'm seeing the NFS server wedge up with messages like these:

{{2020-12-15 14:35:49,558 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2661285888-2662334464) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,609 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}}
{{2020-12-15 14:35:49,701 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2662334464-2663383040) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,753 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}})


[jira] [Updated] (HDFS-14272) [SBN read] ObserverReadProxyProvider should sync with active txnID on startup

2020-12-15 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14272:
--
Fix Version/s: 2.10.0
   3.2.1
   3.1.3

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -
>
> Key: HDFS-14272
> URL: https://issues.apache.org/jira/browse/HDFS-14272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
> Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch, 
> HDFS-14272.002.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes the HDFS bash commands sequentially, but it may fail with 
> Consistent Reads from Standby because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state ID returned from the first bash command. 
> So the ObserverNode wouldn't wait for the edits to get propagated, and the 
> read thus fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)
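
A rough sketch of the idea behind this fix (illustrative, with simplified 
types; the interface and method names here are assumptions, not the actual 
ObserverReadProxyProvider code): before the first read is routed to an 
Observer, sync once with the Active so the client starts from an up-to-date 
state ID.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class InitialMsyncSketch {
  interface ClientProtocolStub {
    void msync() throws Exception; // forwarded to the Active NameNode
  }

  private final AtomicBoolean msynced = new AtomicBoolean(false);
  private final ClientProtocolStub activeProxy;

  InitialMsyncSketch(ClientProtocolStub activeProxy) {
    this.activeProxy = activeProxy;
  }

  // Called before the first read; later calls are no-ops.
  void initializeMsync() throws Exception {
    if (msynced.compareAndSet(false, true)) {
      activeProxy.msync(); // pick up the Active's current transaction ID
    }
  }
}
{code}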






[jira] [Commented] (HDFS-15704) Mitigate lease monitor's rapid infinite loop

2020-12-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249739#comment-17249739
 ] 

Jim Brennan commented on HDFS-15704:


Copying my comment about the TreeMap to HashMap change here:
{quote}
The reason a TreeMap was used here was to maintain a sorted order, which 
allowed the checkLeases() to exit the while loop as soon as it hit an unexpired 
lease.

The new design removes the need for the TreeMap by pruning the list it passes 
to checkLeases().
{quote}
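
A minimal sketch of why sorted iteration allowed that early exit (illustrative 
only, with simplified types; this is not the actual LeaseManager code):

{code:java}
import java.util.Comparator;
import java.util.TreeSet;

public class LeaseCheckSketch {
  static class Lease {
    final String holder;
    final long lastRenewalMs;
    Lease(String holder, long lastRenewalMs) {
      this.holder = holder;
      this.lastRenewalMs = lastRenewalMs;
    }
  }

  // Leases kept sorted by last renewal time, oldest first.
  static final TreeSet<Lease> sortedLeases = new TreeSet<>(
      Comparator.comparingLong((Lease l) -> l.lastRenewalMs)
          .thenComparing(l -> l.holder));

  static void checkLeases(long nowMs, long hardLimitMs) {
    for (Lease lease : sortedLeases) {
      if (nowMs - lease.lastRenewalMs < hardLimitMs) {
        // Sorted order guarantees every later lease is even fresher,
        // so we can stop at the first unexpired lease.
        break;
      }
      // ... recover the expired lease here ...
    }
  }
}
{code}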


> Mitigate lease monitor's rapid infinite loop
> 
>
> Key: HDFS-15704
> URL: https://issues.apache.org/jira/browse/HDFS-15704
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [~daryn] reported that the lease monitor goes into a rapid infinite loop if 
> an exception occurs during a lease recovery.  The two main issues are:
> # lease monitor thread does not sleep if an exception occurs before looping 
> again
> # the loop peeks at the first element of a sorted tree set so when an 
> exception occurs, the "bad" lease remains as the first element preventing 
> recovery of other leases.
> This jira is not intended to fix the underlying issues causing the exception 
> during recovery but merely to mitigate the cited issues.






[jira] [Commented] (HDFS-15704) Mitigate lease monitor's rapid infinite loop

2020-12-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249737#comment-17249737
 ] 

Jim Brennan commented on HDFS-15704:


Thanks for the updated PR [~ahussein].  I am +1 on this patch.  I will wait 
another day or two to see if anyone else wants to comment before committing it.

cc: [~daryn], [~kihwal], [~sodonnell], [~weichiu]







[jira] [Commented] (HDFS-12861) Track speed in DFSClient

2020-12-15 Thread huhaiyang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249676#comment-17249676
 ] 

huhaiyang commented on HDFS-12861:
--

 
{code:java}
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
index be822d664f8..ea216bc04e3 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
@@ -165,6 +165,19 @@ public long getDownstreamAckTimeNanos() {
     return proto.getDownstreamAckTimeNanos();
   }
 
+  /**
+   * Get packet processing time of datanode at the given index in the pipeline.
+   * @param i - datanode index in the pipeline
+   */
+  public long getPacketProcessingTime(int i) {
+    if (proto.getPacketProcessingTimeNanosCount() > i) {
+      return proto.getPacketProcessingTimeNanos(i);
+    } else {
+      // Return -1 if datanode at this index didn't send this info
+      return -1;
+    }
+  }
+
   /**
    * Check if this ack contains error status
    * @return true if all statuses are SUCCESS
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto
index 2356201f04d..dfededb7619 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/datatransfer.proto
@@ -260,6 +260,7 @@ message PipelineAckProto {
   repeated Status reply = 2;
   optional uint64 downstreamAckTimeNanos = 3 [default = 0];
   repeated uint32 flag = 4 [packed=true];
+  repeated uint64 packetProcessingTimeNanos = 100;
 }
{code}
Hi [~elgoiri], a question: I could not find any method in the current patch 
that sets the packetProcessingTimeNanos value. Is it missing?
Looking forward to your reply, thanks!
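
For reference, a hedged sketch of what a sending-side use could look like. 
This is not in the posted patch; it assumes the patched datatransfer.proto 
above has been compiled, in which case protobuf generates the 
addPacketProcessingTimeNanos() builder method used below:

{code:java}
// Illustrative only: builds a PipelineAckProto carrying per-datanode
// packet processing times via the repeated field added in the diff above.
import org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.PipelineAckProto;
import org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.Status;

public class PipelineAckBuildSketch {
  static PipelineAckProto buildAck(long seqno, long[] processingTimeNanos) {
    PipelineAckProto.Builder builder = PipelineAckProto.newBuilder()
        .setSeqno(seqno)
        .addReply(Status.SUCCESS);
    for (long t : processingTimeNanos) {
      builder.addPacketProcessingTimeNanos(t); // generated for the new field
    }
    return builder.build();
  }
}
{code}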

 

> Track speed in DFSClient
> 
>
> Key: HDFS-12861
> URL: https://issues.apache.org/jira/browse/HDFS-12861
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: María Fernanda Borge
>Priority: Major
> Attachments: HDFS-12861-10-april-18.patch
>
>
> Sometimes we get slow jobs because of HDFS access. However, it is hard to 
> tell what the actual speed is. We propose to add a log line with something 
> like:
> {code}
> 2017-11-19 09:55:26,309 INFO [main] hdfs.DFSClient: blk_1107222019_38144502 
> READ 129500B in 7ms 17.6MB/s
> 2017-11-27 19:01:04,141 INFO [DataStreamer for file 
> /hdfs-federation/stats/2017/11/27/151183800.json] hdfs.DFSClient: 
> blk_1135792057_86833357 WRITE 131072B in 10ms 12.5MB/s
> 2017-11-27 19:01:14,219 INFO [DataStreamer for file 
> /hdfs-federation/stats/2017/11/27/151183800.json] hdfs.DFSClient: 
> blk_1135792069_86833369 WRITE 131072B in 12ms 10.4MB/s
> 2017-11-27 19:01:24,282 INFO [DataStreamer for file 
> /hdfs-federation/stats/2017/11/27/151183800.json] hdfs.DFSClient: 
> blk_1135792081_86833381 WRITE 131072B in 11ms 11.4MB/s
> 2017-11-27 19:01:34,330 INFO [DataStreamer for file 
> /hdfs-federation/stats/2017/11/27/151183800.json] hdfs.DFSClient: 
> blk_1135792093_86833393 WRITE 131072B in 11ms 11.4MB/s
> 2017-11-27 19:01:44,408 INFO [DataStreamer for file 
> /hdfs-federation/stats/2017/11/27/151183800.json] hdfs.DFSClient: 
> blk_1135792105_86833405 WRITE 131072B in 11ms 11.4MB/s
> {code}
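
A sketch of how such a line could be computed (illustrative; the class and 
method names are assumptions, not the patch's code):

{code:java}
public class ThroughputLogSketch {
  // Formats e.g. "blk_1107222019_38144502 READ 129500B in 7ms 17.6MB/s".
  static String format(String block, String op, long bytes, long millis) {
    double mbPerSec = (bytes / 1048576.0) / (Math.max(millis, 1) / 1000.0);
    return String.format("%s %s %dB in %dms %.1fMB/s",
        block, op, bytes, millis, mbPerSec);
  }
}
{code}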






[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15628:
---
Fix Version/s: 3.2.2

Added fix versions 3.2.2 and 3.2.3. Please update the fix version tag promptly 
when committing and backporting. Thanks.

> HttpFS server throws NPE if a file is a symlink
> ---
>
> Key: HDFS-15628
> URL: https://issues.apache.org/jira/browse/HDFS-15628
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch
>
>
> If a directory containing a symlink is listed, the client 
> ({{WebHdfsFileSystem}}) blows up with an NPE. If {{type}} is {{SYMLINK}}, 
> there must be a {{symlink}} field whose value is the link target string. 
> HttpFS returns a response without the {{symlink}} field. 
> {{WebHdfsFileSystem}} assumes it is there for a symlink and blindly tries to 
> parse it, causing the NPE.
> This is not an issue if the destination cluster does not have symlinks 
> enabled.
>  
> {code:bash}
> java.io.IOException: localhost:55901: Response decoding failure: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230)
>   at 
> org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at 
> org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42)
>   at 
> org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74)
>   at 
> org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at 

[jira] [Updated] (HDFS-15583) Backport DirectoryScanner improvements HDFS-14476, HDFS-14751 and HDFS-15048 to branch 3.2 and 3.1

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15583:
---
Fix Version/s: 3.2.2

Added fix versions 3.2.2 and 3.2.3. Please update the fix version tag promptly 
when committing and backporting. Thanks.

> Backport DirectoryScanner improvements HDFS-14476, HDFS-14751 and HDFS-15048 
> to branch 3.2 and 3.1
> --
>
> Key: HDFS-15583
> URL: https://issues.apache.org/jira/browse/HDFS-15583
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.1.5, 3.2.3
>
> Attachments: HDFS-15583.branch-3.2.001.patch
>
>
> HDFS-14476, HDFS-14751 and HDFS-15048 made some good improvements to the 
> datanode DirectoryScanner, but due to a large refactor on that class in 
> branch-3.3, they are not trivial to backport to earlier branches.
> HDFS-14476 introduced the problem in HDFS-14751 and a findbugs warning, fixed 
> in HDFS-15048, so these 3 need to be backported together.






[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15574:
---
Fix Version/s: 3.2.2

Added fix versions 3.2.2 and 3.2.3. Please update the fix version tag promptly 
when committing and backporting. Thanks.

> Remove unnecessary sort of block list in DirectoryScanner
> -
>
> Key: HDFS-15574
> URL: https://issues.apache.org/jira/browse/HDFS-15574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, 
> HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, 
> HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, 
> HDFS-15574.branch-3.3.002.patch
>
>
> These lines of code in DirectoryScanner#scan() obtain a snapshot of the 
> finalized blocks from memory and then sort them, under the DN lock. However, 
> the blocks are stored in a sorted structure (FoldedTreeSet), and hence the 
> sort should be unnecessary.
> {code}
>   final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> This Jira removes the sort and renames getFinalizedBlocks to 
> getSortedFinalizedBlocks to make the intent of the method clearer.
> A test was also added, just in case the underlying block structure is ever 
> changed to something unsorted.
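
A minimal sketch of such a sortedness check (illustrative, not the actual test 
added by this Jira):

{code:java}
import java.util.List;

public class SortednessCheckSketch {
  // Asserts that block IDs come back in ascending order, so callers can
  // safely skip the extra Collections.sort().
  static void assertSortedByBlockId(List<Long> blockIds) {
    for (int i = 1; i < blockIds.size(); i++) {
      if (blockIds.get(i - 1) > blockIds.get(i)) {
        throw new AssertionError("Block list is not sorted at index " + i);
      }
    }
  }
}
{code}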






[jira] [Updated] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly.

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15567:
---
Fix Version/s: 3.2.2

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> call it explicitly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15567.001.patch, HDFS-15567.002.patch
>
>
> Consistent Reads from Standby (HDFS-13688) introduced the {{msync()}} API, 
> which updates the client's state ID with the current state of the Active 
> NameNode to guarantee consistency of subsequent calls to an ObserverNode. 
> Currently this API is exposed only via {{DFSClient}}, which makes it hard for 
> applications to access {{msync()}}. One way is to use something like this:
> {code}
> if (fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem) fs).getClient().msync();
> }
> {code}
> This should be exposed both for {{FileSystem}} and {{FileContext}}.
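
For illustration, once exposed on {{FileSystem}} as proposed, downstream code 
could call it directly (a sketch; the path and configuration are arbitrary):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MsyncUsageSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Sync this client's state ID with the Active NameNode so that
    // subsequent reads served by an ObserverNode are consistent.
    fs.msync();
    fs.listStatus(new Path("/tmp"));
  }
}
{code}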






[jira] [Updated] (HDFS-15478) When Empty mount points, we are assigning fallback link to self. But it should not use full URI for target fs.

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15478:
---
Fix Version/s: 3.2.3
   3.2.2

Added fix versions 3.2.2 and 3.2.3. Please update the fix version tag promptly 
when committing and backporting. Thanks.

> When Empty mount points, we are assigning fallback link to self. But it 
> should not use full URI for target fs.
> --
>
> Key: HDFS-15478
> URL: https://issues.apache.org/jira/browse/HDFS-15478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
>
> On detecting empty mount tables, we automatically assign a fallback using an 
> fs for the same initialized URI. Currently we use the given URI for creating 
> the target fs.
> When creating the target fs, we use a chrooted fs, which sets the path from 
> the URI as the base directory. So this can make the path wrong when the fs 
> is initialized with a path.






[jira] [Updated] (HDFS-15464) ViewFsOverloadScheme should work when -fs option pointing to remote cluster without mount links

2020-12-15 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15464:
---
Fix Version/s: 3.2.3
   3.2.2

Added fix versions 3.2.2 and 3.2.3. Please update the fix version tag promptly 
when committing and backporting. Thanks.

> ViewFsOverloadScheme should work when -fs option pointing to remote cluster 
> without mount links
> ---
>
> Key: HDFS-15464
> URL: https://issues.apache.org/jira/browse/HDFS-15464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.2.3
>
>
> When users try to connect to a remote cluster from a cluster env where 
> ViewFSOverloadScheme is enabled, it expects at least one mount link for the 
> fs init to succeed.
> Unfortunately, you might not have configured any mount links for that remote 
> cluster in your current env. You would have configured only your local 
> cluster's mount points.
> In this case, fs init will fail because no mount points are configured in 
> the mount table for that remote cluster URI's authority.
> One idea is that, when there are no mount links configured, we should just 
> treat it as the default cluster; that can be achieved by automatically 
> treating it as the fallback option.


