[jira] [Updated] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor
[ https://issues.apache.org/jira/browse/HDFS-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated HDFS-13913:
------------------------------------
    Status: Patch Available  (was: Open)

> LazyPersistFileScrubber.run() error handling is poor
> ----------------------------------------------------
>
>                 Key: HDFS-13913
>                 URL: https://issues.apache.org/jira/browse/HDFS-13913
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.1.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Green
>            Priority: Minor
>         Attachments: HDFS-13913.001.patch
>
> In {{LazyPersistFileScrubber.run()}} we have:
> {code}
> try {
>   clearCorruptLazyPersistFiles();
> } catch (Exception e) {
>   FSNamesystem.LOG.error(
>       "Ignoring exception in LazyPersistFileScrubber:", e);
> }
> {code}
> The first problem is that catching {{Exception}} is sloppy. It should instead
> be a multicatch for the actual exceptions thrown or, better, a set of
> separate catch statements that react appropriately to the type of exception.
> The second problem is that it's bad to log an ERROR that's not actionable and
> that can be safely ignored. The message should be logged at WARN or INFO
> level.
> Third, the log message is useless. If it's going to be a WARN or ERROR, a log
> message should be actionable; otherwise it should be an INFO. A log message
> should contain enough information for an admin to understand what it means.
> In the end, I think the right thing here is to leave the high-level behavior
> unchanged: log a message and ignore the error, hoping that the next run will
> go better.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
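As an illustration of the restructuring the report asks for, here is a minimal, self-contained sketch: a specific catch clause instead of a broad {{catch (Exception)}}, and a WARN-level message that says what failed and that the next pass will retry. The class, method names, and exception message below are stand-ins invented for the sketch; this is not the attached HDFS-13913.001.patch.

```java
import java.io.IOException;

// Hypothetical sketch for HDFS-13913: catch a specific exception type and
// log an actionable, WARN-level message, while still swallowing the error
// so the scrubber thread keeps running.
public class ScrubberSketch {
  static String lastLog = "";

  // Stand-in for FSNamesystem.LOG.warn(...)
  static void warn(String msg) {
    lastLog = "WARN: " + msg;
  }

  // Stand-in for the real scrubber work; always fails in this sketch.
  static void clearCorruptLazyPersistFiles() throws IOException {
    throw new IOException("disk unavailable");
  }

  static void runOnce() {
    try {
      clearCorruptLazyPersistFiles();
    } catch (IOException e) {
      // Specific type, WARN level, and a message that tells the admin what
      // happened and what to expect next.
      warn("LazyPersistFileScrubber pass failed; will retry on next run: "
          + e.getMessage());
    }
  }

  public static void main(String[] args) {
    runOnce();  // must not propagate the exception
    System.out.println(lastLog);
  }
}
```

The high-level behavior stays exactly as the report recommends: the error is logged and ignored, and the next scheduled run retries.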
[jira] [Commented] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor
[ https://issues.apache.org/jira/browse/HDFS-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885425#comment-16885425 ]

Daniel Templeton commented on HDFS-13913:
-----------------------------------------

LGTM. +1 pending Jenkins.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (HDFS-9499) Fix typos in DFSAdmin.java
[ https://issues.apache.org/jira/browse/HDFS-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton resolved HDFS-9499.
------------------------------------
    Resolution: Invalid

Looks like it's already been resolved by another JIRA.

> Fix typos in DFSAdmin.java
> --------------------------
>
>                 Key: HDFS-9499
>                 URL: https://issues.apache.org/jira/browse/HDFS-9499
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.8.0
>            Reporter: Arpit Agarwal
>            Assignee: Daniel Green
>            Priority: Major
>
> There are multiple instances of 'snapshot' spelled as 'snaphot' in
> DFSAdmin.java and TestSnapshotCommands.java.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
[ https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870037#comment-16870037 ]

Daniel Templeton commented on HDFS-14047:
-----------------------------------------

I can't at the moment; no desktop/laptop. @weichiu, could you do the honors?

> [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-14047
>                 URL: https://issues.apache.org/jira/browse/HDFS-14047
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: libhdfs, native
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14047.000.patch, HDFS-14047.001.patch
>
> Currently the native client CI tests break deterministically with these
> errors:
>
> Libhdfs:
> 1 - test_test_libhdfs_threaded_hdfs_static (Failed)
>     [exec] TEST_ERROR: failed on /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 with NULL return return value (errno: 2): expected substring: File does not exist
>     [exec] TEST_ERROR: failed on /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, fs, )
>     [exec] hdfsOpenFile(/tlhData0001/file1): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
>     [exec] (unable to get root cause for java.io.FileNotFoundException)
>     [exec] (unable to get stack trace for java.io.FileNotFoundException)
>
> Libhdfs++:
> 34 - test_libhdfs_threaded_hdfspp_test_shim_static (Failed)
>     [exec] TEST_ERROR: failed on /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 with NULL return return value (errno: 2): expected substring: File does not exist
>     [exec] TEST_ERROR: failed on /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, fs, )
>     [exec] hdfsOpenFile(/tlhData0001/file1): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
>     [exec] (unable to get root cause for java.io.FileNotFoundException)
>     [exec] (unable to get stack trace for java.io.FileNotFoundException)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HDFS-14487) Missing Space in Client Error Message
[ https://issues.apache.org/jira/browse/HDFS-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton resolved HDFS-14487.
-------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.3.0

Thanks for the patch, [~shwetayakkali], and the review, [~sodonnell]. +1 Committed to trunk.

> Missing Space in Client Error Message
> -------------------------------------
>
>                 Key: HDFS-14487
>                 URL: https://issues.apache.org/jira/browse/HDFS-14487
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: Shweta
>            Priority: Minor
>              Labels: newbie, noob
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14487.001.patch
>
> {code:java}
>     if (retries == 0) {
>       throw new IOException("Unable to close file because the last block"
>           + last + " does not have enough number of replicas.");
>     }
> {code}
> Note the missing space after "last block".
> https://github.com/apache/hadoop/blob/f940ab242da80a22bae95509d5c282d7e2f7ecdb/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L968-L969

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
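The fix itself is a one-character change: a space after "last block" so the block name no longer runs into the preceding word. A standalone sketch of the corrected concatenation (the class name and sample block name are invented for the example):

```java
// Sketch of the HDFS-14487 fix: "last block " instead of "last block",
// so the interpolated block name is separated from the preceding word.
public class MessageFix {
  static String message(String last) {
    return "Unable to close file because the last block "
        + last + " does not have enough number of replicas.";
  }

  public static void main(String[] args) {
    // prints: Unable to close file because the last block blk_1073741825 does not have enough number of replicas.
    System.out.println(message("blk_1073741825"));
  }
}
```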
[jira] [Comment Edited] (HDFS-14514) Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2
[ https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851419#comment-16851419 ]

Daniel Templeton edited comment on HDFS-14514 at 5/30/19 12:17 AM:
-------------------------------------------------------------------

LGTM. I'd like to see the last two ifs in DFSInputStream be an if/else-if, but I can fix that on commit. If there are no complaints, I'll commit this later this evening.

was (Author: templedf):
LGTM. I'd like to see the last two {{if}}s in {{DFSInputStream}} be an {{if/else-if}}, but I can fix that on commit. If there are no complaints, I'll commit this later this evening.

> Actual read size of open file in encryption zone still larger than listing
> size even after enabling HDFS-11402 in Hadoop 2
> --------------------------------------------------------------------------
>
>                 Key: HDFS-14514
>                 URL: https://issues.apache.org/jira/browse/HDFS-14514
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption, hdfs, snapshots
>    Affects Versions: 2.6.5, 2.9.2, 2.8.5, 2.7.7
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Major
>         Attachments: HDFS-14514.branch-2.001.patch,
>                      HDFS-14514.branch-2.002.patch,
>                      HDFS-14514.branch-2.003.patch
>
> In Hadoop 2, when a file is opened for write in an *encryption zone*, taken a
> snapshot and appended, the read-out file size in the snapshot is larger than
> the listing size. This happens even when immutable snapshot HDFS-11402 is
> enabled.
> Note: The refactor HDFS-8905 happened in Hadoop 3.0 and later fixed the bug
> silently (probably incidentally). Hadoop 2.x are still suffering from this
> issue.
> Thanks [~sodonnell] for locating the root cause in the codebase.
> Repro:
> 1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml,
> start HDFS cluster
> 2. Create an empty directory /dataenc, create encryption zone and allow
> snapshot on it
> {code:bash}
> hadoop key create reprokey
> sudo -u hdfs hdfs dfs -mkdir /dataenc
> sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
> sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
> {code}
> 3. Use a client that keeps a file open for write under /dataenc. For example,
> I'm using Flume HDFS sink to tail a local file.
> 4. Append the file several times using the client, keep the file open.
> 5. Create a snapshot
> {code:bash}
> sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
> {code}
> 6. Append the file one or more times, but don't let the file size exceed the
> block size limit. Wait for several seconds for the append to be flushed to DN.
> 7. Do a -ls on the file inside the snapshot, then try to read the file using
> -get; you should see the actual file size read is larger than the listing
> size from -ls.
> The patch and an updated unit test will be uploaded later.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14514) Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2
[ https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851419#comment-16851419 ]

Daniel Templeton commented on HDFS-14514:
-----------------------------------------

LGTM. I'd like to see the last two {{if}}s in {{DFSInputStream}} be an {{if/else-if}}, but I can fix that on commit. If there are no complaints, I'll commit this later this evening.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
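The if/else-if review comment above is about intent as much as behavior: when two conditions are meant to be mutually exclusive, chaining them documents that only one branch can run. A generic sketch (not the actual DFSInputStream code; the names and conditions are invented for illustration):

```java
// Generic illustration of the if/else-if pattern requested in the review:
// the else-if makes the mutual exclusivity of the two checks explicit and
// guarantees at most one branch executes per call.
public class ElseIfSketch {
  static String classify(int pos, int end) {
    if (pos > end) {
      return "past-eof";
    } else if (pos == end) {  // chained: never evaluated when pos > end
      return "at-eof";
    }
    return "in-range";
  }

  public static void main(String[] args) {
    System.out.println(classify(5, 3));  // prints: past-eof
  }
}
```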
[jira] [Updated] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
[ https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated HDFS-14359:
------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.3.0
           Status: Resolved  (was: Patch Available)

Thanks for the patch, [~sodonnell], and for the review, [~jojochuang]. Committed to trunk.

> Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-14359
>                 URL: https://issues.apache.org/jira/browse/HDFS-14359
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14359.001.patch, HDFS-14359.002.patch,
>                      HDFS-14359.003.patch
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx        #effective:r-x
> user:hive:rwx         #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx        #effective:r-x
> user:hive:rwx         #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders
> have their permissions masked. I believe this is a regression from the fix
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
[ https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801220#comment-16801220 ]

Daniel Templeton commented on HDFS-14359:
-----------------------------------------

Alrighty. I'll get this committed. Thanks, [~jojochuang]!

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
[ https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797246#comment-16797246 ]

Daniel Templeton commented on HDFS-14359:
-----------------------------------------

LGTM. +1 from me. Anyone else want to weigh in before I commit? ([~andrew.wang], [~jzhuge], [~steve_l], [~arpaga], ...)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
[ https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796376#comment-16796376 ]

Daniel Templeton commented on HDFS-14359:
-----------------------------------------

I think you're right about the 3 failed test results. It was probably a case of testing for what makes the test pass, as opposed to testing expected behavior. :) Looking at the test code history, [~jzhuge] only updated the expected ACLs for the leaf directory when he added the POSIX ACL inheritance option, which supports my theory.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-14381) Add option to hdfs dfs -cat to ignore corrupt blocks
[ https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796338#comment-16796338 ]

Daniel Templeton commented on HDFS-14381:
-----------------------------------------

That's a really good point. I'll update the description accordingly.

> Add option to hdfs dfs -cat to ignore corrupt blocks
> ----------------------------------------------------
>
>                 Key: HDFS-14381
>                 URL: https://issues.apache.org/jira/browse/HDFS-14381
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 3.2.0
>            Reporter: Daniel Templeton
>            Priority: Minor
>
> If I have a file in HDFS that contains 100 blocks, and I happen to lose the
> first block (for whatever obscure/unlikely/dumb reason), I can no longer
> access the 99% of the file that's still there and accessible. In the case of
> some data formats (e.g. text), the remaining data may still be useful. It
> would be nice to have a way to extract the remaining data without having to
> manually reassemble the file contents from the block files. Something like
> {{hdfs dfs -cat -ignoreCorrupt }}. It could insert some marker to show
> where the missing blocks are.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HDFS-14381) Add option to hdfs dfs to ignore corrupt blocks
[ https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated HDFS-14381:
------------------------------------
    Description:
If I have a file in HDFS that contains 100 blocks, and I happen to lose the first block (for whatever obscure/unlikely/dumb reason), I can no longer access the 99% of the file that's still there and accessible. In the case of some data formats (e.g. text), the remaining data may still be useful. It would be nice to have a way to extract the remaining data without having to manually reassemble the file contents from the block files. Something like {{hdfs dfs -copyToLocal -ignoreCorrupt }}. It could insert some marker to show where the missing blocks are.

  was:
If I have a file in HDFS that contains 100 blocks, and I happen to lose the first block (for whatever obscure/unlikely/dumb reason), I can no longer access the 99% of the file that's still there and accessible. In the case of some data formats (e.g. text), the remaining data may still be useful. It would be nice to have a way to extract the remaining data without having to manually reassemble the file contents from the block files. Something like {{hdfs dfs -cat -ignoreCorrupt }}. It could insert some marker to show where the missing blocks are.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HDFS-14381) Add option to hdfs dfs to ignore corrupt blocks
[ https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14381: Summary: Add option to hdfs dfs to ignore corrupt blocks (was: Add option to hdfs dfs -cat to ignore corrupt blocks) > Add option to hdfs dfs to ignore corrupt blocks > --- > > Key: HDFS-14381 > URL: https://issues.apache.org/jira/browse/HDFS-14381 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 3.2.0 >Reporter: Daniel Templeton >Priority: Minor > > If I have a file in HDFS that contains 100 blocks, and I happen to lose the > first block (for whatever obscure/unlikely/dumb reason), I can no longer > access the 99% of the file that's still there and accessible. In the case of > some data formats (e.g. text), the remaining data may still be useful. It > would be nice to have a way to extract the remaining data without having to > manually reassemble the file contents from the block files. Something like > {{hdfs dfs -cat -ignoreCorrupt }}. It could insert some marker to show > where the missing blocks are. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14382) The hdfs fsck command docs do not explain the meaning of the reported fields
Daniel Templeton created HDFS-14382: --- Summary: The hdfs fsck command docs do not explain the meaning of the reported fields Key: HDFS-14382 URL: https://issues.apache.org/jira/browse/HDFS-14382 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 3.2.0 Reporter: Daniel Templeton The {{hdfs fsck}} command shows something like:
{noformat}
FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /tmp at Tue Mar 19 15:50:24 UTC 2019
.Status: HEALTHY
 Total size: 179159051 B
 Total dirs: 11
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 2 (avg. block size 89579525 B)
 Minimally replicated blocks: 2 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Tue Mar 19 15:50:24 UTC 2019 in 3 milliseconds

The filesystem under path '/tmp' is HEALTHY
{noformat}
The fields are presumed to be self-explanatory, but I think that's a bold assumption. In particular, it's not obvious how "mis-replicated" blocks differ from "under-replicated" or "over-replicated" blocks. It would be nice to explain the meaning of all the fields clearly in the docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14381) Add option to hdfs dfs -cat to ignore corrupt blocks
Daniel Templeton created HDFS-14381: --- Summary: Add option to hdfs dfs -cat to ignore corrupt blocks Key: HDFS-14381 URL: https://issues.apache.org/jira/browse/HDFS-14381 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.2.0 Reporter: Daniel Templeton If I have a file in HDFS that contains 100 blocks, and I happen to lose the first block (for whatever obscure/unlikely/dumb reason), I can no longer access the 99% of the file that's still there and accessible. In the case of some data formats (e.g. text), the remaining data may still be useful. It would be nice to have a way to extract the remaining data without having to manually reassemble the file contents from the block files. Something like {{hdfs dfs -cat -ignoreCorrupt }}. It could insert some marker to show where the missing blocks are. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
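The behavior proposed in this ticket, emitting whatever blocks are still readable and inserting a marker where one is missing, can be sketched in miniature. This is hypothetical code, not the eventual HDFS option: block reads are simulated with an in-memory list, and a real implementation would stream block data through DFSInputStream.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed ignore-corrupt behavior: concatenate
// the blocks that are still readable and insert a marker where one is lost.
// A null entry simulates a block whose replicas are all lost or corrupt.
class IgnoreCorruptCat {
    static final String MARKER = "<<<MISSING BLOCK>>>";

    static String catIgnoringCorrupt(List<String> blocks) {
        StringBuilder out = new StringBuilder();
        for (String block : blocks) {
            // Substitute the marker for an unreadable block instead of failing
            out.append(block == null ? MARKER : block);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First block lost; the remaining data is still extracted
        List<String> blocks = Arrays.asList(null, "block data 2...", "block data 3...");
        System.out.println(catIgnoringCorrupt(blocks));
    }
}
```

The marker string here is made up; any sentinel that cannot be confused with file content would do for text data.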
[jira] [Updated] (HDFS-14328) [Clean-up] Remove NULL check before instanceof in TestGSet
[ https://issues.apache.org/jira/browse/HDFS-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14328: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) +1 Thanks for the patch, [~shwetayakkali]. Committed to trunk. > [Clean-up] Remove NULL check before instanceof in TestGSet > -- > > Key: HDFS-14328 > URL: https://issues.apache.org/jira/browse/HDFS-14328 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shweta >Assignee: Shweta >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14328.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)
[ https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789803#comment-16789803 ] Daniel Templeton commented on HDFS-14359: - At a first pass, it LGTM. I'll take a closer look when I get a chance. > Inherited ACL permissions masked when parent directory does not exist (mkdir > -p) > > > Key: HDFS-14359 > URL: https://issues.apache.org/jira/browse/HDFS-14359 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-14359.001.patch > > > There appears to be an issue with ACL inheritance if you 'mkdir' a directory > such that the parent directories need to be created (ie mkdir -p). > If you have a folder /tmp2/testacls as: > {code} > hadoop fs -mkdir /tmp2 > hadoop fs -mkdir /tmp2/testacls > hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls > hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls > hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls > hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls > hadoop fs -getfacl -R /tmp2/testacls > # file: /tmp2/testacls > # owner: kafka > # group: supergroup > user::rwx > user:flume:rwx > user:hive:rwx > group::r-x > mask::rwx > other::r-x > default:user::rwx > default:user:flume:rwx > default:user:hive:rwx > default:group::r-x > default:mask::rwx > default:other::r-x > {code} > Then create a sub-directory in it, the ACLs are as expected: > {code} > hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir > # file: /tmp2/testacls/dir_from_mkdir > # owner: sodonnell > # group: supergroup > user::rwx > user:flume:rwx > user:hive:rwx > group::r-x > mask::rwx > other::r-x > default:user::rwx > default:user:flume:rwx > default:user:hive:rwx > default:group::r-x > default:mask::rwx > default:other::r-x > {code} > However if you mkdir -p a directory, the situation is not the same: > {code} > hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2 > # 
file: /tmp2/testacls/dir_with_subdirs > # owner: sodonnell > # group: supergroup > user::rwx > user:flume:rwx#effective:r-x > user:hive:rwx #effective:r-x > group::r-x > mask::r-x > other::r-x > default:user::rwx > default:user:flume:rwx > default:user:hive:rwx > default:group::r-x > default:mask::rwx > default:other::r-x > # file: /tmp2/testacls/dir_with_subdirs/sub1 > # owner: sodonnell > # group: supergroup > user::rwx > user:flume:rwx#effective:r-x > user:hive:rwx #effective:r-x > group::r-x > mask::r-x > other::r-x > default:user::rwx > default:user:flume:rwx > default:user:hive:rwx > default:group::r-x > default:mask::rwx > default:other::r-x > # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2 > # owner: sodonnell > # group: supergroup > user::rwx > user:flume:rwx > user:hive:rwx > group::r-x > mask::rwx > other::r-x > default:user::rwx > default:user:flume:rwx > default:user:hive:rwx > default:group::r-x > default:mask::rwx > default:other::r-x > {code} > Notice that the leaf folder "sub2" is correct, but the two ancestor folders > have their permissions masked. I believe this is a regression from the fix > for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as > the code has changed significantly from the earlier 2.6 / 2.8 branch. > I will submit a patch for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
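The "#effective:r-x" annotations in the getfacl output above follow POSIX.1e mask semantics, which HDFS ACLs adopt: a named entry's effective permissions are its own permissions ANDed with the mask entry. A minimal sketch of that rule (hypothetical helper, not HDFS code):

```java
// Effective ACL permission under POSIX.1e-style mask semantics: the named
// entry's rwx bits are ANDed with the mask's rwx bits, so a restrictive
// mask silently downgrades named-user and named-group entries.
class AclMaskDemo {
    static int effective(int entryPerms, int maskPerms) {
        return entryPerms & maskPerms;
    }

    public static void main(String[] args) {
        // user:hive:rwx (7) under mask::r-x (5) yields effective r-x (5),
        // matching the "user:hive:rwx #effective:r-x" lines above
        System.out.println(effective(07, 05));
    }
}
```

This is why the masked ancestor directories above show rwx entries that are effectively r-x: the bug is in how the mask gets set on the intermediate directories, not in the named entries themselves.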
[jira] [Commented] (HDFS-14339) Inconsistent log level practices in RpcProgramNfs3.java
[ https://issues.apache.org/jira/browse/HDFS-14339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788795#comment-16788795 ] Daniel Templeton commented on HDFS-14339: - Since it's [~OneisAll]'s patch, I went ahead and reassigned the JIRA to him. > Inconsistent log level practices in RpcProgramNfs3.java > --- > > Key: HDFS-14339 > URL: https://issues.apache.org/jira/browse/HDFS-14339 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 3.1.0, 2.8.5 >Reporter: Anuhan Torgonshar >Assignee: Anuhan Torgonshar >Priority: Major > Labels: easyfix > Attachments: HDFS-14339.trunk.patch > > > There are *inconsistent* log level practices in > _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_. > > {code:java} > //the following log levels are inconsistent with other practices, which seem > more appropriate > //from line 1814 to 1819 & line 1831 to 1836 in Hadoop-2.8.5 version > try { > attr = writeManager.getFileAttr(dfsClient, childHandle, iug); > } catch (IOException e) { > LOG.error("Can't get file attributes for fileId: {}", fileId, e); > continue; > } > //other 2 same practices in this file > //from line 907 to 911 & line 2102 to 2106 > try { > postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug); > } catch (IOException e1) { > LOG.info("Can't get postOpAttr for fileId: {}", e1); > } > //other 3 similar practices > //from line 1224 to 1227 & line 1139 to 1143 1309 to 1313 > try { > postOpDirAttr = Nfs3Utils.getFileAttr(dfsClient, dirFileIdPath, iug); > } catch (IOException e1) { > LOG.info("Can't get postOpDirAttr for {}", dirFileIdPath, e1); > } > {code} > Therefore, when the code catches an _*IOException*_ from the _*getFileAttr()*_ > method, it should log the message at the lower _*INFO*_ level; a higher level > may needlessly alarm users.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14339) Inconsistent log level practices in RpcProgramNfs3.java
[ https://issues.apache.org/jira/browse/HDFS-14339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned HDFS-14339: --- Assignee: Anuhan Torgonshar (was: Shweta) > Inconsistent log level practices in RpcProgramNfs3.java > --- > > Key: HDFS-14339 > URL: https://issues.apache.org/jira/browse/HDFS-14339 > Project: Hadoop HDFS > Issue Type: Improvement > Components: nfs >Affects Versions: 3.1.0, 2.8.5 >Reporter: Anuhan Torgonshar >Assignee: Anuhan Torgonshar >Priority: Major > Labels: easyfix > Attachments: HDFS-14339.trunk.patch > > > There are *inconsistent* log level practices in > _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_. > > {code:java} > //the following log levels are inconsistent with other practices, which seem > more appropriate > //from line 1814 to 1819 & line 1831 to 1836 in Hadoop-2.8.5 version > try { > attr = writeManager.getFileAttr(dfsClient, childHandle, iug); > } catch (IOException e) { > LOG.error("Can't get file attributes for fileId: {}", fileId, e); > continue; > } > //other 2 same practices in this file > //from line 907 to 911 & line 2102 to 2106 > try { > postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug); > } catch (IOException e1) { > LOG.info("Can't get postOpAttr for fileId: {}", e1); > } > //other 3 similar practices > //from line 1224 to 1227 & line 1139 to 1143 1309 to 1313 > try { > postOpDirAttr = Nfs3Utils.getFileAttr(dfsClient, dirFileIdPath, iug); > } catch (IOException e1) { > LOG.info("Can't get postOpDirAttr for {}", dirFileIdPath, e1); > } > {code} > Therefore, when the code catches an _*IOException*_ from the _*getFileAttr()*_ > method, it should log the message at the lower _*INFO*_ level; a higher level > may needlessly alarm users.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
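Beyond the log-level question, note that the quoted call {{LOG.info("Can't get postOpAttr for fileId: {}", e1)}} passes only the exception, so the "{}" placeholder is filled by the exception's toString and the fileId is never logged at all (nor, under SLF4J's rules, the stack trace, since the throwable is consumed by the placeholder). A tiny stand-in for the substitution illustrates the difference; this is demo code, not the Hadoop logger.

```java
// Tiny stand-in for SLF4J-style "{}" substitution, enough to show why
// LOG.info("... fileId: {}", e1) logs the exception where the fileId
// should be. (Real SLF4J additionally treats a trailing Throwable that is
// not consumed by a placeholder specially and prints its stack trace.)
class PlaceholderDemo {
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int arg = 0;
        for (int i = 0; i < pattern.length(); i++) {
            if (i + 1 < pattern.length() && pattern.charAt(i) == '{'
                    && pattern.charAt(i + 1) == '}' && arg < args.length) {
                sb.append(args[arg++]); // substitute the next argument
                i++;                    // skip the '}'
            } else {
                sb.append(pattern.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Exception e1 = new java.io.IOException("Permission denied");
        // Buggy shape: the exception fills the fileId slot
        System.out.println(format("Can't get postOpAttr for fileId: {}", e1));
        // Fixed shape: the fileId fills the slot; in real SLF4J the exception
        // would follow as an extra trailing-Throwable argument
        System.out.println(format("Can't get postOpAttr for fileId: {}", 16386L));
    }
}
```

The fileId value above is made up; the point is only the argument order.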
[jira] [Commented] (HDFS-14333) Datanode fails to start if any disk has errors during Namenode registration
[ https://issues.apache.org/jira/browse/HDFS-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783792#comment-16783792 ] Daniel Templeton commented on HDFS-14333: - I took a look, and I don't have any comments. LGTM! I'm gonna let someone who knows volume management in the data node better give you the +1, though. > Datanode fails to start if any disk has errors during Namenode registration > --- > > Key: HDFS-14333 > URL: https://issues.apache.org/jira/browse/HDFS-14333 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-14333.001.patch > > > This is closely related to HDFS-9908, where it was reported that a datanode > would fail to start if an IO error occurred on a single disk when running du > during Datanode registration. That Jira was closed due to HADOOP-12973 which > refactored how du is called and prevents any exception being thrown. However > this problem can still occur if the volume has errors (e.g. permission or > filesystem corruption) when the disk is scanned to load all the replicas. The > method chain is: > DataNode.initBlockPool -> FSDataSetImpl.addBlockPool -> > FSVolumeList.getAllVolumesMap -> Throws exception which goes unhandled. > The DN logs will contain a stack trace for the problem volume, so the > workaround is to remove the volume from the DN config and the DN will start, > but the logs are a little confusing, so it's not always obvious what the issue > is. > These are the cut-down logs from an occurrence of this issue. > {code} > 2019-03-01 08:58:24,830 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning > block pool BP-240961797-x.x.x.x-1392827522027 on volume > /data/18/dfs/dn/current... > ...
> 2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could > not get disk usage information > ExitCodeException exitCode=1: du: cannot read directory > `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215': > Permission denied > du: cannot read directory > `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir213': > Permission denied > du: cannot read directory > `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir97/subdir25': > Permission denied > at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) > at org.apache.hadoop.util.Shell.run(Shell.java:504) > at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61) > at org.apache.hadoop.fs.DU.refresh(DU.java:53) > at > org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84) > at > org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:145) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412) > ... > 2019-03-01 08:58:27,043 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time > taken to scan block pool BP-240961797-x.x.x.x-1392827522027 on > /data/18/dfs/dn/current: 2202ms > {code} > So we can see a du error occurred, was logged but not re-thrown (due to > HADOOP-12973) and the blockpool scan completed. However then in the 'add > replicas to map' logic, we got another exception stemming from the same > problem: > {code} > 2019-03-01 08:58:27,564 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding > replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume > /data/18/dfs/dn/current... > ... 
> 2019-03-01 08:58:31,155 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught > exception while adding replicas from /data/18/dfs/dn/current. Will throw > later. > java.io.IOException: Invalid directory or I/O error occurred for dir: > /data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215 > at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448) > at >
[jira] [Updated] (HDFS-14273) Fix checkstyle issues in BlockLocation's method javadoc
[ https://issues.apache.org/jira/browse/HDFS-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14273: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Release Note: Thanks for the patch, [~shwetayakkali], and review, [~knanasi]. Committed to trunk. Status: Resolved (was: Patch Available) > Fix checkstyle issues in BlockLocation's method javadoc > --- > > Key: HDFS-14273 > URL: https://issues.apache.org/jira/browse/HDFS-14273 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shweta >Assignee: Shweta >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-14273.001.patch > > > BlockLocation.java has checkstyle issues for most methods' javadoc and > an indentation error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14273) Fix checkstyle issues in BlockLocation's method javadoc
[ https://issues.apache.org/jira/browse/HDFS-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773511#comment-16773511 ] Daniel Templeton commented on HDFS-14273: - LGTM +1. I'll check it in soon. > Fix checkstyle issues in BlockLocation's method javadoc > --- > > Key: HDFS-14273 > URL: https://issues.apache.org/jira/browse/HDFS-14273 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shweta >Assignee: Shweta >Priority: Trivial > Attachments: HDFS-14273.001.patch > > > BlockLocation.java has checkstyle issues for most methods' javadoc and > an indentation error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14185) Cleanup method calls to static Assert methods in TestAddStripedBlocks
[ https://issues.apache.org/jira/browse/HDFS-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14185: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) Thanks, [~shwetayakkali]. Committed to trunk. > Cleanup method calls to static Assert methods in TestAddStripedBlocks > - > > Key: HDFS-14185 > URL: https://issues.apache.org/jira/browse/HDFS-14185 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14185.001.patch, HDFS-14185.002.patch, > HDFS-14185.003.patch, HDFS-14185.004.patch > > > Clean up the method calls to static Assert methods in > TestAddStripedBlocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14185) Cleanup method calls to static Assert methods in TestAddStripedBlocks
[ https://issues.apache.org/jira/browse/HDFS-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739848#comment-16739848 ] Daniel Templeton commented on HDFS-14185: - +1 pending a clean(-ish) Jenkins run. > Cleanup method calls to static Assert methods in TestAddStripedBlocks > - > > Key: HDFS-14185 > URL: https://issues.apache.org/jira/browse/HDFS-14185 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Minor > Attachments: HDFS-14185.001.patch, HDFS-14185.002.patch, > HDFS-14185.003.patch, HDFS-14185.004.patch > > > Clean up the method calls to static Assert methods in > TestAddStripedBlocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped
[ https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737708#comment-16737708 ] Daniel Templeton edited comment on HDFS-14132 at 1/9/19 1:06 AM: - Thanks for the patch, [~shwetayakkali]. Committed to trunk. was (Author: templedf): Thanks for the patch, [~shwetayakkali]. Commited to trunk. > Add BlockLocation.isStriped() to determine if block is replicated or Striped > > > Key: HDFS-14132 > URL: https://issues.apache.org/jira/browse/HDFS-14132 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, > HDFS-14132.003.patch, HDFS-14132.004.patch > > > Impala uses FileSystem#getBlockLocation to get block locations. We can add > an isStriped() method to make it easier to determine whether a block belongs to a > replicated file or a striped file. > In HDFS, this isStriped information is already in > HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to > BlockLocation does not introduce space overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped
[ https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14132: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) Thanks for the patch, [~shwetayakkali]. Committed to trunk. > Add BlockLocation.isStriped() to determine if block is replicated or Striped > > > Key: HDFS-14132 > URL: https://issues.apache.org/jira/browse/HDFS-14132 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, > HDFS-14132.003.patch, HDFS-14132.004.patch > > > Impala uses FileSystem#getBlockLocation to get block locations. We can add > an isStriped() method to make it easier to determine whether a block belongs to a > replicated file or a striped file. > In HDFS, this isStriped information is already in > HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to > BlockLocation does not introduce space overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped
[ https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737703#comment-16737703 ] Daniel Templeton commented on HDFS-14132: - I don't love breaking a line on a "." when you could have broken on an "=" instead, but it's not enough to ask for a new patch. +1 I'll commit shortly. > Add BlockLocation.isStriped() to determine if block is replicated or Striped > > > Key: HDFS-14132 > URL: https://issues.apache.org/jira/browse/HDFS-14132 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Major > Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, > HDFS-14132.003.patch, HDFS-14132.004.patch > > > Impala uses FileSystem#getBlockLocation to get block locations. We can add > an isStriped() method to make it easier to determine whether a block belongs to a > replicated file or a striped file. > In HDFS, this isStriped information is already in > HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to > BlockLocation does not introduce space overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14121: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the patch, [~zvenczel], and the reviews, [~knanasi]. Committed to trunk. > Log message about the old hosts file format is misleading > - > > Key: HDFS-14121 > URL: https://issues.apache.org/jira/browse/HDFS-14121 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch > > > In {{CombinedHostsFileReader.readFile()}} we have the following: > {code} LOG.warn("{} has invalid JSON format." + > "Try the old format without top-level token defined.", > hostsFile);{code} > That message is trying to say that we tried parsing the hosts file as a > well-formed JSON file and failed, so we're going to try again assuming that > it's in the old badly-formed format. What it actually says is that the hosts > file is bad, and the admin should try switching to the old format. Those are > two very different things. > While we're in there, we should refactor the logging so that instead of > reporting that we're going to try using a different parser (who the heck > cares?), we report that we had to use the old parser to successfully > parse the hosts file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721324#comment-16721324 ] Daniel Templeton commented on HDFS-14121: - On further reflection, let's leave it as a WARN. +1 > Log message about the old hosts file format is misleading > - > > Key: HDFS-14121 > URL: https://issues.apache.org/jira/browse/HDFS-14121 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch > > > In {{CombinedHostsFileReader.readFile()}} we have the following: > {code} LOG.warn("{} has invalid JSON format." + > "Try the old format without top-level token defined.", > hostsFile);{code} > That message is trying to say that we tried parsing the hosts file as a > well-formed JSON file and failed, so we're going to try again assuming that > it's in the old badly-formed format. What it actually says is that the hosts > file is bad, and the admin should try switching to the old format. Those are > two very different things. > While we're in there, we should refactor the logging so that instead of > reporting that we're going to try using a different parser (who the heck > cares?), we report that we had to use the old parser to successfully > parse the hosts file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721214#comment-16721214 ] Daniel Templeton commented on HDFS-14121: - Thanks, [~zvenczel]. The patch looks great. One philosophical question: when you log that the old format is being used, should that be INFO level? It's not actually a problem, per se, though it is something the admin should fix. > Log message about the old hosts file format is misleading > - > > Key: HDFS-14121 > URL: https://issues.apache.org/jira/browse/HDFS-14121 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch > > > In {{CombinedHostsFileReader.readFile()}} we have the following: > {code} LOG.warn("{} has invalid JSON format." + > "Try the old format without top-level token defined.", > hostsFile);{code} > That message is trying to say that we tried parsing the hosts file as a > well-formed JSON file and failed, so we're going to try again assuming that > it's in the old badly-formed format. What it actually says is that the hosts > file is bad, and the admin should try switching to the old format. Those are > two very different things. > While we're in there, we should refactor the logging so that instead of > reporting that we're going to try using a different parser (who the heck > cares?), we report that we had to use the old parser to successfully > parse the hosts file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
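Incidentally, the warning quoted in this ticket has a second small defect: the string concatenation drops the space between the two sentences, so the rendered message reads "...invalid JSON format.Try the old format...". A quick check in plain Java, mirroring only the concatenation from the quoted snippet:

```java
// The concatenated literal from CombinedHostsFileReader.readFile(), joined
// exactly as in the quoted snippet: note there is no space after "format.",
// so the two sentences run together in the logged output.
class ConcatDemo {
    static final String MSG = "{} has invalid JSON format." +
        "Try the old format without top-level token defined.";

    public static void main(String[] args) {
        System.out.println(MSG); // sentences run together: "format.Try"
    }
}
```

Any rewording of the message should also restore that space.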
[jira] [Commented] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped
[ https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721093#comment-16721093 ] Daniel Templeton commented on HDFS-14132: - Thanks for the patch, [~shwetayakkali]. A couple of comments: # For this method:{code} public boolean isStriped() { return false; }{code} it would be nicer to add the newlines around the return statement. It's more consistent, and it's easier to read. # For the asserts in your tests, please add assert messages that explain what the failure is in a way that someone can understand without having to read code. # Super trivial point, but the usual way to do asserts is to leave out the {{Assert.}} and add the needed imports. In that test class, I can see that {{Assert.assertEquals}} is already imported, but most of the asserts include the {{Assert.}}. Since it's already inconsistent, I'd say it's better to do it the usual way. But that's just me. > Add BlockLocation.isStriped() to determine if block is replicated or Striped > > > Key: HDFS-14132 > URL: https://issues.apache.org/jira/browse/HDFS-14132 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Shweta >Assignee: Shweta >Priority: Major > Attachments: HDFS-14132.001.patch > > > Impala uses FileSystem#getBlockLocation to get block locations. We can add > an isStriped() method to make it easier to determine whether a block belongs to a > replicated file or a striped file. > In HDFS, this isStriped information is already in > HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to > BlockLocation does not introduce space overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
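Review point #2 above, assert messages that explain the failure without reading code, might look like the following sketch. It uses a minimal stand-in for JUnit's {{assertTrue(String, boolean)}}, and the message text and test value are illustrative only, not from the actual patch.

```java
// Minimal stand-in for JUnit's assertTrue(String message, boolean condition),
// showing the kind of self-explanatory failure message requested in the
// review. The scenario in the message is made up for illustration.
class AssertMessageDemo {
    static void assertTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        try {
            // A message a reader can act on without opening the test source
            assertTrue("BlockLocation of an erasure-coded file should report "
                + "isStriped() == true", false);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```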
[jira] [Updated] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-13985: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) Thanks for the patch, [~adam.antal]. Committed to trunk. > Clearer error message for ReplicaNotFoundException > -- > > Key: HDFS-13985 > URL: https://issues.apache.org/jira/browse/HDFS-13985 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, > HDFS-13985.002.patch, HDFS-13985.003.patch > > > The issue is that we came across a ReplicaNotFoundException in a bug report, and > the most informative thing we could get was "Replica not found for > [ExtendedBlock]". Anyone trying to investigate cases involving > ReplicaNotFoundExceptions has to review diagnostic bundles and dig through logs; > as a starting point, enhancing the exception message would speed up this > process and be beneficial in the long run. > More concretely, it would be helpful if any of the following information was > displayed along with the exception: file's name, replication factor or block > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720293#comment-16720293 ] Daniel Templeton commented on HDFS-13985: - Oops. I forgot to also thank [~zvenczel] and [~jojochuang] for the reviews! > Clearer error message for ReplicaNotFoundException > -- > > Key: HDFS-13985 > URL: https://issues.apache.org/jira/browse/HDFS-13985 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, > HDFS-13985.002.patch, HDFS-13985.003.patch > > > The issue is that we came across a ReplicaNotFoundException in a bug report, and > the most informative thing we could get was "Replica not found for > [ExtendedBlock]". Anyone trying to investigate cases involving > ReplicaNotFoundExceptions has to review diagnostic bundles and dig through logs; > as a starting point, enhancing the exception message would speed up this > process and be beneficial in the long run. > More concretely, it would be helpful if any of the following information was > displayed along with the exception: file's name, replication factor or block > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720160#comment-16720160 ] Daniel Templeton commented on HDFS-13985: - LGTM +1 > Clearer error message for ReplicaNotFoundException > -- > > Key: HDFS-13985 > URL: https://issues.apache.org/jira/browse/HDFS-13985 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, > HDFS-13985.002.patch, HDFS-13985.003.patch > > > The issue is that we came across a ReplicaNotFoundException in a bug report, and > the most informative thing we could get was "Replica not found for > [ExtendedBlock]". Anyone trying to investigate cases involving > ReplicaNotFoundExceptions has to review diagnostic bundles and dig through logs; > as a starting point, enhancing the exception message would speed up this > process and be beneficial in the long run. > More concretely, it would be helpful if any of the following information was > displayed along with the exception: file's name, replication factor or block > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long
[ https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719884#comment-16719884 ] Daniel Templeton commented on HDFS-14126: - That error and the lag that causes it are exactly why the directory scanner throttle test is flaky. When working on that test, I noticed that we occasionally see that lock-held-too-long message, and it correlates with the directory scanner taking much longer than usual to complete a scan. So, yes it's a performance issue, but, no, it's not a regression. > DataNode DirectoryScanner holding global lock for too long > -- > > Key: HDFS-14126 > URL: https://issues.apache.org/jira/browse/HDFS-14126 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Priority: Major > > I've got a Hadoop 3 based cluster set up, and this DN has just 434 thousand > blocks. > And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds: > {quote} > 2018-12-03 21:33:09,130 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-4588049-10.17.XXX-XX-281857726 Total blocks: 434401, missing metadata > fi > les:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0 > 2018-12-03 21:33:09,131 WARN > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock > held time above threshold: lock identifier: org.apache.hadoop.hdfs.serve > r.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=2710 ms. Suppressed 0 > lock warnings. 
The stack trace is: > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148) > org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186) > org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133) > org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84) > org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > {quote} > Log messages like this repeats every several hours (6, to be exact). I am not > sure if this is a performance regression, or just the fact that the lock > information is printed in Hadoop 3. [~vagarychen] or [~templedf] do you know? > There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's > heap size is set to several GB. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719473#comment-16719473 ] Daniel Templeton commented on HDFS-13985: - Thanks, [~adam.antal]. I'm going to pick on the error message a bit. Can we please make it, "The block may have been removed recently by the balancer or by intentionally reducing the replication factor. This condition is usually harmless. To be certain, please check the preceding datanode log messages for signs of a more serious issue." > Clearer error message for ReplicaNotFoundException > -- > > Key: HDFS-13985 > URL: https://issues.apache.org/jira/browse/HDFS-13985 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, > HDFS-13985.002.patch > > > The issue is that we came across a ReplicaNotFoundException in a bug report, and > the most informative thing we could get was "Replica not found for > [ExtendedBlock]". Anyone trying to investigate cases involving > ReplicaNotFoundExceptions has to review diagnostic bundles and dig through logs; > as a starting point, enhancing the exception message would speed up this > process and be beneficial in the long run. > More concretely, it would be helpful if any of the following information was > displayed along with the exception: file's name, replication factor or block > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
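Attached to the exception, the proposed wording could look roughly like the sketch below. The class is a stand-in for the real org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException, whose constructor and constants may differ:

```java
import java.io.IOException;

// Sketch: append the suggested, admin-readable explanation to the terse
// "Replica not found for ..." message.
public class ReplicaNotFoundSketch extends IOException {
  static final String POSSIBLE_CAUSES =
      " The block may have been removed recently by the balancer or by"
      + " intentionally reducing the replication factor. This condition is"
      + " usually harmless. To be certain, please check the preceding datanode"
      + " log messages for signs of a more serious issue.";

  public ReplicaNotFoundSketch(String block) {
    super("Replica not found for " + block + "." + POSSIBLE_CAUSES);
  }
}
```

Keeping the explanation in a single constant means every throw site gets the same actionable text without duplicating the wording.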
[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719123#comment-16719123 ] Daniel Templeton commented on HDFS-14121: - Thanks, [~zvenczel]. I agree with Kitti that the patch could be more verbose. Specifically, it should be actionable. If we're going to issue a warning, it should explain how to make it go away. Also, I'd rather have the warning printed after we know whether using the old parser worked. If using the old parser also fails, there's no reason to say we tried it. > Log message about the old hosts file format is misleading > - > > Key: HDFS-14121 > URL: https://issues.apache.org/jira/browse/HDFS-14121 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-14121.01.patch > > > In {{CombinedHostsFileReader.readFile()}} we have the following: > {code} LOG.warn("{} has invalid JSON format." + "Try the old format without top-level token defined.", > hostsFile);{code} > That message is trying to say that we tried parsing the hosts file as a > well-formed JSON file and failed, so we're going to try again assuming that > it's in the old badly-formed format. What it actually says is that the hosts > file is bad, and the admin should try switching to the old format. Those are > two very different things. > While we're in there, we should refactor the logging so that instead of > reporting that we're going to try using a different parser (who the heck > cares?), we report that we had to use the old parser to successfully > parse the hosts file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-13752) fs.Path stores file path in java.net.URI causes big memory waste
[ https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned HDFS-13752: --- Assignee: Barnabas Maidics > fs.Path stores file path in java.net.URI causes big memory waste > > > Key: HDFS-13752 > URL: https://issues.apache.org/jira/browse/HDFS-13752 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs >Affects Versions: 2.7.6 > Environment: Hive 2.1.1 and hadoop 2.7.6 >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch, > HDFS-13752.003.patch, HDFSbenchmark.pdf, Screen Shot 2018-07-20 at > 11.12.38.png, heapdump-10partitions.html, measurement.pdf > > > I was looking at HiveServer2 memory usage, and a big percentage of this was > because of org.apache.hadoop.fs.Path, where you store file paths in a > java.net.URI object. The URI implementation stores the same string in 3 > different objects (see the attached image). In Hive when there are many > partitions this cause a big memory usage. In my particular case 42% of memory > was used by java.net.URI so it could be reduced to 14%. > I wonder if the community is open to replace it with a more memory efficient > implementation and what other things should be considered here? It can be a > huge memory improvement for Hadoop and for Hive as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14121) Log message about the old hosts file format is misleading
Daniel Templeton created HDFS-14121: --- Summary: Log message about the old hosts file format is misleading Key: HDFS-14121 URL: https://issues.apache.org/jira/browse/HDFS-14121 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Daniel Templeton In {{CombinedHostsFileReader.readFile()}} we have the following: {code} LOG.warn("{} has invalid JSON format." + "Try the old format without top-level token defined.", hostsFile);{code} That message is trying to say that we tried parsing the hosts file as a well-formed JSON file and failed, so we're going to try again assuming that it's in the old badly-formed format. What it actually says is that the hosts file is bad, and the admin should try switching to the old format. Those are two very different things. While we're in there, we should refactor the logging so that instead of reporting that we're going to try using a different parser (who the heck cares?), we report that we had to use the old parser to successfully parse the hosts file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700796#comment-16700796 ] Daniel Templeton commented on HDFS-13998: - The safest answer is to add another CLI option. I also think it's clearer. I find {{hdfs ec -setPolicy -path /EC}} to be strange syntax. Let's set the policy but not say to what! Huh? > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
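The NPE under discussion comes from assuming getErasureCodingPolicy() always returns a policy, while for {{-replicate}} it returns null. However the CLI question is resolved, the message-building step needs a null check along these lines (class and method names are illustrative, not the real ECAdmin code):

```java
// Hypothetical sketch: handle the null policy that -replicate produces
// instead of dereferencing it unconditionally.
public class SetPolicyMessage {
  static String describe(String policyName /* null means -replicate */,
                         String path) {
    if (policyName == null) {
      // -replicate: no EC policy is set on the path; say so explicitly.
      return "Set replication (no erasure coding policy) on " + path;
    }
    return "Set " + policyName + " erasure coding policy on " + path;
  }
}
```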
[jira] [Created] (HDFS-14092) Remove two-step create/append in WebHdfsFileSystem
Daniel Templeton created HDFS-14092: --- Summary: Remove two-step create/append in WebHdfsFileSystem Key: HDFS-14092 URL: https://issues.apache.org/jira/browse/HDFS-14092 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Affects Versions: 3.2.0 Reporter: Daniel Templeton Per the javadoc on the {{WebHdfsFileSystem.connect()}} method: {code}/** * Two-step requests redirected to a DN * * Create/Append: * Step 1) Submit a Http request with neither auto-redirect nor data. * Step 2) Submit another Http request with the URL from the Location header * with data. * * The reason of having two-step create/append is for preventing clients to * send out the data before the redirect. This issue is addressed by the * "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3. * Unfortunately, there are software library bugs (e.g. Jetty 6 http server * and Java 6 http client), which do not correctly implement "Expect: * 100-continue". The two-step create/append is a temporary workaround for * the software library bugs. * * Open/Checksum * Also implements two-step connects for other operations redirected to * a DN such as open and checksum */{code} We should validate that it's safe to remove the two-step process and do so. FYI, [~smeng]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) Committed to trunk. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, > HDFS-14015.012.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690164#comment-16690164 ] Daniel Templeton commented on HDFS-14015: - {quote}assuming this is tested (e.g. hard code the DetachCurrentThread return to be non zero and eye-checked stderr){quote} I have tested this method manually. The results look like: {noformat}detachCurrentThreadFromJvm: Unable to detach thread Thread[MyThread,10,MyGroup]:10 from the JVM. Error code: -1{noformat} Having addressed [~xiaochen]'s concerns, I'm going to invoke his +1 and commit. Thanks, [~xiaochen], [~pranay_singh], [~yzhangal], and [~jojochuang] for the reviews. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, > HDFS-14015.012.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688972#comment-16688972 ] Daniel Templeton commented on HDFS-14015: - Patch 012 addresses: * JNI_OK for return value * Corrected log output for windows It doesn't address: * multiple DescribeException() calls -- because there's no case where they would result in the same message being printed repeatedly. * removing the {{!= NULL}} in the conditional -- just because it's possible, doesn't mean we should. Clarity still counts, even in C. * deduping the get_current_thread_id() methods -- there's just no decent place to store the common method. I could create new files for it, but that seems like overkill. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, > HDFS-14015.012.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.012.patch > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, > HDFS-14015.012.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.012.patch > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: (was: HDFS-14015.012.patch) > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688719#comment-16688719 ] Daniel Templeton commented on HDFS-13998: - [~vinayrpet] and [~xiaochen], that's a hard-line interpretation of the compatibility guidelines. The intent of the guideline is to avoid breaking scripts that rely on the output of our CLIs. From the perspective of parsing CLI output, HDFS-13732 is not a breaking change. From the perspective of behavior, it's a little grey. Strictly speaking it's a behavioral change that could break a script that doesn't know if it's passing a {{-policy}} arg and is using the output to tell for some reason. I find that scenario pretty unlikely, though. The guidelines are guidelines. Common sense still takes precedence. In this case, EC is a new feature, and the probability that a script exists that would be broken by this change is vanishingly small. For those reasons, I'm not sure I would have labeled HDFS-13732 incompatible. The way to make the change without breaking compatibility in any way is to add another CLI option. {{hdfs ec -defaultPolicy -path /EC}} for example, which would set the policy to the default and print it, or maybe a {{-default}} option for {{-setPolicy}}. > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687476#comment-16687476 ] Daniel Templeton commented on HDFS-14015: - Thanks for the review, [~pranay_singh]! I originally had all those null checks in there, but I looked at some other JNI code, and no one checks for null on things that are required to be there, such as java.lang.Thread and its API methods. If java.lang.Thread can't be found, we have bigger problems than a segfault. I can add them back in if you like, though. Good point on the return values. I'll clean that up. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687111#comment-16687111 ] Daniel Templeton commented on HDFS-14015: - Oh, duh. The -1 is for not having tests in the patch. Yeah, that's an issue, but I don't see a reasonable way to write a test for this code. I'm open to suggestions. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, > HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687029#comment-16687029 ] Daniel Templeton commented on HDFS-14015: - I just did my own review of my patch and caught some issues which are now addressed in patch 11.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.011.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687004#comment-16687004 ] Daniel Templeton commented on HDFS-14015: - I don't see any evidence of a test failure, so I'm not sure what's up with the -1. [~pranay_singh] or [~xiaochen], would one of you care to review this patch?
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.010.patch
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681741#comment-16681741 ] Daniel Templeton commented on HDFS-13998: - Patch 002 looks good except for one minor quibble: {code} > System.out.println("Set " + ecPolicyName + " erasure coding policy on" + " " + path);{code} Can we get rid of the extraneous {{" "}} in the second line, i.e. add a space inside the quote on the first line? > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test.
[jira] [Updated] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
[ https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14047: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch, [~anatoli.shein], and the review, [~James C]. > [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c > > > Key: HDFS-14047 > URL: https://issues.apache.org/jira/browse/HDFS-14047 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: libhdfs, native >Reporter: Anatoli Shein >Assignee: Anatoli Shein >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14047.000.patch, HDFS-14047.001.patch > > > Currently the native client CI tests break deterministically with these > errors: > Libhdfs > 1 - test_test_libhdfs_threaded_hdfs_static (Failed) > [exec] TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 > with NULL return return value (errno: 2): expected substring: File does not > exist > [exec] TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 > with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, > fs, ) > [exec] hdfsOpenFile(/tlhData0001/file1): > FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) > error: > [exec] (unable to get root cause for java.io.FileNotFoundException) > [exec] (unable to get stack trace for java.io.FileNotFoundException) > > Libhdfs++ > 34 - test_libhdfs_threaded_hdfspp_test_shim_static (Failed) > [exec] TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 > with NULL return return value (errno: 2): expected substring: File does not > exist > [exec] TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 > with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, > fs, ) > [exec] hdfsOpenFile(/tlhData0001/file1): > FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) > error: > [exec] (unable to get root cause for java.io.FileNotFoundException) > [exec] (unable to get stack trace for java.io.FileNotFoundException)
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675930#comment-16675930 ] Daniel Templeton commented on HDFS-14015: - Attached patch 009 to address compiler warnings.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.009.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673790#comment-16673790 ] Daniel Templeton commented on HDFS-14015: - New patch uses {{toString() + ":" + getId()}} and removes some of the obsessive and unnecessary error checking.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.008.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673741#comment-16673741 ] Daniel Templeton commented on HDFS-14015: - I'm worried about the testing, too. No idea how to reasonably test this code. I'll work on adding the extra info.
[jira] [Commented] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
[ https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673710#comment-16673710 ] Daniel Templeton commented on HDFS-14047: - LGTM +1 I'll let this sit a day or two before I commit it.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.007.patch
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: (was: HDFS-14015.007.patch)
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673678#comment-16673678 ] Daniel Templeton commented on HDFS-14015: - I extended my patch to include more information about the thread that failed to detach from the JVM. Please have a look at patch 007.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.007.patch
[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods
[ https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667921#comment-16667921 ] Daniel Templeton commented on HDFS-14027: - LGTM +1 > DFSStripedOutputStream should implement both hsync methods > -- > > Key: HDFS-14027 > URL: https://issues.apache.org/jira/browse/HDFS-14027 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Attachments: HDFS-14027.01.patch, HDFS-14027.02.patch, > HDFS-14027.03.patch > > > In an internal spark investigation, it appears that when > [EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155] > writes to an EC file, one may get exceptions reading, or get odd outputs. A > sample exception is > {noformat} > hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | > head -1 > 18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote > block reader. > java.io.IOException: Got error, status=ERROR, status message opReadBlock > BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received > exception java.io.IOException: Offset 0 and length 116161 don't match block > BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen > 110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for > file /user/spark/applicationHistory/application_1540333573846_0003, for pool > BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134) > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264) > at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299) > at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127) > at > org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101) > at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96) > at > org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:326) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:389) > 18/10/23 18:12:39 WARN hdfs.DFSClient: Failed to connect to /HOST2_IP:20002 > for blockBP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 > java.io.IOException: Got error, status=ERROR, status message opReadBlock > BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received > exception java.io.IOException: Offset 0
[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods
[ https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665685#comment-16665685 ] Daniel Templeton commented on HDFS-14027: - Thanks for the updated patch, [~xiaochen]. Last comment is on those WARN log messages. There's not much that an admin is going to be able to do with those log messages, and they could potentially occur a lot. I'd knock them back to either INFO or DEBUG and maybe be a little more explicit about the context of what's going on and why it's wrong. > DFSStripedOutputStream should implement both hsync methods > -- > > Key: HDFS-14027 > URL: https://issues.apache.org/jira/browse/HDFS-14027 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Attachments: HDFS-14027.01.patch, HDFS-14027.02.patch > > > In an internal spark investigation, it appears that when > [EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155] > writes to an EC file, one may get exceptions reading, or get odd outputs. A > sample exception is > {noformat} > hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | > head -1 > 18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote > block reader. 
> java.io.IOException: Got error, status=ERROR, status message opReadBlock > BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received > exception java.io.IOException: Offset 0 and length 116161 don't match block > BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen > 110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for > file /user/spark/applicationHistory/application_1540333573846_0003, for pool > BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134) > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264) > at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299) > at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at 
org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127) > at > org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101) > at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96) > at > org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:326) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:389) > 18/10/23 18:12:39 WARN
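The review comment above suggests demoting frequent, non-actionable WARN messages and making them more explicit about context. A minimal, hypothetical sketch of that idea follows; the class, method, flag, and message text are illustrative stand-ins, not code from the patch, and it uses java.util.logging's FINE level as the DEBUG equivalent:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a frequent, non-actionable condition is logged at a
// low (DEBUG-equivalent) level, with enough context to explain what happened
// and why it is expected. All names here are illustrative assumptions.
class SyncFlagLogging {
    private static final Logger LOG =
        Logger.getLogger(SyncFlagLogging.class.getName());

    static String describeIgnoredFlag(String flag, String path) {
        String msg = "Ignoring sync flag " + flag + " for erasure-coded file "
            + path + ": striped output streams can only flush whole stripes";
        // FINE ~ DEBUG: visible when debugging, quiet in normal operation.
        LOG.log(Level.FINE, msg);
        return msg;
    }
}
```

The point is that the message names the ignored flag and the file, so an admin reading a debug log can tell it is benign.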
[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods
[ https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664292#comment-16664292 ] Daniel Templeton commented on HDFS-14027: - Thanks for the patch, [~xiaochen]. Couple of comments/questions: # Is it more reasonable to stub out the {{hsync()}} method than to throw an {{UnsupportedOperationException}}? I assume the latter would break all the things, but I have to ask. # In the test, please add a message to the assert. # Why is {{dfssos}} final? # Should there be a try/finally or try-with-resources there in the test code?
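The first review question above asks whether to stub out {{hsync()}} rather than throw {{UnsupportedOperationException}}. A common alternative, sketched below with made-up stand-in types (not the real HDFS classes), is to have both overloads funnel into one implementation so callers of either variant see the same flush behavior:

```java
import java.util.EnumSet;

// Hypothetical sketch of implementing both hsync variants consistently.
// SyncFlag and the byte counters are illustrative stand-ins, not the real
// DFSStripedOutputStream internals.
class SketchStripedStream {
    enum SyncFlag { UPDATE_LENGTH, END_BLOCK }

    private long flushedBytes = 0;
    private long pendingBytes = 0;

    void write(int n) {
        pendingBytes += n;
    }

    // The no-arg overload delegates to the flag-taking one.
    void hsync() {
        hsync(EnumSet.noneOf(SyncFlag.class));
    }

    // Both overloads share one implementation, so behavior cannot diverge.
    void hsync(EnumSet<SyncFlag> flags) {
        flushedBytes += pendingBytes;
        pendingBytes = 0;
    }

    long getFlushedBytes() {
        return flushedBytes;
    }
}
```

With this shape there is nothing to stub: overriding one variant automatically covers the other.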
[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662774#comment-16662774 ] Daniel Templeton commented on HDFS-14022: - HDFS-14015 patch 006 also failed the same way. > Failing CTEST test_libhdfs > -- > > Key: HDFS-14022 > URL: https://issues.apache.org/jira/browse/HDFS-14022 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Pranay Singh >Assignee: Pranay Singh >Priority: Major > > Here are list of recurring failures that are seen, there seems to be a > problem with > invoking the build() in MiniDFSClusterBuilder, there are several failures ( 2 > cores related to it), in the function > struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf) > { >jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER, > "build", "()L" MINIDFS_CLUSTER ";"); ---> > } > Failed CTEST tests > test_test_libhdfs_threaded_hdfs_static > test_test_libhdfs_zerocopy_hdfs_static > test_libhdfs_threaded_hdfspp_test_shim_static > test_hdfspp_mini_dfs_smoke_hdfspp_test_shim_static > libhdfs_mini_stress_valgrind_hdfspp_test_static > memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static > test_libhdfs_mini_stress_hdfspp_test_shim_static > test_hdfs_ext_hdfspp_test_shim_static > > Details of the failures: > a) test_test_libhdfs_threaded_hdfs_static > hdfsOpenFile(/tlhData0001/file1): > FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) > error: > (unable to get root cause for java.io.FileNotFoundException) ---> > (unable to get stack trace for java.io.FileNotFoundException) > TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 > with NULL return return value (errno: 2): expected substring: File does not > exist > TEST_ERROR: failed on > 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 > with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, > fs, ) > hdfsOpenFile(/tlhData/file1): > FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) > error: > (unable to get root cause for java.io.FileNotFoundException) > b) test_test_libhdfs_zerocopy_hdfs_static > nmdCreate: Builder#build error: > (unable to get root cause for java.lang.RuntimeException) > (unable to get stack trace for java.lang.RuntimeException) > TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c:253 > (errno: 2): got NULL from cl > Failure: > struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf) > jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER, > "build", "()L" MINIDFS_CLUSTER ";"); ===> Failure > if (jthr) { > printExceptionAndFree(env, jthr, PRINT_EXC_ALL, > "nmdCreate: Builder#build"); > goto error; > } > } > c) test_libhdfs_threaded_hdfspp_test_shim_static > hdfsOpenFile(/tlhData0002/file1): > FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) > error: > (unable to get root cause for java.io.FileNotFoundException) ---> > (unable to get stack trace for java.io.FileNotFoundException) > TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180 > with NULL return return value (errno: 2): expected substring: File does not > exist > TEST_ERROR: failed on > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336 > with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, > fs, ) > d) > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0078c513, 
pid=16765, tid=0x7fc4449717c0 > # > # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build > 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13) > # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # C [hdfs_ext_hdfspp_test_shim_static+0x38c513] > # > # Core dump written. Default location: > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/core > or core.16765 > # > # An error report file with more information is saved as: > # > /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/hs_err_pid16765.log > # > # If you would like to submit a bug report, please visit:
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662770#comment-16662770 ] Daniel Templeton commented on HDFS-14015: - Whew. Failed as expected. > Improve error handling in hdfsThreadDestructor in native thread local storage > - > > Key: HDFS-14015 > URL: https://issues.apache.org/jira/browse/HDFS-14015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, > HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, > HDFS-14015.006.patch > > > In the hdfsThreadDestructor() function, we ignore the return value from the > DetachCurrentThread() call. We are seeing cases where a native thread dies > while holding a JVM monitor, and it doesn't release the monitor. We're > hoping that logging this error instead of ignoring it will shed some light on > the issue. In any case, it's good programming practice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
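The description above is about native code checking the status returned by DetachCurrentThread instead of ignoring it. A Java-shaped sketch of that pattern follows; the constant and method names are stand-ins for the JNI call in question, not the actual libhdfs change:

```java
// Hypothetical sketch: instead of discarding a cleanup call's status code,
// turn a non-OK status into a message worth logging. JNI_OK and checkDetach
// are illustrative stand-ins for the native DetachCurrentThread handling.
class DetachStatus {
    static final int JNI_OK = 0;

    /** Returns null on success, otherwise a log-worthy message. */
    static String checkDetach(int status) {
        if (status == JNI_OK) {
            return null;
        }
        return "DetachCurrentThread returned " + status
            + "; the dying thread may still hold JVM monitors";
    }
}
```

Logging the status is cheap and, as the description notes, may shed light on threads that die while holding a JVM monitor.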
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662679#comment-16662679 ] Daniel Templeton commented on HDFS-14015: - Just in case, I just posted a patch 006 that only adds comments. Let's see what Jenkins says.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.006.patch
[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662676#comment-16662676 ] Daniel Templeton commented on HDFS-14022: - Doesn't look like HDFS-15856 changed anything. I still see a ton of failures. I didn't compare against the previous runs, but it looked like all the same failures as before. In HDFS-14015 patch 005, I changed some text in a {{printf()}} as an "innocuous" change. Could the tests be somehow dependent on the output of that {{printf()}}? Sounds dumb, but ya never know. I'll post a patch that does even less, just to be sure.
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662671#comment-16662671 ] Daniel Templeton commented on HDFS-14015: - Still a bunch of failures.
[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662535#comment-16662535 ] Daniel Templeton commented on HDFS-14022: - Just FYI, I just retriggered the build on my placebo patch on HDFS-14015 now that HDFS-15856 is in.
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662522#comment-16662522 ] Daniel Templeton commented on HDFS-14015: - I just retriggered the build now that HDFS-15856 is in. Let's see what we get.
[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662515#comment-16662515 ] Daniel Templeton commented on HDFS-14022: - I don't think so.
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661519#comment-16661519 ] Daniel Templeton commented on HDFS-14015: - Thanks, [~jojochuang] and [~pranay_singh]. How shall we proceed here? We can see that the build for patch 004 (the current patch) appears to be just as broken as the build for patch 005 (the placebo patch). I'm a little nervous to commit patch 004 on faith, but I also don't want to make resolving HDFS-14022 a dependency for committing patch 004. Thoughts?
[jira] [Comment Edited] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661055#comment-16661055 ] Daniel Templeton edited comment on HDFS-14015 at 10/23/18 5:54 PM: --- I don't see why the tests are failing, but they're failing consistently. I just posted a new patch that doesn't actually change anything important; it fixes a typo in a string. I want to see what happens with a provably innocuous patch. was (Author: templedf): I don't see why the tests are failing, but they're failing consistently. I just posted a new patch that doesn't actually change anything important; it fixes a typo in a string. I want to see what happens when with a provably innocuous patch.
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661055#comment-16661055 ] Daniel Templeton commented on HDFS-14015: - I don't see why the tests are failing, but they're failing consistently. I just posted a new patch that doesn't actually change anything important; it fixes a typo in a string. I want to see what happens with a provably innocuous patch.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.005.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659800#comment-16659800 ] Daniel Templeton commented on HDFS-14015: - Isn't that the build for the 1st patch? I'm surprised it compiled. It didn't for me locally.
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659763#comment-16659763 ] Daniel Templeton commented on HDFS-14015: - Darn it. Caught a mistake in my own review. Updated patch 4 posted.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.004.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659760#comment-16659760 ] Daniel Templeton commented on HDFS-14015: - Good catch, [~jojochuang]! I also lack a Windows box on which to test, but I have to assume the same laws of physics apply there as well. Posted a new patch that applies the same changes to the Windows side.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.003.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659651#comment-16659651 ] Daniel Templeton commented on HDFS-14015: - Whoops. Posted the wrong patch before.
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.002.patch
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-14015: Attachment: HDFS-14015.001.patch
[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
[ https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659206#comment-16659206 ] Daniel Templeton commented on HDFS-14015: - I will post the patch as soon as my local build completes.
[jira] [Created] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage
Daniel Templeton created HDFS-14015: --- Summary: Improve error handling in hdfsThreadDestructor in native thread local storage Key: HDFS-14015 URL: https://issues.apache.org/jira/browse/HDFS-14015 Project: Hadoop HDFS Issue Type: Improvement Components: native Affects Versions: 3.0.0 Reporter: Daniel Templeton Assignee: Daniel Templeton In the hdfsThreadDestructor() function, we ignore the return value from the DetachCurrentThread() call. We are seeing cases where a native thread dies while holding a JVM monitor, and it doesn't release the monitor. We're hoping that logging this error instead of ignoring it will shed some light on the issue. In any case, it's good programming practice.
[jira] [Updated] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-13846: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) > Safe blocks counter is not decremented correctly if the block is striped > > > Key: HDFS-13846 > URL: https://issues.apache.org/jira/browse/HDFS-13846 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, > HDFS-13846.003.patch, HDFS-13846.004.patch, HDFS-13846.005.patch > > > In BlockManagerSafeMode class, the "safe blocks" counter is incremented if > the number of nodes containing the block equals to the number of data units > specified by the erasure coding policy, which looks like this in the code: > {code:java} > final int safe = storedBlock.isStriped() ? > ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : > safeReplication; > if (storageNum == safe) { > this.blockSafe++; > {code} > But when it is decremented the code does not check if the block is striped or > not, just compares the number of nodes containing the block with 0 > (safeReplication - 1) if the block is complete, which is not correct. > {code:java} > if (storedBlock.isComplete() && > blockManager.countNodes(b).liveReplicas() == safeReplication - 1) { > this.blockSafe--; > assert blockSafe >= 0; > checkSafeMode(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
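The symmetry this fix restores — decrementing the safe-block counter with the same striped-aware threshold used when incrementing it — can be modeled in a small self-contained sketch. Note this is an illustrative simplification, not the real HDFS code: the Block class, field names, and the counter class below are stand-ins for BlockManagerSafeMode, BlockInfoStriped, and friends.

```java
// Simplified model of the safe-block counter logic discussed in HDFS-13846.
// All types here are hypothetical stand-ins, not the real HDFS classes.
class SafeBlockCounterSketch {
    static final int SAFE_REPLICATION = 1; // stands in for the min-replication setting

    static class Block {
        final boolean striped;
        final int realDataBlockNum; // e.g. 6 for a 6+3 erasure coding policy

        Block(boolean striped, int realDataBlockNum) {
            this.striped = striped;
            this.realDataBlockNum = realDataBlockNum;
        }

        // The number of live replicas/data units at which a block counts as "safe".
        int safeThreshold() {
            return striped ? realDataBlockNum : SAFE_REPLICATION;
        }
    }

    int blockSafe;

    void onStorageAdded(Block b, int liveReplicas) {
        if (liveReplicas == b.safeThreshold()) {
            blockSafe++;
        }
    }

    // The fix: the decrement path must use the same striped-aware threshold,
    // dropping the counter only when replicas fall just below it, rather than
    // always comparing against (SAFE_REPLICATION - 1).
    void onStorageRemoved(Block b, int liveReplicasAfterRemoval) {
        if (liveReplicasAfterRemoval == b.safeThreshold() - 1) {
            blockSafe--;
        }
    }

    public static void main(String[] args) {
        SafeBlockCounterSketch sm = new SafeBlockCounterSketch();
        Block striped = new Block(true, 6);
        sm.onStorageAdded(striped, 6);   // reaches 6 data units -> counted safe
        sm.onStorageRemoved(striped, 5); // falls below 6 -> no longer safe
        System.out.println(sm.blockSafe); // 0
    }
}
```

With the old logic, removing a storage from the striped block above (leaving 5 live data units) would not match `liveReplicas == SAFE_REPLICATION - 1`, so the counter would never be decremented for striped blocks.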
[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612555#comment-16612555 ] Daniel Templeton commented on HDFS-13846: - Thanks, [~knanasi]. I was going to let those 3 characters slide. :) Committed to trunk.
[jira] [Created] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor
Daniel Templeton created HDFS-13913: --- Summary: LazyPersistFileScrubber.run() error handling is poor Key: HDFS-13913 URL: https://issues.apache.org/jira/browse/HDFS-13913 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.1.0 Reporter: Daniel Templeton Assignee: Daniel Green In {{LazyPersistFileScrubber.run()}} we have: {code} try { clearCorruptLazyPersistFiles(); } catch (Exception e) { FSNamesystem.LOG.error( "Ignoring exception in LazyPersistFileScrubber:", e); } {code} First problem is that catching {{Exception}} is sloppy. It should instead be a multicatch for the actual exceptions thrown or better a set of separate catch statements that react appropriately to the type of exception. Second problem is that it's bad to log an ERROR that's not actionable and that can be safely ignored. The log message should be logged at WARN or INFO level. Third, the log message is useless. If it's going to be a WARN or ERROR, a log message should be actionable. Otherwise it's an info. A log message should contain enough information for an admin to understand what it means. In the end, I think the right thing here is to leave the high-level behavior unchanged: log a message and ignore the error, hoping that the next run will go better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
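The rework the issue asks for — separate catch clauses for the actual exception types and a WARN-level message with enough context to be useful — can be sketched roughly as below. This is a self-contained simplification, not the real FSNamesystem code: the method name is kept, but the exception types it throws and the use of java.util.logging are assumptions for illustration.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the scrubber's run loop; the thrown exception
// types are assumed for illustration, not taken from the real HDFS code.
class ScrubberSketch {
    private static final Logger LOG =
        Logger.getLogger(ScrubberSketch.class.getName());

    static void clearCorruptLazyPersistFiles()
            throws IOException, InterruptedException {
        throw new IOException("simulated transient failure");
    }

    static String runOnce() {
        try {
            clearCorruptLazyPersistFiles();
            return "ok";
        } catch (IOException e) {
            // Transient and safely ignorable: log at WARN with enough context
            // for an admin, then let the next scheduled run retry.
            LOG.log(Level.WARNING,
                "LazyPersistFileScrubber failed to clear corrupt lazyPersist"
                + " files; they will be retried on the next scrubber run", e);
            return "warned: " + e.getMessage();
        } catch (InterruptedException e) {
            // React to the exception type: restore the interrupt status
            // instead of swallowing it.
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

A real patch would catch whatever clearCorruptLazyPersistFiles() actually declares, keeping the high-level behavior unchanged: log, ignore, and hope the next run goes better.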
[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612417#comment-16612417 ] Daniel Templeton commented on HDFS-13846: - OK, +1 from me. I'll commit later today.
[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611338#comment-16611338 ] Daniel Templeton commented on HDFS-13846: - That sounds good to me. Hmmm... I'm wondering why there hasn't been a Jenkins run. Lemme go kick it.
[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596832#comment-16596832 ] Daniel Templeton commented on HDFS-13846: - I see. That makes sense. Might be nice to add a comment to explain that so that someone doesn't "fix" it later by making the conditional test {{<=}}. Aside from that, LGTM. Did you look at the deprecation warning that popped up? The jenkins build is gone now, so I can't verify whether it was related to code you added.