[jira] [Updated] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor

2019-09-01 Thread Daniel Templeton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-13913:

Status: Patch Available  (was: Open)

> LazyPersistFileScrubber.run() error handling is poor
> 
>
> Key: HDFS-13913
> URL: https://issues.apache.org/jira/browse/HDFS-13913
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Green
>Priority: Minor
> Attachments: HDFS-13913.001.patch
>
>
> In {{LazyPersistFileScrubber.run()}} we have:
> {code}
> try {
>   clearCorruptLazyPersistFiles();
> } catch (Exception e) {
>   FSNamesystem.LOG.error(
>   "Ignoring exception in LazyPersistFileScrubber:", e);
> }
> {code}
> The first problem is that catching {{Exception}} is sloppy.  It should 
> instead be a multicatch for the actual exceptions thrown or, better, a set 
> of separate catch statements that react appropriately to the type of 
> exception.
> The second problem is that it's bad to log an ERROR that's not actionable 
> and that can safely be ignored.  The message should be logged at WARN or 
> INFO level.
> Third, the log message is useless.  If it's going to be a WARN or ERROR, a 
> log message should be actionable; otherwise it should be an INFO.  A log 
> message should contain enough information for an admin to understand what 
> it means.
> In the end, I think the right thing here is to leave the high-level behavior 
> unchanged: log a message and ignore the error, hoping that the next run will 
> go better.
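
A minimal sketch of the direction described above (this is not the attached 
patch, and it assumes, for illustration, that clearCorruptLazyPersistFiles() 
declares IOException):

{code:java}
try {
  clearCorruptLazyPersistFiles();
} catch (IOException ioe) {
  // Recoverable: the next scrubber run may succeed, so WARN is enough,
  // and the message says what failed and what happens next.
  FSNamesystem.LOG.warn("LazyPersistFileScrubber failed to clear corrupt"
      + " lazy persist files; will retry on the next run.", ioe);
} catch (RuntimeException re) {
  // Unexpected and potentially actionable, so this one stays at ERROR.
  FSNamesystem.LOG.error("Unexpected error in LazyPersistFileScrubber.", re);
}
{code}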






[jira] [Commented] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor

2019-07-15 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885425#comment-16885425
 ] 

Daniel Templeton commented on HDFS-13913:
-

LGTM +1 pending Jenkins.

> LazyPersistFileScrubber.run() error handling is poor
> 
>
> Key: HDFS-13913
> URL: https://issues.apache.org/jira/browse/HDFS-13913
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Green
>Priority: Minor
> Attachments: HDFS-13913.001.patch
>
>
> In {{LazyPersistFileScrubber.run()}} we have:
> {code}
> try {
>   clearCorruptLazyPersistFiles();
> } catch (Exception e) {
>   FSNamesystem.LOG.error(
>   "Ignoring exception in LazyPersistFileScrubber:", e);
> }
> {code}
> The first problem is that catching {{Exception}} is sloppy.  It should 
> instead be a multicatch for the actual exceptions thrown or, better, a set 
> of separate catch statements that react appropriately to the type of 
> exception.
> The second problem is that it's bad to log an ERROR that's not actionable 
> and that can safely be ignored.  The message should be logged at WARN or 
> INFO level.
> Third, the log message is useless.  If it's going to be a WARN or ERROR, a 
> log message should be actionable; otherwise it should be an INFO.  A log 
> message should contain enough information for an admin to understand what 
> it means.
> In the end, I think the right thing here is to leave the high-level behavior 
> unchanged: log a message and ignore the error, hoping that the next run will 
> go better.






[jira] [Resolved] (HDFS-9499) Fix typos in DFSAdmin.java

2019-07-15 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved HDFS-9499.

Resolution: Invalid

Looks like it's already been resolved by another JIRA.

> Fix typos in DFSAdmin.java
> --
>
> Key: HDFS-9499
> URL: https://issues.apache.org/jira/browse/HDFS-9499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Daniel Green
>Priority: Major
>
> There are multiple instances of 'snapshot' spelled as 'snaphot' in 
> DFSAdmin.java and TestSnapshotCommands.java.






[jira] [Commented] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c

2019-06-21 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870037#comment-16870037
 ] 

Daniel Templeton commented on HDFS-14047:
-

I can't at the moment; no desktop/laptop.  @weichiu, could you do the
honors?




> [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
> 
>
> Key: HDFS-14047
> URL: https://issues.apache.org/jira/browse/HDFS-14047
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: libhdfs, native
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14047.000.patch, HDFS-14047.001.patch
>
>
> Currently the native client CI tests break deterministically with these 
> errors:
> Libhdfs
> 1 - test_test_libhdfs_threaded_hdfs_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)
>  
> Libhdfs++
> 34 - test_libhdfs_threaded_hdfspp_test_shim_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)






[jira] [Resolved] (HDFS-14487) Missing Space in Client Error Message

2019-06-18 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved HDFS-14487.
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0

Thanks for the patch, [~shwetayakkali], and the review, [~sodonnell].  +1  
Committed to trunk.

> Missing Space in Client Error Message
> -
>
> Key: HDFS-14487
> URL: https://issues.apache.org/jira/browse/HDFS-14487
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: Shweta
>Priority: Minor
>  Labels: newbie, noob
> Fix For: 3.3.0
>
> Attachments: HDFS-14487.001.patch
>
>
> {code:java}
>   if (retries == 0) {
> throw new IOException("Unable to close file because the last 
> block"
> + last + " does not have enough number of replicas.");
>   }
> {code}
> Note the missing space after "last block".
> https://github.com/apache/hadoop/blob/f940ab242da80a22bae95509d5c282d7e2f7ecdb/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L968-L969
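
A hedged sketch of the one-character fix the description implies (the attached 
patch may differ):

{code:java}
  if (retries == 0) {
    // Add the trailing space so the block ID is not glued to "block".
    throw new IOException("Unable to close file because the last block "
        + last + " does not have enough number of replicas.");
  }
{code}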






[jira] [Comment Edited] (HDFS-14514) Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2

2019-05-29 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851419#comment-16851419
 ] 

Daniel Templeton edited comment on HDFS-14514 at 5/30/19 12:17 AM:
---

LGTM.  I'd like to see the last two ifs in DFSInputStream be an if/else-if, but 
I can fix that on commit.  If there are no complaints, I'll commit this later 
this evening.


was (Author: templedf):
LGTM.  I'd like to see the last two {{if}}s in {{DFSInputStream}} be an 
{{if/else-if}}, but I can fix that on commit.  If there are no complaints, I'll 
commit this later this evening.

> Actual read size of open file in encryption zone still larger than listing 
> size even after enabling HDFS-11402 in Hadoop 2
> --
>
> Key: HDFS-14514
> URL: https://issues.apache.org/jira/browse/HDFS-14514
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs, snapshots
>Affects Versions: 2.6.5, 2.9.2, 2.8.5, 2.7.7
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-14514.branch-2.001.patch, 
> HDFS-14514.branch-2.002.patch, HDFS-14514.branch-2.003.patch
>
>
> In Hadoop 2, when a file is opened for write in an *encryption zone*, a 
> snapshot is taken, and the file is then appended, the file size read from 
> the snapshot is larger than the listing size. This happens even when 
> immutable snapshots (HDFS-11402) are enabled.
> Note: The HDFS-8905 refactor in Hadoop 3.0 fixed the bug silently (probably 
> incidentally). Hadoop 2.x is still affected by this issue.
> Thanks [~sodonnell] for locating the root cause in the codebase.
> Repro:
> 1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, 
> start HDFS cluster
> 2. Create an empty directory /dataenc, create encryption zone and allow 
> snapshot on it
> {code:bash}
> hadoop key create reprokey
> sudo -u hdfs hdfs dfs -mkdir /dataenc
> sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
> sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
> {code}
> 3. Use a client that keeps a file open for write under /dataenc. For example, 
> I'm using Flume HDFS sink to tail a local file.
> 4. Append the file several times using the client, keep the file open.
> 5. Create a snapshot
> {code:bash}
> sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
> {code}
> 6. Append the file one or more times, but don't let the file size exceed the 
> block size limit. Wait for several seconds for the append to be flushed to DN.
> 7. Do a -ls on the file inside the snapshot, then try to read the file using 
> -get; you should see that the actual file size read is larger than the 
> listing size from -ls.
> The patch and an updated unit test will be uploaded later.
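
For steps 4, 6, and 7, a hypothetical command sequence could look like the 
following (the file name /dataenc/file1 is illustrative, not from the report):

{code:bash}
# Steps 4 and 6: append while the client keeps the file open for write.
hdfs dfs -appendToFile moredata.txt /dataenc/file1
# Step 7: compare the listing size in the snapshot with the bytes actually read.
hdfs dfs -ls /dataenc/.snapshot/snap1/file1
hdfs dfs -get /dataenc/.snapshot/snap1/file1 /tmp/file1.snap
ls -l /tmp/file1.snap   # larger than the -ls size when the bug is present
{code}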






[jira] [Commented] (HDFS-14514) Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2

2019-05-29 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851419#comment-16851419
 ] 

Daniel Templeton commented on HDFS-14514:
-

LGTM.  I'd like to see the last two {{if}}s in {{DFSInputStream}} be an 
{{if/else-if}}, but I can fix that on commit.  If there are no complaints, I'll 
commit this later this evening.

> Actual read size of open file in encryption zone still larger than listing 
> size even after enabling HDFS-11402 in Hadoop 2
> --
>
> Key: HDFS-14514
> URL: https://issues.apache.org/jira/browse/HDFS-14514
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs, snapshots
>Affects Versions: 2.6.5, 2.9.2, 2.8.5, 2.7.7
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-14514.branch-2.001.patch, 
> HDFS-14514.branch-2.002.patch, HDFS-14514.branch-2.003.patch
>
>
> In Hadoop 2, when a file is opened for write in an *encryption zone*, a 
> snapshot is taken, and the file is then appended, the file size read from 
> the snapshot is larger than the listing size. This happens even when 
> immutable snapshots (HDFS-11402) are enabled.
> Note: The HDFS-8905 refactor in Hadoop 3.0 fixed the bug silently (probably 
> incidentally). Hadoop 2.x is still affected by this issue.
> Thanks [~sodonnell] for locating the root cause in the codebase.
> Repro:
> 1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, 
> start HDFS cluster
> 2. Create an empty directory /dataenc, create encryption zone and allow 
> snapshot on it
> {code:bash}
> hadoop key create reprokey
> sudo -u hdfs hdfs dfs -mkdir /dataenc
> sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
> sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
> {code}
> 3. Use a client that keeps a file open for write under /dataenc. For example, 
> I'm using Flume HDFS sink to tail a local file.
> 4. Append the file several times using the client, keep the file open.
> 5. Create a snapshot
> {code:bash}
> sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
> {code}
> 6. Append the file one or more times, but don't let the file size exceed the 
> block size limit. Wait for several seconds for the append to be flushed to DN.
> 7. Do a -ls on the file inside the snapshot, then try to read the file using 
> -get; you should see that the actual file size read is larger than the 
> listing size from -ls.
> The patch and an updated unit test will be uploaded later.






[jira] [Updated] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)

2019-03-25 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14359:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, [~sodonnell], and for the review, [~jojochuang].  
Committed to trunk.

> Inherited ACL permissions masked when parent directory does not exist (mkdir 
> -p)
> 
>
> Key: HDFS-14359
> URL: https://issues.apache.org/jira/browse/HDFS-14359
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14359.001.patch, HDFS-14359.002.patch, 
> HDFS-14359.003.patch
>
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory 
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> If you then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders 
> have their permissions masked. I believe this is a regression from the fix 
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as 
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.






[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)

2019-03-25 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801220#comment-16801220
 ] 

Daniel Templeton commented on HDFS-14359:
-

Alrighty.  I'll get this committed.  Thanks, [~jojochuang]!

> Inherited ACL permissions masked when parent directory does not exist (mkdir 
> -p)
> 
>
> Key: HDFS-14359
> URL: https://issues.apache.org/jira/browse/HDFS-14359
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14359.001.patch, HDFS-14359.002.patch, 
> HDFS-14359.003.patch
>
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory 
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> If you then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders 
> have their permissions masked. I believe this is a regression from the fix 
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as 
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.






[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)

2019-03-20 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797246#comment-16797246
 ] 

Daniel Templeton commented on HDFS-14359:
-

LGTM.  +1 from me.  Anyone else want to weigh in before I commit?  
([~andrew.wang], [~jzhuge], [~steve_l], [~arpaga], ...)

> Inherited ACL permissions masked when parent directory does not exist (mkdir 
> -p)
> 
>
> Key: HDFS-14359
> URL: https://issues.apache.org/jira/browse/HDFS-14359
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14359.001.patch, HDFS-14359.002.patch, 
> HDFS-14359.003.patch
>
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory 
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> If you then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders 
> have their permissions masked. I believe this is a regression from the fix 
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as 
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.






[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)

2019-03-19 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796376#comment-16796376
 ] 

Daniel Templeton commented on HDFS-14359:
-

I think you're right about the 3 failed test results.  It was probably a case 
of testing for what makes the test pass, as opposed to testing expected 
behavior. :)  Looking at the test code history, [~jzhuge] only updated the 
expected ACLs for the leaf directory when he added the POSIX ACL inheritance 
option, which supports my theory.

> Inherited ACL permissions masked when parent directory does not exist (mkdir 
> -p)
> 
>
> Key: HDFS-14359
> URL: https://issues.apache.org/jira/browse/HDFS-14359
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14359.001.patch, HDFS-14359.002.patch, 
> HDFS-14359.003.patch
>
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory 
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> If you then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders 
> have their permissions masked. I believe this is a regression from the fix 
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as 
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.






[jira] [Commented] (HDFS-14381) Add option to hdfs dfs -cat to ignore corrupt blocks

2019-03-19 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796338#comment-16796338
 ] 

Daniel Templeton commented on HDFS-14381:
-

That's a really good point.  I'll update the description accordingly.

> Add option to hdfs dfs -cat to ignore corrupt blocks
> 
>
> Key: HDFS-14381
> URL: https://issues.apache.org/jira/browse/HDFS-14381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.2.0
>Reporter: Daniel Templeton
>Priority: Minor
>
> If I have a file in HDFS that contains 100 blocks, and I happen to lose the 
> first block (for whatever obscure/unlikely/dumb reason), I can no longer 
> access the 99% of the file that's still there and accessible.  In the case of 
> some data formats (e.g. text), the remaining data may still be useful.  It 
> would be nice to have a way to extract the remaining data without having to 
> manually reassemble the file contents from the block files.  Something like 
> {{hdfs dfs -cat -ignoreCorrupt }}.  It could insert some marker to show 
> where the missing blocks are.
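
To make the "manually reassemble" fallback concrete, a hypothetical salvage 
today might start from fsck's block listing (the path is illustrative):

{code:bash}
# Locate the file's block IDs and the DataNodes holding them.
hdfs fsck /data/bigfile.txt -files -blocks -locations
# One would then copy the surviving blk_* files from the DataNode data
# directories and concatenate them, inserting a marker per missing block.
{code}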






[jira] [Updated] (HDFS-14381) Add option to hdfs dfs to ignore corrupt blocks

2019-03-19 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14381:

Description: If I have a file in HDFS that contains 100 blocks, and I 
happen to lose the first block (for whatever obscure/unlikely/dumb reason), I 
can no longer access the 99% of the file that's still there and accessible.  In 
the case of some data formats (e.g. text), the remaining data may still be 
useful.  It would be nice to have a way to extract the remaining data without 
having to manually reassemble the file contents from the block files.  
Something like {{hdfs dfs -copyToLocal -ignoreCorrupt }}.  It could 
insert some marker to show where the missing blocks are.  (was: If I have a 
file in HDFS that contains 100 blocks, and I happen to lose the first block 
(for whatever obscure/unlikely/dumb reason), I can no longer access the 99% of 
the file that's still there and accessible.  In the case of some data formats 
(e.g. text), the remaining data may still be useful.  It would be nice to have 
a way to extract the remaining data without having to manually reassemble the 
file contents from the block files.  Something like {{hdfs dfs -cat 
-ignoreCorrupt }}.  It could insert some marker to show where the missing 
blocks are.)

> Add option to hdfs dfs to ignore corrupt blocks
> ---
>
> Key: HDFS-14381
> URL: https://issues.apache.org/jira/browse/HDFS-14381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.2.0
>Reporter: Daniel Templeton
>Priority: Minor
>
> If I have a file in HDFS that contains 100 blocks, and I happen to lose the 
> first block (for whatever obscure/unlikely/dumb reason), I can no longer 
> access the 99% of the file that's still there and accessible.  In the case of 
> some data formats (e.g. text), the remaining data may still be useful.  It 
> would be nice to have a way to extract the remaining data without having to 
> manually reassemble the file contents from the block files.  Something like 
> {{hdfs dfs -copyToLocal -ignoreCorrupt }}.  It could insert some marker 
> to show where the missing blocks are.






[jira] [Updated] (HDFS-14381) Add option to hdfs dfs to ignore corrupt blocks

2019-03-19 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14381:

Summary: Add option to hdfs dfs to ignore corrupt blocks  (was: Add option 
to hdfs dfs -cat to ignore corrupt blocks)

> Add option to hdfs dfs to ignore corrupt blocks
> ---
>
> Key: HDFS-14381
> URL: https://issues.apache.org/jira/browse/HDFS-14381
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.2.0
>Reporter: Daniel Templeton
>Priority: Minor
>
> If I have a file in HDFS that contains 100 blocks, and I happen to lose the 
> first block (for whatever obscure/unlikely/dumb reason), I can no longer 
> access the 99% of the file that's still there and accessible.  In the case of 
> some data formats (e.g. text), the remaining data may still be useful.  It 
> would be nice to have a way to extract the remaining data without having to 
> manually reassemble the file contents from the block files.  Something like 
> {{hdfs dfs -cat -ignoreCorrupt }}.  It could insert some marker to show 
> where the missing blocks are.






[jira] [Created] (HDFS-14382) The hdfs fsck command docs do not explain the meaning of the reported fields

2019-03-19 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-14382:
---

 Summary: The hdfs fsck command docs do not explain the meaning of 
the reported fields
 Key: HDFS-14382
 URL: https://issues.apache.org/jira/browse/HDFS-14382
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.2.0
Reporter: Daniel Templeton


The {{hdfs fsck}} command shows something like:

{noformat}FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /tmp at 
Tue Mar 19 15:50:24 UTC 2019
.Status: HEALTHY
 Total size:    179159051 B
 Total dirs:    11
 Total files:   1
 Total symlinks:0
 Total blocks (validated):  2 (avg. block size 89579525 B)
 Minimally replicated blocks:   2 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:1
 Average block replication: 1.0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  1
 Number of racks:   1
FSCK ended at Tue Mar 19 15:50:24 UTC 2019 in 3 milliseconds


The filesystem under path '/tmp' is HEALTHY{noformat}

The fields are presumed to be self-explanatory, but I think that's a bold 
assumption.  In particular, it's not obvious how "mis-replicated" blocks differ 
from "under-replicated" or "over-replicated" blocks.  It would be nice to 
explain the meaning of all the fields clearly in the docs.
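
As a hedged sketch of what such docs could say (these are the commonly 
understood meanings, not text taken from this issue):

{noformat}
Minimally replicated blocks: blocks with at least the configured minimum
                             number of replicas (dfs.namenode.replication.min)
Under-replicated blocks:     blocks with fewer live replicas than their
                             replication factor
Over-replicated blocks:      blocks with more live replicas than their
                             replication factor
Mis-replicated blocks:       blocks whose replicas violate the block placement
                             policy, e.g. too many replicas on a single rack
{noformat}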






[jira] [Created] (HDFS-14381) Add option to hdfs dfs -cat to ignore corrupt blocks

2019-03-19 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-14381:
---

 Summary: Add option to hdfs dfs -cat to ignore corrupt blocks
 Key: HDFS-14381
 URL: https://issues.apache.org/jira/browse/HDFS-14381
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 3.2.0
Reporter: Daniel Templeton


If I have a file in HDFS that contains 100 blocks, and I happen to lose the 
first block (for whatever obscure/unlikely/dumb reason), I can no longer access 
the 99% of the file that's still there and accessible.  In the case of some 
data formats (e.g. text), the remaining data may still be useful.  It would be 
nice to have a way to extract the remaining data without having to manually 
reassemble the file contents from the block files.  Something like {{hdfs dfs 
-cat -ignoreCorrupt }}.  It could insert some marker to show where the 
missing blocks are.






[jira] [Updated] (HDFS-14328) [Clean-up] Remove NULL check before instanceof in TestGSet

2019-03-18 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14328:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

+1  Thanks for the patch, [~shwetayakkali].  Committed to trunk.

> [Clean-up] Remove NULL check before instanceof in TestGSet
> --
>
> Key: HDFS-14328
> URL: https://issues.apache.org/jira/browse/HDFS-14328
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shweta
>Assignee: Shweta
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14328.001.patch
>
>







[jira] [Commented] (HDFS-14359) Inherited ACL permissions masked when parent directory does not exist (mkdir -p)

2019-03-11 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789803#comment-16789803
 ] 

Daniel Templeton commented on HDFS-14359:
-

At a first pass, it LGTM.  I'll take a closer look when I get a chance.

> Inherited ACL permissions masked when parent directory does not exist (mkdir 
> -p)
> 
>
> Key: HDFS-14359
> URL: https://issues.apache.org/jira/browse/HDFS-14359
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14359.001.patch
>
>
> There appears to be an issue with ACL inheritance if you 'mkdir' a directory 
> such that the parent directories need to be created (i.e. mkdir -p).
> If you have a folder /tmp2/testacls as:
> {code}
> hadoop fs -mkdir /tmp2
> hadoop fs -mkdir /tmp2/testacls
> hadoop fs -setfacl -m default:user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m default:user:flume:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:hive:rwx /tmp2/testacls
> hadoop fs -setfacl -m user:flume:rwx /tmp2/testacls
> hadoop fs -getfacl -R /tmp2/testacls
> # file: /tmp2/testacls
> # owner: kafka
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> If you then create a sub-directory in it, the ACLs are as expected:
> {code}
> hadoop fs -mkdir /tmp2/testacls/dir_from_mkdir
> # file: /tmp2/testacls/dir_from_mkdir
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> However if you mkdir -p a directory, the situation is not the same:
> {code}
> hadoop fs -mkdir -p /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # file: /tmp2/testacls/dir_with_subdirs
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx  #effective:r-x
> user:hive:rwx   #effective:r-x
> group::r-x
> mask::r-x
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> # file: /tmp2/testacls/dir_with_subdirs/sub1/sub2
> # owner: sodonnell
> # group: supergroup
> user::rwx
> user:flume:rwx
> user:hive:rwx
> group::r-x
> mask::rwx
> other::r-x
> default:user::rwx
> default:user:flume:rwx
> default:user:hive:rwx
> default:group::r-x
> default:mask::rwx
> default:other::r-x
> {code}
> Notice that the leaf folder "sub2" is correct, but the two ancestor folders 
> have their permissions masked. I believe this is a regression from the fix 
> for HDFS-6962 with dfs.namenode.posix.acl.inheritance.enabled set to true, as 
> the code has changed significantly from the earlier 2.6 / 2.8 branch.
> I will submit a patch for this.






[jira] [Commented] (HDFS-14339) Inconsistent log level practices in RpcProgramNfs3.java

2019-03-09 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788795#comment-16788795
 ] 

Daniel Templeton commented on HDFS-14339:
-

Since it's [~OneisAll]'s patch, I went ahead and reassigned the JIRA to him.

> Inconsistent log level practices in RpcProgramNfs3.java
> ---
>
> Key: HDFS-14339
> URL: https://issues.apache.org/jira/browse/HDFS-14339
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 3.1.0, 2.8.5
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Major
>  Labels: easyfix
> Attachments: HDFS-14339.trunk.patch
>
>
> There are *inconsistent* log level practices in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
> {code:java}
> // The following log levels are inconsistent with the other practices below, 
> // which seem more appropriate.
> //from line 1814 to 1819 & line 1831 to 1836 in Hadoop-2.8.5 version
> try { 
> attr = writeManager.getFileAttr(dfsClient, childHandle, iug); 
> } catch (IOException e) { 
> LOG.error("Can't get file attributes for fileId: {}", fileId, e); 
> continue; 
> }
> //other 2 same practices in this file
> //from line 907 to 911 & line 2102 to 2106 
> try {
> postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug);
> } catch (IOException e1) {
> LOG.info("Can't get postOpAttr for fileId: {}", e1);
> }
> //other 3 similar practices
> //from line 1224 to 1227 & line 1139 to 1143  1309 to 1313
> try {
> postOpDirAttr = Nfs3Utils.getFileAttr(dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
> LOG.info("Can't get postOpDirAttr for {}", dirFileIdPath, e1);
> } 
> {code}
> Therefore, when the code catches an _*IOException*_ from the 
> _*getFileAttr()*_ method, it should log the message at the lower _*INFO*_ 
> level; a higher level may needlessly alarm users.
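
A minimal sketch of the consistent practice the report suggests, reusing the 
first snippet above with the level lowered from ERROR to INFO:

{code:java}
try {
  attr = writeManager.getFileAttr(dfsClient, childHandle, iug);
} catch (IOException e) {
  // Lowered to INFO to match the file's other getFileAttr() handlers.
  LOG.info("Can't get file attributes for fileId: {}", fileId, e);
  continue;
}
{code}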






[jira] [Assigned] (HDFS-14339) Inconsistent log level practices in RpcProgramNfs3.java

2019-03-09 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned HDFS-14339:
---

Assignee: Anuhan Torgonshar  (was: Shweta)

> Inconsistent log level practices in RpcProgramNfs3.java
> ---
>
> Key: HDFS-14339
> URL: https://issues.apache.org/jira/browse/HDFS-14339
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 3.1.0, 2.8.5
>Reporter: Anuhan Torgonshar
>Assignee: Anuhan Torgonshar
>Priority: Major
>  Labels: easyfix
> Attachments: HDFS-14339.trunk.patch
>
>
> There are *inconsistent* log level practices in 
> _*hadoop-2.8.5-src/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/**RpcProgramNfs3.java*_.
>  
> {code:java}
> // The following log levels are inconsistent with the other practices below, 
> // which seem more appropriate.
> //from line 1814 to 1819 & line 1831 to 1836 in Hadoop-2.8.5 version
> try { 
> attr = writeManager.getFileAttr(dfsClient, childHandle, iug); 
> } catch (IOException e) { 
> LOG.error("Can't get file attributes for fileId: {}", fileId, e); 
> continue; 
> }
> //other 2 same practices in this file
> //from line 907 to 911 & line 2102 to 2106 
> try {
> postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug);
> } catch (IOException e1) {
> LOG.info("Can't get postOpAttr for fileId: {}", e1);
> }
> //other 3 similar practices
> //from line 1224 to 1227 & line 1139 to 1143  1309 to 1313
> try {
> postOpDirAttr = Nfs3Utils.getFileAttr(dfsClient, dirFileIdPath, iug);
> } catch (IOException e1) {
> LOG.info("Can't get postOpDirAttr for {}", dirFileIdPath, e1);
> } 
> {code}
> Therefore, when the code catches an _*IOException*_ from the 
> _*getFileAttr()*_ method, it should log the message at the lower _*INFO*_ 
> level; a higher level may needlessly alarm users.






[jira] [Commented] (HDFS-14333) Datanode fails to start if any disk has errors during Namenode registration

2019-03-04 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783792#comment-16783792
 ] 

Daniel Templeton commented on HDFS-14333:
-

I took a look, and I don't have any comments.  LGTM!  I'm gonna let someone who 
knows volume management in the data node better give you the +1, though.

> Datanode fails to start if any disk has errors during Namenode registration
> ---
>
> Key: HDFS-14333
> URL: https://issues.apache.org/jira/browse/HDFS-14333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14333.001.patch
>
>
> This is closely related to HDFS-9908, where it was reported that a datanode 
> would fail to start if an IO error occurred on a single disk when running du 
> during Datanode registration. That Jira was closed due to HADOOP-12973 which 
> refactored how du is called and prevents any exception being thrown. However 
> this problem can still occur if the volume has errors (eg permission or 
> filesystem corruption) when the disk is scanned to load all the replicas. The 
> method chain is:
> DataNode.initBlockPool -> FSDataSetImpl.addBlockPool -> 
> FSVolumeList.getAllVolumesMap -> Throws exception which goes unhandled.
> The DN logs will contain a stack trace for the problem volume, so the 
> workaround is to remove the volume from the DN config and the DN will start, 
> but the logs are a little confusing, so it's not always obvious what the 
> issue is.
> These are the cut down logs from an occurrence of this issue.
> {code}
> 2019-03-01 08:58:24,830 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning 
> block pool BP-240961797-x.x.x.x-1392827522027 on volume 
> /data/18/dfs/dn/current...
> ...
> 2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could 
> not get disk usage information
> ExitCodeException exitCode=1: du: cannot read directory 
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215':
>  Permission denied
> du: cannot read directory 
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir213':
>  Permission denied
> du: cannot read directory 
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir97/subdir25':
>  Permission denied
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
>   at org.apache.hadoop.util.Shell.run(Shell.java:504)
>   at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61)
>   at org.apache.hadoop.fs.DU.refresh(DU.java:53)
>   at 
> org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84)
>   at 
> org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:145)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412)
> ...
> 2019-03-01 08:58:27,043 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time 
> taken to scan block pool BP-240961797-x.x.x.x-1392827522027 on 
> /data/18/dfs/dn/current: 2202ms
> {code}
> So we can see a du error occurred, was logged but not re-thrown (due to 
> HADOOP-12973) and the blockpool scan completed. However then in the 'add 
> replicas to map' logic, we got another exception stemming from the same 
> problem:
> {code}
> 2019-03-01 08:58:27,564 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding 
> replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume 
> /data/18/dfs/dn/current...
> ...
> 2019-03-01 08:58:31,155 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught 
> exception while adding replicas from /data/18/dfs/dn/current. Will throw 
> later.
> java.io.IOException: Invalid directory or I/O error occurred for dir: 
> /data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215
>   at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>   at 
> 

[jira] [Updated] (HDFS-14273) Fix checkstyle issues in BlockLocation's method javadoc

2019-02-20 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14273:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
 Release Note: Thanks for the patch, [~shwetayakkali], and review, 
[~knanasi]. Committed to trunk.
   Status: Resolved  (was: Patch Available)

> Fix checkstyle issues in BlockLocation's method javadoc
> ---
>
> Key: HDFS-14273
> URL: https://issues.apache.org/jira/browse/HDFS-14273
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shweta
>Assignee: Shweta
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: HDFS-14273.001.patch
>
>
> BlockLocation.java has checkstyle issues in most of its methods' javadoc and 
> an indentation error. 






[jira] [Commented] (HDFS-14273) Fix checkstyle issues in BlockLocation's method javadoc

2019-02-20 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773511#comment-16773511
 ] 

Daniel Templeton commented on HDFS-14273:
-

LGTM +1.  I'll check it in soon.

> Fix checkstyle issues in BlockLocation's method javadoc
> ---
>
> Key: HDFS-14273
> URL: https://issues.apache.org/jira/browse/HDFS-14273
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shweta
>Assignee: Shweta
>Priority: Trivial
> Attachments: HDFS-14273.001.patch
>
>
> BlockLocation.java has checkstyle issues in most of its methods' javadoc and 
> an indentation error. 






[jira] [Updated] (HDFS-14185) Cleanup method calls to static Assert methods in TestAddStripedBlocks

2019-01-23 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14185:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks, [~shwetayakkali].  Committed to trunk.

> Cleanup method calls to static Assert methods in TestAddStripedBlocks
> -
>
> Key: HDFS-14185
> URL: https://issues.apache.org/jira/browse/HDFS-14185
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14185.001.patch, HDFS-14185.002.patch, 
> HDFS-14185.003.patch, HDFS-14185.004.patch
>
>
> Clean up the method calls to static Assert methods in TestAddStripedBlock.






[jira] [Commented] (HDFS-14185) Cleanup method calls to static Assert methods in TestAddStripedBlocks

2019-01-10 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739848#comment-16739848
 ] 

Daniel Templeton commented on HDFS-14185:
-

+1 pending a clean(-ish) Jenkins run.

> Cleanup method calls to static Assert methods in TestAddStripedBlocks
> -
>
> Key: HDFS-14185
> URL: https://issues.apache.org/jira/browse/HDFS-14185
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Minor
> Attachments: HDFS-14185.001.patch, HDFS-14185.002.patch, 
> HDFS-14185.003.patch, HDFS-14185.004.patch
>
>
> Clean up the method calls to static Assert methods in TestAddStripedBlock.






[jira] [Comment Edited] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped

2019-01-08 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737708#comment-16737708
 ] 

Daniel Templeton edited comment on HDFS-14132 at 1/9/19 1:06 AM:
-

Thanks for the patch, [~shwetayakkali].  Committed to trunk.


was (Author: templedf):
Thanks for the patch, [~shwetayakkali].  Commited to trunk.

> Add BlockLocation.isStriped() to determine if block is replicated or Striped
> 
>
> Key: HDFS-14132
> URL: https://issues.apache.org/jira/browse/HDFS-14132
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, 
> HDFS-14132.003.patch, HDFS-14132.004.patch
>
>
> Impala uses FileSystem#getBlockLocation to get block locations. We can add 
> an isStriped() method to make it easier to determine whether a block belongs 
> to a replicated file or a striped file.
> In HDFS, this isStriped information is already available via 
> HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to 
> BlockLocation does not introduce space overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped

2019-01-08 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14132:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, [~shwetayakkali].  Committed to trunk.

> Add BlockLocation.isStriped() to determine if block is replicated or Striped
> 
>
> Key: HDFS-14132
> URL: https://issues.apache.org/jira/browse/HDFS-14132
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, 
> HDFS-14132.003.patch, HDFS-14132.004.patch
>
>
> Impala uses FileSystem#getBlockLocation to get block locations. We can add 
> an isStriped() method to make it easier to determine whether a block belongs 
> to a replicated file or a striped file.
> In HDFS, this isStriped information is already available via 
> HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to 
> BlockLocation does not introduce space overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped

2019-01-08 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737703#comment-16737703
 ] 

Daniel Templeton commented on HDFS-14132:
-

I don't love breaking a line on a "." when you could have broken on an "=" 
instead, but it's not enough to ask for a new patch.  +1  I'll commit shortly.
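To illustrate with a generic fragment (not the actual patch code):

{code}
// Preferred: break after the "=" so the receiver and the call stay together.
BlockLocation[] locations =
    fs.getFileBlockLocations(status, 0, status.getLen());

// Less readable: breaking on the "." splits the receiver from the call.
// BlockLocation[] locations = fs
//     .getFileBlockLocations(status, 0, status.getLen());
{code}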

> Add BlockLocation.isStriped() to determine if block is replicated or Striped
> 
>
> Key: HDFS-14132
> URL: https://issues.apache.org/jira/browse/HDFS-14132
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Attachments: HDFS-14132.001.patch, HDFS-14132.002.patch, 
> HDFS-14132.003.patch, HDFS-14132.004.patch
>
>
> Impala uses FileSystem#getBlockLocation to get block locations. We can add 
> an isStriped() method to make it easier to determine whether a block belongs 
> to a replicated file or a striped file.
> In HDFS, this isStriped information is already available via 
> HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to 
> BlockLocation does not introduce space overhead.
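For reference, the shape of the change is roughly the following sketch; the 
LocatedBlock field name in HdfsBlockLocation is an assumption, not the exact 
patch code:

{code}
// In org.apache.hadoop.fs.BlockLocation: replicated is the default answer.
public boolean isStriped() {
  return false;
}

// In org.apache.hadoop.fs.HdfsBlockLocation: delegate to the wrapped
// LocatedBlock, which already carries the striping information.
@Override
public boolean isStriped() {
  return block.isStriped();
}
{code}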



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-14 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14121:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the patch, [~zvenczel], and the reviews, [~knanasi].  Committed to 
trunk.

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully 
> parse the hosts file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721324#comment-16721324
 ] 

Daniel Templeton commented on HDFS-14121:
-

On further reflection, let's leave it as a WARN.  +1

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully 
> parse the hosts file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721214#comment-16721214
 ] 

Daniel Templeton commented on HDFS-14121:
-

Thanks, [~zvenczel].  The patch looks great.  One philosophical question: when 
you log that the old format is being used, should that be INFO level?  It's not 
actually a problem, per se, though it is something the admin should fix.

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully 
> parse the hosts file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14132) Add BlockLocation.isStriped() to determine if block is replicated or Striped

2018-12-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721093#comment-16721093
 ] 

Daniel Templeton commented on HDFS-14132:
-

Thanks for the patch, [~shwetayakkali].  A couple of comments:

# For this method:{code}  public boolean isStriped() { return false; }{code} it 
would be nicer to add the newlines around the return statement.  It's more 
consistent, and it's easier to read.
# For the asserts in your tests, please add assert messages that explain what 
the failure is in a way that someone can understand without having to read code.
# Super trivial point, but the usual way to do asserts is to leave out the 
{{Assert.}} and add the needed imports.  In that test class, I can see that 
{{Assert.assertEquals}} is already imported, but most of the asserts include 
the {{Assert.}}.  Since it's already inconsistent, I'd say it's better to do it 
the usual way.  But that's just me.  (A sketch of all three points follows.)
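Taken together, the three points might look roughly like this sketch (not the 
actual patch code; it assumes {{import static org.junit.Assert.assertFalse;}} 
and a {{location}} variable in the test):

{code}
// Point 1: newlines around the return statement.
public boolean isStriped() {
  return false;
}

// Points 2 and 3: a statically imported assert with a message that explains
// the failure without making the reader open the code.
assertFalse("BlockLocation for a replicated file should not report striped",
    location.isStriped());
{code}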

> Add BlockLocation.isStriped() to determine if block is replicated or Striped
> 
>
> Key: HDFS-14132
> URL: https://issues.apache.org/jira/browse/HDFS-14132
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Attachments: HDFS-14132.001.patch
>
>
> Impala uses FileSystem#getBlockLocation to get block locations. We can add 
> an isStriped() method to make it easier to determine whether a block belongs 
> to a replicated file or a striped file.
> In HDFS, this isStriped information is already available via 
> HdfsBlockLocation#LocatedBlock#isStriped(), so adding this method to 
> BlockLocation does not introduce space overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-12-13 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-13985:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, [~adam.antal].  Committed to trunk.

> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, 
> HDFS-13985.002.patch, HDFS-13985.003.patch
>
>
> We came across a ReplicaNotFoundException in a bug report, and the most 
> informative thing we could get from it was "Replica not found for 
> [ExtendedBlock]". Anyone investigating a case involving a 
> ReplicaNotFoundException has to review diagnostic bundles and dig through 
> logs; enhancing the exception message would give that investigation a better 
> starting point and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, the replication factor, 
> or the block location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-12-13 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720293#comment-16720293
 ] 

Daniel Templeton commented on HDFS-13985:
-

Oops.  I forgot to also thank [~zvenczel] and [~jojochuang] for the reviews!

> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, 
> HDFS-13985.002.patch, HDFS-13985.003.patch
>
>
> We came across a ReplicaNotFoundException in a bug report, and the most 
> informative thing we could get from it was "Replica not found for 
> [ExtendedBlock]". Anyone investigating a case involving a 
> ReplicaNotFoundException has to review diagnostic bundles and dig through 
> logs; enhancing the exception message would give that investigation a better 
> starting point and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, the replication factor, 
> or the block location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-12-13 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720160#comment-16720160
 ] 

Daniel Templeton commented on HDFS-13985:
-

LGTM +1

> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, 
> HDFS-13985.002.patch, HDFS-13985.003.patch
>
>
> We came across a ReplicaNotFoundException in a bug report, and the most 
> informative thing we could get from it was "Replica not found for 
> [ExtendedBlock]". Anyone investigating a case involving a 
> ReplicaNotFoundException has to review diagnostic bundles and dig through 
> logs; enhancing the exception message would give that investigation a better 
> starting point and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, the replication factor, 
> or the block location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2018-12-13 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719884#comment-16719884
 ] 

Daniel Templeton commented on HDFS-14126:
-

That error and the lag that causes it are exactly why the directory scanner 
throttle test is flaky.  When working on that test, I noticed that we 
occasionally see that lock-held-too-long message, and it correlates with the 
directory scanner taking much longer than usual to complete a scan.  So, yes 
it's a performance issue, but, no, it's not a regression.
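For anyone unfamiliar with that warning: the pattern behind it is a lock 
wrapper that timestamps the acquire and, on release, logs if the hold time 
exceeded a threshold.  A simplified, self-contained analog (this is not the 
actual InstrumentedLock implementation):

{code}
import java.util.concurrent.locks.ReentrantLock;

public class TimedAutoLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private final long thresholdMs;
  private long acquiredAtMs;

  public TimedAutoLock(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  public TimedAutoLock acquire() {
    lock.lock();
    acquiredAtMs = System.currentTimeMillis();
    return this;
  }

  @Override
  public void close() {
    long heldMs = System.currentTimeMillis() - acquiredAtMs;
    lock.unlock();
    if (heldMs > thresholdMs) {
      // The real code also tracks suppressed warnings and a stack trace.
      System.err.println("Lock held time above threshold: " + heldMs + " ms");
    }
  }
}
{code}

Used as {{try (TimedAutoLock l = timedLock.acquire()) { ... }}}, which is why 
the warning in the quoted log fires from AutoCloseableLock.close().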

> DataNode DirectoryScanner holding global lock for too long
> --
>
> Key: HDFS-14126
> URL: https://issues.apache.org/jira/browse/HDFS-14126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> I've got a Hadoop 3 based cluster set up, and this DN has just 434 thousand 
> blocks.
> And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:
> {quote}
> 2018-12-03 21:33:09,130 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-4588049-10.17.XXX-XX-281857726 Total blocks: 434401, missing metadata 
> files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
> 2018-12-03 21:33:09,131 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock 
> held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=2710 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {quote}
> Log messages like this repeat every several hours (every 6 hours, to be 
> exact). I am not sure if this is a performance regression, or just the fact 
> that the lock information is printed in Hadoop 3. [~vagarychen] or 
> [~templedf] do you know?
> There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's 
> heap size is set to several GB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-12-12 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719473#comment-16719473
 ] 

Daniel Templeton commented on HDFS-13985:
-

Thanks, [~adam.antal].  I'm going to pick on the error message a bit.  Can we 
please make it, "The block may have been removed recently by the balancer or by 
intentionally reducing the replication factor. This condition is usually 
harmless. To be certain, please check the preceding datanode log messages for 
signs of a more serious issue."
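As a sketch of how that wording could be attached to the exception (the 
constant and helper method names here are assumptions, not the actual patch):

{code}
static final String POSSIBLE_ROOT_CAUSE_MSG =
    " The block may have been removed recently by the balancer or by"
    + " intentionally reducing the replication factor."
    + " This condition is usually harmless. To be certain, please check the"
    + " preceding datanode log messages for signs of a more serious issue.";

static ReplicaNotFoundException replicaNotFound(ExtendedBlock block) {
  return new ReplicaNotFoundException(
      "Replica not found for " + block + "." + POSSIBLE_ROOT_CAUSE_MSG);
}
{code}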

> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch, 
> HDFS-13985.002.patch
>
>
> We came across a ReplicaNotFoundException in a bug report, and the most 
> informative thing we could get from it was "Replica not found for 
> [ExtendedBlock]". Anyone investigating a case involving a 
> ReplicaNotFoundException has to review diagnostic bundles and dig through 
> logs; enhancing the exception message would give that investigation a better 
> starting point and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, the replication factor, 
> or the block location.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-12 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719123#comment-16719123
 ] 

Daniel Templeton commented on HDFS-14121:
-

Thanks, [~zvenczel].  I agree with Kitti that the patch could be more verbose.  
Specifically, it should be actionable.  If we're going to issue a warning, it 
should explain how to make it go away.  Also, I'd rather have the warning 
printed after we know whether using the old parser worked.  If using the old 
parser also fails, there's no reason to say we tried it.

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully 
> parse the hosts file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13752) fs.Path stores file path in java.net.URI causes big memory waste

2018-12-04 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned HDFS-13752:
---

Assignee: Barnabas Maidics

> fs.Path stores file path in java.net.URI causes big memory waste
> 
>
> Key: HDFS-13752
> URL: https://issues.apache.org/jira/browse/HDFS-13752
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.7.6
> Environment: Hive 2.1.1 and hadoop 2.7.6 
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch, 
> HDFS-13752.003.patch, HDFSbenchmark.pdf, Screen Shot 2018-07-20 at 
> 11.12.38.png, heapdump-10partitions.html, measurement.pdf
>
>
> I was looking at HiveServer2 memory usage, and a big percentage of it was 
> due to org.apache.hadoop.fs.Path, which stores file paths in a 
> java.net.URI object. The URI implementation stores the same string in 3 
> different objects (see the attached image). In Hive, when there are many 
> partitions, this causes high memory usage. In my particular case 42% of 
> memory was used by java.net.URI, so it could be reduced to 14%. 
> I wonder whether the community is open to replacing it with a more 
> memory-efficient implementation, and what else should be considered here? It 
> could be a huge memory improvement for Hadoop and for Hive as well.
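A tiny illustration of the duplication described above (whether these strings 
share a backing array depends on the JDK version; since Java 7u6, substrings 
are full copies):

{code}
import java.net.URI;

public class PathMemoryDemo {
  public static void main(String[] args) {
    URI uri = URI.create("/warehouse/db/table/partition=1/file");
    // The URI retains the original string plus decomposed copies of
    // essentially the same characters:
    System.out.println(uri.toString());              // the full string
    System.out.println(uri.getPath());               // the path component
    System.out.println(uri.getSchemeSpecificPart()); // the scheme-specific part
  }
}
{code}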



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14121) Log message about the old hosts file format is misleading

2018-11-30 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-14121:
---

 Summary: Log message about the old hosts file format is misleading
 Key: HDFS-14121
 URL: https://issues.apache.org/jira/browse/HDFS-14121
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Daniel Templeton


In {{CombinedHostsFileReader.readFile()}} we have the following:

{code}  LOG.warn("{} has invalid JSON format." +
  "Try the old format without top-level token defined.", 
hostsFile);{code}

That message is trying to say that we tried parsing the hosts file as a 
well-formed JSON file and failed, so we're going to try again assuming that 
it's in the old badly-formed format.  What it actually says is that the hosts 
file is bad, and the admin should try switching to the old format.  Those are 
two very different things.

While we're in there, we should refactor the logging so that instead of 
reporting that we're going to try using a different parser (who the heck 
cares?), we report that we had to use the old parser to successfully parse 
the hosts file.
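A minimal sketch of the refactored flow; the helper method names are 
assumptions, not the actual patch:

{code}
public static DatanodeAdminProperties[] readFile(String hostsFilePath)
    throws IOException {
  try {
    // Hypothetical helper: parse the documented, well-formed JSON format.
    return readJsonArrayFormat(hostsFilePath);
  } catch (JsonProcessingException e) {
    // Hypothetical helper: parse the old, badly-formed format.
    DatanodeAdminProperties[] hosts = readLegacyFormat(hostsFilePath);
    // Warn only after the fallback succeeded, and make the message actionable.
    LOG.warn("{} is in the legacy hosts file format. Please convert it to a "
        + "top-level JSON array; the legacy format may not be supported in a "
        + "future release.", hostsFilePath);
    return hosts;
  }
}
{code}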



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-27 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700796#comment-16700796
 ] 

Daniel Templeton commented on HDFS-13998:
-

The safest answer is to add another CLI option.  I also think it's clearer.  I 
find {{hdfs ec -setPolicy -path /EC}} to be strange syntax.  Let's set the 
policy but not say to what!  Huh?

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.
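The shape of the fix is a null guard around the policy lookup; a sketch only, 
with {{dfs}}, {{p}}, and {{path}} assumed to be local variables in ECAdmin:

{code}
ErasureCodingPolicy policy = dfs.getErasureCodingPolicy(p);
if (policy == null) {
  // -replicate pins the path to plain replication, so no EC policy is set.
  System.out.println("Set replication on " + path);
} else {
  System.out.println("Set " + policy.getName()
      + " erasure coding policy on " + path);
}
{code}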



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14092) Remove two-step create/append in WebHdfsFileSystem

2018-11-20 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-14092:
---

 Summary: Remove two-step create/append in WebHdfsFileSystem
 Key: HDFS-14092
 URL: https://issues.apache.org/jira/browse/HDFS-14092
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 3.2.0
Reporter: Daniel Templeton


Per the javadoc on the {{WebHdfsFileSystem.connect()}} method:

{code}/**
 * Two-step requests redirected to a DN
 *
 * Create/Append:
 * Step 1) Submit a Http request with neither auto-redirect nor data.
 * Step 2) Submit another Http request with the URL from the Location header
 * with data.
 *
 * The reason of having two-step create/append is for preventing clients to
 * send out the data before the redirect. This issue is addressed by the
 * "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3.
 * Unfortunately, there are software library bugs (e.g. Jetty 6 http server
 * and Java 6 http client), which do not correctly implement "Expect:
 * 100-continue". The two-step create/append is a temporary workaround for
 * the software library bugs.
 *
 * Open/Checksum
 * Also implements two-step connects for other operations redirected to
 * a DN such as open and checksum
 */{code}

We should validate that it's safe to remove the two-step process and do so.  
FYI, [~smeng].
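For validation, a single-step upload can be exercised with a client that 
implements "Expect: 100-continue" correctly, e.g. java.net.http in Java 11+.  
A sketch (the NameNode URL is illustrative only):

{code}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SingleStepCreate {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://namenode:9870/webhdfs/v1/tmp/f?op=CREATE"))
        .expectContinue(true) // ask the server to ack before the body is sent
        .PUT(HttpRequest.BodyPublishers.ofString("data"))
        .build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
  }
}
{code}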



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-16 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, 
> HDFS-14015.012.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-16 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690164#comment-16690164
 ] 

Daniel Templeton commented on HDFS-14015:
-

{quote}assuming this is tested (e.g. hard code the DetachCurrentThread return 
to be non zero and eye-checked stderr){quote}

I have tested this method manually.  The results look like:

{noformat}detachCurrentThreadFromJvm: Unable to detach thread 
Thread[MyThread,10,MyGroup]:10 from the JVM. Error code: -1{noformat}

Having addressed [~xiaochen]'s concerns, I'm going to invoke his +1 and commit. 
 Thanks, [~xiaochen], [~pranay_singh], [~yzhangal], and [~jojochuang] for the 
reviews.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, 
> HDFS-14015.012.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-15 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688972#comment-16688972
 ] 

Daniel Templeton commented on HDFS-14015:
-

Patch 012 addresses:

* JNI_OK for return value
* Corrected log output for windows

It doesn't address:
* multiple DescribeException() calls -- because there's no case where they 
would result in the same message being printed repeatedly.
* removing the {{!= NULL}} in the conditional -- just because it's possible, 
doesn't mean we should.  Clarity still counts, even in C.
* deduping the get_current_thread_id() methods -- there's just no decent place 
to store the common method.  I could create new files for it, but that seems 
like overkill.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, 
> HDFS-14015.012.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-15 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.012.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch, 
> HDFS-14015.012.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-15 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.012.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-15 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: (was: HDFS-14015.012.patch)

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-15 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688719#comment-16688719
 ] 

Daniel Templeton commented on HDFS-13998:
-

[~vinayrpet] and [~xiaochen], that's a hard-line interpretation of the 
compatibility guidelines.  The intent of the guideline is to avoid breaking 
scripts that rely on the output of our CLIs.  From the perspective of parsing 
CLI output, HDFS-13732 is not a breaking change.  From the perspective of 
behavior, it's a little grey.  Strictly speaking it's a behavioral change that 
could break a script that doesn't know if it's passing a {{-policy}} arg and is 
using the output to tell for some reason.  I find that scenario pretty 
unlikely, though.

The guidelines are guidelines.  Common sense still takes precedence.  In this 
case, EC is a new feature, and the probability that a script exists that would 
be broken by this change is vanishingly small.  For those reasons, I'm not sure 
I would have labeled HDFS-13732 incompatible.

The way to make the change without breaking compatibility in any way is to add 
another CLI option.  {{hdfs ec -defaultPolicy -path /EC}} for example, which 
would set the policy to the default and print it, or maybe a {{-default}} 
option for {{-setPolicy}}.

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687476#comment-16687476
 ] 

Daniel Templeton commented on HDFS-14015:
-

Thanks for the review, [~pranay_singh]!

I originally had all those null checks in there, but I looked at some other JNI 
code, and no one checks for null on things that are required to be there, such 
as java.lang.Thread and its API methods.  If java.lang.Thread can't be found, 
we have bigger problems than a segfault.  I can add them back in if you like, 
though.

Good point on the return values.  I'll clean that up.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687111#comment-16687111
 ] 

Daniel Templeton commented on HDFS-14015:
-

Oh, duh.  The -1 is for not having tests in the patch.  Yeah, that's an issue, 
but I don't see a reasonable way to write a test for this code.  I'm open to 
suggestions.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687029#comment-16687029
 ] 

Daniel Templeton commented on HDFS-14015:
-

I just did my own review of my patch and caught some issues which are now 
addressed in patch 11.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-14 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.011.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch, HDFS-14015.011.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-14 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687004#comment-16687004
 ] 

Daniel Templeton commented on HDFS-14015:
-

I don't see any evidence of a test failure, so I'm not sure what's up with the 
-1.  [~pranay_singh] or [~xiaochen], would one of you care to review this patch?

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-13 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.010.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch, HDFS-14015.010.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-09 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681741#comment-16681741
 ] 

Daniel Templeton commented on HDFS-13998:
-

Patch 002 looks good except for one minor quibble:

{code}   System.out.println("Set " + ecPolicyName + " erasure coding policy 
on" +
   " " + path);{code}

Can we get rid of the extraneous {{" "}} in the second line, i.e. add a space 
inside the quote on the first line?
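That is, the suggested form is simply:

{code}   System.out.println("Set " + ecPolicyName + " erasure coding policy on "
    + path);{code}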

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c

2018-11-06 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14047:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks for the patch, [~anatoli.shein], and the review, 
[~James C].

> [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
> 
>
> Key: HDFS-14047
> URL: https://issues.apache.org/jira/browse/HDFS-14047
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: libhdfs, native
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14047.000.patch, HDFS-14047.001.patch
>
>
> Currently the native client CI tests break deterministically with these 
> errors:
> Libhdfs
> 1 - test_test_libhdfs_threaded_hdfs_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)
>  
> Libhdfs++
> 34 - test_libhdfs_threaded_hdfspp_test_shim_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-05 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675930#comment-16675930
 ] 

Daniel Templeton commented on HDFS-14015:
-

Attached patch 009 to address compiler warnings.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.
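
As a rough illustration of the change this proposes, a minimal sketch that 
checks the detach result instead of discarding it; the message text and 
surrounding structure are assumptions, not the patch itself:

{code}
#include <jni.h>
#include <stdio.h>

/* Sketch: a thread-local-storage destructor that surfaces a failed detach
 * instead of silently ignoring it. */
static void hdfsThreadDestructorSketch(void *v)
{
  JNIEnv *env = (JNIEnv *)v;
  JavaVM *vm;
  jint ret = (*env)->GetJavaVM(env, &vm);

  if (ret == JNI_OK) {
    ret = (*vm)->DetachCurrentThread(vm);
    if (ret != JNI_OK) {
      /* Previously this return value was dropped; logging it may explain
       * native threads that die while still holding a JVM monitor. */
      fprintf(stderr, "hdfsThreadDestructor: DetachCurrentThread failed "
              "with error %d\n", ret);
    }
  }
}
{code}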



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-05 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.009.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch, 
> HDFS-14015.009.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673790#comment-16673790
 ] 

Daniel Templeton commented on HDFS-14015:
-

New patch uses {{toString() + ":" + getId()}} and removes some of the obsessive 
and unnecessary error checking.
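
For readers following along, a hedged sketch of how a {{toString() + ":" + 
getId()}} label can be assembled over JNI; error checks are trimmed, and every 
name below is illustrative rather than the patch's actual code:

{code}
#include <jni.h>
#include <stdio.h>

/* Sketch: build "<Thread.toString()>:<Thread.getId()>" for the current
 * thread so a failed detach can name the offending thread. */
static void printThreadLabel(JNIEnv *env)
{
  jclass cls = (*env)->FindClass(env, "java/lang/Thread");
  jmethodID current = (*env)->GetStaticMethodID(env, cls, "currentThread",
      "()Ljava/lang/Thread;");
  jobject thread = (*env)->CallStaticObjectMethod(env, cls, current);
  jmethodID toString = (*env)->GetMethodID(env, cls, "toString",
      "()Ljava/lang/String;");
  jmethodID getId = (*env)->GetMethodID(env, cls, "getId", "()J");
  jstring name = (jstring)(*env)->CallObjectMethod(env, thread, toString);
  jlong id = (*env)->CallLongMethod(env, thread, getId);
  const char *utf = (*env)->GetStringUTFChars(env, name, NULL);

  fprintf(stderr, "thread %s:%lld failed to detach\n", utf, (long long)id);
  (*env)->ReleaseStringUTFChars(env, name, utf);
}
{code}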

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.008.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch, HDFS-14015.008.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673741#comment-16673741
 ] 

Daniel Templeton commented on HDFS-14015:
-

I'm worried about the testing, too.  No idea how to reasonably test this code.

I'll work on adding the extra info.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14047) [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c

2018-11-02 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673710#comment-16673710
 ] 

Daniel Templeton commented on HDFS-14047:
-

LGTM +1

I'll let this sit a day or two before I commit it.

> [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in test_libhdfs_threaded.c
> 
>
> Key: HDFS-14047
> URL: https://issues.apache.org/jira/browse/HDFS-14047
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: libhdfs, native
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
>Priority: Major
> Attachments: HDFS-14047.000.patch, HDFS-14047.001.patch
>
>
> Currently the native client CI tests break deterministically with these 
> errors:
> Libhdfs
> 1 - test_test_libhdfs_threaded_hdfs_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)
>  
> Libhdfs++
> 34 - test_libhdfs_threaded_hdfspp_test_shim_static (Failed)
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> [exec] TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> [exec] hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> [exec] (unable to get root cause for java.io.FileNotFoundException)
> [exec] (unable to get stack trace for java.io.FileNotFoundException)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.007.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: (was: HDFS-14015.007.patch)

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673678#comment-16673678
 ] 

Daniel Templeton commented on HDFS-14015:
-

I extended my patch to include more information about the thread that failed to 
detach from the JVM.  Please have a look at patch 007.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-11-02 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.007.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch, HDFS-14015.007.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods

2018-10-29 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667921#comment-16667921
 ] 

Daniel Templeton commented on HDFS-14027:
-

LGTM +1

> DFSStripedOutputStream should implement both hsync methods
> --
>
> Key: HDFS-14027
> URL: https://issues.apache.org/jira/browse/HDFS-14027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-14027.01.patch, HDFS-14027.02.patch, 
> HDFS-14027.03.patch
>
>
> In an internal Spark investigation, it appears that when 
> [EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155]
>  writes to an EC file, one may get exceptions when reading, or see odd output. A 
> sample exception is
> {noformat}
> hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | 
> head -1
> 18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote 
> block reader.
> java.io.IOException: Got error, status=ERROR, status message opReadBlock 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
> exception java.io.IOException:  Offset 0 and length 116161 don't match block 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen 
> 110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for 
> file /user/spark/applicationHistory/application_1540333573846_0003, for pool 
> BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264)
>   at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299)
>   at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
>   at 
> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
>   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:326)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)
> 18/10/23 18:12:39 WARN hdfs.DFSClient: Failed to connect to /HOST2_IP:20002 
> for blockBP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085
> java.io.IOException: Got error, status=ERROR, status message opReadBlock 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
> exception java.io.IOException:  Offset 0 

[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods

2018-10-26 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665685#comment-16665685
 ] 

Daniel Templeton commented on HDFS-14027:
-

Thanks for the updated patch, [~xiaochen].  Last comment is on those WARN log 
messages.  There's not much that an admin is going to be able to do with those 
log messages, and they could potentially occur a lot.  I'd knock them back to 
either INFO or DEBUG and maybe be a little more explicit about the context of 
what's going on and why it's wrong.

> DFSStripedOutputStream should implement both hsync methods
> --
>
> Key: HDFS-14027
> URL: https://issues.apache.org/jira/browse/HDFS-14027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-14027.01.patch, HDFS-14027.02.patch
>
>
> In an internal Spark investigation, it appears that when 
> [EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155]
>  writes to an EC file, one may get exceptions when reading, or see odd output. A 
> sample exception is
> {noformat}
> hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | 
> head -1
> 18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote 
> block reader.
> java.io.IOException: Got error, status=ERROR, status message opReadBlock 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
> exception java.io.IOException:  Offset 0 and length 116161 don't match block 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen 
> 110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for 
> file /user/spark/applicationHistory/application_1540333573846_0003, for pool 
> BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264)
>   at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299)
>   at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
>   at 
> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
>   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:326)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)
> 18/10/23 18:12:39 WARN 

[jira] [Commented] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods

2018-10-25 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664292#comment-16664292
 ] 

Daniel Templeton commented on HDFS-14027:
-

Thanks for the patch, [~xiaochen].  Couple of comments/questions:

# Is it more reasonable to stub out the {{hsync()}} method than to throw an 
{{UnsupportedOperationException}}?  I assume the latter would break all the 
things, but I have to ask.  (See the stub sketch after this list.)
# In the test, please add a message to the assert.
# Why is {{dfssos}} final?
# Should there be a try/finally or try with resources there in the test code?
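
On point 1, a hedged sketch of the stub-out option, assuming the choice is to 
make both {{hsync}} overloads quiet no-ops inside {{DFSStripedOutputStream}}; 
this is illustrative only, not the committed patch:

{code}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

// Inside DFSStripedOutputStream (sketch only):
@Override
public void hsync() throws IOException {
  // EC output streams cannot honor hsync on partially written cells; a
  // quiet no-op keeps callers such as Spark's EventLoggingListener working.
}

@Override
public void hsync(EnumSet<SyncFlag> syncFlags) throws IOException {
  // Without overriding this variant too, callers fall through to
  // DFSOutputStream's version and flush partial stripes -- the corruption
  // reported in this JIRA.
  hsync();
}
{code}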

> DFSStripedOutputStream should implement both hsync methods
> --
>
> Key: HDFS-14027
> URL: https://issues.apache.org/jira/browse/HDFS-14027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-14027.01.patch
>
>
> In an internal Spark investigation, it appears that when 
> [EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155]
>  writes to an EC file, one may get exceptions when reading, or see odd output. A 
> sample exception is
> {noformat}
> hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | 
> head -1
> 18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote 
> block reader.
> java.io.IOException: Got error, status=ERROR, status message opReadBlock 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
> exception java.io.IOException:  Offset 0 and length 116161 don't match block 
> BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen 
> 110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for 
> file /user/spark/applicationHistory/application_1540333573846_0003, for pool 
> BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744)
>   at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264)
>   at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299)
>   at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
>   at 
> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
>   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:326)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)
> 

[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662774#comment-16662774
 ] 

Daniel Templeton commented on HDFS-14022:
-

HDFS-14015 patch 006 also failed the same way.

> Failing CTEST test_libhdfs
> --
>
> Key: HDFS-14022
> URL: https://issues.apache.org/jira/browse/HDFS-14022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Major
>
> Here is a list of the recurring failures that are seen. There seems to be a 
> problem with invoking build() in MiniDFSClusterBuilder; there are several 
> failures (two core dumps related to it) in the function
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> {
>jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); --->
> }
> Failed CTEST tests
> test_test_libhdfs_threaded_hdfs_static
>   test_test_libhdfs_zerocopy_hdfs_static
>   test_libhdfs_threaded_hdfspp_test_shim_static
>   test_hdfspp_mini_dfs_smoke_hdfspp_test_shim_static
>   libhdfs_mini_stress_valgrind_hdfspp_test_static
>   memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static
>   test_libhdfs_mini_stress_hdfspp_test_shim_static
>   test_hdfs_ext_hdfspp_test_shim_static
> 
> Details of the failures:
>  a) test_test_libhdfs_threaded_hdfs_static
> hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> hdfsOpenFile(/tlhData/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException)
> b) test_test_libhdfs_zerocopy_hdfs_static
> nmdCreate: Builder#build error:
> (unable to get root cause for java.lang.RuntimeException)
> (unable to get stack trace for java.lang.RuntimeException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c:253
>  (errno: 2): got NULL from cl
> Failure: 
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); ===> Failure 
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
>   "nmdCreate: Builder#build");
> goto error;
> }
> }
> c) test_libhdfs_threaded_hdfspp_test_shim_static
> hdfsOpenFile(/tlhData0002/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> d)
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0078c513, pid=16765, tid=0x7fc4449717c0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 
> 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [hdfs_ext_hdfspp_test_shim_static+0x38c513]
> #
> # Core dump written. Default location: 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/core
>  or core.16765
> #
> # An error report file with more information is saved as:
> # 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/hs_err_pid16765.log
> #
> # If you would like to submit a bug report, please visit:

[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662770#comment-16662770
 ] 

Daniel Templeton commented on HDFS-14015:
-

Whew.  Failed as expected.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662679#comment-16662679
 ] 

Daniel Templeton commented on HDFS-14015:
-

Just in case, I just posted patch 006, which only adds comments.  Let's see 
what Jenkins says.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-24 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.006.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch, 
> HDFS-14015.006.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662676#comment-16662676
 ] 

Daniel Templeton commented on HDFS-14022:
-

Doesn't look like HDFS-15856 changed anything.  I still see a ton of failures.  
I didn't compare against the previous runs, but it looked like all the same 
failures as before.

In HDFS-14015 patch 005, I changed some text in a {{printf()}} as an 
"innocuous" change.  Could the tests be somehow dependent on the output of that 
{{printf()}}?  Sounds dumb, but ya never know.  I'll post a patch that does 
even less, just to be sure.

> Failing CTEST test_libhdfs
> --
>
> Key: HDFS-14022
> URL: https://issues.apache.org/jira/browse/HDFS-14022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Major
>
> Here is a list of the recurring failures that are seen. There seems to be a 
> problem with invoking build() in MiniDFSClusterBuilder; there are several 
> failures (two core dumps related to it) in the function
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> {
>jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); --->
> }
> Failed CTEST tests
> test_test_libhdfs_threaded_hdfs_static
>   test_test_libhdfs_zerocopy_hdfs_static
>   test_libhdfs_threaded_hdfspp_test_shim_static
>   test_hdfspp_mini_dfs_smoke_hdfspp_test_shim_static
>   libhdfs_mini_stress_valgrind_hdfspp_test_static
>   memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static
>   test_libhdfs_mini_stress_hdfspp_test_shim_static
>   test_hdfs_ext_hdfspp_test_shim_static
> 
> Details of the failures:
>  a) test_test_libhdfs_threaded_hdfs_static
> hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> hdfsOpenFile(/tlhData/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException)
> b) test_test_libhdfs_zerocopy_hdfs_static
> nmdCreate: Builder#build error:
> (unable to get root cause for java.lang.RuntimeException)
> (unable to get stack trace for java.lang.RuntimeException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c:253
>  (errno: 2): got NULL from cl
> Failure: 
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); ===> Failure 
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
>   "nmdCreate: Builder#build");
> goto error;
> }
> }
> c) test_libhdfs_threaded_hdfspp_test_shim_static
> hdfsOpenFile(/tlhData0002/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> d)
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0078c513, pid=16765, tid=0x7fc4449717c0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 
> 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [hdfs_ext_hdfspp_test_shim_static+0x38c513]
> #
> # Core dump written. Default 

[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662671#comment-16662671
 ] 

Daniel Templeton commented on HDFS-14015:
-

Still a bunch of failures.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662535#comment-16662535
 ] 

Daniel Templeton commented on HDFS-14022:
-

Just FYI, I just retriggered the build on my placebo patch on HDFS-14015 now 
that HDFS-15856 is in.

> Failing CTEST test_libhdfs
> --
>
> Key: HDFS-14022
> URL: https://issues.apache.org/jira/browse/HDFS-14022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Major
>
> Here is a list of the recurring failures that are seen. There seems to be a 
> problem with invoking build() in MiniDFSClusterBuilder; there are several 
> failures (two core dumps related to it) in the function
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> {
>jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); --->
> }
> Failed CTEST tests
> test_test_libhdfs_threaded_hdfs_static
>   test_test_libhdfs_zerocopy_hdfs_static
>   test_libhdfs_threaded_hdfspp_test_shim_static
>   test_hdfspp_mini_dfs_smoke_hdfspp_test_shim_static
>   libhdfs_mini_stress_valgrind_hdfspp_test_static
>   memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static
>   test_libhdfs_mini_stress_hdfspp_test_shim_static
>   test_hdfs_ext_hdfspp_test_shim_static
> 
> Details of the failures:
>  a) test_test_libhdfs_threaded_hdfs_static
> hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> hdfsOpenFile(/tlhData/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException)
> b) test_test_libhdfs_zerocopy_hdfs_static
> nmdCreate: Builder#build error:
> (unable to get root cause for java.lang.RuntimeException)
> (unable to get stack trace for java.lang.RuntimeException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c:253
>  (errno: 2): got NULL from cl
> Failure: 
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); ===> Failure 
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
>   "nmdCreate: Builder#build");
> goto error;
> }
> }
> c) test_libhdfs_threaded_hdfspp_test_shim_static
> hdfsOpenFile(/tlhData0002/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> d)
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0078c513, pid=16765, tid=0x7fc4449717c0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 
> 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [hdfs_ext_hdfspp_test_shim_static+0x38c513]
> #
> # Core dump written. Default location: 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/core
>  or core.16765
> #
> # An error report file with more information is saved as:
> # 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/hs_err_pid16765.log
> #
> # If 

[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662522#comment-16662522
 ] 

Daniel Templeton commented on HDFS-14015:
-

I just retriggered the build now that HDFS-15856 is in.  Let's see what we get.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14022) Failing CTEST test_libhdfs

2018-10-24 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662515#comment-16662515
 ] 

Daniel Templeton commented on HDFS-14022:
-

I don't think so.

> Failing CTEST test_libhdfs
> --
>
> Key: HDFS-14022
> URL: https://issues.apache.org/jira/browse/HDFS-14022
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Priority: Major
>
> Here is a list of the recurring failures that are seen. There seems to be a 
> problem with invoking build() in MiniDFSClusterBuilder; there are several 
> failures (two core dumps related to it) in the function
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> {
>jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); --->
> }
> Failed CTEST tests
> test_test_libhdfs_threaded_hdfs_static
>   test_test_libhdfs_zerocopy_hdfs_static
>   test_libhdfs_threaded_hdfspp_test_shim_static
>   test_hdfspp_mini_dfs_smoke_hdfspp_test_shim_static
>   libhdfs_mini_stress_valgrind_hdfspp_test_static
>   memcheck_libhdfs_mini_stress_valgrind_hdfspp_test_static
>   test_libhdfs_mini_stress_hdfspp_test_shim_static
>   test_hdfs_ext_hdfspp_test_shim_static
> 
> Details of the failures:
>  a) test_test_libhdfs_threaded_hdfs_static
> hdfsOpenFile(/tlhData0001/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> hdfsOpenFile(/tlhData/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException)
> b) test_test_libhdfs_zerocopy_hdfs_static
> nmdCreate: Builder#build error:
> (unable to get root cause for java.lang.RuntimeException)
> (unable to get stack trace for java.lang.RuntimeException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c:253
>  (errno: 2): got NULL from cl
> Failure: 
> struct NativeMiniDfsCluster* nmdCreate(struct NativeMiniDfsConf *conf)
> jthr = invokeMethod(env, , INSTANCE, bld, MINIDFS_CLUSTER_BUILDER,
> "build", "()L" MINIDFS_CLUSTER ";"); ===> Failure 
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
>   "nmdCreate: Builder#build");
> goto error;
> }
> }
> c) test_libhdfs_threaded_hdfspp_test_shim_static
> hdfsOpenFile(/tlhData0002/file1): 
> FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
>  error:
> (unable to get root cause for java.io.FileNotFoundException) --->
> (unable to get stack trace for java.io.FileNotFoundException)
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:180
>  with NULL return return value (errno: 2): expected substring: File does not 
> exist
> TEST_ERROR: failed on 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c:336
>  with return code -1 (errno: 2): got nonzero from doTestHdfsOperations(ti, 
> fs, )
> d)
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0078c513, pid=16765, tid=0x7fc4449717c0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 
> 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [hdfs_ext_hdfspp_test_shim_static+0x38c513]
> #
> # Core dump written. Default location: 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/core
>  or core.16765
> #
> # An error report file with more information is saved as:
> # 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/hs_err_pid16765.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash 

[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-23 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661519#comment-16661519
 ] 

Daniel Templeton commented on HDFS-14015:
-

Thanks, [~jojochuang] and [~pranay_singh].  How shall we proceed here?  We can 
see that the build for patch 004 (the current patch) appears to be just as 
broken as the build for patch 005 (the placebo patch).  I'm a little nervous 
about committing patch 004 on faith, but I also don't want to make resolving HDFS-14022 a 
dependency for committing patch 004.  Thoughts?

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.
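For reference, a minimal sketch of the change the summary describes: checking, rather than discarding, the return value of DetachCurrentThread() in the thread-local-storage destructor. This is an illustration against the standard JNI API, not the attached patch; the real destructor in libhdfs may differ in signature and in how it reports the error.

{code}
#include <jni.h>
#include <stdio.h>

/* Sketch only: a TLS destructor that logs a failed detach instead of
 * ignoring it.  A detach failure matters because a dying native thread
 * that stays attached can keep holding JVM monitors. */
static void hdfsThreadDestructor(void *v)
{
  JNIEnv *env = (JNIEnv *) v;
  JavaVM *vm;
  jint ret;

  if ((*env)->GetJavaVM(env, &vm) != JNI_OK) {
    fprintf(stderr, "hdfsThreadDestructor: GetJavaVM failed\n");
    return;
  }
  ret = (*vm)->DetachCurrentThread(vm);
  if (ret != JNI_OK) {
    fprintf(stderr,
        "hdfsThreadDestructor: DetachCurrentThread failed with error %d\n",
        (int) ret);
  }
}
{code}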



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-23 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661055#comment-16661055
 ] 

Daniel Templeton edited comment on HDFS-14015 at 10/23/18 5:54 PM:
---

I don't see why the tests are failing, but they're failing consistently.  I 
just posted a new patch that doesn't actually change anything important; it 
fixes a typo in a string.  I want to see what happens with a provably innocuous 
patch.


was (Author: templedf):
I don't see why the tests are failing, but they're failing consistently.  I 
just posted a new patch that doesn't actually change anything important; it 
fixes a typo in a string.  I want to see what happens when with a provably 
innocuous patch.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-23 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661055#comment-16661055
 ] 

Daniel Templeton commented on HDFS-14015:
-

I don't see why the tests are failing, but they're failing consistently.  I 
just posted a new patch that doesn't actually change anything important; it 
fixes a typo in a string.  I want to see what happens when with a provably 
innocuous patch.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-23 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.005.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch, HDFS-14015.005.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659800#comment-16659800
 ] 

Daniel Templeton commented on HDFS-14015:
-

Isn't that the build for the 1st patch?  I'm surprised it compiled.  It didn't 
for me locally.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659763#comment-16659763
 ] 

Daniel Templeton commented on HDFS-14015:
-

Darn it.  Caught a mistake in my own review.  Updated patch 4 posted.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.004.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch, HDFS-14015.004.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659760#comment-16659760
 ] 

Daniel Templeton commented on HDFS-14015:
-

Good catch, [~jojochuang]!  I also lack a Windows box on which to test, but I 
have to assume the same laws of physics apply there as well.  Posted a new 
patch that applies the same changes to the Windows side.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.003.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch, 
> HDFS-14015.003.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659651#comment-16659651
 ] 

Daniel Templeton commented on HDFS-14015:
-

Whoops.  Posted the wrong patch before.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.002.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch, HDFS-14015.002.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Status: Patch Available  (was: Open)

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-14015:

Attachment: HDFS-14015.001.patch

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: HDFS-14015.001.patch
>
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659206#comment-16659206
 ] 

Daniel Templeton commented on HDFS-14015:
-

I will post the patch as soon as my local build completes.

> Improve error handling in hdfsThreadDestructor in native thread local storage
> -
>
> Key: HDFS-14015
> URL: https://issues.apache.org/jira/browse/HDFS-14015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
>
> In the hdfsThreadDestructor() function, we ignore the return value from the 
> DetachCurrentThread() call.  We are seeing cases where a native thread dies 
> while holding a JVM monitor, and it doesn't release the monitor.  We're 
> hoping that logging this error instead of ignoring it will shed some light on 
> the issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14015) Improve error handling in hdfsThreadDestructor in native thread local storage

2018-10-22 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-14015:
---

 Summary: Improve error handling in hdfsThreadDestructor in native 
thread local storage
 Key: HDFS-14015
 URL: https://issues.apache.org/jira/browse/HDFS-14015
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton


In the hdfsThreadDestructor() function, we ignore the return value from the 
DetachCurrentThread() call.  We are seeing cases where a native thread dies 
while holding a JVM monitor, and it doesn't release the monitor.  We're hoping 
that logging this error instead of ignoring it will shed some light on the 
issue.  In any case, it's good programming practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-09-12 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-13846:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, 
> HDFS-13846.003.patch, HDFS-13846.004.patch, HDFS-13846.005.patch
>
>
> In the BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented, the code does not check whether the block is 
> striped; it just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}
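To make the asymmetry concrete, here is a minimal sketch of the symmetric decrement the description calls for: reuse the same striped-aware threshold as the increment path. It assumes the BlockManagerSafeMode members shown in the snippets above and is an illustration, not necessarily the attached patch.

{code:java}
// Mirror the increment's threshold on the decrement path: striped blocks
// are safe at getRealDataBlockNum() live replicas, replicated blocks at
// safeReplication.
final int safe = storedBlock.isStriped() ?
    ((BlockInfoStriped) storedBlock).getRealDataBlockNum() :
    safeReplication;
// The block has just dropped below the safe threshold.
if (storedBlock.isComplete() &&
    blockManager.countNodes(b).liveReplicas() == safe - 1) {
  this.blockSafe--;
  assert blockSafe >= 0;
  checkSafeMode();
}
{code}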



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-09-12 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612555#comment-16612555
 ] 

Daniel Templeton commented on HDFS-13846:
-

Thanks, [~knanasi].  I was going to let those 3 characters slide. :)

Committed to trunk.

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, 
> HDFS-13846.003.patch, HDFS-13846.004.patch, HDFS-13846.005.patch
>
>
> In the BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented, the code does not check whether the block is 
> striped; it just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13913) LazyPersistFileScrubber.run() error handling is poor

2018-09-12 Thread Daniel Templeton (JIRA)
Daniel Templeton created HDFS-13913:
---

 Summary: LazyPersistFileScrubber.run() error handling is poor
 Key: HDFS-13913
 URL: https://issues.apache.org/jira/browse/HDFS-13913
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Assignee: Daniel Green


In {{LazyPersistFileScrubber.run()}} we have:

{code}
try {
  clearCorruptLazyPersistFiles();
} catch (Exception e) {
  FSNamesystem.LOG.error(
  "Ignoring exception in LazyPersistFileScrubber:", e);
}
{code}

First problem is that catching {{Exception}} is sloppy.  It should instead be a 
multicatch for the actual exceptions thrown or better a set of separate catch 
statements that react appropriately to the type of exception.

Second problem is that it's bad to log an ERROR that's not actionable and that 
can be safely ignored.  The log message should be logged at WARN or INFO level.

Third, the log message is useless.  If it's going to be a WARN or ERROR, a log 
message should be actionable.  Otherwise it's an info.  A log message should 
contain enough information for an admin to understand what it means.

In the end, I think the right thing here is to leave the high-level behavior 
unchanged: log a message and ignore the error, hoping that the next run will go 
better.
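A minimal sketch of what that could look like while keeping the ignore-and-retry behavior; IOException is assumed here purely for illustration, and the catch list should follow whatever clearCorruptLazyPersistFiles() actually declares:

{code:java}
try {
  clearCorruptLazyPersistFiles();
} catch (IOException e) {
  // Assumed for illustration: catch the concrete exception types (or a
  // multicatch of them) rather than bare Exception.  WARN, not ERROR:
  // the failure is tolerated, and the scrubber simply tries again on
  // its next pass.
  FSNamesystem.LOG.warn("Failed to clear corrupt lazyPersist files; "
      + "will retry on the next scrubber pass.", e);
}
{code}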



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-09-12 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612417#comment-16612417
 ] 

Daniel Templeton commented on HDFS-13846:
-

OK, +1 from me.  I'll commit later today.

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, 
> HDFS-13846.003.patch, HDFS-13846.004.patch
>
>
> In the BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented, the code does not check whether the block is 
> striped; it just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-09-11 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611338#comment-16611338
 ] 

Daniel Templeton commented on HDFS-13846:
-

That sounds good to me.  Hmmm...  I'm wondering why there hasn't been a Jenkins 
run.  Lemme go kick it.

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, 
> HDFS-13846.003.patch, HDFS-13846.004.patch
>
>
> In the BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented, the code does not check whether the block is 
> striped; it just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-08-29 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596832#comment-16596832
 ] 

Daniel Templeton commented on HDFS-13846:
-

I see.  That makes sense.  Might be nice to add a comment to explain that so 
that someone doesn't "fix" it later by making the conditional test {{<=}}.  
Aside from that, LGTM.  Did you look at the deprecation warning that popped up? 
 The Jenkins build is gone now, so I can't verify whether it was related to 
code you added.

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch, HDFS-13846.002.patch, 
> HDFS-13846.003.patch
>
>
> In the BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented, the code does not check whether the block is 
> striped; it just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


