[jira] [Assigned] (HDFS-14719) Correct the safemode threshold value in BlockManagerSafeMode

2019-08-11 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14719:


Assignee: hemanthboyina

> Correct the safemode threshold value in BlockManagerSafeMode
> 
>
> Key: HDFS-14719
> URL: https://issues.apache.org/jira/browse/HDFS-14719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
>
> BlockManagerSafeMode parses the safemode threshold incorrectly. It stores a 
> float value in a double, which can sometimes give a different result. If we 
> store the value "0.999f" in a double, it is converted to 
> "0.9990000128746033".
> {code:java}
> this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY,
> DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT);{code}
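The widening can be seen in isolation (a standalone sketch, not the HDFS code):
{code:java}
public class FloatWideningDemo {
  public static void main(String[] args) {
    float f = 0.999f;   // what conf.getFloat(...) returns
    double d = f;       // implicit widening, as when assigning to a double field
    System.out.println(d);          // 0.9990000128746033, not 0.999
    System.out.println(d == 0.999); // false
  }
}
{code}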






[jira] [Assigned] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-08-12 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14720:


Assignee: hemanthboyina

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs to 
> was deleted from the NameNode and the DN got the command after the file's 
> deletion. In this case the command should be ignored.
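A guard along these lines would express that (a sketch only; the surrounding method, variable, and logger are assumptions, not the actual DataNode code):
{code:java}
// A NameNode-recorded length of Long.MAX_VALUE marks a block whose file was
// already deleted; skip the bad-block report instead of flagging the replica.
if (block.getNumBytes() == Long.MAX_VALUE) {
  LOG.info("Ignoring command for block " + block
      + " because the recorded length is Long.MAX_VALUE (file deleted)");
  return;
}
{code}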






[jira] [Comment Edited] (HDFS-13270) RBF: Router audit logger

2019-08-11 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904824#comment-16904824
 ] 

hemanthboyina edited comment on HDFS-13270 at 8/12/19 3:40 AM:
---

thanks for the comment [~xuzq_zander] 
 the updated patch HDFS-13270.002 includes the changes done in -HDFS-14685-. 

       _we should support one configuration to close audit log_

yes, we can do that: when a user no longer requires the audit log, we can add a 
configuration to disable it
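For example, the router could check a boolean switch before emitting audit entries. A minimal sketch, assuming a hypothetical property name and surrounding fields (conf, auditLog) that are not in the patch:
{code:java}
// Hypothetical switch; defaulting to true keeps the current behaviour.
private final boolean auditEnabled =
    conf.getBoolean("dfs.federation.router.audit.log.enabled", true);

private void logAuditEvent(String cmd, String src) {
  if (!auditEnabled) {
    return; // audit logging closed by configuration
  }
  auditLog.info("cmd=" + cmd + " src=" + src);
}
{code}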

 

 


was (Author: hemanthboyina):
thanks for the comment [~xuzq_zander] 
this patch [link 
title|https://issues.apache.org/jira/secure/attachment/12977228/HDFS-13270.002.patch]
 includes the changes done in -HDFS-14685-. 

       _we should support one configuration to close audit log_

yes, we can do that: when a user no longer requires the audit log, we can add a 
configuration to disable it

 

 

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch, HDFS-13270.002.patch
>
>
> We can use a router audit logger to log the client info and cmd, because 
> FSNamesystem#AuditLogger's log thinks the clients are all from the router.






[jira] [Commented] (HDFS-14567) If kms-acls is failed to load, and it will never be reload

2019-08-12 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904977#comment-16904977
 ] 

hemanthboyina commented on HDFS-14567:
--

thanks for the comment [~jojochuang] 

 

                  _What I suggested will make sure it always reads a complete 
kms-acls.xml file. That is, use the local file system semantics to ensure it is 
written atomically, so you shouldn't see an exception loading the ACLs._

we can do it this way, but actually there is a flaw in the code 
{code:java}
lastReload = System.currentTimeMillis();
Configuration conf = KMSConfiguration.getACLsConf();
// triggering the resource loading.
conf.get(Type.CREATE.getAclConfigKey());{code}
we change the lastReload time before getting the conf, so if by any chance 
conf.get() throws an error, there will be a problem.
In our scenario it happened.
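A possible reordering that avoids this (a sketch of the idea, not necessarily the attached patch):
{code:java}
private Configuration loadACLsFromFile() {
  LOG.debug("Loading ACLs file");
  Configuration conf = KMSConfiguration.getACLsConf();
  // Trigger the resource loading first; if this throws, lastReload stays
  // unchanged and isACLsFileNewer() will report the file as new on the
  // next check, so the reload is retried.
  conf.get(Type.CREATE.getAclConfigKey());
  lastReload = System.currentTimeMillis();
  return conf;
}
{code}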

>  If kms-acls is failed to load, and it will never be reload
> ---
>
> Key: HDFS-14567
> URL: https://issues.apache.org/jira/browse/HDFS-14567
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14567.patch
>
>
> Scenario: through an automation tool we are generating kms-acls; though the 
> generation of kms-acls is not yet completed, the system will detect a 
> modification of kms-acls and will try to load it.
> Before getting the configuration we modify the last reload time, code 
> shown below
> {code:java}
> private Configuration loadACLsFromFile() {
> LOG.debug("Loading ACLs file");
> lastReload = System.currentTimeMillis();
> Configuration conf = KMSConfiguration.getACLsConf();
> // triggering the resource loading.
> conf.get(Type.CREATE.getAclConfigKey());
> return conf;
> }{code}
> if the kms-acls file is written within the next 100ms, the changes will not be 
> loaded, as the condition "newer = f.lastModified() - time > 100" is never met 
> because we modified the last reload time before getting the configuration
> {code:java}
> public static boolean isACLsFileNewer(long time) {
> boolean newer = false;
> String confDir = System.getProperty(KMS_CONFIG_DIR);
> if (confDir != null) {
> Path confPath = new Path(confDir);
> if (!confPath.isUriPathAbsolute()) {
> throw new RuntimeException("System property '" + KMS_CONFIG_DIR +
> "' must be an absolute path: " + confDir);
> }
> File f = new File(confDir, KMS_ACLS_XML);
> LOG.trace("Checking file {}, modification time is {}, last reload time is"
> + " {}", f.getPath(), f.lastModified(), time);
> // at least 100ms newer than time, we do this to ensure the file
> // has been properly closed/flushed
> newer = f.lastModified() - time > 100;
> }
> return newer;
> } {code}
>  






[jira] [Commented] (HDFS-13270) RBF: Router audit logger

2019-08-11 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904824#comment-16904824
 ] 

hemanthboyina commented on HDFS-13270:
--

thanks for the comment [~xuzq_zander] 
this patch [link 
title|https://issues.apache.org/jira/secure/attachment/12977228/HDFS-13270.002.patch]
 includes the changes done in -HDFS-14685-. 

       _we should support one configuration to close audit log_

yes, we can do that: when a user no longer requires the audit log, we can add a 
configuration to disable it

 

 

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch, HDFS-13270.002.patch
>
>
> We can use a router audit logger to log the client info and cmd, because 
> FSNamesystem#AuditLogger's log thinks the clients are all from the router.






[jira] [Commented] (HDFS-14625) Make DefaultAuditLogger class in FSnamesystem to Abstract

2019-08-11 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904825#comment-16904825
 ] 

hemanthboyina commented on HDFS-14625:
--

updated the patch,
please check [~jojochuang] 

> Make DefaultAuditLogger class in FSnamesystem to Abstract 
> --
>
> Key: HDFS-14625
> URL: https://issues.apache.org/jira/browse/HDFS-14625
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14625 (1).patch, HDFS-14625(2).patch, 
> HDFS-14625.003.patch, HDFS-14625.004.patch, HDFS-14625.patch
>
>
> As per +HDFS-13270+ (Audit logger for Router), we can make the DefaultAuditLogger 
> in FSNamesystem abstract and common






[jira] [Commented] (HDFS-13019) dfs put with -f to dir with existing file in dest should return 0, not -1

2019-08-18 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909902#comment-16909902
 ] 

hemanthboyina commented on HDFS-13019:
--

[~bharatviswa] are you working on this ?

> dfs put with -f to dir with existing file in dest should return 0, not -1
> -
>
> Key: HDFS-13019
> URL: https://issues.apache.org/jira/browse/HDFS-13019
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3
>Reporter: BRYAN T VOLD
>Assignee: Bharat Viswanadham
>Priority: Major
>
> When doing an hdfs dfs -put   and there are existing 
> files, the return code will be -1, which is expected.  
> When you do an hdfs dfs -put -f   (force), the error code 
> still comes back as -1, which is unexpected.  
> If you use hdfs dfs -copyFromLocal with the same directories as above, 
> -copyFromLocal still gives the error, which is expected, and when you pass -f 
> to this version of the command, the error code is 0, which I think is the 
> correct behavior; hdfs dfs -put should match this.  






[jira] [Commented] (HDFS-12831) HDFS throws FileNotFoundException on getFileBlockLocations(path-to-directory)

2019-08-18 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909903#comment-16909903
 ] 

hemanthboyina commented on HDFS-12831:
--

it would be better if we throw the exception as PathIsDirectoryException.
It needs to be changed in INodeFile.java.
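A minimal sketch of the idea (the surrounding resolution code and variable names are assumptions, not the actual INodeFile change):
{code:java}
// If the resolved inode is a directory, surface a descriptive error
// instead of FileNotFoundException.
if (inode.isDirectory()) {
  throw new org.apache.hadoop.fs.PathIsDirectoryException(src);
}
{code}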

> HDFS throws FileNotFoundException on getFileBlockLocations(path-to-directory)
> -
>
> Key: HDFS-12831
> URL: https://issues.apache.org/jira/browse/HDFS-12831
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Hanisha Koneru
>Priority: Major
>
> The HDFS implementation of {{getFileBlockLocations(path, offset, len)}} 
> throws an exception if the path references a directory. 
> The base implementation (and all other filesystems) just returns an empty 
> array, something implemented in {{getFileBlockLocations(filestatus, offset, 
> len)}}; something written up in filesystem.md as the correct behaviour. 
> # has been shown to break things: SPARK-14959
> # there's no contract tests for these APIs; shows up in HADOOP-15044. 
> # even if this is considered a wontfix, it should raise something like 
> {{PathIsDirectoryException}} rather than FNFE






[jira] [Commented] (HDFS-11911) SnapshotDiff should maintain the order of file/dir creation and deletion

2019-08-18 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909901#comment-16909901
 ] 

hemanthboyina commented on HDFS-11911:
--

[~manojg] are you working on this ? 

> SnapshotDiff should maintain the order of file/dir creation and deletion
> 
>
> Key: HDFS-11911
> URL: https://issues.apache.org/jira/browse/HDFS-11911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>
> {{DirectoryWithSnapshotFeature}} maintains a separate list for CREATED and 
> DELETED children but the ordering of these creation and deletion events are 
> not maintained. Assume a case like below, where the time is growing 
> downwards...
> {noformat}
> |
> +  CREATE File-1
> |
> + Snap S1 created
> |
> + DELETE File-1
> |
> + Snap S2 created
> |
> + CREATE File-1
> |
> + Snap S3 created
> |
> |
> V
> {noformat} 
> The snapshot diff report, which takes in the DirectoryWithSnapshotFeature diff 
> entries, just prints all the creations first and then the deletions, 
> thereby giving the perception that file-1 got created first and then got 
> deleted. But after S3, file-1 is still available. 
> {noformat}
> The difference between snapshot S1 and snapshot S3 under the directory /:
> M .
> + ./file-1
> - ./file-1
> {noformat}
> Can we have DirectoryWithSnapshotFeature maintain the diff entries ordered by 
> time or sequence? 






[jira] [Issue Comment Deleted] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-15 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14739:
-
Comment: was deleted

(was: hi [~xuzq_zander]

        _the owner of  */mnt/test1* should be *mnt_test1* instead of *test1* in 
result._
can you explain why it should be /mnt/test1 ?)

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-15 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908013#comment-16908013
 ] 

hemanthboyina commented on HDFS-14739:
--

hi [~xuzq_zander]

        _the owner of  */mnt/test1* should be *mnt_test1* instead of *test1* in 
result._
can you explain why it should be /mnt/test1 ?

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-15 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908014#comment-16908014
 ] 

hemanthboyina commented on HDFS-14739:
--

hi [~xuzq_zander]

        _the owner of  */mnt/test1* should be *mnt_test1* instead of *test1* in 
result._
can you explain why the owner should be mnt_test1 instead of test1 ?

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-15 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908707#comment-16908707
 ] 

hemanthboyina commented on HDFS-14739:
--

during mount point creation, we get the owner from 
{code:java}
UserGroupInformation ugi = NameNode.getRemoteUser();
record.setOwnerName(ugi.getShortUserName());{code}
with these details the mount point will be created. 
From the code, the owner information looks correct;
correct me if I'm wrong

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Updated] (HDFS-14625) Make DefaultAuditLogger class in FSnamesystem to Abstract

2019-08-10 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14625:
-
Attachment: HDFS-14625.004.patch

> Make DefaultAuditLogger class in FSnamesystem to Abstract 
> --
>
> Key: HDFS-14625
> URL: https://issues.apache.org/jira/browse/HDFS-14625
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14625 (1).patch, HDFS-14625(2).patch, 
> HDFS-14625.003.patch, HDFS-14625.004.patch, HDFS-14625.patch
>
>
> As per +HDFS-13270+ (Audit logger for Router), we can make the DefaultAuditLogger 
> in FSNamesystem abstract and common






[jira] [Updated] (HDFS-13270) RBF: Router audit logger

2019-08-10 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-13270:
-
Attachment: HDFS-13270.002.patch

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch, HDFS-13270.002.patch
>
>
> We can use a router audit logger to log the client info and cmd, because 
> FSNamesystem#AuditLogger's log thinks the clients are all from the router.






[jira] [Commented] (HDFS-13123) RBF: Add a balancer tool to move data across subcluster

2019-08-10 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904410#comment-16904410
 ] 

hemanthboyina commented on HDFS-13123:
--

[~elgoiri] [~jojochuang] attached the initial patch, can you review it?  

> RBF: Add a balancer tool to move data across subcluster 
> 
>
> Key: HDFS-13123
> URL: https://issues.apache.org/jira/browse/HDFS-13123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS Router-Based Federation Rebalancer.pdf, 
> HDFS-13123.patch
>
>
> Follow the discussion in HDFS-12615. This Jira is to track effort for 
> building a rebalancer tool, used by router-based federation to move data 
> among subclusters.






[jira] [Commented] (HDFS-14615) NPE writing edit logs

2019-08-10 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904417#comment-16904417
 ] 

hemanthboyina commented on HDFS-14615:
--

I think the group name was null here, causing the NPE during the MKDIR operation.
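The NPE can be reproduced in isolation when the group name is null (a standalone sketch using hadoop-common's Text):
{code:java}
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;

public class NullGroupDemo {
  public static void main(String[] args) throws Exception {
    DataOutputBuffer out = new DataOutputBuffer();
    // PermissionStatus.write() ends up in Text.writeString(out, groupname);
    // with a null group name, Text.encode(null) throws NullPointerException,
    // matching the stack trace in the issue.
    Text.writeString(out, null);
  }
}
{code}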

> NPE writing edit logs
> -
>
> Key: HDFS-14615
> URL: https://issues.apache.org/jira/browse/HDFS-14615
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
>
> Hit a weird bug where writing mkdir op to edit log throws an NPE and NameNode 
> crashed
> {noformat}
> 2019-06-26 10:57:27,398 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: write op failed for 
> (journal 
> JournalAndStream(mgr=FileJournalManager(root=/ssd/work/src/upstream/impala/testdata/cluster/cdh6/node-1/data/dfs/nn),
>  
> stream=EditLogFileOutputStream(/ssd/work/src/upstream/impala/testdata/cluster/cdh6/node-1/data/dfs/nn/current/edits_inprogress_0598588)))
> java.lang.NullPointerException
> at org.apache.hadoop.io.Text.encode(Text.java:451)
> at org.apache.hadoop.io.Text.encode(Text.java:431)
> at org.apache.hadoop.io.Text.writeString(Text.java:491)
> at 
> org.apache.hadoop.fs.permission.PermissionStatus.write(PermissionStatus.java:104)
> at 
> org.apache.hadoop.fs.permission.PermissionStatus.write(PermissionStatus.java:84)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$MkdirOp.writeFields(FSEditLogOp.java:1654)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$Writer.writeOp(FSEditLogOp.java:4866)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer$TxnBuffer.writeOp(EditsDoubleBuffer.java:157)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.writeOp(EditsDoubleBuffer.java:60)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.write(EditLogFileOutputStream.java:97)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$1.apply(JournalSet.java:444)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.write(JournalSet.java:440)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:481)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:288)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:232)
> {noformat}
> The stacktrace is similar to SENTRY-555, which is thought to be a Sentry bug 
> (authorization provider), but this cluster doesn't have Sentry and therefore 
> could be a genuine HDFS bug.
> File this jira to keep a record.






[jira] [Commented] (HDFS-14174) Enhance Audit for chown ( internally setOwner)

2019-08-10 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904411#comment-16904411
 ] 

hemanthboyina commented on HDFS-14174:
--

currently we audit only the new owner information; 
should we capture both the existing owner and the new owner? [~xkrogen] [~jojochuang] 

> Enhance Audit for chown ( internally setOwner)   
> -
>
> Key: HDFS-14174
> URL: https://issues.apache.org/jira/browse/HDFS-14174
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Sailesh Patel
>Assignee: hemanthboyina
>Priority: Minor
>
> When an hdfs dfs -chown command is executed, the audit log does not capture 
> the existing owner and the new owner.    
> We need to capture the old and new owner for auditing to be effective
>  






[jira] [Commented] (HDFS-13123) RBF: Add a balancer tool to move data across subcluster

2019-08-12 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905797#comment-16905797
 ] 

hemanthboyina commented on HDFS-13123:
--

thanks for the comment [~crh]

           _How is atomicity in distcp taken into account here? If distcp 
fails, destination cluster may have unused files lying around unaudited. May be 
user can specify atomicity flag through admin._

  if the distcp fails, we will delete the files copied to the destination 
cluster. +We can use the atomicity flag to better effect+ 

         _How are multiple rebalancings going to work if executed? Should admin 
maintain a state of what all rebalancing is in progress and what all completed. 
Some basic auditing at least._

 Yes, the admin should maintain which rebalancing operations are in progress, and 
for a given mount point we only allow one concurrent rebalancing operation.

       _Rebalancing across secured clusters?_

As we are using distcp, distcp should take care of it in a secure cluster 

> RBF: Add a balancer tool to move data across subcluster 
> 
>
> Key: HDFS-13123
> URL: https://issues.apache.org/jira/browse/HDFS-13123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS Router-Based Federation Rebalancer.pdf, 
> HDFS-13123.patch
>
>
> Follow the discussion in HDFS-12615. This Jira is to track effort for 
> building a rebalancer tool, used by router-based federation to move data 
> among subclusters.






[jira] [Updated] (HDFS-14719) Correct the safemode threshold value in BlockManagerSafeMode

2019-08-13 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14719:
-
Attachment: HDFS-14719.002.patch

> Correct the safemode threshold value in BlockManagerSafeMode
> 
>
> Key: HDFS-14719
> URL: https://issues.apache.org/jira/browse/HDFS-14719
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14719.002.patch, HDFS-14719.patch
>
>
> BlockManagerSafeMode parses the safemode threshold incorrectly. It stores a 
> float value in a double, which can sometimes give a different result. If we 
> store the value "0.999f" in a double, it is converted to 
> "0.9990000128746033".
> {code:java}
> this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY,
> DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT);{code}






[jira] [Updated] (HDFS-14567) If kms-acls is failed to load, and it will never be reload

2019-08-19 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14567:
-
Attachment: HDFS-14567.001.patch

>  If kms-acls is failed to load, and it will never be reload
> ---
>
> Key: HDFS-14567
> URL: https://issues.apache.org/jira/browse/HDFS-14567
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14567.001.patch, HDFS-14567.patch
>
>
> Scenario: through an automation tool we are generating kms-acls; though the 
> generation of kms-acls is not yet completed, the system will detect a 
> modification of kms-acls and will try to load it.
> Before getting the configuration we modify the last reload time, code 
> shown below
> {code:java}
> private Configuration loadACLsFromFile() {
> LOG.debug("Loading ACLs file");
> lastReload = System.currentTimeMillis();
> Configuration conf = KMSConfiguration.getACLsConf();
> // triggering the resource loading.
> conf.get(Type.CREATE.getAclConfigKey());
> return conf;
> }{code}
> if the kms-acls file is written within the next 100ms, the changes will not be 
> loaded, as the condition "newer = f.lastModified() - time > 100" is never met 
> because we modified the last reload time before getting the configuration
> {code:java}
> public static boolean isACLsFileNewer(long time) {
> boolean newer = false;
> String confDir = System.getProperty(KMS_CONFIG_DIR);
> if (confDir != null) {
> Path confPath = new Path(confDir);
> if (!confPath.isUriPathAbsolute()) {
> throw new RuntimeException("System property '" + KMS_CONFIG_DIR +
> "' must be an absolute path: " + confDir);
> }
> File f = new File(confDir, KMS_ACLS_XML);
> LOG.trace("Checking file {}, modification time is {}, last reload time is"
> + " {}", f.getPath(), f.lastModified(), time);
> // at least 100ms newer than time, we do this to ensure the file
> // has been properly closed/flushed
> newer = f.lastModified() - time > 100;
> }
> return newer;
> } {code}
>  






[jira] [Commented] (HDFS-14174) Enhance Audit for chown ( internally setOwner)

2019-08-19 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910486#comment-16910486
 ] 

hemanthboyina commented on HDFS-14174:
--

when a user operation is done, for a major change (from the user's perspective) like 
setting the owner,
it needs to be audited what the owner was before and what it is now.
It is required just for traceability.

> Enhance Audit for chown ( internally setOwner)   
> -
>
> Key: HDFS-14174
> URL: https://issues.apache.org/jira/browse/HDFS-14174
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Sailesh Patel
>Assignee: hemanthboyina
>Priority: Minor
>
> When an hdfs dfs -chown command is executed, the audit log does not capture 
> the existing owner and the new owner.    
> We need to capture the old and new owner for auditing to be effective
>  






[jira] [Assigned] (HDFS-14501) BenchmarkThroughput.writeFile hangs with misconfigured BUFFER_SIZE

2019-08-19 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14501:


Assignee: hemanthboyina

> BenchmarkThroughput.writeFile hangs with misconfigured BUFFER_SIZE
> --
>
> Key: HDFS-14501
> URL: https://issues.apache.org/jira/browse/HDFS-14501
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: John Doe
>Assignee: hemanthboyina
>Priority: Major
>
> When the configuration file is corrupted, reading BUFFER_SIZE from the corrupted 
> conf can return 0.
>  The "for" loop in the BenchmarkThroughput.writeLocalFile function then hangs 
> endlessly.
>  Here is the code snippet.
> {code:java}
>   BUFFER_SIZE = conf.getInt("dfsthroughput.buffer.size", 4 * 1024);
>   private Path writeFile(FileSystem fs,
> String name,
> Configuration conf,
> long total
> ) throws IOException {
> Path f = dir.getLocalPathForWrite(name, total, conf);
> System.out.print("Writing " + name);
> resetMeasurements();
> OutputStream out = fs.create(f);
> byte[] data = new byte[BUFFER_SIZE];
> for(long size = 0; size < total; size += BUFFER_SIZE) { //Bug!
>   out.write(data);
> }
> out.close();
> printMeasurements();
> return f;
>   }
> {code}
> This configuration error also affects HDFS-13513, HDFS-13514, 
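One possible guard (a sketch, not the committed fix):
{code:java}
BUFFER_SIZE = conf.getInt("dfsthroughput.buffer.size", 4 * 1024);
if (BUFFER_SIZE <= 0) {
  // A corrupted conf can return 0; the loop "size += BUFFER_SIZE" would then
  // never advance and spin forever, so fail fast instead.
  throw new IllegalArgumentException(
      "dfsthroughput.buffer.size must be positive, got: " + BUFFER_SIZE);
}
{code}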






[jira] [Created] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-08-20 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14754:


 Summary: Erasure Coding :  The number of Under-Replicated Blocks 
never reduced
 Key: HDFS-14754
 URL: https://issues.apache.org/jira/browse/HDFS-14754
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina


Using EC RS-3-2, 6 DN 

We came across a scenario where, among the 5 EC blocks, the same block was 
replicated thrice and two blocks went missing.
The replicated block was not getting deleted and the missing blocks could not 
be reconstructed.






[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14754:
-
Attachment: HDFS-14754.001.patch

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, among the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not getting deleted and the missing blocks could not 
> be reconstructed.






[jira] [Commented] (HDFS-13879) FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and getSnapshottableDirListing

2019-08-17 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909786#comment-16909786
 ] 

hemanthboyina commented on HDFS-13879:
--

getSnapshottableDirListing returns SnapshottableDirectoryStatus.
SnapshottableDirectoryStatus.java lives in the client module, and there is no 
dependency from common to the client.
SnapshottableDirectoryStatus internally uses DFSUtilClient, so we can't move it 
to common.
any suggestions [~jojochuang] [~smeng]?

> FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and 
> getSnapshottableDirListing
> -
>
> Key: HDFS-13879
> URL: https://issues.apache.org/jira/browse/HDFS-13879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Siyao Meng
>Assignee: hemanthboyina
>Priority: Major
>
> I wonder whether we should add allowSnapshot() and disallowSnapshot() to 
> FileSystem abstract class.
> I think we should because createSnapshot(), renameSnapshot() and 
> deleteSnapshot() are already part of it.
> Any reason why we don't want to do this?
> Thanks!






[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-16 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908854#comment-16908854
 ] 

hemanthboyina commented on HDFS-14739:
--

    _then it will be overwrite_

It will overwrite only if you explicitly specify the group during mount point 
creation, so that's not our scenario.

For listing mount points the command is dfsrouteradmin -ls; 
what you have been checking is the listing of files in /mnt 

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
> Attachments: image-2019-08-16-16-27-08-003.png, 
> image-2019-08-16-16-28-15-022.png
>
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-08-16 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908883#comment-16908883
 ] 

hemanthboyina commented on HDFS-14739:
--

as far as I know the two are unrelated: 
the mount point you have created (/mnt/test1) with owner (_mnt_test1_), and the ls 
of files in /mnt 


[~elgoiri] is this issue a valid one?

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Priority: Major
> Attachments: image-2019-08-16-17-15-50-614.png, 
> image-2019-08-16-17-16-00-863.png, image-2019-08-16-17-16-34-325.png
>
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* should be *mnt_test1* 
> instead of *test1* in the result.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing IOException when dfs.federation.router.default.nameservice.enable is 
> false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  






[jira] [Updated] (HDFS-13270) RBF: Router audit logger

2019-08-17 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-13270:
-
Attachment: HDFS-13270.003.patch

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch, HDFS-13270.002.patch, 
> HDFS-13270.003.patch
>
>
> We can use a router audit logger to log the client info and cmd, because 
> FSNamesystem#AuditLogger's log thinks the clients are all from the router.






[jira] [Commented] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2019-08-17 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909820#comment-16909820
 ] 

hemanthboyina commented on HDFS-14630:
--

uploaded the patch, please check [~surendrasingh]

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] issue 
> we configured "dfs.client.datanode-restart.timeout" without a time unit. Now the 
> log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> No need to log this; just describe the behavior in the property description.






[jira] [Updated] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-08-17 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14720:
-
Attachment: HDFS-14720.001.patch

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs to 
> was deleted from the NameNode and the DN got the command after the file's 
> deletion. In this case the command should be ignored.






[jira] [Commented] (HDFS-14574) [distcp] Add ability to increase the replication factor for fileList.seq

2019-08-17 Thread hemanthboyina (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909821#comment-16909821
 ] 

hemanthboyina commented on HDFS-14574:
--

[~jojochuang], in DistCp we have the preserve status options (rbugpc..).
if we add an option for replication, that replication factor will override the 
preserved one. any suggestions about this?

> [distcp] Add ability to increase the replication factor for fileList.seq
> 
>
> Key: HDFS-14574
> URL: https://issues.apache.org/jira/browse/HDFS-14574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Wei-Chiu Chuang
>Assignee: hemanthboyina
>Priority: Major
>
> distcp creates fileList.seq with default replication factor = 3.
> For large clusters running a distcp job with thousands of mappers, that 
> 3-replica setting for the file listing file is not good enough, because DataNodes 
> easily run out of the max number of xceivers.
>  
> It looks like we can pass in a distcp option and update the replication factor 
> when creating the sequence file writer: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L517-L521]
>  
> Like this:
> {code:java}
> return SequenceFile.createWriter(getConf(),
> SequenceFile.Writer.file(pathToListFile),
> SequenceFile.Writer.keyClass(Text.class),
> SequenceFile.Writer.valueClass(CopyListingFileStatus.class),
> SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE),
> SequenceFile.Writer.replication((short)100)); <-- this line
> {code}






[jira] [Updated] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-08-17 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14720:
-
Attachment: HDFS-14720.001.patch
Status: Patch Available  (was: Open)

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs to 
> was deleted from the NameNode and the DN got the command after the file's 
> deletion. In this case the command should be ignored.






[jira] [Updated] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-08-17 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14720:
-
Attachment: (was: HDFS-14720.001.patch)

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs to 
> was deleted from the NameNode and the DN got the command after the file's 
> deletion. In this case the command should be ignored.






[jira] [Updated] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14358:
-
Attachment: HDFS-14358.005.patch

> Provide LiveNode and DeadNode filter in DataNode UI
> ---
>
> Key: HDFS-14358
> URL: https://issues.apache.org/jira/browse/HDFS-14358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ravuri Sushma sree
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14358 (4).patch, HDFS-14358(2).patch, 
> HDFS-14358(3).patch, HDFS-14358.005.patch, HDFS14358.JPG, hdfs-14358.patch
>
>







[jira] [Updated] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14358:
-
Attachment: (was: HDFS-14358.005.patch)

> Provide LiveNode and DeadNode filter in DataNode UI
> ---
>
> Key: HDFS-14358
> URL: https://issues.apache.org/jira/browse/HDFS-14358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ravuri Sushma sree
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14358 (4).patch, HDFS-14358(2).patch, 
> HDFS-14358(3).patch, HDFS-14358.005.patch, HDFS14358.JPG, hdfs-14358.patch
>
>







[jira] [Commented] (HDFS-13123) RBF: Add a balancer tool to move data across subcluster

2019-08-20 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911953#comment-16911953
 ] 

hemanthboyina commented on HDFS-13123:
--

hi [~jojochuang] , thanks for the review

        _You should make sure both directories on the source and destination 
are snapshottable before running this tool._

we make the source and destination snapshottable before doing any operation
{code:java}
srcFs.allowSnapshot(srcmountPath);
destFs.allowSnapshot(destFolderPath);{code}
      _Probably not a good idea to hard code the snapshot name as "s1" and 
"s2". Use randomly generated name instead._

will change this 

     _I don't understand why you create two snapshots in the source cluster 
almost immediately. If you do so, you only update the files added/deleted 
during the two snapshots_.

According to the design in the document: "Do a first copy first, then put the 
lock, and do a second copy to capture any new changes there." So we created a 
snapshot and ran distcp; the data is now in the destination. If any files were 
added/deleted during this time, we take a second snapshot and do the diff 

    _The state of "s1" snapshot on the source should be exactly the same as the 
state of "s1" snapshot on the destination. You'll hit various strange issues if 
the destination is not a mirror of source._

After using distcp, we check its return code. If the copy was not successful, we 
delete the copied folder in the destination, revert the changes we have 
made (allowSnapshot, createSnapshot) and return (not yet updated in the code, 
should modify it) 

     _make sure you delete the snapshots even if the prior steps hit errors._ 

have done that; please check this part of the code.

 
{code:java}
 if (exitCode != 0) {
   srcFs.deleteSnapshot(srcmountPath, "s1");
   srcFs.disallowSnapshot(srcmountPath);
 }

 if (distcpUpdateExitCode != 0) {
   srcFs.deleteSnapshot(srcmountPath, "s1");
   srcFs.deleteSnapshot(srcmountPath, "s2");
   srcFs.disallowSnapshot(srcmountPath);
   destFs.deleteSnapshot(destFolderPath, "s1");
   destFs.disallowSnapshot(destFolderPath);
   destFs.delete(new Path(destpath), true);
 }{code}
 

 

 

> RBF: Add a balancer tool to move data across subcluster 
> 
>
> Key: HDFS-13123
> URL: https://issues.apache.org/jira/browse/HDFS-13123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS Router-Based Federation Rebalancer.pdf, 
> HDFS-13123.patch
>
>
> Follow the discussion in HDFS-12615. This Jira is to track effort for 
> building a rebalancer tool, used by router-based federation to move data 
> among subclusters.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14758) Decrease lease hard limit

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14758:


Assignee: hemanthboyina

> Decrease lease hard limit
> -
>
> Key: HDFS-14758
> URL: https://issues.apache.org/jira/browse/HDFS-14758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: hemanthboyina
>Priority: Minor
>
> The hard limit is currently hard-coded to be 1 hour. This also determines the 
> NN automatic lease recovery interval. Something like 20 min will make more 
> sense.
> After the 5 min soft limit, other clients can recover the lease. If no one 
> else takes the lease away, the original client still can renew the lease 
> within the hard limit. So even after a NN full GC of 8 minutes, leases can be 
> still valid.
> However, there is one risk in reducing the hard limit. E.g. Reduced to 20 
> min. If the NN crashes and the manual failover takes more than 20 minutes, 
> clients will abort.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14754:
-
Attachment: HDFS-14754.002.patch

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, of the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not getting deleted and the missing blocks could not 
> be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14766) RBF: MountTableStoreImpl#getMountTableEntries returns extra entry

2019-08-22 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913261#comment-16913261
 ] 

hemanthboyina commented on HDFS-14766:
--

Hi [~zhangchen],

there are other *startsWith()* usages in the router part of the code; maybe we 
can file a single jira to fix them all at once?
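For context, a small sketch of the kind of false positive plain *startsWith()* 
produces (the component-aware check below is a simplified stand-in, not the 
actual FederationUtil.isParentEntry() code):
{code:java}
String mount = "/foo";
// "/foo2/bar" is not under the mount point "/foo", but startsWith() says it is
System.out.println("/foo2/bar".startsWith(mount));   // true -> wrong match
// a component-aware check only matches whole path components
boolean isParent = "/foo2/bar".equals(mount)
    || "/foo2/bar".startsWith(mount + "/");
System.out.println(isParent);                        // false -> correct
{code}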

> RBF: MountTableStoreImpl#getMountTableEntries returns extra entry
> -
>
> Key: HDFS-14766
> URL: https://issues.apache.org/jira/browse/HDFS-14766
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14766.001.patch
>
>
> Similar issue with HDFS-14756, should use \{{FederationUtil.isParentEntry()}} 
> instead of \{{String.startsWith()}} to identify parent path



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14764) HDFS count doesn't include snapshot files correctly

2019-08-22 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913164#comment-16913164
 ] 

hemanthboyina commented on HDFS-14764:
--

          _While the count command shows there is still space._

          _Because, when we write files into a directory, it will also check 
the snapshot files. But the count command will not check._

[~renxunsaky] Good analysis; I feel it is a valid issue and it needs a fix.

> HDFS count doesn't include snapshot files correctly
> ---
>
> Key: HDFS-14764
> URL: https://issues.apache.org/jira/browse/HDFS-14764
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xun REN
>Priority: Major
> Attachments: hdfs_count_withsnapshot.txt
>
>
> Hi,
>  
> When we set a quota on a path, and that path contains some snapshots, in this 
> case, the status shown by the command "hdfs dfs -count -v -q /my_path" 
> doesn't match the real quota usage.
> The -count here will only count the current path without counting the files 
> in the snapshots which are already deleted in the current path.
> If there is a job that continues to write files into that path, it will 
> report an error like 
> {code:java}
> The NameSpace quota (directories and files) of directory /my_path is 
> exceeded{code}
> While the count command shows there is still space.
> Because, when we write files into a directory, it will also check the 
> snapshot files. But the count command will not check.
>  
> The idea here is to modify the report of "hdfs dfs -count" to include also 
> the files in snapshots. Ideally, we could add an additional column to show 
> the total number of files of the current directory + files deleted from the 
> current directory but referenced in the snapshots.
>  
> You could find in the attached text file the steps to reproduce the issue.
>  
> Thanks.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13527) createLocatedBlock isCorrupt logic is faulty when all blocks are corrupt.

2019-08-22 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-13527:


Assignee: hemanthboyina

> createLocatedBlock isCorrupt logic is faulty when all blocks are corrupt.
> ---
>
> Key: HDFS-13527
> URL: https://issues.apache.org/jira/browse/HDFS-13527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
>
> The steps are:
> 1. put a small file into hdfs FILEPATH
> 2. remove the block replicas in every datanode blockpool.
> 3. restart the datanode
> 4. restart the namenode (leave safemode)
> 5. hdfs fsck FILEPATH -files -blocks -locations
> 6. the namenode thinks this block is not a corrupt block.
> The code logic is:
> {code:java}
> // get block locations
> NumberReplicas numReplicas = countNodes(blk);
> final int numCorruptNodes = numReplicas.corruptReplicas();
> final int numCorruptReplicas = corruptReplicas.numCorruptReplicas(blk);
> if (numCorruptNodes != numCorruptReplicas) {
>   LOG.warn("Inconsistent number of corrupt replicas for {}"
>   + " blockMap has {} but corrupt replicas map has {}",
>   blk, numCorruptNodes, numCorruptReplicas);
> }
> final int numNodes = blocksMap.numNodes(blk);
> final boolean isCorrupt;
> if (blk.isStriped()) {
>   BlockInfoStriped sblk = (BlockInfoStriped) blk;
>   isCorrupt = numCorruptReplicas != 0 &&
>   numReplicas.liveReplicas() < sblk.getRealDataBlockNum();
> } else {
>   isCorrupt = numCorruptReplicas != 0 && numCorruptReplicas == numNodes;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14764) HDFS count doesn't include snapshot files correctly

2019-08-22 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913327#comment-16913327
 ] 

hemanthboyina commented on HDFS-14764:
--

[~ayushtkn] any comments about this?

> HDFS count doesn't include snapshot files correctly
> ---
>
> Key: HDFS-14764
> URL: https://issues.apache.org/jira/browse/HDFS-14764
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xun REN
>Priority: Major
> Attachments: hdfs_count_withsnapshot.txt
>
>
> Hi,
>  
> When we set a quota on a path, and that path contains some snapshots, in this 
> case, the status shown by the command "hdfs dfs -count -v -q /my_path" 
> doesn't match the real quota usage.
> The -count here will only count the current path without counting the files 
> in the snapshots which are already deleted in the current path.
> If there is a job that continues to write files into that path, it will 
> report an error like 
> {code:java}
> The NameSpace quota (directories and files) of directory /my_path is 
> exceeded{code}
> While the count command shows there is still space.
> Because, when we write files into a directory, it will also check the 
> snapshot files. But the count command will not check.
>  
> The idea here is to modify the report of "hdfs dfs -count" to include also 
> the files in snapshots. Ideally, we could add an additional column to show 
> the total number of files of the current directory + files deleted from the 
> current directory but referenced in the snapshots.
>  
> You could find in the attached text file the steps to reproduce the issue.
>  
> Thanks.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14767) NPE in FSEditLogOp

2019-08-22 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14767:


 Summary: NPE in FSEditLogOp
 Key: HDFS-14767
 URL: https://issues.apache.org/jira/browse/HDFS-14767
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina


In the inner class AddCacheDirectiveInfoOp of FSEditLogOp, toString() caused a 
Null Pointer Exception.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-22 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913340#comment-16913340
 ] 

hemanthboyina commented on HDFS-14762:
--

Written a junit; the exception is thrown if the child string contains ":".
[~ayushtkn] Should we fix this with a check, or else let the exception be thrown?

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14767) NPE in FSEditLogOp

2019-08-22 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913363#comment-16913363
 ] 

hemanthboyina commented on HDFS-14767:
--

{code:java}
  builder.append("AddCacheDirectiveInfo [")
  .append("id=" + directive.getId() + ",")
  .append("path=" + directive.getPath().toUri().getPath() + ",")
  .append("replication=" + directive.getReplication() + ",")
  .append("pool=" + directive.getPool() + ",")
  .append("expiration=" + directive.getExpiration().getMillis()); {code}
directive.getPath() was null, which caused the NPE.
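A minimal sketch of the null guard I have in mind (only the changed line is 
shown; the exact placeholder text is still open):
{code:java}
// sketch: guard directive.getPath() before dereferencing it
.append("path=" + (directive.getPath() == null ? "null"
    : directive.getPath().toUri().getPath()) + ",")
{code}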

> NPE in FSEditLogOp
> --
>
> Key: HDFS-14767
> URL: https://issues.apache.org/jira/browse/HDFS-14767
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> In the inner class AddCacheDirectiveInfoOp of FSEditLogOp, toString() caused 
> a Null Pointer Exception.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14754:
-
Attachment: (was: HDFS-14754.001.patch)

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, of the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not getting deleted and the missing blocks could not 
> be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14754:
-
Attachment: HDFS-14754.001.patch
Status: Patch Available  (was: Open)

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came across a scenario where, of the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not getting deleted and the missing blocks could not 
> be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI

2019-08-20 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911349#comment-16911349
 ] 

hemanthboyina commented on HDFS-14358:
--

Thanks [~surendrasingh] for the comment.
Done formatting and uploaded the patch.
After the changes the UI will look like this:
!HDFS14358.JPG|width=1037,height=71!

> Provide LiveNode and DeadNode filter in DataNode UI
> ---
>
> Key: HDFS-14358
> URL: https://issues.apache.org/jira/browse/HDFS-14358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ravuri Sushma sree
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14358 (4).patch, HDFS-14358(2).patch, 
> HDFS-14358(3).patch, HDFS-14358.005.patch, HDFS14358.JPG, hdfs-14358.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14358:
-
Attachment: HDFS-14358.005.patch

> Provide LiveNode and DeadNode filter in DataNode UI
> ---
>
> Key: HDFS-14358
> URL: https://issues.apache.org/jira/browse/HDFS-14358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ravuri Sushma sree
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14358 (4).patch, HDFS-14358(2).patch, 
> HDFS-14358(3).patch, HDFS-14358.005.patch, HDFS14358.JPG, hdfs-14358.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14358) Provide LiveNode and DeadNode filter in DataNode UI

2019-08-20 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14358:
-
Attachment: HDFS14358.JPG

> Provide LiveNode and DeadNode filter in DataNode UI
> ---
>
> Key: HDFS-14358
> URL: https://issues.apache.org/jira/browse/HDFS-14358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ravuri Sushma sree
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14358 (4).patch, HDFS-14358(2).patch, 
> HDFS-14358(3).patch, HDFS-14358.005.patch, HDFS14358.JPG, hdfs-14358.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13270) RBF: Router audit logger

2019-08-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915994#comment-16915994
 ] 

hemanthboyina commented on HDFS-13270:
--

[~maobaolong] [~elgoiri] [~jojochuang] [~surendrasingh] please check the 
recent patch.

> RBF: Router audit logger
> 
>
> Key: HDFS-13270
> URL: https://issues.apache.org/jira/browse/HDFS-13270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: maobaolong
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-13270.001.patch, HDFS-13270.002.patch, 
> HDFS-13270.003.patch
>
>
> We can use the router audit logger to log the client info and cmd, because 
> FSNamesystem#AuditLogger thinks the clients are all from the router.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13220) Change lastCheckpointTime to use fsimage mostRecentCheckpointTime

2019-08-26 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-13220:
-
Attachment: HDFS-13220.002.patch

> Change lastCheckpointTime to use fsimage mostRecentCheckpointTime
> -
>
> Key: HDFS-13220
> URL: https://issues.apache.org/jira/browse/HDFS-13220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nie Gus
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-13220.002.patch, HDFS-13220.patch
>
>
> We found that our standby NN did not do the checkpoint, and the checkpoint 
> alert kept alerting; we use the JMX last checkpoint time and 
> dfs.namenode.checkpoint.period to do the monitoring check.
>  
> Then we checked the code and logs and found the standby NN is using 
> monotonicNow, not the fsimage checkpoint time, so when the Standby NN 
> restarts or switches to Active, the lastCheckpointTime in doWork will be 
> reset. So there is a risk that a standby NN restart or a standby/active 
> switch will delay the checkpoint. 
>  StandbyCheckpointer.java
> {code:java}
> private void doWork() {
>   final long checkPeriod = 1000 * checkpointConf.getCheckPeriod();
>   // Reset checkpoint time so that we don't always checkpoint
>   // on startup.
>   lastCheckpointTime = monotonicNow();
>   while (shouldRun) {
>     boolean needRollbackCheckpoint = namesystem.isNeedRollbackFsImage();
>     if (!needRollbackCheckpoint) {
>       try {
>         Thread.sleep(checkPeriod);
>       } catch (InterruptedException ie) {
>       }
>       if (!shouldRun) {
>         break;
>       }
>     }
>     try {
>       // We may have lost our ticket since last checkpoint, log in again,
>       // just in case
>       if (UserGroupInformation.isSecurityEnabled()) {
>         UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
>       }
>       final long now = monotonicNow();
>       final long uncheckpointed = countUncheckpointedTxns();
>       final long secsSinceLast = (now - lastCheckpointTime) / 1000;
>       boolean needCheckpoint = needRollbackCheckpoint;
>       if (needCheckpoint) {
>         LOG.info("Triggering a rollback fsimage for rolling upgrade.");
>       } else if (uncheckpointed >= checkpointConf.getTxnCount()) {
>         LOG.info("Triggering checkpoint because there have been " +
>             uncheckpointed + " txns since the last checkpoint, which " +
>             "exceeds the configured threshold " +
>             checkpointConf.getTxnCount());
>         needCheckpoint = true;
>       } else if (secsSinceLast >= checkpointConf.getPeriod()) {
>         LOG.info("Triggering checkpoint because it has been " +
>             secsSinceLast + " seconds since the last checkpoint, which " +
>             "exceeds the configured interval " + checkpointConf.getPeriod());
>         needCheckpoint = true;
>       }
>       synchronized (cancelLock) {
>         if (now < preventCheckpointsUntil) {
>           LOG.info("But skipping this checkpoint since we are about to failover!");
>           canceledCount++;
>           continue;
>         }
>         assert canceler == null;
>         canceler = new Canceler();
>       }
>       if (needCheckpoint) {
>         doCheckpoint();
>         // reset needRollbackCheckpoint to false only when we finish a ckpt
>         // for rollback image
>         if (needRollbackCheckpoint
>             && namesystem.getFSImage().hasRollbackFSImage()) {
>           namesystem.setCreatedRollbackImages(true);
>           namesystem.setNeedRollbackFsImage(false);
>         }
>         lastCheckpointTime = now;
>       }
>     } catch (SaveNamespaceCancelledException ce) {
>       LOG.info("Checkpoint was cancelled: " + ce.getMessage());
>       canceledCount++;
>     } catch (InterruptedException ie) {
>       LOG.info("Interrupted during checkpointing", ie);
>       // Probably requested shutdown.
>       continue;
>     } catch (Throwable t) {
>       LOG.error("Exception in doCheckpoint", t);
>     } finally {
>       synchronized (cancelLock) {
>         canceler = null;
>       }
>     }
>   }
> }
> {code}
>  
> Can we use the fsimage's mostRecentCheckpointTime to do the check?
>  
> thanks,
> Gus



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-08-26 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14778:


 Summary: BlockManager findAndMarkBlockAsCorrupt adds block to the 
map if the Storage state is failed
 Key: HDFS-14778
 URL: https://issues.apache.org/jira/browse/HDFS-14778
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina


Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-08-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916030#comment-16916030
 ] 

hemanthboyina commented on HDFS-14778:
--

{code:java}
if (storage == null) {
  storage = storedBlock.findStorageInfo(node);
}
if (storage == null) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
      blk, dn);
  return;
}
markBlockAsCorrupt(new BlockToMarkCorrupt(reportedBlock, storedBlock,
    blk.getGenerationStamp(), reason, Reason.CORRUPTION_REPORTED),
    storage, node); {code}
Before marking the block as corrupt, we should check whether the storage state 
is failed.
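As a rough sketch, the extra check could look like this (assuming 
DatanodeStorage.State is the right state to consult; the log message is 
illustrative):
{code:java}
// sketch: skip corrupt-marking when the reporting storage itself has failed
if (storage.getState() == DatanodeStorage.State.FAILED) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} reported on failed"
      + " storage {}, ignoring", blk, storage);
  return;
}
{code}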

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14567) If kms-acls is failed to load, and it will never be reload

2019-08-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916084#comment-16916084
 ] 

hemanthboyina commented on HDFS-14567:
--

Please review the updated test code, [~jojochuang]. Thanks.
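For reference, a minimal sketch of the reordering the fix is based on (record 
lastReload only after the configuration has actually been loaded; the real 
patch may differ):
{code:java}
private Configuration loadACLsFromFile() {
  LOG.debug("Loading ACLs file");
  Configuration conf = KMSConfiguration.getACLsConf();
  // triggering the resource loading.
  conf.get(Type.CREATE.getAclConfigKey());
  // sketch: set the reload time only after the file has been read,
  // so a write that lands during loading is still detected later
  lastReload = System.currentTimeMillis();
  return conf;
}
{code}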

>  If kms-acls is failed to load, and it will never be reload
> ---
>
> Key: HDFS-14567
> URL: https://issues.apache.org/jira/browse/HDFS-14567
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14567.001.patch, HDFS-14567.002.patch, 
> HDFS-14567.patch
>
>
> Scenario: through an automation tool we are generating kms-acls; though the 
> generation of kms-acls is not yet completed, the system will detect a 
> modification of kms-acls and will try to load it.
> Before getting the configuration we modify the last reload time, as shown in 
> the code below:
> {code:java}
> private Configuration loadACLsFromFile() {
>   LOG.debug("Loading ACLs file");
>   lastReload = System.currentTimeMillis();
>   Configuration conf = KMSConfiguration.getACLsConf();
>   // triggering the resource loading.
>   conf.get(Type.CREATE.getAclConfigKey());
>   return conf;
> }{code}
> If the kms-acls file is written within the next 100ms, the changes will not 
> be loaded, as the condition "newer = f.lastModified() - time > 100" is never 
> met, because we modified the last reload time before getting the configuration.
> {code:java}
> public static boolean isACLsFileNewer(long time) {
>   boolean newer = false;
>   String confDir = System.getProperty(KMS_CONFIG_DIR);
>   if (confDir != null) {
>     Path confPath = new Path(confDir);
>     if (!confPath.isUriPathAbsolute()) {
>       throw new RuntimeException("System property '" + KMS_CONFIG_DIR +
>           "' must be an absolute path: " + confDir);
>     }
>     File f = new File(confDir, KMS_ACLS_XML);
>     LOG.trace("Checking file {}, modification time is {}, last reload time is"
>         + " {}", f.getPath(), f.lastModified(), time);
>     // at least 100ms newer than time, we do this to ensure the file
>     // has been properly closed/flushed
>     newer = f.lastModified() - time > 100;
>   }
>   return newer;
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14758) Decrease lease hard limit

2019-08-25 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915244#comment-16915244
 ] 

hemanthboyina commented on HDFS-14758:
--

Yes [~zhangchen], even decreasing the hard limit doesn't solve all the 
scenarios; as [~jojochuang] mentioned, _there can be network partitions or 
client may simply crash (NN crash doesn't loose state). HDFS-14694 doesn't 
address all failure scenarios._

So I feel it is better to have both, which covers all the failure scenarios.
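As a rough idea, instead of only lowering the constant we could make the hard 
limit configurable, so clusters with slow manual failover can keep the old 
behaviour (the key name and default below are hypothetical):
{code:java}
// hypothetical configuration knob, default kept at the current 1 hour
public static final String DFS_LEASE_HARD_LIMIT_KEY =
    "dfs.namenode.lease.hard-limit-sec";
public static final long DFS_LEASE_HARD_LIMIT_DEFAULT = 60 * 60;

// read once at startup
long hardLimitSec = conf.getLong(DFS_LEASE_HARD_LIMIT_KEY,
    DFS_LEASE_HARD_LIMIT_DEFAULT);
{code}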

> Decrease lease hard limit
> -
>
> Key: HDFS-14758
> URL: https://issues.apache.org/jira/browse/HDFS-14758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: hemanthboyina
>Priority: Minor
>
> The hard limit is currently hard-coded to be 1 hour. This also determines the 
> NN automatic lease recovery interval. Something like 20 min will make more 
> sense.
> After the 5 min soft limit, other clients can recover the lease. If no one 
> else takes the lease away, the original client still can renew the lease 
> within the hard limit. So even after a NN full GC of 8 minutes, leases can be 
> still valid.
> However, there is one risk in reducing the hard limit. E.g. Reduced to 20 
> min. If the NN crashes and the manual failover takes more than 20 minutes, 
> clients will abort.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13220) Change lastCheckpointTime to use fsimage mostRecentCheckpointTime

2019-08-25 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-13220:
-
Attachment: HDFS-13220.patch
Status: Patch Available  (was: Open)

> Change lastCheckpointTime to use fsimage mostRecentCheckpointTime
> -
>
> Key: HDFS-13220
> URL: https://issues.apache.org/jira/browse/HDFS-13220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nie Gus
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-13220.patch
>
>
> We found that our standby NN did not do the checkpoint, and the checkpoint 
> alert kept alerting; we use the JMX last checkpoint time and 
> dfs.namenode.checkpoint.period to do the monitoring check.
>  
> Then we checked the code and logs and found the standby NN is using 
> monotonicNow, not the fsimage checkpoint time, so when the Standby NN 
> restarts or switches to Active, the lastCheckpointTime in doWork will be 
> reset. So there is a risk that a standby NN restart or a standby/active 
> switch will delay the checkpoint. 
>  StandbyCheckpointer.java
> {code:java}
> private void doWork() {
>   final long checkPeriod = 1000 * checkpointConf.getCheckPeriod();
>   // Reset checkpoint time so that we don't always checkpoint
>   // on startup.
>   lastCheckpointTime = monotonicNow();
>   while (shouldRun) {
>     boolean needRollbackCheckpoint = namesystem.isNeedRollbackFsImage();
>     if (!needRollbackCheckpoint) {
>       try {
>         Thread.sleep(checkPeriod);
>       } catch (InterruptedException ie) {
>       }
>       if (!shouldRun) {
>         break;
>       }
>     }
>     try {
>       // We may have lost our ticket since last checkpoint, log in again,
>       // just in case
>       if (UserGroupInformation.isSecurityEnabled()) {
>         UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
>       }
>       final long now = monotonicNow();
>       final long uncheckpointed = countUncheckpointedTxns();
>       final long secsSinceLast = (now - lastCheckpointTime) / 1000;
>       boolean needCheckpoint = needRollbackCheckpoint;
>       if (needCheckpoint) {
>         LOG.info("Triggering a rollback fsimage for rolling upgrade.");
>       } else if (uncheckpointed >= checkpointConf.getTxnCount()) {
>         LOG.info("Triggering checkpoint because there have been " +
>             uncheckpointed + " txns since the last checkpoint, which " +
>             "exceeds the configured threshold " +
>             checkpointConf.getTxnCount());
>         needCheckpoint = true;
>       } else if (secsSinceLast >= checkpointConf.getPeriod()) {
>         LOG.info("Triggering checkpoint because it has been " +
>             secsSinceLast + " seconds since the last checkpoint, which " +
>             "exceeds the configured interval " + checkpointConf.getPeriod());
>         needCheckpoint = true;
>       }
>       synchronized (cancelLock) {
>         if (now < preventCheckpointsUntil) {
>           LOG.info("But skipping this checkpoint since we are about to failover!");
>           canceledCount++;
>           continue;
>         }
>         assert canceler == null;
>         canceler = new Canceler();
>       }
>       if (needCheckpoint) {
>         doCheckpoint();
>         // reset needRollbackCheckpoint to false only when we finish a ckpt
>         // for rollback image
>         if (needRollbackCheckpoint
>             && namesystem.getFSImage().hasRollbackFSImage()) {
>           namesystem.setCreatedRollbackImages(true);
>           namesystem.setNeedRollbackFsImage(false);
>         }
>         lastCheckpointTime = now;
>       }
>     } catch (SaveNamespaceCancelledException ce) {
>       LOG.info("Checkpoint was cancelled: " + ce.getMessage());
>       canceledCount++;
>     } catch (InterruptedException ie) {
>       LOG.info("Interrupted during checkpointing", ie);
>       // Probably requested shutdown.
>       continue;
>     } catch (Throwable t) {
>       LOG.error("Exception in doCheckpoint", t);
>     } finally {
>       synchronized (cancelLock) {
>         canceler = null;
>       }
>     }
>   }
> }
> {code}
>  
> Can we use the fsimage's mostRecentCheckpointTime to do the check?
>  
> thanks,
> Gus



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-25 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915242#comment-16915242
 ] 

hemanthboyina commented on HDFS-14762:
--

[~zsxwing], then do we need to handle the exception with a proper message?

What is your expectation on the issue?

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-25 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14762:
-
Comment: was deleted

(was: [~zsxwing], then do we need to handle the exception with a proper message?

What is your expectation on this?)

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-25 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915241#comment-16915241
 ] 

hemanthboyina commented on HDFS-14762:
--

[~zsxwing], then do we need to handle the exception with a proper message?

What is your expectation on this?

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14781) DN Web UI : Navigate to Live Nodes in Datanodes Page when clicking on Live Nodes in Overview

2019-08-27 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14781:


 Summary: DN Web UI : Navigate to Live Nodes in Datanodes Page when 
clicking on Live Nodes in Overview
 Key: HDFS-14781
 URL: https://issues.apache.org/jira/browse/HDFS-14781
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: hemanthboyina
Assignee: hemanthboyina


HDFS-14358 provided a filter in the DataNode UI.
So clicking on Live Nodes in the Overview should navigate to the DataNode UI 
with the filter set to live; the same applies to all DN states.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11291) Avoid unnecessary edit log for setStoragePolicy() and setReplication()

2019-08-28 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917450#comment-16917450
 ] 

hemanthboyina commented on HDFS-11291:
--

Hi [~surendrasingh], I would like to work on this.
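As a first idea, a minimal sketch of the short-circuit (the field and method 
names here are illustrative, not the final patch):
{code:java}
// sketch: skip the set operation and the edit log entry when unchanged
byte currentPolicyId = inode.getStoragePolicyID();
if (currentPolicyId == newPolicy.getId()) {
  return; // same policy, nothing to log
}
{code}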

> Avoid unnecessary edit log for setStoragePolicy() and setReplication()
> --
>
> Key: HDFS-11291
> URL: https://issues.apache.org/jira/browse/HDFS-11291
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-11291.001.patch, HDFS-11291.002.patch
>
>
> We are setting the storage policy for file without checking the current 
> policy of file for avoiding extra getStoragePolicy() rpc call. Currently 
> namenode is not checking the current storage policy before setting new one 
> and adding edit logs. I think if the old and new storage policy is same we 
> can avoid set operation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14800) Data race between block report and recoverLease()

2019-08-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14800:
-
Comment: was deleted

(was: Hi [~LiJinglun], when the file creation is completed, the 
BlockUnderConstructionFeature will become null.)

> Data race between block report and recoverLease()
> -
>
> Key: HDFS-14800
> URL: https://issues.apache.org/jira/browse/HDFS-14800
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: PreCommit-HDFS-Build #27717 test - 
> testUpgradeFromRel1BBWImage [Jenkins].pdf
>
>
> I thought I fixed it in HDFS-10240, but I am seeing a similar race condition 
> in a precommit test again.
> https://builds.apache.org/job/PreCommit-HDFS-Build/27717/testReport/org.apache.hadoop.hdfs/TestDFSUpgradeFromImage/testUpgradeFromRel1BBWImage/
> {noformat}
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16 from client DFSClient_8256078
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:31,789 [IPC Server handler 9 on default port 39637] WARN  
> BlockStateChange 
> (BlockUnderConstructionFeature.java:initializeBlockRecovery(238)) - BLOCK* 
> BlockUnderConstructionFeature.initializeBlockRecovery: No blocks found, lease 
> removed.
> 2019-08-29 13:34:31,790 [IPC Server handler 9 on default port 39637] WARN  
> hdfs.StateChange (FSNamesystem.java:internalReleaseLease(3550)) - DIR* 
> NameSystem.internalReleaseLease: File /1kb-multiple-checksum-blocks-64-16 has 
> not been closed. Lease recovery is in progress. RecoveryId = 1031 for block 
> blk_7162739548153522810_1020
> 2019-08-29 13:34:31,791 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2645)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: Processing first storage report for 
> DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 from datanode 
> 187b2e09-75e8-4bd0-ab46-90e32839a21d
> 2019-08-29 13:34:31,793 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2674)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: from storage DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 node 
> DatanodeRegistration(127.0.0.1:32987, 
> datanodeUuid=187b2e09-75e8-4bd0-ab46-90e32839a21d, infoPort=42569, 
> infoSecurePort=0, ipcPort=39147, 
> storageInfo=lv=-57;cid=testClusterId;nsid=889473900;c=1567085670457), blocks: 
> 13, hasStaleStorage: false, processing time: 2 msecs, invalidatedBlocks: 0
> 2019-08-29 13:34:31,808 [BP-268013932-172.17.0.2-1567085670423 heartbeating 
> to localhost/127.0.0.1:39637] INFO  datanode.DataNode 
> (BPServiceActor.java:blockReport(432)) - Successfully sent block report 
> 0x8f0bcadff51597e8,  containing 1 storage report(s), of which we sent 1. The 
> reports had 13 total blocks and used 1 RPC(s). This took 4 msecs to generate 
> and 33 msecs for RPC and NN processing. Got back no commands.
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16 from client 
> DFSClient_NONMAPREDUCE_1352689944_1
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:32,796 [IPC Server handler 8 on default port 39637] INFO  
> blockmanagement.BlockManager (PendingRecoveryBlocks.java:add(80)) - Block 
> recovery attempt for blk_7162739548153522810_1020 rejected, as the previous 
> attempt times out in 88 seconds.
> {noformat}
> Looks like if client calls recoverLease on a file before NameNode receives a 
> block report from DataNodes, it will fail to recover the lease, and the file 
> remains unclosed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14800) Data race between block report and recoverLease()

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919501#comment-16919501
 ] 

hemanthboyina commented on HDFS-14800:
--

Hi [~LiJinglun], when the file creation is completed, the 
BlockUnderConstructionFeature will become null.

> Data race between block report and recoverLease()
> -
>
> Key: HDFS-14800
> URL: https://issues.apache.org/jira/browse/HDFS-14800
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: PreCommit-HDFS-Build #27717 test - 
> testUpgradeFromRel1BBWImage [Jenkins].pdf
>
>
> I thought I fixed it in HDFS-10240, but I am seeing a similar race condition 
> in a precommit test again.
> https://builds.apache.org/job/PreCommit-HDFS-Build/27717/testReport/org.apache.hadoop.hdfs/TestDFSUpgradeFromImage/testUpgradeFromRel1BBWImage/
> {noformat}
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16 from client DFSClient_8256078
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:31,789 [IPC Server handler 9 on default port 39637] WARN  
> BlockStateChange 
> (BlockUnderConstructionFeature.java:initializeBlockRecovery(238)) - BLOCK* 
> BlockUnderConstructionFeature.initializeBlockRecovery: No blocks found, lease 
> removed.
> 2019-08-29 13:34:31,790 [IPC Server handler 9 on default port 39637] WARN  
> hdfs.StateChange (FSNamesystem.java:internalReleaseLease(3550)) - DIR* 
> NameSystem.internalReleaseLease: File /1kb-multiple-checksum-blocks-64-16 has 
> not been closed. Lease recovery is in progress. RecoveryId = 1031 for block 
> blk_7162739548153522810_1020
> 2019-08-29 13:34:31,791 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2645)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: Processing first storage report for 
> DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 from datanode 
> 187b2e09-75e8-4bd0-ab46-90e32839a21d
> 2019-08-29 13:34:31,793 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2674)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: from storage DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 node 
> DatanodeRegistration(127.0.0.1:32987, 
> datanodeUuid=187b2e09-75e8-4bd0-ab46-90e32839a21d, infoPort=42569, 
> infoSecurePort=0, ipcPort=39147, 
> storageInfo=lv=-57;cid=testClusterId;nsid=889473900;c=1567085670457), blocks: 
> 13, hasStaleStorage: false, processing time: 2 msecs, invalidatedBlocks: 0
> 2019-08-29 13:34:31,808 [BP-268013932-172.17.0.2-1567085670423 heartbeating 
> to localhost/127.0.0.1:39637] INFO  datanode.DataNode 
> (BPServiceActor.java:blockReport(432)) - Successfully sent block report 
> 0x8f0bcadff51597e8,  containing 1 storage report(s), of which we sent 1. The 
> reports had 13 total blocks and used 1 RPC(s). This took 4 msecs to generate 
> and 33 msecs for RPC and NN processing. Got back no commands.
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16 from client 
> DFSClient_NONMAPREDUCE_1352689944_1
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:32,796 [IPC Server handler 8 on default port 39637] INFO  
> blockmanagement.BlockManager (PendingRecoveryBlocks.java:add(80)) - Block 
> recovery attempt for blk_7162739548153522810_1020 rejected, as the previous 
> attempt times out in 88 seconds.
> {noformat}
> Looks like if client calls recoverLease on a file before NameNode receives a 
> block report from DataNodes, it will fail to recover the lease, and the file 
> remains unclosed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14803) Truncate return value was wrong

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919584#comment-16919584
 ] 

hemanthboyina edited comment on HDFS-14803 at 8/30/19 2:21 PM:
---

block size 11000 bytes, to truncate 100 bytes


was (Author: hemanthboyina):
block size 11000 , truncate 100

> Truncate return value was wrong
> ---
>
> Key: HDFS-14803
> URL: https://issues.apache.org/jira/browse/HDFS-14803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> Even though the truncated block is updated as Under Construction and set 
> with a new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14800) Data race between block report and recoverLease()

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919599#comment-16919599
 ] 

hemanthboyina commented on HDFS-14800:
--

When addBlock() is called, the block is put in the under construction state, 
and there will be 3 replicas which have DataNode StorageInfos.

> Data race between block report and recoverLease()
> -
>
> Key: HDFS-14800
> URL: https://issues.apache.org/jira/browse/HDFS-14800
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: PreCommit-HDFS-Build #27717 test - 
> testUpgradeFromRel1BBWImage [Jenkins].pdf
>
>
> I thought I fixed it in HDFS-10240, but I am seeing a similar race condition 
> in a precommit test again.
> https://builds.apache.org/job/PreCommit-HDFS-Build/27717/testReport/org.apache.hadoop.hdfs/TestDFSUpgradeFromImage/testUpgradeFromRel1BBWImage/
> {noformat}
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16 from client DFSClient_8256078
> 2019-08-29 13:34:31,788 [IPC Server handler 9 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_8256078, pending creates: 13], 
> src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:31,789 [IPC Server handler 9 on default port 39637] WARN  
> BlockStateChange 
> (BlockUnderConstructionFeature.java:initializeBlockRecovery(238)) - BLOCK* 
> BlockUnderConstructionFeature.initializeBlockRecovery: No blocks found, lease 
> removed.
> 2019-08-29 13:34:31,790 [IPC Server handler 9 on default port 39637] WARN  
> hdfs.StateChange (FSNamesystem.java:internalReleaseLease(3550)) - DIR* 
> NameSystem.internalReleaseLease: File /1kb-multiple-checksum-blocks-64-16 has 
> not been closed. Lease recovery is in progress. RecoveryId = 1031 for block 
> blk_7162739548153522810_1020
> 2019-08-29 13:34:31,791 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2645)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: Processing first storage report for 
> DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 from datanode 
> 187b2e09-75e8-4bd0-ab46-90e32839a21d
> 2019-08-29 13:34:31,793 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2674)) - BLOCK* processReport 
> 0x8f0bcadff51597e8: from storage DS-dd914776-8c9f-4f0d-9bd3-91a552bdc351 node 
> DatanodeRegistration(127.0.0.1:32987, 
> datanodeUuid=187b2e09-75e8-4bd0-ab46-90e32839a21d, infoPort=42569, 
> infoSecurePort=0, ipcPort=39147, 
> storageInfo=lv=-57;cid=testClusterId;nsid=889473900;c=1567085670457), blocks: 
> 13, hasStaleStorage: false, processing time: 2 msecs, invalidatedBlocks: 0
> 2019-08-29 13:34:31,808 [BP-268013932-172.17.0.2-1567085670423 heartbeating 
> to localhost/127.0.0.1:39637] INFO  datanode.DataNode 
> (BPServiceActor.java:blockReport(432)) - Successfully sent block report 
> 0x8f0bcadff51597e8,  containing 1 storage report(s), of which we sent 1. The 
> reports had 13 total blocks and used 1 RPC(s). This took 4 msecs to generate 
> and 33 msecs for RPC and NN processing. Got back no commands.
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2682)) - 
> recoverLease: [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16 from client 
> DFSClient_NONMAPREDUCE_1352689944_1
> 2019-08-29 13:34:32,795 [IPC Server handler 8 on default port 39637] INFO  
> namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(3424)) - 
> Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_1352689944_1, pending 
> creates: 1], src=/1kb-multiple-checksum-blocks-64-16
> 2019-08-29 13:34:32,796 [IPC Server handler 8 on default port 39637] INFO  
> blockmanagement.BlockManager (PendingRecoveryBlocks.java:add(80)) - Block 
> recovery attempt for blk_7162739548153522810_1020 rejected, as the previous 
> attempt times out in 88 seconds.
> {noformat}
> Looks like if client calls recoverLease on a file before NameNode receives a 
> block report from DataNodes, it will fail to recover the lease, and the file 
> remains unclosed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14803) Truncate return value was wrong

2019-08-30 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14803:


 Summary: Truncate return value was wrong
 Key: HDFS-14803
 URL: https://issues.apache.org/jira/browse/HDFS-14803
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina


Even though the truncate block is updated as Under Construction and set with a 
new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14803) Truncate return value was wrong

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919584#comment-16919584
 ] 

hemanthboyina commented on HDFS-14803:
--

Scenario: block size 11000, truncate to new length 100.
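
For illustration, a minimal repro sketch of that scenario (MiniDFSCluster 
test helpers assumed; the file path is hypothetical):
{code:java}
// An 11000-byte single-block file truncated to 100 bytes.
DistributedFileSystem dfs = cluster.getFileSystem(); // assumes MiniDFSCluster
Path file = new Path("/truncate-test");              // hypothetical path
DFSTestUtil.createFile(dfs, file, 11000L, (short) 1, 0L);
boolean result = dfs.truncate(file, 100L);           // truncate mid-block
// result comes back false even though the last block was set Under
// Construction with a new timestamp, which this JIRA reports as wrong.
{code}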

> Truncate return value was wrong
> ---
>
> Key: HDFS-14803
> URL: https://issues.apache.org/jira/browse/HDFS-14803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> Even though the truncate block is updated as Under Construction and set with 
> a new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14803) Truncate return value was wrong

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919553#comment-16919553
 ] 

hemanthboyina commented on HDFS-14803:
--

{code:java}
onBlockBoundary = unprotectedTruncate(fsn, iip, newLength,
    toRemoveBlocks, mtime, delta);
if (!onBlockBoundary) {
  truncateBlock = prepareFileForTruncate(fsn, iip, clientName,
      clientMachine, lastBlockDelta, null);
}

return new TruncateResult(onBlockBoundary, fsd.getAuditFileInfo(iip));{code}
onBlockBoundary comes back false when the truncate is not on a block boundary 
(for example, when the file has only one block). Even after the file is 
prepared for truncate, the result is never updated to true.

 

> Truncate return value was wrong
> ---
>
> Key: HDFS-14803
> URL: https://issues.apache.org/jira/browse/HDFS-14803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> Even though the truncate block is updated as Under Construction and set with 
> a new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919244#comment-16919244
 ] 

hemanthboyina commented on HDFS-14762:
--

Hi [~ayushtkn] [~zsxwing], I wrote a unit test with "a:b"; it passed on 
Windows but failed on Linux, while ideally it should succeed on both.
I suggest we fix this at the root level, i.e. in the Path constructor.
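
For context, a short illustration of the failure and the workaround offered 
by the three-argument Path(scheme, authority, path) constructor, which 
prefixes relative paths with "./" so the colon is not parsed as a scheme 
separator:
{code:java}
Path parent = new Path("/tmp");

// Throws IllegalArgumentException (URISyntaxException: Relative path in
// absolute URI), because "a" is parsed as a URI scheme:
Path broken = new Path(parent, "a:b");

// Workaround: build the child with (scheme, authority, path) so the
// leading segment is never URI-parsed:
Path ok = new Path(parent, new Path(null, null, "a:b"));
{code}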

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13879) FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and getSnapshottableDirListing

2019-09-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920321#comment-16920321
 ] 

hemanthboyina commented on HDFS-13879:
--

Should we go ahead with adding the new class in hadoop-common?

> FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and 
> getSnapshottableDirListing
> -
>
> Key: HDFS-13879
> URL: https://issues.apache.org/jira/browse/HDFS-13879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Siyao Meng
>Assignee: hemanthboyina
>Priority: Major
>
> I wonder whether we should add allowSnapshot() and disallowSnapshot() to 
> FileSystem abstract class.
> I think we should because createSnapshot(), renameSnapshot() and 
> deleteSnapshot() are already part of it.
> Any reason why we don't want to do this?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14799) Do Not Call Map containsKey In Conjunction with get

2019-09-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920436#comment-16920436
 ] 

hemanthboyina commented on HDFS-14799:
--

Updated the patch, please review [~ayushtkn].

> Do Not Call Map containsKey In Conjunction with get
> ---
>
> Key: HDFS-14799
> URL: https://issues.apache.org/jira/browse/HDFS-14799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: newbie, noob
> Attachments: HDFS-14799.001.patch
>
>
> {code:java|title=InvalidateBlocks.java}
>   private final Map<DatanodeInfo, LightWeightHashSet<Block>>
>       nodeToBlocks = new HashMap<>();
>   private final Map<DatanodeInfo, LightWeightHashSet<Block>>
>       nodeToECBlocks = new HashMap<>();
> ...
>   private LightWeightHashSet<Block> getBlocksSet(final DatanodeInfo dn) {
>     if (nodeToBlocks.containsKey(dn)) {
>       return nodeToBlocks.get(dn);
>     }
>     return null;
>   }
>   private LightWeightHashSet<Block> getECBlocksSet(final DatanodeInfo dn) {
>     if (nodeToECBlocks.containsKey(dn)) {
>       return nodeToECBlocks.get(dn);
>     }
>     return null;
>   }
> {code}
> There is no need to check for {{containsKey}} here since a call to {{get}} 
> will already return 'null' if the key is not there.  This just adds overhead 
> of having to dive into the Map twice to get the value.
> {code}
>   private LightWeightHashSet<Block> getECBlocksSet(final DatanodeInfo dn) {
>     return nodeToECBlocks.get(dn);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14799) Do Not Call Map containsKey In Conjunction with get

2019-09-01 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14799:
-
Attachment: HDFS-14799.001.patch
Status: Patch Available  (was: Open)

> Do Not Call Map containsKey In Conjunction with get
> ---
>
> Key: HDFS-14799
> URL: https://issues.apache.org/jira/browse/HDFS-14799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: newbie, noob
> Attachments: HDFS-14799.001.patch
>
>
> {code:java|title=InvalidateBlocks.java}
>   private final Map<DatanodeInfo, LightWeightHashSet<Block>>
>       nodeToBlocks = new HashMap<>();
>   private final Map<DatanodeInfo, LightWeightHashSet<Block>>
>       nodeToECBlocks = new HashMap<>();
> ...
>   private LightWeightHashSet<Block> getBlocksSet(final DatanodeInfo dn) {
>     if (nodeToBlocks.containsKey(dn)) {
>       return nodeToBlocks.get(dn);
>     }
>     return null;
>   }
>   private LightWeightHashSet<Block> getECBlocksSet(final DatanodeInfo dn) {
>     if (nodeToECBlocks.containsKey(dn)) {
>       return nodeToECBlocks.get(dn);
>     }
>     return null;
>   }
> {code}
> There is no need to check for {{containsKey}} here since a call to {{get}} 
> will already return 'null' if the key is not there.  This just adds overhead 
> of having to dive into the Map twice to get the value.
> {code}
>   private LightWeightHashSet<Block> getECBlocksSet(final DatanodeInfo dn) {
>     return nodeToECBlocks.get(dn);
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2019-09-01 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14630:
-
Attachment: HDFS-14630.001.patch

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] 
> issue we configured "dfs.client.datanode-restart.timeout" without a time 
> unit. Now the log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> No need to log this; just document the behavior in the property description.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2019-09-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920434#comment-16920434
 ] 

hemanthboyina commented on HDFS-14630:
--

Updated the patch

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] 
> issue we configured "dfs.client.datanode-restart.timeout" without a time 
> unit. Now the log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> No need to log this; just document the behavior in the property description.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-09-01 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920435#comment-16920435
 ] 

hemanthboyina commented on HDFS-14762:
--

Updated the patch, please review [~ayushtkn].

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14762.001.patch
>
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-09-02 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921182#comment-16921182
 ] 

hemanthboyina commented on HDFS-14762:
--

Updated the patch; no test failures now.

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14762.001.patch, HDFS-14762.002.patch, 
> HDFS-14762.003.patch
>
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14758) Decrease lease hard limit

2019-09-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921580#comment-16921580
 ] 

hemanthboyina commented on HDFS-14758:
--

Should we change the hard-coded value, or should we make it configurable? 
[~jojochuang] [~eepayne]?

> Decrease lease hard limit
> -
>
> Key: HDFS-14758
> URL: https://issues.apache.org/jira/browse/HDFS-14758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: hemanthboyina
>Priority: Minor
>
> The hard limit is currently hard-coded to be 1 hour. This also determines the 
> NN automatic lease recovery interval. Something like 20 min will make more 
> sense.
> After the 5 min soft limit, other clients can recover the lease. If no one 
> else takes the lease away, the original client still can renew the lease 
> within the hard limit. So even after a NN full GC of 8 minutes, leases can be 
> still valid.
> However, there is one risk in reducing the hard limit. E.g. Reduced to 20 
> min. If the NN crashes and the manual failover takes more than 20 minutes, 
> clients will abort.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10663) Comparison of two System.nanoTime methods return values are against standard java recommendations.

2019-09-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921583#comment-16921583
 ] 

hemanthboyina commented on HDFS-10663:
--

Test failures seem unrelated.

> Comparison of two System.nanoTime methods return values are against standard 
> java recommendations.
> --
>
> Key: HDFS-10663
> URL: https://issues.apache.org/jira/browse/HDFS-10663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-10663.001.patch
>
>
> I was chasing a bug where the namenode didn't declare a datanode dead even 
> when the last contact time was 2.5 hours before.
> Before I could debug, the datanode was re-imaged (all the logs were deleted) 
> and the namenode was restarted and upgraded to new software.
> While debugging, I came across this heartbeat check code, where the 
> comparison of two System.nanoTime values goes against Java's recommended way.
> Here is the hadoop code:
> {code:title=DatanodeManager.java|borderStyle=solid}
>   /** Is the datanode dead? */
>   boolean isDatanodeDead(DatanodeDescriptor node) {
>     return (node.getLastUpdateMonotonic() <
>         (monotonicNow() - heartbeatExpireInterval));
>   }
> {code}
> The monotonicNow() is calculated as:
> {code:title=Time.java|borderStyle=solid}
>   public static long monotonicNow() {
>     final long NANOSECONDS_PER_MILLISECOND = 1000000;
>     return System.nanoTime() / NANOSECONDS_PER_MILLISECOND;
>   }
> {code}
> As per javadoc of System.nanoTime, it is clearly stated that we should 
> subtract two nano time output 
> {noformat}
> To compare two nanoTime values
>  long t0 = System.nanoTime();
>  ...
>  long t1 = System.nanoTime();
> one should use t1 - t0 < 0, not t1 < t0, because of the possibility of 
> numerical overflow.
> {noformat}
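
For reference, a minimal sketch of the recommended form applied to the check 
above (names taken from the quoted snippet; illustrative, not the committed 
patch):
{code:java}
/** Is the datanode dead? Overflow-safe variant per the nanoTime javadoc. */
boolean isDatanodeDead(DatanodeDescriptor node) {
  // Compare the elapsed difference against the interval instead of
  // comparing two raw monotonic readings, so numerical overflow in
  // System.nanoTime() cannot flip the result.
  return monotonicNow() - node.getLastUpdateMonotonic()
      > heartbeatExpireInterval;
}
{code}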



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14816) TestFileCorruption#testCorruptionWithDiskFailure is flaky

2019-09-03 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14816:
-
Attachment: HDFS-14816.001.patch
Status: Patch Available  (was: Open)

> TestFileCorruption#testCorruptionWithDiskFailure is flaky
> -
>
> Key: HDFS-14816
> URL: https://issues.apache.org/jira/browse/HDFS-14816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14816.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14816) TestFileCorruption#testCorruptionWithDiskFailure is flaky

2019-09-03 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14816:


 Summary: TestFileCorruption#testCorruptionWithDiskFailure is flaky
 Key: HDFS-14816
 URL: https://issues.apache.org/jira/browse/HDFS-14816
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14816) TestFileCorruption#testCorruptionWithDiskFailure logic is not correct

2019-09-03 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921954#comment-16921954
 ] 

hemanthboyina commented on HDFS-14816:
--

In +HDFS-9958+ a UT was added to check whether any replica is on failed 
storage, but the UT is not able to mark the block's storage as failed.

The patch updates the block's storage state to failed.

> TestFileCorruption#testCorruptionWithDiskFailure logic is not correct
> -
>
> Key: HDFS-14816
> URL: https://issues.apache.org/jira/browse/HDFS-14816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14816.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14816) TestFileCorruption#testCorruptionWithDiskFailure logic is not correct

2019-09-03 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14816:
-
Summary: TestFileCorruption#testCorruptionWithDiskFailure logic is not 
correct  (was: TestFileCorruption#testCorruptionWithDiskFailure is flaky)

> TestFileCorruption#testCorruptionWithDiskFailure logic is not correct
> -
>
> Key: HDFS-14816
> URL: https://issues.apache.org/jira/browse/HDFS-14816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14816.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11291) Avoid unnecessary edit log for setStoragePolicy() and setReplication()

2019-08-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-11291:
-
Attachment: HDFS-11291.003.patch

> Avoid unnecessary edit log for setStoragePolicy() and setReplication()
> --
>
> Key: HDFS-11291
> URL: https://issues.apache.org/jira/browse/HDFS-11291
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-11291.001.patch, HDFS-11291.002.patch, 
> HDFS-11291.003.patch
>
>
> We set the storage policy for a file without checking the file's current 
> policy, to avoid an extra getStoragePolicy() RPC call. Currently the 
> NameNode does not check the current storage policy before setting a new one 
> and adding edit logs. I think if the old and new storage policies are the 
> same, we can avoid the set operation.
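
A minimal sketch of the proposed short-circuit (method shape and accessor 
names are assumptions for illustration, not the attached patch):
{code:java}
// Hypothetical: return early when the requested policy equals the file's
// current effective policy, so no edit-log record is written for a no-op.
static void setStoragePolicy(FSDirectory fsd, INodesInPath iip,
    byte policyId) throws IOException {
  INode inode = iip.getLastINode();
  if (inode.getStoragePolicyID() == policyId) {
    return; // unchanged: skip the set operation and the edit-log entry
  }
  // ... existing set + logSetStoragePolicy path unchanged ...
}
{code}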



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14781) DN Web UI : Navigate to Live Nodes in Datanodes Page when click on Live Nodes in Overview

2019-08-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14781:
-
Attachment: HDFS-14781.001.patch
Status: Patch Available  (was: Open)

> DN Web UI : Navigate to Live Nodes in Datanodes Page when click on Live Nodes 
> in Overview
> -
>
> Key: HDFS-14781
> URL: https://issues.apache.org/jira/browse/HDFS-14781
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14781.001.patch
>
>
> HDFS-14358 provided a filter in the DataNode UI.
> So clicking on Live Nodes in the Overview should navigate to the DataNode UI 
> with the filter set to "live"; the same applies to all DN states.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-08-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14778:
-
Attachment: HDFS-14778.001.patch
Status: Patch Available  (was: Open)

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14793) BlockTokenSecretManager should LOG block token range it operates on.

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918751#comment-16918751
 ] 

hemanthboyina commented on HDFS-14793:
--

Hi [~shv], are you working on this?

> BlockTokenSecretManager should LOG block token range it operates on.
> 
>
> Key: HDFS-14793
> URL: https://issues.apache.org/jira/browse/HDFS-14793
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> At startup, log enough information to identify the range of block token keys 
> for the NameNode. This should make it easier to debug issues with block 
> tokens.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-08-29 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14798:


Assignee: hemanthboyina

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
>     this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918817#comment-16918817
 ] 

hemanthboyina commented on HDFS-14798:
--

Good catch [~belugabehr]. I went through the code; we use *synchronized* in 
all the other places but missed it here.

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
>     this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.
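
A self-contained sketch of the change (class and field simplified for 
illustration; in DatanodeDescriptor the guarded object is the 
invalidateBlocks collection):
{code:java}
import java.util.ArrayList;
import java.util.List;

class DatanodeDescriptorSketch {
  private final List<Long> invalidateBlocks = new ArrayList<>();

  public void resetBlocks() {
    synchronized (invalidateBlocks) { // previously unsynchronized
      invalidateBlocks.clear();
    }
  }

  public void clearBlockQueues() {
    synchronized (invalidateBlocks) { // already synchronized today
      invalidateBlocks.clear();
    }
  }
}
{code}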



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918827#comment-16918827
 ] 

hemanthboyina commented on HDFS-14762:
--

[~zsxwing], the root-cause fix should be in the Path constructor; I think 
fixing only these two places won't resolve the problem.
[~ayushtkn], any thoughts?

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918837#comment-16918837
 ] 

hemanthboyina commented on HDFS-14778:
--

+HDFS-9958+ added a null check to return if the storage is null:
{code:java}
if (storageID != null) {
  storage = node.getStorageInfo(storageID);
}
if (storage == null) {
  storage = storedBlock.findStorageInfo(node);
}
if (storage == null) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
      blk, dn);
  return;
}{code}
But it missed an extra check: we should also verify whether the storage state 
is failed, and we should not add the block to the corrupt map if it is. The 
scenario can be referred from +HDFS-9958+.
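
A minimal sketch of the extra check (the enum and accessor names are 
assumptions based on DatanodeStorage.State, not the exact patch):
{code:java}
// Hypothetical: bail out before marking the replica corrupt when the
// resolved storage is already in the FAILED state.
if (storage.getState() == DatanodeStorage.State.FAILED) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} found on failed "
      + "storage {}, ignoring", blk, storage);
  return;
}
{code}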

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11291) Avoid unnecessary edit log for setStoragePolicy() and setReplication()

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918901#comment-16918901
 ] 

hemanthboyina commented on HDFS-11291:
--

Uploaded the patch, please check [~surendrasingh].

> Avoid unnecessary edit log for setStoragePolicy() and setReplication()
> --
>
> Key: HDFS-11291
> URL: https://issues.apache.org/jira/browse/HDFS-11291
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-11291.001.patch, HDFS-11291.002.patch, 
> HDFS-11291.003.patch
>
>
> We set the storage policy for a file without checking the file's current 
> policy, to avoid an extra getStoragePolicy() RPC call. Currently the 
> NameNode does not check the current storage policy before setting a new one 
> and adding edit logs. I think if the old and new storage policies are the 
> same, we can avoid the set operation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14758) Decrease lease hard limit

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918829#comment-16918829
 ] 

hemanthboyina commented on HDFS-14758:
--

[~jojochuang] [~eepayne], should we go ahead with changing the default value 
to 20 minutes?

> Decrease lease hard limit
> -
>
> Key: HDFS-14758
> URL: https://issues.apache.org/jira/browse/HDFS-14758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: hemanthboyina
>Priority: Minor
>
> The hard limit is currently hard-coded to be 1 hour. This also determines the 
> NN automatic lease recovery interval. Something like 20 min will make more 
> sense.
> After the 5 min soft limit, other clients can recover the lease. If no one 
> else takes the lease away, the original client still can renew the lease 
> within the hard limit. So even after a NN full GC of 8 minutes, leases can be 
> still valid.
> However, there is one risk in reducing the hard limit. E.g. Reduced to 20 
> min. If the NN crashes and the manual failover takes more than 20 minutes, 
> clients will abort.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14567) If kms-acls is failed to load, and it will never be reload

2019-08-29 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14567:
-
Comment: was deleted

(was: thanks [~jojochuang]  for the review
uploaded the patch with changes)

>  If kms-acls is failed to load, and it will never be reload
> ---
>
> Key: HDFS-14567
> URL: https://issues.apache.org/jira/browse/HDFS-14567
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14567.001.patch, HDFS-14567.002.patch, 
> HDFS-14567.patch
>
>
> Scenario: we generate kms-acls through an automation tool. Even though the 
> generation of kms-acls is not yet complete, the system detects a 
> modification of kms-acls and tries to load it.
> Before getting the configuration we modify the last reload time, as shown in 
> the code below:
> {code:java}
> private Configuration loadACLsFromFile() {
>   LOG.debug("Loading ACLs file");
>   lastReload = System.currentTimeMillis();
>   Configuration conf = KMSConfiguration.getACLsConf();
>   // triggering the resource loading.
>   conf.get(Type.CREATE.getAclConfigKey());
>   return conf;
> }{code}
> If the kms-acls file is written within the next 100ms, the changes will not 
> be loaded, because the condition "newer = f.lastModified() - time > 100" is 
> never met: we modified the last reload time before getting the configuration.
> {code:java}
> public static boolean isACLsFileNewer(long time) {
>   boolean newer = false;
>   String confDir = System.getProperty(KMS_CONFIG_DIR);
>   if (confDir != null) {
>     Path confPath = new Path(confDir);
>     if (!confPath.isUriPathAbsolute()) {
>       throw new RuntimeException("System property '" + KMS_CONFIG_DIR +
>           "' must be an absolute path: " + confDir);
>     }
>     File f = new File(confDir, KMS_ACLS_XML);
>     LOG.trace("Checking file {}, modification time is {}, last reload time is"
>         + " {}", f.getPath(), f.lastModified(), time);
>     // at least 100ms newer than time, we do this to ensure the file
>     // has been properly closed/flushed
>     newer = f.lastModified() - time > 100;
>   }
>   return newer;
> } {code}
>  
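
A minimal sketch of one possible fix (illustrative, not the attached patch): 
record the reload timestamp only after the configuration has actually been 
loaded, so a write that races with the load still looks newer on the next 
check.
{code:java}
private Configuration loadACLsFromFile() {
  LOG.debug("Loading ACLs file");
  Configuration conf = KMSConfiguration.getACLsConf();
  // triggering the resource loading.
  conf.get(Type.CREATE.getAclConfigKey());
  // Moved after the load: a kms-acls write landing during the load now
  // keeps f.lastModified() - lastReload above the 100ms threshold.
  lastReload = System.currentTimeMillis();
  return conf;
}
{code}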



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-08-29 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14798:
-
Attachment: HDFS-14798.001.patch
Status: Patch Available  (was: Open)

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
>     this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-08-29 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14798:
-
Attachment: (was: HDFS-14798.001.patch)

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
>     this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918897#comment-16918897
 ] 

hemanthboyina commented on HDFS-14762:
--

Fixing the issue at the constructor level will make sure a Path like "a:b" is 
accepted.

Is "a:b" not ambiguous then?

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Assignee: hemanthboyina
>Priority: Major
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14798) Synchronize invalidateBlocks in DatanodeDescriptor

2019-08-29 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14798:
-
Attachment: HDFS-14798.001.patch

> Synchronize invalidateBlocks in DatanodeDescriptor
> --
>
> Key: HDFS-14798
> URL: https://issues.apache.org/jira/browse/HDFS-14798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: n00b, newbie
> Attachments: HDFS-14798.001.patch
>
>
> {code:java|title=DatanodeDescriptor.java}
> public void resetBlocks() {
>   ...
>   this.invalidateBlocks.clear();
>   ...
> }
> public void clearBlockQueues() {
>   synchronized (invalidateBlocks) {
>     this.invalidateBlocks.clear();
>   }
>   ...
> }
> {code}
> It may not be strictly necessary, but why risk it? The invalidateBlocks 
> should be protected in {{resetBlocks()}} just like it is in 
> {{clearBlockQueues()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13220) Change lastCheckpointTime to use fsimage mostRecentCheckpointTime

2019-08-29 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918900#comment-16918900
 ] 

hemanthboyina commented on HDFS-13220:
--

Hi [~surendrasingh], should the present time be taken as now() or 
monotonicNow()?

> Change lastCheckpointTime to use fsimage mostRecentCheckpointTime
> -
>
> Key: HDFS-13220
> URL: https://issues.apache.org/jira/browse/HDFS-13220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nie Gus
>Assignee: hemanthboyina
>Priority: Minor
> Attachments: HDFS-13220.002.patch, HDFS-13220.patch
>
>
> We found that our standby NN did not do the checkpoint, and the checkpoint 
> alert kept firing. We use the JMX last checkpoint time and 
> dfs.namenode.checkpoint.period for the monitoring check.
>  
> Then, checking the code and logs, we found the standby NN uses monotonicNow, 
> not the fsimage checkpoint time, so when the standby NN restarts or switches 
> to Active, the lastCheckpointTime in doWork is reset. So there is a risk that 
> a standby NN restart or a standby-to-active switch will delay the checkpoint.
>  StandbyCheckpointer.java
> {code:java}
> private void doWork() {
>   final long checkPeriod = 1000 * checkpointConf.getCheckPeriod();
>   // Reset checkpoint time so that we don't always checkpoint
>   // on startup.
>   lastCheckpointTime = monotonicNow();
>   while (shouldRun) {
>     boolean needRollbackCheckpoint = namesystem.isNeedRollbackFsImage();
>     if (!needRollbackCheckpoint) {
>       try {
>         Thread.sleep(checkPeriod);
>       } catch (InterruptedException ie) {
>       }
>       if (!shouldRun) {
>         break;
>       }
>     }
>     try {
>       // We may have lost our ticket since last checkpoint, log in again,
>       // just in case
>       if (UserGroupInformation.isSecurityEnabled()) {
>         UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
>       }
>       final long now = monotonicNow();
>       final long uncheckpointed = countUncheckpointedTxns();
>       final long secsSinceLast = (now - lastCheckpointTime) / 1000;
>       boolean needCheckpoint = needRollbackCheckpoint;
>       if (needCheckpoint) {
>         LOG.info("Triggering a rollback fsimage for rolling upgrade.");
>       } else if (uncheckpointed >= checkpointConf.getTxnCount()) {
>         LOG.info("Triggering checkpoint because there have been " +
>             uncheckpointed + " txns since the last checkpoint, which " +
>             "exceeds the configured threshold " +
>             checkpointConf.getTxnCount());
>         needCheckpoint = true;
>       } else if (secsSinceLast >= checkpointConf.getPeriod()) {
>         LOG.info("Triggering checkpoint because it has been " +
>             secsSinceLast + " seconds since the last checkpoint, which " +
>             "exceeds the configured interval " + checkpointConf.getPeriod());
>         needCheckpoint = true;
>       }
>       synchronized (cancelLock) {
>         if (now < preventCheckpointsUntil) {
>           LOG.info("But skipping this checkpoint since we are about to failover!");
>           canceledCount++;
>           continue;
>         }
>         assert canceler == null;
>         canceler = new Canceler();
>       }
>       if (needCheckpoint) {
>         doCheckpoint();
>         // reset needRollbackCheckpoint to false only when we finish a ckpt
>         // for rollback image
>         if (needRollbackCheckpoint
>             && namesystem.getFSImage().hasRollbackFSImage()) {
>           namesystem.setCreatedRollbackImages(true);
>           namesystem.setNeedRollbackFsImage(false);
>         }
>         lastCheckpointTime = now;
>       }
>     } catch (SaveNamespaceCancelledException ce) {
>       LOG.info("Checkpoint was cancelled: " + ce.getMessage());
>       canceledCount++;
>     } catch (InterruptedException ie) {
>       LOG.info("Interrupted during checkpointing", ie);
>       // Probably requested shutdown.
>       continue;
>     } catch (Throwable t) {
>       LOG.error("Exception in doCheckpoint", t);
>     } finally {
>       synchronized (cancelLock) {
>         canceler = null;
>       }
>     }
>   }
> }
> {code}
>  
> Can we use the fsimage's mostRecentCheckpointTime to do the check?
>  
> thanks,
> Gus
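
A hedged sketch of the suggestion (the accessor chain is an assumption from 
the JMX LastCheckpointTime wiring; timelines are converted because 
lastCheckpointTime is monotonic while the image records wall-clock time):
{code:java}
// Hypothetical, not the committed patch: seed the checkpoint timer from
// the image's recorded checkpoint time instead of the thread start time.
long imageTime = namesystem.getFSImage().getStorage()
    .getMostRecentCheckpointTime();           // wall-clock millis
long sinceImageCkpt = Time.now() - imageTime; // elapsed wall-clock time
// Express the same elapsed time on the monotonic clock used by doWork().
lastCheckpointTime = monotonicNow() - sinceImageCkpt;
{code}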



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


