[jira] [Created] (HDFS-17375) Take down docs for all Ozone versions prior to 1.3.0

2024-02-07 Thread Arpit Agarwal (Jira)
Arpit Agarwal created HDFS-17375:


 Summary: Take down docs for all Ozone versions prior to 1.3.0
 Key: HDFS-17375
 URL: https://issues.apache.org/jira/browse/HDFS-17375
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal


Can we take the docs for all versions prior to 1.3.0 offline? They are being 
indexed with higher priority by Google and contain commands that fail on the 
latest releases.






[jira] [Assigned] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-04-27 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDFS-16950:


Assignee: (was: Karthik Palanisamy)

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>
> The NameNode failed in a production cluster when the JournalNode role was 
> migrated. 
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start 
> namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid xx, but got txid xx. {code}
> -initializeSharedEdits was issued as part of the role migration step. Note 
> that no checkpoint had been performed in the past few hours. 
> -initializeSharedEdits created a new log segment from the edits_inprogress 
> transaction and deleted all older transactions. 
> My ask here is to delete only edit transactions older than the fsimage 
> transaction. Currently it deletes all transactions, and no check is 
> enforced in JNStorage#format(). 
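As an illustration of the requested behavior, here is a minimal sketch (hypothetical names, not the actual patch) of deleting only those edit segments that are fully covered by the last fsimage checkpoint:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Sketch only: decide which finalized edit segments are safe to delete. */
final class EditPurgePolicy {

  /** Hypothetical summary of a finalized edit segment. */
  static final class Segment {
    final long firstTxId;
    final long lastTxId;
    Segment(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }
  }

  /**
   * Only segments whose transactions all precede the last fsimage
   * checkpoint transaction are safe to delete; deleting anything newer
   * opens a gap in the edit log, as described above.
   */
  static List<Segment> deletableSegments(List<Segment> segments,
                                         long lastCheckpointTxId) {
    List<Segment> deletable = new ArrayList<>();
    for (Segment s : segments) {
      if (s.lastTxId < lastCheckpointTxId) {
        deletable.add(s);
      }
    }
    return deletable;
  }
}
{code}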






[jira] [Assigned] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-04-27 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDFS-16950:


Assignee: Karthik Palanisamy

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>
> The NameNode failed in a production cluster when the JournalNode role was 
> migrated. 
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start 
> namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid xx, but got txid xx. {code}
> -initializeSharedEdits was issued as part of the role migration step. Note 
> that no checkpoint had been performed in the past few hours. 
> -initializeSharedEdits created a new log segment from the edits_inprogress 
> transaction and deleted all older transactions. 
> My ask here is to delete only edit transactions older than the fsimage 
> transaction. Currently it deletes all transactions, and no check is 
> enforced in JNStorage#format(). 






[jira] [Commented] (HDFS-16849) Terminate SNN when failing to perform EditLogTailing

2023-03-07 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697607#comment-17697607
 ] 

Arpit Agarwal commented on HDFS-16849:
--

I see. It is strange that the SNN couldn't recover on retries; that needs 
further investigation.

> Terminate SNN when failing to perform EditLogTailing
> 
>
> Key: HDFS-16849
> URL: https://issues.apache.org/jira/browse/HDFS-16849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>
> We should terminate the SNN if edit log tailing fails for a sufficient number 
> of JNs. We found this after a Kerberos error. 
> {code:java}
> 2022-10-14 10:53:16,796 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms 
> (timeout=20000 ms) for a response for selectStreamingInputStreams. Exceptions 
> so far: [:8485:  DestHost:destPort :8485 , LocalHost:localPort 
> /:0. Failed on local exception: 
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> hdfs/  javax.security.auth.login.LoginException: Client not found in 
> Kerberos database (6)]
> 2022-10-14 10:53:30,796 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
> streams from QJM to [:8485, :8485, :8485]. Skipping.
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to 
> respond.
>         at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427)
>  {code}
>  
> We have no check on whether a sufficient number of JNs responded: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280]
> So we should implement a check similar to this one:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]
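As a sketch of the missing check (hypothetical names; the real fix would live alongside the linked JournalSet code):

{code:java}
import java.io.IOException;

/** Sketch only: fail fast when fewer than a quorum of JNs responded. */
final class QuorumCheck {
  static void checkQuorumResponded(int responded, int totalJournalNodes)
      throws IOException {
    int required = totalJournalNodes / 2 + 1; // simple majority
    if (responded < required) {
      throw new IOException("Only " + responded + " of " + totalJournalNodes
          + " JournalNodes responded; a quorum of " + required
          + " is required to select input streams");
    }
  }
}
{code}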






[jira] [Commented] (HDFS-16849) Terminate SNN when failing to perform EditLogTailing

2023-03-07 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697590#comment-17697590
 ] 

Arpit Agarwal commented on HDFS-16849:
--

[~kpalanisamy] what causes the login failure? This particular error doesn't 
seem recoverable. Was it a cluster misconfiguration?

> Terminate SNN when failing to perform EditLogTailing
> 
>
> Key: HDFS-16849
> URL: https://issues.apache.org/jira/browse/HDFS-16849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>
> We should terminate the SNN if edit log tailing fails for a sufficient number 
> of JNs. We found this after a Kerberos error. 
> {code:java}
> 2022-10-14 10:53:16,796 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms 
> (timeout=20000 ms) for a response for selectStreamingInputStreams. Exceptions 
> so far: [:8485:  DestHost:destPort :8485 , LocalHost:localPort 
> /:0. Failed on local exception: 
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> hdfs/  javax.security.auth.login.LoginException: Client not found in 
> Kerberos database (6)]
> 2022-10-14 10:53:30,796 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
> streams from QJM to [:8485, :8485, :8485]. Skipping.
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to 
> respond.
>         at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427)
>  {code}
>  
> We have no check on whether a sufficient number of JNs responded: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280]
> So we should implement a check similar to this one:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]






[jira] [Commented] (HDFS-16849) Terminate SNN when failing to perform EditLogTailing

2023-03-07 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697525#comment-17697525
 ] 

Arpit Agarwal commented on HDFS-16849:
--

It has been years since I looked at this code, so this may be a dumb question. 
What is the benefit of self-terminating the SNN? Also what does the SNN do 
today - does it keep retrying? If this is a recoverable/potentially transient 
error then retrying may be the right thing to do.

> Terminate SNN when failing to perform EditLogTailing
> 
>
> Key: HDFS-16849
> URL: https://issues.apache.org/jira/browse/HDFS-16849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>
> We should terminate the SNN if edit log tailing fails for a sufficient number 
> of JNs. We found this after a Kerberos error. 
> {code:java}
> 2022-10-14 10:53:16,796 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms 
> (timeout=20000 ms) for a response for selectStreamingInputStreams. Exceptions 
> so far: [:8485:  DestHost:destPort :8485 , LocalHost:localPort 
> /:0. Failed on local exception: 
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> hdfs/  javax.security.auth.login.LoginException: Client not found in 
> Kerberos database (6)]
> 2022-10-14 10:53:30,796 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
> streams from QJM to [:8485, :8485, :8485]. Skipping.
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to 
> respond.
>         at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427)
>  {code}
>  
> We have no check on whether a sufficient number of JNs responded: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280]
> So we should implement a check similar to this one:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]






[jira] [Comment Edited] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424036#comment-17424036
 ] 

Arpit Agarwal edited comment on HDFS-16252 at 10/4/21, 5:00 PM:


+1, thanks for the v2 patch. This is easier to understand.


was (Author: arpitagarwal):
+1, thanks for the updated patch. This is easier to understand.

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present; it should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "10000,6,60000,10"; //t1,n1,t2,n2,...
> 
> // In RetryPolicies.java, we can see it gets the timeout as the first in
> // the pair
>   /**
>    * Parse the given string as a MultipleLinearRandomRetry object.
>    * The format of the string is "t_1, n_1, t_2, n_2, ...",
>    * where t_i and n_i are the i-th pair of sleep time and number of retries.
>    * Note that the white spaces in the string are ignored.
>    *
>    * @return the parsed object, or null if the parsing fails.
>    */
>   public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>     final String[] elements = s.split(",");
>     if (elements.length == 0) {
>       LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>       return null;
>     }
>     if (elements.length % 2 != 0) {
>       LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>           + elements.length + " but an even number of elements is expected.");
>       return null;
>     }
> 
>     final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>         = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
> 
>     for (int i = 0; i < elements.length; ) {
>       //parse the i-th sleep-time
>       final int sleep = parsePositiveInt(elements, i++, s);
>       if (sleep == -1) {
>         return null; //parse fails
>       }
> 
>       //parse the i-th number-of-retries
>       final int retries = parsePositiveInt(elements, i++, s);
>       if (retries == -1) {
>         return null; //parse fails
>       }
> 
>       pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>     }
>     return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.
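To make the ordering concrete, here is a small standalone toy program (illustrative only, not Hadoop code) that reads the default spec as (sleep, retries) pairs:

{code:java}
/** Sketch only: interpret a retry policy spec "t1,n1,t2,n2,...". */
public class RetrySpecDemo {
  public static void main(String[] args) {
    String spec = "10000,6,60000,10"; // the documented default
    String[] e = spec.split(",");
    for (int i = 0; i < e.length; i += 2) {
      // t_i comes first (sleep time in ms), n_i second (retry count)
      System.out.println("sleep " + e[i] + " ms for up to "
          + e[i + 1] + " retries");
    }
    // Prints:
    //   sleep 10000 ms for up to 6 retries
    //   sleep 60000 ms for up to 10 retries
  }
}
{code}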






[jira] [Commented] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424036#comment-17424036
 ] 

Arpit Agarwal commented on HDFS-16252:
--

+1, thanks for the updated patch. This is easier to understand.

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present; it should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "10000,6,60000,10"; //t1,n1,t2,n2,...
> 
> // In RetryPolicies.java, we can see it gets the timeout as the first in
> // the pair
>   /**
>    * Parse the given string as a MultipleLinearRandomRetry object.
>    * The format of the string is "t_1, n_1, t_2, n_2, ...",
>    * where t_i and n_i are the i-th pair of sleep time and number of retries.
>    * Note that the white spaces in the string are ignored.
>    *
>    * @return the parsed object, or null if the parsing fails.
>    */
>   public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>     final String[] elements = s.split(",");
>     if (elements.length == 0) {
>       LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>       return null;
>     }
>     if (elements.length % 2 != 0) {
>       LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>           + elements.length + " but an even number of elements is expected.");
>       return null;
>     }
> 
>     final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>         = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
> 
>     for (int i = 0; i < elements.length; ) {
>       //parse the i-th sleep-time
>       final int sleep = parsePositiveInt(elements, i++, s);
>       if (sleep == -1) {
>         return null; //parse fails
>       }
> 
>       //parse the i-th number-of-retries
>       final int retries = parsePositiveInt(elements, i++, s);
>       if (retries == -1) {
>         return null; //parse fails
>       }
> 
>       pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>     }
>     return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.






[jira] [Commented] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424030#comment-17424030
 ] 

Arpit Agarwal commented on HDFS-16252:
--

+1

What an unfortunate choice of ordering.

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present; it should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "10000,6,60000,10"; //t1,n1,t2,n2,...
> 
> // In RetryPolicies.java, we can see it gets the timeout as the first in
> // the pair
>   /**
>    * Parse the given string as a MultipleLinearRandomRetry object.
>    * The format of the string is "t_1, n_1, t_2, n_2, ...",
>    * where t_i and n_i are the i-th pair of sleep time and number of retries.
>    * Note that the white spaces in the string are ignored.
>    *
>    * @return the parsed object, or null if the parsing fails.
>    */
>   public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>     final String[] elements = s.split(",");
>     if (elements.length == 0) {
>       LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>       return null;
>     }
>     if (elements.length % 2 != 0) {
>       LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>           + elements.length + " but an even number of elements is expected.");
>       return null;
>     }
> 
>     final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>         = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
> 
>     for (int i = 0; i < elements.length; ) {
>       //parse the i-th sleep-time
>       final int sleep = parsePositiveInt(elements, i++, s);
>       if (sleep == -1) {
>         return null; //parse fails
>       }
> 
>       //parse the i-th number-of-retries
>       final int retries = parsePositiveInt(elements, i++, s);
>       if (retries == -1) {
>         return null; //parse fails
>       }
> 
>       pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>     }
>     return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.






[jira] [Updated] (HDFS-15850) Superuser actions should be reported to external enforcers

2021-04-21 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-15850:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Superuser actions should be reported to external enforcers
> --
>
> Key: HDFS-15850
> URL: https://issues.apache.org/jira/browse/HDFS-15850
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: security
>Affects Versions: 3.3.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15850.branch-3.3.001.patch, HDFS-15850.v1.patch, 
> HDFS-15850.v2.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, HDFS superuser checks or actions are not reported to external 
> enforcers like Ranger, so the audit reports provided by such external 
> enforcers are incomplete and miss the superuser actions. To fix this, add a 
> new method to "AccessControlEnforcer" for all superuser checks. 
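As a sketch of the shape such a callback could take (the method name and signature here are illustrative; see the committed change for the real API):

{code:java}
import org.apache.hadoop.security.AccessControlException;

/** Sketch only: a hook so superuser checks reach external enforcers too. */
interface SuperuserAccessHook {
  /**
   * Invoked for every superuser privilege check so an external enforcer
   * such as Ranger can authorize it and record it in its audit log.
   */
  void checkSuperUserPermission(String operationName, String user)
      throws AccessControlException;
}
{code}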






[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-19 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325352#comment-17325352
 ] 

Arpit Agarwal commented on HDFS-15614:
--

bq. , if providing an external command to create the Trash directory by admins 
is feasible and makes sense

The external command will add more friction to enabling the feature. We want it 
to be as transparent as possible. I like the option to auto-create the 
.Trash dir better.

> Initialize snapshot trash root during NameNode startup if enabled
> -
>
> Key: HDFS-15614
> URL: https://issues.apache.org/jira/browse/HDFS-15614
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to HDFS-15607.
> Goal:
> Initialize (create) snapshot trash root for all existing snapshottable 
> directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to 
> {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually 
> on all those existing snapshottable directories.
> The change is expected to land in {{FSNamesystem}}.
> Discussion:
> 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the 
> client side. But in order for NN to create it at startup, the logic must 
> (also) be implemented on the server side as well. -- which is also a 
> requirement by WebHDFS (HDFS-15612).
> 2. Alternatively, we can provide an extra parameter to the 
> {{-provisionTrash}} command like: {{dfsadmin -provisionTrash -all}} to 
> initialize/provision trash root on all existing snapshottable dirs.
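A minimal sketch of the startup pass described above (written against the client-side FileSystem API for brevity; the actual change lands in FSNamesystem, and the helper names here are hypothetical):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch only: provision .Trash under every snapshottable directory. */
final class TrashRootProvisioner {
  static void provisionAll(FileSystem fs, Iterable<Path> snapshottableDirs)
      throws IOException {
    for (Path dir : snapshottableDirs) {
      Path trash = new Path(dir, ".Trash");
      if (!fs.exists(trash)) {
        fs.mkdirs(trash); // the real patch also sets restrictive permissions
      }
    }
  }
}
{code}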






[jira] [Commented] (HDFS-15879) Exclude slow nodes when choose targets for blocks

2021-03-06 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296736#comment-17296736
 ] 

Arpit Agarwal commented on HDFS-15879:
--

Hi [~tomscut], I may not have time to review in the coming days. Please try 
reaching out on the dev mailing list or Slack channel for reviewers.

> Exclude slow nodes when choose targets for blocks
> -
>
> Key: HDFS-15879
> URL: https://issues.apache.org/jira/browse/HDFS-15879
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Previously, we added monitoring of slow nodes; see 
> [HDFS-11194|https://issues.apache.org/jira/browse/HDFS-11194].
> We can use a thread to periodically collect these slow nodes into a set, then 
> use the set to filter out slow nodes when choosing targets for blocks.
> This feature can be configured to be turned on when needed.
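A sketch of the idea (hypothetical names; the real block placement policy hooks are more involved):

{code:java}
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: filter slow DataNodes out of a placement candidate list. */
final class SlowNodeFilter {
  // Refreshed periodically by a background thread from slow-peer reports.
  private final Set<String> slowNodes = ConcurrentHashMap.newKeySet();

  void refresh(Set<String> latest) {
    slowNodes.clear();
    slowNodes.addAll(latest);
  }

  /** Drop slow nodes unless that would leave too few candidates. */
  void filter(List<String> candidates, int minRequired) {
    long fastCount = candidates.stream()
        .filter(dn -> !slowNodes.contains(dn)).count();
    if (fastCount >= minRequired) {
      candidates.removeIf(slowNodes::contains);
    }
  }
}
{code}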






[jira] [Resolved] (HDFS-15854) Make some parameters configurable for SlowDiskTracker and SlowPeerTracker

2021-03-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-15854.
--
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the contribution [~tomscut].

> Make some parameters configurable for SlowDiskTracker and SlowPeerTracker
> -
>
> Key: HDFS-15854
> URL: https://issues.apache.org/jira/browse/HDFS-15854
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Make some parameters configurable for SlowDiskTracker and SlowPeerTracker. 
> Related to https://issues.apache.org/jira/browse/HDFS-15814.






[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-15 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285041#comment-17285041
 ] 

Arpit Agarwal commented on HDFS-15808:
--

Hi [~tomscut], I probably won't be able to look at this soon.

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.
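A sketch of how such counters can be maintained (illustrative names; the actual metrics would be registered through Hadoop's metrics2 framework):

{code:java}
import java.util.concurrent.atomic.LongAdder;

/** Sketch only: count lock holds that exceed a warning threshold. */
final class LockHoldMetrics {
  private final LongAdder readLockWarnings = new LongAdder();
  private final LongAdder writeLockWarnings = new LongAdder();
  private final long thresholdMs;

  LockHoldMetrics(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  /** Call on lock release with the measured hold time. */
  void recordHold(boolean isWrite, long heldMs) {
    if (heldMs > thresholdMs) {
      (isWrite ? writeLockWarnings : readLockWarnings).increment();
    }
  }

  long getReadLockWarning()  { return readLockWarnings.sum(); }
  long getWriteLockWarning() { return writeLockWarnings.sum(); }
}
{code}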






[jira] [Commented] (HDFS-11551) Handle SlowDiskReport from DataNode at the NameNode

2021-02-09 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281844#comment-17281844
 ] 

Arpit Agarwal commented on HDFS-11551:
--

Thanks for the contribution [~tomscut]! Would you consider filing a new Apache 
Jira and linking your PR there? Feel free to tag me on the new Jira.

> Handle SlowDiskReport from DataNode at the NameNode
> ---
>
> Key: HDFS-11551
> URL: https://issues.apache.org/jira/browse/HDFS-11551
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HDFS-11551-branch-2.001.patch, 
> HDFS-11551-branch-2.002.patch, HDFS-11551.001.patch, HDFS-11551.002.patch, 
> HDFS-11551.003.patch, HDFS-11551.004.patch, HDFS-11551.005.patch, 
> HDFS-11551.006.patch, HDFS-11551.007.patch, HDFS-11551.008.patch, 
> HDFS-11551.009.patch, HDFS-11551.010.patch
>
>
> DataNodes send slow disk reports via heartbeats. Handle these reports at the 
> NameNode to find the topN slow disks.
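A sketch of the topN aggregation described above (hypothetical names; the actual tracker on the NameNode side is richer and also ages out stale reports):

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Sketch only: aggregate per-disk latencies reported via heartbeats. */
final class SlowDiskAggregator {
  private final Map<String, Double> latestLatencyMs = new ConcurrentHashMap<>();

  /** Called for each slow-disk entry carried in a DataNode heartbeat. */
  void report(String diskId, double avgLatencyMs) {
    latestLatencyMs.put(diskId, avgLatencyMs);
  }

  /** The n disks with the highest reported latency. */
  List<String> topN(int n) {
    return latestLatencyMs.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue(
            Comparator.reverseOrder()))
        .limit(n)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
{code}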






[jira] [Commented] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-10 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247447#comment-17247447
 ] 

Arpit Agarwal commented on HDFS-15725:
--

[~szetszwo], can you take a look at this patch?

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15725.001.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.
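A sketch of the proposed fallback (hypothetical names and states; the real patch operates on the NameNode's block and file metadata):

{code:java}
/** Sketch only: revert a stuck committed block so block recovery can run. */
final class CommittedBlockRecovery {
  enum BlockState { COMMITTED, UNDER_CONSTRUCTION, FINALIZED }

  static final class Block {
    BlockState state = BlockState.COMMITTED;
    int liveReplicas;
  }

  /** Called when a lease recovery attempt finds the last block committed. */
  static void onLeaseRecovery(Block lastBlock, int minReplication) {
    if (lastBlock.state == BlockState.COMMITTED
        && lastBlock.liveReplicas < minReplication) {
      // Revert so the next lease recovery attempt triggers BLOCK RECOVERY,
      // which instructs the DataNodes to finalize their RBW replicas.
      lastBlock.state = BlockState.UNDER_CONSTRUCTION;
    }
  }
}
{code}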






[jira] [Comment Edited] (HDFS-8432) Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases.

2020-05-06 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100927#comment-17100927
 ] 

Arpit Agarwal edited comment on HDFS-8432 at 5/6/20, 4:02 PM:
--

[~heliangjun] looks like some other folks worked on downgrade support for the 
upgrade from 2.x to 3.y. See HDFS-14396. You could try describing your exact 
problem there.


was (Author: arpitagarwal):
[~heliangjun] looks like some other folks worked on downgrade support for the 
upgrade. See HDFS-14396. You could try describing your exact problem there.

> Introduce a minimum compatible layout version to allow downgrade in more 
> rolling upgrade use cases.
> ---
>
> Key: HDFS-8432
> URL: https://issues.apache.org/jira/browse/HDFS-8432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, rolling upgrades
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-8432-HDFS-Downgrade-Extended-Support.pdf, 
> HDFS-8432-branch-2.002.patch, HDFS-8432-branch-2.003.patch, 
> HDFS-8432.001.patch, HDFS-8432.002.patch
>
>
> Maintain the prior layout version during the upgrade window and reject 
> attempts to use new features until after the upgrade has been finalized.  
> This guarantees that the prior software version can read the fsimage and edit 
> logs if the administrator decides to downgrade.  This will make downgrade 
> usable for the majority of NameNode layout version changes, which just 
> involve introduction of new edit log operations.
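A sketch of the gating idea (abstracted with hypothetical names; note that real HDFS layout versions are negative and decrease as features are added):

{code:java}
/** Sketch only: reject new-feature edit ops until upgrade is finalized. */
final class LayoutVersionGate {
  private final int priorLayoutVersion; // what the old software can read
  private volatile boolean upgradeFinalized;

  LayoutVersionGate(int priorLayoutVersion) {
    this.priorLayoutVersion = priorLayoutVersion;
  }

  void finalizeUpgrade() {
    upgradeFinalized = true;
  }

  /** Called before logging an edit op introduced in a newer layout version. */
  void checkFeatureAllowed(int featureLayoutVersion) {
    // "Newer" means a smaller (more negative) layout version in HDFS.
    if (!upgradeFinalized && featureLayoutVersion < priorLayoutVersion) {
      throw new IllegalStateException("Feature needs layout version "
          + featureLayoutVersion + ", but downgrade to layout version "
          + priorLayoutVersion + " must stay possible until finalization");
    }
  }
}
{code}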






[jira] [Commented] (HDFS-8432) Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases.

2020-05-06 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100927#comment-17100927
 ] 

Arpit Agarwal commented on HDFS-8432:
-

[~heliangjun] looks like some other folks worked on downgrade support for the 
upgrade. See HDFS-14396. You could try describing your exact problem there.

> Introduce a minimum compatible layout version to allow downgrade in more 
> rolling upgrade use cases.
> ---
>
> Key: HDFS-8432
> URL: https://issues.apache.org/jira/browse/HDFS-8432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, rolling upgrades
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-8432-HDFS-Downgrade-Extended-Support.pdf, 
> HDFS-8432-branch-2.002.patch, HDFS-8432-branch-2.003.patch, 
> HDFS-8432.001.patch, HDFS-8432.002.patch
>
>
> Maintain the prior layout version during the upgrade window and reject 
> attempts to use new features until after the upgrade has been finalized.  
> This guarantees that the prior software version can read the fsimage and edit 
> logs if the administrator decides to downgrade.  This will make downgrade 
> usable for the majority of NameNode layout version changes, which just 
> involve introduction of new edit log operations.






[jira] [Updated] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies

2020-03-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-15154:
-
   Fix Version/s: 3.3.0
Hadoop Flags: Reviewed
Target Version/s:   (was: 3.3.0)
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

I've committed this based on [~ayushtkn]'s +1. Thanks for the contribution 
[~swagle] and thanks Ayush for the reviews.

> Allow only hdfs superusers the ability to assign HDFS storage policies
> --
>
> Key: HDFS-15154
> URL: https://issues.apache.org/jira/browse/HDFS-15154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Bob Cauthen
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, 
> HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, 
> HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, 
> HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, 
> HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch, 
> HDFS-15154.15.patch
>
>
> Please provide a way to restrict the ability to assign HDFS Storage Policies 
> to HDFS directories to HDFS superusers only.
> Currently, and based on Jira HDFS-7093, all storage policies can be disabled 
> cluster wide by setting the following:
> dfs.storage.policy.enabled to false
> But we need a way to allow only HDFS superusers the ability to assign an HDFS 
> Storage Policy to an HDFS directory.






[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies

2020-03-24 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066220#comment-17066220
 ] 

Arpit Agarwal commented on HDFS-15154:
--

Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes 
look good to you then please go ahead. Main thing would be to ensure there is 
no incompatibility introduced by the change.

> Allow only hdfs superusers the ability to assign HDFS storage policies
> --
>
> Key: HDFS-15154
> URL: https://issues.apache.org/jira/browse/HDFS-15154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Bob Cauthen
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, 
> HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, 
> HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, 
> HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, 
> HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch
>
>
> Please provide a way to restrict the ability to assign HDFS Storage Policies 
> to HDFS directories to HDFS superusers only.
> Currently, and based on Jira HDFS-7093, all storage policies can be disabled 
> cluster wide by setting the following:
> dfs.storage.policy.enabled to false
> But we need a way to allow only HDFS superusers the ability to assign an HDFS 
> Storage Policy to an HDFS directory.






[jira] [Comment Edited] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies

2020-03-24 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066220#comment-17066220
 ] 

Arpit Agarwal edited comment on HDFS-15154 at 3/24/20, 10:03 PM:
-

Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes 
look good to you then please go ahead and commit. Main thing would be to ensure 
there is no incompatibility introduced by the change.


was (Author: arpitagarwal):
Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes 
look good to you then please go ahead. Main thing would be to ensure there is 
no incompatibility introduced by the change.

> Allow only hdfs superusers the ability to assign HDFS storage policies
> --
>
> Key: HDFS-15154
> URL: https://issues.apache.org/jira/browse/HDFS-15154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Bob Cauthen
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, 
> HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, 
> HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, 
> HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, 
> HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch
>
>
> Please provide a way to restrict the ability to assign HDFS Storage Policies 
> to HDFS directories to HDFS superusers only.
> Currently, and based on Jira HDFS-7093, all storage policies can be disabled 
> cluster wide by setting the following:
> dfs.storage.policy.enabled to false
> But we need a way to allow only HDFS superusers the ability to assign an HDFS 
> Storage Policy to an HDFS directory.






[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-03-13 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058827#comment-17058827
 ] 

Arpit Agarwal commented on HDFS-15160:
--

Nice work [~sodonnell], the patch looks pretty good to me. Two questions:
# Is it safe to obtain/release volume references with a read lock?
# Why is the locking entirely removed from {{validateBlockFile}} - I assume you 
verified somehow that the callers are always holding the lock?


> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch
>
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in 
> BlockSender and FsDatasetImpl where it is fairly obvious they only need a read 
> lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
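A minimal sketch of the read/write-lock split the description refers to (illustrative only; the actual ReplicaMap keys replicas by block pool and block ID):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch only: readers share the read lock, mutators take the write lock. */
final class ReplicaMapSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  /** Read-only lookup: many readers may proceed concurrently. */
  String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Mutation: requires exclusive access. */
  void add(long blockId, String replica) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}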






[jira] [Updated] (HDFS-15205) FSImage sort section logic is wrong

2020-03-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-15205:
-
Target Version/s: 3.1.4, 3.2.2, 3.3.1

> FSImage sort section logic is wrong
> ---
>
> Key: HDFS-15205
> URL: https://issues.apache.org/jira/browse/HDFS-15205
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: angerszhu
>Priority: Blocker
> Attachments: HDFS-15205.001.patch
>
>
> When loading the FSImage, it sorts the sections in FileSummary and loads the 
> sections in SectionName enum order. But the sort method is wrong: when I use 
> branch-2.6.0 to load an fsimage written by branch-2 with the patch 
> https://issues.apache.org/jira/browse/HDFS-14771, it throws an NPE because 
> it loads INODE first 
> {code:java}
> 2020-03-03 14:33:26,618 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadPermission(FSImageFormatPBINode.java:101)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:148)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadRootINode(FSImageFormatPBINode.java:332)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:218)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1036)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1020)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:741)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1092)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:780)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
> {code}
> I printed the load order:
> {code:java}
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE,  
> offset = 37, length = 11790829 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 37, length = 826591 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 826628, length = 828192 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 1654820, length = 835240 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 2490060, length = 833630 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 3323690, length = 909445 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 4233135, length = 866147 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 5099282, length = 1272751 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 6372033, length = 1311876 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 7683909, length = 1251510 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 8935419, length = 1296120 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 

[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies

2020-02-09 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033329#comment-17033329
 ] 

Arpit Agarwal commented on HDFS-15154:
--

Hi [~swagle], I recommend deprecating the old config key 
{{dfs.storage.policy.enabled}}. You can introduce a new config key like 
{{dfs.storage.policies.enabled}} and have it support three values like 
{{DISABLED, ADMINISTRATORS, ALL_USERS}}.
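A sketch of how such a three-valued setting could be enforced, following the suggestion above (names are illustrative, not a committed API):

{code:java}
/** Sketch only: enforce a three-valued storage policy permission mode. */
final class StoragePolicyPermission {
  enum Mode { DISABLED, ADMINISTRATORS, ALL_USERS }

  static void checkCanSetPolicy(Mode mode, boolean callerIsSuperuser) {
    switch (mode) {
      case DISABLED:
        throw new UnsupportedOperationException(
            "Storage policies are disabled on this cluster");
      case ADMINISTRATORS:
        if (!callerIsSuperuser) {
          throw new SecurityException(
              "Only HDFS superusers may set storage policies");
        }
        break;
      case ALL_USERS:
        break; // any user may set a storage policy
    }
  }
}
{code}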

> Allow only hdfs superusers the ability to assign HDFS storage policies
> --
>
> Key: HDFS-15154
> URL: https://issues.apache.org/jira/browse/HDFS-15154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Bob Cauthen
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch
>
>
> Please provide a way to restrict the ability to assign HDFS Storage Policies 
> to HDFS directories to HDFS superusers only.
> Currently, and based on Jira HDFS-7093, all storage policies can be disabled 
> cluster wide by setting the following:
> dfs.storage.policy.enabled to false
> But we need a way to allow only HDFS superusers the ability to assign an HDFS 
> Storage Policy to an HDFS directory.






[jira] [Commented] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...

2020-02-03 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029332#comment-17029332
 ] 

Arpit Agarwal commented on HDFS-14743:
--

One potential downside of thread-locals is that if we forget to save a new 
operation, stale state can be passed to the authorizer plugin. This is 
impossible with parameter passing.
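A small self-contained illustration of that hazard (hypothetical names):

{code:java}
/** Sketch only: a forgotten set() leaks stale context to the authorizer. */
public class OperationContextDemo {
  private static final ThreadLocal<String> CURRENT_OP = new ThreadLocal<>();

  static void setOperation(String op) { CURRENT_OP.set(op); }
  static String operationForAudit()   { return CURRENT_OP.get(); }

  public static void main(String[] args) {
    setOperation("mkdir");            // first RPC on this handler thread
    // ... authorizer audits "mkdir" ...

    // A second RPC that forgets to call setOperation("delete") leaves the
    // authorizer auditing the stale "mkdir" context. With explicit
    // parameter passing, omitting the operation would not even compile.
    System.out.println(operationForAudit()); // still prints "mkdir"
  }
}
{code}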

> Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to 
> support Authorization of mkdir, rm, rmdir, copy, move etc...
> ---
>
> Key: HDFS-14743
> URL: https://issues.apache.org/jira/browse/HDFS-14743
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Ramesh Mani
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HDFS-14743 Enhance INodeAttributeProvider_ 
> AccessControlEnforcer Interface.pdf
>
>
> Enhance the INodeAttributeProvider / AccessControlEnforcer interface in HDFS 
> to support authorization of mkdir, rm, rmdir, copy, move, etc. This should 
> help implementors of the interface, like Apache Ranger's HDFS Authorization 
> plugin, authorize and audit those command sets.






[jira] [Commented] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...

2020-02-03 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029329#comment-17029329
 ] 

Arpit Agarwal commented on HDFS-14743:
--

Thanks for sharing the PoC [~weichiu]. I understand why you chose 
thread-locals to store the operation info, as it avoids significant refactoring 
of the code.

I don't have a strong opinion on which is better, thread-locals vs passing 
parameters all the way through. [~xyao] [~aengineer] do you have an opinion?

> Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to 
> support Authorization of mkdir, rm, rmdir, copy, move etc...
> ---
>
> Key: HDFS-14743
> URL: https://issues.apache.org/jira/browse/HDFS-14743
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Ramesh Mani
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HDFS-14743 Enhance INodeAttributeProvider_ 
> AccessControlEnforcer Interface.pdf
>
>
> Enhance the INodeAttributeProvider / AccessControlEnforcer interface in HDFS 
> to support authorization of mkdir, rm, rmdir, copy, move, etc. This should 
> help implementors of the interface, like Apache Ranger's HDFS Authorization 
> plugin, authorize and audit those command sets.






[jira] [Updated] (HDFS-15135) EC : ArrayIndexOutOfBoundsException in BlockRecoveryWorker#RecoveryTaskStriped.

2020-01-21 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-15135:
-
Component/s: erasure-coding

> EC : ArrayIndexOutOfBoundsException in 
> BlockRecoveryWorker#RecoveryTaskStriped.
> ---
>
> Key: HDFS-15135
> URL: https://issues.apache.org/jira/browse/HDFS-15135
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Surendra Singh Lilhore
>Assignee: Ravuri Sushma sree
>Priority: Major
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 8
>at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskStriped.recover(BlockRecoveryWorker.java:464)
>at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:602)
>at java.lang.Thread.run(Thread.java:745) {noformat}






[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-17 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998407#comment-16998407
 ] 

Arpit Agarwal commented on HDFS-15012:
--

+1 for the updated patch.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101, and deleting and creating a large number of 
> snapshots, the SNN exited with the below error:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as reverting 
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken 
> and fails to parse the edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2542) Race condition between read and write stateMachineData

2019-11-19 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2542:

Priority: Blocker  (was: Critical)

> Race condition between read and write stateMachineData
> --
>
> Key: HDDS-2542
> URL: https://issues.apache.org/jira/browse/HDDS-2542
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Blocker
>
> The write payload (the chunk itself) is sent to Ratis as an external, 
> binary byte array. It's not part of the LogEntry and is saved from an async 
> thread by calling ContainerStateMachine.writeStateMachineData.
>  
> As it's an async thread, it's possible that the stateMachineData is not yet 
> written when the data should be sent to the followers in the next heartbeat.
> By design a cache is used to avoid this issue, but there are multiple problems 
> with the cache.
> First, the current cache size is chunkExecutor.getCorePoolSize(), which is not 
> enough. By default it means 60 executor threads and a cache of size 60. But 
> in case of one very slow and 59 very fast writers, the cache entries can be 
> invalidated before the write.
> In my tests (freon datanode-chunk-writer-generator) I have seen cache misses 
> even with cache size 5000.
> Second, as readStateMachineData and writeStateMachineData are called from 
> two different threads, there is a race condition independent of the cache 
> size. It's possible that the write thread has not yet added the data to the 
> cache but the read thread already needs it.
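
To make the race concrete, here is an illustrative sketch (class and method 
names are invented, not the actual ContainerStateMachine code) of one way to 
close it: the writer publishes a future before scheduling the async write, so 
a concurrent reader waits on the in-flight write instead of missing the cache 
entry entirely.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

// Hypothetical sketch, not the real fix.
class ChunkWriteCache {
  private final ConcurrentMap<Long, CompletableFuture<byte[]>> inFlight =
      new ConcurrentHashMap<>();

  // Write path: register the future *before* handing off to the executor,
  // so readers can always find it.
  CompletableFuture<byte[]> write(long logIndex, byte[] chunk, Executor pool) {
    CompletableFuture<byte[]> f = new CompletableFuture<>();
    inFlight.put(logIndex, f);
    pool.execute(() -> {
      // ... persist the chunk to disk here ...
      f.complete(chunk);  // unblock readers once the data is durable
    });
    return f;
  }

  // Read path (e.g. building the next heartbeat to followers): block on the
  // in-flight write if needed instead of treating it as a cache miss.
  byte[] read(long logIndex) {
    CompletableFuture<byte[]> f = inFlight.get(logIndex);
    return f == null ? null : f.join();  // null => fall back to disk
  }
}
{code}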



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12348) disable removing blocks to trash while rolling upgrade

2019-11-19 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977559#comment-16977559
 ] 

Arpit Agarwal commented on HDFS-12348:
--

Hi [~surendrasingh], this means you lose the ability to roll back. Just curious 
- how long does a rolling upgrade take in your cluster?

> disable removing blocks to trash while rolling upgrade
> --
>
> Key: HDFS-12348
> URL: https://issues.apache.org/jira/browse/HDFS-12348
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: HDFS-12348.001.patch, HDFS-12348.002.patch, 
> HDFS-12348.003.patch
>
>
> The DataNode moves block files and meta files to trash during a rolling 
> upgrade, and deletes them when finalize is executed.
> This can fill up the datanode's disks, because
> (1) files are frequently created and deleted (e.g. HBase compaction);
> (2) the cluster is very big, and a rolling upgrade often lasts several days.
> Our current solution is to clean the trash by hand, but this is very dangerous 
> in a production environment.
> We think disabling the datanode trash may be a good way to avoid filling up 
> the disks.
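
As an illustration of the proposed behavior, a sketch of the toggle is below. 
The config key and helper method are hypothetical, not from the actual patch; 
the point is simply that disabling trash trades rollback ability for disk 
space during the upgrade window.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the proposed datanode-side decision.
class RollingUpgradeTrashPolicy {
  // Hypothetical key; the real patch may name it differently.
  private static final String TRASH_KEY =
      "dfs.datanode.rolling.upgrade.trash.enabled";

  static boolean shouldUseTrash(Configuration conf, boolean upgradeInProgress) {
    // Trash only during a rolling upgrade, and only if the operator has not
    // opted out; otherwise delete immediately and give up rollback.
    return upgradeInProgress && conf.getBoolean(TRASH_KEY, true);
  }
}
{code}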



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-11-07 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2372:

Target Version/s: 0.5.0

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> Found it on a k8s-based test setup using a simple 3-node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-06 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2392:

Priority: Blocker  (was: Major)

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> After the ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails as the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with the following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at 
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at 
> org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.(RaftServerMetrics.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:119)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp

2019-11-06 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2392:

Target Version/s: 0.5.0

> Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
> ---
>
> Key: HDDS-2392
> URL: https://issues.apache.org/jira/browse/HDDS-2392
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> After the ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp 
> fails as the DNs fail to restart XceiverServerRatis. 
> RaftServer#start() fails with the following exception:
> {code:java}
> java.io.IOException: java.lang.IllegalStateException: Not started
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Not started
>   at 
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
>   at 
> org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
>   at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.(RaftServerMetrics.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:119)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-529) Some Ozone DataNode logs go to a separate ozone.log file

2019-11-06 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDDS-529.

Resolution: Done

Thanks for the note [~Huachao]. This appears to be fixed now. I no longer see 
log output going to ozone.log file.

> Some Ozone DataNode logs go to a separate ozone.log file
> 
>
> Key: HDDS-529
> URL: https://issues.apache.org/jira/browse/HDDS-529
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Arpit Agarwal
>Assignee: YiSheng Lien
>Priority: Blocker
>  Labels: beta1
>
> Some, but not all, DataNode logs go to a separate ozone.log file. A couple of 
> things to fix here:
> # The behavior should be consistent. All log messages should go to the new 
> log file.
> # The new log file name should follow the Hadoop log file convention, e.g. 
> _hadoop---.log_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-2208) Propagate System Exceptions from OM transaction apply phase

2019-11-04 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDDS-2208:
-

> Propagate System Exceptions from OM transaction apply phase
> ---
>
> Key: HDDS-2208
> URL: https://issues.apache.org/jira/browse/HDDS-2208
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The change for HDDS-2206 tracks system exceptions during preExecute phase of 
> OM request handling.
> The current jira is to implement exception propagation once the OM request is 
> submitted to Ratis - when the handler is running validateAndUpdateCache for 
> the request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2404) Add support for Registered id as service identifier for CSR.

2019-11-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967062#comment-16967062
 ] 

Arpit Agarwal commented on HDDS-2404:
-

Thanks [~aengineer] and [~apurohit] for picking this up. Also cc 
[~hanishakoneru].

> Add support for Registered id as service identifier for CSR.
> 
>
> Key: HDDS-2404
> URL: https://issues.apache.org/jira/browse/HDDS-2404
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Abhishek Purohit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SCM HA needs the ability to represent a group as a single entity, so that 
> tokens for each OM that is part of an HA group can be honored by the 
> datanodes. 
> This patch adds the notion of a service group ID to the certificate 
> infrastructure. In the next JIRAs, we will use this capability when issuing 
> certificates to the OMs -- especially when they are in HA mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2274) Avoid buffer copying in Codec

2019-11-04 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2274:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in Codec
> -
>
> Key: HDDS-2274
> URL: https://issues.apache.org/jira/browse/HDDS-2274
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> Codec declares byte[] as a parameter in fromPersistedFormat(..) and as the 
> return type of toPersistedFormat(..). This leads to buffer copying when 
> using it with ByteString.
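
A small sketch of the copy in question (interface shape simplified, names 
invented): a byte[]-based codec forces ByteString.copyFrom on every bridge, 
whereas protobuf's UnsafeByteOperations can wrap the array without copying 
when the caller guarantees it is never mutated afterwards.

{code:java}
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;

// Illustrative only; not the actual Codec interface change.
class CodecCopySketch {
  static ByteString withCopy(byte[] persisted) {
    return ByteString.copyFrom(persisted);  // allocates and copies
  }

  static ByteString withoutCopy(byte[] persisted) {
    // Wraps the array directly; safe only if 'persisted' is not reused.
    return UnsafeByteOperations.unsafeWrap(persisted);
  }
}
{code}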



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-04 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2270:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile into a byte[] and then 
> parses it into ContainerProtos.Container2BCSIDMapProto. The buffer copying 
> can be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has a similar problem.
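
One way to avoid the intermediate byte[] (a sketch of the idea, not 
necessarily the committed fix) is to let protobuf parse straight from the 
stream, which drops the IOUtils.toByteArray step entirely:

{code:java}
try (FileInputStream fin = new FileInputStream(snapshotFile)) {
  // parseFrom(InputStream) streams the bytes; no intermediate array.
  ContainerProtos.Container2BCSIDMapProto proto =
      ContainerProtos.Container2BCSIDMapProto.parseFrom(fin);
  // ...
}
{code}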



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2291) Acceptance tests for OM HA

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2291:

Status: Patch Available  (was: Open)

> Acceptance tests for OM HA
> --
>
> Key: HDDS-2291
> URL: https://issues.apache.org/jira/browse/HDDS-2291
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: HA, om
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add robot tests to test OM HA functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2388) Teragen test failure due to OM exception

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2388:
---

Assignee: Hanisha Koneru

> Teragen test failure due to OM exception
> 
>
> Key: HDDS-2388
> URL: https://issues.apache.org/jira/browse/HDDS-2388
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.5.0
>
>
> Ran into the below exception while running teragen:
> {code:java}
> Unable to get delta updates since sequenceNumber 79932 
> org.rocksdb.RocksDBException: Requested sequence not yet written in the db
>   at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
>   at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587)
>   at 
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338)
>   at 
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134)
>   at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102)
>   at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> {code}
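
A defensive-handling sketch (illustrative only, not the actual fix): clamp the 
requested sequence number against what RocksDB has actually written before 
calling getUpdatesSince, so a caller asking for a not-yet-written sequence 
gets an empty result instead of a RocksDBException.

{code:java}
long latest = db.getLatestSequenceNumber();
if (sequenceNumber > latest) {
  // Nothing written at or past the requested sequence yet; return an empty
  // delta (exact return type depends on the caller) rather than throwing.
  return null;
}
TransactionLogIterator iter = db.getUpdatesSince(sequenceNumber);
{code}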



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2393) HDDS-1847 broke some unit tests

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDDS-2393.
-
Resolution: Not A Problem

Reverted HDDS-1847 for now, so this should not be necessary.

Let's include the full fix there.

> HDDS-1847 broke some unit tests
> ---
>
> Key: HDDS-2393
> URL: https://issues.apache.org/jira/browse/HDDS-2393
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Chris Teoh
>Assignee: Chris Teoh
>Priority: Major
>
> Siyao Meng commented on HDDS-1847:
> --
> Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and 
> {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to reliably 
> reproduce it. I believe there could be other tests that are broken by this.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74)
> at 
> org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330)
> at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544)
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1847:

Fix Version/s: (was: 0.5.0)

> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It 
> would be nice to have some consistency for the datanode.
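
Purely as an illustration of what a consistent scheme could look like (some of 
these keys are invented for the example and do not exist in the codebase), the 
datanode entries could share one prefix:

{code:java}
// Hypothetical naming sketch; not actual configuration keys.
final class DatanodeSecurityKeys {
  static final String PRINCIPAL      = "hdds.datanode.kerberos.principal";
  static final String KEYTAB         = "hdds.datanode.kerberos.keytab.file";
  static final String HTTP_PRINCIPAL = "hdds.datanode.http.kerberos.principal";
  static final String HTTP_KEYTAB    = "hdds.datanode.http.kerberos.keytab.file";
}
{code}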



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1847:

 Target Version/s: 0.5.0
Affects Version/s: (was: 0.5.0)

> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It 
> would be nice to have some consistency for the datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-11-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDDS-1847:
-

I've reverted this to unblock other CI runs which may get stuck on the failing 
tests.

We can recommit with UT fixes.

> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It 
> would be nice to have some consistency for the datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-11-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965068#comment-16965068
 ] 

Arpit Agarwal edited comment on HDDS-1847 at 11/1/19 8:17 PM:
--

Looks like this commit did not get a full Anzix CI run in GitHub.

I propose reverting this and rolling up the changes to fix the unit tests in a 
new commit.


was (Author: arpitagarwal):
Looks like this commit did not get a full Anzix CI run in GitHub.

> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It 
> would be nice to have some consistency for the datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-11-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965068#comment-16965068
 ] 

Arpit Agarwal commented on HDDS-1847:
-

Looks like this commit did not get a full Anzix CI run in GitHub.

> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It 
> would be nice to have some consistency for the datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-10-29 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962528#comment-16962528
 ] 

Arpit Agarwal commented on HDDS-426:


[~aengineer] I updated the resolution. However, I realized the two linked bugs 
added createTime; this jira is requesting a modificationTime.

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture 
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to 
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-10-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDDS-426.

Resolution: Duplicate

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture 
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to 
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-10-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDDS-426:


> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture 
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to 
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-10-29 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-426:
---
Fix Version/s: (was: 0.5.0)

> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed for Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture 
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to 
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2341) Validate tar entry path during extraction

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2341:

   Fix Version/s: 0.5.0
Target Version/s:   (was: 0.5.0)
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

+1 I've committed this. Thanks for the contribution [~adoroszlai] and the great 
test coverage.

> Validate tar entry path during extraction
> -
>
> Key: HDDS-2341
> URL: https://issues.apache.org/jira/browse/HDDS-2341
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Containers extracted from tar.gz should be validated to confine entries to 
> the archive's root directory.
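
For reference, a minimal sketch of the kind of check involved (assumed shape; 
the committed patch may differ): resolve each tar entry against the 
destination root and reject anything that escapes it, which blocks "../" 
path-traversal entries.

{code:java}
import java.io.IOException;
import java.nio.file.Path;

// Illustrative zip-slip style guard; names are invented.
final class TarEntryValidator {
  static Path validate(Path destRoot, String entryName) throws IOException {
    Path resolved = destRoot.resolve(entryName).normalize();
    if (!resolved.startsWith(destRoot.normalize())) {
      throw new IOException("Tar entry outside of target dir: " + entryName);
    }
    return resolved;
  }
}
{code}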



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2206:

Fix Version/s: (was: 0.5.0)

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.
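
A sketch of the split handling described above (the config key is hypothetical 
and the surrounding names are simplified): OMExceptions pass through on the 
existing path, while unexpected IOExceptions get their stack trace logged 
server-side and are propagated verbatim only when the flag allows it.

{code:java}
try {
  handler.handle(request);
} catch (OMException e) {
  throw e;  // business error: existing behavior, unchanged
} catch (IOException e) {
  LOG.error("Internal error while handling {}", request, e);  // full trace
  // Hypothetical key controlling client-side visibility of internals.
  if (conf.getBoolean("ozone.om.internal.error.propagate", false)) {
    throw e;
  }
  throw new OMException("Internal error",
      OMException.ResultCodes.INTERNAL_ERROR);
}
{code}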



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2206:

Target Version/s: 0.5.0

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDDS-2206:
-

Reverted this based on offline conversation with [~aengineer].

Anu has requested we add a config key to control this behavior.

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2206:

Labels:   (was: pull-request-available)

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2355:

Priority: Blocker  (was: Critical)

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2355:

Fix Version/s: 0.5.0

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2273) Avoid buffer copying in GrpcReplicationService

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2273:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in GrpcReplicationService
> --
>
> Key: HDDS-2273
> URL: https://issues.apache.org/jira/browse/HDDS-2273
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> In GrpcOutputStream, it writes data to a ByteArrayOutputStream and copies 
> them to a ByteString.
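A small sketch of the copy-free alternative, using stock {{com.google.protobuf}} in place of the Ratis-relocated package (the classes behave the same, modulo the package prefix): writing into {{ByteString.newOutput()}} lets {{toByteString()}} wrap the internal buffers instead of copying a {{ByteArrayOutputStream}}'s array.

{code:java}
import com.google.protobuf.ByteString;
import java.nio.charset.StandardCharsets;

public class GrpcOutputStreamSketch {
  public static void main(String[] args) throws Exception {
    // Before: ByteArrayOutputStream + ByteString.copyFrom(baos.toByteArray())
    // performs two full copies of the chunk.
    // After: write straight into a ByteString.Output; toByteString() reuses
    // the internal buffers rather than copying them again.
    ByteString.Output out = ByteString.newOutput();
    out.write("container chunk data".getBytes(StandardCharsets.UTF_8));
    ByteString chunk = out.toByteString();
    System.out.println(chunk.size()); // 20
  }
}
{code}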



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2272) Avoid buffer copying in GrpcReplicationClient

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2272:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in GrpcReplicationClient
> -
>
> Key: HDDS-2272
> URL: https://issues.apache.org/jira/browse/HDDS-2272
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> In StreamDownloader.onNext, CopyContainerResponseProto is copied to a byte[] 
> and then it is written out to the stream.
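For reference, a sketch of the difference (again with plain {{com.google.protobuf}} standing in for the relocated package): {{ByteString#writeTo(OutputStream)}} streams the backing buffers directly, avoiding the intermediate {{byte[]}}.

{code:java}
import com.google.protobuf.ByteString;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteToExample {
  // Copying approach: materializes an extra byte[] per gRPC message.
  static void copyThenWrite(ByteString data, OutputStream out) throws IOException {
    byte[] copy = data.toByteArray(); // full copy of the payload
    out.write(copy);
  }

  // Copy-free approach: ByteString writes its backing buffers directly.
  static void writeDirect(ByteString data, OutputStream out) throws IOException {
    data.writeTo(out);
  }

  public static void main(String[] args) throws IOException {
    ByteString data = ByteString.copyFromUtf8("chunk");
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    copyThenWrite(data, sink);
    writeDirect(data, sink);
    System.out.println(sink.size()); // 10
  }
}
{code}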



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2357:

Labels:   (was: pull-request-available)

> Add replication factor option to new Freon tests
> 
>
> Key: HDDS-2357
> URL: https://issues.apache.org/jira/browse/HDDS-2357
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New Freon generators (OCKG and OKG) use a fixed replication factor of 3.  
> Sometimes it's useful to be able to test single-node replication.  The goal 
> of this task is to add a command-line option to specify the replication factor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2357:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 I've committed this via GitHub. Thanks for the contribution [~adoroszlai].

> Add replication factor option to new Freon tests
> 
>
> Key: HDDS-2357
> URL: https://issues.apache.org/jira/browse/HDDS-2357
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New Freon generators (OCKG and OKG) use a fixed replication factor of 3.  
> Sometimes it's useful to be able to test single-node replication.  The goal 
> of this task is to add a command-line option to specify the replication factor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2358:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 I've committed this via GitHub. Thanks for catching this [~adoroszlai].

> Change to replication factor THREE in acceptance tests
> --
>
> Key: HDDS-2358
> URL: https://issues.apache.org/jira/browse/HDDS-2358
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acceptance test clusters are currently configured with a replication factor of 
> ONE.  This way the [test 
> succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html]
>  in spite of Ratis leader election problems (note "term 1464"):
> {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log}
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.FollowerState: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was 
> interrupted: java.lang.InterruptedException: sleep interrupted
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from  
> FOLLOWER to FOLLOWER at term 1464 for 
> recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState
> {noformat}
> The goal of this change is to configure a factor of THREE, to allow the 
> acceptance tests to catch such issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2358:

Labels:   (was: pull-request-available)

> Change to replication factor THREE in acceptance tests
> --
>
> Key: HDDS-2358
> URL: https://issues.apache.org/jira/browse/HDDS-2358
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acceptance test clusters are currently configured with a replication factor of 
> ONE.  This way the [test 
> succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html]
>  in spite of Ratis leader election problems (note "term 1464"):
> {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log}
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.FollowerState: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was 
> interrupted: java.lang.InterruptedException: sleep interrupted
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from  
> FOLLOWER to FOLLOWER at term 1464 for 
> recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState
> {noformat}
> The goal of this change is to configure a factor of THREE, to allow the 
> acceptance tests to catch such issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1228) Chunk Scanner Checkpoints

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1228:

Labels:   (was: pull-request-available)

> Chunk Scanner Checkpoints
> -
>
> Key: HDDS-1228
> URL: https://issues.apache.org/jira/browse/HDDS-1228
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Supratim Deka
>Assignee: Attila Doroszlai
>Priority: Critical
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Checkpoint the progress of the chunk verification scanner.
> Save the checkpoint persistently so the scanner can resume from the checkpoint 
> after a datanode restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1228) Chunk Scanner Checkpoints

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1228:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1 I've committed this. Thanks for the contribution [~adoroszlai].

> Chunk Scanner Checkpoints
> -
>
> Key: HDDS-1228
> URL: https://issues.apache.org/jira/browse/HDDS-1228
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Supratim Deka
>Assignee: Attila Doroszlai
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Checkpoint the progress of the chunk verification scanner.
> Save the checkpoint persistently so the scanner can resume from the checkpoint 
> after a datanode restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2351) Fix write performance issue in Non-HA OM

2019-10-23 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958226#comment-16958226
 ] 

Arpit Agarwal commented on HDDS-2351:
-

Thanks for reporting this [~rajesh.balamohan]. Can you please share your configs?

[~bharat] did some testing and found only a 40% degradation with the sync flag 
enabled, because we batch the writes.

Bharat - can you please share your results and config files?

> Fix write performance issue in Non-HA OM 
> -
>
> Key: HDDS-2351
> URL: https://issues.apache.org/jira/browse/HDDS-2351
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2019-10-23 at 2.27.05 PM.png
>
>
> HDDS-2333 enables the sync option in OM non-HA mode. However, this flushes very 
> frequently, quickly saturating the disk's IOPS: the workload consists of many 
> very small writes, so the disk hits its IOPS limit long before its bandwidth limit.
> To put it in perspective, a simple write benchmark creating keys with 10 
> clients generates only a {{0.33 MB/s}} write workload at {{116 IOPS}}. This 
> saturates the disk at 98%.
>  
> !Screenshot 2019-10-23 at 2.27.05 PM.png|width=621,height=370!
>  
> Reverting HDDS-2333 fixes this issue; I see a >{{10x}} degradation with 
> HDDS-2333 in place.
> If non-HA OM is a supported deployment mode, it would be good to call this 
> out, since the code explicitly enables the sync option. 
> [https://github.com/apache/hadoop-ozone/commit/c6c9794fc590371ad9c3b8fdcd7a36ed42909b40#diff-3ed3ab4891d7b4fa31ca96740b78ae5bR261]
>  
> I used  {{commit 1baa5a158d13f469c12bef86ef288d60ef0eee85}} in master branch.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2025) Update the Dockerfile of the official apache/ozone image and use latest 0.4.1 release

2019-10-21 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956627#comment-16956627
 ] 

Arpit Agarwal commented on HDDS-2025:
-

This says Patch Available, however there is no patch attached. May I cancel the 
patch status to remove it from the review queue?

> Update the Dockerfile of the official apache/ozone image and use latest 0.4.1 
> release
> -
>
> Key: HDDS-2025
> URL: https://issues.apache.org/jira/browse/HDDS-2025
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>
> The hadoop-docker-ozone repository contains the definition of the 
> apache/ozone image. 
> https://github.com/apache/hadoop-docker-ozone/tree/ozone-latest
> It creates a docker packaging for the voted and released artifact, therefore 
> it can be released after the final vote.
> Since the latest release we have made some modifications to our Dockerfiles. We 
> need to apply these changes to the official image as well. Specifically:
>  1. Use ozone-runner as the base image instead of hadoop-runner.
>  2. Rename the ozoneManager service to om, as we did everywhere else.
>  3. Adjust the starter script location (the script has moved into the released tar file).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2330) Random key generator can get stuck

2019-10-21 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2330:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 Merged via GitHub. Thanks for the contribution [~adoroszlai].

> Random key generator can get stuck
> --
>
> Key: HDDS-2330
> URL: https://issues.apache.org/jira/browse/HDDS-2330
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Freon's random key generator can get stuck waiting for completion (without 
> any hint as to what's happening) if object creation encounters any 
> non-IOException.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: after a few hundred keys progress stops.
> {noformat}
> $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 
> --numOfBuckets 1 --replicationType RATIS --factor ONE --keySize $(echo '2^20' 
> | bc -lq) --numOfKeys $(echo '5 * 2^10' | bc -lq) --bufferSize $(echo '2^16' 
> | bc -lq)
> 2019-10-18 10:44:45,224 INFO impl.MetricsConfig: Loaded properties from 
> hadoop-metrics2.properties
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: Scheduled Metric 
> snapshot period at 10 second(s).
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: ozone-freon metrics 
> system started
> 2019-10-18 10:44:47,140 [main] INFO   - Number of Threads: 1
> 2019-10-18 10:44:47,145 [main] INFO   - Number of Volumes: 1.
> 2019-10-18 10:44:47,146 [main] INFO   - Number of Buckets per Volume: 1.
> 2019-10-18 10:44:47,146 [main] INFO   - Number of Keys per Bucket: 5120.
> 2019-10-18 10:44:47,147 [main] INFO   - Key size: 1048576 bytes
> 2019-10-18 10:44:47,147 [main] INFO   - Buffer size: 65536 bytes
> 2019-10-18 10:44:47,147 [main] INFO   - validateWrites : false
> 2019-10-18 10:44:47,151 [main] INFO   - Starting progress bar Thread.
> ...
>  7.07% |  
>|  362/5120 
> {noformat}
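A stripped-down illustration of the hang, not Freon's actual code: if the worker releases the completion latch only on the expected exception type, any unchecked exception leaves the main thread waiting forever; counting down in a {{finally}} block (or catching {{Throwable}}) avoids it.

{code:java}
import java.util.concurrent.CountDownLatch;

public class StuckGeneratorSketch {
  public static void main(String[] args) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        createKey(); // may throw any Throwable, not just IOException
      } finally {
        done.countDown(); // release the waiter no matter what failed
      }
    });
    worker.start();
    // Without the finally block, an unchecked exception in createKey()
    // would leave this await() blocked with no hint of what happened.
    done.await();
    System.out.println("progress can now be reported");
  }

  static void createKey() {
    throw new IllegalStateException("simulated non-IOException failure");
  }
}
{code}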



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2343) Add immutable entries in to the DoubleBuffer for Bucket requests.

2019-10-21 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2343:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 committed via GitHub. Thanks for the contribution [~bharat].

> Add immutable entries in to the DoubleBuffer for Bucket requests.
> -
>
> Key: HDDS-2343
> URL: https://issues.apache.org/jira/browse/HDDS-2343
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> OMBucketCreateRequest.java L181:
> {code:java}
> omClientResponse =
>     new OMBucketCreateResponse(omBucketInfo, omResponse.build());
> {code}
> We add this response to the double buffer. The double-buffer flush thread, 
> which runs in the background, later picks it up, converts it to protobuf and 
> then to a byte array, and writes it to the RocksDB tables. This conversion is 
> done without acquiring any lock, so if another request concurrently changes 
> the internal structure of OMBucketInfo (such as its ACL list), we might get a 
> ConcurrentModificationException.
>  
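A toy illustration of the fix, with a hypothetical {{BucketInfo}} class standing in for OmBucketInfo: enqueue a copy so the background flush thread serializes a snapshot that later ACL mutations cannot touch.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ImmutableBufferEntrySketch {
  /** Hypothetical stand-in for OmBucketInfo with a mutable ACL list. */
  static final class BucketInfo {
    final String name;
    final List<String> acls;

    BucketInfo(String name, List<String> acls) {
      this.name = name;
      this.acls = acls;
    }

    /** Copy the mutable parts so the flush thread serializes a stable snapshot. */
    BucketInfo copyObject() {
      return new BucketInfo(name, new ArrayList<>(acls));
    }
  }

  public static void main(String[] args) {
    List<String> acls = new ArrayList<>();
    acls.add("user:alice:rw");
    BucketInfo live = new BucketInfo("bucket1", acls);

    // Enqueue an immutable snapshot instead of the shared live object.
    BucketInfo snapshot = live.copyObject();

    live.acls.add("user:bob:r"); // concurrent ACL change elsewhere
    System.out.println(snapshot.acls); // still [user:alice:rw]
  }
}
{code}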



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2333) Enable sync option for OM non-HA

2019-10-21 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDDS-2333.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

Merged this with [~aengineer]'s +1 on the PR. Thanks for the review Anu and 
thanks Bharat for the contribution.

> Enable sync option for OM non-HA 
> -
>
> Key: HDDS-2333
> URL: https://issues.apache.org/jira/browse/HDDS-2333
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In OM non-HA, when the double buffer flushes, it should commit with sync 
> turned on. Otherwise, on a power failure or system crash, operations already 
> acknowledged by the OM may be lost: with sync set to false, the RocksDB flush 
> is asynchronous and the data may not yet be persisted to the storage system.
>  
> In HA this is not a problem, because the durability guarantee is provided by 
> Ratis and its logs.
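A minimal sketch of what the sync commit looks like with the RocksDB Java API (the database path and keys here are made up): {{WriteOptions#setSync(true)}} forces an fsync of the WAL before the batched write is acknowledged.

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class SyncWriteSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options opts = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(opts, "/tmp/om-sketch-db");
         WriteBatch batch = new WriteBatch();
         // sync=true: the WAL is fsynced before the write returns, so
         // acknowledged operations survive a power failure or crash.
         WriteOptions writeOpts = new WriteOptions().setSync(true)) {
      batch.put("volume1".getBytes(), "metadata".getBytes());
      db.write(writeOpts, batch);
    }
  }
}
{code}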



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2324) Enhance locking mechanism in OzoneMangaer

2019-10-18 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954808#comment-16954808
 ] 

Arpit Agarwal commented on HDDS-2324:
-

Have we tried switching the fairness setting to unfair? Java RW locks perform 
best in unfair mode.

> Enhance locking mechanism in OzoneMangaer
> -
>
> Key: HDDS-2324
> URL: https://issues.apache.org/jira/browse/HDDS-2324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>  Labels: performance
> Attachments: om_lock_100_percent_read_benchmark.svg, 
> om_lock_reader_and_writer_workload.svg
>
>
> OM has a reentrant RW lock. With a 100% read or 100% write benchmark it works 
> out reasonably fine. (There is already a ticket to optimize the write codepath, 
> which incurs a DB read for key checks.)
> However, when a small write workload (e.g. 3-5 threads) is added to a running 
> read benchmark, throughput suffers significantly, because the reader threads 
> get blocked often. I have observed around 10x lower throughput: a 100% read 
> benchmark runs at 12,000 TPS, and with a couple of writer threads added it 
> drops to 1,200-1,800 TPS.
> 1. Instead of a single write lock, one option is to scale out the write locks 
> based on the number of cores available in the system and acquire the relevant 
> lock by hashing the key (see the sketch after this description).
> 2. Another option is to explore StampedLock from JDK 8.x, which scales well 
> with multiple readers and writers. It is not a reentrant lock, though, so we 
> need to explore whether it can be used here.
>  
>  
>  
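A minimal sketch of option 1, assuming plain {{ReentrantReadWriteLock}} stripes chosen by key hash are acceptable; this is illustrative, not the actual OM lock manager.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

public class StripedLockSketch {
  private final ReentrantReadWriteLock[] stripes;

  StripedLockSketch(int nStripes) {
    stripes = new ReentrantReadWriteLock[nStripes];
    for (int i = 0; i < nStripes; i++) {
      // unfair mode (the default) generally gives higher throughput than fair mode
      stripes[i] = new ReentrantReadWriteLock(false);
    }
  }

  private ReentrantReadWriteLock stripeFor(String key) {
    // Spread keys across stripes so a writer on one key does not block
    // readers of unrelated keys.
    return stripes[Math.floorMod(key.hashCode(), stripes.length)];
  }

  <T> T read(String key, Supplier<T> action) {
    ReentrantReadWriteLock lock = stripeFor(key);
    lock.readLock().lock();
    try {
      return action.get();
    } finally {
      lock.readLock().unlock();
    }
  }

  void write(String key, Runnable action) {
    ReentrantReadWriteLock lock = stripeFor(key);
    lock.writeLock().lock();
    try {
      action.run();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    StripedLockSketch locks =
        new StripedLockSketch(Runtime.getRuntime().availableProcessors());
    locks.write("/vol1/bucket1/key1", () -> System.out.println("write under stripe lock"));
    System.out.println(locks.read("/vol1/bucket1/key1", () -> "read under stripe lock"));
  }
}
{code}

With this layout, readers and writers contend only when their keys hash to the same stripe, instead of all writers serializing behind one global write lock.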



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2240) Command line tool for OM Admin

2019-10-14 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2240:

Status: Patch Available  (was: Open)

> Command line tool for OM Admin
> --
>
> Key: HDDS-2240
> URL: https://issues.apache.org/jira/browse/HDDS-2240
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> A command line tool (*ozone omha*) to get information related to OM HA. 
> This Jira proposes to add the _getServiceState_ option for OM HA which lists 
> all the OMs in the service and their corresponding Ratis server roles 
> (LEADER/ FOLLOWER). 
> We can later add more options to this tool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1985) Fix listVolumes API

2019-10-14 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951311#comment-16951311
 ] 

Arpit Agarwal commented on HDDS-1985:
-

I believe this was going to be resolved as "Won't Fix". [~bharat] can you 
confirm and resolve?

> Fix listVolumes API
> ---
>
> Key: HDDS-1985
> URL: https://issues.apache.org/jira/browse/HDDS-1985
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> This Jira is to fix the listVolumes API in the HA code path.
> In HA we have an in-memory cache: we put the result into the cache and return 
> the response, and the double-buffer thread later picks it up and flushes it to 
> disk. So listVolumes should consult both the in-memory cache and the RocksDB 
> volume table to list the volumes for a user (a minimal sketch follows).
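A toy sketch of the overlay semantics described above, under two assumptions made up for illustration: cache entries override table rows, and a null cache value marks an un-flushed delete.

{code:java}
import java.util.Map;
import java.util.TreeMap;

public class ListWithCacheSketch {
  public static void main(String[] args) {
    // Flushed state on disk (stand-in for the RocksDB volume table).
    Map<String, String> dbTable = new TreeMap<>();
    dbTable.put("/user1/vol1", "...");
    dbTable.put("/user1/vol2", "...");

    // Un-flushed state in the cache; a null value marks a pending delete.
    Map<String, String> cache = new TreeMap<>();
    cache.put("/user1/vol3", "...");   // created, not yet flushed
    cache.put("/user1/vol2", null);    // deleted, not yet flushed

    // Overlay the cache on the table: cache entries win, null hides a row.
    Map<String, String> merged = new TreeMap<>(dbTable);
    for (Map.Entry<String, String> e : cache.entrySet()) {
      if (e.getValue() == null) {
        merged.remove(e.getKey());
      } else {
        merged.put(e.getKey(), e.getValue());
      }
    }
    System.out.println(merged.keySet()); // [/user1/vol1, /user1/vol3]
  }
}
{code}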



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2303) [IGNORE] Test Jira

2019-10-14 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951153#comment-16951153
 ] 

Arpit Agarwal edited comment on HDDS-2303 at 10/14/19 4:57 PM:
---

Test comment (edited).


was (Author: arpitagarwal):
Test comment.

> [IGNORE] Test Jira
> --
>
> Key: HDDS-2303
> URL: https://issues.apache.org/jira/browse/HDDS-2303
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Priority: Major
>
> Ignore this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2303) [IGNORE] Test Jira

2019-10-14 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951153#comment-16951153
 ] 

Arpit Agarwal commented on HDDS-2303:
-

Test comment.

> [IGNORE] Test Jira
> --
>
> Key: HDDS-2303
> URL: https://issues.apache.org/jira/browse/HDDS-2303
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Priority: Major
>
> Ignore this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2303) [IGNORE] Test Jira

2019-10-14 Thread Arpit Agarwal (Jira)
Arpit Agarwal created HDDS-2303:
---

 Summary: [IGNORE] Test Jira
 Key: HDDS-2303
 URL: https://issues.apache.org/jira/browse/HDDS-2303
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Arpit Agarwal


Ignore this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2213) Reduce key provider loading log level in OzoneFileSystem#getAdditionalTokenIssuers

2019-10-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDDS-2213.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

I've merged this.

> Reduce key provider loading log level in 
> OzoneFileSystem#getAdditionalTokenIssuers
> --
>
> Key: HDDS-2213
> URL: https://issues.apache.org/jira/browse/HDDS-2213
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Shweta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> OzoneFileSystem#getAdditionalTokenIssuers logs an error when a secure client 
> tries to collect an Ozone delegation token to run MR/Spark jobs but the Ozone 
> file system does not have a KMS provider configured. In this case we simply 
> return a null provider in the code below. This is a benign error, so we should 
> reduce the log level to debug.
> {code:java}
> KeyProvider keyProvider;
> try {
>   keyProvider = getKeyProvider();
> } catch (IOException ioe) {
>   LOG.error("Error retrieving KeyProvider.", ioe);
>   return null;
> }
> {code}
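Roughly, the proposed change is just the log level. A self-contained sketch, assuming an slf4j-style logger and stubbing out {{getKeyProvider}}:

{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TokenIssuerLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(TokenIssuerLoggingSketch.class);

  /** Stand-in for OzoneFileSystem#getKeyProvider. */
  static Object getKeyProvider() throws IOException {
    throw new IOException("KMS provider not configured");
  }

  static Object additionalTokenIssuer() {
    Object keyProvider;
    try {
      keyProvider = getKeyProvider();
    } catch (IOException ioe) {
      // Benign when no KMS is configured: log at debug instead of error.
      LOG.debug("Error retrieving KeyProvider.", ioe);
      return null;
    }
    return keyProvider;
  }

  public static void main(String[] args) {
    System.out.println(additionalTokenIssuer()); // null, with a debug-level log
  }
}
{code}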



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-14305:
-
Fix Version/s: (was: 3.2.2)
   (was: 3.1.4)
   (was: 3.3.0)
   (was: 2.10.0)

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.
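The overlap in the example above can be reproduced directly from the formula. A small self-contained demo, using 100 in place of {{Integer.MAX_VALUE}} as in the description:

{code:java}
public class SerialRangeOverlapDemo {
  public static void main(String[] args) {
    final int max = 100;               // stand-in for Integer.MAX_VALUE
    final int numNNs = 2;
    final int intRange = max / numNNs; // 50

    for (int nnIndex = 0; nnIndex < numNNs; nnIndex++) {
      int nnRangeStart = intRange * nnIndex;
      // Java's % keeps the sign of the dividend, so a random (possibly
      // negative) initial serialNo maps to [-(intRange-1), intRange-1],
      // not [0, intRange-1].
      int lo = -(intRange - 1) + nnRangeStart;
      int hi = (intRange - 1) + nnRangeStart;
      System.out.printf("nn%d -> [%d, %d]%n", nnIndex + 1, lo, hi);
    }
    // Prints nn1 -> [-49, 49] and nn2 -> [1, 99]: the ranges overlap on [1, 49].
  }
}
{code}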



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-14305:
-
Target Version/s: 2.10.0, 3.3.0, 3.1.4, 3.2.2  (was: 2.10.0)

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDFS-14305:
--

Reopening because this still needs to be fixed correctly.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2194) Replication of Container fails with "Only closed containers could be exported"

2019-10-08 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947358#comment-16947358
 ] 

Arpit Agarwal commented on HDDS-2194:
-

It would be useful to have the log message include the actual container state 
on error.

> Replication of Container fails with "Only closed containers could be exported"
> --
>
> Key: HDDS-2194
> URL: https://issues.apache.org/jira/browse/HDDS-2194
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Bharat Viswanadham
>Priority: Major
>
> Replication of Container fails with "Only closed containers could be exported"
> cc: [~nanda]
> {code}
> 2019-09-26 15:00:17,640 [grpc-default-executor-13] INFO  replication.GrpcReplicationService (GrpcReplicationService.java:download(57)) - Streaming container data (37) to other datanode
> Sep 26, 2019 3:00:17 PM org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
> SEVERE: Exception while executing runnable org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@70e641f2
> java.lang.IllegalStateException: Only closed containers could be exported: ContainerId=37
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:527)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
> at org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
> at org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
> at org.apache.hadoop.hdds.protocol.datanode.proto.IntraDatanodeProtocolServiceGrpc$MethodHandlers.invoke(IntraDatanodeProtocolServiceGrpc.java:217)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:710)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 
> 2019-09-26 15:00:17,644 [grpc-default-executor-17] ERROR replication.GrpcReplicationClient (GrpcReplicationClient.java:onError(142)) - Container download was unsuccessfull
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNKNOWN
> at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
> at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
> at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
> at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
> at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
> at 
> 

[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-07 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946210#comment-16946210
 ] 

Arpit Agarwal commented on HDFS-14305:
--

Incompatibility is not worse than an obviously broken implementation. Also, Erik 
explained the mitigation for the incompatibility above.

This patch was committed over my valid technical objection. I hope you will 
respect that, as we have respected your objections in the past.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944871#comment-16944871
 ] 

Arpit Agarwal commented on HDFS-14305:
--

I think the right fix would be for NameNodes to push their range assignments 
into the edit log, so other NameNodes are aware of it and do not pick a 
conflicting range. Konstantin, this should also solve the hard-coded limit of 
64 that you objected to.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944863#comment-16944863
 ] 

Arpit Agarwal edited comment on HDFS-14305 at 10/4/19 10:05 PM:


How do we guarantee that the ranges will not have an overlap across NameNodes? 
This is arguably worse than what we had before the original patch was reverted.

I am -1 on this new change and would like to see this reverted.


was (Author: arpitagarwal):
How do we guarantee that the ranges will not have an overlap across NameNodes? 
This is arguably worse than what we had before.

I am -1 on this change and would like to see this reverted.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-04 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944863#comment-16944863
 ] 

Arpit Agarwal commented on HDFS-14305:
--

How do we guarantee that the ranges will not have an overlap across NameNodes? 
This is arguably worse than what we had before.

I am -1 on this change and would like to see this reverted.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn, release-blocker
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNode could have overlapping ranges 
> for serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2211) Collect docker logs if env fails to start

2019-10-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2211:

   Fix Version/s: 0.5.0
Target Version/s:   (was: 0.5.0)
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Committed this. Thanks for the contribution, [~adoroszlai].

> Collect docker logs if env fails to start
> -
>
> Key: HDDS-2211
> URL: https://issues.apache.org/jira/browse/HDDS-2211
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Occasionally some acceptance test docker environment fails to start up 
> properly.  We need docker logs for analysis, but they are not being collected.
> https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-extra-20190930-74rp4/acceptance/output.log#L3765-L3768



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14890) HDFS is not starting in Windows

2019-10-03 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943817#comment-16943817
 ] 

Arpit Agarwal commented on HDFS-14890:
--

Thanks for the heads up, [~inigoiri]. While we are not actively maintaining 
Hadoop on Windows, we should certainly look into this if it broke recently.

> HDFS is not starting in Windows
> ---
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Priority: Blocker
>
> Hi,
> HDFS NameNode and JournalNode are not starting on a Windows machine. Found the 
> related exception below in the logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  
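For what it's worth, a hedged sketch of the kind of guard that would avoid this failure mode: check whether the default file system supports the POSIX attribute view before calling {{setPosixFilePermissions}}. This is illustrative only, not the actual fix in {{Storage.java}}:

{code:java}
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class PermissionUtil {
  // Applies POSIX permissions only where the file system supports them;
  // on Windows (no "posix" view) this becomes a no-op instead of throwing
  // UnsupportedOperationException.
  public static void setPermissionsIfSupported(Path path,
      Set<PosixFilePermission> perms) throws IOException {
    if (FileSystems.getDefault().supportedFileAttributeViews()
        .contains("posix")) {
      Files.setPosixFilePermissions(path, perms);
    }
  }
}
{code}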



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1369) Containers should be processed by Container Scanner right after close.

2019-10-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-1369:
---

Assignee: Attila Doroszlai  (was: Hrishikesh Gadre)

> Containers should be processed by Container Scanner right after close.
> --
>
> Key: HDDS-1369
> URL: https://issues.apache.org/jira/browse/HDDS-1369
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Attila Doroszlai
>Priority: Major
>
> Containers which have been closed by the datanode should be processed by the 
> container scanner immediately. The goal is to identify any potential problem 
> with container closing or container metadata right away. 
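A hedged sketch of the hand-off this implies; the names ({{onContainerClosed}}, {{pollNextContainerToScan}}) are illustrative, not actual Ozone datanode classes:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Close-triggered scan queue: containers closed by the datanode jump ahead
// of the scanner's regular sweep.
public class CloseTriggeredScanQueue {
  private final BlockingQueue<Long> scanQueue = new LinkedBlockingQueue<>();

  // Called from the container close path.
  public void onContainerClosed(long containerId) {
    scanQueue.offer(containerId);
  }

  // Scanner thread: blocks until a freshly closed container is available.
  public long pollNextContainerToScan() throws InterruptedException {
    return scanQueue.take();
  }
}
{code}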



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1228) Chunk Scanner Checkpoints

2019-10-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1228:

Fix Version/s: 0.5.0

> Chunk Scanner Checkpoints
> -
>
> Key: HDDS-1228
> URL: https://issues.apache.org/jira/browse/HDDS-1228
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Supratim Deka
>Assignee: Attila Doroszlai
>Priority: Critical
> Fix For: 0.5.0
>
>
> Checkpoint the progress of the chunk verification scanner.
> Save the checkpoint persistently so the scanner can resume from it after a 
> datanode restart.
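A minimal sketch of one way to persist such a checkpoint, assuming a simple one-value file layout (the actual on-disk format is up to the implementation):

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ScannerCheckpoint {
  // Atomically persist the id of the last fully-scanned container.
  public static void save(Path checkpointFile, long lastScannedContainerId)
      throws IOException {
    Path tmp = checkpointFile.resolveSibling(
        checkpointFile.getFileName() + ".tmp");
    Files.write(tmp, Long.toString(lastScannedContainerId)
        .getBytes(StandardCharsets.UTF_8));
    // Rename atomically so a crash mid-write never leaves a torn checkpoint.
    Files.move(tmp, checkpointFile, StandardCopyOption.ATOMIC_MOVE,
        StandardCopyOption.REPLACE_EXISTING);
  }

  // Returns the saved id, or -1 if there is no checkpoint yet.
  public static long load(Path checkpointFile) throws IOException {
    if (!Files.exists(checkpointFile)) {
      return -1;
    }
    return Long.parseLong(new String(Files.readAllBytes(checkpointFile),
        StandardCharsets.UTF_8).trim());
  }
}
{code}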



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1812) Du while calculating used disk space reports that chunk files are file not found

2019-10-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1812:

Priority: Critical  (was: Major)

> Du while calculating used disk space reports that chunk files are file not 
> found
> 
>
> Key: HDDS-1812
> URL: https://issues.apache.org/jira/browse/HDDS-1812
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Attila Doroszlai
>Priority: Critical
>
> {code}
> 2019-07-16 08:16:49,787 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could 
> not get disk usage information for path /data/3/ozone-0715
> ExitCodeException exitCode=1: du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/b113dd390e68e914d3ff405f3deec564_stream_60448f
> 77-6349-48fa-ae86-b2d311730569_chunk_1.tmp.1.14118085': No such file or 
> directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/37993af2849bdd0320d0f9d4a6ef4b92_stream_1f68be9f-e083-45e5-84a9-08809bc392ed
> _chunk_1.tmp.1.14118091': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a38677def61389ec0be9105b1b4fddff_stream_9c3c3741-f710-4482-8423-7ac6695be96b
> _chunk_1.tmp.1.14118102': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a689c89f71a75547471baf6182f3be01_stream_baf0f21d-2fb0-4cd8-84b0-eff1723019a0
> _chunk_1.tmp.1.14118105': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/f58cf0fa5cb9360058ae25e8bc983e84_stream_d8d5ea61-995f-4ff5-88fb-4a9e97932f00
> _chunk_1.tmp.1.14118109': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a1d13ee6bbefd1f8156b1bd8db0d1b67_stream_db214bdd-a0c0-4f4a-8bc7-a3817e047e45_chunk_1.tmp.1.14118115':
>  No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/8f8a4bd3f6c31161a70f82cb5ab8ee60_stream_d532d657-3d87-4332-baf8-effad9b3db23_chunk_1.tmp.1.14118127':
>  No such file or directory
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
> at org.apache.hadoop.util.Shell.run(Shell.java:901)
> at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:62)
> at org.apache.hadoop.fs.DU.refresh(DU.java:53)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:181)
> at java.lang.Thread.run(Thread.java:748)
> {code}
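The race here is that {{du}} walks the tree while the datanode deletes temporary chunk files underneath it. A hedged sketch of a vanish-tolerant usage walk (illustrative logic, not the actual {{CachingGetSpaceUsed}} change):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

public class VanishTolerantDu {
  // Sums file sizes under root, ignoring files (e.g. .tmp chunks) that are
  // deleted between listing and stat. Directory-level races would need the
  // same treatment.
  public static long getUsed(Path root) throws IOException {
    try (Stream<Path> files = Files.walk(root)) {
      return files.filter(Files::isRegularFile)
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (NoSuchFileException e) {
              return 0L; // vanished mid-scan; skip it
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          })
          .sum();
    }
  }
}
{code}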



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1812) Du while calculating used disk space reports that chunk files are file not found

2019-10-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-1812:
---

Assignee: Attila Doroszlai

> Du while calculating used disk space reports that chunk files are file not 
> found
> 
>
> Key: HDDS-1812
> URL: https://issues.apache.org/jira/browse/HDDS-1812
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Attila Doroszlai
>Priority: Major
>
> {code}
> 2019-07-16 08:16:49,787 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could 
> not get disk usage information for path /data/3/ozone-0715
> ExitCodeException exitCode=1: du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/b113dd390e68e914d3ff405f3deec564_stream_60448f
> 77-6349-48fa-ae86-b2d311730569_chunk_1.tmp.1.14118085': No such file or 
> directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/37993af2849bdd0320d0f9d4a6ef4b92_stream_1f68be9f-e083-45e5-84a9-08809bc392ed
> _chunk_1.tmp.1.14118091': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a38677def61389ec0be9105b1b4fddff_stream_9c3c3741-f710-4482-8423-7ac6695be96b
> _chunk_1.tmp.1.14118102': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a689c89f71a75547471baf6182f3be01_stream_baf0f21d-2fb0-4cd8-84b0-eff1723019a0
> _chunk_1.tmp.1.14118105': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/f58cf0fa5cb9360058ae25e8bc983e84_stream_d8d5ea61-995f-4ff5-88fb-4a9e97932f00
> _chunk_1.tmp.1.14118109': No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/a1d13ee6bbefd1f8156b1bd8db0d1b67_stream_db214bdd-a0c0-4f4a-8bc7-a3817e047e45_chunk_1.tmp.1.14118115':
>  No such file or directory
> du: cannot access 
> '/data/3/ozone-0715/hdds/1b467d25-46cd-4de0-a4a1-e9405bde23ff/current/containerDir3/1724/chunks/8f8a4bd3f6c31161a70f82cb5ab8ee60_stream_d532d657-3d87-4332-baf8-effad9b3db23_chunk_1.tmp.1.14118127':
>  No such file or directory
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
> at org.apache.hadoop.util.Shell.run(Shell.java:901)
> at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:62)
> at org.apache.hadoop.fs.DU.refresh(DU.java:53)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:181)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-10-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942183#comment-16942183
 ] 

Arpit Agarwal edited comment on HDDS-2175 at 10/1/19 5:36 PM:
--

Thank you for the link to the paper. It looks like a great weekend read.

This quote from chapter 1 stands out :)
bq. While it is widely accepted that exception handling has a number of 
problems, it is the best we currently have available[38, 72].


was (Author: arpitagarwal):
Thank you for the link to the paper. It looks like a great weekend read.

This quote from chapter 1 stands out:
bq. While it is widely accepted that exception handling has a number of 
problems, it is the best we currently have available[38, 72].

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track and propagate exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   
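For sub-task 1, a hedged sketch of the ServiceException round trip described above; the method names are illustrative, though {{ServiceException}} and {{ProtobufHelper.getRemoteException}} are the real Hadoop IPC pieces:

{code:java}
import java.io.IOException;
import com.google.protobuf.ServiceException;
import org.apache.hadoop.ipc.ProtobufHelper;

public class OmExceptionPropagationSketch {
  // Server side: wrapping the system exception lets Hadoop IPC carry the
  // full stack trace back to the caller.
  static void failRequest(IOException systemException) throws ServiceException {
    throw new ServiceException(systemException);
  }

  // Client side: unwrap the remote exception, stack trace included.
  static IOException unwrap(ServiceException se) {
    return ProtobufHelper.getRemoteException(se);
  }
}
{code}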



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-10-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942183#comment-16942183
 ] 

Arpit Agarwal commented on HDDS-2175:
-

Thank you for the link to the paper. It looks like a great weekend read.

This quote from chapter 1 stands out:
bq. While it is widely accepted that exception handling has a number of 
problems, it is the best we currently have available[38, 72].

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track and propagate exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-10-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942165#comment-16942165
 ] 

Arpit Agarwal commented on HDDS-2175:
-

C++ exceptions are [widely considered 
broken|http://yosefk.com/c++fqa/defective.html#defect-10] so we can't directly 
compare C++ best practices with Java. Golang not having exceptions is a step 
backwards for debuggability. Perhaps it works well for Google; for mere mortals 
like me, exceptions are a boon. :) They are especially valuable in this phase of 
Ozone, where we are stabilizing it. 

bq. But as I said; I think the disagreement is a question of taste; so I do not 
want perfect to be the enemy of good
Thanks for giving the option to go ahead. One thing we can do is make this 
behavior configurable. In the future we can turn it off entirely if it turns 
out not to be useful.

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track and propagate exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1720) Add ability to configure RocksDB logs for Ozone Manager

2019-10-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1720:

Status: Patch Available  (was: Open)

> Add ability to configure RocksDB logs for Ozone Manager
> ---
>
> Key: HDDS-1720
> URL: https://issues.apache.org/jira/browse/HDDS-1720
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While doing performance testing, it was seen that there was no way to get 
> RocksDB logs for Ozone Manager. Along with RocksDB metrics, this may be a 
> useful mechanism for understanding the health of RocksDB while investigating 
> large clusters. 
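For context, a hedged sketch of the logging knobs RocksJava exposes; the wiring into OM configuration keys is hypothetical:

{code:java}
import org.rocksdb.InfoLogLevel;
import org.rocksdb.Options;

public class RocksDbLogOptionsSketch {
  // Returns the options with the RocksDB info log enabled and bounded.
  static Options withLogging(Options options, String logDir) {
    return options
        .setDbLogDir(logDir)                      // where LOG files go
        .setInfoLogLevel(InfoLogLevel.INFO_LEVEL) // info-log verbosity
        .setMaxLogFileSize(64L * 1024 * 1024)     // roll at 64 MB
        .setKeepLogFileNum(10);                   // keep 10 rolled files
  }
}
{code}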



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1984) Fix listBucket API

2019-10-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1984:

Status: Patch Available  (was: Open)

> Fix listBucket API
> --
>
> Key: HDDS-1984
> URL: https://issues.apache.org/jira/browse/HDDS-1984
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listBucket API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the cache and return 
> the response; later, the double-buffer thread picks it up and flushes it to 
> disk. So now, when listBuckets is called, it should use both the in-memory 
> cache and the RocksDB bucket table to list the buckets in a volume.
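A hedged sketch of that merge; {{cache}} and {{bucketTable}} stand in for the OM's table-cache and RocksDB table APIs, which differ in detail:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ListBucketsSketch {
  // Un-flushed writes waiting on the double-buffer flush (illustrative).
  private final ConcurrentSkipListMap<String, String> cache =
      new ConcurrentSkipListMap<>();
  // Stand-in for the RocksDB bucket table (illustrative).
  private final TreeMap<String, String> bucketTable = new TreeMap<>();

  // Lists bucket keys under a volume prefix from both sources; cache
  // entries override anything stale read from the table.
  public List<String> listBuckets(String volumePrefix) {
    String end = volumePrefix + Character.MAX_VALUE;
    TreeMap<String, String> merged = new TreeMap<>();
    merged.putAll(bucketTable.subMap(volumePrefix, end));
    merged.putAll(cache.subMap(volumePrefix, end));
    return new ArrayList<>(merged.keySet());
  }
}
{code}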



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-30 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941456#comment-16941456
 ] 

Arpit Agarwal commented on HDDS-2175:
-

bq. it is hard to parse these exceptions even when they are part of normal log 
files.
And yet these exceptions are a godsend. I would rather see one exception than 
10 obscure log messages since it tells me exactly when something 'exceptional' 
happened and the code path leading to the occurrence.

bq. If we add exceptions to those strings, the human readability of those error 
messages goes down.
The readability goes up. You now get a sense of what actually went wrong 
instead of some generic message. 

bq. I had a chat with Supratim Deka and I said that I am all for increasing the 
fidelity of the error codes, that is we can add more error codes if we want to 
fine tune these messages. 
A lot more work with inferior results. Error codes are terrible in layered 
systems [since multiple layers will often wind up translating 
codes|https://twitter.com/Obdurodon/status/1161700056740876289]. The only way 
to maintain full fidelity is to add a new error code for every single failure 
path, an impossible task. Instead, just present the original exception as it 
happened. This is friendlier for your end users and painless for developers.

bq. I prefer a clear, simple contract between the server and client, I think it 
makes it easier for future clients to be developed more easily.
Exceptions as added here will make development of future clients super easy. 
Since the exception is stringified and propagated over the wire, all the client 
has to do is print the string without any interpretation. The fears seem 
unfounded to me.

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track and propagate exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-28 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939880#comment-16939880
 ] 

Arpit Agarwal commented on HDDS-2175:
-

I feel that call stacks are invaluable when included in the bug report to the 
developer.

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track and propagate exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1146) Adding container related metrics in SCM

2019-09-27 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-1146:
---

Assignee: Bharat Viswanadham  (was: Supratim Deka)

> Adding container related metrics in SCM
> ---
>
> Key: HDDS-1146
> URL: https://issues.apache.org/jira/browse/HDDS-1146
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1146.000.patch, HDDS-1146.001.patch, 
> HDDS-1146.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This jira aims to add more container-related metrics to SCM.
>  Following metrics will be added as part of this jira:
>  * Number of containers
>  * Number of open containers
>  * Number of closed containers
>  * Number of quasi closed containers
>  * Number of closing containers
> Above are already handled in HDDS-918.
>  * Number of successful create container calls
>  * Number of failed create container calls
>  * Number of successful delete container calls
>  * Number of failed delete container calls
> Handled in HDDS-2193.
>  * Number of successful container report processing
>  * Number of failed container report processing
>  * Number of successful incremental container report processing
>  * Number of failed incremental container report processing
> These will be handled in this jira.
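A hedged sketch of the report-processing counters in the usual Hadoop metrics2 style; the class and counter names are illustrative:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "SCM container report metrics", context = "dfs")
public class SCMContainerReportMetricsSketch {
  @Metric private MutableCounterLong numContainerReportsProcessedSuccessful;
  @Metric private MutableCounterLong numContainerReportsProcessedFailed;
  @Metric private MutableCounterLong numICRsProcessedSuccessful;
  @Metric private MutableCounterLong numICRsProcessedFailed;

  public static SCMContainerReportMetricsSketch create() {
    // The metrics system instantiates the @Metric fields on registration.
    return DefaultMetricsSystem.instance().register(
        "SCMContainerReportMetricsSketch",
        "Container report processing metrics",
        new SCMContainerReportMetricsSketch());
  }

  void incrSuccessfulContainerReports() {
    numContainerReportsProcessedSuccessful.incr();
  }
}
{code}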



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-26 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938950#comment-16938950
 ] 

Arpit Agarwal edited comment on HDFS-14305 at 9/26/19 8:37 PM:
---

I agree and I had the same question [back in 
Feb|https://issues.apache.org/jira/browse/HDFS-14305?focusedCommentId=16780743=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16780743].
 I was convinced by [Erik's 
response|https://issues.apache.org/jira/browse/HDFS-14305?focusedCommentId=16780746=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16780746].


was (Author: arpitagarwal):
I agree and I had the same question back in Feb. I was convinced by Erik's 
response.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, 
> HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, 
> HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> for the serial number. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number can be any integer, including a 
> negative one, and Java's {{%}} keeps the sign of the dividend.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


