[jira] [Created] (HDFS-15404) ShellCommandFencer should expose info about source

2020-06-09 Thread Chen Liang (Jira)
Chen Liang created HDFS-15404:
-

 Summary: ShellCommandFencer should expose info about source
 Key: HDFS-15404
 URL: https://issues.apache.org/jira/browse/HDFS-15404
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


Currently the HA fencing logic in ShellCommandFencer exposes environment 
variables about only the fencing target, i.e. the $target_* variables mentioned 
on this [document 
page|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html].

Sometimes it is also useful to expose info about the fencing source node. One 
use case is that this would allow the source and target nodes to identify 
themselves separately and run different commands/scripts.
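A minimal sketch of the idea, assuming the fencer builds the command's environment before running it (the helper and variable names here are illustrative, not the actual ShellCommandFencer code; only target_* exists today and source_* is what this issue proposes):

```java
import java.util.Map;

public class FencerEnvSketch {
    // Hypothetical helper: populate the fencing command's environment with
    // both target_* (existing) and source_* (proposed) variables.
    static void addFencingEnv(Map<String, String> env,
                              String targetHost, String sourceHost) {
        env.put("target_host", targetHost);   // exposed today
        env.put("source_host", sourceHost);   // proposed by this issue
    }

    public static void main(String[] args) {
        ProcessBuilder pb = new ProcessBuilder("true");
        addFencingEnv(pb.environment(), "nn2.example.com", "nn1.example.com");
        System.out.println(pb.environment().get("source_host"));
    }
}
```

With both sets of variables present, a single fencing script could branch on whether it is running on behalf of the source or the target.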



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15293) Relax FSImage upload time delta check restriction

2020-04-21 Thread Chen Liang (Jira)
Chen Liang created HDFS-15293:
-

 Summary: Relax FSImage upload time delta check restriction
 Key: HDFS-15293
 URL: https://issues.apache.org/jira/browse/HDFS-15293
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-12979 introduced logic whereby, if the ANN sees consecutive fsImage 
uploads from a Standby with a small delta compared to the previous fsImage, the 
ANN rejects the image. This is to avoid overly frequent fsImage uploads when 
there are multiple Standby nodes. However, this check can be too stringent.
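The check boils down to comparing the transaction-ID delta of a newly uploaded image against the last accepted one (a sketch; the method name and threshold are illustrative, not the actual HDFS-12979 code):

```java
public class ImageDeltaCheck {
    // Illustrative version of the delta check: reject an uploaded fsImage
    // whose txid advanced too little since the last accepted image.
    static boolean shouldReject(long lastImageTxId, long newImageTxId,
                                long minDeltaTxns) {
        return (newImageTxId - lastImageTxId) < minDeltaTxns;
    }

    public static void main(String[] args) {
        System.out.println(shouldReject(1_000, 1_050, 100));  // small delta: rejected
        System.out.println(shouldReject(1_000, 10_000, 100)); // large delta: accepted
    }
}
```

Relaxing the restriction would mean loosening this threshold, or accepting images in cases where the small delta is legitimate.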






[jira] [Created] (HDFS-15197) Change ObserverRetryOnActiveException log to debug

2020-02-27 Thread Chen Liang (Jira)
Chen Liang created HDFS-15197:
-

 Summary: Change ObserverRetryOnActiveException log to debug
 Key: HDFS-15197
 URL: https://issues.apache.org/jira/browse/HDFS-15197
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException 
happens, ObserverReadProxyProvider logs a message at INFO level. This can 
produce a large volume of logs in some scenarios. For example, when a job 
accesses lots of files that haven't been accessed for a long time, all these 
accesses may trigger atime updates, which lead to 
ObserverRetryOnActiveException. We should change this log to DEBUG.






[jira] [Resolved] (HDFS-15153) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently

2020-02-12 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-15153.
---
Resolution: Duplicate

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails 
> intermittently
> ---
>
> Key: HDFS-15153
> URL: https://issues.apache.org/jira/browse/HDFS-15153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> The unit test TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT 
> is failing consistently. This seems to be due to a log message change. We 
> should fix it.






[jira] [Created] (HDFS-15153) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails intermittently

2020-02-04 Thread Chen Liang (Jira)
Chen Liang created HDFS-15153:
-

 Summary: 
TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT fails 
intermittently
 Key: HDFS-15153
 URL: https://issues.apache.org/jira/browse/HDFS-15153
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chen Liang
Assignee: Chen Liang


The unit test TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT is 
failing consistently. This seems to be due to a log message change. We should 
fix it.






[jira] [Created] (HDFS-15148) dfs.namenode.send.qop.enabled should not apply to primary NN port

2020-01-27 Thread Chen Liang (Jira)
Chen Liang created HDFS-15148:
-

 Summary: dfs.namenode.send.qop.enabled should not apply to primary 
NN port
 Key: HDFS-15148
 URL: https://issues.apache.org/jira/browse/HDFS-15148
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.10.1, 3.3.1
Reporter: Chen Liang
Assignee: Chen Liang


In HDFS-13617, the NameNode can be configured to wrap its established QOP into 
the block access token as an encrypted message. Later, the DataNode uses this 
message to create the SASL connection. But this new behavior should only apply 
to the new auxiliary NameNode ports, not the primary port (the one configured 
in fs.defaultFS), as it may conflict with other existing SASL-related 
configuration (e.g. dfs.data.transfer.protection). Since this config was 
introduced for auxiliary ports only, we should restrict the new behavior so 
that it does not apply to the primary port.






[jira] [Created] (HDFS-14991) Backport HDFS-14346 Better time precision in getTimeDuration to branch-2

2019-11-15 Thread Chen Liang (Jira)
Chen Liang created HDFS-14991:
-

 Summary: Backport HDFS-14346 Better time precision in 
getTimeDuration to branch-2
 Key: HDFS-14991
 URL: https://issues.apache.org/jira/browse/HDFS-14991
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


This is to backport HDFS-14346 to branch-2, as Standby reads in branch-2 
require being able to properly specify millisecond time granularity for edit 
log tailing.






[jira] [Created] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-29 Thread Chen Liang (Jira)
Chen Liang created HDFS-14941:
-

 Summary: Potential editlog race condition can cause corrupted file
 Key: HDFS-14941
 URL: https://issues.apache.org/jira/browse/HDFS-14941
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chen Liang


Recently we encountered an issue where, after a failover, the NameNode 
complains about corrupted files/missing blocks. The blocks did recover after 
full block reports, so the blocks are not actually missing. After further 
investigation, we believe this is what happened:

First of all, on the SbN, it is possible that it receives block reports before 
the corresponding edit tailing has happened, in which case the SbN postpones 
processing the DN block report, handled by the guarding logic below:
{code:java}
  if (shouldPostponeBlocksFromFuture &&
  namesystem.isGenStampInFuture(iblk)) {
queueReportedBlock(storageInfo, iblk, reportedState,
QUEUE_REASON_FUTURE_GENSTAMP);
continue;
  }
{code}
Basically, if a reported block has a future generation stamp, the DN report 
gets requeued.

However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
{code:java}
  // allocate new block, record block locations in INode.
  newBlock = createNewBlock();
  INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
  saveAllocatedBlock(src, inodesInPath, newBlock, targets);

  persistNewBlock(src, pendingFile);
  offset = pendingFile.computeFileSize();
{code}
The line
 {{newBlock = createNewBlock();}}
 would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on 
the Standby,
 while the following line
 {{persistNewBlock(src, pendingFile);}}
 would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on the 
Standby.

Then the race condition is that the Standby has just processed 
{{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (if they happen to be in 
different segments), when a block report with the new generation stamp comes in.

Since the genstamp bump has already been processed, the reported block may not 
be considered a future block, so the guarding logic passes. But the block 
hasn't actually been added to the block map, because the second edit is yet to 
be tailed. The block then gets added to the invalidate list, and we see 
messages like:
{code:java}
BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
{code}
Even worse, since this IBR is effectively lost, the NameNode has no information 
about this block until the next full block report. So after a failover, the NN 
marks it as corrupt.

This issue won't happen if both edit entries get tailed together, so that no 
IBR processing can happen in between. But in our case, we set the edit tailing 
interval very low (to allow Standby reads), so under high workload there is a 
much higher chance that the two entries are tailed separately, causing the 
issue.
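The interleaving above can be modeled in a few lines (all names are hypothetical stand-ins for illustration, not the real NameNode classes):

```java
import java.util.HashSet;
import java.util.Set;

public class GenstampRaceSketch {
    long currentGenstamp = 1;             // bumped when OP_SET_GENSTAMP_V2 is tailed
    Set<Long> blockMap = new HashSet<>(); // populated when OP_ADD_BLOCK is tailed

    boolean isFutureBlock(long genstamp) { return genstamp > currentGenstamp; }

    // What the Standby does with an incremental block report.
    String processIBR(long blockId, long genstamp) {
        if (isFutureBlock(genstamp)) return "postponed";       // guard catches it
        if (!blockMap.contains(blockId)) return "invalidated"; // the bug: IBR lost
        return "accepted";
    }

    public static void main(String[] args) {
        GenstampRaceSketch sbn = new GenstampRaceSketch();
        // Edit 1 tailed: genstamp bumped to 2...
        sbn.currentGenstamp = 2;
        // ...but edit 2 (OP_ADD_BLOCK) not yet tailed when the IBR arrives:
        System.out.println(sbn.processIBR(42L, 2)); // invalidated, not postponed
    }
}
```

If both edits were tailed before the IBR, the block would be in the map and the report would be accepted.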






[jira] [Reopened] (HDFS-12979) StandbyNode should upload FsImage to ObserverNode after checkpointing.

2019-10-02 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reopened HDFS-12979:
---

Thanks for the catch [~shv]. I've committed to branch-3.2 and branch-3.1, as 
there were only some import differences. But the branch-2 patch is quite 
different, so re-opening to post the patch for a jenkins run.

> StandbyNode should upload FsImage to ObserverNode after checkpointing.
> --
>
> Key: HDFS-12979
> URL: https://issues.apache.org/jira/browse/HDFS-12979
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-12979-branch-2.001.patch, HDFS-12979.001.patch, 
> HDFS-12979.002.patch, HDFS-12979.003.patch, HDFS-12979.004.patch, 
> HDFS-12979.005.patch, HDFS-12979.006.patch, HDFS-12979.007.patch, 
> HDFS-12979.008.patch, HDFS-12979.009.patch, HDFS-12979.010.patch, 
> HDFS-12979.011.patch, HDFS-12979.012.patch, HDFS-12979.013.patch, 
> HDFS-12979.014.patch, HDFS-12979.015.patch
>
>
> ObserverNode does not create checkpoints. So it's fsimage file can get very 
> old making bootstrap of ObserverNode too long. A StandbyNode should copy 
> latest fsimage to ObserverNode(s) along with ANN.






[jira] [Created] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode

2019-09-19 Thread Chen Liang (Jira)
Chen Liang created HDFS-14858:
-

 Summary: [SBN read] Allow configurably enable/disable 
AlignmentContext on NameNode
 Key: HDFS-14858
 URL: https://issues.apache.org/jira/browse/HDFS-14858
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


As brought up under HDFS-14277, we should make sure SBN read has no performance 
impact when it is not enabled. One potential overhead of SBN read is 
maintaining and updating additional state on the NameNode. Specifically, this 
is done by creating/updating/checking a {{GlobalStateIdContext}} instance. 
Currently, even without enabling SBN read, this logic is still checked. We can 
make this configurable so that when SBN read is not enabled, there is no such 
overhead and everything works as-is.
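One possible shape of the change, sketched with stand-in types (the config key and class bodies here are placeholders, not the final patch):

```java
public class AlignmentContextConfigSketch {
    // Placeholder key; the actual key introduced by this issue may differ.
    static final String SBN_READ_CONTEXT_KEY = "dfs.namenode.state.context.enabled";

    interface AlignmentContext {}
    static class GlobalStateIdContext implements AlignmentContext {}

    // Only pay the GlobalStateIdContext creation/update cost when enabled.
    static AlignmentContext maybeCreateContext(boolean sbnReadEnabled) {
        return sbnReadEnabled ? new GlobalStateIdContext() : null;
    }

    public static void main(String[] args) {
        System.out.println(maybeCreateContext(false)); // no context, no overhead
    }
}
```

A null context would signal the RPC layer to skip state-id bookkeeping entirely, preserving the pre-SBN-read behavior.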






[jira] [Created] (HDFS-14822) Revisit GlobalStateIdContext locking when getting server state id

2019-09-04 Thread Chen Liang (Jira)
Chen Liang created HDFS-14822:
-

 Summary: Revisit GlobalStateIdContext locking when getting server 
state id
 Key: HDFS-14822
 URL: https://issues.apache.org/jira/browse/HDFS-14822
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


As mentioned under HDFS-14277, one potential performance issue of Observer 
reads is that {{GlobalStateIdContext#getLastSeenStateId}} calls 
{{getCorrectLastAppliedOrWrittenTxId}}, which ends up acquiring a lock on the 
transaction ID. After some internal discussion and analysis, we believe this 
lock can be avoided by calling the non-locking method 
{{getLastAppliedOrWrittenTxId}}.






[jira] [Created] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing

2019-08-30 Thread Chen Liang (Jira)
Chen Liang created HDFS-14806:
-

 Summary: Bootstrap standby may fail if used in-progress tailing
 Key: HDFS-14806
 URL: https://issues.apache.org/jira/browse/HDFS-14806
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.3.0
Reporter: Chen Liang
Assignee: Chen Liang


One issue we came across was that if in-progress tailing is enabled, 
bootstrapping a standby could fail.

When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get 
edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an 
upper bound on how many transactions can be included in one RPC call; the 
default is 5000, meaning the bootstrapping NN (say NN1) can pull at most 5000 
edits from the JNs. However, as part of bootstrap, NN1 queries another NN (say 
NN2) for NN2's current transaction ID, and NN2 may return a state that is more 
than 5000 transactions ahead of NN1's current image. But NN1 can only see 5000 
more transactions from the JNs. At this point NN1 panics, because the 
transaction ID returned by the JNs is behind NN2's returned state, and 
bootstrap fails.

Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some very 
large value allows bootstrap to continue, but this is hardly an ideal solution.
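The failing condition reduces to simple arithmetic (a sketch under the description above; the method and parameter names are illustrative):

```java
public class BootstrapBoundSketch {
    // NN1 can only tail maxTxnsPerRpc edits past its image in one pull, but
    // must reach the state NN2 reported; if the gap exceeds the cap, the
    // bootstrapping NN panics and bootstrap fails.
    static boolean bootstrapCanCatchUp(long imageTxId, long nn2TxId,
                                       long maxTxnsPerRpc) {
        return nn2TxId - imageTxId <= maxTxnsPerRpc;
    }

    public static void main(String[] args) {
        System.out.println(bootstrapCanCatchUp(10_000, 14_000, 5_000)); // within cap
        System.out.println(bootstrapCanCatchUp(10_000, 20_000, 5_000)); // panics
    }
}
```

Relaxing the failure would mean tolerating a JN state behind NN2's reported state, or pulling edits in multiple batches instead of a single capped call.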






[jira] [Created] (HDFS-14785) [SBN read] Change client logging to be less aggressive

2019-08-27 Thread Chen Liang (Jira)
Chen Liang created HDFS-14785:
-

 Summary: [SBN read] Change client logging to be less aggressive
 Key: HDFS-14785
 URL: https://issues.apache.org/jira/browse/HDFS-14785
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Affects Versions: 3.1.2, 3.2.0, 2.10.0, 3.3.0
Reporter: Chen Liang
Assignee: Chen Liang


Currently {{ObserverReadProxyProvider}} logs a lot of information. There are 
states that are acceptable but for which {{ObserverReadProxyProvider}} still 
logs an overwhelming number of messages. One example: if some NN runs an older 
version, the lack of the {{getHAServiceState}} method in the older NN leads to 
an exception being printed on every single call.

We can change these to debug logs. This should be minimal risk because it is 
only on the client side; we can always re-enable the logging by changing the 
client-side log level to DEBUG.






[jira] [Created] (HDFS-14726) Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519

2019-08-12 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14726:
-

 Summary: Fix JN incompatibility issue in branch-2 due to backport 
of HDFS-10519
 Key: HDFS-14726
 URL: https://issues.apache.org/jira/browse/HDFS-14726
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node
Affects Versions: 2.10.0
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-10519 has been backported to branch-2. However, HDFS-10519 introduced an 
incompatibility between the NN and JN due to the new protobuf field 
{{committedTxnId}} in {{HdfsServer.proto}}. This field was introduced as a 
required field, so if the JN and NN are not on the same version, they run into 
a missing-field exception. Although we can currently work around this by making 
sure the JN is always upgraded properly before the NN, we can fix the 
incompatibility by changing the field to optional.






[jira] [Created] (HDFS-14611) Move handshake secret field from Token to BlockAccessToken

2019-06-26 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14611:
-

 Summary: Move handshake secret field from Token to BlockAccessToken
 Key: HDFS-14611
 URL: https://issues.apache.org/jira/browse/HDFS-14611
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


Currently the handshake secret is included in Token, but conceptually it should 
belong to the Block Access Token only. In fact, having this field in Token 
could potentially break compatibility. Moreover, having this field as part of 
the Block Access Token also means we may not need to encrypt the field anymore, 
because the block access token is already encrypted.






[jira] [Created] (HDFS-14573) Backport Standby Read to branch-3

2019-06-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14573:
-

 Summary: Backport Standby Read to branch-3
 Key: HDFS-14573
 URL: https://issues.apache.org/jira/browse/HDFS-14573
 Project: Hadoop HDFS
  Issue Type: Task
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


This Jira tracks backporting the consistent read from standby feature 
(HDFS-12943) to branch-3.x, including 3.0, 3.1, and 3.2. This is required for 
backporting to branch-2.






[jira] [Created] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-01-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14205:
-

 Summary: Backport HDFS-6440 to branch-2
 Key: HDFS-14205
 URL: https://issues.apache.org/jira/browse/HDFS-14205
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang


Currently, support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
(consistent read from standby) backport to branch-2.






[jira] [Created] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-01-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14204:
-

 Summary: Backport HDFS-12943 to branch-2
 Key: HDFS-14204
 URL: https://issues.apache.org/jira/browse/HDFS-14204
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang


Currently, the consistent read from standby feature (HDFS-12943) is only in 
trunk (branch-3). This JIRA aims to backport the feature to branch-2.






[jira] [Created] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-11 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14142:
-

 Summary: Move ipfailover config key out of HdfsClientConfigKeys
 Key: HDFS-14142
 URL: https://issues.apache.org/jira/browse/HDFS-14142
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Running TestHdfsConfigFields throws an error complaining about the missing key 
dfs.client.failover.ipfailover.virtual-address. Since this config key is 
specific to ObserverReadProxyProviderWithIPFailover only, this Jira moves the 
config prefix to ObserverReadProxyProviderWithIPFailover.






[jira] [Created] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-11-30 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14120:
-

 Summary: ORFPP should also clone DT for the virtual IP
 Key: HDFS-14120
 URL: https://issues.apache.org/jira/browse/HDFS-14120
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-12943
Reporter: Chen Liang
Assignee: Chen Liang


Currently, with HDFS-14017, ORFPP handles delegation tokens similarly to 
ConfiguredFailoverProxyProvider: given the delegation token associated with a 
nameservice ID, it clones the DTs for all the corresponding physical addresses. 
But ORFPP requires more work than CFPP in the sense that it also leverages a 
VIP address for failover, meaning that in addition to cloning the DT for the 
physical addresses, ORFPP also needs to clone the DT for the VIP address, which 
was missed in HDFS-14017.






[jira] [Resolved] (HDFS-13547) Add ingress port based sasl resolver

2018-11-29 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-13547.
---
   Resolution: Fixed
Fix Version/s: 3.1.1

> Add ingress port based sasl resolver
> 
>
> Key: HDFS-13547
> URL: https://issues.apache.org/jira/browse/HDFS-13547
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, 
> HDFS-13547.003.patch, HDFS-13547.004.patch
>
>
> This Jira extends the SASL properties resolver interface to take an ingress 
> port parameter, and also adds an implementation based on this.






[jira] [Created] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-11-29 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14116:
-

 Summary: Fix a potential class cast error in 
ObserverReadProxyProvider
 Key: HDFS-14116
 URL: https://issues.apache.org/jira/browse/HDFS-14116
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Chen Liang


Currently in the {{ObserverReadProxyProvider}} constructor there is this line:
{code}
((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
{code}
This could potentially cause a failure, because the factory cannot always be 
cast here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} 
is where the constructor is called, and there are two paths that call into it:
(1) {{NameNodeProxies.createProxy}}
(2) {{NameNodeProxiesClient.createFailoverProxyProvider}}

(2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses 
{{NameNodeHAProxyFactory}}, which cannot be cast to {{ClientHAProxyFactory}}; 
this happens when, for example, running NNThroughputBenchmark. To fix this we 
could at least:
1. introduce setAlignmentContext in HAProxyFactory, the parent of both 
ClientHAProxyFactory and NameNodeHAProxyFactory, OR
2. only call setAlignmentContext when the factory is a ClientHAProxyFactory, 
e.g. with an instanceof check,
depending on whether it makes sense to have an alignment context for the code 
paths in case (1).
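Option 2 would look roughly like this (a sketch with stand-in types, not the actual HDFS classes):

```java
public class ProxyFactoryCastSketch {
    interface HAProxyFactory {}
    static class ClientHAProxyFactory implements HAProxyFactory {
        Object alignmentContext;
        void setAlignmentContext(Object ctx) { alignmentContext = ctx; }
    }
    static class NameNodeHAProxyFactory implements HAProxyFactory {}

    // Guarded version of the constructor line: only set the context when the
    // factory actually supports it, instead of casting unconditionally.
    static boolean trySetAlignmentContext(HAProxyFactory factory, Object ctx) {
        if (factory instanceof ClientHAProxyFactory) {
            ((ClientHAProxyFactory) factory).setAlignmentContext(ctx);
            return true;
        }
        return false; // e.g. NameNodeHAProxyFactory from NNThroughputBenchmark
    }

    public static void main(String[] args) {
        System.out.println(trySetAlignmentContext(new ClientHAProxyFactory(), new Object()));
        System.out.println(trySetAlignmentContext(new NameNodeHAProxyFactory(), new Object()));
    }
}
```

Option 1 (declaring setAlignmentContext on the parent interface) avoids the runtime type check at the cost of widening the interface.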






[jira] [Created] (HDFS-14035) NN status discovery does not leverage delegation token

2018-10-29 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14035:
-

 Summary: NN status discovery does not leverage delegation token
 Key: HDFS-14035
 URL: https://issues.apache.org/jira/browse/HDFS-14035
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently ObserverReadProxyProvider uses {{HAServiceProtocol#getServiceStatus}} 
to get the status of each NN. {{HAServiceProtocol}} does not leverage 
delegation tokens, so when the YARN NodeManager makes this call, token 
authentication fails, causing the application to fail.






[jira] [Created] (HDFS-14017) ObserverReadProxyProviderWithIPFailover does not quite work

2018-10-22 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14017:
-

 Summary: ObserverReadProxyProviderWithIPFailover does not quite 
work
 Key: HDFS-14017
 URL: https://issues.apache.org/jira/browse/HDFS-14017
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
{{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
because when the {{ObserverReadProxyProvider}} constructor is called via 
super(...), the following line:
{code}
nameNodeProxies = getProxyAddresses(uri,
    HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
{code}
will try to resolve all the configured NN addresses to do configured failover. 
But in the case of IP failover, this does not really apply.






[jira] [Created] (HDFS-14016) ObserverReadProxyProvider can never enable observer read except in tests

2018-10-22 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14016:
-

 Summary: ObserverReadProxyProvider can never enable observer read 
except in tests
 Key: HDFS-14016
 URL: https://issues.apache.org/jira/browse/HDFS-14016
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently in {{ObserverReadProxyProvider#invoke}}, only when 
{{observerReadEnabled && isRead(method)}} is true will the code check whether 
to talk to an Observer; otherwise it always talks to the Active. The issue is 
that the flag can currently only be set through {{setObserverReadEnabled}}, 
which is used by tests only, so observer reads are always disabled in 
deployment with no way to enable them. We may want to either expose a 
configuration key, or hard-code the flag to true so it can only be changed for 
testing purposes, or simply remove the variable. This is closely related to 
HDFS-13923.
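The gating logic in question, reduced to a sketch (the method is hypothetical; only the flag semantics come from the description above):

```java
public class ObserverReadGateSketch {
    // Mirrors the described gate: requests go to an Observer only when the
    // flag is on AND the method is a read; today the flag is only settable
    // from tests, so in deployment even reads fall through to the Active.
    static String chooseTarget(boolean observerReadEnabled, boolean isRead) {
        if (observerReadEnabled && isRead) {
            return "observer";
        }
        return "active";
    }

    public static void main(String[] args) {
        // Flag defaults to off in deployment, so even reads go to the Active:
        System.out.println(chooseTarget(false, true));
    }
}
```

Exposing a config key would simply make the first argument configurable rather than test-only.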






[jira] [Created] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync

2018-08-28 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13880:
-

 Summary: Add mechanism to allow certain RPC calls to bypass sync
 Key: HDFS-13880
 URL: https://issues.apache.org/jira/browse/HDFS-13880
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


Currently, every single call to the NameNode is synced, in the sense that the 
NameNode will not process it until its state id catches up. But in certain 
cases, we would like to bypass this check and allow the call to return 
immediately, even when the server state id is not up to date. One case is the 
new API to be added in HDFS-13749 that requests the current state id. Others 
may include calls that do not promise real-time responses, such as 
{{getContentSummary}}. This Jira adds the mechanism to allow certain calls to 
bypass the sync.
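One way such a bypass could look (purely illustrative; the actual mechanism may use annotations or a per-call flag rather than a method allowlist):

```java
import java.util.Set;

public class SyncBypassSketch {
    // Hypothetical allowlist of RPC methods that may skip the state-id sync.
    static final Set<String> BYPASS = Set.of("msync", "getContentSummary");

    // A call must wait for the state id to catch up unless it is allowlisted.
    static boolean requiresSync(String rpcMethod) {
        return !BYPASS.contains(rpcMethod);
    }

    public static void main(String[] args) {
        System.out.println(requiresSync("getFileInfo"));       // must sync
        System.out.println(requiresSync("getContentSummary")); // bypasses sync
    }
}
```

The server would consult such a check before queueing a call behind the state-id wait.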






[jira] [Created] (HDFS-13767) Add msync server implementation.

2018-07-25 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13767:
-

 Summary: Add msync server implementation.
 Key: HDFS-13767
 URL: https://issues.apache.org/jira/browse/HDFS-13767
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


This is a follow-up on HDFS-13688, where the msync API was introduced to 
{{ClientProtocol}} but the server-side implementation was missing. This Jira is 
to implement the server-side logic.






[jira] [Created] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2018-06-25 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13699:
-

 Summary: Add DFSClient sending handshake token to DataNode, and 
allow DataNode overwrite downstream QOP
 Key: HDFS-13699
 URL: https://issues.apache.org/jira/browse/HDFS-13699
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Given the other Jiras under HDFS-13541, this Jira allows the DFSClient to 
forward the encrypted secret to the DataNode. The encrypted message is the QOP 
that the client and NameNode have used; the DataNode decrypts the message and 
enforces that QOP for the client connection. This Jira also includes 
overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc; 
namely, allowing an inter-DN QOP that is different from the client-DN QOP.






[jira] [Created] (HDFS-13688) Introduce msync API call

2018-06-18 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13688:
-

 Summary: Introduce msync API call
 Key: HDFS-13688
 URL: https://issues.apache.org/jira/browse/HDFS-13688
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


As mentioned in the design doc in HDFS-12943, to ensure consistent reads, we 
need to introduce an RPC call {{msync}}. Specifically, a client can issue an 
msync call to an Observer node along with a transaction ID. The msync will only 
return when the Observer's transaction ID has caught up to the given ID. This 
JIRA is to add this API.
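The blocking semantics described above can be sketched roughly as follows (class and method names are illustrative only, not the actual HDFS implementation):

```java
// Rough sketch of the msync semantics: the call returns only once the
// Observer's applied transaction ID has caught up to the client's target.
// Names here are illustrative, not the real HDFS classes.
class MsyncSketch {
    private long appliedTxId = 0;
    private final Object lock = new Object();

    // Called by the edit-tailing side as transactions are applied.
    public void advanceTo(long txId) {
        synchronized (lock) {
            if (txId > appliedTxId) {
                appliedTxId = txId;
                lock.notifyAll();
            }
        }
    }

    // msync: block until appliedTxId >= targetTxId.
    public void msync(long targetTxId) {
        synchronized (lock) {
            while (appliedTxId < targetTxId) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // give up on interrupt in this sketch
                }
            }
        }
    }

    public long getAppliedTxId() {
        synchronized (lock) {
            return appliedTxId;
        }
    }
}
```

A caller that already knows its target ID is below the applied ID returns immediately; otherwise it parks on the lock until the tailing side notifies.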






[jira] [Created] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-05-24 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13617:
-

 Summary: Allow wrapping NN QOP into token in encrypted message
 Key: HDFS-13617
 URL: https://issues.apache.org/jira/browse/HDFS-13617
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang
 Attachments: HDFS-13617.001.patch

This Jira allows NN to configurably wrap the QOP it has established with the 
client into the token message sent back to the client. The QOP is sent back in 
an encrypted message, using the BlockAccessToken encryption key as the key.






[jira] [Created] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-05-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13566:
-

 Summary: Add configurable additional RPC listener to NameNode
 Key: HDFS-13566
 URL: https://issues.apache.org/jira/browse/HDFS-13566
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ipc
Reporter: Chen Liang
Assignee: Chen Liang


This Jira adds the capability for NameNode to run additional listener(s), so 
that NameNode can be accessed from multiple ports. Fundamentally, it extends 
ipc.Server to allow configuring more listeners that bind to different ports but 
share the same call queue and handlers. This is useful when different clients 
are only allowed to access certain ports. Combined with HDFS-13547, it also 
allows different ports to have different SASL security levels. 






[jira] [Created] (HDFS-13547) Add ingress port based sasl resolver

2018-05-11 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13547:
-

 Summary: Add ingress port based sasl resolver
 Key: HDFS-13547
 URL: https://issues.apache.org/jira/browse/HDFS-13547
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Reporter: Chen Liang
Assignee: Chen Liang


This Jira extends the SASL properties resolver interface to take an ingress 
port parameter, and also adds an implementation based on this.
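As a rough illustration of the idea (hypothetical class, not the actual Hadoop resolver interface), resolution keyed by ingress port might look like:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch of port-based SASL property resolution: each
// listener port maps to its own QOP, with a fallback default.
// This is not the actual Hadoop SaslPropertiesResolver API.
class PortSaslResolverSketch {
    private final Map<Integer, Properties> byPort = new HashMap<>();
    private final Properties defaults = new Properties();

    PortSaslResolverSketch(String defaultQop) {
        defaults.setProperty("javax.security.sasl.qop", defaultQop);
    }

    void register(int port, String qop) {
        Properties p = new Properties();
        p.setProperty("javax.security.sasl.qop", qop);
        byPort.put(port, p);
    }

    // Resolve by the port the connection arrived on.
    Properties resolve(int ingressPort) {
        return byPort.getOrDefault(ingressPort, defaults);
    }
}
```

With a table like this, an encrypted listener port could map to "auth-conf" while all other ports fall back to plain authentication.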






[jira] [Created] (HDFS-13541) NameNode Port based selective encryption

2018-05-09 Thread Chen Liang (JIRA)
Chen Liang created HDFS-13541:
-

 Summary: NameNode Port based selective encryption
 Key: HDFS-13541
 URL: https://issues.apache.org/jira/browse/HDFS-13541
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode, security
Reporter: Chen Liang
Assignee: Chen Liang
 Attachments: NameNode Port based selective encryption-v1.pdf

Here at LinkedIn, one issue we face is that we need to enforce different 
security requirements based on the location of the client relative to the 
cluster. Specifically, for clients from outside of the data center, it is 
required by regulation that all traffic must be encrypted. But for clients 
within the same data center, unencrypted connections are preferred to avoid the 
high encryption overhead. 

HADOOP-10221 introduced a pluggable SASL resolver, based on which HADOOP-10335 
introduced WhitelistBasedResolver, which solves the same problem. However, we 
found it difficult to fit into our environment for several reasons. In this 
JIRA, on top of the pluggable SASL resolver, *we propose a different approach: 
running RPC on two ports on NameNode, where the two ports enforce encrypted and 
unencrypted connections respectively, and subsequent DataNode access simply 
follows the same encryption behaviour*. Then by blocking the unencrypted port on 
the datacenter firewall, we can completely block unencrypted external access.






[jira] [Created] (HDFS-12989) Ozone: Container : Add key versioning support-4

2018-01-04 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12989:
-

 Summary: Ozone: Container : Add key versioning support-4
 Key: HDFS-12989
 URL: https://issues.apache.org/jira/browse/HDFS-12989
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


After HDFS-12925 and HDFS-12954 are added, every key write call will generate a 
new version, and we will be able to read any specific version of a key. This 
JIRA adds a new key reader API, {{newKeyReaderWithVersion}}, to 
{{StorageHandler}}. This method takes an extra version field so that the 
caller can read any older version. This JIRA also adds all the other changes 
needed. 






[jira] [Created] (HDFS-12958) Ozone: remove setAllocatedBytes method in ContainerInfo

2017-12-21 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12958:
-

 Summary: Ozone: remove setAllocatedBytes method in ContainerInfo
 Key: HDFS-12958
 URL: https://issues.apache.org/jira/browse/HDFS-12958
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Priority: Minor


We may want to remove the {{setAllocatedBytes}} method from {{ContainerInfo}} 
and keep all fields of {{ContainerInfo}} immutable, so that clients won't 
accidentally change a {{ContainerInfo}} instance and rely on the changed copy.

An alternative to having {{setAllocatedBytes}} is to always create a new 
{{ContainerInfo}} instance whenever it needs to change.

This is based on [this 
comment|https://issues.apache.org/jira/browse/HDFS-12751?focusedCommentId=16299750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16299750]
 from HDFS-12751.
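The immutable alternative can be sketched as follows (a simplified stand-in, not the real {{ContainerInfo}}):

```java
// Simplified stand-in for ContainerInfo showing the immutable pattern:
// no setter; changing a value yields a fresh instance instead.
final class ContainerInfoSketch {
    private final String name;
    private final long allocatedBytes;

    ContainerInfoSketch(String name, long allocatedBytes) {
        this.name = name;
        this.allocatedBytes = allocatedBytes;
    }

    String getName() { return name; }
    long getAllocatedBytes() { return allocatedBytes; }

    // Replacement for setAllocatedBytes: returns a new instance,
    // leaving the original untouched.
    ContainerInfoSketch withAllocatedBytes(long newBytes) {
        return new ContainerInfoSketch(name, newBytes);
    }
}
```

Any caller holding the old instance keeps seeing the old value, which is exactly the property the description above is after.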






[jira] [Created] (HDFS-12954) Ozone: Container : Add key versioning support-3

2017-12-20 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12954:
-

 Summary: Ozone: Container : Add key versioning support-3
 Key: HDFS-12954
 URL: https://issues.apache.org/jira/browse/HDFS-12954
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Chen Liang
Assignee: Chen Liang


A new version of a key effectively overwrites some consecutive range of bytes 
within the key's offset range. For each version, we need to record exactly what 
that range is in order for the IO vector to work.

Currently, since we only write from the start (offset = 0), the offset range of 
a version is simply from 0 up to the key data size when the version gets 
committed. But we only keep a single key data size variable (see 
{{KeyManagerImpl#commitKey}}). We need to know the corresponding key data size 
for each version. This JIRA adds the tracking of the offset range for each 
version.







[jira] [Created] (HDFS-12925) Ozone: Container : Add key versioning support-2

2017-12-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12925:
-

 Summary: Ozone: Container : Add key versioning support-2
 Key: HDFS-12925
 URL: https://issues.apache.org/jira/browse/HDFS-12925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Chen Liang
Assignee: Chen Liang


One component of versioning is assembling the read IO vector (please see 
section 4.2 of the versioning design doc in HDFS-12000 for details). This JIRA 
adds the util functions that take a list of blocks from different versions and 
properly generate the read vector for the requested version.
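The idea can be illustrated with a toy version of the computation (illustrative code with simplified types, not the Ozone util functions):

```java
import java.util.Arrays;

// Illustrative sketch (not the Ozone util code): each version overwrites
// one contiguous range; for a requested version, the read vector records,
// per byte offset, which version's block serves that byte.
class ReadVectorSketch {
    // writes[i] = {version, offset, length}, sorted by ascending version.
    static int[] readVector(int[][] writes, int keySize, int requestedVersion) {
        int[] owner = new int[keySize];
        Arrays.fill(owner, -1);                  // -1 = no data at this offset
        for (int[] w : writes) {
            if (w[0] > requestedVersion) {
                break;                           // ignore newer versions
            }
            for (int i = w[1]; i < w[1] + w[2] && i < keySize; i++) {
                owner[i] = w[0];                 // a later version wins
            }
        }
        return owner;
    }
}
```

For example, if version 1 wrote bytes 0-3 and version 2 overwrote bytes 2-5, a read of version 2 is served by version 1 for bytes 0-1 and version 2 for bytes 2-5.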






[jira] [Resolved] (HDFS-12265) Ozone : better handling of operation fail due to chill mode

2017-12-13 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-12265.
---
  Resolution: Fixed
Release Note: Looks like this has been handled as part of HDFS-12387; closing 
this JIRA.

> Ozone : better handling of operation fail due to chill mode
> ---
>
> Key: HDFS-12265
> URL: https://issues.apache.org/jira/browse/HDFS-12265
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Priority: Minor
>  Labels: OzonePostMerge
>
> Currently if someone tries to create a container while SCM is in chill mode, 
> there will be exception of INTERNAL_ERROR, which is not very informative and 
> can be confusing for debugging.
> We should make it easier to identify problems caused by chill mode. For 
> example, we may detect if SCM is in chill mode and report back to client in 
> some way, such that the client can backup and try again later.






[jira] [Created] (HDFS-12879) Ozone : add scm init command to document.

2017-11-30 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12879:
-

 Summary: Ozone : add scm init command to document.
 Key: HDFS-12879
 URL: https://issues.apache.org/jira/browse/HDFS-12879
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ozone
Reporter: Chen Liang
Priority: Minor


When an Ozone cluster is initialized, before starting SCM through {{hdfs 
--daemon start scm}}, the command {{hdfs scm -init}} needs to be called first. 
But it seems this command is not documented. We should add this note to the 
documentation.






[jira] [Created] (HDFS-12793) Ozone : TestSCMCli is failing consistently

2017-11-08 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12793:
-

 Summary: Ozone : TestSCMCli is failing consistently
 Key: HDFS-12793
 URL: https://issues.apache.org/jira/browse/HDFS-12793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests 
in {{TestSCMCli}} failed: {{testCloseContainer}}, {{testDeleteContainer}} 
and {{testInfoContainer}}. I tested locally; these three tests fail 
consistently.






[jira] [Created] (HDFS-12626) Ozone : delete open key entries that will no longer be closed

2017-10-10 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12626:
-

 Summary: Ozone : delete open key entries that will no longer be 
closed
 Key: HDFS-12626
 URL: https://issues.apache.org/jira/browse/HDFS-12626
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-12543 introduced the notion of an "open key": when a key is opened, an 
open key entry gets persisted, and only after the client calls close will this 
entry be made visible. One issue is that if the client never calls close (e.g. 
it failed), that open key entry will never be deleted from the metadata. This 
JIRA tracks this issue.
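One possible cleanup direction (a sketch with hypothetical names, not a committed design) is to reap open-key entries that have stayed open past some age threshold:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: track when each open-key entry was created and
// delete entries whose client never called close within the threshold.
class OpenKeyReaperSketch {
    private final Map<String, Long> openKeys = new HashMap<>(); // key -> openedAtMillis

    void open(String key, long nowMillis) { openKeys.put(key, nowMillis); }
    void close(String key) { openKeys.remove(key); }

    // Delete entries older than maxAgeMillis; returns the reaped keys.
    List<String> reap(long nowMillis, long maxAgeMillis) {
        List<String> reaped = new ArrayList<>();
        openKeys.entrySet().removeIf(e -> {
            if (nowMillis - e.getValue() > maxAgeMillis) {
                reaped.add(e.getKey());
                return true;
            }
            return false;
        });
        return reaped;
    }
}
```

In a real system the threshold would need to be comfortably larger than any legitimate write duration, so that slow but live clients are not reaped.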






[jira] [Created] (HDFS-12543) Ozone : allow create key without specifying size

2017-09-25 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12543:
-

 Summary: Ozone : allow create key without specifying size
 Key: HDFS-12543
 URL: https://issues.apache.org/jira/browse/HDFS-12543
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


Currently, when creating a key, it is required to specify the total size of the 
key. This is inconvenient for the case where a key is created and data keeps 
coming and being appended. This JIRA removes the requirement of specifying the 
size on key creation, and allows appending to the key indefinitely.






[jira] [Created] (HDFS-12475) Ozone : add document for port sharing with WebHDFS

2017-09-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12475:
-

 Summary: Ozone : add document for port sharing with WebHDFS
 Key: HDFS-12475
 URL: https://issues.apache.org/jira/browse/HDFS-12475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang


Currently Ozone's REST API uses port 9864, and all commands mentioned in 
OzoneCommandShell.md use the address localhost:9864.

This port was used by WebHDFS and is now shared with Ozone. The value is 
controlled by the config key {{dfs.datanode.http.address}}. We should document 
this information in {{OzoneCommandShell.md}}.






[jira] [Reopened] (HDFS-12268) Ozone: Add metrics for pending storage container requests

2017-09-15 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reopened HDFS-12268:
---

> Ozone: Add metrics for pending storage container requests
> -
>
> Key: HDFS-12268
> URL: https://issues.apache.org/jira/browse/HDFS-12268
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>  Labels: ozoneMerge
> Attachments: HDFS-12268-HDFS-7240.001.patch, 
> HDFS-12268-HDFS-7240.002.patch, HDFS-12268-HDFS-7240.003.patch, 
> HDFS-12268-HDFS-7240.004.patch, HDFS-12268-HDFS-7240.005.patch, 
> HDFS-12268-HDFS-7240.006.patch, HDFS-12268-HDFS-7240.007.patch
>
>
>  As storage container async interface has been supported after HDFS-11580, we 
> need to keep an eye on the queue depth of pending container requests. It can 
> help us better found if there are some performance problems.






[jira] [Created] (HDFS-12454) Ozone : the sample ozone-site.xml in OzoneGettingStarted does not work

2017-09-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12454:
-

 Summary: Ozone : the sample ozone-site.xml in OzoneGettingStarted 
does not work
 Key: HDFS-12454
 URL: https://issues.apache.org/jira/browse/HDFS-12454
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang


In OzoneGettingStarted.md there is a sample ozone-site.xml file. But there are 
a few issues with it.
1.
{code}
<property>
  <name>ozone.scm.block.client.address</name>
  <value>scm.hadoop.apache.org</value>
</property>

<property>
  <name>ozone.ksm.address</name>
  <value>ksm.hadoop.apache.org</value>
</property>
{code}
The value should be an address instead.

2.
{{datanode.ObjectStoreHandler.(ObjectStoreHandler.java:103)}} requires 
{{ozone.scm.client.address}} to be set, which is missing from this sample file. 
Missing this config seems to cause a failure when starting the datanode.

3.
{code}
<property>
  <name>ozone.scm.names</name>
  <value>scm.hadoop.apache.org</value>
</property>
{code}
This value did not make much sense to me; I found the comment in 
{{ScmConfigKeys}} that says
{code}
// ozone.scm.names key is a set of DNS | DNS:PORT | IP Address | IP:PORT.
// Written as a comma separated string. e.g. scm1, scm2:8020, 7.7.7.7:
{code}
So maybe we should write something like scm1 as value here.

4. I'm not entirely sure about this, but 
[here|https://wiki.apache.org/hadoop/Ozone#Configuration] it says 
{code}
<property>
  <name>ozone.handler.type</name>
  <value>local</value>
</property>
{code}
is also part of minimum setting, do we need to add this [~anu]?






[jira] [Created] (HDFS-12368) [branch-2] Enable DFSNetworkTopology as default

2017-08-28 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12368:
-

 Summary: [branch-2] Enable DFSNetworkTopology as default
 Key: HDFS-12368
 URL: https://issues.apache.org/jira/browse/HDFS-12368
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11998 to branch-2.






[jira] [Created] (HDFS-12346) [branch-2] Combine the old and the new chooseRandom for better performance

2017-08-23 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12346:
-

 Summary: [branch-2] Combine the old and the new chooseRandom for 
better performance
 Key: HDFS-12346
 URL: https://issues.apache.org/jira/browse/HDFS-12346
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11577 back to branch-2.






[jira] [Created] (HDFS-12334) [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom

2017-08-21 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12334:
-

 Summary: [branch-2] Add storage type demand to into 
DFSNetworkTopology#chooseRandom
 Key: HDFS-12334
 URL: https://issues.apache.org/jira/browse/HDFS-12334
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11514 to branch-2.






[jira] [Created] (HDFS-12325) SFTPFileSystem operations should restore cwd

2017-08-18 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12325:
-

 Summary: SFTPFileSystem operations should restore cwd
 Key: HDFS-12325
 URL: https://issues.apache.org/jira/browse/HDFS-12325
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chen Liang
Assignee: Chen Liang


We've seen a case where writing to {{SFTPFileSystem}} led to unexpected 
behaviour:

Given a directory ./data with more than one file in it, the steps to reproduce 
this error were simply:
{code}
hdfs dfs -fs sftp://x.y.z -mkdir dir0
hdfs dfs -fs sftp://x.y.z -copyFromLocal data dir0
hdfs dfs -fs sftp://x.y.z -ls -R dir0
{code}
But not all of the files show up in the ls output; in fact, more often just one 
single file shows up under that path.

Digging deeper, we found that the rename, mkdirs and create operations in 
{{SFTPFileSystem}} change the current working directory during their 
execution. For example, in create there is:
{code}
  client.cd(parent.toUri().getPath());
  os = client.put(f.getName());
{code}

The issue here is that {{SFTPConnectionPool}} caches SFTP sessions (in 
{{idleConnections}}), each of which retains its current working directory. So 
after these operations, the sessions are put back into the cache with a changed 
working directory. This accumulates across calls and ends up causing unexpected 
behaviour. Basically, this error happens when processing multiple file system 
objects in one operation while relative paths are being used. 

The fix here is to restore the current working directory of the SFTP sessions.
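The save/restore idea can be modeled in a toy form (illustrative classes only, not the SFTPFileSystem code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the pooled-session bug: the pool caches sessions together
// with their working directory, so an operation that cd's and returns the
// session without restoring cwd leaves the pool dirty for the next caller.
class SessionPoolSketch {
    static class Session {
        String cwd = "/";
    }

    private final Deque<Session> idleConnections = new ArrayDeque<>();

    SessionPoolSketch() {
        idleConnections.push(new Session());
    }

    // Buggy operation: leaves the session's cwd changed (as in create()).
    void createBuggy(String parentDir) {
        Session s = idleConnections.pop();
        s.cwd = parentDir;               // client.cd(parent); client.put(name);
        idleConnections.push(s);         // returned to pool with dirty cwd
    }

    // Fixed operation: remembers and restores the working directory.
    void createFixed(String parentDir) {
        Session s = idleConnections.pop();
        String saved = s.cwd;            // pwd before the operation
        s.cwd = parentDir;               // client.cd(parent); client.put(name);
        s.cwd = saved;                   // restore before pooling again
        idleConnections.push(s);
    }

    String pooledCwd() {
        return idleConnections.peek().cwd;
    }
}
```

The next caller to borrow from the fixed pool starts from "/" again, which is why restoring cwd makes relative-path operations behave.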






[jira] [Created] (HDFS-12322) [branch-2] Add storage type demand to into DFSNetworkTopology#chooseRandom.

2017-08-18 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12322:
-

 Summary: [branch-2] Add storage type demand to into 
DFSNetworkTopology#chooseRandom.
 Key: HDFS-12322
 URL: https://issues.apache.org/jira/browse/HDFS-12322
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11482 to branch-2.






[jira] [Created] (HDFS-12321) Ozone : debug cli: add support to load user-provided SQL query

2017-08-18 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12321:
-

 Summary: Ozone : debug cli: add support to load user-provided SQL 
query
 Key: HDFS-12321
 URL: https://issues.apache.org/jira/browse/HDFS-12321
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang
 Fix For: ozone


This JIRA extends the SQL CLI to support loading a user-provided file that 
includes any SQL query the user wants to run on the SQLite db obtained by 
converting the Ozone metadata db.






[jira] [Created] (HDFS-12311) [branch-2] HDFS specific network topology classes with storage type info included

2017-08-16 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12311:
-

 Summary: [branch-2] HDFS specific network topology classes with 
storage type info included
 Key: HDFS-12311
 URL: https://issues.apache.org/jira/browse/HDFS-12311
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11450 to branch-2.






[jira] [Created] (HDFS-12306) [branch-2]Separate class InnerNode from class NetworkTopology and make it extendable

2017-08-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12306:
-

 Summary: [branch-2]Separate class InnerNode from class 
NetworkTopology and make it extendable
 Key: HDFS-12306
 URL: https://issues.apache.org/jira/browse/HDFS-12306
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11430 to branch-2.






[jira] [Created] (HDFS-12304) Remove unused parameter from FsDatasetImpl#addVolume

2017-08-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12304:
-

 Summary: Remove unused parameter from FsDatasetImpl#addVolume
 Key: HDFS-12304
 URL: https://issues.apache.org/jira/browse/HDFS-12304
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Minor


FsDatasetImpl has this method
{code}
  private void addVolume(Collection<StorageLocation> dataLocations,
      Storage.StorageDirectory sd) throws IOException
{code}
The parameter {{dataLocations}} was introduced in HDFS-6740 and was used to get 
storage type info. But HDFS-10637 changed the way this method obtains the 
storage type, so dataLocations is no longer used at all here. We should remove 
it for a cleaner interface.






[jira] [Created] (HDFS-12287) Remove a no-longer applicable TODO comment in DatanodeManager

2017-08-10 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12287:
-

 Summary: Remove a no-longer applicable TODO comment in 
DatanodeManager
 Key: HDFS-12287
 URL: https://issues.apache.org/jira/browse/HDFS-12287
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Trivial


{{DatanodeManager}} has this TODO comment
{code}
// TODO: Enables DFSNetworkTopology by default after more stress
// testings/validations.
{code}

This was resolved in HDFS-11998, but that change missed removing this comment.






[jira] [Created] (HDFS-12266) Ozone : add debug cli to hdfs script

2017-08-04 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12266:
-

 Summary: Ozone : add debug cli to hdfs script
 Key: HDFS-12266
 URL: https://issues.apache.org/jira/browse/HDFS-12266
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Minor


The debug CLI (which converts a metadata LevelDB/RocksDB file to a SQLite file) 
is still missing from the hdfs script; this JIRA adds it as one of the hdfs 
subcommands. 






[jira] [Created] (HDFS-12265) Ozone : better handling of operation fail due to chill mode

2017-08-04 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12265:
-

 Summary: Ozone : better handling of operation fail due to chill 
mode
 Key: HDFS-12265
 URL: https://issues.apache.org/jira/browse/HDFS-12265
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang
Priority: Minor


Currently, if someone tries to create a container while SCM is in chill mode, 
an INTERNAL_ERROR exception is thrown, which is not very informative and can be 
confusing to debug.

We should make it easier to identify problems caused by chill mode. For 
example, we may detect that SCM is in chill mode and report that back to the 
client in some way, so that the client can back off and try again later.






[jira] [Created] (HDFS-12256) Ozone : handle inactive containers on DataNode side

2017-08-03 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12256:
-

 Summary: Ozone : handle inactive containers on DataNode side
 Key: HDFS-12256
 URL: https://issues.apache.org/jira/browse/HDFS-12256
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chen Liang


When a container gets created, corresponding metadata gets added to 
{{ContainerManagerImpl#containerMap}}. What {{containerMap}} stores is a 
containerName-to-{{ContainerStatus}} map. When the datanode starts, it also 
loads this map from on-disk metadata. As long as the containerName is found in 
this map, it is considered an existing container.

An issue we saw was that, occasionally, when container creation on the datanode 
fails, the metadata of the failed container may still get added to 
{{containerMap}} with the active flag set to false. But currently such 
containers are not handled specially; containers with active=false are just 
treated as normal containers. Then when someone tries to write to such a 
container, failures can happen.






[jira] [Created] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db

2017-07-21 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12187:
-

 Summary: Ozone : add support to DEBUG CLI for ksm.db
 Key: HDFS-12187
 URL: https://issues.apache.org/jira/browse/HDFS-12187
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA adds the ability to convert the KSM metadata file (ksm.db) into a 
SQLite db.






[jira] [Created] (HDFS-12138) Remove redundant 'public' modifiers from BlockCollection

2017-07-13 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12138:
-

 Summary: Remove redundant 'public' modifiers from BlockCollection
 Key: HDFS-12138
 URL: https://issues.apache.org/jira/browse/HDFS-12138
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Trivial


The 'public' modifiers on the methods in {{BlockCollection}} are redundant, 
since this is a public interface. Running checkstyle against it also flags 
this.






[jira] [Created] (HDFS-12130) Optimizing permission check for getContentSummary

2017-07-12 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12130:
-

 Summary: Optimizing permission check for getContentSummary
 Key: HDFS-12130
 URL: https://issues.apache.org/jira/browse/HDFS-12130
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


Currently, {{getContentSummary}} takes two phases to complete:
- phase1. check the permission of the entire subtree. If any subdirectory does 
not have {{READ_EXECUTE}}, an access control exception is thrown and 
{{getContentSummary}} terminates here (unless it's super user).
- phase2. If phase1 passed, it will then traverse the entire tree recursively 
to get the actual content summary.

An issue is, both phases currently hold the fs lock.

Phase 2 is already written so that it yields the fs lock over time, such that 
it does not block other operations for too long. However, phase 1 does not 
yield, meaning the permission check phase can still block other operations for 
a long time.

One fix is to add lock yield to phase 1. But a simpler fix is to merge phase 1 
into phase 2. Namely, instead of doing a full traversal for the permission 
check first, we start with phase 2 directly, but for each directory, check its 
permission before obtaining its summary. This way we take advantage of the 
existing lock yield in the phase 2 code while still being able to check 
permissions and terminate on an access exception.
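The merged approach can be sketched on a toy tree (simplified types, not the NameNode code; the real implementation also yields the lock between batches):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of merging the permission check into the traversal: each
// directory's READ_EXECUTE bit is checked as it is visited, so an access
// failure terminates the walk without a separate whole-tree pre-pass.
class ContentSummarySketch {
    static class Dir {
        final boolean readExecute;
        final long size;
        final List<Dir> children = new ArrayList<>();

        Dir(boolean readExecute, long size) {
            this.readExecute = readExecute;
            this.size = size;
        }
    }

    static long summarize(Dir d) {
        if (!d.readExecute) {
            // mirrors the access control exception from phase 1
            throw new SecurityException("missing READ_EXECUTE");
        }
        long total = d.size;
        for (Dir child : d.children) {
            total += summarize(child);   // real code yields the fs lock here
        }
        return total;
    }
}
```

A single walk does both jobs: the summary accumulates as long as every visited directory is readable, and an unreadable directory aborts exactly as the old phase 1 would have.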

Thanks [~szetszwo] for the offline discussions!






[jira] [Resolved] (HDFS-12041) Block Storage : make the server address config more concise

2017-06-29 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-12041.
---
Resolution: Won't Fix

> Block Storage : make the server address config more concise
> ---
>
> Key: HDFS-12041
> URL: https://issues.apache.org/jira/browse/HDFS-12041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
>
> Currently there are a few places where the addresses are read from the 
> config like this:
> {code}
> String cbmIPAddress = ozoneConf.get(
> DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_KEY,
> DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_DEFAULT
> );
> int cbmPort = ozoneConf.getInt(
> DFS_CBLOCK_JSCSI_PORT_KEY,
> DFS_CBLOCK_JSCSI_PORT_DEFAULT
> );
> {code}
> Similarly for the jscsi address config. Maybe we should consider merging 
> these into a single config key in the form host:port.






[jira] [Created] (HDFS-12043) Add counters for block re-replication

2017-06-26 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12043:
-

 Summary: Add counters for block re-replication
 Key: HDFS-12043
 URL: https://issues.apache.org/jira/browse/HDFS-12043
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chen Liang
Assignee: Chen Liang


We occasionally see that the under-replicated block count is not going down 
quickly enough. We've made at least one fix to speed up block replications 
(HDFS-9205) but we need better insight into the current state and activity of 
the block re-replication logic. For example, we need to understand whether 
re-replication is not making forward progress at all, or whether new 
under-replicated blocks are simply being added faster than they are processed.

We should include additional metrics:
# Cumulative number of blocks that were successfully replicated. 
# Cumulative number of re-replications that timed out.
# Cumulative number of blocks that were dequeued for re-replication but not 
scheduled, e.g. because they were invalid, under construction, or their 
replication was postponed.
 
The growth rate of the above metrics will make it clear whether block 
replication is making forward progress and, if not, provide potential clues 
about why it is stalled.
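As a rough sketch of what such cumulative counters could look like (names are 
illustrative; an actual patch would register these with the Hadoop metrics2 
framework, e.g. as MutableCounterLong fields):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative cumulative counters for block re-replication activity.
class ReplicationMetricsSketch {
    final AtomicLong blocksReplicated = new AtomicLong();    // successful re-replications
    final AtomicLong replicationTimeouts = new AtomicLong(); // re-replications that timed out
    final AtomicLong blocksSkipped = new AtomicLong();       // dequeued but not scheduled

    void onReplicated() { blocksReplicated.incrementAndGet(); }
    void onTimeout()    { replicationTimeouts.incrementAndGet(); }
    void onSkipped()    { blocksSkipped.incrementAndGet(); }
}
```

Comparing the growth rates of the three counters over time would show whether 
the replication queue is draining or merely churning.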

Thanks [~arpitagarwal] for the offline discussions.







[jira] [Created] (HDFS-12041) Block Storage : make the server address config more concise

2017-06-26 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12041:
-

 Summary: Block Storage : make the server address config more 
concise
 Key: HDFS-12041
 URL: https://issues.apache.org/jira/browse/HDFS-12041
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Priority: Minor


Currently there are a few places where the addresses are read from the config 
like this:
{code}
String cbmIPAddress = ozoneConf.get(
DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_KEY,
DFS_CBLOCK_JSCSI_CBLOCK_SERVER_ADDRESS_DEFAULT
);
int cbmPort = ozoneConf.getInt(
DFS_CBLOCK_JSCSI_PORT_KEY,
DFS_CBLOCK_JSCSI_PORT_DEFAULT
);
{code}
Similarly for the jscsi address config. Maybe we should consider merging these 
into a single config key in the form host:port.
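A sketch of parsing such a single host:port value (the logic is illustrative; 
in Hadoop, NetUtils.createSocketAddr performs this kind of parsing and would 
likely be used instead of hand-rolled code):

```java
import java.net.InetSocketAddress;

// Illustrative: one "host:port" key replaces separate address and port keys.
class AddressConfigSketch {
    static InetSocketAddress parse(String hostPort, int defaultPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx < 0) {                       // no port given, fall back to default
            return InetSocketAddress.createUnresolved(hostPort, defaultPort);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```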






[jira] [Created] (HDFS-12002) Ozone : SCM cli misc fixes/improvements

2017-06-20 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12002:
-

 Summary: Ozone : SCM cli misc fixes/improvements
 Key: HDFS-12002
 URL: https://issues.apache.org/jira/browse/HDFS-12002
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang
 Fix For: ozone


Currently there are a few minor issues with the SCM CLI:

1. some commands do not use the -c option to take the container name. An issue 
with this is that arguments need to be in a certain order to be correctly 
parsed, e.g.:
{{./bin/hdfs scm -container -del c0 -f}} works, but
{{./bin/hdfs scm -container -del -f c0}} will not

2. some subcommands do not display errors as clearly as they could, e.g.:
{{./bin/hdfs scm -container -del}} is wrong because it is missing the container 
name, so the CLI complains
{code}
Missing argument for option: del
Unrecognized options:[-container, -del]
usage: hdfs scm  []
where  can be one of the following
 -container   Container related options
{code}
but this does not really show that it is the container name that is missing

3. it is probably better to rename -del to -delete, to be consistent with 
other commands like -create and -info

4. when passing in an invalid argument, e.g. -info on a non-existent 
container, an exception is displayed. We probably should not scare the users; 
we should display just one error message, and move the exception stack trace 
to debug-level output.






[jira] [Created] (HDFS-11998) Enable DFSNetworkTopology as default

2017-06-19 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11998:
-

 Summary: Enable DFSNetworkTopology as default
 Key: HDFS-11998
 URL: https://issues.apache.org/jira/browse/HDFS-11998
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-11530 has made it configurable to use {{DFSNetworkTopology}}, and still 
uses {{NetworkTopology}} as default. 

Given the stress testing in HDFS-11923, which shows the correctness of 
DFSNetworkTopology, and the performance testing in HDFS-11535, which shows how 
DFSNetworkTopology can outperform NetworkTopology, I think we are at the point 
where we can and should enable DFSNetworkTopology as the default.

Any comments/thoughts are more than welcome!






[jira] [Created] (HDFS-11997) ChunkManager functions do not use the argument keyName

2017-06-19 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11997:
-

 Summary: ChunkManager functions do not use the argument keyName
 Key: HDFS-11997
 URL: https://issues.apache.org/jira/browse/HDFS-11997
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


{{ChunkManagerImpl}}'s functions, i.e. {{writeChunk}}, {{readChunk}} and 
{{deleteChunk}}, all take a {{keyName}} argument, which is not used by any of 
them.

I think this makes sense, because conceptually {{ChunkManager}} should not 
need to know the keyName to do anything, except perhaps for some sort of 
sanity check or logging, which is not there either. We should revisit whether 
we need it here. I think we should remove it to make the chunk semantics and 
the function signatures more cleanly abstracted.

Any comments? [~anu]






[jira] [Created] (HDFS-11996) Ozone : add partial read of chunks

2017-06-19 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11996:
-

 Summary: Ozone : add partial read of chunks
 Key: HDFS-11996
 URL: https://issues.apache.org/jira/browse/HDFS-11996
 Project: Hadoop HDFS
  Issue Type: Sub-task
 Environment: Currently when reading a chunk, it is always the whole 
chunk that gets returned. However it is possible the reader may only need to 
read a subset of the chunk. This JIRA adds the partial read of chunks.
Reporter: Chen Liang
Assignee: Chen Liang









[jira] [Created] (HDFS-11939) Ozone : add read/write random access to Chunks of a key

2017-06-06 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11939:
-

 Summary: Ozone : add read/write random access to Chunks of a key
 Key: HDFS-11939
 URL: https://issues.apache.org/jira/browse/HDFS-11939
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


In Ozone, the value of a key is a sequence of container chunks. Currently, the 
only way to read/write the chunks is by using ChunkInputStream and 
ChunkOutputStream. However, by the nature of streams, these classes are 
currently implemented to only allow sequential read/write. 

Ideally we would like to support random access of the chunks. For example, we 
want to be able to seek to a specific offset and read/write some data. This 
will be critical for key range read/write feature, and potentially important 
for supporting parallel read/write.

This JIRA tracks adding such support by implementing a FileChannel-style class 
on top of Chunks.
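The desired random access can be illustrated with a toy in-memory model. The 
names here are hypothetical; the real work would implement a FileChannel-style 
API over the container chunk streams:

```java
// Illustrative seekable view over a key's chunks.
class SeekableChunksSketch {
    private final byte[][] chunks;   // each element is one container chunk
    private final int chunkSize;     // size of every full chunk

    SeekableChunksSketch(byte[][] chunks, int chunkSize) {
        this.chunks = chunks;
        this.chunkSize = chunkSize;
    }

    // Read len bytes starting at an absolute offset into the key's value,
    // crossing chunk boundaries as needed.
    byte[] read(long offset, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            long pos = offset + i;
            out[i] = chunks[(int) (pos / chunkSize)][(int) (pos % chunkSize)];
        }
        return out;
    }
}
```

A reader can seek to any offset and read a range that spans multiple chunks, 
which is exactly the behavior needed for key range read and parallel read.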






[jira] [Created] (HDFS-11932) BPServiceActor thread name is not correctly set

2017-06-05 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11932:
-

 Summary: BPServiceActor thread name is not correctly set
 Key: HDFS-11932
 URL: https://issues.apache.org/jira/browse/HDFS-11932
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Chen Liang
Assignee: Chen Liang


When running unit tests (e.g. TestJMXGet), we often get the following 
exception, although the tests still pass:
{code}
WARN  datanode.DataNode (BPOfferService.java:getBlockPoolId(192)) - Block pool 
ID needed, but service not yet registered with NN
java.lang.Exception: trace 
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:192)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.formatThreadName(BPServiceActor.java:556)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.start(BPServiceActor.java:544)
 at 
...
{code}

Although this does not affect normal operations, it causes the thread name of 
BPServiceActor to not be set as desired. More specifically:
{code}
 bpThread = new Thread(this, formatThreadName("heartbeating", nnAddr));
 bpThread.setDaemon(true); // needed for JUnit testing
 bpThread.start();
{code}

The first line calls formatThreadName to format a thread name, and 
formatThreadName reads the value of BPOfferService#bpNSInfo. However, this 
value is set only after the thread has started (the third line above), so the 
first line triggers the exception by reading a value that does not exist yet.
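One way to avoid reading the not-yet-registered block pool info, sketched with 
plain threads (all names are illustrative, not the actual fix): start the 
thread with a name based only on the NN address, and rename it from inside 
run() once registration has populated the block pool id.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative fix: defer the block-pool part of the thread name until the
// registration step inside run() has actually produced it.
class ActorNameSketch implements Runnable {
    private final String nnAddr;
    final CountDownLatch named = new CountDownLatch(1);
    volatile String finalName;

    ActorNameSketch(String nnAddr) { this.nnAddr = nnAddr; }

    public void run() {
        // ... registration with the NN would happen here, yielding the id ...
        String bpId = "BP-1234";                       // hypothetical registered id
        Thread.currentThread().setName(
            "heartbeating to " + nnAddr + " (" + bpId + ")");
        finalName = Thread.currentThread().getName();
        named.countDown();
    }

    Thread start() {
        // Safe pre-registration name: needs no block pool info.
        Thread t = new Thread(this, "heartbeating to " + nnAddr);
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```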






[jira] [Created] (HDFS-11923) Stress test of DFSNetworkTopology

2017-06-02 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11923:
-

 Summary: Stress test of DFSNetworkTopology
 Key: HDFS-11923
 URL: https://issues.apache.org/jira/browse/HDFS-11923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


I wrote a stress test with {{DFSNetworkTopology}} to verify its correctness 
under a huge number of datanode changes, e.g. datanode insert/delete, storage 
addition/removal, etc. The goal is to show that the topology maintains the 
correct counters at all times. The test is written such that, unless manually 
terminated, it keeps randomly performing the operations nonstop (and because 
of this, the test is ignored in the patch).

My local run lasted 40 minutes before I stopped it; it involved more than one 
million datanode changes, and no error happened. We believe this should be 
sufficient to show the correctness of {{DFSNetworkTopology}}.






[jira] [Created] (HDFS-11920) Ozone : add key partition

2017-06-02 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11920:
-

 Summary: Ozone : add key partition
 Key: HDFS-11920
 URL: https://issues.apache.org/jira/browse/HDFS-11920
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently, each key corresponds to one single SCM block, and putKey/getKey 
writes/reads this single SCM block. This works fine for keys with reasonably 
small data sizes. However, if the data is huge (e.g. does not even fit into a 
single container), then we need to be able to partition the key data into 
multiple blocks, each in its own container. This JIRA changes the key-related 
classes to support this.






[jira] [Created] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently

2017-05-31 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11907:
-

 Summary: NameNodeResourceChecker should avoid calling 
df.getAvailable too frequently
 Key: HDFS-11907
 URL: https://issues.apache.org/jira/browse/HDFS-11907
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


Currently, {{HealthMonitor#doHealthChecks}} invokes {{NameNode#monitorHealth}}, 
which ends up invoking {{NameNodeResourceChecker#isResourceAvailable}}, at the 
frequency of once per second by default. And 
NameNodeResourceChecker#isResourceAvailable invokes {{df.getAvailable();}} 
every time it is called, which can be a very expensive operation.

Since available space should rarely change dramatically at a per-second pace, 
a cached value should be sufficient: only fetch an updated value when the 
cached value is too old, otherwise simply return the cached value. This way 
df.getAvailable() gets invoked less often.
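The caching idea can be sketched as follows. This is a minimal illustration 
with hypothetical names (a SpaceProbe interface stands in for DF#getAvailable, 
and the current time is passed in to keep the sketch deterministic):

```java
// Illustrative cache around an expensive disk-usage probe: refresh only when
// the cached value is older than maxAgeMs, otherwise return the cached value.
class CachedDfSketch {
    interface SpaceProbe { long getAvailable(); }   // stands in for DF#getAvailable()

    private final SpaceProbe probe;
    private final long maxAgeMs;
    private long cachedValue;
    private long lastRefreshMs;
    private boolean hasValue = false;

    CachedDfSketch(SpaceProbe probe, long maxAgeMs) {
        this.probe = probe;
        this.maxAgeMs = maxAgeMs;
    }

    synchronized long getAvailable(long nowMs) {
        if (!hasValue || nowMs - lastRefreshMs > maxAgeMs) {
            cachedValue = probe.getAvailable();     // the expensive call
            lastRefreshMs = nowMs;
            hasValue = true;
        }
        return cachedValue;
    }
}
```

With a max age of, say, 10 seconds, the once-per-second health check would hit 
the expensive probe only about once every ten calls.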

Thanks [~arpitagarwal] for the offline discussion.






[jira] [Created] (HDFS-11906) Add log for NameNode#monitorHealth

2017-05-31 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11906:
-

 Summary: Add log for NameNode#monitorHealth
 Key: HDFS-11906
 URL: https://issues.apache.org/jira/browse/HDFS-11906
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Minor


We've seen cases where the NN had long delays that we suspect were due to 
{{NameNode#monitorHealth}} spending too much time in 
{{getNamesystem().checkAvailableResources();}}. However, due to the lack of 
logging, this can be hard to verify. This JIRA adds logging to this function 
to display the actual time spent.

Thanks [~arpitagarwal] for the offline discussion.






[jira] [Created] (HDFS-11891) DU#refresh should print the path of the directory when an exception is caught

2017-05-26 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11891:
-

 Summary: DU#refresh should print the path of the directory when an 
exception is caught
 Key: HDFS-11891
 URL: https://issues.apache.org/jira/browse/HDFS-11891
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Minor


The refresh() method of DU is as follows:
{code}
  @Override
  protected synchronized void refresh() {
try {
  duShell.startRefresh();
} catch (IOException ioe) {
  LOG.warn("Could not get disk usage information", ioe);
}
  }
{code}
The warning message should also print out the directory that failed.






[jira] [Created] (HDFS-11886) Ozone : improving error handling for putkey operation

2017-05-25 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11886:
-

 Summary: Ozone : improving error handling for putkey operation
 Key: HDFS-11886
 URL: https://issues.apache.org/jira/browse/HDFS-11886
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang


Ozone's putKey operation involves a couple of steps:
1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore.
2. The allocatedBlock gets returned to the client; the client checks whether 
the container needs to be created on the datanode and, if yes, creates it.
3. The client writes the data to the container.

It is possible that step 1 succeeded but step 2 or 3 failed. In this case 
there will be an entry in KSM's local metastore, but the key's data is 
actually nowhere to be found. We need to revert step 1 if step 2 or 3 fails. 
This can be done with a deleteKey() call to KSM.
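The revert logic can be sketched as a try/catch around the post-allocation 
steps. The Ksm and Datanode interfaces here are hypothetical stand-ins for the 
real client-side calls:

```java
// Illustrative rollback for putKey: if anything after the block allocation
// fails, delete the dangling key entry from KSM's metastore.
class PutKeySketch {
    interface Ksm { void allocateBlock(String key); void deleteKey(String key); }
    interface Datanode { void writeData(String key, byte[] data); }

    static void putKey(Ksm ksm, Datanode dn, String key, byte[] data) {
        ksm.allocateBlock(key);            // step 1: entry written to metastore
        try {
            dn.writeData(key, data);       // steps 2-3: create container, write data
        } catch (RuntimeException e) {
            ksm.deleteKey(key);            // revert step 1 so no orphan entry remains
            throw e;
        }
    }
}
```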








[jira] [Resolved] (HDFS-11872) Ozone : implement StorageContainerManager#getStorageContainerLocations

2017-05-24 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-11872.
---
Resolution: Won't Fix

I misread {{getStorageContainerLocations}} as the lookup of a container given 
the container's name. It turns out this is the lookup of a container given a 
specific key, in which case it should probably indeed move to KSM. We may need 
to revisit this later, but will not 'fix' this for the time being.

> Ozone : implement StorageContainerManager#getStorageContainerLocations
> --
>
> Key: HDFS-11872
> URL: https://issues.apache.org/jira/browse/HDFS-11872
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Chen Liang
>
> We should implement {{StorageContainerManager#getStorageContainerLocations}} 
> . 
> Although the comment says it will be moved to KSM, the functionality of 
> container lookup by name should actually be part of SCM functionality.






[jira] [Created] (HDFS-11872) Ozone : implement StorageContainerManager#getStorageContainerLocations

2017-05-23 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11872:
-

 Summary: Ozone : implement 
StorageContainerManager#getStorageContainerLocations
 Key: HDFS-11872
 URL: https://issues.apache.org/jira/browse/HDFS-11872
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


We should implement {{StorageContainerManager#getStorageContainerLocations}} . 

Although the comment says it will be moved to KSM, the functionality of 
container lookup by name should actually be part of SCM functionality.






[jira] [Created] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol

2017-05-19 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11859:
-

 Summary: Ozone : separate blockLocationProtocol out of 
containerLocationProtocol
 Key: HDFS-11859
 URL: https://issues.apache.org/jira/browse/HDFS-11859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang


Currently StorageContainerLocationProtocol contains two types of operations: 
container related operations and block related operations. Although there is 
{{ScmBlockLocationProtocol}} for block operations, only 
{{StorageContainerLocationProtocolServerSideTranslatorPB}} is making the 
distinction.

This JIRA tries to make the separation complete and thorough for all places.






[jira] [Created] (HDFS-11857) Ozone : need to refactor StorageContainerLocationProtocolServerSideTranslatorPB

2017-05-19 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11857:
-

 Summary: Ozone : need to refactor 
StorageContainerLocationProtocolServerSideTranslatorPB
 Key: HDFS-11857
 URL: https://issues.apache.org/jira/browse/HDFS-11857
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently, StorageContainerLocationProtocolServerSideTranslatorPB has two 
protocol impls:
{{StorageContainerLocationProtocol impl}}
{{ScmBlockLocationProtocol blockImpl}}.
the class provides container-related services by invoking {{impl}}, and 
block-related services by invoking {{blockImpl}}. Namely, on the server side, 
the implementation makes a distinction between the "container protocol" and 
the "block protocol".

An issue is that, currently, everywhere except for the server side still 
treats the "container protocol" and the "block protocol" as one. More 
specifically, StorageContainerLocationProtocol.proto still includes both the 
container operations and the block operations by itself. As a result of this 
mismatch, it is difficult to implement certain APIs (e.g. putKey) properly 
from the client side.

This JIRA merges "block protocol" back to "container protocol" in 
StorageContainerLocationProtocolServerSideTranslatorPB, to unblock the 
implementation of other APIs for client side. 

Please note that, in the long run, separating these two protocols does seem to 
be the right way. This JIRA is only a temporary solution to unblock developing 
other APIs. We will need to revisit these protocols in the future.







[jira] [Created] (HDFS-11836) Ozone : add sql debug CLI to hdfs script

2017-05-16 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11836:
-

 Summary: Ozone : add sql debug CLI to hdfs script
 Key: HDFS-11836
 URL: https://issues.apache.org/jira/browse/HDFS-11836
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-11698 was missing one change: {{SQLCLI}} should be exposed to the command 
line via the hdfs script. This JIRA addresses that.






[jira] [Created] (HDFS-11802) Ozone : add DEBUG CLI support for open container db file

2017-05-10 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11802:
-

 Summary: Ozone : add DEBUG CLI support for open container db file
 Key: HDFS-11802
 URL: https://issues.apache.org/jira/browse/HDFS-11802
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


This is a follow-up of HDFS-11698. This JIRA adds conversion support for the 
openContainer.db levelDB file.







[jira] [Created] (HDFS-11788) Ozone : add DEBUG CLI support for nodepool db file

2017-05-09 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11788:
-

 Summary: Ozone : add DEBUG CLI support for nodepool db file
 Key: HDFS-11788
 URL: https://issues.apache.org/jira/browse/HDFS-11788
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


This is a follow-up of HDFS-11698. This JIRA adds conversion support for the 
nodepool.db levelDB file.







[jira] [Created] (HDFS-11759) Ozone : SCMNodeManager#close() should also close node pool manager object

2017-05-04 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11759:
-

 Summary: Ozone : SCMNodeManager#close() should also close node 
pool manager object
 Key: HDFS-11759
 URL: https://issues.apache.org/jira/browse/HDFS-11759
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


{{SCMNodeManager#close()}} should also call {{nodePoolManager.close();}} to 
close its {{SCMNodePoolManager}} instance.






[jira] [Created] (HDFS-11756) Ozone : add DEBUG CLI support of blockDB file

2017-05-04 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11756:
-

 Summary: Ozone : add DEBUG CLI support of blockDB file
 Key: HDFS-11756
 URL: https://issues.apache.org/jira/browse/HDFS-11756
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


This is a follow-up of HDFS-11698. This JIRA adds conversion support for the 
block.db levelDB file.






[jira] [Created] (HDFS-11747) Ozone : need to fix OZONE_SCM_DEFAULT_PORT

2017-05-03 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11747:
-

 Summary: Ozone : need to fix  OZONE_SCM_DEFAULT_PORT
 Key: HDFS-11747
 URL: https://issues.apache.org/jira/browse/HDFS-11747
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


We were deploying on a physical cluster and found that 
{{OZONE_SCM_DEFAULT_PORT}} should be set to {{OZONE_SCM_DATANODE_PORT_DEFAULT}} 
instead of 9862 in the config keys.






[jira] [Created] (HDFS-11728) Ozone: add the DB names to OzoneConsts

2017-05-01 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11728:
-

 Summary: Ozone: add the DB names to OzoneConsts
 Key: HDFS-11728
 URL: https://issues.apache.org/jira/browse/HDFS-11728
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Currently there are several places that use levelDB, and the names of the 
levelDB files are hard-coded in the classes that use them. We should extract 
them into OzoneConsts instead.






[jira] [Created] (HDFS-11723) Should log a warning message when users try to make certain directories encryption zone

2017-04-28 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11723:
-

 Summary: Should log a warning message when users try to make 
certain directories encryption zone
 Key: HDFS-11723
 URL: https://issues.apache.org/jira/browse/HDFS-11723
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption, hdfs-client
Reporter: Chen Liang
Assignee: Chen Liang


If a user tries to make the entire /user directory an encryption zone, and 
trash is enabled, there will be a problem when the user tries to move an 
unencrypted file from /user to the trash directory. The problem will happen 
even with the fix in HDFS-8831. So we should log a WARN message when users try 
to make such directories encryption zones. Such directories include:
{{/user}}, 
{{/user/$user}} 
{{/user/$user/.Trash}}

Thanks [~xyao] for the offline discussion.







[jira] [Created] (HDFS-11650) Ozone: fix the consistently timeout test testUpgradeFromRel22Image

2017-04-12 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11650:
-

 Summary: Ozone: fix the consistently timeout test 
testUpgradeFromRel22Image
 Key: HDFS-11650
 URL: https://issues.apache.org/jira/browse/HDFS-11650
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Recently, the test TestDFSUpgradeFromImage.testUpgradeFromRel22Image has been 
consistently failing due to timeout. JIRAs that encountered this include (but 
are not limited to) HDFS-11642, HDFS-11635, HDFS-11062 and HDFS-11618, while 
the test passes in trunk.






[jira] [Created] (HDFS-11649) Ozone : add SCM CLI shell code placeholder classes

2017-04-12 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11649:
-

 Summary: Ozone : add SCM CLI shell code placeholder classes
 Key: HDFS-11649
 URL: https://issues.apache.org/jira/browse/HDFS-11649
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


HDFS-11470 outlined what the SCM CLI would look like. Based on the design, 
this JIRA adds the basic placeholder classes for all commands to be filled in.






[jira] [Created] (HDFS-11645) DataXceiver thread should log the actual error when getting InvalidMagicNumberException

2017-04-11 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11645:
-

 Summary: DataXceiver thread should log the actual error when 
getting InvalidMagicNumberException
 Key: HDFS-11645
 URL: https://issues.apache.org/jira/browse/HDFS-11645
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0-alpha1, 2.8.1
Reporter: Chen Liang
Assignee: Chen Liang
Priority: Minor


Currently, {{DataXceiver#run}} method only logs an error message when getting 
an {{InvalidMagicNumberException}}. It should also log the actual exception.






[jira] [Created] (HDFS-11631) Block Storage : allow cblock server to be started from hdfs command

2017-04-06 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11631:
-

 Summary: Block Storage : allow cblock server to be started from 
hdfs command
 Key: HDFS-11631
 URL: https://issues.apache.org/jira/browse/HDFS-11631
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA adds a CBlock main() method and an entry to the hdfs script, so that 
the cblock server can be started by the hdfs script and run as a daemon 
process.






[jira] [Reopened] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-03-24 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reopened HDFS-11535:
---

> Performance analysis of new DFSNetworkTopology#chooseRandom
> ---
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we 
> did. For those who are interested, please see the attached .pdf file for more 
> detail. The attached patch file includes the experiment code we ran.
> The key insight from these tests is that although *the new method outperforms 
> the current one in most cases*, there is still *one case where the current 
> one is better*: when there is only one storage type in the cluster and we 
> always look for that storage type. In that case it is simply a waste of time 
> to perform storage-type-based pruning; blindly picking a random node (the 
> current method) suffices.
> Therefore, based on the analysis, we propose to use a *combination of both 
> the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage 
> type info, we can *just check the root node to see whether X is the only 
> type it has*. If so, blindly picking a random leaf will work, so we simply 
> call the old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is 
> garbage collection. The new method creates a few more objects during the 
> search, which adds overhead to GC. I'm still thinking about potential 
> optimizations, but this seems tricky, and I'm not sure the optimization is 
> worth doing at all. Please feel free to leave any comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.






[jira] [Resolved] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-03-24 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-11535.
---
Resolution: Information Provided

> Performance analysis of new DFSNetworkTopology#chooseRandom
> ---
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we 
> did. For those who are interested, please see the attached .pdf file for more 
> detail. The attached patch file includes the experiment code we ran.
> The key insight from these tests is that although *the new method outperforms 
> the current one in most cases*, there is still *one case where the current 
> one is better*: when there is only one storage type in the cluster and we 
> always look for that storage type. In that case it is simply a waste of time 
> to perform storage-type-based pruning; blindly picking a random node (the 
> current method) suffices.
> Therefore, based on the analysis, we propose to use a *combination of both 
> the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage 
> type info, we can *just check the root node to see whether X is the only 
> type it has*. If so, blindly picking a random leaf will work, so we simply 
> call the old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is 
> garbage collection. The new method creates a few more objects during the 
> search, which adds overhead to GC. I'm still thinking about potential 
> optimizations, but this seems tricky, and I'm not sure the optimization is 
> worth doing at all. Please feel free to leave any comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.






[jira] [Created] (HDFS-11577) Combine the old and the new chooseRandom for better performance

2017-03-24 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11577:
-

 Summary: Combine the old and the new chooseRandom for better 
performance
 Key: HDFS-11577
 URL: https://issues.apache.org/jira/browse/HDFS-11577
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


As discussed in HDFS-11535, this JIRA adds a new function combining both the 
new and the old chooseRandom methods for better performance.

More specifically, when choosing a random node with a storage type requirement, 
the combined method first tries the old method of blindly picking a random 
node. If this node satisfies the requirement, it is returned. Otherwise, the 
new chooseRandom is called, which is guaranteed to find an eligible node in 
one call (if one exists at all).
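The combined lookup described above can be sketched roughly as follows. This is an illustrative model only: the {{Node}} record and method names are hypothetical stand-ins, and a real implementation would walk the topology tree rather than a flat node list.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Illustrative model of the combined strategy, not the actual
// DFSNetworkTopology API.
public class CombinedChooseRandom {
    record Node(String name, String storageType) {}

    static Node chooseRandomCombined(List<Node> nodes, String type, Random rnd) {
        // Old method first: one blind uniform pick, no pruning cost.
        Node candidate = nodes.get(rnd.nextInt(nodes.size()));
        if (candidate.storageType().equals(type)) {
            return candidate;
        }
        // Fall back to the new method: restrict the search to eligible
        // nodes, so a single call finds one whenever any exists.
        List<Node> eligible = nodes.stream()
                .filter(n -> n.storageType().equals(type))
                .collect(Collectors.toList());
        return eligible.isEmpty() ? null
                : eligible.get(rnd.nextInt(eligible.size()));
    }
}
```

The blind pick is the common fast path; the fallback only pays the pruning cost when the first pick misses.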






[jira] [Created] (HDFS-11539) Block Storage : configurable max cache size

2017-03-16 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11539:
-

 Summary: Block Storage : configurable max cache size
 Key: HDFS-11539
 URL: https://issues.apache.org/jira/browse/HDFS-11539
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang


Currently, there is no maximum size limit for CBlock's local cache, which means 
the cache can in theory grow without bound. We should make the maximum size 
configurable.
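As an illustration of the "configurable max size" idea only (not CBlock's actual cache code, which is on disk and more involved), a size bound over {{LinkedHashMap}} looks like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a size-bounded cache: once the configurable maximum
// is reached, the least-recently-accessed entry is evicted instead of
// letting the cache grow without bound.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true);   // access-order iteration => LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict rather than grow past the bound
    }
}
```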






[jira] [Created] (HDFS-11537) Block Storage : add cache layer

2017-03-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11537:
-

 Summary: Block Storage : add cache layer
 Key: HDFS-11537
 URL: https://issues.apache.org/jira/browse/HDFS-11537
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA adds the cache layer. Specifically, it implements the cache interface 
from HDFS-11361 and adds the code that actually talks to containers. The upper 
layer can simply view the storage as a cache with a simple put and get 
interface, while in the backend the gets and puts actually talk to containers. 
This part is critical to CBlock performance. [~anu] contributed most of this 
part.
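The shape of the layering might look like the following. The interface name and signatures here are hypothetical (the actual interface is defined in HDFS-11361), and a {{HashMap}} stands in for the container backend purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the layering: callers see plain put/get, while
// the implementation forwards each call to a container read or write.
public class CacheLayerSketch {
    interface BlockCache {
        void put(long blockId, byte[] data);  // backed by a container write
        byte[] get(long blockId);             // backed by a container read
    }

    static class ContainerBackedCache implements BlockCache {
        // Stand-in for the container backend; real code would dispatch
        // reads/writes to the block's owning container.
        private final Map<Long, byte[]> containerBackend = new HashMap<>();

        @Override
        public void put(long blockId, byte[] data) {
            containerBackend.put(blockId, data);
        }

        @Override
        public byte[] get(long blockId) {
            return containerBackend.get(blockId);
        }
    }
}
```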






[jira] [Created] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-03-15 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11535:
-

 Summary: Performance analysis of new 
DFSNetworkTopology#chooseRandom
 Key: HDFS-11535
 URL: https://issues.apache.org/jira/browse/HDFS-11535
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang
 Attachments: PerfTest.pdf

This JIRA is created to post the results of some performance experiments we 
did. For those who are interested, please see the attached .pdf file for more 
detail. The attached patch file includes the experiment code we ran.

The key insight from these tests is that although *the new method outperforms 
the current one in most cases*, there is still *one case where the current one 
is better*: when there is only one storage type in the cluster and we always 
look for that storage type. In that case it is simply a waste of time to 
perform storage-type-based pruning; blindly picking a random node (the current 
method) suffices.

Therefore, based on the analysis, we propose to use a *combination of both the 
old and the new methods*:

say we search for a node of type X. Since inner nodes now all keep storage 
type info, we can *just check the root node to see whether X is the only type 
it has*. If so, blindly picking a random leaf will work, so we simply call the 
old method; otherwise we call the new method.

There is still at least one missing piece in this performance test, which is 
garbage collection. The new method creates a few more objects during the 
search, which adds overhead to GC. I'm still thinking about potential 
optimizations, but this seems tricky, and I'm not sure the optimization is 
worth doing at all. Please feel free to leave any comments/suggestions.
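The root-node check can be sketched as below. This is an illustrative model, not the actual DFSNetworkTopology code: a flat leaf list stands in for the topology tree, and the type set computed here models the storage-type info kept on the root.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative model: if the wanted type is the only storage type under
// the root, every leaf qualifies and a blind random pick (old method)
// suffices; otherwise fall back to the storage-type-aware search.
public class RootTypeCheck {
    record Leaf(String name, String storageType) {}

    static String chooseRandom(List<Leaf> leaves, String wanted, Random rnd) {
        // Models the root node's aggregated storage-type info.
        Set<String> rootTypes = new HashSet<>();
        for (Leaf l : leaves) {
            rootTypes.add(l.storageType());
        }
        if (rootTypes.size() == 1 && rootTypes.contains(wanted)) {
            // Old method: no pruning needed, any leaf is eligible.
            return leaves.get(rnd.nextInt(leaves.size())).name();
        }
        // New method: prune to eligible leaves, then pick uniformly.
        List<Leaf> eligible = leaves.stream()
                .filter(l -> l.storageType().equals(wanted)).toList();
        return eligible.isEmpty() ? null
                : eligible.get(rnd.nextInt(eligible.size())).name();
    }
}
```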






[jira] [Created] (HDFS-11514) ChooseRandom can potentially be optimized

2017-03-08 Thread Chen Liang (JIRA)
Chen Liang created HDFS-11514:
-

 Summary: ChooseRandom can potentially be optimized
 Key: HDFS-11514
 URL: https://issues.apache.org/jira/browse/HDFS-11514
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Based on the offline discussion, one potential improvement to the 
{{chooseRandomWithStorageType}} method added in HDFS-11482 is that, given a 
node, the method currently iterates over all its children to sum up the number 
of candidate datanodes. Since datanode status changes are much less frequent 
than block placement requests, it would be more efficient to avoid this 
per-request iteration, probably by maintaining a per-storage-type counter map. 
This JIRA tracks (but is not limited to) this optimization.
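The counter-map idea amounts to updating counts on the (rare) membership-change events instead of summing over children on every (frequent) placement request; a hypothetical sketch (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-storage-type counter map: maintained incrementally on
// datanode add/remove, so each placement request can read the candidate
// count in O(1) instead of iterating all children of the node.
public class StorageTypeCounters {
    private final Map<String, Integer> counts = new HashMap<>();

    void onNodeAdded(String storageType) {
        counts.merge(storageType, 1, Integer::sum);
    }

    void onNodeRemoved(String storageType) {
        counts.merge(storageType, -1, Integer::sum);
    }

    // Replaces the per-request iteration over children.
    int candidateCount(String storageType) {
        return counts.getOrDefault(storageType, 0);
    }
}
```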






[jira] [Resolved] (HDFS-11507) NetworkTopology#chooseRandom may run into a dead loop due to race condition

2017-03-07 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-11507.
---
Resolution: Not A Problem

> NetworkTopology#chooseRandom may run into a dead loop due to race condition
> ---
>
> Key: HDFS-11507
> URL: https://issues.apache.org/jira/browse/HDFS-11507
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>
> {{NetworkTopology#chooseRandom()}} works as follows:
> 1. count the number of available nodes as {{availableNodes}},
> 2. check how many nodes are excluded and deduct them from {{availableNodes}},
> 3. if {{availableNodes}} is still > 0, then there are nodes available,
> 4. keep looping to find such a node.
> But now imagine that, in the meantime, the actually available nodes are 
> removed during step 3 or step 4 and all remaining nodes are excluded. Then, 
> although no nodes are actually available any more, the code still proceeds 
> because {{availableNodes}} > 0, keeps picking excluded nodes, and loops 
> forever, since 
> {{if (excludedNodes == null || !excludedNodes.contains(ret))}} 
> is always false.
> We could fix this by expanding the while loop to also include the 
> {{availableNodes}} calculation, so that {{availableNodes}} is re-calculated 
> every time we fail to find an available node.
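The proposed fix of recomputing {{availableNodes}} inside the loop can be sketched like this. The code is a simplified, illustrative model (flat node list, string names), not the actual NetworkTopology implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Simplified sketch of the fix: the candidate set is re-derived on every
// iteration, so a concurrent removal that leaves only excluded nodes
// makes the loop terminate with null instead of spinning forever.
public class ChooseRandomLoop {
    static String chooseRandom(List<String> liveNodes,
                               Set<String> excludedNodes, Random rnd) {
        while (true) {
            // Recompute availability inside the loop, not once up front.
            List<String> available = new ArrayList<>(liveNodes);
            available.removeAll(excludedNodes);
            if (available.isEmpty()) {
                return null;   // nothing eligible remains: give up
            }
            String ret = available.get(rnd.nextInt(available.size()));
            if (!excludedNodes.contains(ret)) {
                return ret;
            }
        }
    }
}
```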





