[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356614#comment-16356614
 ] 

Yiqun Lin commented on HDFS-13119:
--

Just looked into this.
{quote}When a federated cluster has one of its subclusters down, operations that 
run in every subcluster (RouterRpcClient#invokeAll()) may take all the RPC 
connections.
{quote}
Looking into the related code, I didn't see any logic that triggers RPC 
requests to every subcluster once one subcluster is down. I only looked at the 
method {{RouterRpcClient#invoke}} invoked in {{RouterRpcClient#invokeMethod}}. 
Correct me if I am wrong.

This part is not so clear to me; could you describe it in more detail?
{quote}
Better control of the number of RPC clients
{quote}

I have a proposal for "No need to try so many times if we 'know' the subcluster 
is down": when a failure happens, query the {{ActiveNamenodeResolver}} to check 
whether the subcluster is down; if it is, skip the retry. In addition, the 
current default retry count (10 times) can be decreased a lot.
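
To sketch the idea (hypothetical method and resolver check; the real 
{{RouterRpcClient}} retry loop differs):
{code:java}
// Sketch only: fail fast when the resolver already reports the whole
// subcluster as unavailable, instead of burning all the retries.
// 'hasActiveNamenode' is a hypothetical helper, not an existing API.
private Object invokeWithFastFail(String nsId, Method method, Object... params)
    throws IOException {
  IOException lastException = null;
  for (int retry = 0; retry < maxRetryAttempts; retry++) { // far fewer than 10
    try {
      return invokeMethod(nsId, method, params);
    } catch (IOException ioe) {
      lastException = ioe;
      if (!namenodeResolver.hasActiveNamenode(nsId)) {  // hypothetical check
        throw new IOException("Subcluster " + nsId + " is unavailable", ioe);
      }
    }
  }
  throw lastException;
}
{code}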

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.






[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356603#comment-16356603
 ] 

genericqa commented on HDFS-13120:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 37s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}134m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13120 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909724/HDFS-13120.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fd68dcc0e599 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f491f71 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22988/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22988/testReport/ |
| Max. process+thread count | 4460 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22988/console |
| Powered by | Apache Yetus 

[jira] [Commented] (HDFS-13117) Proposal to support writing replications to HDFS asynchronously

2018-02-07 Thread xuchuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356592#comment-16356592
 ] 

xuchuanyin commented on HDFS-13117:
---

[~jojochuang] [~kihwal] Thanks for your response.

I haven't observed the consequences of directly writing files to HDFS in our apps. 
Actually, we haven't tested it yet, since we already knew that when a client 
writes data to HDFS, the call returns only once the last block of the last 
replica is done – at the very least, the time between finishing the last block of 
the first replica and the last block of the last replica could be saved.

 

Our apps are high-performance, in-memory-computing processes (written in C). 
Their performance when reading a local file is about 3~4x better than when 
reading an HDFS file, so we worry that the write performance would run into the 
same problem.

 

Now we want to strike a balance between efficiency and disk writes:

Having the process write temporary local files and copy them to HDFS in another 
thread would certainly keep the process fast, but it causes extra disk writes. 
Hence the proposal above.

 

I'm not sure whether I have made my idea clear...
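
To make the intent concrete, a sketch of how a client might use the proposed 
minimum-replication create (the extra {{acceptableReplication}} parameter below 
is the proposal's hypothetical addition, not an existing API):
{code:java}
// Hypothetical usage of the proposed create() overload: block only until
// 'acceptableReplication' replicas are durable; HDFS would complete the
// remaining replicas in the background.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/data/output.bin");
short replication = 3;            // expected final replica count
short acceptableReplication = 1;  // proposed: unblock after 1 replica
FSDataOutputStream out = fs.create(path, true, 4096,
    replication, acceptableReplication,   // hypothetical parameter
    fs.getDefaultBlockSize(path));
out.write(new byte[]{1, 2, 3});
out.close();  // returns once the minimum replicas are durable
{code}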

> Proposal to support writing replications to HDFS asynchronously
> ---
>
> Key: HDFS-13117
> URL: https://issues.apache.org/jira/browse/HDFS-13117
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: xuchuanyin
>Priority: Major
>
> My initial question was as below:
> ```
> I've learned that when we write data to HDFS using the interfaces provided by 
> HDFS, such as 'FileSystem.create', our client will block until all the blocks 
> and their replicas are done. This will cause an efficiency problem if we use 
> HDFS as our final data storage. And many of my colleagues write the data to 
> local disk in the main thread and copy it to HDFS in another thread. 
> Obviously, it increases the disk I/O.
>  
>    So, is there a way to optimize this usage? I don't want to increase the 
> disk I/O, and neither do I want to be blocked during the writing of the extra 
> replicas.
>   How about writing to HDFS by specifying only one replica in the main 
> thread and setting the actual number of replicas in another thread? Or is 
> there any better way to do this?
> ```
>  
> So my proposal here is to support writing extra replicas to HDFS 
> asynchronously. Users can set a minimum replication as the acceptable number 
> of replicas (< the default or expected replication). When writing to HDFS, the 
> user will only be blocked until the minimum replication has been satisfied, 
> and HDFS will continue to complete the extra replicas in the background. Since 
> HDFS periodically checks the integrity of all the replicas anyway, we can also 
> leave this work to HDFS itself.
>  
> There are two ways to provide the interfaces:
> 1. Creating a series of interfaces by adding `acceptableReplication` 
> parameter to the current interfaces as below:
> ```
> Before:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   long blockSize
> ) throws IOException
>  
> After:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   short acceptableReplication, // minimum number of replication to finish 
> before return
>   long blockSize
> ) throws IOException
> ```
>  
> 2. Adding `acceptableReplication` and `asynchronous` to the runtime (or 
> default) configuration, so users will not have to change any interface and 
> will still benefit from this feature.
>  
> What do you think about this?






[jira] [Commented] (HDFS-13046) consider load of datanodes when read blocks of file

2018-02-07 Thread hu xiaodong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356562#comment-16356562
 ] 

hu xiaodong commented on HDFS-13046:


hello, [~ajisakaa],

     what do you think?

> consider load of datanodes when read blocks of file
> ---
>
> Key: HDFS-13046
> URL: https://issues.apache.org/jira/browse/HDFS-13046
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Minor
> Attachments: 
> HDFS-13046-considerLoadAfterSortBydistance-001-sample.patch, 
> HDFS-13046-sample.patch
>
>
> When sorting block locations, we only consider the distance of the datanodes. 
> Can we also consider the load of the datanodes? We could add a configuration 
> such as 'dfs.namenode.reading.considerLoad': if set to true, sort the block 
> locations by the load of the datanodes, otherwise sort by distance.
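
A minimal sketch of the proposed toggle (the config key comes from the 
description above; {{locations}} is assumed to be the replica list being 
sorted, and xceiver count stands in for "load"):
{code:java}
// Sketch: when the proposed flag is on, order a block's replicas by
// datanode load (xceiver count) instead of by network distance.
boolean considerLoad = conf.getBoolean(
    "dfs.namenode.reading.considerLoad", false);  // proposed key
if (considerLoad) {
  locations.sort(Comparator.comparingInt(DatanodeInfo::getXceiverCount));
} else {
  // fall through to the existing distance-based sorting
}
{code}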






[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356540#comment-16356540
 ] 

Xiao Chen commented on HDFS-11187:
--

Thanks for working on the branch-2 patch, Gabor! 

I'm not entirely familiar with this part of code, so would be great if 
[~jojochuang] or [~kihwal] can double check.

Some review comments:
Agree that we should be fine not porting HDFS-10636 (the ReplicaInfo -> File 
jira) to branch-2, as Gabor said.

FinalizedReplica.java:
- Should keep the import change (remove fnfe)
- Should remove {{getLastChecksumAndDataLen}}

FsVolumeImpl:
- I think we should probably keep the trunk patch's logic in 
{{addFinalizedBlock}}. But given this interface difference, maybe we can move 
this to FsDatasetImpl, after {{addFinalizedBlock}} returns and before 
creating the {{FinalizedReplica}} object?
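
To make the suggestion concrete, a rough sketch of that ordering (the 
simplified signature, the helper, and the setter are placeholders, not the 
actual patch):
{code:java}
// Sketch of the suggested branch-2 flow in FsDatasetImpl: compute the
// last partial chunk checksum once after the volume finalizes the block,
// then attach it to the in-memory replica so readers avoid a disk read.
File dest = v.addFinalizedBlock(bpid, replicaInfo);          // simplified signature
byte[] lastChecksum = loadLastPartialChunkChecksum(dest);    // placeholder helper
FinalizedReplica newReplica =
    new FinalizedReplica(replicaInfo, v, dest.getParentFile());
newReplica.setLastPartialChunkChecksum(lastChecksum);        // placeholder setter
{code}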

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of last partial checksum in-memory and reduce disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.






[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Jianfei Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356543#comment-16356543
 ] 

Jianfei Jiang commented on HDFS-12935:
--

Thanks [~brahmareddy], thank you for your guidance.

I reviewed the failures in the branch-2 test results. Most of them are related 
to OOM exceptions; maybe the running environment has some issue.

 

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.0.2
>
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands give ambiguous results. The commands can be sent successfully to 
> the namenode that is up, but they are functionally useful only when nn1 is 
> up, regardless of the exception (an IOException when connecting to the down 
> namenode nn2). If only nn2 is up, the commands have no effect at all and 
> only the exception from connecting to nn1 is seen.
> Take the following command, "hdfs dfsadmin -setBalancerBandwidth", which aims 
> to set the balancer bandwidth for the datanodes, as an example. It works, and 
> all the datanodes receive the setting, only when nn1 is up. If only nn2 is 
> up, the command throws an exception directly and no datanode gets the 
> bandwidth setting. Approximately ten DFSAdmin commands use similar logic and 
> may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 
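
For illustration, one possible direction (assuming a pre-built list of 
{{ClientProtocol}} proxies, one per namenode; the real DFSAdmin plumbing 
differs):
{code:java}
// Sketch: attempt the command on every configured namenode and report
// per-namenode results, failing only if no namenode accepted it.
int failures = 0;
for (ClientProtocol proxy : proxies) {            // assumed: one proxy per NN
  try {
    proxy.setBalancerBandwidth(bandwidth);        // existing ClientProtocol call
    System.out.println("Balancer bandwidth is set to " + bandwidth);
  } catch (IOException ioe) {
    failures++;
    System.err.println("setBalancerBandwidth failed on one namenode: " + ioe);
  }
}
if (failures == proxies.size()) {
  throw new IOException("setBalancerBandwidth failed on all namenodes");
}
{code}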






[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356542#comment-16356542
 ] 

Aaron T. Myers commented on HDFS-12051:
---

I've taken a quick look at the patch and have some questions:

# Is there a way to disable the cache entirely, if we find that there's some 
bug in the implementation? e.g. if you set the ratio to 0, does everything 
behave correctly?
# How hard would it be to not make this class a static singleton, and instead 
have a single instance of it in the NN that can be referenced, perhaps as an 
instance variable of the {{FSNamesystem}}? That seems a bit less fragile if 
it's possible, and could allow for the class to be more easily tested.
# Have you done any verification of the correctness of this cache in any of 
your benchmarks? e.g. something that walked the file system tree to ensure that 
the names are identical with/without this cache I think would help allay 
correctness concerns.
# I'd really like to see some more tests of the actual cache implementation 
itself, e.g. in the presence of hash collisions, behavior at the boundaries of 
the main cache array, overlap of slots probed in the open addressing search, 
other edge cases, etc.
# I see that precommit raised some findbugs warnings and had some failed unit 
tests. Can we please address the findbugs warnings, and also confirm that those 
unit test failures are unrelated?
# Seems like this cache will have a somewhat odd behavior if an item hashes to 
a slot that's within {{MAX_COLLISION_CHAIN_LEN}} slots of the end of the array, 
in that it looks like we'll just probe the same slot over and over again up to 
{{MAX_COLLISION_CHAIN_LEN}} times. Is this to be expected?

[~mi...@cloudera.com] - in general I share [~szetszwo]'s concern that we just 
need to be very careful with changes to this sort of code in the NN, because 
even a small bug could subtly result in very severe consequences. I realize 
that the length of time that this patch has been up is frustrating for you, but 
please understand that the concerns being raised are in good faith, and are 
just focused on trying to ensure that file system data is not ever put at risk. 
The more tests you can include in the patch, and the more correctness testing 
you can report having done on the patch, the more comfortable and confident all 
reviewers will feel in committing this very valuable change. The recent 
reviews that this patch has received demonstrate to me that we're moving in a 
good direction toward getting this JIRA resolved.

I'd also like to ping [~daryn] and [~kihwal] to see if they have time to review 
this change, as I bet they'll be keenly interested in this improvement.
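
To ground points 4 and 6, a toy open-addressing intern cache (illustrative 
only; sizing, hashing, and field names are invented, not the patch's 
implementation):
{code:java}
import java.util.Arrays;

// Toy byte[]-interning cache with linear probing. Note the index wraps
// around the array end, so a name hashing near the boundary still probes
// MAX_COLLISION_CHAIN_LEN distinct slots.
class NameCacheSketch {
  private static final int MAX_COLLISION_CHAIN_LEN = 8;
  private final byte[][] slots = new byte[1 << 16][];

  byte[] intern(byte[] name) {
    int base = (Arrays.hashCode(name) & 0x7fffffff) % slots.length;
    for (int i = 0; i < MAX_COLLISION_CHAIN_LEN; i++) {
      int idx = (base + i) % slots.length;  // wrap instead of re-probing one slot
      byte[] cached = slots[idx];
      if (cached == null) {
        slots[idx] = name;                  // first empty slot: insert
        return name;
      }
      if (Arrays.equals(cached, name)) {
        return cached;                      // hit: return the canonical array
      }
    }
    return name;                            // chain too long: skip interning
  }
}
{code}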

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... 

[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-03.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}






[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356528#comment-16356528
 ] 

genericqa commented on HDFS-13099:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 57s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909716/HDFS-13099.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 38feb5c9d3af 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f491f71 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22987/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356529#comment-16356529
 ] 

Rakesh R commented on HDFS-13110:
-

Attached another patch, in which I've modified the low-redundancy check for 
striped blocks, since the redundancy of a striped block is the sum of its 
({{data + parity}}) unit counts.
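
Roughly, the distinction in question (a sketch assuming the striped block 
exposes its real data+parity unit count; not the patch's literal code):
{code:java}
// Sketch: for a striped block the redundancy target is data + parity
// units, not the replication factor used for contiguous blocks.
boolean hasLowRedundancy(BlockInfo block, int liveReplicas) {
  int expected = block.isStriped()
      ? ((BlockInfoStriped) block).getRealTotalBlockNum()  // data + parity
      : block.getReplication();
  return liveReplicas < expected;
}
{code}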

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}






[jira] [Commented] (HDFS-13123) RBF: Add a balancer tool to move data across subclusters

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356499#comment-16356499
 ] 

Wei Yan commented on HDFS-13123:


I'll put up a quick doc summarizing the solution by the end of this week.

> RBF: Add a balancer tool to move data across subclusters 
> ---
>
> Key: HDFS-13123
> URL: https://issues.apache.org/jira/browse/HDFS-13123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
>
> Following the discussion in HDFS-12615, this Jira tracks the effort of 
> building a rebalancer tool, used by Router-based federation to move data 
> among subclusters.






[jira] [Created] (HDFS-13123) RBF: Add a balancer tool to move data across subclusters

2018-02-07 Thread Wei Yan (JIRA)
Wei Yan created HDFS-13123:
--

 Summary: RBF: Add a balancer tool to move data across subclusters 
 Key: HDFS-13123
 URL: https://issues.apache.org/jira/browse/HDFS-13123
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan


Following the discussion in HDFS-12615, this Jira tracks the effort of building 
a rebalancer tool, used by Router-based federation to move data among 
subclusters.






[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356497#comment-16356497
 ] 

Wei Yan commented on HDFS-12615:


[~elgoiri] [~linyiqun] just created HDFS-13123 for the balancer.

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks a set of improvements over Router-based HDFS 
> federation (HDFS-10467).






[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356483#comment-16356483
 ] 

Xiaoyu Yao commented on HDFS-13120:
---

Fixed the checkstyle issues in the newly added test.
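
For readers following along, a condensed sketch of the kind of repro the 
description promises (block size and paths are made up; the actual unit test is 
in the attached patch):
{code:java}
// Sketch: concat under a snapshotted directory, then delete the snapshot.
// Without the fix, the snapshot diff still references the concat sources
// and the delete trips the "Element already exists" assertion.
Configuration conf = new Configuration();
conf.setLong("dfs.blocksize", 1024);
conf.setLong("dfs.namenode.fs-limits.min-block-size", 0);
MiniDFSCluster cluster =
    new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
try {
  DistributedFileSystem fs = cluster.getFileSystem();
  Path dir = new Path("/dir");
  fs.mkdirs(dir);
  DFSTestUtil.createFile(fs, new Path(dir, "0.txt"), 1024, (short) 1, 0);
  DFSTestUtil.createFile(fs, new Path(dir, "1.txt"), 1024, (short) 1, 0);
  fs.allowSnapshot(dir);
  fs.createSnapshot(dir, "s0");
  fs.concat(new Path(dir, "0.txt"), new Path[] {new Path(dir, "1.txt")});
  fs.deleteSnapshot(dir, "s0");  // asserts without the fix
} finally {
  cluster.shutdown();
}
{code}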

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch, HDFS-13120.002.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to an assertion failure upon later DeleteSnapshot and getSnapshotDiff 
> operations. 
> For example, we have seen customers hit a stack trace similar to the one 
> below, but while loading the edit entry of a DeleteSnapshotOp. After 
> investigation, we found this is a regression caused by HDFS-3689, where the 
> snapshot diff is not fully cleaned up after concat. 
> I will post a unit test to repro this, and a fix for it, shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 






[jira] [Updated] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13120:
--
Attachment: HDFS-13120.002.patch

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch, HDFS-13120.002.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to an assertion failure upon later DeleteSnapshot and getSnapshotDiff 
> operations. 
> For example, we have seen customers hit a stack trace similar to the one 
> below, but while loading the edit entry of a DeleteSnapshotOp. After 
> investigation, we found this is a regression caused by HDFS-3689, where the 
> snapshot diff is not fully cleaned up after concat. 
> I will post a unit test to repro this, and a fix for it, shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 






[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356466#comment-16356466
 ] 

genericqa commented on HDFS-13120:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 4 new + 41 unchanged - 0 fixed = 45 total (was 41) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 57s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13120 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909704/HDFS-13120.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 68f244ec032a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f491f71 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22986/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22986/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| 

[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-02-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356452#comment-16356452
 ] 

Xiao Chen commented on HDFS-12985:
--

bq. I cherry-picked to 3.0.1.
https://hadoop.apache.org/versioning.html defines the commit flow. Since the 
commit was done to trunk and branch-2, we need to make sure the branches in 
between are committed to as well. 3.0.1 is great, but we also need branch-3.0. 
I just did that, but please try your best to commit to all the necessary 
branches.
Thanks a lot.


> NameNode crashes during restart after an OpenForWrite file present in the 
> Snapshot got deleted
> --
>
> Key: HDFS-12985
> URL: https://issues.apache.org/jira/browse/HDFS-12985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-12985.01.patch
>
>
> NameNode crashes repeatedly with an NPE at startup when trying to find the 
> total number of under-construction blocks. This crash happens after an open 
> file, which was also part of a snapshot, gets deleted along with the snapshot.
> {noformat}
> Failed to start namenode.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}






[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356446#comment-16356446
 ] 

Yongjun Zhang commented on HDFS-13115:
--

Thanks a lot [~xiaochen].

 

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, but it seems 
> reasonable to have a safeguard here, like other code in the code base that 
> calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
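
The safeguard could look roughly like this (a sketch against the quoted 
{{LeaseManager}} code, not necessarily the committed patch):
{code:java}
// Sketch: null-check the inode before dereferencing it, mirroring the
// guard that getINodesWithLease() already has.
synchronized long getNumUnderConstructionBlocks() {
  assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
      + " acquired before counting under construction blocks";
  long numUCBlocks = 0;
  for (Long id : getINodeIdWithLeases()) {
    INode inode = fsnamesystem.getFSDirectory().getInode(id);
    if (inode == null) {
      // The file (or one of its parents) could have been deleted after
      // the id list was built; skip it instead of hitting an NPE.
      continue;
    }
    final INodeFile cons = inode.asFile();
    Preconditions.checkState(cons.isUnderConstruction());
    BlockInfo[] blocks = cons.getBlocks();
    if (blocks == null) {
      continue;
    }
    for (BlockInfo b : blocks) {
      if (!b.isComplete()) {
        numUCBlocks++;
      }
    }
  }
  LOG.info("Number of blocks under construction: " + numUCBlocks);
  return numUCBlocks;
}
{code}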






[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356443#comment-16356443
 ] 

Yongjun Zhang commented on HDFS-12985:
--

Thanks [~jojochuang].

I cherry-picked to 3.0.1.

 

> NameNode crashes during restart after an OpenForWrite file present in the 
> Snapshot got deleted
> --
>
> Key: HDFS-12985
> URL: https://issues.apache.org/jira/browse/HDFS-12985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-12985.01.patch
>
>
> NameNode crashes repeatedly with an NPE at startup when trying to find the 
> total number of under-construction blocks. This crash happens after an open 
> file, which was also part of a snapshot, gets deleted along with the snapshot.
> {noformat}
> Failed to start namenode.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}






[jira] [Updated] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-12985:
-
Fix Version/s: 3.0.1

> NameNode crashes during restart after an OpenForWrite file present in the 
> Snapshot got deleted
> --
>
> Key: HDFS-12985
> URL: https://issues.apache.org/jira/browse/HDFS-12985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-12985.01.patch
>
>
> NameNode crashes repeatedly with an NPE at startup when trying to find the 
> total number of under-construction blocks. This crash happens after an open 
> file, which was also part of a snapshot, gets deleted along with the snapshot.
> {noformat}
> Failed to start namenode.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin reassigned HDFS-13119:


Assignee: Yiqun Lin

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356434#comment-16356434
 ] 

Yiqun Lin commented on HDFS-13119:
--

I'd like to make this improvement, :). Assigning it to myself.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356427#comment-16356427
 ] 

Yiqun Lin commented on HDFS-12615:
--

{quote}
For the rebalancer itself, I have some code working locally, reusing lots of 
code from your prototype. I can put up a quick doc summarizing the idea, and we 
can start to merge the code back to trunk.
{quote}
Sounds great. Please go ahead, [~ywskycn], :).

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks set of improvements over the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Attachment: HDFS-13099.006.patch

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch, 
> HDFS-13099.006.patch
>
>
> Currently the State Store Driver-related settings are only written in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the class {{DFSConfigKeys}} and documented 
> in the file {{hdfs-default.xml}}. This will help more users discover these 
> settings and learn how to use them.
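
As an illustration of the proposed move, the matching {{hdfs-default.xml}} entry 
could look like the sketch below (assuming the existing 
{{dfs.federation.router.store.}} prefix; the final key names are whatever the 
patch settles on):

{code}
<!-- Illustrative sketch, not the committed patch -->
<property>
  <name>dfs.federation.router.store.driver.zk.parent-path</name>
  <value>/hdfs-federation</value>
  <description>Parent znode in ZooKeeper under which the State Store
  data is kept when the ZooKeeper driver is used.</description>
</property>
{code}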



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356421#comment-16356421
 ] 

Yiqun Lin commented on HDFS-13099:
--

New patch attached to address the comment.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the State Store Driver-related settings are only written in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the class {{DFSConfigKeys}} and documented 
> in the file {{hdfs-default.xml}}. This will help more users discover these 
> settings and learn how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13122) FSImage should not update quota counts on ObserverNode

2018-02-07 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13122:
--

 Summary: FSImage should not update quota counts on ObserverNode
 Key: HDFS-13122
 URL: https://issues.apache.org/jira/browse/HDFS-13122
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen


Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
{code}
updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), 
target.dir.rootDir);
{code}
to update the quota counts for the entire namespace, which can be very 
expensive. This makes sense if we are about to become the ANN, since we need 
valid quotas, but not on an ObserverNode which does not need to enforce quotas.

This is related to increasing the frequency with which the SbNN can tail edits 
from the ANN to decrease the lag time for transactions to appear on the 
Observer.
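
A minimal sketch of the idea ({{isObserver()}} here is an assumed helper for 
illustration, not an existing {{FSNamesystem}} API):

{code}
// Illustrative sketch only: skip the namespace-wide quota recount when
// tailing edits on a node that will not enforce quotas.
if (!target.isObserver()) {  // assumed helper, not a real API
  updateCountForQuota(target.getBlockManager().getStoragePolicySuite(),
      target.dir.rootDir);
}
{code}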



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13122) Tailing edits should not update quota counts on ObserverNode

2018-02-07 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-13122:
---
Summary: Tailing edits should not update quota counts on ObserverNode  
(was: FSImage should not update quota counts on ObserverNode)

> Tailing edits should not update quota counts on ObserverNode
> 
>
> Key: HDFS-13122
> URL: https://issues.apache.org/jira/browse/HDFS-13122
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
> {code}
> updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), 
> target.dir.rootDir);
> {code}
> to update the quota counts for the entire namespace, which can be very 
> expensive. This makes sense if we are about to become the ANN, since we need 
> valid quotas, but not on an ObserverNode which does not need to enforce 
> quotas.
> This is related to increasing the frequency with which the SbNN can tail 
> edits from the ANN to decrease the lag time for transactions to appear on the 
> Observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13121) NPE when request file descriptors when SC read

2018-02-07 Thread Gang Xie (JIRA)
Gang Xie created HDFS-13121:
---

 Summary: NPE when request file descriptors when SC read
 Key: HDFS-13121
 URL: https://issues.apache.org/jira/browse/HDFS-13121
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Gang Xie


Recently, we hit an issue where the DFSClient throws an NPE. The case is that 
the app process exceeds the limit on the max number of open files. In that case, 
libhadoop never throws an exception but returns null to the request for fds, and 
requestFileDescriptors uses the returned fds directly without any check, which 
then NPEs.

We need to add a null-pointer sanity check here:

{code}
private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
    Slot slot) throws IOException {
  ShortCircuitCache cache = clientContext.getShortCircuitCache();
  final DataOutputStream out =
      new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
  SlotId slotId = slot == null ? null : slot.getSlotId();
  new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
      failureInjector.getSupportsReceiptVerification());
  DataInputStream in = new DataInputStream(peer.getInputStream());
  BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
      PBHelperClient.vintPrefixed(in));
  DomainSocket sock = peer.getDomainSocket();
  failureInjector.injectRequestFileDescriptorsFailure();
  switch (resp.getStatus()) {
  case SUCCESS:
    byte buf[] = new byte[1];
    FileInputStream[] fis = new FileInputStream[2];
    sock.recvFileInputStreams(fis, buf, 0, buf.length);  // fds may come back null
    ShortCircuitReplica replica = null;
    try {
      ExtendedBlockId key =
          new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
      if (buf[0] == USE_RECEIPT_VERIFICATION.getNumber()) {
        LOG.trace("Sending receipt verification byte for slot {}", slot);
        sock.getOutputStream().write(0);
      }
      replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,  // NPE here
          Time.monotonicNow(), slot);
{code}
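
A minimal sketch of the kind of guard being suggested, placed right after the 
{{recvFileInputStreams()}} call (the exact message and placement are assumptions, 
not a committed patch):

{code}
sock.recvFileInputStreams(fis, buf, 0, buf.length);
// Illustrative sanity check: libhadoop can leave the array entries null
// (e.g. when the process hits its open-file limit), so fail with a clear
// IOException instead of an NPE further down.
if (fis[0] == null || fis[1] == null) {
  throw new IOException("Failed to receive file descriptors from the "
      + "DataNode; the client may have exceeded its open-file limit");
}
{code}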



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356351#comment-16356351
 ] 

Xiaoyu Yao commented on HDFS-13120:
---

Thanks [~arpitagarwal], [~jnp] and [~szetszwo] for the offline discussion on the 
fix.

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to assertion failures upon later DeleteSnapshot and getSnapshotDiff operations. 
> For example, we have seen customers hit a stack trace similar to the one below, 
> but while loading the edit entry of a DeleteSnapshotOp. After investigation, 
> we found this is a regression caused by HDFS-3689, where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post a unit test to repro this and a fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12512) RBF: Add WebHDFS

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356335#comment-16356335
 ] 

Wei Yan commented on HDFS-12512:


I'll try one more run later tonight; maybe that can help avoid other parallel 
jobs. For the split and other ideas, I think we can discuss more details in the 
HDFS bug bash session.

> RBF: Add WebHDFS
> 
>
> Key: HDFS-12512
> URL: https://issues.apache.org/jira/browse/HDFS-12512
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Íñigo Goiri
>Assignee: Wei Yan
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, 
> HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch, 
> HDFS-12512.005.patch, HDFS-12512.006.patch
>
>
> The Router currently does not support WebHDFS. It needs to implement 
> something similar to {{NamenodeWebHdfsMethods}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356319#comment-16356319
 ] 

Íñigo Goiri commented on HDFS-12615:


[~ywskycn] I think it makes sense to create a separate umbrella and add the 
design docs, etc., there.

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks set of improvements over the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13120:
--
Attachment: HDFS-13120.001.patch

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to assertion failures upon later DeleteSnapshot and getSnapshotDiff operations. 
> For example, we have seen customers hit a stack trace similar to the one below, 
> but while loading the edit entry of a DeleteSnapshotOp. After investigation, 
> we found this is a regression caused by HDFS-3689, where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post a unit test to repro this and a fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13120:
--
Status: Patch Available  (was: Open)

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to assertion failures upon later DeleteSnapshot and getSnapshotDiff operations. 
> For example, we have seen customers hit a stack trace similar to the one below, 
> but while loading the edit entry of a DeleteSnapshotOp. After investigation, 
> we found this is a regression caused by HDFS-3689, where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post a unit test to repro this and a fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356309#comment-16356309
 ] 

Wei Yan commented on HDFS-12615:


[~elgoiri] For the #1 rebalancer, do we just create another separate umbrella 
Jira, so that we can try to close this phase 2 Jira? Or do we still put the 
umbrella one under this Jira?

For the rebalancer itself, I have some code working locally, reusing lots of 
code from your prototype. I can put up a quick doc summarizing the idea, and we 
can start to merge the code back to trunk.

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks set of improvements over the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356298#comment-16356298
 ] 

Xiao Chen commented on HDFS-13115:
--

Cherry-picked to branch-3.0. Thanks, all.

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 RCAed and solved one such case; it fixed some cases, but we are 
> still seeing NullPointerExceptions from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
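
A minimal sketch of the safeguard being proposed (illustrative only, not the 
attached patch):

{code}
// Illustrative safeguard: skip ids whose inode has already been removed
// from the FSDirectory map instead of calling asFile() on null.
for (Long id : getINodeIdWithLeases()) {
  INode inode = fsnamesystem.getFSDirectory().getInode(id);
  if (inode == null) {
    // Deleted between collecting the id and resolving it; ignore it.
    continue;
  }
  final INodeFile cons = inode.asFile();
  Preconditions.checkState(cons.isUnderConstruction());
  BlockInfo[] blocks = cons.getBlocks();
  if (blocks == null) {
    continue;
  }
  for (BlockInfo b : blocks) {
    if (!b.isComplete()) {
      numUCBlocks++;
    }
  }
}
{code}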



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356279#comment-16356279
 ] 

Xiaoyu Yao commented on HDFS-13120:
---

*Error stack with SnapshotDiffReport*
{code}
org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): 
Illegal Capacity: -2
at java.util.ArrayList.<init>(ArrayList.java:156)
at org.apache.hadoop.hdfs.util.Diff.apply2Previous(Diff.java:363)
at org.apache.hadoop.hdfs.util.Diff.apply2Current(Diff.java:413)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:240)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiffRecursively(DirectorySnapshottableFeature.java:466)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:332)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:492)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:183)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6532)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1896)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1281)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)

{code}

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>
> The snapshot diff can be corrupted after concatenating files. This could lead 
> to assertion failures upon later DeleteSnapshot and getSnapshotDiff operations. 
> For example, we have seen customers hit a stack trace similar to the one below, 
> but while loading the edit entry of a DeleteSnapshotOp. After investigation, 
> we found this is a regression caused by HDFS-3689, where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post a unit test to repro this and a fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> 

[jira] [Created] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13120:
-

 Summary: Snapshot diff could be corrupted after concat
 Key: HDFS-13120
 URL: https://issues.apache.org/jira/browse/HDFS-13120
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The snapshot diff can be corrupted after concatenating files. This could lead to 
assertion failures upon later DeleteSnapshot and getSnapshotDiff operations. 

For example, we have seen customers hit a stack trace similar to the one below, 
but while loading the edit entry of a DeleteSnapshotOp. After investigation, we 
found this is a regression caused by HDFS-3689, where the snapshot diff is not 
fully cleaned up after concat. 

I will post a unit test to repro this and a fix for it shortly.
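
For reference, a minimal repro outline along the lines described above (paths, 
file names, and the exact snapshot sequence are illustrative; the real test is 
the one to be attached):

{code}
// Illustrative repro sketch, not the attached unit test.
Path dir = new Path("/concatTest");
hdfs.mkdirs(dir);
// ... create 0.txt, 1.txt and 2.txt under dir ...
hdfs.allowSnapshot(dir);
hdfs.createSnapshot(dir, "s0");
hdfs.concat(new Path(dir, "0.txt"),
    new Path[] {new Path(dir, "1.txt"), new Path(dir, "2.txt")});
hdfs.createSnapshot(dir, "s1");
// Since the concatenated sources are not fully cleaned out of the
// snapshot diff, combining diffs on snapshot deletion can trip the
// "Element already exists" assertion shown below.
hdfs.deleteSnapshot(dir, "s0");
{code}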

{code}
org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)

{code} 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356249#comment-16356249
 ] 

genericqa commented on HDFS-12051:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 1235 unchanged - 18 fixed = 1238 total (was 1253) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
2s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Increment of volatile field 
org.apache.hadoop.hdfs.server.namenode.NameCache.size in 
org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  At 
NameCache.java:in org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  
At NameCache.java:[line 119] |
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12051 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909669/HDFS-12051.11.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux c995c1528d5b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |

[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~billyean] and [~jojochuang] for the review. I committed this to trunk, 
3.0.1, and branch-2.

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 RCAed and solved one such case; it fixed some cases, but we are 
> still seeing NullPointerExceptions from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13119:
---
Summary: RBF: Manage unavailable clusters  (was: RBF: manage unavailable 
clusters)

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356211#comment-16356211
 ] 

Íñigo Goiri commented on HDFS-13119:


We had this happening the other day when we added a subcluster for testing and 
the Namenodes in this subcluster were down for a few days. The Routers ended up 
with thousands of threads trying to make RPC connections to the Namenodes that 
were down. One example was {{renewLease()}}: this operation is executed in all 
the subclusters, and we had connections stuck for more than 3 minutes because 
the default retry policy was to try 10 times with a timeout of 20 seconds.

We should do a couple of things:
* Better control of the number of RPC clients
* No need to try so many times if we "know" the subcluster is down (see the 
sketch below)
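
To make the second point concrete, a minimal fail-fast sketch (the unavailability 
bookkeeping is an assumption for illustration, not an existing Router API):

{code}
// Illustrative only: short-circuit before spending the full retry budget
// (10 tries x 20s timeout) against a subcluster already known to be down.
private void checkSubclusterAvailable(String nsId) throws StandbyException {
  if (unavailableSubclusters.contains(nsId)) {  // assumed bookkeeping set
    throw new StandbyException(
        "Subcluster " + nsId + " is unavailable, failing fast");
  }
}
{code}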

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13119) RBF: manage unavailable clusters

2018-02-07 Thread JIRA
Íñigo Goiri created HDFS-13119:
--

 Summary: RBF: manage unavailable clusters
 Key: HDFS-13119
 URL: https://issues.apache.org/jira/browse/HDFS-13119
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Íñigo Goiri


When a federated cluster has one of its subclusters down, operations that run in 
every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356167#comment-16356167
 ] 

Íñigo Goiri edited comment on HDFS-13099 at 2/7/18 10:24 PM:
-

[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked, and it successfully ran the ones that use the State Store. 
Most of them rely on the Curator mini ZK cluster, so I think this is safe.

We could do the TestMetricsBase change in the RouterConfigBuilder itself when 
we set {{stateStore()}}.


was (Author: elgoiri):
[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked, and it successfully ran the ones that use the State Store. 
Most of them rely on the Curator mini ZK cluster, so I think this is safe.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the State Store Driver-related settings are only written in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the class {{DFSConfigKeys}} and documented 
> in the file {{hdfs-default.xml}}. This will help more users discover these 
> settings and learn how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356167#comment-16356167
 ] 

Íñigo Goiri commented on HDFS-13099:


[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked, and it successfully ran the ones that use the State Store. 
Most of them rely on the Curator mini ZK cluster, so I think this is safe.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the State Store Driver-related settings are only written in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the class {{DFSConfigKeys}} and documented 
> in the file {{hdfs-default.xml}}. This will help more users discover these 
> settings and learn how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356159#comment-16356159
 ] 

Hudson commented on HDFS-13115:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13632 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13632/])
HDFS-13115. In getNumUnderConstructionBlocks(), ignore the inodeIds for 
(yzhang: rev f491f717e9ee6b75ad5cfca48da9c6297e94a8f7)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java


> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 RCAed and solved one such case; it fixed some cases, but we are 
> still seeing NullPointerExceptions from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
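
For illustration, a minimal sketch of such a safeguard inside LeaseManager, using 
the same context as the snippets above (not necessarily the committed patch; the 
null handling and the log message are assumptions):
{code:java}
synchronized long getNumUnderConstructionBlocks() {
  assert this.fsnamesystem.hasReadLock();
  long numUCBlocks = 0;
  for (Long id : getINodeIdWithLeases()) {
    INode inode = fsnamesystem.getFSDirectory().getInode(id);
    if (inode == null) {
      // The inode may have been deleted after getINodeIdWithLeases()
      // collected the ids; skip it instead of hitting an NPE on asFile().
      LOG.warn("Failed to find inode " + id + "; skipping it.");
      continue;
    }
    final INodeFile cons = inode.asFile();
    Preconditions.checkState(cons.isUnderConstruction());
    BlockInfo[] blocks = cons.getBlocks();
    if (blocks == null) {
      continue;
    }
    for (BlockInfo b : blocks) {
      if (!b.isComplete()) {
        numUCBlocks++;
      }
    }
  }
  return numUCBlocks;
}
{code}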



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356160#comment-16356160
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
49s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
16s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
13s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
48s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}194m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
|   | hadoop.ozone.TestMiniOzoneCluster |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.ozone.TestStorageContainerManager |
|   | hadoop.hdfs.TestHDFSFileSystemContract |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Status: Patch Available  (was: In Progress)

Addressed the latest comment by [~yzhangal] regarding expressing the cache size 
as a ratio of the heap size (switched from int to float).
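
For context, here is a minimal, self-contained sketch of the interning idea 
behind this patch (illustrative only; the real NameCache rewrite is in the 
attached patches):
{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayInterner {
  /** Wrap byte[] so equals/hashCode compare contents, not identity. */
  private static final class Key {
    private final byte[] bytes;
    Key(byte[] bytes) { this.bytes = bytes; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  private final Map<Key, byte[]> cache = new HashMap<>();

  /** Return one canonical array per distinct content, so duplicate
   *  file/directory names share a single byte[] instance. */
  public synchronized byte[] intern(byte[] name) {
    byte[] canonical = cache.putIfAbsent(new Key(name), name);
    return canonical != null ? canonical : name;
  }
}
{code}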

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and that most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Attachment: HDFS-12051.11.patch

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and that most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Status: In Progress  (was: Patch Available)

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and that most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> 

[jira] [Comment Edited] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356082#comment-16356082
 ] 

Anu Engineer edited comment on HDFS-13116 at 2/7/18 9:04 PM:
-

[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
 * Conduit --> Something like PipelineInfo
 * interface to Abstract -- I am wondering if this makes future changes less 
flexible. For example, if we have erasure-coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see that it works well.

 * {{PipelineManager#getPipeline}}
{code:java}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?
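
For illustration, one way to fail fast instead of returning a pipeline that 
wraps a null conduit (a sketch with minimal stand-in types; throwing 
{{IOException}} is my assumption, not necessarily what the patch should do):
{code:java}
import java.io.IOException;

public class GetPipelineSketch {
  static final class Conduit { }
  static final class Pipeline {
    Pipeline(String containerName, Conduit conduit) { }
  }

  private Conduit findOpenConduit() { return null; } // placeholder

  Pipeline getPipeline(String containerName) throws IOException {
    Conduit conduit = findOpenConduit();
    if (conduit == null) {
      // Surface the failure to the caller rather than hand back a
      // half-constructed pipeline.
      throw new IOException("Get conduit call failed. We are not able to"
          + " find free nodes or an operational conduit.");
    }
    return new Pipeline(containerName, conduit);
  }
}
{code}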

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock?

Here is what I am thinking – please correct me if I am wrong.

– This is the point of check
{code:java}
  if (activeConduits.size() == 0) {
{code}
– This is the point of use.
{code:java}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? Is that possible?
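
To make the check-then-act concern concrete, here is a self-contained sketch in 
which the check and the use happen under one lock, so the list cannot shrink to 
empty in between (which would make {{% activeConduits.size()}} throw an 
{{ArithmeticException}}); the locking scheme shown is only one possible fix, and 
{{Conduit}} stands in for the patch's type:
{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class ConduitSelector {
  static final class Conduit { } // stand-in for the patch's type

  private final List<Conduit> activeConduits = new CopyOnWriteArrayList<>();
  private final AtomicInteger conduitsIndex = new AtomicInteger();

  synchronized Conduit findOpenConduit() {
    if (activeConduits.isEmpty()) {
      return null; // caller decides whether to allocate a new conduit
    }
    // floorMod also keeps the index non-negative if the counter overflows.
    int index = Math.floorMod(conduitsIndex.incrementAndGet(),
        activeConduits.size());
    return activeConduits.get(index);
  }
}
{code}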


was (Author: anu):
[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
* Conduit --> Something like PipelineInfo
* interface to Abstract -- I am wondering if this makes future changes less 
flexible. For example, if we have erasure-coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see that it works well.

* {{PipelineManager#getPipeline}}
{code}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
 Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock? 

Here is what I am thinking -- please correct me if I am wrong.

-- This is the point of check 
{code}
  if (activeConduits.size() == 0) {
{code}

--  This is the point of use.
{code}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? Is that possible?







> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently, the pipeline has information about both the container and the 
> Transport layer. This results in new pipeline (i.e., transport) information 
> being allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; the {{Transport}} can then be shared between multiple 
> pipelines/containers.
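
A minimal sketch of the shape such a split could take (type and field names here 
are illustrative assumptions, not the patch's API):
{code:java}
import java.util.List;

class Transport {
  final String type;          // e.g. "STAND_ALONE" or "RATIS" (assumed)
  final List<String> members; // datanodes backing this transport
  Transport(String type, List<String> members) {
    this.type = type;
    this.members = members;
  }
}

class Pipeline {
  final String containerName; // container-specific part
  final Transport transport;  // shared, transport-specific part
  Pipeline(String containerName, Transport transport) {
    this.containerName = containerName;
    this.transport = transport;
  }
}
{code}
Several {{Pipeline}} instances can then point at one {{Transport}}, so creating 
a container no longer allocates fresh transport state.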



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356082#comment-16356082
 ] 

Anu Engineer commented on HDFS-13116:
-

[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
* Conduit --> Something like PipelineInfo
* interface to Abstract -- I am wondering if this makes future changes less 
flexible. For example, if we have erasure-coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see that it works well.

* {{PipelineManager#getPipeline}}
{code}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
 Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock? 

Here is what I am thinking -- please correct me if I am wrong.

-- This is the point of check 
{code}
  if (activeConduits.size() == 0) {
{code}

--  This is the point of use.
{code}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? Is that possible?







> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently, the pipeline has information about both the container and the 
> Transport layer. This results in new pipeline (i.e., transport) information 
> being allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; the {{Transport}} can then be shared between multiple 
> pipelines/containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Comment: was deleted

(was: Hi [~szetszwo],

Sorry I posted my previous comment in wrong Jira here. Would you please move 
your comment above to HDFS-12051?

Thanks.

 )

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL; the 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case, and we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356036#comment-16356036
 ] 

Yongjun Zhang commented on HDFS-13115:
--

Hi [~szetszwo],

Sorry I posted my previous comment in wrong Jira here. Would you please move 
your comment above to HDFS-12051?

Thanks.

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL; the 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case, and we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-13115:
---
Comment: was deleted

(was: And this seems not the correct JIRA?)

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL; the 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case, and we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-13115:
---
Comment: was deleted

(was: > ... and Tsz Wo Nicholas Sze for the review.

I would like to clarify one more time that I have neither reviewed the patch nor 
the results.  I have taken quick looks at the results but, honestly, I have not 
checked the details.  Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.)

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL; the 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case, and we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356031#comment-16356031
 ] 

Tsz Wo Nicholas Sze commented on HDFS-12051:


> ... and Tsz Wo Nicholas Sze for the review.

I would like to clarify one more time that I have neither reviewed the patch nor 
the results.  I have taken quick looks at the results but, honestly, I have not 
checked the details.  Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and that most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> 

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356028#comment-16356028
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13115:


And this seems not the correct JIRA?

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL; the 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case, and we saw that it fixes some 
> cases, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode has been removed for the given inodeId; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out who deleted it, as in HDFS-12985, 
> but it seems reasonable to have a safeguard here, like the other code in the 
> code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356026#comment-16356026
 ] 

Yongjun Zhang commented on HDFS-12051:
--

Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio 
config is more intuitive as a floating point, like the other ratio-style config 
parameters in DFSConfigKeys.java. I noticed that the default value in the code 
and the one in hdfs-default.xml are not the same; we need to make them match.
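
For illustration, a sketch of reading such a ratio as a float (the key name here 
is hypothetical; only the pattern matters):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class CacheRatioExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The in-code default must match the hdfs-default.xml entry to avoid
    // the mismatch noted above; 0.0025f corresponds to the proposed 1/400.
    float ratio = conf.getFloat("dfs.namenode.name.cache.ratio", 0.0025f);
    System.out.println("name cache ratio = " + ratio);
  }
}
{code}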

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.

 

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch
>
>
> When snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- 

[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Comment: was deleted

(was: Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the 
review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio 
config is more intuitive as a floating point, like the other ratio-style config 
parameters in DFSConfigKeys.java. I noticed that the default value in the code 
and the one in hdfs-default.xml are not the same; we need to make them match.

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.

 

 

 )

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The 
> reason is explained in the comment.
> HDFS-12985 RCAed one case and solved it. That fixed some cases, but we are 
> still seeing NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
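For illustration, a minimal sketch of such a safeguard, reusing the names from 
the quoted LeaseManager code. This is not necessarily the attached patch; in 
particular, skipping non-file and non-under-construction inodes (instead of the 
original Preconditions check) is an assumption of this sketch:
{code}
  // Sketch only: null-check the inode before dereferencing it, since it
  // may have been deleted after its id was collected.
  synchronized long getNumUnderConstructionBlocks() {
    long numUCBlocks = 0;
    for (Long id : getINodeIdWithLeases()) {
      INode inode = fsnamesystem.getFSDirectory().getInode(id);
      if (inode == null || !inode.isFile()) {
        // The inode was removed (or replaced) after its id was collected;
        // skip it instead of hitting the NullPointerException quoted above.
        continue;
      }
      final INodeFile cons = inode.asFile();
      if (!cons.isUnderConstruction()) {
        continue;  // relaxed from Preconditions.checkState for this sketch
      }
      BlockInfo[] blocks = cons.getBlocks();
      if (blocks == null) {
        continue;
      }
      for (BlockInfo b : blocks) {
        if (!b.isComplete()) {
          numUCBlocks++;
        }
      }
    }
    return numUCBlocks;
  }
{code}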



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356025#comment-16356025
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13115:


> ... and Tsz Wo Nicholas Sze for the review.

I'd like to clarify one more time that I have reviewed neither the patch nor 
the results. I did take quick looks at the results but, honestly, I have not 
checked the details. Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The 
> reason is explained in the comment.
> HDFS-12985 RCAed one case and solved it. That fixed some cases, but we are 
> still seeing NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356019#comment-16356019
 ] 

genericqa commented on HDFS-12935:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
16s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 400 unchanged - 1 fixed = 400 total (was 401) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 49s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  1m 
31s{color} | {color:red} The patch generated 260 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:27 |
| Failed junit tests | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.server.datanode.TestReadOnlySharedStorage |
|   | hadoop.hdfs.web.TestHttpsFileSystem |
|   | hadoop.hdfs.server.datanode.TestDataNodeTransferSocketSize |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
|   | hadoop.hdfs.server.datanode.TestTriggerBlockReport |
|   | hadoop.hdfs.web.TestWebHDFSAcl |
|   | hadoop.hdfs.server.datanode.TestStorageReport |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
|   | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | org.apache.hadoop.hdfs.server.datanode.TestDataNodeFaultInjector |
|   | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream |
|   | org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade |
|   | org.apache.hadoop.hdfs.TestFileAppendRestart |
|   | org.apache.hadoop.hdfs.security.TestDelegationToken |
|   | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter |
|   | org.apache.hadoop.hdfs.TestDFSMkdirs |
|   | org.apache.hadoop.hdfs.TestDFSOutputStream |
|   | org.apache.hadoop.hdfs.web.TestWebHDFS |
|   | 

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356011#comment-16356011
 ] 

Yongjun Zhang commented on HDFS-13115:
--

Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio 
config would be more intuitive as a floating point, like the other ratio-style 
config parameters in DFSConfigKeys.java. I also noticed that the default value 
in the code and the one in hdfs-default.xml are not the same; we need to make 
them consistent.

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.

 

 

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The 
> reason is explained in the comment.
> HDFS-12985 RCAed one case and solved it. That fixed some cases, but we are 
> still seeing NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355968#comment-16355968
 ] 

genericqa commented on HDFS-13078:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
8s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
12s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
51s{color} | {color:red} The patch generated 640 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.web.client.TestOzoneClient |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.TestDFSOutputStream |
|   | hadoop.ozone.scm.TestXceiverClientManager |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.ozone.container.replication.TestContainerReplicationManager |
|   | hadoop.ozone.TestMiniOzoneCluster |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.fs.TestHDFSFileContextMainOperations |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.ozone.web.client.TestBuckets |
|   | hadoop.fs.viewfs.TestViewFsWithAcls |
|   | hadoop.ozone.scm.node.TestQueryNode |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000 |
|   | 

[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355965#comment-16355965
 ] 

Hudson commented on HDFS-11701:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13631 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13631/])
HDFS-11701. NPE from Unresolved Host causes permanent DFSInputStream (jitendra: 
rev b061215ecfebe476bf58f70788113d1af816f553)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderFactory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DomainSocketFactory.java


> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shut down and removed from the cluster 2 
> weeks prior. If the DFSInputStream had refreshed its block locations from the 
> name node, it would have received alternative block locations that do not 
> contain the decommissioned data nodes. As the above NPE leaves the 
> non-resolving data node in the list of block locations, the DFSInputStream 
> never refreshes the block locations, and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache. 
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> that can't be resolved in DNS, or at least those for the blocks requested.
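For context, a self-contained sketch of the failure mode described above: a 
cached location whose hostname no longer exists in DNS yields an unresolved 
socket address, which is what the short-circuit path ended up dereferencing. 
The "re-fetch" remedy is the one suggested in the description, not a real HDFS 
API; the hostname and port below are made up.
{code}
import java.net.InetSocketAddress;

public class StaleLocationCheck {
  public static void main(String[] args) {
    // A decommissioned datanode whose DNS entry is gone resolves to an
    // address with no IP; DFSClient.isLocalAddress() then saw a null
    // InetAddress and threw the NPE quoted above.
    InetSocketAddress addr =
        new InetSocketAddress("decommissioned-dn.example.invalid", 9866);
    if (addr.isUnresolved()) {
      // Hypothetical remedy: treat the location as stale and re-fetch
      // block locations from the NameNode instead of opening a BlockReader.
      System.out.println("stale location, refetch needed: "
          + addr.getHostString());
    }
  }
}
{code}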



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355959#comment-16355959
 ] 

genericqa commented on HDFS-13110:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 7s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}138m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13110 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909631/HDFS-13110-HDFS-10285-02.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 952a16ecd4a2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-10285 / 4a42e7a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22981/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Updated] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-11701:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

I have committed this to the trunk. Thanks [~ljain]!

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shutdown and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355941#comment-16355941
 ] 

Anu Engineer commented on HDFS-12990:
-

Committed to branch-3.0 also.

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
> default ports to non-ephemeral ports, which is much appreciated by admins. As 
> part of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change their setups, it would be a far better experience to leave the NN port 
> unchanged. This will benefit Hadoop 3 adoption and ease unnecessary upgrade 
> burdens.
> It is of course incompatible, but given that 3.0.0 is just out, IMO it is 
> worth switching the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355937#comment-16355937
 ] 

Brahma Reddy Battula commented on HDFS-11187:
-

Does this need to be in branch-3.0.1 also?

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> copy of the last partial checksum in memory, reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
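As a rough illustration of the optimization being proposed, a replica object 
could carry the cached checksum so readers avoid the metafile read. Field and 
method names here are made up for the sketch, not the FsDatasetImpl API:
{code}
/** Sketch only: cache the last partial chunk checksum on the replica when it
 *  is finalized, so BlockSender can read it from memory instead of seeking
 *  into the metafile under the dataset lock. Illustrative names throughout. */
class FinalizedReplicaSketch {
  private volatile byte[] lastPartialChunkChecksum;

  // Called when the replica is finalized (or when the last chunk grows).
  void setLastPartialChunkChecksum(byte[] checksum) {
    this.lastPartialChunkChecksum = checksum;
  }

  // Readers get the cached copy; no disk access, no long-held lock.
  byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum;
  }
}
{code}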



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355938#comment-16355938
 ] 

Jitendra Nath Pandey commented on HDFS-13081:
-

{quote}Delegation tokens send passwords in the clear over http.  Webhdfs is at 
high risk.
{quote}
That is a valid point. That explains why the check for HTTPS was added in the 
first place. It should be documented in the javadocs.

IIUC the required checks are the following:

1) For RPC: it should either use a privileged port or use SASL for mutual 
authentication.

2) For HTTP: it should either use a privileged port or use HTTPS.

However, a combination like a privileged port for HTTP and SASL for RPC should 
also work.

The advantage of SASL is that it allows QoP negotiation, so different clients 
can choose encryption depending on where they are connecting from and the 
sensitivity of the data.

[~daryn], what are your thoughts on having privileged port for HTTP with SASL 
for RPC?
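Put as code, the rule set sketched above might look like the following; the 
boolean inputs are placeholders, not the actual Datanode#checkSecureConfig 
signature:
{code}
/** Sketch of the check combinations discussed above; inputs are placeholders,
 *  not the actual Datanode#checkSecureConfig signature. */
public class SecureConfigCheckSketch {
  static boolean isSecureConfigOk(boolean rpcPrivilegedPort, boolean saslEnabled,
                                  boolean httpPrivilegedPort, boolean httpsOnly) {
    boolean rpcOk = rpcPrivilegedPort || saslEnabled;   // check 1
    boolean httpOk = httpPrivilegedPort || httpsOnly;   // check 2
    // Mixed combinations (privileged HTTP port + SASL RPC) pass as well.
    return rpcOk && httpOk;
  }
}
{code}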

> Datanode#checkSecureConfig should check HTTPS and SASL encryption
> -
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13081.000.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine if 
> secure datanode is enabled.
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain 
> HTTP) for the HTTP server. The SASL handshake guarantees authentication of 
> the RPC server before a client transmits a secret, such as a block access 
> token. Similarly, SSL guarantees authentication of the
>  HTTP server before a client transmits a secret, such as a delegation token.
> For the 2nd case, HTTPS_ONLY means all the traffic between REST client and 
> server will be encrypted. However, checking only that a SASL property 
> resolver is configured does not guarantee that the server requires encrypted 
> RPC.
> This ticket is opened to further check and ensure that the datanode SASL 
> property resolver has a QoP that includes auth-conf (PRIVACY). Note that the 
> SASL QoP (Quality of Protection) negotiation may drop the RPC protection 
> level from auth-conf (PRIVACY) to auth-int (integrity) or auth 
> (authentication) only, which should be fine by design.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12933) Improve logging when DFSStripedOutputStream failed to write some blocks

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355932#comment-16355932
 ] 

Brahma Reddy Battula commented on HDFS-12933:
-

Does this need to be in branch-3.0.1 also?

> Improve logging when DFSStripedOutputStream failed to write some blocks
> ---
>
> Key: HDFS-12933
> URL: https://issues.apache.org/jira/browse/HDFS-12933
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: chencan
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-12933.001.patch
>
>
> Currently, if there are fewer DataNodes than the erasure coding policy's (# 
> of data blocks + # of parity blocks), the client sees this:
> {noformat}
> 09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Cannot allocate parity 
> block(index=13, policy=RS-10-4-1024k). Not enough datanodes? Exclude nodes=[]
> 09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Block group <1> has 1 
> corrupt blocks.
> {noformat}
> The 1st line is good. The 2nd line may be confusing to end users. We should 
> investigate the error and make the message more general / accurate, maybe 
> something like 'failed to read x blocks'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355918#comment-16355918
 ] 

Jitendra Nath Pandey commented on HDFS-11701:
-

+1, I will commit shortly.

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shut down and removed from the cluster 2 
> weeks prior. If the DFSInputStream had refreshed its block locations from the 
> name node, it would have received alternative block locations that do not 
> contain the decommissioned data nodes. As the above NPE leaves the 
> non-resolving data node in the list of block locations, the DFSInputStream 
> never refreshes the block locations, and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache. 
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> that can't be resolved in DNS, or at least those for the blocks requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355903#comment-16355903
 ] 

Anu Engineer commented on HDFS-12990:
-

[~brahmareddy] I will commit this to 3.0 now. Thanks for flagging it.

 

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
> default ports to non-ephemeral ports, which is much appreciated by admins. As 
> part of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change their setups, it would be a far better experience to leave the NN port 
> unchanged. This will benefit Hadoop 3 adoption and ease unnecessary upgrade 
> burdens.
> It is of course incompatible, but given that 3.0.0 is just out, IMO it is 
> worth switching the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355897#comment-16355897
 ] 

Brahma Reddy Battula commented on HDFS-12990:
-

Should this be committed to branch-3.0, which is going to be 3.0.2?

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
> default ports to non-ephemeral ports, which is much appreciated by admins. As 
> part of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change their setups, it would be a far better experience to leave the NN port 
> unchanged. This will benefit Hadoop 3 adoption and ease unnecessary upgrade 
> burdens.
> It is of course incompatible, but given that 3.0.0 is just out, IMO it is 
> worth switching the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13116:
-
Attachment: HDFS-13116-HDFS-7240.006.patch

> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently the pipeline has information about both the container and the 
> transport layer. This results in cases where new pipeline (i.e. transport) 
> information is allocated for each container creation.
> This code can be refactored so that the transport information is separated 
> from the container; the {{Transport}} can then be shared among multiple 
> pipelines/containers.
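A toy sketch of the separation being described, with illustrative names rather 
than the actual Ozone classes:
{code}
import java.util.List;

/** Transport-level details, shareable across containers. */
class Transport {
  final String leaderId;
  final List<String> members;
  Transport(String leaderId, List<String> members) {
    this.leaderId = leaderId;
    this.members = members;
  }
}

/** Container-specific state now only references a (possibly shared) Transport. */
class Pipeline {
  final String containerName;
  final Transport transport;   // many pipelines can point at one Transport
  Pipeline(String containerName, Transport transport) {
    this.containerName = containerName;
    this.transport = transport;
  }
}
{code}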



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355848#comment-16355848
 ] 

Hudson commented on HDFS-12935:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13629 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13629/])
HDFS-12935. Get ambiguous result for DFSAdmin command in HA mode when (brahma: 
rev 01bd6ab18fa48f4c7cac1497905b52e547962599)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java


> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.0.2
>
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions still work. Consider the 
> following two situations:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two situations should be equivalent. However, some of the DFSAdmin 
> commands give ambiguous results. The commands are sent successfully to the 
> namenode that is up, but they are functionally useful only when nn1 is up, 
> regardless of the exception (an IOException when connecting to the down 
> namenode nn2). If only nn2 is up, the commands have no effect at all and 
> only an exception connecting to nn1 is seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode gets the bandwidth 
> setting. Approximately ten DFSAdmin commands use similar logic and may be 
> ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 
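To make the desired behavior concrete, here is a small sketch of running an 
admin action against every namenode and recording per-node outcomes instead of 
aborting on the first connection failure. The method name and the Runnable 
actions are illustrative, not DFSAdmin code:
{code}
import java.util.Map;

public class PerNameNodeActionSketch {
  /** Try the action against each namenode and report per-node outcomes,
   *  instead of aborting on the first connection failure. */
  static int runOnAll(Map<String, Runnable> actionsByNameNode) {
    int failures = 0;
    for (Map.Entry<String, Runnable> e : actionsByNameNode.entrySet()) {
      try {
        e.getValue().run();
        System.out.println(e.getKey() + ": succeeded");
      } catch (RuntimeException ex) {   // e.g. a wrapped ConnectException
        failures++;
        System.out.println(e.getKey() + ": failed - " + ex.getMessage());
      }
    }
    return failures;  // a nonzero exit code would make the ambiguity visible
  }
}
{code}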



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-12935:

Fix Version/s: 3.0.2
   3.0.1
   3.1.0

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.0.2
>
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions still work. Consider the 
> following two situations:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two situations should be equivalent. However, some of the DFSAdmin 
> commands give ambiguous results. The commands are sent successfully to the 
> namenode that is up, but they are functionally useful only when nn1 is up, 
> regardless of the exception (an IOException when connecting to the down 
> namenode nn2). If only nn2 is up, the commands have no effect at all and 
> only an exception connecting to nn1 is seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode gets the bandwidth 
> setting. Approximately ten DFSAdmin commands use similar logic and may be 
> ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13112) Token expiration edits may cause log corruption or deadlock

2018-02-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355844#comment-16355844
 ] 

Kihwal Lee commented on HDFS-13112:
---

The patch looks good.
- The addition of read locks ensures these edit logging activities do not 
collide with edit rolling or HA transitions (in addition to the level of safety 
provided by {{noInterruptsLock}}).
- A write lock is not required since these don't change any state that other 
threads are accessing with a read lock.

And since only the secret manager does edit logging with a read lock while all 
others use a write lock, there can be no concurrent edit logging; this covers 
the general {{FSEditLog}} thread-safety issue, not only the issue between 
logging and rolling.

Now, if we believe that it is only unsafe between edit logging and rolling 
(i.e. normal edit logging activities are thread-safe), we could make 
{{getDelegationToken()}}, {{renewDelegationToken()}} and 
{{cancelDelegationToken()}} acquire a read lock, and perhaps the lease-related 
calls too.  Any thoughts on this?

In any case, I'm +1 on the patch. If you think we can make additional locking 
changes, please file a follow-up jira.
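
For illustration, a minimal sketch (assuming the FSNamesystem read lock and {{FSEditLog#logCancelDelegationToken}}; not the attached patch itself) of wrapping the secret manager's edit logging in a read lock so it cannot interleave with a roll or an HA transition:

{code:java}
// Sketch only: the read lock keeps this edit from interleaving with log
// rolling or HA transitions; a write lock is unnecessary because no shared
// namespace state is mutated here.
void logExpireDelegationToken(DelegationTokenIdentifier id) {
  namesystem.readLock();
  try {
    getEditLog().logCancelDelegationToken(id);
  } finally {
    namesystem.readUnlock();
  }
  // Sync outside the lock so a slow journal does not block other readers.
  getEditLog().logSync();
}
{code}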

> Token expiration edits may cause log corruption or deadlock
> ---
>
> Key: HDFS-13112
> URL: https://issues.apache.org/jira/browse/HDFS-13112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.1.0-beta, 0.23.8
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-13112.patch
>
>
> HDFS-4477 specifically did not acquire the fsn lock during token cancellation 
> based on the belief that edit logs are thread-safe.  However, log rolling is 
> not thread-safe.  Failure to externally synchronize on the fsn lock during a 
> roll will cause problems.
> For sync edit logging, it may cause corruption by interspersing edits with 
> the end/start segment edits.  Async edit logging may encounter a deadlock if 
> the log queue overflows.  Luckily, losing the race is extremely rare.  In ~5 
> years, we've never encountered it.  However, HDFS-13051 lost the race with 
> async edits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355839#comment-16355839
 ] 

Brahma Reddy Battula edited comment on HDFS-12935 at 2/7/18 6:13 PM:
-

Committed to {{trunk}} and branch-3.0. [~jiangjianfei], thanks for your 
contribution; I appreciate your dedication towards closing this Jira.

 

Re-uploaded the branch-2 patch to run jenkins.


was (Author: brahmareddy):
{{Committed to {{trunk}} and branch-3.0 .[~jiangjianfei] thanks for your 
contribution, appreciate your dedication towards close this Jira.}}

 

Re-uploaded the branch-2 patch to run jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions can still work. Consider 
> the following two occasions:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be sent successfully 
> to whichever namenode is up, but they are functionally useful only when nn1 
> is up, regardless of the exception (an IOException when connecting to the 
> down namenode nn2). If only nn2 is up, the commands have no effect at all, 
> and only the exception from connecting to nn1 can be seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode receives the 
> bandwidth setting. Approximately ten DFSAdmin commands follow a similar 
> logical process and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355839#comment-16355839
 ] 

Brahma Reddy Battula commented on HDFS-12935:
-

Committed to {{trunk}} and branch-3.0. [~jiangjianfei], thanks for your 
contribution; I appreciate your dedication towards closing this Jira.

 

Re-uploaded the branch-2 patch to run jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions can still work. Consider 
> the following two occasions:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be sent successfully 
> to whichever namenode is up, but they are functionally useful only when nn1 
> is up, regardless of the exception (an IOException when connecting to the 
> down namenode nn2). If only nn2 is up, the commands have no effect at all, 
> and only the exception from connecting to nn1 can be seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode receives the 
> bandwidth setting. Approximately ten DFSAdmin commands follow a similar 
> logical process and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355838#comment-16355838
 ] 

Lokesh Jain commented on HDFS-11701:


The unit test failures are not related.

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for 
> upwards of 3-4 weeks and had cached block locations from decommissioned 
> nodes that no longer resolved in DNS and had been shut down and removed from 
> the cluster 2 weeks prior.  If the DFSInputStream had refreshed its block 
> locations from the name node, it would have received alternative block 
> locations that would not contain the decommissioned data nodes.  As the 
> above NPE leaves the non-resolving data node in the list of block locations, 
> the DFSInputStream never refreshes the block locations, and all attempts to 
> open a BlockReader for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block-locations 
> cache.  Ideally, the DFSInputStream would re-fetch all block locations for a 
> host that can't be resolved in DNS, or at least those of the blocks requested.
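
A minimal sketch of the suggested behavior (an illustrative fragment from a read-retry loop, not the attached patch): treat an unresolved datanode address as a signal to refetch locations from the namenode instead of failing with an NPE:

{code:java}
// Sketch only: if the chosen datanode no longer resolves in DNS, refresh
// the block locations from the namenode (which will not return
// decommissioned nodes) rather than keeping the stale cache forever.
InetSocketAddress targetAddr =
    NetUtils.createSocketAddr(chosenNode.getXferAddr());
if (targetAddr.isUnresolved()) {
  DFSClient.LOG.warn("Datanode " + chosenNode + " is unresolved; "
      + "refreshing block locations for " + src);
  block = dfsClient.getLocatedBlocks(src, block.getStartOffset()).get(0);
  continue; // retry the read with fresh locations
}
{code}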



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-12935:

Attachment: HDFS-12935.009-branch-2.patch

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions can still work. Consider 
> the following two occasions:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be sent successfully 
> to whichever namenode is up, but they are functionally useful only when nn1 
> is up, regardless of the exception (an IOException when connecting to the 
> down namenode nn2). If only nn2 is up, the commands have no effect at all, 
> and only the exception from connecting to nn1 can be seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode receives the 
> bandwidth setting. Approximately ten DFSAdmin commands follow a similar 
> logical process and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355827#comment-16355827
 ] 

Brahma Reddy Battula commented on HDFS-12935:
-

Latest trunk patch LGTM; committing shortly. Will re-upload the branch-2 patch 
to trigger Jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, 
> HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most functions can still work. Consider 
> the following two occasions:
>  (1) nn1 up and nn2 down
>  (2) nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be sent successfully 
> to whichever namenode is up, but they are functionally useful only when nn1 
> is up, regardless of the exception (an IOException when connecting to the 
> down namenode nn2). If only nn2 is up, the commands have no effect at all, 
> and only the exception from connecting to nn1 can be seen.
> Take the command "hdfs dfsadmin -setBalancerBandwidth", which aims to set 
> the balancer bandwidth value for datanodes, as an example. It works and all 
> the datanodes receive the setting only when nn1 is up. If only nn2 is up, 
> the command throws an exception directly and no datanode receives the 
> bandwidth setting. Approximately ten DFSAdmin commands follow a similar 
> logical process and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13078:
-
Status: Patch Available  (was: Open)

> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch
>
>
> In Ozone, reads via Ratis fail because the stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> 

[jira] [Updated] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13078:
-
Attachment: HDFS-13078-HDFS-7240.001.patch

> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch
>
>
> In Ozone, reads via Ratis fail because the stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> 

[jira] [Commented] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355814#comment-16355814
 ] 

Mukul Kumar Singh commented on HDFS-13078:
--

Added some debug logging and determined that the reads were failing because of
{code}
org.apache.ratis.shaded.io.grpc.StatusRuntimeException: INTERNAL: Frame size 
16777607 exceeds maximum: 4194304. If this is normal, increase the 
maxMessageSize in the channel/server builder
{code}

This fix will require RATIS-197, which provides an option to specify the max 
message size for RaftClient. Once RATIS-197 is in, this jira can be fixed as well.
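
For reference, a minimal sketch of raising the limit with the (shaded) gRPC channel builder; the RaftClient-side knob is exactly what RATIS-197 adds, so the wiring below is an assumption for illustration only:

{code:java}
// Sketch only: raise the per-message limit on a gRPC channel. The shaded
// package prefix matches the stack traces above; how RaftClient exposes
// this setting is defined by RATIS-197, not here.
import org.apache.ratis.shaded.io.grpc.ManagedChannel;
import org.apache.ratis.shaded.io.grpc.netty.NettyChannelBuilder;

ManagedChannel channel = NettyChannelBuilder
    .forAddress(host, port)
    .maxInboundMessageSize(32 * 1024 * 1024) // default is 4 MiB (4194304)
    .usePlaintext(true)
    .build();
{code}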


> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
>
> In Ozone, reads via Ratis fail because the stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> 

[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355759#comment-16355759
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
19s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
17s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project: The patch generated 10 new 
+ 0 unchanged - 0 fixed = 10 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
8s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 56s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}205m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.ozone.tools.TestCorona |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.ozone.web.client.TestKeys |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355747#comment-16355747
 ] 

genericqa commented on HDFS-13099:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909619/HDFS-13099.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 21b8ca2fa443 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e5c2fdd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22980/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22980/testReport/ |
| Max. 

[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355685#comment-16355685
 ] 

Rakesh R commented on HDFS-13110:
-

Attached a new patch fixing the test case failures and checkstyle warnings.

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch
>
>
> This task is to address the following [~daryn]'s comments. Please refer 
> HDFS-10285 to see more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the 
> assumption that any and all IOEs mean FNF, which probably isn’t the 
> intention during RPC exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}
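
To illustrate comment-10, a minimal sketch (assuming an FSNamesystem-style {{getFileInfo}} signature; a hypothetical wrapper, not the committed fix) of catching only FileNotFoundException instead of swallowing all IOEs:

{code:java}
// Sketch only: treat only FileNotFoundException as "file is gone"; let any
// other IOException (e.g. a transient RPC failure) propagate to the caller
// instead of being mistaken for FNF.
@Override
public HdfsFileStatus getFileInfo(String path) throws IOException {
  try {
    // Assumed signature; the real call may differ across branches.
    return namesystem.getFileInfo(path, true);
  } catch (FileNotFoundException fnfe) {
    return null; // the file really is gone, so skip it
  }
}
{code}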



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-02.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch
>
>
> This task is to address the following [~daryn]'s comments. Please refer 
> HDFS-10285 to see more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the 
> assumption that any and all IOEs mean FNF, which probably isn’t the 
> intention during RPC exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: (was: HDFS-13110-HDFS-02.patch)

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following [~daryn]'s comments. Please refer 
> HDFS-10285 to see more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the 
> assumption that any and all IOEs mean FNF, which probably isn’t the 
> intention during RPC exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-02.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-02.patch, 
> HDFS-13110-HDFS-10285-00.patch, HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following [~daryn]'s comments. Please refer 
> HDFS-10285 to see more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the 
> assumption that any and all IOEs mean FNF, which probably isn’t the 
> intention during RPC exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355659#comment-16355659
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.ozone.web.client.TestKeys |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.cblock.TestBufferManager |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
|   | hadoop.ozone.scm.TestSCMCli |
|   | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.cblock.TestCBlockReadWrite |
|   | hadoop.ozone.TestOzoneConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13116 |
| JIRA 

[jira] [Comment Edited] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2018-02-07 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355650#comment-16355650
 ] 

Erik Krogen edited comment on HDFS-10453 at 2/7/18 4:16 PM:


Re: the v008 patch, it looks like you are using {{getPreferredBlockSize()}} 
instead of {{getNumBytes()}}; that does not seem right. Was it an 
unintentional change?

Other than that, I am pleased with the simplicity of the new change. This 
looks valid for trunk through branch-2.8 as well.


was (Author: xkrogen):
Re: v008 patch, looks like you are using {{getPreferredBlockSize()}} instead of 
{{getNumBytes()}} , that does not seem right, was it an unintentional change?

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread can get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication (to 10) of the file;
> (3) delete the file while the ReplicationMonitor is scheduling blocks 
> belonging to that file for replication.
> If the ReplicationMonitor gets stuck, the NameNode will print logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and then clears their 
> references in the blocksmap, neededReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does 
> not need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to call 
> chooseTargets for the same blocks, and no node is ever selected after 
> traversing the whole cluster, because no node satisfies the goodness criteria 
> (its remaining space would have to reach the required size, Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially 
> in a large cluster; invalidateBlocks and neededReplications keep growing with 
> no consumer, and at worst data is lost.
> This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK 
> blocks and removing them from neededReplications.

[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2018-02-07 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355650#comment-16355650
 ] 

Erik Krogen commented on HDFS-10453:


Re: the v008 patch, it looks like you are using {{getPreferredBlockSize()}} 
instead of {{getNumBytes()}}; that does not seem right. Was it an unintentional 
change?

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication of the file to 10;
> (3) delete the file while the ReplicationMonitor is scheduling blocks 
> belonging to that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode prints logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and then clears their 
> references in the blocksmap, neededReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does 
> not need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to call 
> chooseTargets for the same blocks, and no node is ever selected after 
> traversing the whole cluster, because no node satisfies the goodness criteria 
> (its remaining space would have to reach the required size, Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially 
> in a large cluster; invalidateBlocks and neededReplications keep growing with 
> no consumer, and at worst data is lost.
> This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK 
> blocks and removing them from neededReplications.
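
A minimal standalone sketch of that skip, for illustration only (the class and 
method names below are made up and only loosely mirror the BlockManager 
internals; this is not the actual patch code):
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class NoAckSkipSketch {
  /** Mirrors BlockCommand.NO_ACK: the numBytes value of a deleted block. */
  static final long NO_ACK = Long.MAX_VALUE;

  static class Block {
    final long numBytes;
    Block(long numBytes) { this.numBytes = numBytes; }
  }

  /**
   * Filter the replication queue before chooseTarget: a block whose size was
   * set to NO_ACK by deletion can never find a target (no node has
   * Long.MAX_VALUE bytes free), so drop it instead of scanning the cluster.
   */
  static List<Block> selectReplicationWork(List<Block> neededReplications) {
    List<Block> work = new ArrayList<>();
    for (Iterator<Block> it = neededReplications.iterator(); it.hasNext();) {
      Block b = it.next();
      if (b.numBytes == NO_ACK) {
        it.remove();   // remove the deleted block from neededReplications
        continue;      // and never call chooseTarget for it
      }
      work.add(b);
    }
    return work;
  }
}
{code}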



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-13104) Make hadoop proxy user changes reconfigurable in Namenode

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13104:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

As pointed out by [~kihwal], -refreshSuperUserGroupsConfiguration already 
provides a way to update proxy user information on the NN.
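
For reference, the refresh is a dfsadmin call against the running NN once 
core-site.xml has been edited (a usage sketch; "oozie" is an example proxy 
user, not something from this jira):
{noformat}
# 1. Add/update the proxy user entries in core-site.xml, e.g.
#      hadoop.proxyuser.oozie.hosts  = host1,host2
#      hadoop.proxyuser.oozie.groups = group1
# 2. Load them into the running NameNode without a restart:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
{noformat}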

> Make hadoop proxy user changes reconfigurable in Namenode
> -
>
> Key: HDFS-13104
> URL: https://issues.apache.org/jira/browse/HDFS-13104
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-13104.001.patch, HDFS-13104.002.patch
>
>
> Currently any change to add or delete a proxy user requires an NN restart, 
> i.e. downtime. This jira proposes to make proxy user configuration changes 
> reconfigurable via the ReconfigurationProtocol so that they can take effect 
> without an NN restart. For details please refer to 
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Superusers.html.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13105) Make hadoop proxy user changes reconfigurable in Datanode

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-13105.
--
Resolution: Not A Problem

As pointed out by [~kihwal] and [~rajive], -refreshSuperUserGroupsConfiguration 
already provides a way to update proxy user information on the NN.

> Make hadoop proxy user changes reconfigurable in Datanode
> -
>
> Key: HDFS-13105
> URL: https://issues.apache.org/jira/browse/HDFS-13105
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>
> Currently any change to add or delete a proxy user requires a DN restart, 
> i.e. downtime. This jira proposes to make proxy user configuration changes 
> reconfigurable via the ReconfigurationProtocol so that they can take effect 
> without a DN restart. For details please refer to 
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Superusers.html.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13117) Proposal to support writing replications to HDFS asynchronously

2018-02-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355520#comment-16355520
 ] 

Kihwal Lee commented on HDFS-13117:
---

During writes, the data is written to multiple nodes concurrently. A client may 
experience slowness when one of the nodes is slow (e.g. transient I/O 
overload), but you can encounter such a node even if you write only one copy. 
Also, with only one copy, a single failure will fail the write permanently, 
which users normally cannot afford. HDFS was originally designed for batch 
processing but is widely used in more demanding environments today. It could be 
a totally wrong choice for your app, or it could become usable with 
configuration changes. More analysis is needed to warrant a design change.

What is your application's write performance requirement? What are you seeing 
in your cluster? What version of hadoop are you running? Did you profile or 
jstack the datanodes or clients, by any chance? Does the app periodically 
sync/hsync/hflush the stream? Do streams tend to hang in the middle or at the 
end of a block? Do you see frequent pipeline breakages and recoveries along 
with the slowness? What is the I/O scheduler on the datanodes?
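
For the jstack question, a capture along these lines is usually enough (the 
pgrep pattern below is illustrative; adjust it to how your daemons are 
launched):
{noformat}
# take a few DataNode thread dumps ~10s apart to spot where writes hang
for i in 1 2 3; do
  jstack -l $(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode) > dn.jstack.$i
  sleep 10
done
{noformat}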

> Proposal to support writing replications to HDFS asynchronously
> ---
>
> Key: HDFS-13117
> URL: https://issues.apache.org/jira/browse/HDFS-13117
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: xuchuanyin
>Priority: Major
>
> My initial question was as below:
> ```
> I've learned that when we write data to HDFS using the interfaces provided by 
> HDFS, such as 'FileSystem.create', our client blocks until all the blocks and 
> their replicas are done. This causes an efficiency problem if we use HDFS as 
> our final data storage, so many of my colleagues write the data to local disk 
> in the main thread and copy it to HDFS in another thread. Obviously, that 
> increases the disk I/O.
>  
> So, is there a way to optimize this usage? I don't want to increase the disk 
> I/O, nor do I want to be blocked while the extra replicas are written.
> How about writing to HDFS with only one replica in the main thread and 
> setting the actual replication factor in another thread? Or is there a better 
> way to do this?
> ```
>  
> So my proposal here is to support writing the extra replicas to HDFS 
> asynchronously. Users could set a minimum acceptable number of replicas 
> (< the default or expected replication factor). When writing to HDFS, the 
> user would only block until the minimum number of replicas is finished, and 
> HDFS would complete the extra replicas in the background. Since HDFS 
> periodically checks the integrity of all replicas anyway, we can also leave 
> this work to HDFS itself.
>  
> There are two ways to provide the interfaces:
> 1. Create a series of interfaces by adding an `acceptableReplication` 
> parameter to the current interfaces, as below:
> ```
> Before:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   long blockSize
> ) throws IOException
>  
> After:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   short acceptableReplication, // minimum number of replicas to finish 
> before returning
>   long blockSize
> ) throws IOException
> ```
>  
> 2. Add `acceptableReplication` and `asynchronous` options to the runtime (or 
> default) configuration, so users do not have to change any interface and 
> still benefit from this feature.
>  
> What do you think about this?
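
A minimal sketch of the workaround floated in the original question (write with 
a single replica, then raise the replication factor so the NameNode copies the 
remaining replicas in the background), using only public {{FileSystem}} APIs; 
the path, payload, and target factor are illustrative:
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AsyncReplicationWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/async-replication-demo");

    // The pipeline has a single replica, so close() only waits for one copy.
    try (FSDataOutputStream out = fs.create(file, true, 4096, (short) 1,
        fs.getDefaultBlockSize(file))) {
      out.write("payload".getBytes(StandardCharsets.UTF_8));
    }

    // Ask for the real factor; the NameNode schedules the extra replicas
    // asynchronously, so this call returns without waiting for them.
    fs.setReplication(file, (short) 3);
  }
}
{code}
The obvious trade-off is the one [~kihwal] points out above: until the extra 
replicas land, a single node failure can lose the data.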



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13022) Block Storage: Kubernetes dynamic persistent volume provisioner

2018-02-07 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355486#comment-16355486
 ] 

Mukul Kumar Singh commented on HDFS-13022:
--

Thanks for the updated patch, [~elek]. The new patch looks really good. Some 
minor comments:

1) Please fix the findbugs and checkstyle issues.
2) Nitpick: in CBlockManager.java, please move the static imports to the 
section with the other static imports; the same needs to be done for 
DynamicProvisioner.java.
3) DynamicProvisioner.java#stop should join on the watcher thread as well.
4) Should we add the ASF license to the json files as well?
5) Should LICENSE.txt also be updated as part of this change?
6) I feel that the file ozone-site.xml is not needed; can we set the required 
fields in the test (TestDynamicProvisioner)?

> Block Storage: Kubernetes dynamic persistent volume provisioner
> ---
>
> Key: HDFS-13022
> URL: https://issues.apache.org/jira/browse/HDFS-13022
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13022-HDFS-7240.001.patch, 
> HDFS-13022-HDFS-7240.002.patch, HDFS-13022-HDFS-7240.003.patch, 
> HDFS-13022-HDFS-7240.004.patch
>
>
> With HDFS-13017 and HDFS-13018 the cblock/jscsi server could be used in a 
> kubernetes cluster as the backend for iscsi persistent volumes.
> Unfortunately we currently need to create all the required cblocks manually 
> with 'hdfs cblock -c user volume...' for every Persistent Volume.
>  
> But it could be handled with a simple optional component: an additional 
> service could listen on the kubernetes event stream, and in case of a new 
> PersistentVolumeClaim (where the storageClassName is cblock) the cblock 
> server could create the cblock in advance AND create the matching persistent 
> volume.
>  
> The code is very simple, and this additional component could be optional in 
> the cblock server.
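
A minimal sketch of such a watcher, assuming the fabric8 kubernetes-client is 
on the classpath (the class name and the printed message are illustrative, not 
the patch's actual code):
{code:java}
import io.fabric8.kubernetes.api.model.PersistentVolumeClaim;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watch;
import io.fabric8.kubernetes.client.Watcher;

public class CBlockPvcWatcherSketch {
  public static void main(String[] args) {
    KubernetesClient client = new DefaultKubernetesClient();
    Watch watch = client.persistentVolumeClaims().inAnyNamespace()
        .watch(new Watcher<PersistentVolumeClaim>() {
          @Override
          public void eventReceived(Action action, PersistentVolumeClaim pvc) {
            // Only new claims of the cblock storage class are interesting.
            if (action == Action.ADDED
                && "cblock".equals(pvc.getSpec().getStorageClassName())) {
              // Here the provisioner would create the cblock volume and the
              // matching PersistentVolume object (omitted in this sketch).
              System.out.println("New cblock claim: "
                  + pvc.getMetadata().getName());
            }
          }

          @Override
          public void onClose(KubernetesClientException cause) {
            // Re-establish the watch or shut down cleanly.
          }
        });
  }
}
{code}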



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Attachment: HDFS-13099.005.patch

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the settings relevant to the State Store Driver are only defined in 
> its implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the {{DFSConfigKeys}} class and documented 
> in {{hdfs-default.xml}}. This will help more users discover these settings and 
> learn how to use them.
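
For example, the documented entry could look like the following (a sketch; the 
full key assumes {{FEDERATION_STORE_PREFIX}} resolves to 
{{dfs.federation.router.store.}}):
{noformat}
<property>
  <name>dfs.federation.router.store.driver.zk.parent-path</name>
  <value>/hdfs-federation</value>
  <description>
    Parent znode under which the ZooKeeper State Store driver
    (StateStoreZooKeeperImpl) keeps its data.
  </description>
</property>
{noformat}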



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355476#comment-16355476
 ] 

Yiqun Lin commented on HDFS-13099:
--

Attaching the patch to fix the failed unit test.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently the settings relevant to the State Store Driver are only defined in 
> its implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into the {{DFSConfigKeys}} class and documented 
> in {{hdfs-default.xml}}. This will help more users discover these settings and 
> learn how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


