[jira] [Assigned] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7335: - Assignee: Milan Desai Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie FSN.analyzeFileState() should not call checkOperation(): the operation is already properly checked before the call, first with the READ category and then with WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
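For readers outside the NameNode code, the redundant-check pattern described above can be sketched roughly as follows. All names here are simplified stand-ins for illustration, not the actual FSNamesystem code:

```java
// Simplified, hypothetical sketch of the HDFS-7335 situation: the caller
// already invokes checkOperation() before each call to analyzeFileState(),
// so a second check inside the helper would be redundant.
public class OperationCheckSketch {
    enum OperationCategory { READ, WRITE }

    static int checks = 0;

    static void checkOperation(OperationCategory op) {
        checks++; // in the real NameNode this verifies the HA state permits op
    }

    static void analyzeFileState() {
        // Redundant checkOperation() call removed here: the caller has
        // already performed the check for the current category.
    }

    static void getAdditionalBlock() {
        checkOperation(OperationCategory.READ);  // first pass, read category
        analyzeFileState();
        checkOperation(OperationCategory.WRITE); // second pass, write category
        analyzeFileState();
    }

    public static void main(String[] args) {
        getAdditionalBlock();
        System.out.println(checks); // two checks total, not four
    }
}
```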
[jira] [Assigned] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7338: --- Assignee: Kai Zheng (was: Li Bo) Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7285: Fix Version/s: HDFS-EC Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Fix For: HDFS-EC Attachments: HDFSErasureCodingDesign-20141028.pdf Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. Its drawbacks were: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will not be appended anymore; 3) the pure-Java EC coding implementation is extremely slow in practical use. For these reasons, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design that builds EC into HDFS itself, with no external dependencies, so that it is self-contained and independently maintained. This design layers the EC feature on storage type support and aims to stay compatible with existing HDFS features like caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve EC encoding/decoding performance and make the EC solution even more attractive. We will post the design document soon.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
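The overhead figures quoted in the description can be verified with a bit of arithmetic. The snippet below is purely illustrative, not HDFS code: for a k+m Reed-Solomon scheme (k data blocks, m parity blocks) the extra storage is m/k of the raw data, while n-way replication costs n-1 extra copies:

```java
// Back-of-the-envelope check of the quoted overhead numbers:
// 10+4 Reed-Solomon -> 4/10 = 40% extra storage,
// 3-way replication -> 2 extra copies = 200% extra storage.
public class EcOverhead {
    // extra storage as a fraction of the raw data size for k data + m parity
    static double ecOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks;
    }

    // extra storage as a fraction of the raw data size for n-way replication
    static double replicationOverhead(int replicas) {
        return replicas - 1;
    }

    public static void main(String[] args) {
        System.out.println(ecOverhead(10, 4));      // prints 0.4
        System.out.println(replicationOverhead(3)); // prints 2.0
    }
}
```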
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195968#comment-14195968 ] Hadoop QA commented on HDFS-7314: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679168/HDFS-7314.patch against trunk revision 2bb327e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8638//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8638//console This message is automatically generated. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314.patch It happened in the YARN NodeManager scenario, but it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. The active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2.
The YARN NodeManager uses DFSClient for certain write operations, such as the log aggregator or the shared cache in YARN-1492. The DFSClient used by the YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, the YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We would have to fix all the places in YARN, and other HDFS applications would need to address this as well.
* DistributedFileSystem detects the aborted DFSClient and creates a new instance of DFSClient. We would need to fix all the places where DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN is available again it can transition back to the healthy state.
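The second option above, detecting the aborted client and swapping in a fresh instance, could look roughly like the sketch below. FsClient and the factory are invented stand-ins for illustration, not the real DistributedFileSystem/DFSClient API:

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hedged sketch: a wrapper that catches the well-defined "Filesystem closed"
// failure an aborted client produces, recreates the client, and retries once.
public class RecreateOnAbort {
    // stand-in for the underlying client (hypothetical, not DFSClient)
    interface FsClient {
        String getFileInfo(String path) throws IOException;
    }

    private FsClient client;
    private final Supplier<FsClient> factory;

    RecreateOnAbort(Supplier<FsClient> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    String getFileInfo(String path) throws IOException {
        try {
            return client.getFileInfo(path);
        } catch (IOException e) {
            if (!"Filesystem closed".equals(e.getMessage())) {
                throw e; // not the aborted-client case; propagate as before
            }
            client = factory.get(); // recreate the client and retry once
            return client.getFileInfo(path);
        }
    }
}
```

The retry is deliberately bounded to one attempt; unbounded retries would just move the hang from HDFS into the caller.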
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195980#comment-14195980 ] Steve Loughran commented on HDFS-6803: -- maybe we should say: MUST be consistent with serialized operations; SHOULD be concurrent. What we really want is for two parallel operations to always produce the right data; concurrency boosts throughput, but is not guaranteed. {code} read(pos1, dest, len) -> dest[0..len-1] = [data(FS, path, pos1), data(FS, path, pos1+1), ..., data(FS, path, pos1+len-1)] {code} and {{read(pos2, dest2, len2)}} does the same for pos2..pos2+len2-1. This defines the isolation; the SHOULD/MAY sets the policy. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit by documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
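The isolation property being specified here is the one java.nio's positioned reads already provide: each read takes an explicit offset and does not disturb any shared cursor, so two of them can never see each other's position. A minimal illustration using the plain JDK (not the HDFS client):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Two positioned reads at pos1 and pos2 must each return exactly the bytes
// at their own offsets, regardless of interleaving. FileChannel.read(buf, pos)
// has that contract, which is the property pread is meant to document.
public class PreadContract {
    static byte[] pread(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // offset is passed explicitly; no shared file pointer is moved
            if (ch.read(buf, pos + buf.position()) < 0) break; // EOF
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("pread", ".bin");
        Files.write(p, "abcdefghij".getBytes());
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            System.out.println(new String(pread(ch, 2, 3))); // prints cde
            System.out.println(new String(pread(ch, 6, 4))); // prints ghij
        }
        Files.delete(p);
    }
}
```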
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195981#comment-14195981 ] Steve Loughran commented on HDFS-6698: -- bq. it's just so easy to write incorrect code with volatile. yes, but it's very fast incorrect code ... try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, HDFS-6698v2.txt, HDFS-6698v3.txt HBase prefers to invoke read() to serve scan requests, and invoke pread() to serve get requests, because pread() holds almost no locks. Let's imagine there's a read() running. Because the definition is: {code} public synchronized int read {code} no other read() request could run concurrently; this is known. But pread() also could not run, because: {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { // sanity checks dfsClient.checkOpen(); if (closed) { throw new IOException("Stream closed"); } failures = 0; long filelen = getFileLength(); {code} the getFileLength() also needs the lock. So we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
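A minimal sketch of the lock-free direction being discussed, with hypothetical names rather than the actual DFSInputStream patch: publish the length through a volatile field so pread() can read it without taking the stream's monitor. As the comment above warns, volatile only gives visibility for this single field; any compound update (length plus related block state, say) would still need the lock.

```java
// Hypothetical sketch, not the HDFS-6698 patch: a volatile field lets
// readers observe the latest published length without acquiring the
// stream's monitor, so pread() no longer contends with synchronized read().
public class LockFreeLength {
    private volatile long fileLength;

    // writers (e.g. code updating the last block's length) still serialize
    synchronized void updateLength(long newLength) {
        fileLength = newLength;
    }

    // no lock: a single volatile read is safe and always sees the
    // most recent completed updateLength()
    long getFileLength() {
        return fileLength;
    }
}
```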
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: HDFS-7017-pnative.003.patch Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: (was: HDFS-7017-pnative.003.patch) Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: HDFS-7017-pnative.003.patch Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7337: Fix Version/s: (was: HDFS-EC) Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this proposes to support multiple Erasure Codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, we also need to implement a concrete codec by default (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7339: Fix Version/s: (was: HDFS-EC) Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196025#comment-14196025 ] Hudson commented on HDFS-7324: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Run the haadmin help as follows. The usage message prints DFSHAAdmin [-ns nameserviceId], but DFSHAAdmin is not actually available as a command name, which we can verify with the command shown below. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath.
-archives comma separated list of archives specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196017#comment-14196017 ] Hudson commented on HDFS-7147: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid and should be removed. Also, if new APIs such as the addStoragePolicy API proposed in HDFS-7076 are able to get in, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196024#comment-14196024 ] Hudson commented on HDFS-7328: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196027#comment-14196027 ] Zhanwei Wang commented on HDFS-7017: Added a log when failing to close a file; removed OutputStream::lastError and related code. I catch std::bad_alloc in the lease renewer: if overcommit is turned on, it does nothing, but if it is thrown in some case, I do not want the library to die in the backend working thread. std::bad_alloc will be thrown again somewhere in the main thread and the API can handle it well. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7344) Erasure Coding worker and support in DataNode
[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7344: Target Version/s: HDFS-EC Affects Version/s: (was: HDFS-EC) Erasure Coding worker and support in DataNode - Key: HDFS-7344 URL: https://issues.apache.org/jira/browse/HDFS-7344 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Li Bo According to HDFS-7285 and the design, this handles the DataNode-side extension and related support for Erasure Coding, and implements ECWorker. It mainly covers the following aspects, and separate tasks may be opened to handle each of them. * Process encoding work, calculating parity blocks as specified by block groups and the codec schema; * Process decoding work, recovering data blocks according to block groups and the codec schema; * Handle client requests for passive recovery of block data, serving data on demand while reconstructing; * Write parity blocks according to the storage policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7345) Local Reconstruction Codes (LRC)
[ https://issues.apache.org/jira/browse/HDFS-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7345: Target Version/s: HDFS-EC Affects Version/s: (was: HDFS-EC) Local Reconstruction Codes (LRC) Key: HDFS-7345 URL: https://issues.apache.org/jira/browse/HDFS-7345 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng HDFS-7285 proposes to support Erasure Coding inside HDFS, supporting multiple Erasure Coding codecs via a pluggable framework and implementing Reed-Solomon code by default. This issue is to support a more advanced coding mechanism, Local Reconstruction Codes (LRC). As discussed in the paper (https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads compared to prior codes, while still allowing a significant reduction in storage overhead. The Intel ISA-L library also supports LRC in a recent update and can be leveraged as well. The implementation would also consider how to distribute the calculation of local and global parity blocks to the relevant DataNodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7285: - Target Version/s: HDFS-EC Fix Version/s: (was: HDFS-EC) Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: HDFSErasureCodingDesign-20141028.pdf Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. Its drawbacks were: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will not be appended anymore; 3) the pure-Java EC coding implementation is extremely slow in practical use. For these reasons, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design that builds EC into HDFS itself, with no external dependencies, so that it is self-contained and independently maintained. This design layers the EC feature on storage type support and aims to stay compatible with existing HDFS features like caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve EC encoding/decoding performance and make the EC solution even more attractive.
We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7343) A comprehensive and flexible storage policy engine
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7343: - Target Version/s: HDFS-EC A comprehensive and flexible storage policy engine -- Key: HDFS-7343 URL: https://issues.apache.org/jira/browse/HDFS-7343 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Kai Zheng As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage policy engine that considers file attributes, metadata, data temperature, storage type, EC codec, available hardware capabilities, user/application preferences, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7347: - Fix Version/s: (was: HDFS-EC) Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7349) Support DFS command for the EC encoding
Vinayakumar B created HDFS-7349: --- Summary: Support DFS command for the EC encoding Key: HDFS-7349 URL: https://issues.apache.org/jira/browse/HDFS-7349 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Support implementation of the following commands: *hdfs dfs -convertToEC path* Converts all blocks under this path to EC form (if not already in EC form, and if they can be coded). *hdfs dfs -convertToRep path* Converts all blocks under this path to replicated form. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7350) WebHDFS: Support EC commands through webhdfs
Uma Maheswara Rao G created HDFS-7350: - Summary: WebHDFS: Support EC commands through webhdfs Key: HDFS-7350 URL: https://issues.apache.org/jira/browse/HDFS-7350 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7351) Document the HDFS Erasure Coding feature
Uma Maheswara Rao G created HDFS-7351: - Summary: Document the HDFS Erasure Coding feature Key: HDFS-7351 URL: https://issues.apache.org/jira/browse/HDFS-7351 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7352) Common side changes for HDFS Erasure coding support
Uma Maheswara Rao G created HDFS-7352: - Summary: Common side changes for HDFS Erasure coding support Key: HDFS-7352 URL: https://issues.apache.org/jira/browse/HDFS-7352 Project: Hadoop HDFS Issue Type: Bug Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This is an umbrella JIRA for tracking the Common-side changes for HDFS Erasure Coding support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7352) Common side changes for HDFS Erasure coding support
[ https://issues.apache.org/jira/browse/HDFS-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-7352: -- Issue Type: New Feature (was: Bug) Common side changes for HDFS Erasure coding support --- Key: HDFS-7352 URL: https://issues.apache.org/jira/browse/HDFS-7352 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This is an umbrella JIRA for tracking the Common-side changes for HDFS Erasure Coding support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7353) Common Erasure Codec API and plugin support
Kai Zheng created HDFS-7353: --- Summary: Common Erasure Codec API and plugin support Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng This is to abstract and define a common codec API across different coding algorithms such as RS and XOR. The API can be implemented by utilizing various libraries, such as the Intel ISA-L library and the Jerasure library. It provides a default implementation and also allows plugging in vendor-specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
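Purely as an illustration of what "pluggable" could mean here, a hypothetical registry-style codec API; the interface and registry names are invented for this sketch and are not from the HDFS-7353 design:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable erasure codec API: a common interface
// implemented per algorithm/library, plus a registry keyed by schema name.
public class CodecPluginSketch {
    interface ErasureCoder {
        byte[][] encode(byte[][] dataBlocks);                 // returns parity blocks
        byte[][] decode(byte[][] available, int[] erasedIdx); // returns recovered blocks
    }

    private static final Map<String, ErasureCoder> REGISTRY = new HashMap<>();

    // a vendor implementation (e.g. ISA-L-backed) would call this at load time
    static void register(String schema, ErasureCoder coder) {
        REGISTRY.put(schema, coder); // e.g. "rs-10-4", "xor-2-1"
    }

    static ErasureCoder lookup(String schema) {
        ErasureCoder c = REGISTRY.get(schema);
        if (c == null) {
            throw new IllegalArgumentException("unknown codec schema: " + schema);
        }
        return c;
    }
}
```

With such a split, the default pure-Java coder and any native-library-backed coder are interchangeable behind the same schema name.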
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196137#comment-14196137 ] Hudson commented on HDFS-7324: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Run the haadmin help as follows. The usage message prints DFSHAAdmin [-ns nameserviceId], but DFSHAAdmin is not actually available as a command name, which we can verify with the command shown below. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath.
-archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
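The bug here is that the usage banner prints the Java class's simple name ("DFSHAAdmin"), which is not an invokable shell command, instead of "haadmin". A generic sketch of the fix pattern, with hypothetical names (the real change is in DFSHAAdmin.java):

```java
/**
 * Illustrative sketch: build the usage banner from the shell command the
 * user actually typed ("haadmin") rather than a hardcoded class name
 * ("DFSHAAdmin"). Class and method names here are hypothetical.
 */
public class UsageSketch {
    private final String toolName;

    public UsageSketch(String toolName) {
        this.toolName = toolName;
    }

    /** Usage text parameterized on the tool name, so it always matches the CLI. */
    public String usage() {
        return "Usage: " + toolName + " [-ns <nameserviceId>]"
            + " [-transitionToActive <serviceId>]"
            + " [-transitionToStandby <serviceId>]";
    }
}
```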
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196129#comment-14196129 ] Hudson commented on HDFS-7147: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid. It should be removed. Also, if there are new APIs able to get in such as the addStoragePolicy API proposed in HDFS-7076, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7353) Common Erasure Codec API and plugin support
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7353: --- Assignee: Kai Zheng Common Erasure Codec API and plugin support --- Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng This is to abstract and define common codec API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196136#comment-14196136 ] Hudson commented on HDFS-7328: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
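The portable pattern behind this kind of fix is to stop comparing against a literal "\n": either normalize both sides to LF before asserting, or build the expected string with the platform separator. A generic illustration (not the actual TestTraceAdmin change):

```java
/**
 * Sketch of platform-neutral line-ending handling for test assertions.
 * Generic illustration, not the HDFS-7328 patch itself.
 */
public class LineEndings {
    /** Normalize CRLF (and stray CR) to LF so string comparisons are portable. */
    public static String normalize(String s) {
        return s.replace("\r\n", "\n").replace("\r", "\n");
    }

    /** Build expected multi-line text using the current platform's separator. */
    public static String joinLines(String... lines) {
        return String.join(System.lineSeparator(), lines);
    }
}
```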
[jira] [Commented] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196142#comment-14196142 ] Kai Zheng commented on HDFS-7338: - This will follow the common codec API and plugin support to be defined in HDFS-7353, and will support the RS codec. We're considering providing the default implementation by utilizing the Intel ISA-L library, as it offers better performance and its license is friendly. Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7338: Description: This is to provide an RS codec implementation for encoding and decoding. Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng This is to provide an RS codec implementation for encoding and decoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
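As intuition for the decoding side: Reed-Solomon generalizes the single-parity XOR case, where any one lost unit is recovered by XOR-ing the parity with the surviving units. Real RS coding (e.g. the 10+4 scheme discussed in HDFS-7285) uses Galois-field arithmetic and survives multiple losses; this sketch shows only the degenerate case.

```java
/**
 * XOR recovery: the degenerate single-parity erasure code that RS
 * generalizes. Intuition only, not an RS implementation.
 */
public class XorRecovery {
    /** Recover the unit at lostIndex given the surviving units and XOR parity. */
    public static byte[] recover(byte[][] units, int lostIndex, byte[] parity) {
        byte[] recovered = parity.clone();
        for (int u = 0; u < units.length; u++) {
            if (u == lostIndex) continue;  // the lost unit is unavailable
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= units[u][i];
            }
        }
        return recovered;
    }
}
```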
[jira] [Updated] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7337: Description: According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. was: According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. 
The resultant codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, a concrete codec (Reed-Solomon) should also be implemented by default to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low-level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level concerns that interact with configuration, schemas, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
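A minimal sketch of what a configurable codec schema might carry, per the description above: an algorithm name plus coding parameters. The class and field names are hypothetical, not the eventual HDFS API.

```java
/**
 * Hypothetical codec schema: a named, parameterized coding configuration
 * of the kind HDFS-7337 describes. Illustrative names only.
 */
public class CodecSchema {
    private final String name;       // e.g. "rs-10-4"
    private final String algorithm;  // e.g. "reed-solomon"
    private final int dataUnits;     // original blocks per group
    private final int parityUnits;   // parity blocks per group

    public CodecSchema(String name, String algorithm, int dataUnits, int parityUnits) {
        this.name = name;
        this.algorithm = algorithm;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
    }

    /** Storage overhead relative to raw data, e.g. 4/10 = 40% for RS(10,4). */
    public double storageOverhead() {
        return (double) parityUnits / dataUnits;
    }

    public String getName() { return name; }
    public String getAlgorithm() { return algorithm; }
}
```

With something like this, multiple schemas (different algorithms or parameters) can coexist and be attached to folders by a command tool.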
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196154#comment-14196154 ] Kai Zheng commented on HDFS-7337: - Also opened HDFS-7353 to focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196163#comment-14196163 ] Kai Zheng commented on HDFS-7337: - Zhe, let me consider these issues together and think about how to define and implement such configurable and pluggable codec plus schema. Will give my thoughts here for the discussion. Assigned to me. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7337: --- Assignee: Kai Zheng Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7353) Common Erasure Codec API and plugin support
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7353: Fix Version/s: HDFS-EC Common Erasure Codec API and plugin support --- Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Fix For: HDFS-EC This is to abstract and define common codec API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196198#comment-14196198 ] Hudson commented on HDFS-7147: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid. It should be removed. Also, if there are new APIs able to get in such as the addStoragePolicy API proposed in HDFS-7076, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196206#comment-14196206 ] Hudson commented on HDFS-7324: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Try the help command for hadadmin like following.. Here usage is coming as DFSHAAdmin -ns, Ideally this not availble which we can check following command. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. 
-archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196205#comment-14196205 ] Hudson commented on HDFS-7328: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196334#comment-14196334 ] Suresh Srinivas commented on HDFS-7340: --- +1 for the patch. Couple of comments: bq. upgrade has been finalized Can you please change this to upgrade already has been finalized? Also please add Idempotent annotation to the method. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
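The idempotency idea under review here, sketched generically: when a response is dropped and the client retries, a start or finalize that already took effect should succeed quietly rather than fail with "already in progress". This is an illustration of the semantics, not the HDFS-7340 patch (which also involves the retry cache).

```java
/**
 * Generic sketch of idempotent start/finalize semantics for a rolling
 * upgrade. Illustrative only; not the HDFS-7340 implementation.
 */
public class RollingUpgradeSketch {
    private boolean inProgress = false;
    private boolean finalized = false;

    /** Starting twice is a no-op the second time (retry-safe). */
    public synchronized void start() {
        if (inProgress) return;  // idempotent: a retried start just succeeds
        inProgress = true;
    }

    /** Finalizing an already-finalized upgrade is likewise a no-op. */
    public synchronized void finalizeUpgrade() {
        if (finalized) return;   // idempotent
        inProgress = false;
        finalized = true;
    }

    public synchronized boolean isFinalized() { return finalized; }
}
```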
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196336#comment-14196336 ] Plamen Jeliazkov commented on HDFS-3107: [~cmccabe], At the time [~shv] talked about my new patch there was nothing posted yet in HDFS-7056 minus Konstantin's design doc. We only uploaded even newer patches yesterday around noon. Please be careful not to confuse [~shv] and [~cos]. The snapshot support patch (for HDFS-7056) was not ready yet when [~cos] made his comment. We don't have to commit HDFS-3107 on its own. There is the option to treat the combined patch HDFS-3107--7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. This should address your comment: My reasoning is that if the first patch breaks rollback, it's tough to see it getting into trunk. I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I'll investigate the FindBugs. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
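The semantics being requested are those of POSIX truncate, shown here on a local file with java.nio: cut a file back to a given length, discarding bytes past it. HDFS-3107 proposes an analogous call on HDFS itself; this local demo only illustrates the operation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Local-filesystem demo of truncate semantics (the inverse of append).
 * Illustrates the operation HDFS-3107 adds to HDFS; not HDFS code.
 */
public class TruncateDemo {
    /** Truncate the file at p to newLength bytes and return the new size. */
    public static long truncateTo(Path p, long newLength) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(newLength);  // discards bytes past newLength
        }
        return Files.size(p);
    }

    /** Write 11 bytes, truncate to 5 ("hello"), return the resulting size. */
    public static long demo() throws IOException {
        Path p = Files.createTempFile("truncate", ".txt");
        try {
            Files.write(p, "hello world".getBytes(StandardCharsets.UTF_8));
            return truncateTo(p, 5);
        } finally {
            Files.delete(p);
        }
    }
}
```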
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196355#comment-14196355 ] Tsz Wo Nicholas Sze commented on HDFS-7340: --- +1 patch looks good. No additional comments. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7218: Hadoop Flags: Reviewed +1 from me too. Thank you, Charles. FSNamesystem ACL operations should write to audit log on failure Key: HDFS-7218 URL: https://issues.apache.org/jira/browse/HDFS-7218 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch Various Acl methods in FSNamesystem do not write to the audit log when the operation is not successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
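The pattern being applied, sketched generically: record an audit entry on the failure path too, not only after success, so a denied operation leaves a trace. Names below are illustrative; the real fix touches FSNamesystem's ACL methods.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Generic sketch of auditing on both success and failure paths.
 * Illustrative only; not the HDFS-7218 patch.
 */
public class AuditSketch {
    private final List<String> auditLog = new ArrayList<>();

    public void setAcl(String path, boolean permitted) {
        try {
            if (!permitted) {
                throw new SecurityException("access denied: " + path);
            }
            auditLog.add("setAcl " + path + " success=true");
        } catch (SecurityException e) {
            // The bug class: without this line, denied operations left no trace.
            auditLog.add("setAcl " + path + " success=false");
            throw e;
        }
    }

    public List<String> audit() { return auditLog; }
}
```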
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7339: Description: All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. * Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and insert the parity blocks into {{QUEUE_INITIAL_ENCODING}}. 
* When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine grained differentiation (e.g., loss of a raw block versus a parity one). Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: Encoding-design-NN.jpg All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. 
* Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and inserts the parity blocks into {{QUEUE_INITIAL_ENCODING}}. * When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine-grained differentiation (e.g., loss of a raw block versus a parity one).
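A lightweight sketch of the {{BlockGroup}} idea described above: original and parity block IDs plus a pointer to the codec schema, indexed by member block so recovery requests can find the group. Class and field names are illustrative, not the eventual HDFS types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical BlockGroup: the unit around which encoding and decoding
 * are organized. Illustrative only; not the HDFS-7339 implementation.
 */
public class BlockGroupSketch {
    final List<Long> originalBlocks;
    final List<Long> parityBlocks;
    final String schemaName;  // e.g. "rs-3-2" per the 3+2 example above

    public BlockGroupSketch(List<Long> original, List<Long> parity, String schemaName) {
        this.originalBlocks = original;
        this.parityBlocks = parity;
        this.schemaName = schemaName;
    }

    /** ECManager-style index: every member block maps back to its group. */
    public static Map<Long, BlockGroupSketch> index(BlockGroupSketch g) {
        Map<Long, BlockGroupSketch> byBlock = new HashMap<>();
        for (long b : g.originalBlocks) byBlock.put(b, g);
        for (long b : g.parityBlocks) byBlock.put(b, g);
        return byBlock;
    }
}
```

Keying the map by block (rather than by group) matches the lookup direction of a recovery request: a missing block is reported first, and its group must be found from it.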
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7339: Attachment: Encoding-design-NN.jpg Architecture of NameNode extensions Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: Encoding-design-NN.jpg All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. * Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. 
Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and inserts the parity blocks into {{QUEUE_INITIAL_ENCODING}}. * When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine-grained differentiation (e.g., loss of a raw block versus a parity one). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196496#comment-14196496 ] Colin Patrick McCabe commented on HDFS-7199: Can you post a new patch with the else on the same line as the close brace as per our coding standard? Then I'll commit this. Thanks guys. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakently think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
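The failure mode under discussion, sketched generically: if the background streamer records only IOExceptions, a RuntimeException closes the stream with lastException still null, and close() then looks like a harmless redundant close. Catching Throwable and recording it lets close() surface the error. Names are illustrative, not the DFSOutputStream internals.

```java
import java.io.IOException;

/**
 * Sketch of "record any Throwable so close() can report it".
 * Illustrative only; not the HDFS-7199 patch.
 */
public class StreamerSketch {
    private volatile Throwable lastException;
    private volatile boolean closed;

    /** Simulate the streamer loop failing with an arbitrary Throwable. */
    public void runStreamer(Runnable work) {
        try {
            work.run();
        } catch (Throwable t) {   // not just IOException
            lastException = t;    // the bookkeeping the bug was missing
        } finally {
            closed = true;
        }
    }

    /** close() must report a recorded failure instead of silently returning. */
    public void close() throws IOException {
        if (closed && lastException != null) {
            throw new IOException("stream failed", lastException);
        }
        closed = true;
    }
}
```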
[jira] [Updated] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7340: Attachment: HDFS-7340.001.patch Thanks Suresh and Nicholas for the review. Update the patch to address Suresh's comments. bq. add Idempotent annotation to the method The ClientProtocol#rollingUpgrade has already been annotated as idempotent before the fix. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-6803: Attachment: HDFS-6803v2.txt Thanks [~ste...@apache.org]. Here is v2. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf, HDFS-6803v2.txt Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit by documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196511#comment-14196511 ] Hadoop QA commented on HDFS-7340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679263/HDFS-7340.001.patch against trunk revision 3dfd6e6. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8639//console This message is automatically generated. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196514#comment-14196514 ] Hudson commented on HDFS-7340: -- FAILURE: Integrated in Hadoop-trunk-Commit #6434 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6434/]) HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7340: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Since the change between 000 and 001 patch is only adding a word into the dfsadmin output (and we do not check the content of this output in the current unit tests), I committed this patch before waiting for Jenkins again. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196576#comment-14196576 ] Haohui Mai commented on HDFS-7334: -- Looks good to me.
{code}
 conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_MAX_RETRIES_KEY, 1);
-conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_CHECK_PERIOD_KEY, 1);
+conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_PERIOD_KEY, 1);
{code}
I think that the code should call {{setInt}} instead. Can you use the jira to clean them up? Thanks. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196585#comment-14196585 ] Colin Patrick McCabe commented on HDFS-7017: Thanks, this looks better. bq. I catch std::bad_alloc in lease renewer, if overcommit turned on, it does nothing, but if it is thrown in some case, I do not want the library die in backend working thread. std::bad_alloc will be thrown again somewhere in main thread and the API can handle it well. I really can't agree with this rationale. If {{std::bad_alloc}} is causing arbitrary threads to terminate (without any message, since we don't log anything currently), how is the user supposed to know? And why do we think that std::bad_alloc will be thrown again somewhere in main thread? Perhaps terminating this thread freed up enough memory to proceed. I think that 99.% of all users will run with memory overcommit turned on, which means that this catch block will never be an issue. The fact that nobody runs with overcommit disabled also means this code will never be tested. If we want to keep the catch block, let's at least log a message. If you're concerned that the logging will throw another exception, we can have another try... catch block. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196603#comment-14196603 ] Colin Patrick McCabe commented on HDFS-7314: Thanks, [~mingma]. It's interesting that all the unit tests pass with the changed behavior of {{DFSClient#abort}}. I would prefer not to add this new configuration key, because I really can't think of any cases where I'd like to set it to {{true}}. I think it would be better just to have the lease timeout logic call a function other than {{DFSClient#abort}}. Basically create something like {{DFSClient#abortOpenFiles}} and have the lease timeout code call this instead of abort. That way we don't get confused about what abort means, but we also have the nice behavior that our client continues to be useful after a lease timeout. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314.patch It happened in the YARN nodemanager scenario, but it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well.
* DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If NN is available again it can transition to the healthy state.
Comments?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
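The first option in the list above (the caller drops and recreates its cached filesystem handle when it sees a well-defined failure) can be sketched generically. The interface and wrapper below are hypothetical stand-ins, not YARN or HDFS code; the "Filesystem closed" message matching mirrors the stack trace in the report.

```java
import java.io.IOException;
import java.util.function.Supplier;

public class ReopeningClient {
    // Hypothetical stand-in for a filesystem client such as DistributedFileSystem.
    interface Client {
        String stat(String path) throws IOException;
    }

    private final Supplier<Client> factory;
    private Client client;

    ReopeningClient(Supplier<Client> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    // On the well-defined "Filesystem closed" failure, recreate the cached
    // client once and retry; any other IOException propagates unchanged.
    String stat(String path) throws IOException {
        try {
            return client.stat(path);
        } catch (IOException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Filesystem closed")) {
                throw e;
            }
            client = factory.get();
            return client.stat(path);
        }
    }

    public static void main(String[] args) throws IOException {
        final boolean[] aborted = {true};   // the first cached client is "aborted"
        ReopeningClient rc = new ReopeningClient(() -> path -> {
            if (aborted[0]) {
                aborted[0] = false;
                throw new IOException("Filesystem closed");
            }
            return "ok:" + path;
        });
        System.out.println(rc.stat("/tmp/x"));
    }
}
```

The drawback noted in the issue applies here too: every caller needs this wrapper, which is why the later options push the fix down into DistributedFileSystem or DFSClient itself.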
[jira] [Created] (HDFS-7354) Support parity blocks in block management
Zhe Zhang created HDFS-7354: --- Summary: Support parity blocks in block management Key: HDFS-7354 URL: https://issues.apache.org/jira/browse/HDFS-7354 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang In the current block management system, each block is associated with a file. Orphan blocks are considered corrupt and will be removed. In this JIRA we extend {{Block}} with a binary flag denoting whether it is a parity block ({{isParity}}). Parity blocks are created, stored, and reported the same way as raw ones. They have regular block IDs which are unrelated to those of the raw blocks in the same group; their replicas (normally only 1) are stored in RBW and finalized directories on the DataNode depending on the stage; they are also included in block reports. The only distinction of a parity block is the lack of file affiliation. The block management system will be aware of parity blocks and will _not_ try to remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
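The proposed {{isParity}} flag changes only one decision: an orphan raw block is corrupt and removable, while an orphan parity block is expected and kept. A minimal sketch (block descriptor and method names hypothetical, not the actual BlockManager code):

```java
import java.util.ArrayList;
import java.util.List;

public class ParityBlockSketch {
    // Hypothetical block descriptor carrying the proposed isParity flag.
    static class Block {
        final long id;
        final boolean isParity;   // true: no file affiliation, by design
        final Object fileOwner;   // null when the block belongs to no file
        Block(long id, boolean isParity, Object fileOwner) {
            this.id = id;
            this.isParity = isParity;
            this.fileOwner = fileOwner;
        }
    }

    // Orphan raw blocks are scheduled for removal; parity blocks, which are
    // orphans by design, are deliberately kept.
    static List<Block> orphansToRemove(List<Block> all) {
        List<Block> out = new ArrayList<>();
        for (Block b : all) {
            if (!b.isParity && b.fileOwner == null) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(1, false, "someFile"),  // raw block with an owner: kept
            new Block(2, false, null),        // raw orphan: removed
            new Block(3, true, null));        // parity block: kept
        System.out.println(orphansToRemove(blocks).size());
    }
}
```

Everything else (block IDs, RBW/finalized directories, block reports) stays on the existing code paths, as the description says.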
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196619#comment-14196619 ] Colin Patrick McCabe commented on HDFS-3107: bq. There is the option to treat the combined patch HDFS-3107--7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. That's fine with me. It can go into trunk directly if it doesn't break rollback + snapshots. bq. I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I suggested a branch since I thought it would let us commit things quicker. But I don't think it's necessary if you can do things without breaking trunk. It is going to be no more than 3-4 patches anyway as I understand. Whatever is easiest for you guys. Just one request: Can you post the combined patch on a subtask rather than this JIRA? I think having patches on this umbrella jira is very confusing. If you're going to combine the patches, post the combined patch on either HDFS-7341 or HDFS-7056 please. Thanks. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196620#comment-14196620 ] Haohui Mai commented on HDFS-6982: --
{code}
+if (bucket.isStaleNow(time)) {
+  bucket.safeReset(time);
+}
{code}
Maybe I'm missing something, but it looks like it resets the bucket at every interval, causing hiccups in the data. It might make more sense to use a decay function in this case:
{code}
bucket <- alpha * bucket + delta
{code}
where 0 < alpha < 1. Assuming that the requests follow a Poisson distribution, you can calculate alpha w.r.t. each window based on the timespan of the delta. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
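The decay function Haohui suggests can be sketched as a tiny counter. This is an illustration of the suggestion only, with hypothetical names, not nntop code: at each window boundary the bucket is multiplied by alpha instead of being zeroed, so old traffic fades out gradually rather than dropping to zero.

```java
public class DecayCounter {
    private double bucket = 0.0;
    private final double alpha;   // 0 < alpha < 1: how much history survives a window

    DecayCounter(double alpha) {
        this.alpha = alpha;
    }

    // At each window boundary, decay the old value instead of zeroing it:
    // bucket <- alpha * bucket + delta
    void endOfWindow(double delta) {
        bucket = alpha * bucket + delta;
    }

    double value() {
        return bucket;
    }

    public static void main(String[] args) {
        DecayCounter c = new DecayCounter(0.5);
        c.endOfWindow(10);   // one busy window
        c.endOfWindow(0);    // an idle window: value fades to 5.0 instead of dropping to 0
        System.out.println(c.value());
    }
}
```

This is the "no hiccups" property: a metrics reader never observes a freshly zeroed bucket, only a smoothly decaying one.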
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Status: Open (was: Patch Available) Need to address Colin's comments. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Status: Patch Available (was: Open) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Attachment: HDFS-7199-1.patch Updated patch addressing Colin's comments. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196632#comment-14196632 ] Maysam Yabandeh commented on HDFS-6982: --- Thanks [~wheat9] for the comment. Let me explain how the buckets are employed in the rolling window implementation. The rolling window can compute the total value of the event in the past period of time, let's say a minute. The last minute is divided into multiple buckets, where the buckets are placed in a ring. The total number of events in the last minute is the sum of the values of the buckets. As time rolls forward, a bucket of the last time period is reused for the current time period. Let's say that the bucket that we are writing to was used to accumulate the events of 67 seconds ago. Before we start adding events to that (which will be used to compute the event of the last 60 seconds) we need to zero the content of the bucket. Whether the bucket is stale or not is determined by the #isStaleNow method. Considering the above explanation, let me know if the current implementation of zeroing stale buckets makes sense to you. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests.
Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7334: --- Attachment: HDFS-7334.002.patch [~wheat9], Thanks for the review! The .002 patch changes the set to setInt. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196644#comment-14196644 ] Haohui Mai commented on HDFS-6982: -- bq. Before we start adding events to that (which will be used to compute the event of the last 60 seconds) we need to zero the content of the bucket. Thanks for the explanation. When the bucket is zeroed and the metrics are collected right after it, does it mean that the metrics have smaller numbers? nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196649#comment-14196649 ] Haohui Mai commented on HDFS-7334: -- +1 pending jenkins. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196655#comment-14196655 ] Jing Zhao commented on HDFS-7056: - Thanks for working on this, [~shv] and [~zero45]. So far I have just gone through the namenode snapshot part (INode, INodeFile, FileDiff, and FileDiffList) and I will continue reviewing the remaining parts.
# Looks like {{findLaterSnapshotWithBlocks}} and {{findEarlierSnapshotWithBlocks}} are always coupled with {{FileDiff#getBlocks}}. Maybe we can combine them so that we can wrap logic like the following code into two methods such as findBlocksAfter and findBlocksBefore?
{code}
+FileDiff diff = getDiffs().getDiffById(snapshot);
+BlockInfo[] snapshotBlocks = diff == null ? getBlocks() : diff.getBlocks();
+if(snapshotBlocks != null)
+  return snapshotBlocks;
+// Blocks are not in the current snapshot
+// Find next snapshot with blocks present or return current file blocks
+diff = getDiffs().findLaterSnapshotWithBlocks(diff.getSnapshotId());
+snapshotBlocks = (diff == null) ? getBlocks() : diff.getBlocks();
{code}
# Since the same block can be included in different file diffs, we may have duplicated blocks in {{collectedBlocks}}. Will this lead to duplicated records in the invalid block list?
{code}
public void destroyAndCollectSnapshotBlocks(
    BlocksMapUpdateInfo collectedBlocks) {
  for(FileDiff d : asList())
    d.destroyAndCollectSnapshotBlocks(collectedBlocks);
}
{code}
# INodeFile#destroyAndCollectBlocks destroys the whole file, including the file diffs for snapshots. Thus we do not need to call {{collectBlocksAndClear}} and define a new destroyAndCollectAllBlocks method. Instead, we can simply first destroy all the blocks belonging to the current file, then check whether calling {{sf.getDiffs().destroyAndCollectSnapshotBlocks}} is necessary.
{code}
+FileWithSnapshotFeature sf = getFileWithSnapshotFeature();
+if(sf == null || getDiffs().asList().isEmpty()) {
+  destroyAndCollectAllBlocks(collectedBlocks, removedINodes);
+  return;
+}
+sf.getDiffs().destroyAndCollectSnapshotBlocks(collectedBlocks);
{code}
# How do we currently calculate/update the quota for a file? I guess we need to update the quota calculation algorithm for an INodeFile here.
# I guess the semantics of {{findEarlierSnapshotWithBlocks}} is to find the FileDiff that satisfies: 1) its block list is not null, and 2) its snapshot id is less than the given {{snapshotId}}. However, if the given {{snapshotId}} is not {{CURRENT_STATE_ID}}, the current implementation may return a FileDiff whose snapshot id is >= the given {{snapshotId}} (since {{getDiffById}} may return a diff with snapshot id greater than the given id).
{code}
public FileDiff findEarlierSnapshotWithBlocks(int snapshotId) {
  FileDiff diff = (snapshotId == Snapshot.CURRENT_STATE_ID) ?
      getLast() : getDiffById(snapshotId);
  BlockInfo[] snapshotBlocks = null;
  while(diff != null) {
    snapshotBlocks = diff.getBlocks();
    if(snapshotBlocks != null)
      break;
    int p = getPrior(diff.getSnapshotId(), true);
    diff = (p == Snapshot.NO_SNAPSHOT_ID) ? null : getDiffById(p);
  }
  return diff;
}
{code}
# Still for findEarlierSnapshotWithBlocks: because {{getPrior}} is currently a {{log\(n\)}} operation, the worst-case time complexity can be {{nlog\(n\)}}. Considering that the snapshot diff list is usually not big (we have an upper limit for the total number of snapshots), we may consider directly doing a linear scan of the file diff list.
# In INode.java, why do we need the following change?
{code}
 public final boolean isInLatestSnapshot(final int latestSnapshotId) {
-if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+    latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
{code}
# Nit: need to add \{ and \} for the while loop according to our current coding style. Similar for several other places (e.g., {{FileDiffList#destroyAndCollectSnapshotBlocks}}).
{code}
+while(i < currentBlocks.length && i < snapshotBlocks.length
+    && currentBlocks[i] == snapshotBlocks[i])
+  i++;
+// Collect the remaining blocks of the file
+while(i < currentBlocks.length)
+  collectedBlocks.addDeleteBlock(currentBlocks[i++]);
{code}
# Minor: In the following code, instead of calling {{getDiffById}} to search for the file diff, we can let {{AbstractINodeDiffList#saveSelf2Snapshot}} return the diff it just finds/creates.
{code}
public void saveSelf2Snapshot(int latestSnapshotId, INodeFile iNodeFile,
    INodeFileAttributes snapshotCopy, boolean withBlocks)
    throws QuotaExceededException {
  super.saveSelf2Snapshot(latestSnapshotId, iNodeFile, snapshotCopy);
  if(! withBlocks)
    return;
  final FileDiff diff = getDiffById(latestSnapshotId); //
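The while-loop nit in the review concerns a common-prefix scan: walk past the blocks shared between the current file and the snapshot copy, then collect the rest for deletion. As a self-contained illustration (block IDs and names hypothetical, not the HDFS types), the scan behaves like:

```java
import java.util.ArrayList;
import java.util.List;

public class TruncateCollectSketch {
    // Returns the blocks present in currentBlocks but past the common prefix
    // with snapshotBlocks: the blocks a truncate can schedule for deletion.
    static List<Long> blocksToCollect(long[] currentBlocks, long[] snapshotBlocks) {
        List<Long> collected = new ArrayList<>();
        int i = 0;
        while (i < currentBlocks.length && i < snapshotBlocks.length
            && currentBlocks[i] == snapshotBlocks[i]) {
            i++;
        }
        // Collect the remaining blocks of the file
        while (i < currentBlocks.length) {
            collected.add(currentBlocks[i++]);
        }
        return collected;
    }

    public static void main(String[] args) {
        long[] current = {101, 102, 103, 104};
        long[] snapshot = {101, 102};   // snapshot still references the prefix
        System.out.println(blocksToCollect(current, snapshot));
    }
}
```

Blocks still referenced by the snapshot stay; only the suffix unique to the current file is collected.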
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196664#comment-14196664 ] Maysam Yabandeh commented on HDFS-6982: --- Let me take an example.
Time period: 60 seconds
Bucket duration: 20 seconds
Buckets/window = 3
Event1: time 00:10:55. Current time: 00:11:03. Current sum for the past window (1 min) = 1
Event2: time 00:11:54. Current time: 00:12:07. Current sum for the past window (1 min) = 1
Now for this behavior to be implemented correctly we need to zero the content of bucket number 3, because both Event1 and Event2 map to the same bucket but Event1 is irrelevant at time 00:12:07 since it happened before the last 60 seconds. Makes sense? nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending the majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop, which has been in production at Twitter for the past 10 months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient on the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
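The bucket-zeroing behavior walked through above can be sketched as follows. This is a hypothetical illustration of the idea (class and method names are invented, not nntop's actual implementation): a window of 60 s is split into 3 buckets of 20 s, and a bucket is reset before reuse when it still holds counts from an older cycle, so an event like Event1 stops being counted once the window has moved past it.

```java
// Illustrative sketch of a bucketed rolling-window counter (not nntop's real code).
// 3 buckets of 20 s cover a 60 s window; a bucket holding a stale period is
// zeroed before it is reused, which drops events older than one full window.
public class RollingWindowSketch {
  static final int BUCKETS = 3;
  static final long BUCKET_MILLIS = 20_000L;
  static final long WINDOW_MILLIS = BUCKETS * BUCKET_MILLIS;

  private final long[] counts = new long[BUCKETS];
  // Start time of the period each bucket currently represents.
  private final long[] bucketStart = new long[BUCKETS];

  /** Record one event occurring at the given timestamp (millis). */
  public void incr(long now) {
    int i = (int) ((now / BUCKET_MILLIS) % BUCKETS);
    long start = (now / BUCKET_MILLIS) * BUCKET_MILLIS;
    if (bucketStart[i] != start) { // bucket still holds an older cycle: reset it
      counts[i] = 0;
      bucketStart[i] = start;
    }
    counts[i]++;
  }

  /** Sum of events recorded within the last full window relative to 'now'. */
  public long sum(long now) {
    long total = 0;
    for (int i = 0; i < BUCKETS; i++) {
      if (now - bucketStart[i] < WINDOW_MILLIS) {
        total += counts[i];
      }
    }
    return total;
  }
}
```

With the example's timestamps, Event1 (00:10:55) and Event2 (00:11:54) land in the same bucket index; Event2's increment zeroes the bucket first, so the sum at 00:12:07 is 1, matching the behavior described.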
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1419#comment-1419 ] Colin Patrick McCabe commented on HDFS-7199: +1 pending jenkins DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7314: -- Attachment: HDFS-7314-2.patch Thanks, [~cmccabe]. I have updated the patch based on your suggestion. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario. But it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the call stack is YARN -> DistributedFileSystem -> DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well.
* DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If NN is available again it can transition back to a healthy state. Comments?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
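Of the options listed in HDFS-7314 above, the second (DistributedFileSystem transparently recreating an aborted DFSClient) can be sketched with stand-in types. The {{Client}} interface, the factory, and the "Filesystem closed" message below are illustrative stand-ins for DFSClient internals, not real Hadoop APIs:

```java
import java.util.function.Supplier;

// Illustrative sketch: a wrapper that detects an aborted/closed client and
// transparently recreates it before retrying once. The Client interface and
// the "Filesystem closed" message are invented stand-ins, not Hadoop classes.
public class RecreateOnAbort {
  public interface Client {
    // Throws IllegalStateException("Filesystem closed") when aborted.
    String getFileInfo(String path);
  }

  private final Supplier<Client> factory;
  private Client current;

  public RecreateOnAbort(Supplier<Client> factory) {
    this.factory = factory;
    this.current = factory.get();
  }

  /** Run the call; if the cached client was aborted, recreate it and retry once. */
  public String getFileInfo(String path) {
    try {
      return current.getFileInfo(path);
    } catch (IllegalStateException e) {
      if (!"Filesystem closed".equals(e.getMessage())) {
        throw e; // unrelated failure: propagate
      }
      current = factory.get(); // replace the aborted instance
      return current.getFileInfo(path);
    }
  }
}
```

The trade-off noted in the comment still applies: every DistributedFileSystem-to-DFSClient call site would need this wrapping, which is why the retry-inside-DFSClient option may be simpler.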
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196716#comment-14196716 ] Plamen Jeliazkov commented on HDFS-3107: I will attach it to HDFS-7056 since it has the design doc attached to it and is assigned to me. Thanks [~cmccabe]. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-3107-HDFS-7056-combined.patch Attaching combined patch here as well. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7343) A comprehensive and flexible storage policy engine
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196759#comment-14196759 ] Andrew Purtell commented on HDFS-7343: -- Most of the ideas mentioned in the description of HDFS-4672 have made it in. Might be worth examining the remainder in the context of this issue. (Or not.) A comprehensive and flexible storage policy engine -- Key: HDFS-7343 URL: https://issues.apache.org/jira/browse/HDFS-7343 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Kai Zheng As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage policy engine considering file attributes, metadata, data temperature, storage type, EC codec, available hardware capabilities, user/application preference and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Status: Patch Available (was: Open) Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7355: Status: Patch Available (was: Open) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7355: Attachment: HDFS-7355.1.patch The attached patch skips the test. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
Chris Nauroth created HDFS-7355: --- Summary: TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Attachment: HDFS-7335.patch Attaching a patch that removes the checkOperation call in FSNamesystem.analyzeFileState. No tests added as this change is trivial. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196734#comment-14196734 ] Chris Nauroth commented on HDFS-7355: - http://technet.microsoft.com/en-us/library/cc783530(v=ws.10).aspx Quoting the relevant section: {quote} Permissions enable the owner of each secured object, such as a file, Active Directory object, or registry key, to control who can perform an operation or a set of operations on the object or object property. Because access to an object is at the owner’s discretion, the type of access control that is used in Windows Server 2003 is called discretionary access control. An owner of an object always has the ability to read and change permissions on the object.{quote} We'll need to skip this test on Windows. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7208) NN doesn't schedule replication when a DN storage fails
[ https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196752#comment-14196752 ] Chris Nauroth commented on HDFS-7208: - The new test cannot work correctly on Windows. See HDFS-7355 for a full explanation and a trivial patch to skip the test on Windows. NN doesn't schedule replication when a DN storage fails --- Key: HDFS-7208 URL: https://issues.apache.org/jira/browse/HDFS-7208 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-7208-2.patch, HDFS-7208-3.patch, HDFS-7208.patch We found the following problem. When a storage device on a DN fails, NN continues to believe replicas of those blocks on that storage are valid and doesn't schedule replication. A DN has 12 storage disks. So there is one blockReport for each storage. When a disk fails, # of blockReport from that DN is reduced from 12 to 11. Given dfs.datanode.failed.volumes.tolerated is configured to be 0, NN still considers that DN healthy. 1. A disk failed. All blocks of that disk are removed from DN dataset. {noformat} 2014-10-04 02:11:12,626 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume /data/disk6/dfs/current {noformat} 2. NN receives DatanodeProtocol.DISK_ERROR. But that isn't enough to have NN remove the DN and the replicas from the BlocksMap. In addition, blockReport doesn't provide the diff given that is done per storage. {noformat} 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(xx.xx.xx.xxx, datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939): DataNode failed volumes:/data/disk6/dfs/current {noformat} 3. 
Run fsck on the file and confirm the NN's BlocksMap still has that replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196781#comment-14196781 ] Hadoop QA commented on HDFS-7335: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679304/HDFS-7335.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8644//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8644//console This message is automatically generated. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. 
First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
Haohui Mai created HDFS-7356: Summary: Use DirectoryListing.hasMore() directly in nfs Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Priority: Minor In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
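The equivalence behind the suggested simplification can be sketched as follows. The {{DirectoryListing}} stub below is a stand-in for the real class in org.apache.hadoop.hdfs.protocol; the only property it assumes is that {{hasMore()}} means {{getRemainingEntries() != 0}}:

```java
// Sketch of the suggested simplification. DirectoryListing here is a stub
// standing in for the HDFS class; it only assumes hasMore() is equivalent
// to getRemainingEntries() != 0.
public class EofSimplification {
  public static class DirectoryListing {
    private final int remaining;
    public DirectoryListing(int remaining) { this.remaining = remaining; }
    public int getRemainingEntries() { return remaining; }
    public boolean hasMore() { return remaining != 0; }
  }

  /** Original form: not eof while entries from this batch remain unconsumed. */
  public static boolean eofOriginal(int n, int length, DirectoryListing dlisting) {
    return (n < length) ? false : (dlisting.getRemainingEntries() == 0);
  }

  /** Equivalent form using hasMore() directly. */
  public static boolean eofSimplified(int n, int length, DirectoryListing dlisting) {
    return n >= length && !dlisting.hasMore();
  }
}
```

The ternary collapses to a single boolean expression: eof holds only when the current batch is exhausted and the listing reports no remaining entries.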
[jira] [Assigned] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned HDFS-7356: --- Assignee: Li Lu Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196806#comment-14196806 ] Rushabh S Shah commented on HDFS-7233: -- All the tests pass on my local setup. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException --- Key: HDFS-7233 URL: https://issues.apache.org/jira/browse/HDFS-7233 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7233.patch Namenode logs the UnresolvedPathException even though that file exists in HDFS. Each time a symlink is accessed the NN will throw an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log and we could have really large NN logs if we don't fix this since every MR job on the cluster will access this symlink and cause a stacktrace to be logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
Tsz Wo Nicholas Sze created HDFS-7357: - Summary: FSNamesystem.checkFileProgress should log file path Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated HDFS-7356: Attachment: HDFS-7356-110414.patch Hi [~wheat9], I've fixed this in my patch. If you have time please feel free to have a look at it. Thanks! Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated HDFS-7356: Status: Patch Available (was: Open) Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196821#comment-14196821 ] Jing Zhao commented on HDFS-7356: - +1 pending Jenkins Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Status: Patch Available (was: Open) FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Attachment: h7357_20141104.patch h7357_20141104.patch: - add path and other info to the log messages in checkFileProgress; - replace FSNamesystem.LOG with LOG; - avoid printing block pool Id; - slightly clean up some other log messages. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196834#comment-14196834 ] Haohui Mai commented on HDFS-7357: -- +1 pending jenkins. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196849#comment-14196849 ] Ming Ma commented on HDFS-7355: --- Thanks, [~cnauroth]. The patch looks good. BTW, it seems some other test cases use {{assumeTrue(!System.getProperty("os.name").startsWith("Windows"));}}. Perhaps this came up before: if we want to make unit tests pass on other non-Linux OSes, should we set up Jenkins builds for that? TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196855#comment-14196855 ] Haohui Mai commented on HDFS-7355: -- +1 pending jenkins. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196891#comment-14196891 ] Konstantin Shvachko commented on HDFS-7056: --- {quote}Actually Guo and I have finished the POC for this several months ago. But we couldn't open source it{quote} Hi Hu. It shows indeed in the design. Too bad you couldn't open source yours. Hope ours is similar. I know at least the getBlocks(snapshotId) method is in common :-) Looking at Jing's comments. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196896#comment-14196896 ] Allen Wittenauer commented on HDFS-7295: FWIW, I *do* think the max lifespan should be configurable. But letting that absolute max time span be up to the user is suicide for security. Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS, such as long-running YARN apps. Users should be allowed to optionally specify a max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
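The tension in the comment above can be sketched as a server-side cap: a user may request any lifetime, but the effective lifetime is clamped to an administrator-configured absolute maximum, which is never under user control. This is a minimal illustrative sketch, not HDFS code; all names are hypothetical:

```java
// Hypothetical sketch of clamping a user-requested delegation-token lifetime
// to a server-side absolute maximum, reflecting the point that the absolute
// cap must stay under administrator control.
public class TokenLifetimePolicy {
    // The current hardcoded default: 7 days, in milliseconds.
    static final long DEFAULT_MAX_LIFETIME_MS = 7L * 24 * 60 * 60 * 1000;

    final long serverMaxLifetimeMs; // configured by the admin, never the user

    TokenLifetimePolicy(long serverMaxLifetimeMs) {
        this.serverMaxLifetimeMs = serverMaxLifetimeMs;
    }

    // A user may request a shorter lifetime, but can never exceed the cap.
    long effectiveLifetime(long requestedMs) {
        return Math.min(requestedMs, serverMaxLifetimeMs);
    }
}
```

Under this model, making the cap itself a configuration knob satisfies long-running apps without handing the absolute maximum to the token requester.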
[jira] [Updated] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7231: --- Target Version/s: 2.6.0 rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Attachment: HDFS-7335.patch New patch removes git diff prefix Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
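The redundancy being removed can be illustrated with a stripped-down version of the call pattern (hypothetical names; this is not the actual FSNamesystem code):

```java
// Hypothetical sketch of the pattern: the operation category is already
// verified at the call sites before analyzeFileState() runs (first as READ,
// then as WRITE), so a third check inside the helper adds nothing.
public class CheckOperationSketch {
    enum OperationCategory { READ, WRITE }

    static int checks = 0;

    // Stand-in for the real HA-state check.
    static void checkOperation(OperationCategory op) {
        checks++;
    }

    static void analyzeFileState() {
        // after the patch: no redundant checkOperation() call in here
    }

    static void caller() {
        checkOperation(OperationCategory.READ);   // first check, READ
        checkOperation(OperationCategory.WRITE);  // second check, WRITE
        analyzeFileState();                       // already guarded
    }
}
```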
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7279: - Attachment: HDFS-7279.004.patch Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch Currently the DN implements all related webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196933#comment-14196933 ] Hadoop QA commented on HDFS-7199: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679281/HDFS-7199-1.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1224 javac compiler warnings (more than the trunk's current 1223 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestLeaseRecovery2 The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//console This message is automatically generated. 
DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception, it closes the output stream but does not set lastException. When the client later calls close on the output stream, it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
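The failure mode described above reduces to a simple pattern: a background streamer that records only certain exception types leaves lastException null when it dies from anything else, so a later close() looks like a harmless redundant call. A minimal sketch under those assumptions (the class, fields, and methods here are hypothetical, not the real DFSOutputStream):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the bug and its fix: record *any* failure before
// marking the stream closed, so close() can surface it to the caller.
public class StreamerSketch {
    final AtomicReference<Throwable> lastException = new AtomicReference<>();
    volatile boolean closed = false;

    // Buggy variant: a RuntimeException closes the stream without
    // recording anything, so lastException stays null.
    void runBuggy(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            // bug: failure swallowed, lastException never set
        } finally {
            closed = true;
        }
    }

    // Fixed variant: record any failure before the stream is marked closed.
    void runFixed(Runnable work) {
        try {
            work.run();
        } catch (Throwable t) {
            lastException.compareAndSet(null, t);
        } finally {
            closed = true;
        }
    }

    // Models close(): true iff an error is surfaced to the caller. With
    // lastException == null, a dead stream is indistinguishable from a
    // cleanly closed one and the data loss goes unreported.
    boolean closeReportsError() {
        return closed && lastException.get() != null;
    }
}
```

In the buggy variant the crash is silently dropped; in the fixed variant close() sees the recorded exception and can rethrow it.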
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196934#comment-14196934 ] Hadoop QA commented on HDFS-7334: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679284/HDFS-7334.002.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8641//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8641//console This message is automatically generated. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196944#comment-14196944 ] Haohui Mai commented on HDFS-7334: -- The test report does not have the failure. I'll commit this patch shortly. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)