[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2017-05-29 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028792#comment-16028792
 ] 

Vinayakumar B commented on HDFS-5042:
-

bq. I think we need to ensure dir sync on hsync() also as client apps may 
consider the data is flushed to disk. What is your view?
I think it's a good point.
I have been trying to verify this issue.
I found that small blocks created and closed before the power failure existed 
nowhere on disk, neither in rbw nor in finalized. Maybe when the block files 
were created in rbw, those directory entries also failed to sync to the device.
Perhaps the first hsync() request on a block file could call fsync on its 
parent (rbw) directory.
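The proposal above — fsync the parent (rbw) directory on the first hsync() — can be sketched outside HDFS. This is a minimal illustration, not DataNode code, using the directory-fsync trick (open the directory read-only and force it, which works on Linux; some platforms disallow it). All paths and names are illustrative:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirSync {
    /**
     * Fsync a directory so that recently created or renamed entries in it
     * survive a power failure. On Linux a directory can be opened read-only
     * and force()d; some platforms (e.g. Windows) disallow this.
     */
    public static void fsyncDirectory(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // flush the directory's metadata to the device
        }
    }

    public static void main(String[] args) throws IOException {
        Path rbw = Files.createTempDirectory("rbw");
        Files.createFile(rbw.resolve("blk_123")); // simulate a new block file
        fsyncDirectory(rbw); // the first hsync() could do this once for the parent
        System.out.println("synced " + rbw);
    }
}
```

Without this, the file's data may be durable while the directory entry pointing at it is not, which matches the "block file nowhere on disk" symptom described above.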

> Completed files lost after power failure
> 
>
> Key: HDFS-5042
> URL: https://issues.apache.org/jira/browse/HDFS-5042
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
>Reporter: Dave Latham
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-5042-01.patch, HDFS-5042-02.patch, 
> HDFS-5042-03.patch, HDFS-5042-04.patch, HDFS-5042-05-branch-2.patch, 
> HDFS-5042-05.patch, HDFS-5042-branch-2-01.patch, HDFS-5042-branch-2-05.patch
>
>
> We suffered a cluster wide power failure after which HDFS lost data that it 
> had acknowledged as closed and complete.
> The client was HBase which compacted a set of HFiles into a new HFile, then 
> after closing the file successfully, deleted the previous versions of the 
> file.  The cluster then lost power, and when brought back up the newly 
> created file was marked CORRUPT.
> Based on reading the logs it looks like the replicas were created by the 
> DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
> closed they were moved to the 'current' directory.  After the power cycle 
> those replicas were again in the blocksBeingWritten directory of the 
> underlying file system (ext3).  When those DataNodes reported in to the 
> NameNode it deleted those replicas and lost the file.
> Some possible fixes could be having the DataNode fsync the directory(s) after 
> moving the block from blocksBeingWritten to current to ensure the rename is 
> durable or having the NameNode accept replicas from blocksBeingWritten under 
> certain circumstances.
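The first suggested fix — making the rename from blocksBeingWritten to current durable — can be sketched as follows. This is a hedged illustration under the assumption of a Linux filesystem (where a directory can be opened read-only and fsync'd), not actual DataNode code; the directory layout and names are made up for the example:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableRename {
    // Fsync a directory so a rename recorded in it survives power loss.
    public static void fsyncDir(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }

    // Move a block file from blocksBeingWritten to current, then sync both
    // parent directories so neither the new entry nor the removal is lost.
    public static Path finalizeBlock(Path blockFile, Path currentDir) throws IOException {
        Path target = currentDir.resolve(blockFile.getFileName());
        Files.move(blockFile, target, StandardCopyOption.ATOMIC_MOVE);
        fsyncDir(currentDir);            // persist the new directory entry
        fsyncDir(blockFile.getParent()); // persist the removal of the old one
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("dn");
        Path bbw = Files.createDirectory(tmp.resolve("blocksBeingWritten"));
        Path cur = Files.createDirectory(tmp.resolve("current"));
        Path blk = Files.createFile(bbw.resolve("blk_42"));
        System.out.println("finalized to " + finalizeBlock(blk, cur));
    }
}
```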
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
> {noformat}
> RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
> NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
> DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
> DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c

[jira] [Updated] (HDFS-11897) Ozone: KSM: Changing log level for client calls in KSM

2017-05-29 Thread Nandakumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandakumar updated HDFS-11897:
--
Priority: Minor  (was: Major)

> Ozone: KSM: Changing log level for client calls in KSM
> --
>
> Key: HDFS-11897
> URL: https://issues.apache.org/jira/browse/HDFS-11897
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Nandakumar
>Assignee: Nandakumar
>Priority: Minor
>
> Whenever there is no Volume/Bucket/Key found in MetadataDB for a client call, 
> KSM logs an ERROR, which is unnecessary. The level of these log messages can be 
> changed to DEBUG, which will still be helpful for debugging.
> Changes are to be made in the following classes
> * VolumeManagerImpl
> * BucketManagerImpl
> * KeyManagerImpl
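Hadoop itself logs through slf4j/log4j, but the effect of demoting a routine "not found" event from ERROR to DEBUG can be shown with a self-contained java.util.logging sketch. The class and message names below are illustrative, not actual KSM code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class KsmLogLevelDemo {
    static final Logger LOG = Logger.getLogger("KSM");

    // Records every published message so we can observe what the level lets through.
    public static class CapturingHandler extends Handler {
        public final List<String> messages = new ArrayList<>();
        @Override public void publish(LogRecord r) {
            messages.add(r.getLevel() + ": " + r.getMessage());
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    public static CapturingHandler run() {
        CapturingHandler h = new CapturingHandler();
        LOG.setUseParentHandlers(false);
        LOG.addHandler(h);
        LOG.setLevel(Level.INFO); // typical production level
        // Before: a routine "not found" lookup logged at error level.
        LOG.severe("Volume not found: vol1");
        // After: the same event at debug (FINE) level is filtered out at INFO,
        // but still available when an operator enables Level.FINE to debug.
        LOG.fine("Volume not found: vol1");
        return h;
    }

    public static void main(String[] args) {
        System.out.println(run().messages);
    }
}
```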



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9016) Display upgrade domain information in fsck

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-9016:

Fix Version/s: 2.9.0

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1, 2.8.2
>
> Attachments: HDFS-9016-2.patch, HDFS-9016-3.patch, HDFS-9016-4.patch, 
> HDFS-9016-4.patch, HDFS-9016-branch-2-2.patch, 
> HDFS-9016.branch-2.8.001.patch, HDFS-9016-branch-2.patch, HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.






[jira] [Updated] (HDFS-9005) Provide configuration support for upgrade domain

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-9005:

Fix Version/s: 2.9.0

> Provide configuration support for upgrade domain
> 
>
> Key: HDFS-9005
> URL: https://issues.apache.org/jira/browse/HDFS-9005
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1, 2.8.2
>
> Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, 
> HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch
>
>
> As part of the upgrade domain feature, we need to provide a mechanism to 
> specify the upgrade domain for each datanode. One way to accomplish that is to 
> allow admins to specify an upgrade domain script that takes a DN IP or hostname 
> as input and returns the upgrade domain. The namenode will then use it at 
> runtime to set {{DatanodeInfo}}'s upgrade domain string. The configuration can 
> be something like:
> {noformat}
> <property>
>   <name>dfs.namenode.upgrade.domain.script.file.name</name>
>   <value>/etc/hadoop/conf/upgrade-domain.sh</value>
> </property>
> {noformat}
> just like the topology script.






[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-9276:

Fix Version/s: 2.9.0

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Fix For: 2.9.0, 3.0.0-alpha1, 2.8.1
>
> Attachments: debug1.PNG, debug2.PNG, HDFS-9276.01.patch, 
> HDFS-9276.02.patch, HDFS-9276.03.patch, HDFS-9276.04.patch, 
> HDFS-9276.05.patch, HDFS-9276.06.patch, HDFS-9276.07.patch, 
> HDFS-9276.08.patch, HDFS-9276.09.patch, HDFS-9276.10.patch, 
> HDFS-9276.11.patch, HDFS-9276.12.patch, HDFS-9276.13.patch, 
> HDFS-9276.14.patch, HDFS-9276.15.patch, HDFS-9276.16.patch, 
> HDFS-9276.17.patch, HDFS-9276.18.patch, HDFS-9276.19.patch, 
> HDFS-9276.20.patch, HDFSReadLoop.scala
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS Client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
>
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
>     val keytab = "/path/to/keytab/xxx.keytab"
>     val principal = "x...@abc.com"
>     val creds1 = new org.apache.hadoop.security.Credentials()
>     val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>     ugi1.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         val fs = FileSystem.get(new Configuration())
>         fs.addDelegationTokens("test", creds1)
>         null
>       }
>     })
>     val ugi = UserGroupInformation.createRemoteUser("test")
>     ugi.addCredentials(creds1)
>     ugi.doAs(new PrivilegedExceptionAction[Void] {
>       // Get a copy of the credentials
>       override def run(): Void = {
>         var i = 0
>         while (true) {
>           val creds1 = new org.apache.hadoop.security.Credentials()
>           val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>           ugi1.doAs(new PrivilegedExceptionAction[Void] {
>             // Get a copy of the credentials
>             override def run(): Void = {
>               val fs = FileSystem.get(new Configuration())
>               fs.addDelegationTokens("test", creds1)
>               null
>             }
>           })
>           UserGroupInformation.getCurrentUser.addCredentials(creds1)
>           val fs = FileSystem.get(new Configuration())
>           i += 1
>           println()
>           println(i)
>           println(fs.listFiles(new Path("/user"), false))
>           Thread.sleep(60 * 1000)
>         }
>         null
>       }
>     })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandl

[jira] [Updated] (HDFS-10683) Make class Token$PrivateToken private

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-10683:
-
Fix Version/s: 2.9.0

> Make class Token$PrivateToken private
> -
>
> Key: HDFS-10683
> URL: https://issues.apache.org/jira/browse/HDFS-10683
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: fs, ha, security, security_token
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.1
>
> Attachments: HDFS-10683.001.patch, HDFS-10683.002.patch
>
>
> Avoid {{instanceof}} or typecasting of {{Token.PrivateToken}} by introducing 
> an interface method in {{Token}}. Make class {{Token.PrivateToken}} private. 
> Use a factory method instead.






[jira] [Updated] (HDFS-9705) Refine the behaviour of getFileChecksum when length = 0

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-9705:

Fix Version/s: 2.9.0

> Refine the behaviour of getFileChecksum when length = 0
> ---
>
> Key: HDFS-9705
> URL: https://issues.apache.org/jira/browse/HDFS-9705
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: SammiChen
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-9705-branch-2.001.patch, 
> HDFS-9705-branch-2.002.patch, HDFS-9705-v1.patch, HDFS-9705-v2.patch, 
> HDFS-9705-v3.patch, HDFS-9705-v4.patch, HDFS-9705-v5.patch, 
> HDFS-9705-v6.patch, HDFS-9705-v7.patch
>
>
> {{FileSystem#getFileChecksum}} may accept {{length}} parameter and 0 is a 
> valid value. Currently it will return {{null}} when length is 0, in the 
> following code block:
> {code}
> //compute file MD5
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> switch (crcType) {
> case CRC32:
>   return new MD5MD5CRC32GzipFileChecksum(bytesPerCRC,
>   crcPerBlock, fileMD5);
> case CRC32C:
>   return new MD5MD5CRC32CastagnoliFileChecksum(bytesPerCRC,
>   crcPerBlock, fileMD5);
> default:
>   // If there is no block allocated for the file,
>   // return one with the magic entry that matches what previous
>   // hdfs versions return.
>   if (locatedblocks.size() == 0) {
> return new MD5MD5CRC32GzipFileChecksum(0, 0, fileMD5);
>   }
>   // we should never get here since the validity was checked
>   // when getCrcType() was called above.
>   return null;
> }
> {code}
> The comment says "we should never get here since the validity was checked", 
> but we do get there. Since we're using the MD5-of-MD5 approach, empty content 
> is actually a valid case whose MD5 value is 
> {{d41d8cd98f00b204e9800998ecf8427e}}, so we suggest returning a reasonable 
> value other than null. At least some useful information, such as values from 
> the block checksum header, could be seen in the returned value.
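The claim about empty content is easy to check: the MD5 digest of zero bytes is the well-known {{d41d8cd98f00b204e9800998ecf8427e}}. A small self-contained check:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EmptyMd5 {
    // Hex-encode the MD5 digest of the given bytes.
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : d) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // MD5 of zero bytes: the digest of empty content.
        System.out.println(md5Hex(new byte[0])); // d41d8cd98f00b204e9800998ecf8427e
    }
}
```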






[jira] [Updated] (HDFS-11177) 'storagepolicies -getStoragePolicy' command should accept URI based path.

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11177:
-
Fix Version/s: 2.9.0

> 'storagepolicies -getStoragePolicy' command should accept URI based path.
> -
>
> Key: HDFS-11177
> URL: https://issues.apache.org/jira/browse/HDFS-11177
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11177.001.patch, HDFS-11177.002.patch, 
> HDFS-11177.003.patch
>
>
> {noformat}
> hdfs storagepolicies -getStoragePolicy -path hdfs://127.0.0.1:8020/t1
> RemoteException: Invalid path name Invalid file name: hdfs://127.0.0.1:8020/t1
> {noformat}






[jira] [Updated] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11163:
-
Fix Version/s: 2.9.0

> Mover should move the file blocks to default storage once policy is unset
> -
>
> Key: HDFS-11163
> URL: https://issues.apache.org/jira/browse/HDFS-11163
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, 
> HDFS-11163-003.patch, HDFS-11163-004.patch, HDFS-11163-005.patch, 
> HDFS-11163-006.patch, HDFS-11163-007.patch, HDFS-11163-branch-2.001.patch, 
> HDFS-11163-branch-2.002.patch, HDFS-11163-branch-2.003.patch, 
> temp-YARN-6278.HDFS-11163.patch
>
>
> HDFS-9534 added a new API in FileSystem to unset the storage policy. Once the 
> policy is unset, blocks should move back to the default storage policy.
> Currently the mover does not move file blocks whose storage policy ID is unspecified:
> {code}
>   // currently we ignore files with unspecified storage policy
>   if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) {
> return;
>   }
> {code}






[jira] [Updated] (HDFS-11466) Change dfs.namenode.write-lock-reporting-threshold-ms default from 1000ms to 5000ms

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11466:
-
Fix Version/s: 2.9.0

> Change dfs.namenode.write-lock-reporting-threshold-ms default from 1000ms to 
> 5000ms
> ---
>
> Key: HDFS-11466
> URL: https://issues.apache.org/jira/browse/HDFS-11466
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11466.001.patch
>
>
> Per discussion on HDFS-10798, it might make sense to change the default value 
> for long write lock holds to 5000ms like the read threshold, to avoid 
> spamming the log.
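For operators who prefer the old behavior (or an even higher threshold), this is an ordinary hdfs-site.xml setting. A sketch of an override; the value and description text here are illustrative:

```xml
<property>
  <name>dfs.namenode.write-lock-reporting-threshold-ms</name>
  <value>5000</value>
  <description>Log a message when the NameNode write lock is held
  longer than this many milliseconds.</description>
</property>
```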






[jira] [Updated] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11395:
-
Fix Version/s: 2.9.0

> RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the 
> Exception thrown from NameNode
> 
>
> Key: HDFS-11395
> URL: https://issues.apache.org/jira/browse/HDFS-11395
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Nandakumar
>Assignee: Nandakumar
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, 
> HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch, 
> HDFS-11395.005.patch
>
>
> When using RequestHedgingProxyProvider, in case of Exception (like 
> FileNotFoundException) from ActiveNameNode, 
> {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} 
> receives {{ExecutionException}} since we use {{CompletionService}} for the 
> call. The ExecutionException is put into a map and wrapped with 
> {{MultiException}}.
> So for a FileNotFoundException the client receives 
> {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}}
> This causes problems for clients that handle RemoteExceptions.






[jira] [Updated] (HDFS-11432) Federation : Support fully qualified path for Quota/Snapshot/cacheadmin/cryptoadmin commands

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11432:
-
Fix Version/s: 2.9.0

> Federation : Support fully qualified path for 
> Quota/Snapshot/cacheadmin/cryptoadmin commands
> 
>
> Key: HDFS-11432
> URL: https://issues.apache.org/jira/browse/HDFS-11432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11432-002.patch, HDFS-11432-003.patch, 
> HDFS-11432-004.patch, HDFS-11432.patch
>
>
> As of now, the client needs to change "fs.defaultFS" to connect to a specific 
> name service when the cluster is federated, which is inconvenient for users.
>  *Snapshot* 
> {noformat}
>  hdfs dfsadmin -allowSnapshot hdfs://hacluster/Dir1/snapdir2
> allowSnapshot: FileSystem viewfs://ClusterX/ is not an HDFS file system
> Usage: hdfs dfsadmin [-allowSnapshot ]
> {noformat}
>  *Quota* 
> {noformat}
> hdfs dfsadmin -setQuota 10 hdfs://hacluster/Bulkload
> setQuota: FileSystem viewfs://ClusterX/ is not an HDFS file system
> Usage: hdfs dfsadmin [-setQuota <quota> <dirname>...<dirname>]
> {noformat}






[jira] [Updated] (HDFS-11558) BPServiceActor thread name is too long

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11558:
-
Fix Version/s: 2.9.0

> BPServiceActor thread name is too long
> --
>
> Key: HDFS-11558
> URL: https://issues.apache.org/jira/browse/HDFS-11558
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11558.000.patch, HDFS-11558.001.patch, 
> HDFS-11558.002.patch, HDFS-11558.003.patch, HDFS-11558.004.patch, 
> HDFS-11558.005.patch, HDFS-11558.006.patch, HDFS-11558-branch-2.006.patch, 
> HDFS-11558-branch-2.8.006.patch
>
>
> Currently, the thread name looks like
> {code}
> 2017-03-20 18:32:22,022 [DataNode: [[[DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data0, [DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data1]]  heartbeating to localhost/127.0.0.1:51772] INFO  ...
> {code}
> which contains the full path for each storage dir.  It is unnecessarily long.






[jira] [Updated] (HDFS-11592) Closing a file has a wasteful preconditions in NameNode

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11592:
-
Fix Version/s: 2.9.0

> Closing a file has a wasteful preconditions in NameNode
> ---
>
> Key: HDFS-11592
> URL: https://issues.apache.org/jira/browse/HDFS-11592
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11592.001.patch
>
>
> When a file is closed, the NN checks whether all the blocks are complete. 
> Instead of a simple {{if (!complete) throw new 
> IllegalStateException(expensive-err-string)}}, it invokes 
> {{Preconditions.checkState(complete, expensive-err-string)}}. The check is 
> done in a loop over all blocks, so more blocks = more penalty. The expensive 
> string should only be computed when an error actually occurs. A telltale sign 
> is seeing this in a stacktrace:
> {noformat}
>at java.lang.Class.getEnclosingMethod0(Native Method)
> at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
> at java.lang.Class.getEnclosingClass(Class.java:1272)
> at java.lang.Class.getSimpleBinaryName(Class.java:1443)
> at java.lang.Class.getSimpleName(Class.java:1309)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246)
> {noformat}
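The cost difference can be illustrated without Guava: contrast a precondition whose message string is built eagerly on every call with one that takes a Supplier and only builds the message on failure. All names below are illustrative:

```java
import java.util.function.Supplier;

public class LazyCheck {
    public static int messageBuilds = 0;

    // Eager: the message argument is evaluated on every call,
    // even when the check passes.
    public static void checkStateEager(boolean ok, String message) {
        if (!ok) throw new IllegalStateException(message);
    }

    // Lazy: the expensive message is only built when the check fails.
    public static void checkStateLazy(boolean ok, Supplier<String> message) {
        if (!ok) throw new IllegalStateException(message.get());
    }

    public static String expensiveMessage() {
        messageBuilds++; // stand-in for costly string assembly
        return "block not complete";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            checkStateEager(true, expensiveMessage()); // pays the cost 1000 times
        }
        int eagerBuilds = messageBuilds;
        messageBuilds = 0;
        for (int i = 0; i < 1000; i++) {
            checkStateLazy(true, LazyCheck::expensiveMessage); // pays nothing
        }
        System.out.println(eagerBuilds + " vs " + messageBuilds); // 1000 vs 0
    }
}
```

The same effect is achievable by guarding the check with a plain if, as the issue suggests; the Supplier form just keeps the one-liner shape.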






[jira] [Updated] (HDFS-11538) Move ClientProtocol HA proxies into hadoop-hdfs-client

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11538:
-
Fix Version/s: 2.9.0

> Move ClientProtocol HA proxies into hadoop-hdfs-client
> --
>
> Key: HDFS-11538
> URL: https://issues.apache.org/jira/browse/HDFS-11538
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11538.001.patch, HDFS-11538.002.patch, 
> HDFS-11538.003.patch, HDFS-11538-branch-2.001.patch
>
>
> Follow-up for HDFS-11431. We should move this missing class over rather than 
> pulling in the whole hadoop-hdfs dependency.






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11609:
-
Fix Version/s: 2.9.0

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, 
> HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, 
> HDFS-11609_v3.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while 
> they are dead, they get decommissioned right away even if there are missing 
> blocks. This behavior was introduced by HDFS-7374.
> The problem starts when those decommissioned nodes are brought back online. 
> The namenode no longer shows missing blocks, which creates a false sense of 
> cluster health. When the decommissioned nodes are removed and reformatted, 
> the block data is permanently lost. The namenode will report missing blocks 
> after the heartbeat recheck interval (e.g. 10 minutes) from the moment the 
> last node is taken down.
> There are multiple issues in the code. As some cause different behaviors in 
> testing vs. production, it took a while to reproduce in a unit test. I will 
> present an analysis and a proposal soon.






[jira] [Updated] (HDFS-11661) GetContentSummary uses excessive amounts of memory

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11661:
-
Fix Version/s: 2.9.0

> GetContentSummary uses excessive amounts of memory
> --
>
> Key: HDFS-11661
> URL: https://issues.apache.org/jira/browse/HDFS-11661
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Nathan Roberts
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap 
> growth.png
>
>
> ContentSummaryComputationContext::nodeIncluded() is being used to keep track 
> of all INodes visited during the current content summary calculation. This 
> can be all of the INodes in the filesystem, making for a VERY large hash 
> table. This simply won't work on large filesystems.
> We noticed this after an upgrade: a namenode with ~100 million filesystem 
> objects was spending significantly more time in GC. Fortunately this system 
> had some memory breathing room; other clusters we have will not run with this 
> additional demand on memory.
> This was added as part of HDFS-10797 as a way of keeping track of INodes that 
> have already been accounted for, to avoid double counting.






[jira] [Updated] (HDFS-11648) Lazy construct the IIP pathname

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11648:
-
Fix Version/s: 2.9.0

> Lazy construct the IIP pathname 
> 
>
> Key: HDFS-11648
> URL: https://issues.apache.org/jira/browse/HDFS-11648
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11648.patch
>
>
> The IIP pathname is a string constructed from the byte[][] components.  If 
> the pathname will never be accessed, e.g. when processing listStatus children, 
> building the path is unnecessarily expensive.
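The lazy construction being proposed can be sketched roughly as follows. `LazyPath` and `pathname()` are illustrative names, not the actual INodesInPath API:

```java
import java.nio.charset.StandardCharsets;

// Sketch of lazy pathname construction: keep the byte[][] components and
// only build the joined "/a/b/c" string on first access, then cache it.
public class LazyPath {
  private final byte[][] components;
  private String pathname; // built on demand, then cached

  LazyPath(byte[][] components) { this.components = components; }

  String pathname() {
    if (pathname == null) {              // construct at most once
      StringBuilder sb = new StringBuilder();
      for (byte[] c : components) {
        sb.append('/').append(new String(c, StandardCharsets.UTF_8));
      }
      pathname = sb.length() == 0 ? "/" : sb.toString();
    }
    return pathname;
  }

  public static void main(String[] args) {
    LazyPath p = new LazyPath(new byte[][] {
        "user".getBytes(StandardCharsets.UTF_8),
        "data".getBytes(StandardCharsets.UTF_8)});
    System.out.println(p.pathname()); // prints /user/data
  }
}
```

Callers that never touch the pathname (e.g. listStatus children) then pay only for storing the component references.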






[jira] [Updated] (HDFS-11702) Remove indefinite caching of key provider uri in DFSClient

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11702:
-
Fix Version/s: 2.9.0

> Remove indefinite caching of key provider uri in DFSClient
> --
>
> Key: HDFS-11702
> URL: https://issues.apache.org/jira/browse/HDFS-11702
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11702.patch
>
>
> There is an indefinite caching of the key provider URI in DFSClient.
> The relevant piece of code:
> {code:title=DFSClient.java|borderStyle=solid}
>   /**
>* The key provider uri is searched in the following order.
>* 1. If there is a mapping in Credential's secrets map for namenode uri.
>* 2. From namenode getServerDefaults rpc.
>* 3. Finally fallback to local conf.
>* @return keyProviderUri if found from either of above 3 cases,
>* null otherwise
>* @throws IOException
>*/
>   URI getKeyProviderUri() throws IOException {
> if (keyProviderUri != null) {
>   return keyProviderUri;
> }
> // Lookup the secret in credentials object for namenodeuri.
> Credentials credentials = ugi.getCredentials();
>...
>...
> {code}
> Once the key provider URI is set, the client won't refresh the value even if 
> the key provider URI on the namenode is changed.
> For long-running clients, such as those on Oozie servers, this means we have 
> to bounce all the Oozie servers to get the change reflected.
> After this change, the client will cache the value for an hour, after which it 
> will issue a getServerDefaults call and refresh the key provider URI.
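The hour-long caching behavior described above can be sketched as a small TTL cache. The one-hour figure comes from the description; the class and method names are illustrative, not the real DFSClient code:

```java
import java.net.URI;
import java.util.function.Supplier;

// Sketch of time-bounded caching: hold the key provider URI and re-resolve
// it from the NameNode once a fixed interval has elapsed.
public class TtlCachedUri {
  private final long ttlMillis;
  private final Supplier<URI> fetcher; // stands in for getServerDefaults()
  private URI cached;
  private long fetchedAt = Long.MIN_VALUE;

  TtlCachedUri(long ttlMillis, Supplier<URI> fetcher) {
    this.ttlMillis = ttlMillis;
    this.fetcher = fetcher;
  }

  static boolean isStale(long fetchedAt, long ttlMillis, long now) {
    return now - fetchedAt >= ttlMillis;
  }

  synchronized URI get(long now) {
    if (cached == null || isStale(fetchedAt, ttlMillis, now)) {
      cached = fetcher.get();  // re-resolve instead of caching forever
      fetchedAt = now;
    }
    return cached;
  }

  public static void main(String[] args) {
    TtlCachedUri c = new TtlCachedUri(3_600_000L, // 1 hour
        () -> URI.create("kms://http@kms.example.com:9600/kms"));
    System.out.println(c.get(System.currentTimeMillis()));
  }
}
```

With this shape, a key provider change on the namenode is picked up within one TTL period instead of requiring a client restart.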






[jira] [Updated] (HDFS-11714) Newly added NN storage directory won't get initialized and cause space exhaustion

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11714:
-
Fix Version/s: 2.9.0

> Newly added NN storage directory won't get initialized and cause space 
> exhaustion
> -
>
> Key: HDFS-11714
> URL: https://issues.apache.org/jira/browse/HDFS-11714
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11714.trunk.patch, HDFS-11714.v2.branch-2.patch, 
> HDFS-11714.v2.trunk.patch, HDFS-11714.v3.branch-2.patch, 
> HDFS-11714.v3.trunk.patch
>
>
> When an empty namenode storage directory is detected on normal NN startup, it 
> may not be fully initialized. The new directory is still part of the 
> "in-service" NNStorage, and when a checkpoint image is uploaded, a copy will 
> also be written there.  However, the retention manager won't be able to purge 
> old files there, since the directory is lacking a VERSION file.  This causes 
> fsimages to pile up in the directory.  With a big namespace, the disk will be 
> filled within days or weeks.
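The failure mode hinges on the missing VERSION file. A hypothetical guard (not the actual NNStorage code) would check for it before treating a storage directory as fully initialized:

```java
import java.io.File;

// Illustrative check: a storage directory without current/VERSION is not
// fully formatted, so retention logic that keys off VERSION cannot purge
// old fsimages there. Hypothetical helper for illustration only.
public class StorageDirCheck {
  static boolean isFormatted(File storageDir) {
    return new File(new File(storageDir, "current"), "VERSION").isFile();
  }

  public static void main(String[] args) throws Exception {
    File dir = java.nio.file.Files.createTempDirectory("nnstorage").toFile();
    System.out.println(isFormatted(dir)); // fresh empty dir: prints false
  }
}
```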






[jira] [Updated] (HDFS-11787) After HDFS-11515, -du still throws ConcurrentModificationException

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11787:
-
Fix Version/s: 2.9.0

> After HDFS-11515, -du still throws ConcurrentModificationException
> --
>
> Key: HDFS-11787
> URL: https://issues.apache.org/jira/browse/HDFS-11787
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots, tools
>Affects Versions: 3.0.0-alpha4, 2.8.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.1
>
>
> I ran a modified NameNode that was patched against HDFS-11515 on a production 
> cluster fsimage, and I am still seeing ConcurrentModificationException.
> It seems that there are corner cases not covered by HDFS-11515. Filing this 
> jira to discuss how to proceed.






[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11817:
-
Fix Version/s: 2.9.0

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.8.patch, HDFS-11817.v2.branch-2.patch, 
> HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write, 
> {{commitBlockSynchronization()}} will fail if none of the new targets has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> block report until this point. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes the LeaseManager to simply 
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed. 






[jira] [Commented] (HDFS-11597) Ozone: Add Ratis management API

2017-05-29 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028737#comment-16028737
 ] 

Anu Engineer commented on HDFS-11597:
-

[~szetszwo] Thanks for the patch. Just to make sure that I understand the flow 
correctly, I wanted to describe the high-level flow and then discuss this 
specific patch. Please correct me if my understanding is flawed.

# I am presuming that we have two separate states: one is the SCM side, which 
returns the pipeline, and RatisManagerImpl takes care of forming a Ratis cluster 
using that information. At some point, this info will also be written to the 
local Ratis of SCM.
# Then this information is used to create a Ratis cluster dynamically; that is 
not part of this patch, which deals only with the SCM side.
# When we update members we are only updating the SCM view, so we don't need to 
make a Ratis change. But at some point, if we are changing the membership of a 
Ratis cluster, this will be more involved: it will take a remove-datanode 
followed by an add-datanode, so that the Ratis logs can catch up.

Just want to make sure that my understanding is correct.




> Ozone: Add Ratis management API
> ---
>
> Key: HDFS-11597
> URL: https://issues.apache.org/jira/browse/HDFS-11597
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-11597-HDFS-7240.20170522.patch, 
> HDFS-11597-HDFS-7240.20170523.patch, HDFS-11597-HDFS-7240.20170524.patch, 
> HDFS-11597-HDFS-7240.20170528b.patch, HDFS-11597-HDFS-7240.20170528.patch, 
> HDFS-11597-HDFS-7240.20170529.patch
>
>
> We need APIs to manage Ratis clusters for the following operations:
> - create cluster;
> - close cluster;
> - get members; and
> - update members.
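The four operations above can be sketched as a tiny in-memory manager. The names and signatures here are illustrative guesses, not the API in the attached patches:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory stand-in for the Ratis cluster management operations:
// create cluster, close cluster, get members, update members.
public class InMemoryRatisManager {
  private final Map<String, List<String>> clusters = new HashMap<>();

  void createCluster(String id, List<String> members) {
    clusters.put(id, new ArrayList<>(members));
  }

  void closeCluster(String id) {
    clusters.remove(id);
  }

  List<String> getMembers(String id) {
    return clusters.getOrDefault(id, Collections.emptyList());
  }

  void updateMembers(String id, List<String> members) {
    clusters.put(id, new ArrayList<>(members)); // replaces the SCM view only
  }

  public static void main(String[] args) {
    InMemoryRatisManager m = new InMemoryRatisManager();
    m.createCluster("c1", Arrays.asList("dn1", "dn2", "dn3"));
    System.out.println(m.getMembers("c1")); // prints [dn1, dn2, dn3]
  }
}
```

In the real system, `updateMembers` would only change the SCM-side view; reconfiguring the Ratis cluster itself is a separate, more involved step, as discussed in the comment above.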






[jira] [Commented] (HDFS-11799) Introduce a config to allow setting up write pipeline with fewer nodes than replication factor

2017-05-29 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028729#comment-16028729
 ] 

Brahma Reddy Battula commented on HDFS-11799:
-

[~yzhangal] Could you please review the patch?

> Introduce a config to allow setting up write pipeline with fewer nodes than 
> replication factor
> --
>
> Key: HDFS-11799
> URL: https://issues.apache.org/jira/browse/HDFS-11799
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
> Attachments: HDFS-11799.patch
>
>
> During pipeline recovery, if not enough DNs can be found and 
> dfs.client.block.write.replace-datanode-on-failure.best-effort
> is enabled, we let the pipeline continue, even if there is a single DN.
> Similarly, when we create the write pipeline initially, if for some reason we 
> can't find enough DNs, we could have a similar config to enable writing with a 
> single DN.
> More study will be done.
>  






[jira] [Comment Edited] (HDFS-11900) initThreadsNumForHedgedReads does not synchronize access to the static pool

2017-05-29 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028594#comment-16028594
 ] 

John Zhuge edited comment on HDFS-11900 at 5/30/17 1:30 AM:


There were related discussions in the original HDFS-5776. The rationale behind 
the static pool was to avoid creating too many threads when there are many 
DFSClients.
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13879476&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13879476
*  
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13880280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13880280
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13882606&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882606

So a per-client thread pool is not desirable, and the shared thread pool does 
manage the resource more efficiently. However, I still do not like the "static" 
part; dependency injection (DI) may be better.



was (Author: jzhuge):
There were related discussions in the original HDFS-5776. The rationale behind 
the static pool was that too many threads with many DFSClients if pool size is 
big.
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13879476&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13879476
*  
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13880280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13880280
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13882606&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882606

So per-client thread pool is desirable. And shared thread pool does manage the 
resource more efficiently. However still do not like the "static" part; DI may 
be better.


> initThreadsNumForHedgedReads does not synchronize access to the static pool
> ---
>
> Key: HDFS-11900
> URL: https://issues.apache.org/jira/browse/HDFS-11900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>
> *Non-static* synchronized method initThreadsNumForHedgedReads can't 
> synchronize the access to the *static* class variable HEDGED_READ_THREAD_POOL.
> {code}
>   private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
> ...
>   private synchronized void initThreadsNumForHedgedReads(int num) {
> {code}
> 2 DFS clients may update the same static variable in a race because the lock 
> is on each DFS client object, not on the shared DFSClient class object.
> There are 2 possible fixes:
> 1. "Global thread pool": Change initThreadsNumForHedgedReads to static
> 2. "Per-client thread pool": Change HEDGED_READ_THREAD_POOL to non-static
> From the description for property {{dfs.client.hedged.read.threadpool.size}}:
> {quote}
> to a positive number. The threadpool size is how many threads to dedicate
> to the running of these 'hedged', concurrent reads in your client.
> {quote}
> it seems to indicate the thread pool is per DFS client.
> Let's assume we go with #1 "Global thread pool". One DFS client has the 
> property set to 10 in its config, while the other client has the property set 
> to 5 in its config. What is supposed to be the size of the global thread pool? 
> 5? 10? Or 15?
> The 2nd fix seems more reasonable to me.
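The locking bug above, and the class-level-lock variant of fix #1, can be modeled in a simplified stand-in (not the real DFSClient code):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Simplified sketch: a *static* pool must be guarded by a *class-level*
// lock. A static synchronized method locks on HedgedReadPool.class, so two
// client instances can no longer race on the shared field the way an
// instance-synchronized method allows.
public class HedgedReadPool {
  private static ThreadPoolExecutor pool; // shared across all clients

  static synchronized ThreadPoolExecutor getOrInit(int num) {
    if (pool == null) {
      pool = new ThreadPoolExecutor(1, num, 60, TimeUnit.SECONDS,
          new SynchronousQueue<>());
    }
    return pool;
  }

  public static void main(String[] args) {
    // First caller's size wins; a later caller's config is ignored --
    // which illustrates the "5? 10? Or 15?" ambiguity raised above.
    System.out.println(getOrInit(10) == getOrInit(5)); // prints true
  }
}
```

Note that even with correct locking, the global-pool option still leaves the sizing question open (here, first-caller-wins), which is part of why the per-client pool reads as the more reasonable fix.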






[jira] [Updated] (HDFS-11900) initThreadsNumForHedgedReads does not synchronize access to the static pool

2017-05-29 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-11900:
--
Summary: initThreadsNumForHedgedReads does not synchronize access to the 
static pool  (was: HEDGED_READ_THREAD_POOL should not be static)

> initThreadsNumForHedgedReads does not synchronize access to the static pool
> ---
>
> Key: HDFS-11900
> URL: https://issues.apache.org/jira/browse/HDFS-11900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>
> *Non-static* synchronized method initThreadsNumForHedgedReads can't 
> synchronize the access to the *static* class variable HEDGED_READ_THREAD_POOL.
> {code}
>   private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
> ...
>   private synchronized void initThreadsNumForHedgedReads(int num) {
> {code}
> 2 DFS clients may update the same static variable in a race because the lock 
> is on each DFS client object, not on the shared DFSClient class object.
> There are 2 possible fixes:
> 1. "Global thread pool": Change initThreadsNumForHedgedReads to static
> 2. "Per-client thread pool": Change HEDGED_READ_THREAD_POOL to non-static
> From the description for property {{dfs.client.hedged.read.threadpool.size}}:
> {quote}
> to a positive number. The threadpool size is how many threads to dedicate
> to the running of these 'hedged', concurrent reads in your client.
> {quote}
> it seems to indicate the thread pool is per DFS client.
> Let's assume we go with #1 "Global thread pool". One DFS client has the 
> property set to 10 in its config, while the other client has the property set 
> to 5 in its config. What is supposed to be the size of the global thread pool? 
> 5? 10? Or 15?
> The 2nd fix seems more reasonable to me.






[jira] [Commented] (HDFS-11900) HEDGED_READ_THREAD_POOL should not be static

2017-05-29 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028594#comment-16028594
 ] 

John Zhuge commented on HDFS-11900:
---

There were related discussions in the original HDFS-5776. The rationale behind 
the static pool was to avoid creating too many threads when there are many 
DFSClients, if the pool size is big.
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13879476&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13879476
*  
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13880280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13880280
* 
https://issues.apache.org/jira/browse/HDFS-5776?focusedCommentId=13882606&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882606

So a per-client thread pool is desirable. And a shared thread pool does manage 
the resource more efficiently. However, I still do not like the "static" part; 
DI may be better.


> HEDGED_READ_THREAD_POOL should not be static
> 
>
> Key: HDFS-11900
> URL: https://issues.apache.org/jira/browse/HDFS-11900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>
> *Non-static* synchronized method initThreadsNumForHedgedReads can't 
> synchronize the access to the *static* class variable HEDGED_READ_THREAD_POOL.
> {code}
>   private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
> ...
>   private synchronized void initThreadsNumForHedgedReads(int num) {
> {code}
> 2 DFS clients may update the same static variable in a race because the lock 
> is on each DFS client object, not on the shared DFSClient class object.
> There are 2 possible fixes:
> 1. "Global thread pool": Change initThreadsNumForHedgedReads to static
> 2. "Per-client thread pool": Change HEDGED_READ_THREAD_POOL to non-static
> From the description for property {{dfs.client.hedged.read.threadpool.size}}:
> {quote}
> to a positive number. The threadpool size is how many threads to dedicate
> to the running of these 'hedged', concurrent reads in your client.
> {quote}
> it seems to indicate the thread pool is per DFS client.
> Let's assume we go with #1 "Global thread pool". One DFS client has the 
> property set to 10 in its config, while the other client has the property set 
> to 5 in its config. What is supposed to be the size of the global thread pool? 
> 5? 10? Or 15?
> The 2nd fix seems more reasonable to me.






[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2017-05-29 Thread Kanaka Kumar Avvaru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028431#comment-16028431
 ] 

Kanaka Kumar Avvaru commented on HDFS-5042:
---

Thanks for the patch [~vinayrpet]. I think we need to ensure a directory sync on 
*hsync()* as well, since client apps may assume the data has been flushed to 
disk. What is your view?
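The parent-directory sync being discussed can be sketched in plain Java NIO. This is a minimal illustration assuming a Linux filesystem; Hadoop's actual DataNode code goes through NativeIO rather than this approach:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: after creating or renaming a block file, open the parent
// directory and force() it so the directory entry itself reaches disk and
// survives a power failure. Opening a directory for read and calling
// force(true) works on Linux; portability is not guaranteed.
public class DirSync {
  static void fsyncDirectory(Path dir) throws IOException {
    try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
      ch.force(true); // flush the directory metadata (the new entry)
    }
  }

  public static void main(String[] args) throws IOException {
    Path rbw = Files.createTempDirectory("rbw");
    Files.createFile(rbw.resolve("blk_123")); // hypothetical block file
    fsyncDirectory(rbw); // make the creation durable, as proposed above
    System.out.println("synced " + rbw);
  }
}
```

Calling this once on the first hsync() of a block file, as suggested, would persist the rbw directory entry without paying the directory-sync cost on every flush.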

> Completed files lost after power failure
> 
>
> Key: HDFS-5042
> URL: https://issues.apache.org/jira/browse/HDFS-5042
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
>Reporter: Dave Latham
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-5042-01.patch, HDFS-5042-02.patch, 
> HDFS-5042-03.patch, HDFS-5042-04.patch, HDFS-5042-05-branch-2.patch, 
> HDFS-5042-05.patch, HDFS-5042-branch-2-01.patch, HDFS-5042-branch-2-05.patch
>
>
> We suffered a cluster wide power failure after which HDFS lost data that it 
> had acknowledged as closed and complete.
> The client was HBase which compacted a set of HFiles into a new HFile, then 
> after closing the file successfully, deleted the previous versions of the 
> file.  The cluster then lost power, and when brought back up the newly 
> created file was marked CORRUPT.
> Based on reading the logs it looks like the replicas were created by the 
> DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
> closed they were moved to the 'current' directory.  After the power cycle 
> those replicas were again in the blocksBeingWritten directory of the 
> underlying file system (ext3).  When those DataNodes reported in to the 
> NameNode it deleted those replicas and lost the file.
> Some possible fixes could be having the DataNode fsync the directory(s) after 
> moving the block from blocksBeingWritten to current to ensure the rename is 
> durable or having the NameNode accept replicas from blocksBeingWritten under 
> certain circumstances.
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
> {noformat}
> RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
> Creating 
> file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  with permission=rwxrwxrwx
> NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
>  blk_1395839728632046111_357084589
> DN 2013-06-29 11:16:06,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
> blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
> /10.0.5.237:50010
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
> blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
> blk_1395839728632046111_357084589 terminating
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
> lease on  file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  from client DFSClient_hb_rs_hs745,60020,1372470111932
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.completeFile: file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Renaming compacted file at 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  to 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Completed major compaction of 7 file(s) in n of 
> users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
> 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> ---  CRASH, RESTART -
> NN 2013-06-29 12:01:1

[jira] [Commented] (HDFS-11583) Parent spans not initialized to NullScope for every DFSPacket

2017-05-29 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028408#comment-16028408
 ] 

Masatake Iwasaki commented on HDFS-11583:
-

I need +1 from another committer to commit this, pinging [~cmccabe] and 
[~stack] for review.

> Parent spans not initialized to NullScope for every DFSPacket
> -
>
> Key: HDFS-11583
> URL: https://issues.apache.org/jira/browse/HDFS-11583
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tracing
>Affects Versions: 2.7.1
>Reporter: Karan Mehta
>Assignee: Masatake Iwasaki
> Attachments: HDFS-11583-branch-2.7.001.patch, 
> HDFS-11583-branch-2.7.002.patch, HDFS-11583-branch-2.7.003.patch
>
>
> The issue was found while working with PHOENIX-3752.
> Each packet received by the {{run()}} method of {{DataStreamer}} class, uses 
> the {{parents}} field of the {{DFSPacket}} to create a new {{dataStreamer}} 
> span, which in turn creates a {{writeTo}} span as its child span. The parents 
> field is initialized when the packet is added to the {{dataQueue}} and the 
> value is initialized from the {{ThreadLocal}}. This is how HTrace handles 
> spans. 
> A {{TraceScope}} is created and initialized to {{NullScope}} before the loop 
> which runs till the point when the stream is closed. 
> Consider the following scenario, where the {{dataQueue}} contains multiple 
> packets, only the first of which has tracing enabled. The scope is 
> initialized to the {{dataStreamer}} scope and a {{writeTo}} span is created 
> as its child, which gets closed once the packet is sent out to a remote 
> datanode. Before {{writeTo}} span is started, the {{dataStreamer}} scope is 
> detached. So calling the close method on it doesn't do anything at the end of 
> loop. 
> The second iteration will be using the stale value of the {{scope}} variable 
> with a DFSPacket on which tracing is not enabled. This results in the 
> generation of orphan {{writeTo}} spans, which are delivered to the 
> {{SpanReceiver}} registered in the trace framework. This may result in an 
> unlimited number of spans being generated and sent out to the receiver. 
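The effect of the stale scope can be modeled with a toy loop. Boolean flags stand in for real HTrace scopes here, purely for illustration:

```java
// Toy model of the bug described above: if the per-iteration scope is not
// reset to "null scope" at the top of the loop, an untraced packet reuses
// the previous packet's scope and emits orphan spans.
public class ScopeResetDemo {
  // Counts how many packets end up "traced" for a given reset policy.
  static int tracedCount(boolean[] packetHasTrace, boolean resetEachIteration) {
    int traced = 0;
    boolean scopeActive = false;     // starts as NullScope
    for (boolean hasTrace : packetHasTrace) {
      if (resetEachIteration) {
        scopeActive = false;         // the fix: re-init for every packet
      }
      if (hasTrace) {
        scopeActive = true;          // dataStreamer span created
      }
      if (scopeActive) {
        traced++;                    // writeTo child span emitted
      }
    }
    return traced;
  }

  public static void main(String[] args) {
    boolean[] packets = {true, false, false}; // only first packet is traced
    System.out.println(tracedCount(packets, false)); // buggy: prints 3
    System.out.println(tracedCount(packets, true));  // fixed: prints 1
  }
}
```

Without the reset, every packet after the first traced one produces a span, which is the orphan-span growth described above.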






[jira] [Comment Edited] (HDFS-11779) Ozone: KSM: add listBuckets

2017-05-29 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028395#comment-16028395
 ] 

Yiqun Lin edited comment on HDFS-11779 at 5/29/17 2:17 PM:
---

Thanks [~cheersyang] for the updated patch. Just caught one nit in the latest patch:
{code}
+  int maxNumOfKeys = args.getMaxKeys();
+  if (maxNumOfKeys < 0 ||
+  maxNumOfKeys > OzoneConsts.MAX_LISTBUCKETS_SIZE) {
+throw new IllegalArgumentException(
+String.format("Illegal max number of keys specified,"
++ " the value must be in range (0, %d), actual : %d.",
+OzoneConsts.MAX_LISTBUCKETS_SIZE, maxNumOfKeys));
+  }
{code}
The {{maxNumOfKeys < 0}} should be {{maxNumOfKeys <= 0}}; the variable 
{{maxNumOfKeys}} should be a positive number.


was (Author: linyiqun):
Thanks [~cheersyang] for updated patch. Just catch one nit in latest patch:
{code}
+  int maxNumOfKeys = args.getMaxKeys();
+  if (maxNumOfKeys < 0 ||
+  maxNumOfKeys > OzoneConsts.MAX_LISTBUCKETS_SIZE) {
+throw new IllegalArgumentException(
+String.format("Illegal max number of keys specified,"
++ " the value must be in range (0, %d), actual : %d.",
+OzoneConsts.MAX_LISTBUCKETS_SIZE, maxNumOfKeys));
+  }
{code}
The {{maxNumOfKeys < 0}} should be {{maxNumOfKeys <=0 }}, var {{maxNumOfKeys}} 
should be a positive number.

> Ozone: KSM: add listBuckets
> ---
>
> Key: HDFS-11779
> URL: https://issues.apache.org/jira/browse/HDFS-11779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
> Attachments: HDFS-11779-HDFS-7240.001.patch, 
> HDFS-11779-HDFS-7240.002.patch, HDFS-11779-HDFS-7240.003.patch, 
> HDFS-11779-HDFS-7240.004.patch, HDFS-11779-HDFS-7240.005.patch, 
> HDFS-11779-HDFS-7240.006.patch, HDFS-11779-HDFS-7240.007.patch
>
>
> Lists buckets of a given volume. Similar to listVolumes, paging supported via 
> prevKey, prefix and maxKeys.






[jira] [Commented] (HDFS-11779) Ozone: KSM: add listBuckets

2017-05-29 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028395#comment-16028395
 ] 

Yiqun Lin commented on HDFS-11779:
--

Thanks [~cheersyang] for the updated patch. Just caught one nit in the latest patch:
{code}
+  int maxNumOfKeys = args.getMaxKeys();
+  if (maxNumOfKeys < 0 ||
+  maxNumOfKeys > OzoneConsts.MAX_LISTBUCKETS_SIZE) {
+throw new IllegalArgumentException(
+String.format("Illegal max number of keys specified,"
++ " the value must be in range (0, %d), actual : %d.",
+OzoneConsts.MAX_LISTBUCKETS_SIZE, maxNumOfKeys));
+  }
{code}
The {{maxNumOfKeys < 0}} should be {{maxNumOfKeys <= 0}}; the variable 
{{maxNumOfKeys}} should be a positive number.
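The corrected bound check would look roughly like the following. `MAX_LISTBUCKETS_SIZE` is modeled as an assumed constant here, not the value from the actual OzoneConsts:

```java
// Sketch of the validation with the review nit applied: maxNumOfKeys must
// be strictly positive (<= 0 rejected, not just < 0) and no larger than
// the listing cap.
public class ListArgsCheck {
  static final int MAX_LISTBUCKETS_SIZE = 1024; // assumed value

  static void validateMaxKeys(int maxNumOfKeys) {
    if (maxNumOfKeys <= 0 || maxNumOfKeys > MAX_LISTBUCKETS_SIZE) {
      throw new IllegalArgumentException(String.format(
          "Illegal max number of keys specified,"
              + " the value must be in range (0, %d], actual : %d.",
          MAX_LISTBUCKETS_SIZE, maxNumOfKeys));
    }
  }

  public static void main(String[] args) {
    validateMaxKeys(100);      // accepted
    try {
      validateMaxKeys(0);      // now rejected, per the review comment
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```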

> Ozone: KSM: add listBuckets
> ---
>
> Key: HDFS-11779
> URL: https://issues.apache.org/jira/browse/HDFS-11779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
> Attachments: HDFS-11779-HDFS-7240.001.patch, 
> HDFS-11779-HDFS-7240.002.patch, HDFS-11779-HDFS-7240.003.patch, 
> HDFS-11779-HDFS-7240.004.patch, HDFS-11779-HDFS-7240.005.patch, 
> HDFS-11779-HDFS-7240.006.patch, HDFS-11779-HDFS-7240.007.patch
>
>
> Lists buckets of a given volume. Similar to listVolumes, paging supported via 
> prevKey, prefix and maxKeys.






[jira] [Comment Edited] (HDFS-11896) Non-dfsUsed will be doubled on dead node re-registration in branch-2.7.

2017-05-29 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027775#comment-16027775
 ] 

Brahma Reddy Battula edited comment on HDFS-11896 at 5/29/17 10:50 AM:
---

bq. Can we do just that in this jira for 2.7. Will it fix NonDfsUsed reporting 
issue?

Yes, we can backport HDFS-9034.
bq. I think your patch is useful in all versions, should not be restricted to 
2.7 only, especially the test case. Why don't you target this change for trunk, 
etc.

Yes, it can be committed to all versions, but the issue will not occur in other 
versions because non-DFS usage is reset during registration, before the stats 
are added back, and *the test will not fail without the source changes.* 

IMO, we can merge the attached patch to all versions, which will fix 
{{nondfsused}}, and we can raise a separate issue to backport HDFS-9034.


was (Author: brahmareddy):
bq.Can we do just that in this jira for 2.7. Will it fix NonDfsUsed reporting 
issue?

Yes,we can backport HDFS-9034.

bq. I think your patch is useful in all versions, should not be restricted to 
2.7 only, especially the test case. Why don't you target this change for trunk, 
etc.

Yes, it can be committed to all versions. But the issue will not occur in other 
versions because non-DFS usage is reset during registration before the stats 
are added back, and *the test will not fail without the source changes.* 



> Non-dfsUsed will be doubled on dead node re-registration in branch-2.7.
> ---
>
> Key: HDFS-11896
> URL: https://issues.apache.org/jira/browse/HDFS-11896
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>  Labels: release-blocker
> Attachments: HDFS-11896-branch-2.7-001.patch
>
>
>  *Scenario:* 
> i) Make sure you have non-DFS data.
> ii) Stop the Datanode.
> iii) Wait until it becomes dead.
> iv) Restart it and check the non-DFS data.






[jira] [Commented] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028179#comment-16028179
 ] 

Hadoop QA commented on HDFS-11899:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 203 unchanged - 2 fixed = 203 total (was 205) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 19s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}117m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks 
|
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11899 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870282/HDFS-11899.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 69de858d0fca 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6c6a7a5 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19671/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19671/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19671/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: 

[jira] [Commented] (HDFS-11898) DFSClient#isHedgedReadsEnabled() should be per client flag

2017-05-29 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028175#comment-16028175
 ] 

Vinayakumar B commented on HDFS-11898:
--

bq. Vinayakumar B While the patch 01 LGTM, there seems to be some fundamental 
issue described in HDFS-11900 I just filed. Could you please take a look?
Yes, that seems a reasonable analysis.
Maybe if we go with #2, the non-static threadpool, the current issue would be 
solved automatically. Do you want to upload a patch for that, or convert the 
current jira itself to solve HDFS-11900?


> DFSClient#isHedgedReadsEnabled() should be per client flag 
> ---
>
> Key: HDFS-11898
> URL: https://issues.apache.org/jira/browse/HDFS-11898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-11898-01.patch
>
>
> DFSClient#isHedgedReadsEnabled() returns a value based on the static 
> {{HEDGED_READ_THREAD_POOL}}. 
> Hence, if any client initialized it in the JVM, all remaining clients' reads 
> will go through hedged reads.
> This flag should be a per-client value.






[jira] [Commented] (HDFS-11898) DFSClient#isHedgedReadsEnabled() should be per client flag

2017-05-29 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028173#comment-16028173
 ] 

John Zhuge commented on HDFS-11898:
---

[~vinayrpet] While the patch 01 LGTM, there seems to be some fundamental issue 
described in HDFS-11900 I just filed. Could you please take a look?

> DFSClient#isHedgedReadsEnabled() should be per client flag 
> ---
>
> Key: HDFS-11898
> URL: https://issues.apache.org/jira/browse/HDFS-11898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-11898-01.patch
>
>
> DFSClient#isHedgedReadsEnabled() returns a value based on the static 
> {{HEDGED_READ_THREAD_POOL}}. 
> Hence, if any client initialized it in the JVM, all remaining clients' reads 
> will go through hedged reads.
> This flag should be a per-client value.






[jira] [Created] (HDFS-11900) HEDGED_READ_THREAD_POOL should not be static

2017-05-29 Thread John Zhuge (JIRA)
John Zhuge created HDFS-11900:
-

 Summary: HEDGED_READ_THREAD_POOL should not be static
 Key: HDFS-11900
 URL: https://issues.apache.org/jira/browse/HDFS-11900
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.8.0
Reporter: John Zhuge


The *non-static* synchronized method initThreadsNumForHedgedReads cannot 
synchronize access to the *static* class variable HEDGED_READ_THREAD_POOL.
{code}
  private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
...
  private synchronized void initThreadsNumForHedgedReads(int num) {
{code}
Two DFS clients may update the same static variable in a race because the lock 
is on each DFS client object, not on the shared DFSClient class object.

There are two possible fixes:
1. "Global thread pool": change initThreadsNumForHedgedReads to static
2. "Per-client thread pool": change HEDGED_READ_THREAD_POOL to non-static

From the description for property {{dfs.client.hedged.read.threadpool.size}}:
{quote}
to a positive number. The threadpool size is how many threads to dedicate
to the running of these 'hedged', concurrent reads in your client.
{quote}
it seems to indicate the thread pool is per DFS client.

Let's assume we go with #1, the global thread pool. If one DFS client has the 
property set to 10 in its config while another has it set to 5, what is the 
size of the global thread pool supposed to be? 5? 10? Or 15?

The 2nd fix seems more reasonable to me.
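A minimal sketch of fix #2, assuming a per-client instance field; the names are 
illustrative, not the actual DFSClient patch:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the per-client thread pool: the pool is an instance field, so
// the instance lock taken by the synchronized method actually guards it,
// and each client honors its own configured size.
class HedgedReadClient {
    private ThreadPoolExecutor hedgedReadPool; // non-static: one per client

    synchronized void initThreadsNumForHedgedReads(int num) {
        if (hedgedReadPool == null && num > 0) {
            hedgedReadPool = new ThreadPoolExecutor(0, num,
                    60, TimeUnit.SECONDS, new SynchronousQueue<>());
        }
    }

    int poolSize() {
        return hedgedReadPool == null ? 0 : hedgedReadPool.getMaximumPoolSize();
    }
}
```

With this shape, a client configured with 10 threads and another with 5 each 
get their own pool, avoiding the 5-vs-10-vs-15 ambiguity above.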






[jira] [Commented] (HDFS-11832) Switch leftover logs to slf4j format in BlockManager.java

2017-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028160#comment-16028160
 ] 

Hudson commented on HDFS-11832:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11795 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11795/])
HDFS-11832. Switch leftover logs to slf4j format in BlockManager.java. 
(aajisaka: rev a7f085d6bf499edf23e650a4f7211c53a442da0e)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java


> Switch leftover logs to slf4j format in BlockManager.java
> -
>
> Key: HDFS-11832
> URL: https://issues.apache.org/jira/browse/HDFS-11832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0, 2.8.0, 3.0.0-alpha1
>Reporter: Hui Xu
>Assignee: Chen Liang
>Priority: Minor
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11832.001.patch, HDFS-11832.002.patch, 
> HDFS-11832.003.patch, HDFS-11832.004.patch, HDFS-11832.005.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HDFS-7706 switched BlockManager logging to use slf4j, but the logging 
> formats were not updated appropriately. For example:
>   if (LOG.isDebugEnabled()) {
> LOG.debug("blocks = " + java.util.Arrays.asList(blocks));
>   }
> This code should be changed to:
>   LOG.debug("blocks = {}", java.util.Arrays.asList(blocks));
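For context, the parameterized form needs no isDebugEnabled() guard because 
slf4j defers message formatting until after the level check; this toy stand-in 
(not the real slf4j API) illustrates the idea:

```java
// Toy logger mimicking slf4j's deferred formatting: when the level is
// disabled, the placeholder substitution (and the argument's toString)
// never runs, so the explicit guard becomes unnecessary.
class ToyLogger {
    boolean debugEnabled;
    String lastMessage;

    void debug(String format, Object arg) {
        if (!debugEnabled) {
            return; // no string concatenation cost on the hot path
        }
        lastMessage = format.replace("{}", String.valueOf(arg));
    }
}
```

Note that the argument expression (e.g. Arrays.asList(blocks)) is still 
evaluated at the call site; only the string formatting is deferred.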






[jira] [Commented] (HDFS-11882) Client fails if acknowledged size is greater than bytes sent

2017-05-29 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028143#comment-16028143
 ] 

Akira Ajisaka commented on HDFS-11882:
--

Thanks [~tasanuma0829] for the review! Updated the title and description.

> Client fails if acknowledged size is greater than bytes sent
> 
>
> Key: HDFS-11882
> URL: https://issues.apache.org/jira/browse/HDFS-11882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: HDFS-11882.01.patch
>
>
> Some erasure coding tests fail with the following exception. The failing 
> test was removed by HDFS-11823; however, this type of error can happen in a 
> real cluster.
> {noformat}
> Running 
> org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 10, Time elapsed: 89.086 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> testMultipleDatanodeFailure56(org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure)
>   Time elapsed: 38.831 sec  <<< ERROR!
> java.lang.IllegalStateException: null
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.updatePipeline(DFSStripedOutputStream.java:780)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:664)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1034)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:472)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:381)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:245)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}






[jira] [Updated] (HDFS-11882) Client fails if acknowledged size is greater than bytes sent

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11882:
-
Description: 
Some erasure coding tests fail with the following exception. The failing test 
was removed by HDFS-11823; however, this type of error can happen in a real 
cluster.
{noformat}
Running 
org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
Tests run: 14, Failures: 0, Errors: 1, Skipped: 10, Time elapsed: 89.086 sec 
<<< FAILURE! - in 
org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
testMultipleDatanodeFailure56(org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure)
  Time elapsed: 38.831 sec  <<< ERROR!
java.lang.IllegalStateException: null
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.updatePipeline(DFSStripedOutputStream.java:780)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:664)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1034)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:472)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:381)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:245)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

  was:
{noformat}
Running 
org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
Tests run: 14, Failures: 0, Errors: 1, Skipped: 10, Time elapsed: 89.086 sec 
<<< FAILURE! - in 
org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
testMultipleDatanodeFailure56(org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure)
  Time elapsed: 38.831 sec  <<< ERROR!
java.lang.IllegalStateException: null
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.updatePipeline(DFSStripedOutputStream.java:780)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:664)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1034)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:472)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:381)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:245)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod

[jira] [Updated] (HDFS-11882) Client fails if acknowledged size is greater than bytes sent

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11882:
-
Summary: Client fails if acknowledged size is greater than bytes sent  
(was: TestDFSRSDefault10x4StripedOutputStreamWithFailure and 
TestDFSStripedOutputStreamWithFailure010 fail)

> Client fails if acknowledged size is greater than bytes sent
> 
>
> Key: HDFS-11882
> URL: https://issues.apache.org/jira/browse/HDFS-11882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: HDFS-11882.01.patch
>
>
> {noformat}
> Running 
> org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 10, Time elapsed: 89.086 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> testMultipleDatanodeFailure56(org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure)
>   Time elapsed: 38.831 sec  <<< ERROR!
> java.lang.IllegalStateException: null
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.updatePipeline(DFSStripedOutputStream.java:780)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:664)
>   at 
> org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1034)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:472)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:381)
>   at 
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:245)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}






[jira] [Updated] (HDFS-11832) Switch leftover logs to slf4j format in BlockManager.java

2017-05-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11832:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha4
   Status: Resolved  (was: Patch Available)

Committed this to trunk. Thanks [~vagarychen] and [~10075197] for the 
contribution!

> Switch leftover logs to slf4j format in BlockManager.java
> -
>
> Key: HDFS-11832
> URL: https://issues.apache.org/jira/browse/HDFS-11832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0, 2.8.0, 3.0.0-alpha1
>Reporter: Hui Xu
>Assignee: Chen Liang
>Priority: Minor
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11832.001.patch, HDFS-11832.002.patch, 
> HDFS-11832.003.patch, HDFS-11832.004.patch, HDFS-11832.005.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HDFS-7706 switched BlockManager logging to use slf4j, but the logging 
> formats were not updated appropriately. For example:
>   if (LOG.isDebugEnabled()) {
> LOG.debug("blocks = " + java.util.Arrays.asList(blocks));
>   }
> This code should be changed to:
>   LOG.debug("blocks = {}", java.util.Arrays.asList(blocks));






[jira] [Commented] (HDFS-11832) Switch leftover logs to slf4j format in BlockManager.java

2017-05-29 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028137#comment-16028137
 ] 

Akira Ajisaka commented on HDFS-11832:
--

+1, checking this in.

> Switch leftover logs to slf4j format in BlockManager.java
> -
>
> Key: HDFS-11832
> URL: https://issues.apache.org/jira/browse/HDFS-11832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.0, 2.8.0, 3.0.0-alpha1
>Reporter: Hui Xu
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-11832.001.patch, HDFS-11832.002.patch, 
> HDFS-11832.003.patch, HDFS-11832.004.patch, HDFS-11832.005.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HDFS-7706 switched BlockManager logging to use slf4j, but the logging 
> formats were not updated appropriately. For example:
>   if (LOG.isDebugEnabled()) {
> LOG.debug("blocks = " + java.util.Arrays.asList(blocks));
>   }
> This code should be changed to:
>   LOG.debug("blocks = {}", java.util.Arrays.asList(blocks));






[jira] [Updated] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11899:
-
Status: Patch Available  (was: Open)

> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: https://issues.apache.org/jira/browse/HDFS-11899
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11899.001.patch
>
>
> Recently, ASF License warnings have been generated intermittently in trunk.
> {noformat}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
> {noformat}
> The root cause is that the include/exclude host files are created in the 
> wrong place by {{TestBalancer}}; they are expected to be in the proper test 
> directory. When some unit tests in {{TestBalancer}} time out, these host 
> files cannot be cleaned up and generate ASF License warnings.






[jira] [Commented] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028111#comment-16028111
 ] 

Yiqun Lin commented on HDFS-11899:
--

Use {{GenericTestUtils.getTestDir}} to create the host files in the right path.
Attached a simple patch. Kindly review.
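The idea of the fix, sketched with a hypothetical helper; Hadoop's real 
GenericTestUtils resolves the directory from the test.build.data property, but 
the names below are illustrative:

```java
import java.io.File;

// Sketch: resolve per-test files under the build's test data directory
// instead of the module root, so files left behind by timed-out runs do
// not land where the ASF license check scans.
class TestDirs {
    static File getTestDir(String name) {
        // Mirror the Hadoop convention: honor test.build.data if set,
        // otherwise fall back to a target/ subdirectory.
        String base = System.getProperty("test.build.data", "target/test/data");
        File dir = new File(base, name);
        dir.mkdirs();
        return dir;
    }
}
```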

> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: https://issues.apache.org/jira/browse/HDFS-11899
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11899.001.patch
>
>
> Recently, ASF License warnings have been generated intermittently in trunk.
> {noformat}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
> {noformat}
> The root cause is that the include/exclude host files are created in the 
> wrong place by {{TestBalancer}}; they are expected to be in the proper test 
> directory. When some unit tests in {{TestBalancer}} time out, these host 
> files cannot be cleaned up and generate ASF License warnings.






[jira] [Updated] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11899:
-
Attachment: HDFS-11899.001.patch

> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: https://issues.apache.org/jira/browse/HDFS-11899
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11899.001.patch
>
>
> Recently, ASF License warnings have been generated intermittently in trunk.
> {noformat}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
> {noformat}
> The root cause is that the include/exclude host files are created in the 
> wrong place by {{TestBalancer}}; they are expected to be in the proper test 
> directory. When some unit tests in {{TestBalancer}} time out, these host 
> files cannot be cleaned up and generate ASF License warnings.






[jira] [Updated] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11899:
-
Description: 
Recently, ASF License warnings have been generated intermittently in trunk.
{noformat}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
{noformat}
The root cause is that the include/exclude host files are created in the wrong 
place by {{TestBalancer}}; they are expected to be in the proper test 
directory. When some unit tests in {{TestBalancer}} time out, these host files 
cannot be cleaned up and generate ASF License warnings.

  was:
Recently ASF License warnings generated intermittently in trunk.
{noformat}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
{noformat}
The root cause of this is that the include/exclude host file created in 
{{TestBalancer}} is in a wrong place. It's expected to be in a right test 
directory. When some unit tests ran timeout in {{TestBalancer}}, then these 
host files cannot be cleared and generated ASF License warnings.


> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: https://issues.apache.org/jira/browse/HDFS-11899
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> Recently, ASF License warnings have been generated intermittently in trunk.
> {noformat}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
> {noformat}
> The root cause is that the include/exclude host files are created in the 
> wrong place in {{TestBalancer}}; they are expected to be in the proper test 
> directory. When some unit tests in {{TestBalancer}} time out, these host 
> files cannot be cleaned up and generate ASF License warnings.






[jira] [Updated] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11899:
-
Description: 
Recently, ASF License warnings have been generated intermittently in trunk.
{noformat}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
{noformat}
The root cause is that the include/exclude host files are created in the wrong 
place in {{TestBalancer}}; they are expected to be in the proper test 
directory. When some unit tests in {{TestBalancer}} time out, these host files 
cannot be cleaned up and generate ASF License warnings.

  was:
Recently ASF License warnings generated intermittently in trunk.
{noformat}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
{noformat}
The root cause of this is that the include/exclude host file created in a wrong 
place in test {{TestBalancer}}. It's expected to be in a right test directory. 
When some unit tests ran timeout in {{TestBalancer}}, then these host files 
cannot be cleared and generated ASF License warnings.


> ASF License warnings generated intermittently in trunk
> --
>
> Key: HDFS-11899
> URL: https://issues.apache.org/jira/browse/HDFS-11899
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> Recently, ASF License warnings have been generated intermittently in trunk.
> {noformat}
> Lines that start with ? in the ASF License  report indicate files that do 
> not have an Apache license header:
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
>  !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
> {noformat}
> The root cause is that the include/exclude host files are created in the 
> wrong place in {{TestBalancer}}; they are expected to be in the proper test 
> directory. When some unit tests in {{TestBalancer}} time out, these host 
> files cannot be cleaned up and generate ASF License warnings.






[jira] [Created] (HDFS-11899) ASF License warnings generated intermittently in trunk

2017-05-29 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-11899:


 Summary: ASF License warnings generated intermittently in trunk
 Key: HDFS-11899
 URL: https://issues.apache.org/jira/browse/HDFS-11899
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3
Reporter: Yiqun Lin
Assignee: Yiqun Lin


Recently, ASF License warnings have been generated intermittently in trunk.
{noformat}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/include-hosts-file
 !? /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/exclude-hosts-file
{noformat}
The root cause is that the include/exclude host files created in 
{{TestBalancer}} are in the wrong place; they are expected to be in the proper 
test directory. When some unit tests in {{TestBalancer}} time out, these host 
files cannot be cleaned up and generate ASF License warnings.
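The cleanup pattern described above — writing the include/exclude hosts files under the test's own data directory rather than the working directory, and deleting them even when the test body fails — can be sketched as below. This is an illustration, not the actual patch: the class name, the helper methods, and the fallback to {{java.io.tmpdir}} are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class HostsFileSketch {
    // Resolve a per-test data directory. Hadoop tests conventionally read the
    // "test.build.data" system property; falling back to the system temp
    // directory here is an assumption for this sketch.
    static Path testDataDir() throws IOException {
        String base = System.getProperty("test.build.data",
                System.getProperty("java.io.tmpdir"));
        return Files.createDirectories(Paths.get(base, "TestBalancer"));
    }

    // Write an include/exclude hosts file inside the test data directory, so a
    // leaked file never lands in the source tree where the ASF License check runs.
    static Path writeHostsFile(String name, String... hosts) throws IOException {
        Path file = testDataDir().resolve(name);
        Files.write(file, Arrays.asList(hosts));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path include = writeHostsFile("include-hosts-file", "127.0.0.1");
        try {
            System.out.println("created: " + Files.exists(include));
        } finally {
            // Cleanup runs even if the test body above throws.
            Files.deleteIfExists(include);
        }
    }
}
```

In a JUnit test the `finally` block would typically become an `@After` method, so the files are removed regardless of how the individual test case ends.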






[jira] [Commented] (HDFS-11359) DFSAdmin report command supports displaying maintenance state datanodes

2017-05-29 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028081#comment-16028081
 ] 

Yiqun Lin commented on HDFS-11359:
--

The failed tests are not related; it seems the failing test {{TestBalancer}} 
caused the ASF License warnings.

> DFSAdmin report command supports displaying maintenance state datanodes
> ---
>
> Key: HDFS-11359
> URL: https://issues.apache.org/jira/browse/HDFS-11359
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11359.001.patch, HDFS-11359.002.patch, 
> HDFS-11359.003.patch, HDFS-11359.004.patch
>
>
> The datanode's maintenance state info can be shown in the web UI/JMX, but it 
> can't be displayed via the CLI. This JIRA will improve on this.






[jira] [Comment Edited] (HDFS-11359) DFSAdmin report command supports displaying maintenance state datanodes

2017-05-29 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028081#comment-16028081
 ] 

Yiqun Lin edited comment on HDFS-11359 at 5/29/17 7:10 AM:
---

The failed tests are not related; it seems the failing test {{TestBalancer}} 
caused the ASF License warnings.


was (Author: linyiqun):
Failure tests are not relate, seems the failure test {{TestBalancer}} cause the 
ASF License warnings.

> DFSAdmin report command supports displaying maintenance state datanodes
> ---
>
> Key: HDFS-11359
> URL: https://issues.apache.org/jira/browse/HDFS-11359
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11359.001.patch, HDFS-11359.002.patch, 
> HDFS-11359.003.patch, HDFS-11359.004.patch
>
>
> The datanode's maintenance state info can be shown in the web UI/JMX, but it 
> can't be displayed via the CLI. This JIRA will improve on this.


