[jira] [Resolved] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2019-02-27 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDDS-451.
--
Resolution: Cannot Reproduce

Resolving as "Cannot Reproduce".

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> Steps taken:
> --
>  # Ran Put Key command to write 50 GB of data. The Put Key client operation 
> failed after 17 minutes.
> Error seen in ozone.log:
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-826:


 Summary: Update Ratis to 0.3.0-6f3419a-SNAPSHOT
 Key: HDDS-826
 URL: https://issues.apache.org/jira/browse/HDDS-826
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


RATIS-404 fixed a deadlock bug.  We should update Ratis here.






[jira] [Created] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-691:


 Summary: Dependency convergence error for 
org.apache.hadoop:hadoop-annotations
 Key: HDDS-691
 URL: https://issues.apache.org/jira/browse/HDDS-691
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze









[jira] [Created] (HDDS-632) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-632:


 Summary: TimeoutScheduler and SlidingWindow should use daemon 
threads
 Key: HDDS-632
 URL: https://issues.apache.org/jira/browse/HDDS-632
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


In HDDS-625, we found that the Ozone client does not terminate.  The 
SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
process termination.
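As a hedged illustration of the fix direction (illustrative names only, not the actual Ratis code): threads created by the default ThreadFactory are non-daemon and keep the JVM alive, whereas a factory that marks its threads as daemon lets the client process exit even if the scheduler is never shut down explicitly.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch: an executor built from a daemon ThreadFactory will not
// block JVM exit, unlike one using the default (non-daemon) factory.
public class DaemonThreads {
  // A ThreadFactory that marks every thread it creates as a daemon thread.
  public static ThreadFactory daemonFactory(String name) {
    return runnable -> {
      Thread t = new Thread(runnable, name);
      t.setDaemon(true);
      return t;
    };
  }

  public static void main(String[] args) {
    ScheduledThreadPoolExecutor scheduler =
        new ScheduledThreadPoolExecutor(1, daemonFactory("timeout-scheduler"));
    // With daemon threads, the JVM may exit even if this scheduler
    // is never shut down explicitly.
    Thread probe = daemonFactory("probe").newThread(() -> {});
    System.out.println(probe.isDaemon());
    scheduler.shutdown();
  }
}
```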






[jira] [Created] (HDDS-554) In XceiverClientSpi, implement sendCommand(..) using sendCommandAsync(..)

2018-09-25 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-554:


 Summary: In XceiverClientSpi, implement sendCommand(..) using 
sendCommandAsync(..)
 Key: HDDS-554
 URL: https://issues.apache.org/jira/browse/HDDS-554
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The advantages are two-fold --
# it simplifies the code, and
# the async API is more efficient.
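One common way to do this (a sketch with illustrative names, not the real XceiverClientSpi API) is to make the synchronous call delegate to the asynchronous one and block on the returned future, so only one code path actually sends the command:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: sendCommand is just sendCommandAsync plus a blocking join.
public class SyncOverAsync {
  // The async primitive; here it completes immediately for demonstration.
  static CompletableFuture<String> sendCommandAsync(String request) {
    return CompletableFuture.supplyAsync(() -> "reply:" + request);
  }

  // The sync variant delegates to the async one and waits for the result.
  static String sendCommand(String request) {
    return sendCommandAsync(request).join();
  }

  public static void main(String[] args) {
    System.out.println(sendCommand("ping"));  // reply:ping
  }
}
```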






[jira] [Created] (HDDS-372) There are two buffer copies in ChunkOutputStream

2018-08-23 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-372:


 Summary: There are two buffer copies in ChunkOutputStream
 Key: HDDS-372
 URL: https://issues.apache.org/jira/browse/HDDS-372
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Currently, there are two buffer copies in ChunkOutputStream
# from byte[] to ByteBuffer, and
# from ByteBuffer to ByteString.

We should eliminate the ByteBuffer in the middle.

For zero-copy I/O, we should support WritableByteChannel instead of 
OutputStream.  That won't be done in this JIRA.
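The copy being eliminated can be illustrated with plain java.nio (this is only an analogy, not the actual ChunkOutputStream code): ByteBuffer.wrap shares the caller's byte[] with zero copying, while allocate-and-put duplicates every byte.

```java
import java.nio.ByteBuffer;

// Illustration: copyInto makes an extra byte[] -> ByteBuffer copy;
// wrap shares the backing array with no copy at all.
public class BufferCopies {
  static ByteBuffer copyInto(byte[] data) {
    ByteBuffer buf = ByteBuffer.allocate(data.length);
    buf.put(data);   // first copy: byte[] -> ByteBuffer
    buf.flip();
    return buf;
  }

  static ByteBuffer wrap(byte[] data) {
    return ByteBuffer.wrap(data);  // no copy: the buffer shares the array
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3};
    ByteBuffer wrapped = wrap(data);
    data[0] = 9;                         // visible through the wrapped buffer
    System.out.println(wrapped.get(0));  // 9
  }
}
```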






[jira] [Resolved] (HDFS-13205) Incorrect path is passed to checkPermission during authorization of file under a snapshot (specifically under a subdir) after original subdir is deleted

2018-08-10 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-13205.

Resolution: Not A Problem

Resolving as Not A Problem.

> Incorrect path is passed to checkPermission during authorization of file 
> under a snapshot (specifically under a subdir) after original subdir is 
> deleted
> 
>
> Key: HDFS-13205
> URL: https://issues.apache.org/jira/browse/HDFS-13205
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.7.4
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Shashikant Banerjee
>Priority: Major
>
> Steps to reproduce the issue.
> +As 'hdfs' superuser:+
> -->Create a folder (/hdptest/test) with 700 permissions and 
> (/hdptest/test/mydir) with 755.
> -->An HDFS Ranger policy is defined with RWX for user "test" on /hdptest/test/ 
> recursively.
> -->Allow snapshot on the directory /hdptest/test/mydir:
> {code:java}
> #su - test
> [test@node1 ~]$ hdfs dfs -ls /hdptest/test/mydir
> [test@node1 ~]$ hdfs dfs -mkdir /hdptest/test/mydir/test
> [test@node1 ~]$ hdfs dfs -put /etc/passwd /hdptest/test/mydir/test
> [test@node1 ~]$ hdfs lsSnapshottableDir
> drwxr-xr-x 0 test hdfs 0 2018-01-25 14:22 1 65536 /hdptest/test/mydir
>  
> {code}
>  
> -->Create Snapshot  
> {code:java}
> [test@node1 ~]$ hdfs dfs -createSnapshot /hdptest/test/mydir
> Created snapshot /hdptest/test/mydir/.snapshot/s20180125-135430.953
> {code}
>  -->Verify that the snapshot directory has the current files from the 
> directory and that the file is accessible via the .snapshot path:
> {code:java}
> [test@node1 ~]$ hdfs dfs -ls -R 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953
> drwxr-xr-x   - test hdfs  0 2018-01-25 13:53 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953/test
> -rw-r--r--   3 test hdfs   3227 2018-01-25 13:53 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd
> [test@node1 ~]$ hdfs dfs -cat 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail
> livytest:x:1015:496::/home/livytest:/bin/bash
> ehdpzepp:x:1016:496::/home/ehdpzepp:/bin/bash
> zepptest:x:1017:496::/home/zepptest:/bin/bash
> {code}
>  -->Remove the file from the main directory and verify that the file is still 
> accessible:
> {code:java}
> [test@node1 ~]$ hdfs dfs -rm /hdptest/test/mydir/test/passwd
> 18/01/25 13:55:06 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://rangerSME/hdptest/test/mydir/test/passwd' to trash at: 
> hdfs://rangerSME/user/test/.Trash/Current/hdptest/test/mydir/test/passwd
> [test@node1 ~]$ hdfs dfs -cat 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail
> livytest:x:1015:496::/home/livytest:/bin/bash
> {code}
>  -->Remove the parent directory of the deleted file; accessing the same file 
> under the .snapshot dir now fails with a permission denied error:
> {code:java}
> [test@node1 ~]$ hdfs dfs -rm -r /hdptest/test/mydir/test
> 18/01/25 13:55:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://rangerSME/hdptest/test/mydir/test' to trash at: 
> hdfs://rangerSME/user/test/.Trash/Current/hdptest/test/mydir/test1516888525269
> [test@node1 ~]$ hdfs dfs -cat 
> /hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd | tail
> cat: Permission denied: user=test, access=EXECUTE, 
> inode="/hdptest/test/mydir/.snapshot/s20180125-135430.953/test/passwd":hdfs:hdfs:drwxr-x---
>  
> {code}
>  Ranger policies are not honored in this case for .snapshot directories/files 
> after the main directory is deleted under the snapshottable directory.
>  The workaround is to grant execute permission at the HDFS level on the parent 
> folder:
> {code:java}
> #su - hdfs
> #hdfs dfs -chmod 701 /hdptest/test
> {code}






[jira] [Created] (HDDS-293) Reduce memory usage in KeyData

2018-07-25 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-293:


 Summary: Reduce memory usage in KeyData
 Key: HDDS-293
 URL: https://issues.apache.org/jira/browse/HDDS-293
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Currently, the field chunks is declared as a List in KeyData as 
shown below.
{code}
//KeyData.java
  private List chunks;
{code}
It is expected that many KeyData objects have only a single chunk, so we could 
reduce the memory usage for that common case.
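A hedged sketch of one way to exploit the single-chunk common case (illustrative only, not the actual KeyData code): keep the lone chunk in a plain field and allocate a List only when a second chunk arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: avoid the per-object List allocation when
// there is at most one chunk, which is assumed to be the common case.
public class ChunkHolder {
  private Object singleChunk;   // used when there is exactly one chunk
  private List<Object> chunks;  // allocated only for the multi-chunk case

  public void addChunk(Object chunk) {
    if (singleChunk == null && chunks == null) {
      singleChunk = chunk;      // common case: no list allocation at all
    } else {
      if (chunks == null) {     // second chunk: promote to a real list
        chunks = new ArrayList<>();
        chunks.add(singleChunk);
        singleChunk = null;
      }
      chunks.add(chunk);
    }
  }

  public List<Object> getChunks() {
    if (chunks != null) {
      return chunks;
    }
    return singleChunk == null
        ? Collections.emptyList()
        : Collections.singletonList(singleChunk);
  }
}
```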






[jira] [Created] (HDDS-288) Fix bugs in OpenContainerBlockMap

2018-07-24 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-288:


 Summary: Fix bugs in OpenContainerBlockMap
 Key: HDDS-288
 URL: https://issues.apache.org/jira/browse/HDDS-288
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


- OpenContainerBlockMap should not be synchronized, for better performance. 
- There is a memory leak in removeContainer(..) -- it sets the entry to null 
instead of removing it.
- addChunkToMap may add the same chunk twice.  See the comments below.
{code}
  keyDataSet.putIfAbsent(blockID.getLocalID(), getKeyData(info, blockID)); 
// (1) when id is absent, it puts
  keyDataSet.computeIfPresent(blockID.getLocalID(), (key, value) -> { // 
(2) now, the id is present, it adds again.
value.addChunk(info);
return value;
  });
{code}
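The double add above can be reproduced with a plain ConcurrentHashMap (a simplified model with chunks as strings, not the actual OpenContainerBlockMap code); a single compute call creates the entry or appends atomically, and only ever adds the chunk once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentMap;

// Simplified reproduction: the buggy sequence inserts a value that already
// contains the chunk, then computeIfPresent finds that value and appends the
// same chunk a second time.
public class DoubleAdd {
  static List<String> newEntry(String chunk) {
    List<String> l = new ArrayList<>();
    l.add(chunk);
    return l;
  }

  static void addBuggy(ConcurrentMap<Long, List<String>> map, long id, String chunk) {
    map.putIfAbsent(id, newEntry(chunk));  // (1) when id is absent, it puts
    map.computeIfPresent(id, (k, v) -> {   // (2) now the id is present, it adds again
      v.add(chunk);
      return v;
    });
  }

  static void addFixed(ConcurrentMap<Long, List<String>> map, long id, String chunk) {
    // One atomic step: create the entry if absent, otherwise append exactly once.
    map.compute(id, (k, v) -> {
      if (v == null) {
        return newEntry(chunk);
      }
      v.add(chunk);
      return v;
    });
  }
}
```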







[jira] [Created] (HDDS-42) Inconsistent module names and descriptions

2018-05-10 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-42:
---

 Summary: Inconsistent module names and descriptions
 Key: HDDS-42
 URL: https://issues.apache.org/jira/browse/HDDS-42
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The hdds/ozone module names and descriptions are inconsistent:
- Missing "Hadoop" in some cases.
- Inconsistent use of acronyms.
- Inconsistent capitalization.






[jira] [Created] (HDDS-28) Duplicate declaration in hadoop-tools/hadoop-ozone/pom.xml

2018-05-07 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-28:
---

 Summary: Duplicate declaration in hadoop-tools/hadoop-ozone/pom.xml
 Key: HDDS-28
 URL: https://issues.apache.org/jira/browse/HDDS-28
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Filesystem
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


{code}
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hadoop:hadoop-ozone-filesystem:jar:3.2.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: org.apache.hadoop:hadoop-hdds-server-framework:jar -> duplicate 
declaration of version (?) @ 
org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 
173, column 17
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: org.apache.hadoop:hadoop-hdds-server-scm:jar -> duplicate 
declaration of version (?) @ 
org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 
178, column 17
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: org.apache.hadoop:hadoop-hdds-client:jar -> duplicate declaration of 
version (?) @ org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 
183, column 17
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: org.apache.hadoop:hadoop-hdds-container-service:jar -> duplicate 
declaration of version (?) @ 
org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 
188, column 17
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: org.apache.hadoop:hadoop-ozone-ozone-manager:jar -> duplicate 
declaration of version (?) @ 
org.apache.hadoop:hadoop-ozone-filesystem:[unknown-version], 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-tools/hadoop-ozone/pom.xml, line 
193, column 17
{code}






[jira] [Created] (HDFS-13526) Use TimeoutScheduler in RaftClientImpl

2018-05-03 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-13526:
--

 Summary: Use TimeoutScheduler in RaftClientImpl
 Key: HDFS-13526
 URL: https://issues.apache.org/jira/browse/HDFS-13526
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


TimeoutScheduler is thread safe and shuts itself down automatically when there 
are no tasks.  Let's also use it in RaftClientImpl for submitting retry requests.
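The retry-submission pattern can be sketched as follows (illustrative names only, not the real RaftClientImpl/TimeoutScheduler API): instead of sleeping on the caller thread, the retry is handed to a shared scheduler that fires after the retry interval and completes a future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: delayed retry submission via a shared daemon scheduler.
public class RetryScheduling {
  static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "retry-scheduler");
        t.setDaemon(true);  // do not hold up process termination
        return t;
      });

  // Schedule `task` to run after `delayMs` and expose the result as a future.
  static <T> CompletableFuture<T> scheduleRetry(long delayMs, Supplier<T> task) {
    CompletableFuture<T> result = new CompletableFuture<>();
    SCHEDULER.schedule(() -> {
      try {
        result.complete(task.get());
      } catch (Throwable t) {
        result.completeExceptionally(t);
      }
    }, delayMs, TimeUnit.MILLISECONDS);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(scheduleRetry(10, () -> "retried").join());
  }
}
```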






[jira] [Resolved] (HDFS-898) Sequential generation of block ids

2018-03-29 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-898.
--
Resolution: Duplicate

This was done by HDFS-4645.  Resolving ...

> Sequential generation of block ids
> --
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 0.20.1
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: DuplicateBlockIds.patch, FreeBlockIds.pdf, 
> HighBitProjection.pdf, blockid.tex, blockid20100122.pdf
>
>
> This is a proposal to replace random generation of block ids with a 
> sequential generator in order to avoid block id reuse in the future.






[jira] [Created] (HDFS-13252) Code refactoring: Remove Diff.ListType

2018-03-08 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-13252:
--

 Summary: Code refactoring: Remove Diff.ListType
 Key: HDFS-13252
 URL: https://issues.apache.org/jira/browse/HDFS-13252
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: snapshots
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


In Diff, there are only two lists, created and deleted.  It is easier to trace 
the code if the methods have the list type in the method name, instead of 
passing a ListType parameter.
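One possible shape of the refactored API (a sketch with illustrative names, not the actual Diff code): a named accessor per list replaces a single getList(ListType) method, so call sites read unambiguously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: named methods such as getCreatedList()/getDeletedList()
// replace getList(ListType.CREATED) / getList(ListType.DELETED).
public class DiffSketch {
  private final List<String> created = new ArrayList<>();
  private final List<String> deleted = new ArrayList<>();

  public List<String> getCreatedList() { return created; }
  public List<String> getDeletedList() { return deleted; }

  public void create(String name) { created.add(name); }
  public void delete(String name) { deleted.add(name); }
}
```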






[jira] [Created] (HDFS-13223) Reduce DiffListBySkipList memory usage

2018-03-02 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-13223:
--

 Summary: Reduce DiffListBySkipList memory usage
 Key: HDFS-13223
 URL: https://issues.apache.org/jira/browse/HDFS-13223
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: snapshots
Reporter: Tsz Wo Nicholas Sze
Assignee: Shashikant Banerjee


There are several ways to reduce the memory footprint of DiffListBySkipList:
- Move maxSkipLevels and skipInterval to DirectoryDiffListFactory.
- Use an array for skipDiffList instead of List.
- Do not store the level 0 element in skipDiffList.
- Do not create new ChildrenDiff for the same value.






[jira] [Created] (HDFS-12839) Refactor ratis-server tests to reduce the use of DEFAULT_CALLID

2017-11-20 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12839:
--

 Summary: Refactor ratis-server tests to reduce the use of 
DEFAULT_CALLID
 Key: HDFS-12839
 URL: https://issues.apache.org/jira/browse/HDFS-12839
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


This JIRA helps reduce the patch size of RATIS-141.

We refactor the tests so that DEFAULT_CALLID is only used in MiniRaftCluster.






[jira] [Created] (HDFS-12527) CLONE - javadoc: error - class file for org.apache.http.annotation.ThreadSafe not found

2017-09-21 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12527:
--

 Summary: CLONE - javadoc: error - class file for 
org.apache.http.annotation.ThreadSafe not found
 Key: HDFS-12527
 URL: https://issues.apache.org/jira/browse/HDFS-12527
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Tsz Wo Nicholas Sze
Assignee: Mukul Kumar Singh


{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.10.4:jar (module-javadocs) on 
project hadoop-hdfs-client: MavenReportException: Error while generating 
Javadoc: 
[ERROR] Exit code: 1 - 
/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java:694:
 warning - Tag @link: reference not found: StripingCell
[ERROR] javadoc: error - class file for org.apache.http.annotation.ThreadSafe 
not found
[ERROR] 
[ERROR] Command line was: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/../bin/javadoc
 -J-Xmx768m @options @packages
[ERROR] 
[ERROR] Refer to the generated Javadoc files in 
'/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/target/api' 
dir.
{code}
To reproduce the error above, run
{code}
mvn package -Pdist -DskipTests -DskipDocs -Dtar
{code}






[jira] [Created] (HDFS-12507) StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell

2017-09-20 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12507:
--

 Summary: StripedBlockUtil.java:694: warning - Tag @link: reference 
not found: StripingCell
 Key: HDFS-12507
 URL: https://issues.apache.org/jira/browse/HDFS-12507
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Tsz Wo Nicholas Sze


{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.10.4:jar (module-javadocs) on 
project hadoop-hdfs-client: MavenReportException: Error while generating 
Javadoc: 
[ERROR] Exit code: 1 - 
/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java:694:
 warning - Tag @link: reference not found: StripingCell
[ERROR] javadoc: error - class file for org.apache.http.annotation.ThreadSafe 
not found
[ERROR] 
[ERROR] Command line was: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/../bin/javadoc
 -J-Xmx768m @options @packages
[ERROR] 
[ERROR] Refer to the generated Javadoc files in 
'/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs-client/target/api' 
dir.
{code}
To reproduce the error above, run
{code}
mvn package -Pdist -DskipTests -DskipDocs -Dtar
{code}






[jira] [Created] (HDFS-12244) Ozone: the static cache provided by ContainerCache does not work in unit tests

2017-08-01 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12244:
--

 Summary: Ozone: the static cache provided by ContainerCache does 
not work in unit tests 
 Key: HDFS-12244
 URL: https://issues.apache.org/jira/browse/HDFS-12244
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Tsz Wo Nicholas Sze


Since a cluster may have more than one datanode, a static ContainerCache is 
shared among the datanodes.  When one datanode shuts down, the cache is shut 
down as well, so the other datanodes can no longer use it.  This results in 
"leveldb.DBException: Closed":

{code}
org.iq80.leveldb.DBException: Closed
at org.fusesource.leveldbjni.internal.JniDB.get(JniDB.java:75)
at org.apache.hadoop.utils.LevelDBStore.get(LevelDBStore.java:109)
at 
org.apache.hadoop.ozone.container.common.impl.KeyManagerImpl.getKey(KeyManagerImpl.java:116)
at 
org.apache.hadoop.ozone.container.common.impl.Dispatcher.handleGetSmallFile(Dispatcher.java:677)
at 
org.apache.hadoop.ozone.container.common.impl.Dispatcher.smallFileHandler(Dispatcher.java:293)
at 
org.apache.hadoop.ozone.container.common.impl.Dispatcher.dispatch(Dispatcher.java:121)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatch(ContainerStateMachine.java:94)
...
{code}
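The failure mode can be modeled in a few lines (a simplified sketch; the real ContainerCache wraps leveldb handles): a static cache is shared by every "datanode" in the JVM, so when the first one shuts it down, lookups from the others fail.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model: one static cache for the whole JVM, closed by the
// first datanode to shut down.
public class SharedCache {
  private static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();
  private static volatile boolean closed = false;

  static String get(String key) {
    if (closed) {
      // Analogous to the leveldb.DBException("Closed") seen in the trace above.
      throw new IllegalStateException("Closed");
    }
    return CACHE.get(key);
  }

  static void shutdown() {
    closed = true;  // closes the cache for ALL datanodes in this JVM
    CACHE.clear();
  }
}
```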







[jira] [Created] (HDFS-12163) Ozone: MiniOzoneCluster uses 400+ threads

2017-07-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12163:
--

 Summary: Ozone: MiniOzoneCluster uses 400+ threads
 Key: HDFS-12163
 URL: https://issues.apache.org/jira/browse/HDFS-12163
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone, test
Reporter: Tsz Wo Nicholas Sze


Checked the number of active threads used in MiniOzoneCluster with various 
settings:
- Local handlers
- Distributed handlers
- Ratis-Netty
- Ratis-gRPC

The results are similar for all the settings.  It uses 400+ threads.

Moreover, there is a thread leak -- a number of the threads do not shut down 
after a test is finished.  Therefore, when tests run consecutively, the later 
tests use more threads.

Will post the details in comments.






[jira] [Created] (HDFS-12006) Ozone: add TestDistributedOzoneVolumesRatis

2017-06-21 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-12006:
--

 Summary: Ozone: add TestDistributedOzoneVolumesRatis
 Key: HDFS-12006
 URL: https://issues.apache.org/jira/browse/HDFS-12006
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Add Ratis tests similar to TestDistributedOzoneVolumes.






[jira] [Created] (HDFS-11989) Oz

2017-06-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11989:
--

 Summary: Oz
 Key: HDFS-11989
 URL: https://issues.apache.org/jira/browse/HDFS-11989
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze









[jira] [Created] (HDFS-11979) Ozone: TestContainerPersistence never uses MiniOzoneCluster

2017-06-15 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11979:
--

 Summary: Ozone: TestContainerPersistence never uses 
MiniOzoneCluster
 Key: HDFS-11979
 URL: https://issues.apache.org/jira/browse/HDFS-11979
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor









[jira] [Created] (HDFS-11977) Ozone: cannot enable test debug/trace log

2017-06-15 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11977:
--

 Summary: Ozone: cannot enable test debug/trace log
 Key: HDFS-11977
 URL: https://issues.apache.org/jira/browse/HDFS-11977
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


Interestingly, the test debug/trace logs are not printed for Ozone classes even 
if we invoke GenericTestUtils.setLogLevel(log, Level.ALL).  Other classes such 
as Object do not have such a problem.  Here is a test:
{code}
  @Test
  public void testLogLevel() throws Exception {
runTestLogLevel(StorageContainerManager.class);
runTestLogLevel(Object.class);
  }
  static void runTestLogLevel(Class clazz) throws Exception {
final Logger log = LoggerFactory.getLogger(clazz);
GenericTestUtils.setLogLevel(log, Level.ALL);
log.trace(clazz.getSimpleName() + " trace log");
log.debug(clazz.getSimpleName() + " debug log");
log.info(clazz.getSimpleName() + " info log");
log.warn(clazz.getSimpleName() + " warn log");
log.error(clazz.getSimpleName() + " error log");
  }
{code}
Output:
{code}
2017-06-15 00:19:07,133 [Thread-0] INFO   - StorageContainerManager info log
2017-06-15 00:19:07,135 [Thread-0] WARN   - StorageContainerManager warn log
2017-06-15 00:19:07,135 [Thread-0] ERROR  - StorageContainerManager error 
log
2017-06-15 00:19:07,135 [Thread-0] TRACE 
lang.Object(TestOzoneContainer.java:runTestLogLevel(64)) - Object trace log
2017-06-15 00:19:07,135 [Thread-0] DEBUG 
lang.Object(TestOzoneContainer.java:runTestLogLevel(65)) - Object debug log
2017-06-15 00:19:07,135 [Thread-0] INFO  
lang.Object(TestOzoneContainer.java:runTestLogLevel(66)) - Object info log
2017-06-15 00:19:07,135 [Thread-0] WARN  
lang.Object(TestOzoneContainer.java:runTestLogLevel(67)) - Object warn log
2017-06-15 00:19:07,135 [Thread-0] ERROR 
lang.Object(TestOzoneContainer.java:runTestLogLevel(68)) - Object error log
{code}






[jira] [Created] (HDFS-11948) Ozone: change TestRatisManager to check cluster with data

2017-06-07 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11948:
--

 Summary: Ozone: change TestRatisManager to check cluster with data
 Key: HDFS-11948
 URL: https://issues.apache.org/jira/browse/HDFS-11948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


TestRatisManager first creates multiple Ratis clusters.  Then it changes the 
membership and closes some clusters.  However, it does not test the clusters 
with data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11947) BPOfferService prints a invalid warning message "Block pool ID needed, but service not yet registered with NN"

2017-06-07 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11947:
--

 Summary: BPOfferService prints a invalid warning message "Block 
pool ID needed, but service not yet registered with NN"
 Key: HDFS-11947
 URL: https://issues.apache.org/jira/browse/HDFS-11947
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11946) Ozone: Containers in different datanodes are mapped to the same location

2017-06-07 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11946:
--

 Summary: Ozone: Containers in different datanodes are mapped to 
the same location
 Key: HDFS-11946
 URL: https://issues.apache.org/jira/browse/HDFS-11946
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Anu Engineer


This is a problem in unit tests.  Containers with the same container name in 
different datanodes are mapped to the same local path location. For example, 
As a result, the first datanode will be able to succeed creating the container 
file but the remaining datanodes will fail to create the container file with 
FileAlreadyExistsException.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup

2017-05-22 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11865:
--

 Summary: Ozone: Do not initialize Ratis cluster during datanode 
startup
 Key: HDFS-11865
 URL: https://issues.apache.org/jira/browse/HDFS-11865
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


During a datanode startup, we current pass dfs.container.ratis.conf so that the 
datanode is bound to a particular Ratis cluster.

In this JIRA, we change Datanode that the datanode is no longer bound to any 
Ratis cluster during startup. We use the Ratis reinitialize request (RATIS-86) 
to set up a Ratis cluster later on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11843) Ozone: XceiverClientRatis should implement XceiverClientSpi.connect()

2017-05-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11843:
--

 Summary: Ozone: XceiverClientRatis should implement 
XceiverClientSpi.connect()
 Key: HDFS-11843
 URL: https://issues.apache.org/jira/browse/HDFS-11843
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


When a XceiverClientRatis object is newly created, it automatically connect to 
the server.  This is not a correct behavior.

It should implement XceiverClientSpi.connect().



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11735) Ozone: In Ratis, leader should validate ContainerCommandRequestProto before propagating it to followers

2017-05-01 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11735:
--

 Summary: Ozone: In Ratis, leader should validate 
ContainerCommandRequestProto before propagating it to followers
 Key: HDFS-11735
 URL: https://issues.apache.org/jira/browse/HDFS-11735
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11734) Ozone: provide a way to validate ContainerCommandRequestProto

2017-05-01 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11734:
--

 Summary: Ozone: provide a way to validate 
ContainerCommandRequestProto
 Key: HDFS-11734
 URL: https://issues.apache.org/jira/browse/HDFS-11734
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Anu Engineer


We need some API to check if a ContainerCommandRequestProto is valid.

It is useful when the container pipeline is run with Ratis.  Then, the leader 
could first checks if a ContainerCommandRequestProto is valid before the 
request is propagated to the followers.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11597) Ozone: Add Ratis management API

2017-03-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11597:
--

 Summary: Ozone: Add Ratis management API
 Key: HDFS-11597
 URL: https://issues.apache.org/jira/browse/HDFS-11597
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


We need an API to manage raft clusters, e.g.
- RaftClusterId createRaftCluster(MembershipConfiguration)
- void closeRaftCluster(RaftClusterId)
- MembershipConfiguration getMembers(RaftClusterId)
- void changeMembership(RaftClusterId, newMembershipConfiguration)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11559) Ozone: MiniOzoneCluster prints too many log messages by default

2017-03-21 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11559:
--

 Summary: Ozone: MiniOzoneCluster prints too many log messages by 
default
 Key: HDFS-11559
 URL: https://issues.apache.org/jira/browse/HDFS-11559
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Tsz Wo Nicholas Sze
Priority: Minor


When running tests using MiniOzoneCluster, it prints out tons of debug and 
trace log messages from all logs including the ones in from the libraries such 
as  
- ipc.Server
{code}
2017-03-21 15:13:13,053 [Thread-0] DEBUG ipc.Server 
(RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER 
Protocol Name = org.apache.hadoop.ipc.ProtocolMetaInfoPB version=1 
ProtocolImpl=org.apache.hadoop.ipc.protobuf.ProtocolInfoProtos$ProtocolInfoService$2
 protocolClass=org.apache.hadoop.ipc.ProtocolMetaInfoPB
2017-03-21 15:13:13,058 [Thread-0] DEBUG ipc.Server 
(RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER 
Protocol Name = org.apache.hadoop.hdfs.protocol.ClientProtocol version=1 
ProtocolImpl=org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2
 protocolClass=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB
2017-03-21 15:13:13,058 [Thread-0] DEBUG ipc.Server 
(RPC.java:registerProtocolAndImpl(895)) - RpcKind = RPC_PROTOCOL_BUFFER 
Protocol Name = org.apache.hadoop.ha.HAServiceProtocol version=1 
ProtocolImpl=org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2
 protocolClass=org.apache.hadoop.ha.protocolPB.HAServiceProtocolPB
{code}
- netty
{code}
2017-03-21 15:13:14,912 [Thread-0] DEBUG nio.NioEventLoop 
(Slf4JLogger.java:debug(76)) - -Dio.netty.noKeySetOptimization: false
2017-03-21 15:13:14,912 [Thread-0] DEBUG nio.NioEventLoop 
(Slf4JLogger.java:debug(76)) - -Dio.netty.selectorAutoRebuildThreshold: 512
2017-03-21 15:13:14,916 [Thread-0] TRACE nio.NioEventLoop 
(Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: 
sun.nio.ch.KQueueSelectorImpl@501c140b
2017-03-21 15:13:14,916 [Thread-0] TRACE nio.NioEventLoop 
(Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: 
sun.nio.ch.KQueueSelectorImpl@30ebe2a0
2017-03-21 15:13:14,917 [Thread-0] TRACE nio.NioEventLoop 
(Slf4JLogger.java:trace(46)) - Instrumented an optimized java.util.Set into: 
sun.nio.ch.KQueueSelectorImpl@98cbeb6
{code}
- beanutils
{code}
2017-03-21 15:13:10,490 [Thread-0] TRACE beanutils.BeanUtils 
(BeanUtilsBean.java:setProperty(888)) -   
setProperty(org.apache.commons.configuration2.PropertiesConfiguration@18f07b01, 
listDelimiterHandler, 
org.apache.commons.configuration2.convert.DefaultListDelimiterHandler@5473ddc2)
2017-03-21 15:13:10,491 [Thread-0] TRACE beanutils.BeanUtils 
(BeanUtilsBean.java:setProperty(906)) - Target bean = 
org.apache.commons.configuration2.PropertiesConfiguration@18f07b01
2017-03-21 15:13:10,491 [Thread-0] TRACE beanutils.BeanUtils 
(BeanUtilsBean.java:setProperty(907)) - Target name = listDelimiterHandler
{code}
- eclipse.jetty
{code} 
2017-03-21 15:13:14,796 [Thread-0] DEBUG component.ContainerLifeCycle 
(ContainerLifeCycle.java:addBean(323)) - 
org.eclipse.jetty.server.Server@32cd6303 added 
{qtp48399352{STOPPED,8<=0<=200,i=0,q=0},AUTO}
2017-03-21 15:13:14,797 [Thread-0] DEBUG util.DecoratedObjectFactory 
(DecoratedObjectFactory.java:addDecorator(52)) - Adding Decorator: 
org.eclipse.jetty.util.DeprecationWarning@b7a0755
2017-03-21 15:13:14,797 [Thread-0] DEBUG component.ContainerLifeCycle 
(ContainerLifeCycle.java:addBean(323)) - 
org.eclipse.jetty.server.session.SessionHandler@47175536 added 
{org.eclipse.jetty.server.session.HashSessionManager@51fce36f,AUTO}
{code}

The test output becomes very very long.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11558) BPServiceActor thread name is too long

2017-03-21 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11558:
--

 Summary: BPServiceActor thread name is too long
 Key: HDFS-11558
 URL: https://issues.apache.org/jira/browse/HDFS-11558
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


Currently, the thread name looks like
{code}
2017-03-20 18:32:22,022 [DataNode: 
[[[DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data0,
 
[DISK]file:/Users/szetszwo/hadoop/t2/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/dn1_data1]]
  heartbeating to localhost/127.0.0.1:51772] INFO  ...
{code}
which contains the full path for each storage dir.  It is unnecessarily long.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11513) Ozone: Separate XceiverServer and XceiverClient into interfaces and implementations

2017-03-08 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11513:
--

 Summary: Ozone: Separate XceiverServer and XceiverClient into 
interfaces and implementations
 Key: HDFS-11513
 URL: https://issues.apache.org/jira/browse/HDFS-11513
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


XceiverServer and XceiverClient are endpoint  acts as the communication layer 
for Ozone containers.  We propose to separate them into interfaces and 
implementations so we can use Ratis or some other library to implement them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11429) Move out the Hadoop RPC config keys from RaftServerConfigKeys

2017-02-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11429:
--

 Summary: Move out the Hadoop RPC config keys from 
RaftServerConfigKeys
 Key: HDFS-11429
 URL: https://issues.apache.org/jira/browse/HDFS-11429
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


There are a few Hadoop Rpc specific config keys in RaftServerConfigKeys.  We 
should move them to the ratis-hadoop module.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11168) Bump Netty 4 version

2016-11-22 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11168:
--

 Summary: Bump Netty 4 version
 Key: HDFS-11168
 URL: https://issues.apache.org/jira/browse/HDFS-11168
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The current Netty 4 version is 4.1.0.Beta5.  We should bump it to a non-beta 
version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10791) Delete block meta file when the block file is missing

2016-08-24 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-10791:
--

 Summary: Delete block meta file when the block file is missing
 Key: HDFS-10791
 URL: https://issues.apache.org/jira/browse/HDFS-10791
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze


When the block file is missing, the block meta file should be deleted if it 
exists.

Note that such situation is possible since the meta file is closed before the 
block file, the datanode could be killed in-between.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10535) Rename AsyncDistributedFileSystem

2016-06-16 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-10535:
--

 Summary: Rename AsyncDistributedFileSystem
 Key: HDFS-10535
 URL: https://issues.apache.org/jira/browse/HDFS-10535
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Per discussion in HDFS-9924, AsyncDistributedFileSystem is not a good name 
since we only support nonblocking calls for the moment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-8715) Checkpoint node keeps throwing exception

2016-06-14 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8715.
---
Resolution: Duplicate

> Checkpoint node keeps throwing exception
> 
>
> Key: HDFS-8715
> URL: https://issues.apache.org/jira/browse/HDFS-8715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.2
> Environment: centos 6.4, sun jdk 1.7
>Reporter: Jiahongchao
>
> I tired to start a checkup node using "bin/hdfs namenode -checkpoint", but it 
> keeps printing
> 15/07/03 23:16:22 ERROR namenode.FSNamesystem: Swallowing exception in 
> NameNodeEditLogRoller:
> java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:495)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4718)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10445) Add timeout tests for async DFS API

2016-05-26 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-10445.

Resolution: Duplicate

After HDFS-10431, all tests have timeout now.  Resolving as duplicate.

> Add timeout tests for async DFS API
> ---
>
> Key: HDFS-10445
> URL: https://issues.apache.org/jira/browse/HDFS-10445
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
>
> As a result of HADOOP-13168 commit, async DFS APIs should also be tested in 
> the case of timeout (i.e. Future#get(int timeout, TimeUnit unit)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10319) Balancer should not try to pair storages with different types

2016-04-20 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-10319:
--

 Summary: Balancer should not try to pair storages with different 
types
 Key: HDFS-10319
 URL: https://issues.apache.org/jira/browse/HDFS-10319
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


This is a performance bug – Balancer may pair a source datanode and a target 
datanode with different storage types. Fortunately, it will fail schedule any 
blocks in such pair since it will find out that the storage types are not 
matched later on.

The bug won't lead to incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9943) Support reconfiguring namenode replication confs

2016-03-10 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9943:
-

 Summary: Support reconfiguring namenode replication confs
 Key: HDFS-9943
 URL: https://issues.apache.org/jira/browse/HDFS-9943
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Xiaobing Zhou


The following confs should be re-configurable in runtime.
- dfs.namenode.replication.work.multiplier.per.iteration
- dfs.namenode.replication.max-streams
- dfs.namenode.replication.max-streams-hard-limit




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-03-08 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9924:
-

 Summary: [umbrella] Asynchronous HDFS Access
 Key: HDFS-9924
 URL: https://issues.apache.org/jira/browse/HDFS-9924
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: fs
Reporter: Tsz Wo Nicholas Sze
Assignee: Xiaobing Zhou


This is an umbrella JIRA for supporting Asynchronous HDFS Access.

Currently, all the API methods are blocking calls -- the caller is blocked 
until the method returns.  It is very slow if a client makes a large number of 
independent calls in a single thread since each call has to wait until the 
previous call is finished.  It is inefficient if a client needs to create a 
large number of threads to invoke the calls.

We propose adding a new API to support asynchronous calls, i.e. the caller is 
not blocked.  The methods in the new API immediately return a Java Future 
object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9838) Refactor the excessReplicateMap to a class

2016-02-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9838:
-

 Summary: Refactor the excessReplicateMap to a class
 Key: HDFS-9838
 URL: https://issues.apache.org/jira/browse/HDFS-9838
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h9838_20160219.patch

There are a lot of code duplication for accessing the excessReplicateMap in 
BlockManger.  Let's refactor the related code to a class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9825) Balancer should not terminate if only one of the namenodes has error

2016-02-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9825:
-

 Summary: Balancer should not terminate if only one of the 
namenodes has error
 Key: HDFS-9825
 URL: https://issues.apache.org/jira/browse/HDFS-9825
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Currently, the Balancer terminates if only one of the namenodes has error in 
federation setting.  Instead, it should continue balancing the cluster with the 
remaining namenodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8050) Separate the client conf key from DFSConfigKeys

2016-02-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8050.
---
Resolution: Duplicate

The was done by other subtasks in HDFS-8048.

> Separate the client conf key from DFSConfigKeys
> ---
>
> Key: HDFS-8050
> URL: https://issues.apache.org/jira/browse/HDFS-8050
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>
> Currently, all the conf keys are in DFSConfigKeys.  We should separate the 
> public client DFSConfigKeys to a new class in org.apache.hadoop.hdfs.client 
> as described by [~wheat9] in HDFS-6566.
> For the private conf keys, they may be moved to a new class in 
> org.apache.hadoop.hdfs.client.impl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9822) BlockManager.validateReconstructionWork throws AssertionError

2016-02-17 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9822:
-

 Summary: BlockManager.validateReconstructionWork throws 
AssertionError
 Key: HDFS-9822
 URL: https://issues.apache.org/jira/browse/HDFS-9822
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Reporter: Tsz Wo Nicholas Sze


Found the following AssertionError in 
https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
{code}
AssertionError: Should wait the previous reconstruction to finish
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9729) Use null to replace DataNode.EMPTY_DEL_HINT

2016-02-01 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9729:
-

 Summary: Use null to replace DataNode.EMPTY_DEL_HINT
 Key: HDFS-9729
 URL: https://issues.apache.org/jira/browse/HDFS-9729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


When a delete-hint is unavailable, the current code may use null or 
DataNode.EMPTY_DEL_HINT as a default value.  Let's uniformly use null for an 
empty delele-hint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9726) Refactor IBR code to a new class

2016-01-30 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9726:
-

 Summary: Refactor IBR code to a new class
 Key: HDFS-9726
 URL: https://issues.apache.org/jira/browse/HDFS-9726
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h9726_20160131.patch

The IBR code currently is mainly in BPServiceActor.  The JIRA is to refactor it 
to a new class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9710) Change DN to send block receipt IBRs in batches

2016-01-26 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9710:
-

 Summary: Change DN to send block receipt IBRs in batches
 Key: HDFS-9710
 URL: https://issues.apache.org/jira/browse/HDFS-9710
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


When a DN has received a block, it immediately sends a block receipt IBR RPC to 
NN for report the block.  Even if a DN has received multiple blocks about the 
same time, it still sends multiple RPCs.  It does not scale well since NN has 
to process a huge number of RPCs when many DNs receiving many blocks at the 
same time.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9690) addBlock is not idempotent

2016-01-23 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9690:
-

 Summary: addBlock is not idempotent
 Key: HDFS-9690
 URL: https://issues.apache.org/jira/browse/HDFS-9690
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
bug. It failed in the following builds.
- 
https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
- 
https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
- 
https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9654) Code refactoring for HDFS-8578

2016-01-16 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9654:
-

 Summary: Code refactoring for HDFS-8578
 Key: HDFS-9654
 URL: https://issues.apache.org/jira/browse/HDFS-9654
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


This is a code refactoring JIRA in order to change Datanode to process all 
storage/data dirs in parallel; see also HDFS-8578.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9573) o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry$hashCode inconsistent with equals

2016-01-12 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-9573.
---
Resolution: Invalid

"SnapshotDiffReport$DiffReportEntry$hashCode inconsistent with equals" is clear 
invalid.  Resolving as Invalid.

> o.a.h.hdfs.protocol.SnapshotDiffReport$DiffReportEntry$hashCode inconsistent 
> with equals
> 
>
> Key: HDFS-9573
> URL: https://issues.apache.org/jira/browse/HDFS-9573
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> DiffReportEntry.equals() uses field "type", but DiffReportEntry.hashCode() 
> doesn't. This breaks the rules on equals and hashCode:
> * if a class overrides equals, it must override hashCode
> * when they are both overridden, equals and hashCode must use the same set of 
> fields



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9527) The return type of FSNamesystem.getBlockCollection should be changed to INodeFile

2015-12-08 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9527:
-

 Summary: The return type of FSNamesystem.getBlockCollection should 
be changed to INodeFile
 Key: HDFS-9527
 URL: https://issues.apache.org/jira/browse/HDFS-9527
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


FSNamesystem.getBlockCollection always returns INodeFile.  It avoids 
unnecessary conversion from BlockCollection to INode/INodeFile after the change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9528) Cleanup namenode audit/log/exception messages

2015-12-08 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9528:
-

 Summary: Cleanup namenode audit/log/exception messages
 Key: HDFS-9528
 URL: https://issues.apache.org/jira/browse/HDFS-9528
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


- Cleanup unnecessary long methods for constructing message strings.
- Avoid calling toString() methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4488) Confusing WebHDFS exception when host doesn't resolve

2015-12-04 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-4488.
---
  Resolution: Cannot Reproduce
Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)

{code}
$hadoop fs -ls webhdfs://unresolvable-host/
15/12/04 11:48:33 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
-ls: java.net.UnknownHostException: unresolvable-host
...
$echo $?
255
{code}
The message is already fixed.  Resolving as Cannot Reproduce.

> Confusing WebHDFS exception when host doesn't resolve
> -
>
> Key: HDFS-4488
> URL: https://issues.apache.org/jira/browse/HDFS-4488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 0.23.0
>Reporter: Daryn Sharp
>
> {noformat}
> $ hadoop fs -ls webhdfs://unresolvable-host/
> ls: unresolvable-host
> $ echo $?
> 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-2593) Rename webhdfs HTTP param 'delegation' to 'delegationtoken'

2015-12-04 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-2593.
---
Resolution: Not A Problem

Resolving this stale issue as Not A Problem.

> Rename webhdfs HTTP param 'delegation' to 'delegationtoken'
> ---
>
> Key: HDFS-2593
> URL: https://issues.apache.org/jira/browse/HDFS-2593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 0.23.1, 1.0.0, 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>
> To be consistent with other param names and to be clearer for users about 
> what it is.
> The webhdfs spec doc should be updated as well.





[jira] [Created] (HDFS-9509) Add new metrics for measuring datanode storage statistics

2015-12-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9509:
-

 Summary: Add new metrics for measuring datanode storage statistics
 Key: HDFS-9509
 URL: https://issues.apache.org/jira/browse/HDFS-9509
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze


We already have sendDataPacketBlockedOnNetworkNanos and 
sendDataPacketTransferNanos for the transferTo case.  We should add more 
metrics for the other cases.





[jira] [Resolved] (HDFS-3439) Balancer exits if fs.defaultFS is set to a different, but semantically identical, URI from dfs.namenode.rpc-address

2015-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-3439.
---
  Resolution: Cannot Reproduce
Target Version/s:   (was: )

Resolving as Cannot Reproduce.
{code}
$hadoop balancer -Dfs.defaultFS=hdfs://foo.example.com:8020/ 
-Ddfs.namenode.servicerpc-address=hdfs://foo.example.com:8020
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/12/03 13:47:12 INFO balancer.Balancer: namenodes  = 
[hdfs://foo.example.com:8020]
{code}

> Balancer exits if fs.defaultFS is set to a different, but semantically 
> identical, URI from dfs.namenode.rpc-address
> ---
>
> Key: HDFS-3439
> URL: https://issues.apache.org/jira/browse/HDFS-3439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>
> The balancer determines the set of NN URIs to balance by looking at 
> fs.defaultFS and all possible dfs.namenode.(service)rpc-address settings. If 
> fs.defaultFS is, for example, set to "hdfs://foo.example.com:8020/" (note the 
> trailing "/") and the rpc-address is set to "hdfs://foo.example.com:8020" 
> (without a "/"), then the balancer will conclude that there are two NNs and 
> try to balance both. However, since both of these URIs refer to the same 
> actual FS instance, the balancer will exit with "java.io.IOException: Another 
> balancer is running.  Exiting ..."
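The fix the description implies is URI de-duplication. Below is a hypothetical sketch, not the actual Balancer code: it normalizes candidate namenode URIs by scheme and authority so that `hdfs://foo.example.com:8020/` and `hdfs://foo.example.com:8020` collapse to a single entry.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative de-duplication of namenode URIs; class and method names are
// assumptions for this sketch, not the real balancer API.
public class NamenodeUriDedup {
    static String canonical(String s) {
        URI u = URI.create(s);
        // Drop the path component, so a trailing "/" no longer matters.
        return u.getScheme() + "://" + u.getAuthority();
    }

    static Set<String> collect(String... uris) {
        Set<String> out = new LinkedHashSet<>();
        for (String s : uris) out.add(canonical(s));
        return out;
    }

    public static void main(String[] args) {
        // Both spellings refer to the same NN and yield one entry.
        System.out.println(collect("hdfs://foo.example.com:8020/",
                                   "hdfs://foo.example.com:8020"));
    }
}
```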





[jira] [Resolved] (HDFS-5958) One very large node in a cluster prevents balancer from balancing data

2015-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-5958.
---
Resolution: Duplicate

> One very large node in a cluster prevents balancer from balancing data
> --
>
> Key: HDFS-5958
> URL: https://issues.apache.org/jira/browse/HDFS-5958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.2.0
> Environment: Hadoop cluster with 4 nodes: 3 with 500Gb drives and one 
> with 4Tb drive.
>Reporter: Alexey Kovyrin
>
> In a cluster with a set of small nodes and one much larger node, the balancer 
> always selects the large node as the target even though it already has a copy 
> of each block in the cluster.
> This causes the balancer to enter an infinite loop and stop balancing other 
> nodes, because each balancing iteration selects the same target and then 
> cannot find a single block to move.





[jira] [Resolved] (HDFS-2220) balancer.Balancer: java.lang.NullPointerException while HADOOP_CONF_DIR is empty or wrong

2015-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-2220.
---
Resolution: Cannot Reproduce

Let's resolve this as Cannot Reproduce.

> balancer.Balancer: java.lang.NullPointerException while HADOOP_CONF_DIR is 
> empty or wrong
> -
>
> Key: HDFS-2220
> URL: https://issues.apache.org/jira/browse/HDFS-2220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 0.20.204.0
>Reporter: Rajit Saha
>
> When HADOOP_CONF_DIR is empty or wrongly set and the balancer is called 
> without a proper --config, we get an NPE in the client-side STDOUT.
> $ echo $HADOOP_CONF_DIR
> $ hadoop balancer
> Balancing took 46.0 milliseconds
> 11/06/13 05:14:04 ERROR balancer.Balancer: java.lang.NullPointerException
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:176)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:206)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:200)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.createNamenode(Balancer.java:911)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:860)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:1475)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:811)
> I think it would be good to give a more meaningful error message instead of an NPE.





[jira] [Resolved] (HDFS-2851) HA: Optimize stale block processing by triggering block reports immediately on failover

2015-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-2851.
---
  Resolution: Not A Problem
Target Version/s:   (was: )

Resolving this stale issue as Not A Problem.

> HA: Optimize stale block processing by triggering block reports immediately 
> on failover
> ---
>
> Key: HDFS-2851
> URL: https://issues.apache.org/jira/browse/HDFS-2851
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, datanode, ha, namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2851-HDFS-1623-Test.patch
>
>
> After Balancer runs, usedSpace is not balancing correctly.
> {code}
> java.util.concurrent.TimeoutException: Cluster failed to reached expected 
> values of totalSpace (current: 1500, expected: 1500), or usedSpace (current: 
> 390, expected: 300), in more than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForHeartBeat(TestBalancer.java:233)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithHANameNodes(TestBalancerWithHANameNodes.java:99)
> {code}





[jira] [Resolved] (HDFS-2621) Balancer is not checking ALREADY_RUNNING state and never returns this state.

2015-12-03 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-2621.
---
Resolution: Not A Problem

Resolving this stale issue as Not A Problem.

> Balancer is not checking ALREADY_RUNNING state and never returns this state.
> 
>
> Key: HDFS-2621
> URL: https://issues.apache.org/jira/browse/HDFS-2621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 0.23.1, 2.0.0-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>






[jira] [Resolved] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-9434.
---
Resolution: Fixed

Sangjin, thanks for the review.

I have committed the branch-2.6 patch.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.3
>
> Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
>
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called within 
> the namespace lock.  There is a (not very useful) log message printed in 
> processOverReplicatedBlock.  When a large number of blocks is stored in a 
> storage, printing the log message for each block can block the NN from 
> processing any other operations.  We saw it pause the NN for 30 seconds for a 
> storage with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.





[jira] [Reopened] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reopened HDFS-9434:
---

Reopen for backporting to branch-2.6

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.3
>
> Attachments: h9434_20151116.patch
>
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called within 
> the namespace lock.  There is a (not very useful) log message printed in 
> processOverReplicatedBlock.  When a large number of blocks is stored in a 
> storage, printing the log message for each block can block the NN from 
> processing any other operations.  We saw it pause the NN for 30 seconds for a 
> storage with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.





[jira] [Reopened] (HDFS-8246) Get HDFS file name based on block pool id and block id

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reopened HDFS-8246:
---

> Get HDFS file name based on block pool id and block id
> --
>
> Key: HDFS-8246
> URL: https://issues.apache.org/jira/browse/HDFS-8246
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client, namenode
>Reporter: feng xu
>Assignee: feng xu
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8246.0.patch
>
>
> This feature provides HDFS shell command and C/Java API to retrieve HDFS file 
> name based on block pool id and block id.
> 1. The Java API in class DistributedFileSystem
> public String getFileName(String poolId, long blockId) throws IOException
> 2. The C API in hdfs.c
> char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
> 3. The HDFS shell command 
>  hdfs dfs [generic options] -fn  
> This feature is useful if you have an HDFS block file name in the local file 
> system and want to find out the related HDFS file name in the HDFS namespace 
> (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop).
> Each HDFS block file name in the local file system contains both the block 
> pool id and the block id; for example, for the HDFS block file name 
> /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825,
> the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 
> 1073741825. The block pool id is uniquely related to an HDFS name node/name 
> space, and the block id is uniquely related to an HDFS file within a name 
> node/name space, so the combination of a block pool id and a block id is 
> uniquely related to an HDFS file name.
> The shell command and C/Java API do not map the block pool id to a name node, 
> so it is the user's responsibility to talk to the correct name node in a 
> federation environment that has multiple name nodes. The block pool id is 
> used by the name node to check whether the user is talking to the correct 
> name node.
> The implementation is straightforward. The client request to get the HDFS 
> file name reaches the new method String getFileName(String poolId, long 
> blockId) in FSNamesystem in the name node through RPC, and the new method 
> does the following:
> (1) Validate the block pool id.
> (2) Create a Block based on the block id.
> (3) Get BlockInfoContiguous from the Block.
> (4) Get BlockCollection from BlockInfoContiguous.
> (5) Get the file name from the BlockCollection.
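The lookup chain the description lists can be sketched as follows. This is a hypothetical toy, not the real FSNamesystem: a plain map stands in for the namenode's blocks map, and steps (2)-(5) collapse into one lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative blockId -> file-name lookup with pool-id validation; all names
// and the pool id value here are assumptions taken from the example above.
public class BlockToFileName {
    static final String POOL_ID = "BP-97622798-10.3.11.84-1428081035160";
    static final Map<Long, String> blockToFile = new HashMap<>();

    static String getFileName(String poolId, long blockId) {
        if (!POOL_ID.equals(poolId)) {          // (1) validate the block pool id
            throw new IllegalArgumentException("wrong name node for pool " + poolId);
        }
        String file = blockToFile.get(blockId); // (2)-(5) block -> block info -> collection -> name
        if (file == null) {
            throw new IllegalStateException("unknown block " + blockId);
        }
        return file;
    }

    public static void main(String[] args) {
        blockToFile.put(1073741825L, "/user/foo/data.txt");
        System.out.println(getFileName(POOL_ID, 1073741825L));
    }
}
```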





[jira] [Resolved] (HDFS-8246) Get HDFS file name based on block pool id and block id

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8246.
---
Resolution: Won't Fix

> Get HDFS file name based on block pool id and block id
> --
>
> Key: HDFS-8246
> URL: https://issues.apache.org/jira/browse/HDFS-8246
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client, namenode
>Reporter: feng xu
>Assignee: feng xu
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8246.0.patch
>
>
> This feature provides HDFS shell command and C/Java API to retrieve HDFS file 
> name based on block pool id and block id.
> 1. The Java API in class DistributedFileSystem
> public String getFileName(String poolId, long blockId) throws IOException
> 2. The C API in hdfs.c
> char* hdfsGetFileName(hdfsFS fs, const char* poolId, int64_t blockId)
> 3. The HDFS shell command 
>  hdfs dfs [generic options] -fn  
> This feature is useful if you have an HDFS block file name in the local file 
> system and want to find out the related HDFS file name in the HDFS namespace 
> (http://stackoverflow.com/questions/10881449/how-to-find-file-from-blockname-in-hdfs-hadoop).
> Each HDFS block file name in the local file system contains both the block 
> pool id and the block id; for example, for the HDFS block file name 
> /hdfs/1/hadoop/hdfs/data/current/BP-97622798-10.3.11.84-1428081035160/current/finalized/subdir0/subdir0/blk_1073741825,
> the block pool id is BP-97622798-10.3.11.84-1428081035160 and the block id is 
> 1073741825. The block pool id is uniquely related to an HDFS name node/name 
> space, and the block id is uniquely related to an HDFS file within a name 
> node/name space, so the combination of a block pool id and a block id is 
> uniquely related to an HDFS file name.
> The shell command and C/Java API do not map the block pool id to a name node, 
> so it is the user's responsibility to talk to the correct name node in a 
> federation environment that has multiple name nodes. The block pool id is 
> used by the name node to check whether the user is talking to the correct 
> name node.
> The implementation is straightforward. The client request to get the HDFS 
> file name reaches the new method String getFileName(String poolId, long 
> blockId) in FSNamesystem in the name node through RPC, and the new method 
> does the following:
> (1) Validate the block pool id.
> (2) Create a Block based on the block id.
> (3) Get BlockInfoContiguous from the Block.
> (4) Get BlockCollection from BlockInfoContiguous.
> (5) Get the file name from the BlockCollection.





[jira] [Resolved] (HDFS-8244) HDFS Custom Storage Tier Policies

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8244.
---
Resolution: Duplicate

Resolving.

> HDFS Custom Storage Tier Policies
> -
>
> Key: HDFS-8244
> URL: https://issues.apache.org/jira/browse/HDFS-8244
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, datanode, hdfs-client, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Priority: Minor
>
> Feature request to be able to define custom HDFS storage policies.
> For example, being able to define DISK:2, Archive:n - 2.
> The motivation is integrating the archive tier with another, cheaper storage 
> system such as Hedvig, which we are not in control of. We want to hedge our 
> bets in case something goes wrong with that archive storage system (it's new 
> and unproven): we don't want just one copy of the data left on our cluster in 
> case we lose a node.





[jira] [Resolved] (HDFS-8464) hdfs namenode UI shows "Max Non Heap Memory" is -1 B

2015-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8464.
---
Resolution: Duplicate

> hdfs namenode UI shows "Max Non Heap Memory" is -1 B
> 
>
> Key: HDFS-8464
> URL: https://issues.apache.org/jira/browse/HDFS-8464
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
> Environment: suse11.3
>Reporter: tongshiquan
>Priority: Minor
> Attachments: screenshot-1.png
>
>






[jira] [Reopened] (HDFS-8727) Allow using path style addressing for accessing the s3 endpoint

2015-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reopened HDFS-8727:
---

Since no patch was committed, we should not resolve this as fixed.

> Allow using path style addressing for accessing the s3 endpoint
> ---
>
> Key: HDFS-8727
> URL: https://issues.apache.org/jira/browse/HDFS-8727
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Andrew Baptist
>Assignee: Andrew Baptist
>  Labels: features
> Attachments: hdfs-8728.patch.2
>
>
> There is no ability to specify path-style access for the s3 endpoint. 
> Numerous non-Amazon storage implementations, such as Cleversafe and Ceph, 
> support the Amazon APIs but only support path-style access. Additionally, in 
> many environments it is difficult to configure DNS correctly enough to get 
> virtual-host-style addressing to work.





[jira] [Resolved] (HDFS-8727) Allow using path style addressing for accessing the s3 endpoint

2015-11-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8727.
---
   Resolution: Not A Problem
Fix Version/s: (was: 2.7.2)

> Allow using path style addressing for accessing the s3 endpoint
> ---
>
> Key: HDFS-8727
> URL: https://issues.apache.org/jira/browse/HDFS-8727
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Andrew Baptist
>Assignee: Andrew Baptist
>  Labels: features
> Attachments: hdfs-8728.patch.2
>
>
> There is no ability to specify path-style access for the s3 endpoint. 
> Numerous non-Amazon storage implementations, such as Cleversafe and Ceph, 
> support the Amazon APIs but only support path-style access. Additionally, in 
> many environments it is difficult to configure DNS correctly enough to get 
> virtual-host-style addressing to work.





[jira] [Created] (HDFS-9441) Do not construct the path string when choosing block placement targets

2015-11-18 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9441:
-

 Summary: Do not construct the path string when choosing block 
placement targets
 Key: HDFS-9441
 URL: https://issues.apache.org/jira/browse/HDFS-9441
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


- INodeFile.getName() is expensive since it involves quite a few string 
operations.  The method is called in both ReplicationWork and ErasureCodingWork 
but the default BlockPlacementPolicy does not use the returned string.  We 
should simply pass BlockCollection to reduce unnecessary computation when using 
the default BlockPlacementPolicy.

- Another improvement: the return type of FSNamesystem.getBlockCollection 
should be changed to INodeFile since it always returns an INodeFile object.
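The first improvement can be sketched with a lazy handle. This is a hypothetical illustration, not the actual ReplicationWork/BlockPlacementPolicy signatures: the placement code receives a supplier for the path instead of an eagerly built string, so the default policy, which never reads the name, pays nothing.

```java
import java.util.function.Supplier;

// Illustrative only: "buildFullPath" stands in for INodeFile.getName(), and
// "chooseTargets" for the placement policy entry point.
public class LazyPlacement {
    static int nameBuilds = 0; // counts how often the expensive path is built
    static String buildFullPath() { nameBuilds++; return "/user/foo/bar"; }

    static String chooseTargets(boolean policyNeedsPath, Supplier<String> srcPath) {
        // The default policy ignores the path entirely, so it is never built.
        return policyNeedsPath ? srcPath.get() : "<path not needed>";
    }

    public static void main(String[] args) {
        System.out.println(chooseTargets(false, LazyPlacement::buildFullPath));
        System.out.println(chooseTargets(true, LazyPlacement::buildFullPath));
    }
}
```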





[jira] [Created] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-16 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9434:
-

 Summary: Recommission a datanode with 500k blocks may pause NN for 
30 seconds
 Key: HDFS-9434
 URL: https://issues.apache.org/jira/browse/HDFS-9434
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


In BlockManager, processOverReplicatedBlocksOnReCommission is called within the 
namespace lock.  There is a (not very useful) log message printed in 
processOverReplicatedBlock.  When a large number of blocks is stored in a 
storage, printing the log message for each block can block the NN from 
processing any other operations.  We saw it pause the NN for 30 seconds for a 
storage with 500k blocks.

I suggest changing the log message to trace level as a quick fix.
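The quick fix can be sketched as follows. This is not the actual BlockManager code; the names are illustrative. The per-block message is guarded by a trace check and a single summary line is emitted per storage, so a 500k-block storage logs once instead of 500k times while the namespace lock is held.

```java
public class RecommissionLog {
    static boolean traceEnabled = false;
    static int linesLogged = 0; // stand-in for the real logger

    static void log(String msg) { linesLogged++; }

    static void processStorage(long[] blockIds) {
        for (long id : blockIds) {
            if (traceEnabled) {              // was an unconditional per-block message
                log("processing blk_" + id);
            }
        }
        log("processed " + blockIds.length + " blocks on recommission"); // one summary
    }

    public static void main(String[] args) {
        processStorage(new long[]{1, 2, 3});
        System.out.println(linesLogged + " line(s) logged");
    }
}
```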





[jira] [Created] (HDFS-9365) Balancer should call getNNServiceRpcAddressesForCluster after HDFS-6376

2015-11-02 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9365:
-

 Summary: Balancer should call getNNServiceRpcAddressesForCluster 
after HDFS-6376
 Key: HDFS-9365
 URL: https://issues.apache.org/jira/browse/HDFS-9365
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


HDFS-6376 added support for DistCp between two HA clusters.  After the change, 
Balancer will use all the NNs from both the local and the remote clusters.  It 
should call getNNServiceRpcAddressesForCluster and use only the local cluster.





[jira] [Created] (HDFS-9346) TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56 may throw NPE

2015-10-30 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9346:
-

 Summary: 
TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56 may throw 
NPE
 Key: HDFS-9346
 URL: https://issues.apache.org/jira/browse/HDFS-9346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Tsz Wo Nicholas Sze
Priority: Minor


See the NPE in [build 
13294#|https://builds.apache.org/job/PreCommit-HDFS-Build/13294/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testMultipleDatanodeFailure56/].
  It seems to be a bug in the test.
{code}
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.MiniDFSCluster.stopDataNode(MiniDFSCluster.java:2157)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.killDatanode(TestDFSStripedOutputStreamWithFailure.java:445)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:374)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:301)
at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:172)
{code}





[jira] [Created] (HDFS-9205) Do not schedule corrupted blocks for replication

2015-10-06 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9205:
-

 Summary: Do not schedule corrupted blocks for replication
 Key: HDFS-9205
 URL: https://issues.apache.org/jira/browse/HDFS-9205
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


Corrupted blocks are, by definition, blocks that cannot be read. As a 
consequence, they cannot be replicated.  In UnderReplicatedBlocks, there is a 
QUEUE_WITH_CORRUPT_BLOCKS queue, and chooseUnderReplicatedBlocks may choose 
blocks from it.  Scheduling corrupted blocks for replication wastes resources 
and potentially slows down replication of the higher-priority blocks.
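The proposal amounts to skipping one priority level when picking work. Below is a hypothetical sketch, with the queue set modeled as a plain list of lists; the real UnderReplicatedBlocks class differs, and the priority index is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChooseUnderReplicated {
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4; // illustrative priority index

    // Walk the priority queues in order but never schedule corrupt blocks,
    // since a block with no readable replica cannot be copied anyway.
    static List<Long> choose(List<List<Long>> queues, int limit) {
        List<Long> chosen = new ArrayList<>();
        for (int p = 0; p < queues.size(); p++) {
            if (p == QUEUE_WITH_CORRUPT_BLOCKS) continue; // skip the corrupt queue
            for (long blk : queues.get(p)) {
                if (chosen.size() == limit) return chosen;
                chosen.add(blk);
            }
        }
        return chosen;
    }

    static List<List<Long>> emptyQueues(int n) {
        List<List<Long>> qs = new ArrayList<>();
        for (int i = 0; i < n; i++) qs.add(new ArrayList<>());
        return qs;
    }

    public static void main(String[] args) {
        List<List<Long>> queues = emptyQueues(5);
        queues.get(0).add(10L);                         // highest priority: scheduled
        queues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(99L); // corrupt: skipped
        System.out.println(choose(queues, 10));
    }
}
```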





[jira] [Resolved] (HDFS-9194) AlreadyBeingCreatedException ... because pendingCreates is non-null but no leases found.

2015-10-05 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-9194.
---
Resolution: Duplicate

> AlreadyBeingCreatedException ... because pendingCreates is non-null but no 
> leases found.
> 
>
> Key: HDFS-9194
> URL: https://issues.apache.org/jira/browse/HDFS-9194
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>
> There is a possible bug in FSDirectory.addFile causing no leases found for 
> under construction files.
> {code}
> //FSDirectory
>   INodeFile addFile(String path, PermissionStatus permissions,
> short replication, long preferredBlockSize,
> String clientName, String clientMachine)
> throws FileAlreadyExistsException, QuotaExceededException,
>   UnresolvedLinkException, SnapshotAccessControlException, AclException {
> long modTime = now();
> INodeFile newNode = newINodeFile(namesystem.allocateNewInodeId(),
> permissions, modTime, modTime, replication, preferredBlockSize);
> newNode.toUnderConstruction(clientName, clientMachine);
> boolean added = false;
> writeLock();
> try {
>   added = addINode(path, newNode);
> } finally {
>   writeUnlock();
> }
> ...
>   }
> {code}
> - newNode.toUnderConstruction(clientName, clientMachine) adds 
> FileUnderConstructionFeature to the INode, i.e. the file becomes an under 
> construction file.  At this moment, there is no lease for this file yet.  The 
> lease will be added later in FSNamesystem.startFileInternal(..).
> - It is possible that addINode(path, newNode) adds the inode to the namespace 
> tree but then throws QuotaExceededException when calling 
> updateModificationTime (i.e. addINode -> addLastINode -> addChild -> 
> parent.addChild -> updateModificationTime throws QuotaExceededException). 
> Then the newly added under-construction file is left in the namespace but the 
> corresponding lease won't be added.





[jira] [Created] (HDFS-9194) AlreadyBeingCreatedException ... because pendingCreates is non-null but no leases found.

2015-10-03 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-9194:
-

 Summary: AlreadyBeingCreatedException ... because pendingCreates 
is non-null but no leases found.
 Key: HDFS-9194
 URL: https://issues.apache.org/jira/browse/HDFS-9194
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


There is a possible bug in FSDirectory.addFile causing no leases found for 
under construction files.
{code}
//FSDirectory
  INodeFile addFile(String path, PermissionStatus permissions,
short replication, long preferredBlockSize,
String clientName, String clientMachine)
throws FileAlreadyExistsException, QuotaExceededException,
  UnresolvedLinkException, SnapshotAccessControlException, AclException {

long modTime = now();
INodeFile newNode = newINodeFile(namesystem.allocateNewInodeId(),
permissions, modTime, modTime, replication, preferredBlockSize);
newNode.toUnderConstruction(clientName, clientMachine);
boolean added = false;
writeLock();
try {
  added = addINode(path, newNode);
} finally {
  writeUnlock();
}
...
  }
{code}
- newNode.toUnderConstruction(clientName, clientMachine) adds 
FileUnderConstructionFeature to the INode, i.e. the file becomes an under 
construction file.  At this moment, there is no lease for this file yet.  The 
lease will be added later in FSNamesystem.startFileInternal(..).
- It is possible that addINode(path, newNode) adds the inode to the namespace 
tree but then throws QuotaExceededException when calling updateModificationTime 
(i.e. addINode -> addLastINode -> addChild -> parent.addChild -> 
updateModificationTime throws QuotaExceededException).  Then the newly added 
under-construction file is left in the namespace but the corresponding lease 
won't be added.
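One way to avoid the lease-less under-construction file is to roll the partial add back when the quota check throws. The sketch below is hypothetical, not the actual FSDirectory code: two sets stand in for the namespace tree and the under-construction state.

```java
import java.util.HashSet;
import java.util.Set;

public class AddFileRollback {
    static final Set<String> namespace = new HashSet<>();
    static final Set<String> underConstruction = new HashSet<>();

    // If attaching the inode fails partway (the quota check throwing after the
    // child was added), undo both the tree insertion and the UC marking so no
    // under-construction file is left behind without a lease.
    static boolean addFile(String path, boolean quotaCheckFails) {
        underConstruction.add(path);      // newNode.toUnderConstruction(...)
        namespace.add(path);              // inode attached to the tree
        if (quotaCheckFails) {            // updateModificationTime throws
            namespace.remove(path);       // roll back the partial add
            underConstruction.remove(path);
            return false;
        }
        return true;                      // lease added later by startFileInternal
    }

    public static void main(String[] args) {
        System.out.println(addFile("/a", true));   // rolled back
        System.out.println(addFile("/b", false));  // kept
    }
}
```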





[jira] [Resolved] (HDFS-8341) HDFS mover stuck in loop trying to move corrupt block with no other valid replicas, doesn't move rest of other data blocks

2015-09-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8341.
---
Resolution: Cannot Reproduce

> HDFS mover stuck in loop trying to move corrupt block with no other valid 
> replicas, doesn't move rest of other data blocks
> --
>
> Key: HDFS-8341
> URL: https://issues.apache.org/jira/browse/HDFS-8341
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Priority: Minor
>
> HDFS mover gets stuck looping on a block that fails to move and doesn't 
> migrate the rest of the blocks.
> This is preventing recovery of data from a decommissioning external storage 
> tier used for archive (we've had problems with that proprietary "hyperscale" 
> storage product, which is why a couple of blocks here and there have checksum 
> problems or premature EOF as shown below), but this should not prevent moving 
> all the other blocks to recover our data:
> {code}hdfs mover -p /apps/hive/warehouse/
> 15/05/07 14:52:50 INFO mover.Mover: namenodes = 
> {hdfs://nameservice1=[/apps/hive/warehouse/]}
> 15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move 
> blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to 
> :1019:DISK through :1019: block move is failed: opReplaceBlock 
> BP-120244285--1417023863606:blk_1075156654_1438349 received exception 
> java.io.EOFException: Premature EOF: no length prefix available
> 
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move 
> blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to 
> :1019:DISK through :1019: block move is failed: opReplaceBlock 
> BP-120244285--1417023863606:blk_1075156654_1438349 received exception 
> java.io.EOFException: Premature EOF: no length prefix available
> ..
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8341) (Summary & Description may be invalid) HDFS mover stuck in loop after failing to move block, doesn't move rest of blocks, can't get data back off decommissioning external

2015-09-17 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8341.
---
Resolution: Invalid

Resolving as invalid.  Please feel free to reopen if you disagree.

> (Summary & Description may be invalid) HDFS mover stuck in loop after failing 
> to move block, doesn't move rest of blocks, can't get data back off 
> decommissioning external storage tier as a result
> ---
>
> Key: HDFS-8341
> URL: https://issues.apache.org/jira/browse/HDFS-8341
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Priority: Minor
>
> HDFS mover gets stuck looping on a block that fails to move and doesn't 
> migrate the rest of the blocks.
> This is preventing recovery of data from a decommissioning external storage 
> tier used for archive (we've had problems with that proprietary "hyperscale" 
> storage product which is why a couple blocks here and there have checksum 
> problems or premature eof as shown below), but this should not prevent moving 
> all the other blocks to recover our data:
> {code}hdfs mover -p /apps/hive/warehouse/
> 15/05/07 14:52:50 INFO mover.Mover: namenodes = 
> {hdfs://nameservice1=[/apps/hive/warehouse/]}
> 15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move 
> blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to 
> :1019:DISK through :1019: block move is failed: opReplaceBlock 
> BP-120244285--1417023863606:blk_1075156654_1438349 received exception 
> java.io.EOFException: Premature EOF: no length prefix available
> 
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: 
> /default-rack/:1019
> 15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move 
> blk_1075156654_1438349 with size=134217728 from :1019:ARCHIVE to 
> :1019:DISK through :1019: block move is failed: opReplaceBlock 
> BP-120244285--1417023863606:blk_1075156654_1438349 received exception 
> java.io.EOFException: Premature EOF: no length prefix available
> ..
> {code}





[jira] [Created] (HDFS-8921) Add an option to Balancer so that it only uses the k-most over-utilized DNs or all over-utilized DNs as sources.

2015-08-18 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8921:
-

 Summary: Add an option to Balancer so that it only uses the k-most 
over-utilized DNs or all over-utilized DNs as sources.
 Key: HDFS-8921
 URL: https://issues.apache.org/jira/browse/HDFS-8921
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Arpit suggested adding a separate option to source from the most over-utilized 
DataNodes first so the administrator does not have to pass the source DNs 
manually; see [this 
comment|https://issues.apache.org/jira/browse/HDFS-8826?focusedCommentId=14700576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14700576].
  The new option could allow specifying the k-most over-utilized DNs or all 
over-utilized DNs as sources.






[jira] [Created] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-07-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8838:
-

 Summary: Tolerate datanode failures in DFSStripedOutputStream when 
the data length is small
 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
data length is small.  We fix the bugs here and add more tests.





[jira] [Created] (HDFS-8824) Do not use small blocks for balancing the cluster

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8824:
-

 Summary: Do not use small blocks for balancing the cluster
 Key: HDFS-8824
 URL: https://issues.apache.org/jira/browse/HDFS-8824
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Balancer gets datanode block lists from the NN and then moves blocks in order 
to balance the cluster.  It should skip blocks with a small size, since moving 
small blocks generates a lot of overhead while contributing little to balancing 
the cluster.
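The proposed filter can be sketched as follows (the threshold value and helper names are hypothetical illustrations, not the actual Balancer code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the idea above: skip blocks below a minimum size when picking
// candidates to move.  MIN_BLOCK_SIZE is an assumed cutoff for illustration.
public class SmallBlockFilterSketch {
    static final long MIN_BLOCK_SIZE = 10L * 1024 * 1024;  // assumed 10 MB cutoff

    static List<Long> candidates(List<Long> blockSizes) {
        return blockSizes.stream()
            .filter(size -> size >= MIN_BLOCK_SIZE)   // drop small blocks
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(4096L, 128L * 1024 * 1024, 1024L);
        System.out.println(candidates(sizes));  // only the 128 MB block survives
    }
}
```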





[jira] [Created] (HDFS-8825) Enhancements to Balancer

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8825:
-

 Summary: Enhancements to Balancer
 Key: HDFS-8825
 URL: https://issues.apache.org/jira/browse/HDFS-8825
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


This is an umbrella JIRA to enhance Balancer.  The goal is to make it run 
faster and more efficiently, and to improve its usability.





[jira] [Created] (HDFS-8826) Balancer may not move blocks efficiently in some cases

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8826:
-

 Summary: Balancer may not move blocks efficiently in some cases
 Key: HDFS-8826
 URL: https://issues.apache.org/jira/browse/HDFS-8826
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


Balancer is inefficient in the following case:
|| Datanode || Utilization || Rack ||
| D1 | 95% | A |
| D2 | 30% | B |
| D3, D4, D5 | 0% | B |

The average utilization is 25%, so D2 is within the 10% threshold.  However, 
Balancer currently will first move blocks from D2 to D3, D4 and D5 since they 
are under the same rack.  Only then will it move blocks from D1.
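The threshold arithmetic in the table above can be checked with a small sketch (hypothetical helper name, not the actual Balancer implementation):

```java
// Sketch of the utilization math from the example above.
public class UtilizationSketch {
    /** True if a node is within `threshold` percentage points of the cluster average. */
    static boolean withinThreshold(double utilization, double average, double threshold) {
        return Math.abs(utilization - average) <= threshold;
    }

    public static void main(String[] args) {
        // D1=95%, D2=30%, D3..D5=0%  ->  average = (95+30+0+0+0)/5 = 25%
        double[] util = {95, 30, 0, 0, 0};
        double sum = 0;
        for (double u : util) sum += u;
        double avg = sum / util.length;                    // 25.0
        System.out.println(withinThreshold(30, avg, 10));  // D2: |30-25| <= 10 -> true
        System.out.println(withinThreshold(95, avg, 10));  // D1: over-utilized -> false
    }
}
```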





[jira] [Resolved] (HDFS-852) Balancer shutdown synchronisation could do with a review

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-852.
--
Resolution: Not A Problem

I think this issue got stale.  Resolving as Not a Problem.  Please feel free to 
reopen if you disagree.

 Balancer shutdown synchronisation could do with a review
 

 Key: HDFS-852
 URL: https://issues.apache.org/jira/browse/HDFS-852
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Affects Versions: 0.22.0
Reporter: Steve Loughran
Priority: Minor

 Looking at the source of the Balancer, there are a lot of 
 {{catch(InterruptedException)}} clauses, which run the risk of swallowing 
 exceptions, making it harder to shut down a balancer.
 For example, the {{AccessKeyUpdater}} swallows the InterruptedExceptions which 
 are used to tell it to shut down, and while it does poll the shared field 
 {{shouldRun}}, that field isn't volatile: the shutdown may not work. 
 Elsewhere, the {{dispatchBlocks()}} method swallows interruptions without 
 even looking for any shutdown flag. 
 This is all minor as it is shutdown logic, but it is the stuff that is hard 
 to test and leads to problems in the field, the problems that leave the ops 
 team resorting to {{kill -9}}, and we don't want that.
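The pattern the report asks for can be sketched as a minimal standalone class (illustrative only, not the Balancer code): a volatile shutdown flag plus interrupt-status preservation instead of swallowing the exception.

```java
// Minimal sketch: volatile flag so the worker sees the shutdown request,
// and an interrupt handler that restores the interrupt instead of eating it.
public class ShutdownSketch implements Runnable {
    private volatile boolean shouldRun = true;   // volatile: visible across threads

    public void shutdown() { shouldRun = false; }

    @Override
    public void run() {
        while (shouldRun) {
            try {
                Thread.sleep(100);               // stand-in for one unit of work
            } catch (InterruptedException e) {
                // Do not swallow: restore the interrupt status and stop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```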





[jira] [Resolved] (HDFS-1676) DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-1676.
---
Resolution: Not A Problem

Resolving this as not-a-problem.  Please feel free to reopen if you disagree.

 DateFormat.getDateTimeInstance() is very expensive, we can cache it to 
 improve performance
 --

 Key: HDFS-1676
 URL: https://issues.apache.org/jira/browse/HDFS-1676
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 0.21.0
Reporter: Xiaoming Shi
  Labels: newbie

 In the file:
 ./hadoop-0.21.0/hdfs/src/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
   line:1520
 In the while loop, DateFormat.getDateTimeInstance() is called in each 
 iteration. We can cache the result by moving the call outside the loop or 
 adding a class member.
 This is similar to the Apache bug 
 https://issues.apache.org/bugzilla/show_bug.cgi?id=48778 
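The suggested fix amounts to hoisting the factory call out of the loop, as in this sketch:

```java
import java.text.DateFormat;
import java.util.Date;

// Sketch of the suggested fix: create the (expensive) DateFormat instance
// once, outside the loop, and reuse it per iteration.
public class DateFormatSketch {
    public static void main(String[] args) {
        DateFormat fmt = DateFormat.getDateTimeInstance(); // created once
        for (int i = 0; i < 3; i++) {
            String ts = fmt.format(new Date());            // cheap reuse
            System.out.println(ts);
        }
    }
}
```

Note that DateFormat is not thread-safe, so if it becomes a class member it must be confined to one thread or wrapped in a ThreadLocal.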





[jira] [Resolved] (HDFS-3619) isGoodBlockCandidate() in Balancer is not handling properly if replica factor > 3

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-3619.
---
Resolution: Not A Problem

Resolving as not-a-problem.  Please feel free to reopen if you disagree.


 isGoodBlockCandidate() in Balancer is not handling properly if replica factor 
  > 3
 

 Key: HDFS-3619
 URL: https://issues.apache.org/jira/browse/HDFS-3619
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Junping Du
Assignee: Junping Du

 Let's assume:
 1. replication factor = 4
 2. source node in rack 1 has the 1st replica, the 2nd and 3rd replicas are in 
 rack 2, the 4th replica is in rack 3, and the target node is in rack 3. 
 So it should be fine for the balancer to move the replica from the source node 
 to the target node, but isGoodBlockCandidate() will return false. I think we 
 can fix it by simply checking that at least one replica node (other than the 
 source) is on a different rack from the target node.
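The proposed check can be sketched like this (the Replica type and helper are hypothetical stand-ins; the real Balancer works with DatanodeInfo and NetworkTopology):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed rule: a move keeps rack diversity if at least one
// replica other than the source sits on a different rack from the target.
public class RackCheckSketch {
    static class Replica {
        final String node, rack;
        Replica(String node, String rack) { this.node = node; this.rack = rack; }
    }

    static boolean isGoodCandidate(List<Replica> replicas, Replica source, String targetRack) {
        for (Replica r : replicas) {
            if (!r.node.equals(source.node) && !r.rack.equals(targetRack)) {
                return true;  // some other replica is off the target's rack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Replica src = new Replica("d1", "rack1");
        List<Replica> replicas = Arrays.asList(
            src,
            new Replica("d2", "rack2"),
            new Replica("d3", "rack2"),
            new Replica("d4", "rack3"));
        // Target on rack3: the rack2 replicas keep the block rack-diverse.
        System.out.println(isGoodCandidate(replicas, src, "rack3"));  // true
    }
}
```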





[jira] [Resolved] (HDFS-3411) Balancer fails to balance blocks between aboveAvgUtilized and belowAvgUtilized datanodes.

2015-07-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-3411.
---
Resolution: Not A Problem

Resolving as not-a-problem.  Please feel free to reopen if you disagree.

 Balancer fails to balance blocks between aboveAvgUtilized and 
 belowAvgUtilized datanodes.
 -

 Key: HDFS-3411
 URL: https://issues.apache.org/jira/browse/HDFS-3411
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 0.23.0
Reporter: Ashish Singhi

 Scenario:
 replication set to 1.
 1. Start 1 NN and 1 DN
 2. Pump 1GB of data.
 3. Start one more DN
 4. Run balancer with threshold 1.
 Now DN1 is added into aboveAvgUtilizedDatanodes and DN2 into 
 belowAvgUtilizedDatanodes. Hence overLoadedBytes and underLoadedBytes will 
 both be 0, resulting in bytesLeftToMove equal to 0. Thus the balancer will 
 exit without balancing the blocks.
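The byte accounting behind this exit can be sketched as follows (a hypothetical helper, not the actual Balancer code): nodes inside the threshold band contribute nothing, so with only aboveAvg/belowAvg nodes, bytesLeftToMove stays 0.

```java
// Sketch: bytes a node contributes to overLoadedBytes.  Only nodes beyond
// (average + threshold) contribute; aboveAvg nodes inside the band add 0.
public class BytesLeftSketch {
    static long overloaded(double util, double avg, double threshold, long capacity) {
        double excess = util - (avg + threshold);  // percentage points over the band
        return excess > 0 ? (long) (excess * capacity / 100) : 0;
    }

    public static void main(String[] args) {
        long capacity = 100L * 1024 * 1024 * 1024;   // 100 GB per node (assumed)
        double avg = 50, threshold = 1;
        // A node at 50.9% is aboveAvg but inside the 1% band -> contributes 0 bytes,
        // so bytesLeftToMove can be 0 even though the cluster is not balanced.
        System.out.println(overloaded(50.9, avg, threshold, capacity));  // 0
    }
}
```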





[jira] [Created] (HDFS-8818) Allow Balancer to run faster

2015-07-23 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8818:
-

 Summary: Allow Balancer to run faster
 Key: HDFS-8818
 URL: https://issues.apache.org/jira/browse/HDFS-8818
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The original design of Balancer intentionally makes it run slowly so that 
the balancing activities won't affect the normal cluster activities and the 
running jobs.

There is a new use case where a cluster admin may choose to balance the cluster 
when the cluster load is low, or in a maintenance window.  So we should have an 
option to allow Balancer to run faster.





[jira] [Resolved] (HDFS-8528) Erasure Coding: optimize client writing by making the writing of data and parity concurrently

2015-06-04 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-8528.
---
Resolution: Duplicate

This duplicates HDFS-8287.  Let's resolve it.

 Erasure Coding: optimize client writing by making the writing of data and 
 parity  concurrently
 --

 Key: HDFS-8528
 URL: https://issues.apache.org/jira/browse/HDFS-8528
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo

 HDFS-8425 shows that client writing is not very efficient currently. One 
 factor is that when the data buffers are full, the client suspends until the 
 parities are encoded and written. This sub-task tries to perform the two 
 writes concurrently to improve efficiency.





[jira] [Created] (HDFS-8540) Mover should exit with NO_MOVE_BLOCK if no block can be moved

2015-06-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8540:
-

 Summary: Mover should exit with NO_MOVE_BLOCK if no block can be 
moved
 Key: HDFS-8540
 URL: https://issues.apache.org/jira/browse/HDFS-8540
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze


When there are files not satisfying their storage policy and no move is 
possible, Mover exits with SUCCESS.  It should exit with NO_MOVE_BLOCK.

The bug seems to be in the following code.  When StorageTypeDiff is not empty 
and scheduleMoves4Block returns false, it does not update hasRemaining.  Also, 
there is no indication that no block can be moved for the entire iteration.
{code}
// Mover.processFile(..)
if (!diff.removeOverlap(true)) {
  if (scheduleMoves4Block(diff, lb, ecSchema)) {
    hasRemaining |= (diff.existing.size() > 1 &&
        diff.expected.size() > 1);
  }
}
{code}
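The control flow being criticized can be modeled in a standalone sketch (the booleans below are hypothetical stand-ins for the real diff/scheduleMoves4Block state; this is not the Mover code):

```java
// Sketch: when scheduleMoves4Block(...) returns false, hasRemaining is never
// updated, so an iteration can finish "successfully" with blocks left behind.
public class MoverFlowSketch {
    static boolean processBlock(boolean overlapRemoved, boolean scheduled,
                                boolean existingAndExpectedNonTrivial) {
        boolean hasRemaining = false;
        if (!overlapRemoved) {
            if (scheduled) {
                hasRemaining |= existingAndExpectedNonTrivial;
            }
            // Suggested direction of a fix: an else-branch here recording
            // that a needed move could not be scheduled at all.
        }
        return hasRemaining;
    }

    public static void main(String[] args) {
        // A move is needed but unschedulable: current flow still reports no
        // remaining work, matching the SUCCESS-instead-of-NO_MOVE_BLOCK bug.
        System.out.println(processBlock(false, false, true));  // false
    }
}
```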






[jira] [Created] (HDFS-8541) Mover should exit with NO_MOVE_PROGRESS if there is no move progress

2015-06-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8541:
-

 Summary: Mover should exit with NO_MOVE_PROGRESS if there is no 
move progress
 Key: HDFS-8541
 URL: https://issues.apache.org/jira/browse/HDFS-8541
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Priority: Minor


HDFS-8143 changed Mover to exit after some retries when it fails to move 
blocks.  Two additional suggestions:
# Mover retry counter should be incremented only if all moves fail.  If there 
are some successful moves, the counter should be reset.
# Mover should exit with NO_MOVE_PROGRESS instead of IO_EXCEPTION in case of 
failure.
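Suggestion #1 can be sketched as a tiny helper (hypothetical, not the Mover code): increment the retry counter only when an iteration moves nothing, and reset it on any progress.

```java
// Sketch of the retry-counter rule: progress resets, no progress counts
// toward a NO_MOVE_PROGRESS exit.
public class RetrySketch {
    private int retryCount = 0;

    /** @return updated retry count after an iteration that moved `moved` blocks. */
    int afterIteration(int moved) {
        if (moved > 0) {
            retryCount = 0;       // some moves succeeded: reset
        } else {
            retryCount++;         // nothing moved: count a failed attempt
        }
        return retryCount;
    }

    public static void main(String[] args) {
        RetrySketch r = new RetrySketch();
        System.out.println(r.afterIteration(0));  // 1
        System.out.println(r.afterIteration(0));  // 2
        System.out.println(r.afterIteration(3));  // 0 -- progress resets the counter
    }
}
```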





[jira] [Created] (HDFS-8437) Fail/warn if HDFS is setup with an even number of QJMs.

2015-05-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8437:
-

 Summary: Fail/warn if HDFS is setup with an even number of QJMs.
 Key: HDFS-8437
 URL: https://issues.apache.org/jira/browse/HDFS-8437
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


When setting up an even number (2n, n ≥ 1) of QJMs, the number of failures it 
can tolerate is the same as with one fewer node (2n-1).  Therefore, it does not 
make sense to set up an even number of QJMs.  We should either fail the setup 
or warn the users.
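This is the standard majority-quorum arithmetic: with n JournalNodes a write needs a majority (n/2 + 1), so the failures tolerated are n minus that majority, which is identical for 2n and 2n-1 nodes.

```java
// Majority-quorum arithmetic: failures tolerated = n - (n/2 + 1).
public class QuorumSketch {
    static int failuresTolerated(int n) {
        return n - (n / 2 + 1);   // n minus the majority size (integer division)
    }

    public static void main(String[] args) {
        System.out.println(failuresTolerated(3));  // 1
        System.out.println(failuresTolerated(4));  // 1 -- no better than 3 nodes
        System.out.println(failuresTolerated(5));  // 2
    }
}
```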





[jira] [Created] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil

2015-05-19 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8433:
-

 Summary: blockToken is not set in constructInternalBlock and 
parseStripedBlockGroup in StripedBlockUtil
 Key: HDFS-8433
 URL: https://issues.apache.org/jira/browse/HDFS-8433
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze


The blockToken provided in LocatedStripedBlock is not used to create 
LocatedBlock in constructInternalBlock and parseStripedBlockGroup in 
StripedBlockUtil.

We should also add ec tests with security on.





[jira] [Created] (HDFS-8405) Fix a typo in NamenodeFsck

2015-05-14 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8405:
-

 Summary: Fix a typo in NamenodeFsck
 Key: HDFS-8405
 URL: https://issues.apache.org/jira/browse/HDFS-8405
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma
Priority: Minor


DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY below should not be quoted.
{code}
  res.append("\n  ")
     .append("DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY:\t")
     .append(minReplication);
{code}
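A sketch of the fix the description implies: reference the constant's value instead of quoting its name. The local constant below is a stand-in for the real DFSConfigKeys entry, and its value here is an assumption for illustration.

```java
// Sketch of the corrected line: the key constant is referenced, not quoted.
public class FsckTypoSketch {
    // Stand-in for DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY.
    static final String DFS_NAMENODE_REPLICATION_MIN_KEY = "dfs.namenode.replication.min";

    public static void main(String[] args) {
        int minReplication = 1;
        StringBuilder res = new StringBuilder();
        res.append("\n  ")
           .append(DFS_NAMENODE_REPLICATION_MIN_KEY + ":\t")  // constant, not a string
           .append(minReplication);
        System.out.println(res);
    }
}
```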





[jira] [Created] (HDFS-8397) Refactor the error handling code in DataStreamer

2015-05-13 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8397:
-

 Summary: Refactor the error handling code in DataStreamer
 Key: HDFS-8397
 URL: https://issues.apache.org/jira/browse/HDFS-8397
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


DataStreamer handles (1) bad datanode, (2) restarting datanode and (3) datanode 
replacement and keeps various state and indexes.  This issue is to clean up the 
code.





[jira] [Created] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-05-12 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8383:
-

 Summary: Tolerate multiple failures in DFSStripedOutputStream
 Key: HDFS-8383
 URL: https://issues.apache.org/jira/browse/HDFS-8383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze








[jira] [Created] (HDFS-8384) Allow NN to startup if there are files having a lease but are not under construction

2015-05-12 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8384:
-

 Summary: Allow NN to startup if there are files having a lease but 
are not under construction
 Key: HDFS-8384
 URL: https://issues.apache.org/jira/browse/HDFS-8384
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


When there are files that have a lease but are not under construction, NN will 
fail to start up with
{code}
15/05/12 00:36:31 ERROR namenode.FSImage: Unable to save image for 
/hadoop/hdfs/namenode
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:412)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7124)
...
{code}
The actual problem is that the image could be corrupted by bugs like 
HDFS-7587.  We should have an option/conf to allow NN to start up so that the 
problematic files could possibly be deleted.




