[jira] [Reopened] (HDFS-15971) Make mkstemp cross platform

2021-04-16 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reopened HDFS-15971:


> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15971) Make mkstemp cross platform

2021-04-16 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-15971:
---
Fix Version/s: (was: 3.4.0)

I've reverted this from trunk

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.






[jira] [Commented] (HDFS-15971) Make mkstemp cross platform

2021-04-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323953#comment-17323953
 ] 

Eric Badger commented on HDFS-15971:


Yea, I think reverting would be best until we can figure out how to fix it on 
RHEL. I'll revert it.

I'm not familiar with the code that was modified, but I'm happy to test any 
patches on RHEL to make sure that they work on that environment before we merge 
again.

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.






[jira] [Commented] (HDFS-15971) Make mkstemp cross platform

2021-04-15 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322535#comment-17322535
 ] 

Eric Badger commented on HDFS-15971:


{noformat}
[INFO] Running cmake 
/home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src 
-DGENERATED_JAVAH=/home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/javah
 -DHADOOP_BUILD=1 -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_FUSE=false 
-DREQUIRE_LIBWEBHDFS=false -DREQUIRE_VALGRIND=false -G Unix Makefiles
[INFO] with extra environment variables {}
[WARNING] JAVA_HOME=, 
JAVA_JVM_LIBRARY=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/lib/amd64/server/libjvm.so
[WARNING] 
JAVA_INCLUDE_PATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/include,
 
JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/include/linux
[WARNING] Located all JNI components successfully.
[WARNING] CUSTOM_OPENSSL_PREFIX =
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED
[WARNING] -- Performing Test THREAD_LOCAL_SUPPORTED - Failed
[WARNING] CMake Warning at CMakeLists.txt:174 (message):
[WARNING]   WARNING: Libhdfs++ library was not built because the required 
feature
[WARNING]   thread_local storage is not supported by your compiler.  Known 
compilers
[WARNING]   that support this feature: GCC 4.8+, Visual Studio 2015+, Clang 
(community
[WARNING]   version 3.3+), Clang (version for Xcode 8+ and iOS 9+).
[WARNING]
[WARNING]
[WARNING] -- Checking for module 'fuse'
[WARNING] --   No package 'fuse' found
[WARNING] -- Failed to find Linux FUSE libraries or include files.  Will not 
build FUSE client.
[WARNING] -- Configuring done
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING] $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such 
target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:74 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING] $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such 
target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:66 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING] $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such 
target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:61 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   Error evaluating generator expression:
[WARNING]
[WARNING] $
[WARNING]
[WARNING]   Objects of target "x_platform_obj_c_api" referenced but no such 
target
[WARNING]   exists.
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:57 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_vecsum_hdfs
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:74 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_zerocopy_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:66 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_threaded_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:61 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:94 (add_executable):
[WARNING]   No SOURCES given to target: test_libhdfs_ops_hdfs_static
[WARNING] Call Stack (most recent call first):
[WARNING]   main/native/libhdfs/CMakeLists.txt:57 (build_libhdfs_test)
[WARNING]
[WARNING]
[WARNING] -- Build files have been written to: 
/home/ebadger/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target
{noformat}
[~gautham], [~inigoiri], this PR broke the build for me on trunk. I'm running 
on RHEL 7.6 and narrowed it down to this PR. Reverting it allows the build to 
succeed for me. Please revert this unless there is a very quick fix.

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  

[jira] [Commented] (HDFS-15646) Track failing tests in HDFS

2020-10-22 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219348#comment-17219348
 ] 

Eric Badger commented on HDFS-15646:


I am very +1 for moving towards a no-commit policy on failed unit tests. If the 
unit test is bad, then fix it. If the unit test reveals a race/bug in the code, 
fix the code. But just ignoring them does basically no good for anything. 

> Track failing tests in HDFS
> ---
>
> Key: HDFS-15646
> URL: https://issues.apache.org/jira/browse/HDFS-15646
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Priority: Blocker
>
> There are several unit tests that have been consistently failing on Yetus for 
> a long period of time.
>  The list keeps growing and it is driving the repository into an unstable 
> state. Qbt reports more than *40 failing unit tests* on average.
> Personally, over the last week, with every submitted patch, I have had to 
> spend considerable time looking at the same stack trace to double check 
> whether or not the patch contributes to those failures.
> I found out that the majority of those tests have been failing for quite some 
> time but +no Jiras were filed+.
> The main problem with those consistent failures is that they have side effects 
> on the runtime of the other JUnit tests by sucking up resources such as memory 
> and ports.
> {{StripedFile}} and {{EC}} tests in particular show up 100% of the time in the 
> list of bad tests.
>  I looked at those tests and they certainly need some improvements (e.g., 
> HDFS-15459). Is anyone interested in those test cases? Can we just turn them 
> off?
> I would like to give a heads-up that we need more collaboration to enforce 
> the stability of the code base.
>  * For all developers, please, {color:#ff}file a Jira once you see a 
> failing test whether it is unrelated to your patch or not{color}. This gives 
> other developers a heads-up about the potential failures. Please do not stop 
> at commenting on your patch "_+this is unrelated to my work+_".
>  * Volunteer to dedicate more time to fixing flaky tests.
>  * Periodically, make sure that the list of failing tests does not exceed a 
> certain number of tests. We have Qbt reports to monitor that, but there is no 
> follow-up on its status.
>  * We should consider aggressive strategies such as blocking any merges until 
> the code is brought back to stability.
>  * We need a clear and well-defined process to address Yetus issues: 
> configuration, investigating running out of memory, slowness, etc.
>  * Turn off the JUnit tests within the modules that are not being actively 
> used in the community (e.g., EC, stripedFiles, etc.).
>  
> CC: [~aajisaka], [~elgoiri], [~kihwal], [~daryn], [~weichiu]
> Do you guys have any thoughts on the current status of HDFS?
>  
> +The following list is a quick list of failing Junits from Qbt reports:+
>  
> [org.apache.hadoop.crypto.key.kms.server.TestKMS.testKMSProviderCaching|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.crypto.key.kms.server/TestKMS/testKMSProviderCaching/] 1.5 sec [1|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testFolderMetadata/] 42 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testFirstContainerVersionMetadata/] 46 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testPermissionMetadata/] 27 ms [3|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/297/]
> [org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata|https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/299/testReport/org.apache.hadoop.fs.azure/TestBlobMetadata/testOldPermissionMetadata/] 19
> 

[jira] [Updated] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-13 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14498:
---
Fix Version/s: (was: 3.1.5)
   (was: 3.2.2)

I have reverted this from branch-3.2 and branch-3.1. It was earlier reverted 
from branch-2.10.

[~hexiaoqiao], please compile each branch before committing things. Blindly 
cherry-picking commits and pushing them can break the build like this and 
wastes other developers' time. It is your responsibility as a committer to 
make sure that you don't break the build.

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14498-branch-2.10.001.patch, HDFS-14498.001.patch, 
> HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case there are more bugs like this...
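
For illustration only, here is a minimal Java sketch of the "log less" idea from the description: a throttle that lets a repeating message through at most once per interval. LogThrottle and shouldLog are hypothetical names, not part of LeaseManager or of any attached patch, and the caller is assumed to pass in the current time in milliseconds.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: permits at most one log line per interval. */
class LogThrottle {
  private final long minIntervalMs;
  // Time of the last permitted log; Long.MIN_VALUE means "never logged yet".
  private final AtomicLong lastLoggedMs = new AtomicLong(Long.MIN_VALUE);

  LogThrottle(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  /** Returns true if the caller should emit its log line at time nowMs. */
  boolean shouldLog(long nowMs) {
    long last = lastLoggedMs.get();
    if (last != Long.MIN_VALUE && nowMs - last < minIntervalMs) {
      return false; // still inside the quiet period
    }
    // If several threads race here, only one of them wins the right to log.
    return lastLoggedMs.compareAndSet(last, nowMs);
  }
}
```

A monitor loop could guard its repeated INFO/WARN lines with something like "if (throttle.shouldLog(now)) LOG.warn(...)", so a lease that can never be released would produce one line per interval instead of millions.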






[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2020-04-02 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14986:
---
Fix Version/s: (was: 3.2.2)

[~weichiu], cherry-picking this patch to branch-3.2 broke compilation. I have 
reverted it from branch-3.2. I see that you cherry-picked several other patches 
to branch-3.2 after this one. Please compile to make sure that you don't 
unintentionally break the build and cause other developers to spend time fixing 
it. [~cliang], you also committed a patch to branch-3.2 (albeit in YARN, not 
HDFS) after this patch had broken compilation. I know it's annoying to compile 
every little change, but it's pretty frustrating having to track down the patch 
that broke compilation, revert it, and update the relevant JIRAs. 

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get used space from ReplicaInfo in memory. However, the new du 
> threads throw the following exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.<init>(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}
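
As a rough illustration of the failure mode in the stack trace above, the following Java sketch shows a fail-fast iterator throwing ConcurrentModificationException after the set is modified outside the iterator, and one way to avoid it by copying under the same lock that writers hold. java.util.TreeSet stands in for Hadoop's FoldedTreeSet here, and SnapshotDemo with its methods is a hypothetical name; this is not the fix that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

class SnapshotDemo {
  private final TreeSet<Long> replicas = new TreeSet<>();
  private final Object lock = new Object();

  void add(long id) {
    synchronized (lock) {
      replicas.add(id);
    }
  }

  /** Copies under the writers' lock, so no modification can interleave. */
  List<Long> snapshot() {
    synchronized (lock) {
      return new ArrayList<>(replicas);
    }
  }

  /** Reproduces the exception deterministically in a single thread. */
  static boolean iterationFailsAfterModification() {
    TreeSet<Long> set = new TreeSet<>();
    set.add(1L);
    set.add(2L);
    Iterator<Long> it = set.iterator();
    it.next();
    set.add(3L); // structural modification outside the iterator
    try {
      it.next(); // the fail-fast iterator detects the change here
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }
}
```

The single-threaded reproduction is only there to make the behaviour deterministic; in the datanode the modification comes from another thread while the refresh thread is copying the set.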






[jira] [Commented] (HDFS-15062) Add LOG when sendIBRs failed

2019-12-20 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001157#comment-17001157
 ] 

Eric Badger commented on HDFS-15062:


I have reverted this patch from branch-3.2 and branch-3.1. I didn't bother with 
branch-3.0, since that branch is no longer active. 

> Add LOG when sendIBRs failed
> 
>
> Key: HDFS-15062
> URL: https://issues.apache.org/jira/browse/HDFS-15062
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15062.001.patch, HDFS-15062.002.patch, 
> HDFS-15062.003.patch
>
>
> {code}
>   /** Send IBRs to namenode. */
>   void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>       String bpid, String nnRpcLatencySuffix) throws IOException {
>     // Generate a list of the pending reports for each storage under the lock
>     final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>     if (reports.length == 0) {
>       // Nothing new to report.
>       return;
>     }
>     // Send incremental block reports to the Namenode outside the lock
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>     }
>     boolean success = false;
>     final long startTime = monotonicNow();
>     try {
>       namenode.blockReceivedAndDeleted(registration, bpid, reports);
>       success = true;
>     } finally {
>       if (success) {
>         dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime,
>             nnRpcLatencySuffix);
>         lastIBR = startTime;
>       } else {
>         // If we didn't succeed in sending the report, put all of the
>         // blocks back onto our queue, but only in the case where we
>         // didn't put something newer in the meantime.
>         putMissing(reports);
>       }
>     }
>   }
> {code}
> When the call to namenode.blockReceivedAndDeleted fails, the reports are put 
> back into pendingIBRs. Maybe we should add a log message for the failure 
> case. It would be helpful for troubleshooting.
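
A minimal, self-contained Java sketch of what the description asks for, under stated assumptions: the Rpc interface, the pending list, and the warnings list are hypothetical stand-ins for the namenode RPC, pendingIBRs, and LOG.warn respectively. It only shows where a failure message could go in the try/finally pattern; it is not the patch that was attached to this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class IbrSenderSketch {
  /** Hypothetical stand-in for the RPC to the namenode. */
  interface Rpc {
    void send(String[] reports) throws IOException;
  }

  final List<String> pending = new ArrayList<>();  // stand-in for pendingIBRs
  final List<String> warnings = new ArrayList<>(); // stand-in for LOG.warn

  void sendReports(Rpc namenode, String[] reports) throws IOException {
    boolean success = false;
    try {
      namenode.send(reports);
      success = true;
    } finally {
      if (!success) {
        // The proposed addition: record that the send failed before the
        // reports are put back on the queue.
        warnings.add("Failed to send " + reports.length
            + " report(s); re-queueing them");
        for (String r : reports) {
          pending.add(r);
        }
      }
    }
  }
}
```

The exception still propagates to the caller unchanged; the only difference from the quoted method is that the failure branch now leaves a trace.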






[jira] [Reopened] (HDFS-15062) Add LOG when sendIBRs failed

2019-12-20 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reopened HDFS-15062:


This patch breaks branch-3.2 and branch-3.1 compilation. Remember that you 
always need to compile the code on each branch before you merge. 

> Add LOG when sendIBRs failed
> 
>
> Key: HDFS-15062
> URL: https://issues.apache.org/jira/browse/HDFS-15062
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15062.001.patch, HDFS-15062.002.patch, 
> HDFS-15062.003.patch
>
>
> {code}
>   /** Send IBRs to namenode. */
>   void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>       String bpid, String nnRpcLatencySuffix) throws IOException {
>     // Generate a list of the pending reports for each storage under the lock
>     final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>     if (reports.length == 0) {
>       // Nothing new to report.
>       return;
>     }
>     // Send incremental block reports to the Namenode outside the lock
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>     }
>     boolean success = false;
>     final long startTime = monotonicNow();
>     try {
>       namenode.blockReceivedAndDeleted(registration, bpid, reports);
>       success = true;
>     } finally {
>       if (success) {
>         dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime,
>             nnRpcLatencySuffix);
>         lastIBR = startTime;
>       } else {
>         // If we didn't succeed in sending the report, put all of the
>         // blocks back onto our queue, but only in the case where we
>         // didn't put something newer in the meantime.
>         putMissing(reports);
>       }
>     }
>   }
> {code}
> When the call to namenode.blockReceivedAndDeleted fails, the reports are put 
> back into pendingIBRs. Maybe we should add a log message for the failure 
> case. It would be helpful for troubleshooting.






[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-28 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   yptio
>   nzon
>   e3
> {noformat}
> When the path is long, the command output ends up looking really ugly, as 
> shown above. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.






[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-28 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   3.0.4

Thanks for the review, [~weichiu]! I committed this to trunk, branch-3.2, 
branch-3.1, and branch-3.0

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   yptio
>   nzon
>   e3
> {noformat}
> When the path is long, the command output ends up looking really ugly, as 
> shown above. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.






[jira] [Commented] (HDFS-14931) hdfs crypto commands limit column width

2019-10-25 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959904#comment-16959904
 ] 

Eric Badger commented on HDFS-14931:


I ran TestDistributedFileSystem locally and it didn't fail for me. I don't 
believe it is related to this patch.

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   yptio
>   nzon
>   e3
> {noformat}
> When the path is long, the command output ends up looking really ugly, as 
> shown above. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.






[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Status: Patch Available  (was: Open)

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   yptio
>   nzon
>   e3
> {noformat}
> When the path is long, the command output ends up looking really ugly, as 
> shown above. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.






[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Attachment: HDFS-14931.001.patch

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   yptio
>   nzon
>   e3
> {noformat}
> When the path is long, the command output ends up looking really ugly, as 
> shown above. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.






[jira] [Created] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)
Eric Badger created HDFS-14931:
--

 Summary: hdfs crypto commands limit column width
 Key: HDFS-14931
 URL: https://issues.apache.org/jira/browse/HDFS-14931
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
foo@bar$ hdfs crypto -listZones
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
  yptio
  nzon
  e1
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
  yptio
  nzon
  e2
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
  yptio
  nzon
  e3
{noformat}
When the path is long, the command output ends up looking really ugly, as 
shown above. This also makes it very difficult to pipe the output into other 
utilities, such as awk.
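
To make the desired behaviour concrete, here is a small Java sketch that prints one line per zone with no wrapping, so each row stays a single awk-friendly record. ZoneListingSketch and format are hypothetical names, and the approach (pad the first column to the widest path instead of enforcing a fixed column width) is an assumption for illustration, not necessarily what HDFS-14931.001.patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ZoneListingSketch {
  /** One line per zone: path padded to the widest path, then the value. */
  static List<String> format(Map<String, String> zones) {
    int width = 1; // a width of at least 1 keeps the format string valid
    for (String path : zones.keySet()) {
      width = Math.max(width, path.length());
    }
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> e : zones.entrySet()) {
      // Left-justify the path; never wrap the second column.
      lines.add(String.format("%-" + width + "s  %s", e.getKey(), e.getValue()));
    }
    return lines;
  }
}
```

With rows shaped like this, something like "hdfs crypto -listZones | awk '{print $2}'" would see exactly two whitespace-separated fields per line.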






[jira] [Updated] (HDFS-14759) HDFS cat logs an info message

2019-08-21 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14759:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14759.001.patch
>
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 
> {noformat}
> [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null
> 2019-08-20 22:09:45,907 INFO  [main] sasl.SaslDataTransferClient 
> (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust 
> check: localHostTrusted = false, remoteHostTrusted = false
> bar
> {noformat}






[jira] [Updated] (HDFS-14759) HDFS cat logs an info message

2019-08-21 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14759:
---
Fix Version/s: 3.3.0

Thanks, [~anu]

> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14759.001.patch
>
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 
> {noformat}
> [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null
> 2019-08-20 22:09:45,907 INFO  [main] sasl.SaslDataTransferClient 
> (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust 
> check: localHostTrusted = false, remoteHostTrusted = false
> bar
> {noformat}






[jira] [Updated] (HDFS-14759) HDFS cat logs an info message

2019-08-20 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14759:
---
Description: 
HDFS-13699 changed a debug log line into an info log line and this line is 
printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
figure out where the log line ends and where the catted file begins, especially 
when the output is sent to a tool for parsing. 

{noformat}
[ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null
2019-08-20 22:09:45,907 INFO  [main] sasl.SaslDataTransferClient 
(SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust 
check: localHostTrusted = false, remoteHostTrusted = false
bar
{noformat}

  was:HDFS-13699 changed a debug log line into an info log line and this line 
is printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
figure out where the log line ends and where the catted file begins, especially 
when the output is sent to a tool for parsing. 


> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14759.001.patch
>
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 
> {noformat}
> [ebadger@foobar bin]$ hadoop fs -cat /foo 2>/dev/null
> 2019-08-20 22:09:45,907 INFO  [main] sasl.SaslDataTransferClient 
> (SaslDataTransferClient.java:checkTrustAndSend(230)) - SASL encryption trust 
> check: localHostTrusted = false, remoteHostTrusted = false
> bar
> {noformat}






[jira] [Commented] (HDFS-14759) HDFS cat logs an info message

2019-08-20 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911766#comment-16911766
 ] 

Eric Badger commented on HDFS-14759:


I have put up a patch to change the log line back to debug. However, this may 
not be the correct fix. I don't know why logging is going to stdout at all, 
regardless of the level. The correct fix might be to modify FsShell to write 
all logging to stderr. There may have been a regression there. 
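
To illustrate the stdout/stderr separation suggested here, a minimal sketch follows. It is not FsShell code and not the attached patch; CatSketch and its cat method are hypothetical names, and the log text is copied from the issue description.

```java
import java.io.PrintStream;

// Hypothetical stand-in for a shell command; not Hadoop code.
class CatSketch {
  // File contents go to 'data'; diagnostics go to 'log'.
  static void cat(String contents, PrintStream data, PrintStream log) {
    log.println("INFO SASL encryption trust check: "
        + "localHostTrusted = false, remoteHostTrusted = false");
    data.println(contents);
  }

  public static void main(String[] args) {
    // With this wiring, running the program with 2>/dev/null
    // leaves only the file contents on stdout.
    cat("bar", System.out, System.err);
  }
}
```

Keeping the two streams as explicit parameters is only a design choice for the sketch; it makes the separation easy to check without touching the process-wide System.out and System.err.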

> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14759.001.patch
>
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 






[jira] [Updated] (HDFS-14759) HDFS cat logs an info message

2019-08-20 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14759:
---
Attachment: HDFS-14759.001.patch

> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14759.001.patch
>
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 






[jira] [Assigned] (HDFS-14759) HDFS cat logs an info message

2019-08-20 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned HDFS-14759:
--

Assignee: Eric Badger

> HDFS cat logs an info message
> -
>
> Key: HDFS-14759
> URL: https://issues.apache.org/jira/browse/HDFS-14759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> HDFS-13699 changed a debug log line into an info log line and this line is 
> printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
> figure out where the log line ends and where the catted file begins, 
> especially when the output is sent to a tool for parsing. 






[jira] [Created] (HDFS-14759) HDFS cat logs an info message

2019-08-20 Thread Eric Badger (Jira)
Eric Badger created HDFS-14759:
--

 Summary: HDFS cat logs an info message
 Key: HDFS-14759
 URL: https://issues.apache.org/jira/browse/HDFS-14759
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Eric Badger


HDFS-13699 changed a debug log line into an info log line and this line is 
printed during {{hadoop fs -cat}} operations. This makes it very difficult to 
figure out where the log line ends and where the catted file begins, especially 
when the output is sent to a tool for parsing. 






[jira] [Commented] (HDDS-1458) Create a maven profile to run fault injection tests

2019-05-02 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831742#comment-16831742
 ] 

Eric Badger commented on HDDS-1458:
---

bq. Jonathan Eagles Eric Badger, please speak up if you have concerns to rename 
docker profile to dist profile.

As I commented in YARN-7129, I am against adding mandatory Docker image builds 
to the default Hadoop build process. The community came to this same consensus 
via [this mailing list thread| 
https://lists.apache.org/thread.html/c63f404bc44f8f249cbc98ee3f6633384900d07e2308008fe4620150@%3Ccommon-dev.hadoop.apache.org%3E].
 

However, I am not an HDDS developer and do not have proper insight into HDDS 
development. So I can only give my thoughts on this from a YARN perspective. 
Maybe this is a great idea for HDDS, maybe it's not. Since I don't know 
anything about HDDS, I can't really give you an opinion. But I think that it 
definitely warrants getting more eyes and reviews on this from the HDDS 
community.

> Create a maven profile to run fault injection tests
> ---
>
> Key: HDDS-1458
> URL: https://issues.apache.org/jira/browse/HDDS-1458
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: HDDS-1458.001.patch, HDDS-1458.002.patch
>
>
> Some fault injection tests have been written using blockade.  It would be 
> nice to have ability to start docker compose and exercise the blockade test 
> cases against Ozone docker containers, and generate reports.  This is 
> optional integration tests to catch race conditions and fault tolerance 
> defects. 
> We can introduce a profile with id: it (short for integration tests).  This 
> will launch docker compose via maven-exec-plugin and run blockade to simulate 
> container failures and timeout.
> Usage command:
> {code}
> mvn clean verify -Pit
> {code}






[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure

2018-08-29 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596881#comment-16596881
 ] 

Eric Badger commented on HDFS-10755:


[~kennychang] were you actually able to reproduce the error when the patch is 
applied? This patch is from a few years ago so I don't remember the analysis. 
But it looks like it goes out and sets the port in the conf to grab an ephemeral 
port. So I'm not sure why that would fail with a port bind issue.
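
For context on the race described in the issue, here is a minimal sketch using plain java.net sockets rather than MiniDFSCluster. Binding to port 0 asks the kernel for an ephemeral port, and a second bind to that same port fails with a BindException while the first socket still holds it. EphemeralPortSketch and isTaken are hypothetical names.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustration only; the DataNode and MiniDFSCluster are not involved here.
class EphemeralPortSketch {
  // Returns true if 'port' on the loopback address cannot be bound
  // because another socket is already using it.
  static boolean isTaken(int port) throws IOException {
    try (ServerSocket probe =
             new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
      return false;
    } catch (BindException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    // Port 0 lets the kernel pick a free ephemeral port.
    try (ServerSocket first =
             new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      int port = first.getLocalPort();
      System.out.println("kernel chose port " + port
          + ", taken = " + isTaken(port));
    }
  }
}
```

A restart that must reuse a specific port number cannot ask for port 0 again, which is where the window for another process to grab the port comes from.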

> TestDecommissioningStatus BindException Failure
> ---
>
> Key: HDFS-10755
> URL: https://issues.apache.org/jira/browse/HDFS-10755
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch
>
>
> Tests in TestDecommissioningStatus call MiniDFSCluster.restartDataNode(). They 
> are required to come back up on the same (initially ephemeral) port that they 
> were on before being shutdown. Because of this, there is an inherent race 
> condition where another process could bind to the port while the datanode is 
> down. If this happens then we get a BindException failure. However, all of 
> the tests in TestDecommissioningStatus depend on the cluster being up and 
> running for them to run correctly. So if a test blows up the cluster, the 
> subsequent tests will also fail. Below I show the BindException failure as 
> well as the subsequent test failure that occurred.
> {noformat}
> java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at sun.nio.ch.Net.bind(Net.java:428)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:430)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
> {noformat}
> {noformat}
> java.lang.AssertionError: Number of Datanodes  expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
> {noformat}
> I don't think there's any way to avoid the inherent race condition with 
> getting the same ephemeral port, but we can definitely fix the tests so that 
> one failure doesn't cause subsequent tests to fail. 






[jira] [Commented] (HDFS-13565) [um

2018-05-15 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476265#comment-16476265
 ] 

Eric Badger commented on HDFS-13565:


+1 for this feature

> [um
> ---
>
> Key: HDFS-13565
> URL: https://issues.apache.org/jira/browse/HDFS-13565
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: stack
>Priority: Major
>







[jira] [Commented] (HDFS-10618) TestPendingReconstruction#testPendingAndInvalidate is flaky due to race condition

2018-03-12 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395770#comment-16395770
 ] 

Eric Badger commented on HDFS-10618:


Thanks [~anu]!

> TestPendingReconstruction#testPendingAndInvalidate is flaky due to race 
> condition
> -
>
> Key: HDFS-10618
> URL: https://issues.apache.org/jira/browse/HDFS-10618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.1.0, 2.10.0
>
> Attachments: HDFS-10618-b2.001.patch, HDFS-10618.001.patch
>
>
> TestPendingReconstruction#testPendingAndInvalidate fails intermittently. 






[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently

2017-09-27 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182581#comment-16182581
 ] 

Eric Badger commented on HDFS-12495:


Thanks, [~linyiqun]!

> TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2, 2.8.3, 3.0.0, 3.1.0
>
> Attachments: HDFS-12495.001.patch, HDFS-12495.002.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}






[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2017-09-26 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180843#comment-16180843
 ] 

Eric Badger commented on HDFS-12548:


Possibly a completely separate issue, but Jenkins wasn't running at all on 
HDFS-12495 after submitting and resubmitting patches multiple times

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh S Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed out tests. I am sure they are not 
> related to my patch.
> Also I came across another jira which was just a javadoc related change and 
> there were around 100 failed tests.
> Below are the details for pre-commits that failed in branch-2
> 1. [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at 

[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently

2017-09-26 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180827#comment-16180827
 ] 

Eric Badger commented on HDFS-12495:


Thanks [~linyiqun]! Could we also commit this to branch-2 and branch-2.8?

> TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Fix For: 3.1.0
>
> Attachments: HDFS-12495.001.patch, HDFS-12495.002.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}






[jira] [Commented] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-25 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179449#comment-16179449
 ] 

Eric Badger commented on HDFS-12495:


Looks like Jenkins really doesn't want to run on this JIRA

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}






[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-20 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Status: Open  (was: Patch Available)

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:955)
>   at org.apache.hadoop.ipc.Server.(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-20 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Status: Patch Available  (was: Open)

Not sure why Jenkins isn't running. Cancelling and resubmitting the patch again.

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-20 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Status: Open  (was: Patch Available)

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-20 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Status: Patch Available  (was: Open)

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
>  Labels: flaky-test
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-19 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Affects Version/s: 2.8.2
   3.0.0-beta1
   2.9.0

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-19 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Attachment: HDFS-12495.001.patch

Attaching a patch that has the datanodes restart on different ports so that we 
don't get bind exceptions from the DN not stopping completely before being 
restarted (HDFS-10371).
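
As a rough illustration of why restarting on a different port avoids the bind exception (this is a sketch, not the contents of HDFS-12495.001.patch): binding to port 0 asks the OS for any free port, so a restarted listener cannot collide with an old one that has not finished shutting down. The class and method names below are hypothetical; only java.net.ServerSocket is used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    /** Binds a listener on loopback; port 0 asks the OS for any free port. */
    static ServerSocket bind(int port) throws IOException {
        return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    /** True if the port cannot be bound right now (e.g. "Address already in use"). */
    static boolean isBusy(int port) {
        try (ServerSocket probe = bind(port)) {
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    /** Simulates a restart while the old listener is still open. */
    static boolean restartOnFreshPortWorks() {
        try (ServerSocket old = bind(0)) {
            // Reusing the old, still-bound port would fail.
            boolean oldPortBusy = isBusy(old.getLocalPort());
            // Asking for a fresh port succeeds and yields a different number.
            try (ServerSocket restarted = bind(0)) {
                return oldPortBusy && restarted.getLocalPort() != old.getLocalPort();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("restart on a fresh port works: " + restartOnFreshPortWorks());
    }
}
```

The trade-off is that callers can no longer assume the restarted node keeps its old address and must read the new port back after the restart.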

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Updated] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-19 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12495:
---
Status: Patch Available  (was: Open)

> TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently
> --
>
> Key: HDFS-12495
> URL: https://issues.apache.org/jira/browse/HDFS-12495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-12495.001.patch
>
>
> {noformat}
> java.net.BindException: Problem binding to [localhost:36701] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:546)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
> {noformat}



[jira] [Created] (HDFS-12495) TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks fails intermittently

2017-09-19 Thread Eric Badger (JIRA)
Eric Badger created HDFS-12495:
--

 Summary: TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks 
fails intermittently
 Key: HDFS-12495
 URL: https://issues.apache.org/jira/browse/HDFS-12495
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
java.net.BindException: Problem binding to [localhost:36701] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:546)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:955)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2655)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:968)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:810)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:954)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1314)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:481)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2611)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2499)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2546)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2152)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:175)
{noformat}



[jira] [Updated] (HDFS-12089) Fix ambiguous NN retry log message

2017-07-05 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12089:
---
Status: Patch Available  (was: Open)

> Fix ambiguous NN retry log message
> --
>
> Key: HDFS-12089
> URL: https://issues.apache.org/jira/browse/HDFS-12089
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-12089.001.patch
>
>
> {noformat}
> INFO [main] org.apache.hadoop.hdfs.web.WebHdfsFileSystem: Retrying connect to 
> namenode: foobar. Already tried 0 time(s); retry policy is 
> {noformat}
> The message is misleading since it has already tried once. This message 
> indicates the first retry attempt and that it had retried 0 times in the 
> past. 



[jira] [Updated] (HDFS-12089) Fix ambiguous NN retry log message

2017-07-05 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-12089:
---
Attachment: HDFS-12089.001.patch

Attaching patch. Changed "tried" to "retried". 
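
To make the wording change concrete, here is a small sketch that builds both variants of the message for the first retry. It is an illustration only: the class and method names are hypothetical and this is not the WebHdfsFileSystem code, and the retry-policy suffix from the real log line is left out.

```java
public class RetryLogDemo {
    /** Old wording: on the first retry it reads as if nothing was attempted yet. */
    static String ambiguous(String namenode, int retries) {
        return "Retrying connect to namenode: " + namenode
            + ". Already tried " + retries + " time(s)";
    }

    /** New wording: the counter clearly refers to retries, not to all attempts. */
    static String clearer(String namenode, int retries) {
        return "Retrying connect to namenode: " + namenode
            + ". Already retried " + retries + " time(s)";
    }

    public static void main(String[] args) {
        // First retry: one attempt has already failed, zero retries so far.
        System.out.println(ambiguous("foobar", 0));
        System.out.println(clearer("foobar", 0));
    }
}
```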

> Fix ambiguous NN retry log message
> --
>
> Key: HDFS-12089
> URL: https://issues.apache.org/jira/browse/HDFS-12089
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-12089.001.patch
>
>
> {noformat}
> INFO [main] org.apache.hadoop.hdfs.web.WebHdfsFileSystem: Retrying connect to 
> namenode: foobar. Already tried 0 time(s); retry policy is 
> {noformat}
> The message is misleading since it has already tried once. This message 
> indicates the first retry attempt and that it had retried 0 times in the 
> past. 



[jira] [Created] (HDFS-12089) Fix ambiguous NN retry log message

2017-07-05 Thread Eric Badger (JIRA)
Eric Badger created HDFS-12089:
--

 Summary: Fix ambiguous NN retry log message
 Key: HDFS-12089
 URL: https://issues.apache.org/jira/browse/HDFS-12089
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
INFO [main] org.apache.hadoop.hdfs.web.WebHdfsFileSystem: Retrying connect to 
namenode: foobar. Already tried 0 time(s); retry policy is 
{noformat}
The message is misleading since it has already tried once. This message 
indicates the first retry attempt and that it had retried 0 times in the past. 



[jira] [Commented] (HDFS-11861) ipc.Client.Connection#sendRpcRequest should log request name

2017-06-08 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042837#comment-16042837
 ] 

Eric Badger commented on HDFS-11861:


[~jzhuge], [~xiaochen], this commit has broken the following tests in 
branch-2.8 and branch-2:
TestClientProtocolWithDelegationToken.testDelegationTokenRpc
TestClientToAMTokens.testClientToAMTokens
TestClientToAMTokens.testClientTokenRace

> ipc.Client.Connection#sendRpcRequest should log request name
> 
>
> Key: HDFS-11861
> URL: https://issues.apache.org/jira/browse/HDFS-11861
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11861.001.patch
>
>
> {{ipc.Client.Connection#sendRpcRequest}} only logs the call id.
> {code}
> if (LOG.isDebugEnabled())
>   LOG.debug(getName() + " sending #" + call.id);
> {code}
> It'd be much more helpful to log request name for several benefits:
> * Find out which requests sent to which target
> * Correlate with the debug log in {{ipc.Server.Handler}}
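
A minimal sketch of what the requested debug line could look like, assuming the call object carries some request name. The Call class here is a hypothetical stand-in, not Hadoop's ipc.Client.Call, and the message layout is an assumption rather than the committed format.

```java
public class CallLogDemo {
    /** Hypothetical stand-in for an RPC call; not Hadoop's ipc.Client.Call. */
    static final class Call {
        final int id;
        final String requestName;

        Call(int id, String requestName) {
            this.id = id;
            this.requestName = requestName;
        }
    }

    /** Builds the debug message with the request name appended to the call id. */
    static String sendingMessage(String connectionName, Call call) {
        return connectionName + " sending #" + call.id + " " + call.requestName;
    }

    public static void main(String[] args) {
        System.out.println(sendingMessage("conn", new Call(7, "getFileInfo")));
    }
}
```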



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037439#comment-16037439
 ] 

Eric Badger commented on HDFS-10816:


Precommit test failures look unrelated

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 
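
The race described above can be sketched as follows. This is a deterministic simulation of the unlucky interleaving, not real threads and not the HDFS test itself; PendingWork and the method names are hypothetical. With the monitor enabled it consumes one pending block before the test counts them, which matches the "expected:<3> but was:<2>" failure; with it disabled the count stays at 3.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class MonitorRaceDemo {
    /** Hypothetical stand-in for the pending-invalidation bookkeeping. */
    static final class PendingWork {
        private final Queue<String> blocks = new ArrayDeque<>();

        synchronized void add(String block) { blocks.add(block); }
        synchronized boolean takeOne() { return blocks.poll() != null; }
        synchronized int size() { return blocks.size(); }
    }

    /** Queues one block per datanode, optionally lets the monitor fire, then counts. */
    static int pendingSeenByTest(boolean monitorEnabled) {
        PendingWork work = new PendingWork();
        for (int dn = 0; dn < 3; dn++) {
            work.add("block-on-dn" + dn);
        }
        if (monitorEnabled) {
            // The monitor fires in the window between the delete and the lock.
            work.takeOne();
        }
        return work.size();
    }

    public static void main(String[] args) {
        System.out.println("monitor on:  " + pendingSeenByTest(true));
        System.out.println("monitor off: " + pendingSeenByTest(false));
    }
}
```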



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-02 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034741#comment-16034741
 ] 

Eric Badger commented on HDFS-10816:


Not sure why hadoopqa isn't running on the latest patches. [~kihwal], can you 
kick the hadoopqa bot?

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816-branch-2.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-01 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10816:
---
Status: Patch Available  (was: Open)

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816-branch-2.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem writelock. However, if the 
> replication monitor fires in between those two instructions, the test will 
> fail as it will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 



[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-01 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10816:
---
Status: Open  (was: Patch Available)

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816-branch-2.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem write lock. However, if the 
> replication monitor fires in between those two steps, the test will 
> fail because the monitor will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 
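
The race described above can be sketched as a single-threaded toy model. This is not the actual HDFS code, and the class and method names are hypothetical. It forces the bad interleaving, where one monitor pass runs between the delete and the point at which the test takes the write lock, and returns the count the test would then observe:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MonitorRaceSketch {
    // Toy model: deleting a file queues one block invalidation per datanode.
    public static Deque<String> deleteFile(int numDatanodes) {
        Deque<String> pending = new ArrayDeque<>();
        for (int i = 0; i < numDatanodes; i++) {
            pending.add("invalidate block replica on dn" + i);
        }
        return pending;
    }

    // Toy monitor pass: picks up one queued invalidation, so the test no longer sees it.
    public static void monitorPass(Deque<String> pending) {
        pending.poll();
    }

    // Returns the number of pending invalidations the test would count.
    public static int pendingSeenByTest(int numDatanodes, boolean monitorEnabled) {
        Deque<String> pending = deleteFile(numDatanodes);
        // Window between the delete and the test taking the write lock.
        if (monitorEnabled) {
            monitorPass(pending);
        }
        // From here on the test "holds the lock" and counts what is left.
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("monitor on:  " + pendingSeenByTest(3, true));
        System.out.println("monitor off: " + pendingSeenByTest(3, false));
    }
}
```

With the monitor pass inside the window the test sees 2 pending invalidations instead of 3, which matches the assertion failure quoted above. With the monitor turned off it sees all 3.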






[jira] [Updated] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-01 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10816:
---
Attachment: HDFS-10816-branch-2.002.patch
HDFS-10816.002.patch

Attaching new patch for trunk. Looks like the replicationMonitor was renamed to 
the redundancyMonitor. The original patch works for branch-2, but uploading it 
as a branch-2 patch here for consistency. 

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816-branch-2.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is just about how long the test normally takes to run. The test deletes 
> a file and then subsequently gets the namesystem write lock. However, if the 
> replication monitor fires in between those two steps, the test will 
> fail because the monitor will itself invalidate one of the blocks. This can be easily 
> reproduced by removing the sleep() in the ReplicationMonitor's run() method 
> in BlockManager.java, so that the replication monitor executes as quickly as 
> possible and exacerbates the race. 
> To fix the test all that needs to be done is to turn off the replication 
> monitor. 






[jira] [Commented] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently

2017-05-12 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008707#comment-16008707
 ] 

Eric Badger commented on HDFS-11818:


lgtm +1 (non-binding)

> TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
> ---
>
> Key: HDFS-11818
> URL: https://issues.apache.org/jira/browse/HDFS-11818
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2, 2.8.2
>Reporter: Eric Badger
>Assignee: Nathan Roberts
> Attachments: HDFS-11818-branch-2.patch, HDFS-11818.patch
>
>
> Saw a weird Mockito failure in last night's build with the following stack 
> trace:
> {noformat}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> INodeFile cannot be returned by isRunning()
> isRunning() should return boolean
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
> {noformat}
> This is pretty confusing since we explicitly set isRunning() to return true 
> in TestBlockManager's @Before method:
> {noformat}
> Mockito.doReturn(true).when(fsn).isRunning();
> {noformat}
> Also saw the following exception in the logs:
> {noformat}
> 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(2796)) - Error while processing replication queues 
> async
> org.mockito.exceptions.base.MockitoException: 
> 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a 
> *return value*!
> Voids are usually stubbed with Throwables:
> doThrow(exception).when(mock).someVoidMethod();
> If the method you are trying to stub is *overloaded* then make sure you are 
> calling the right overloaded version.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
> {noformat}
> This is also weird since the test doesn't explicitly mock 
> {{writeLockInterruptibly}} via fsn. It has to be something 
> changing the mocks, or non-thread-safe access, or something like that. I can't 
> explain the failures otherwise. 






[jira] [Created] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently

2017-05-12 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11818:
--

 Summary: TestBlockManager.testSufficientlyReplBlocksUsesNewRack 
fails intermittently
 Key: HDFS-11818
 URL: https://issues.apache.org/jira/browse/HDFS-11818
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger


Saw a weird Mockito failure in last night's build with the following stack 
trace:
{noformat}
org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
INodeFile cannot be returned by isRunning()
isRunning() should return boolean
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
{noformat}
This is pretty confusing since we explicitly set isRunning() to return true in 
TestBlockManager's @Before method:
{noformat}
Mockito.doReturn(true).when(fsn).isRunning();
{noformat}

Also saw the following exception in the logs:
{noformat}
2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager 
(BlockManager.java:run(2796)) - Error while processing replication queues async
org.mockito.exceptions.base.MockitoException: 
'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a 
*return value*!
Voids are usually stubbed with Throwables:
doThrow(exception).when(mock).someVoidMethod();
If the method you are trying to stub is *overloaded* then make sure you are 
calling the right overloaded version.
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
{noformat}
This is also weird since the test doesn't explicitly mock 
{{writeLockInterruptibly}} via fsn. It has to be something changing 
the mocks, or non-thread-safe access, or something like that. I can't explain the 
failures otherwise. 






[jira] [Updated] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11745:
---
Attachment: HDFS-11745.002.patch

Thanks for the review, [~jlowe]
bq. I noticed that TestNameNodeMetrics#testCapacityMetrics also has a pretty 
low timeout (1.8 seconds, seems like an odd number). I think we should bump 
that as well.
Uploading new patch that increases this to 10 seconds as well

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch, HDFS-11745.002.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines






[jira] [Updated] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds

2017-05-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11745:
---
Attachment: HDFS-11745.001.patch

Uploading patch

> Increase HDFS tests from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines






[jira] [Updated] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds

2017-05-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11745:
---
Status: Patch Available  (was: Open)

> Increase HDFS tests from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines






[jira] [Updated] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11745:
---
Summary: Increase HDFS test timeouts from 1 second to 10 seconds  (was: 
Increase HDFS tests from 1 second to 10 seconds)

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines






[jira] [Created] (HDFS-11745) Increase HDFS tests from 1 second to 10 seconds

2017-05-03 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11745:
--

 Summary: Increase HDFS tests from 1 second to 10 seconds
 Key: HDFS-11745
 URL: https://issues.apache.org/jira/browse/HDFS-11745
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


1 second test timeouts are susceptible to failure on overloaded or otherwise 
slow machines






[jira] [Updated] (HDFS-10459) getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7

2017-04-28 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10459:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Not a critical fix for 2.7, so closing as won't fix.

As for trunk, after speaking offline with [~daryn] and [~kihwal], it looks like 
truncating (i.e. rounding down) is the easier approach here, so we'll leave 
this as is. The off-by-one error is already fixed 
there. 
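
To illustrate the off-by-one, here is a small arithmetic sketch. The class, method, and variable names are hypothetical, and the formula is an assumption based on the issue description (a truncated block threshold plus a stray '+1'), not a copy of the branch-2.7 code:

```java
public class SafeModeTipSketch {
    // Number of blocks that must be reported, truncated (rounded down).
    public static long blockThreshold(long blockTotal, double threshold) {
        return (long) (blockTotal * threshold);
    }

    // Tip as described in the issue: overstates the remaining blocks by one.
    public static long neededWithPlusOne(long blockSafe, long blockTotal, double threshold) {
        return blockThreshold(blockTotal, threshold) - blockSafe + 1;
    }

    // Tip without the extra '+1' (never negative).
    public static long needed(long blockSafe, long blockTotal, double threshold) {
        return Math.max(0, blockThreshold(blockTotal, threshold) - blockSafe);
    }

    public static void main(String[] args) {
        // 1000 blocks at threshold 0.5 means 500 blocks are required; 498 are reported so far.
        System.out.println(neededWithPlusOne(498, 1000, 0.5));
        System.out.println(needed(498, 1000, 0.5));
    }
}
```

With 1000 total blocks, a threshold of 0.5, and 498 blocks reported, the first call prints 3 and the second prints 2; only 2 more blocks are actually needed to reach the 500-block threshold.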

> getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7
> -
>
> Key: HDFS-10459
> URL: https://issues.apache.org/jira/browse/HDFS-10459
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10459.001.patch, HDFS-10459.002.patch, 
> HDFS-10459.003.patch, HDFS-10459-b2.7.002.patch, HDFS-10459-b2.7.003.patch
>
>
> GetTurnOffTip overstates the number of blocks necessary to come out of safe 
> mode by 1 due to an arbitrary '+1' in the code. 






[jira] [Created] (HDFS-11662) TestJobEndNotifier.testNotificationTimeout fails intermittently

2017-04-17 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11662:
--

 Summary: TestJobEndNotifier.testNotificationTimeout fails 
intermittently
 Key: HDFS-11662
 URL: https://issues.apache.org/jira/browse/HDFS-11662
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger


{noformat}
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at junit.framework.TestCase.assertTrue(TestCase.java:201)
at 
org.apache.hadoop.mapred.TestJobEndNotifier.testNotificationTimeout(TestJobEndNotifier.java:182)
{noformat}

This test depends on absolute timing, which can't be guaranteed. If 
{{JobEndNotifier.localRunnerNotification(jobConf, jobStatus);}} doesn't run in 
less than 2 seconds, the test will fail. Loading up my machine can cause this 
failure consistently. 






[jira] [Updated] (HDFS-10459) getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7

2017-04-07 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-10459:
---
Attachment: HDFS-10459.003.patch

I either missed a test when I initially uploaded the trunk patch, or it was 
added/modified since I put it up. Anyway, here's an updated patch for trunk. 
This patch applies to trunk and branch-2. 

> getTurnOffTip computes needed block incorrectly for threshold < 1 in b2.7
> -
>
> Key: HDFS-10459
> URL: https://issues.apache.org/jira/browse/HDFS-10459
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10459.001.patch, HDFS-10459.002.patch, 
> HDFS-10459.003.patch, HDFS-10459-b2.7.002.patch, HDFS-10459-b2.7.003.patch
>
>
> GetTurnOffTip overstates the number of blocks necessary to come out of safe 
> mode by 1 due to an arbitrary '+1' in the code. 






[jira] [Commented] (HDFS-11592) Closing a file has a wasteful preconditions in NameNode

2017-03-30 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950007#comment-15950007
 ] 

Eric Badger commented on HDFS-11592:


Thanks, [~liuml07]!

> Closing a file has a wasteful preconditions in NameNode
> ---
>
> Key: HDFS-11592
> URL: https://issues.apache.org/jira/browse/HDFS-11592
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11592.001.patch
>
>
> When a file is closed, the NN checks whether all of its blocks are complete. Instead 
> of a simple "if (!complete) throw new IllegalStateException(expensive-err-string)", it 
> invokes "Preconditions.checkState(complete, expensive-err-string)". The 
> check is done in a loop over all blocks, so more blocks means more penalty. The 
> expensive string should only be computed when an error actually occurs. A 
> telltale sign is seeing this in a stack trace:
> {noformat}
>at java.lang.Class.getEnclosingMethod0(Native Method)
> at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
> at java.lang.Class.getEnclosingClass(Class.java:1272)
> at java.lang.Class.getSimpleBinaryName(Class.java:1443)
> at java.lang.Class.getSimpleName(Class.java:1309)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246)
> {noformat}






[jira] [Updated] (HDFS-11592) Closing a file has a wasteful preconditions

2017-03-29 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11592:
---
Status: Patch Available  (was: Open)

> Closing a file has a wasteful preconditions
> ---
>
> Key: HDFS-11592
> URL: https://issues.apache.org/jira/browse/HDFS-11592
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11592.001.patch
>
>
> When a file is closed, the NN checks whether all of its blocks are complete. Instead 
> of a simple "if (!complete) throw new IllegalStateException(expensive-err-string)", it 
> invokes "Preconditions.checkState(complete, expensive-err-string)". The 
> check is done in a loop over all blocks, so more blocks means more penalty. The 
> expensive string should only be computed when an error actually occurs. A 
> telltale sign is seeing this in a stack trace:
> {noformat}
>at java.lang.Class.getEnclosingMethod0(Native Method)
> at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
> at java.lang.Class.getEnclosingClass(Class.java:1272)
> at java.lang.Class.getSimpleBinaryName(Class.java:1443)
> at java.lang.Class.getSimpleName(Class.java:1309)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246)
> {noformat}






[jira] [Updated] (HDFS-11592) Closing a file has a wasteful preconditions

2017-03-29 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11592:
---
Attachment: HDFS-11592.001.patch

Uploading patch to get rid of Preconditions and just do the expression checking 
up front. 

> Closing a file has a wasteful preconditions
> ---
>
> Key: HDFS-11592
> URL: https://issues.apache.org/jira/browse/HDFS-11592
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11592.001.patch
>
>
> When a file is closed, the NN checks whether all of its blocks are complete. Instead 
> of a simple "if (!complete) throw new IllegalStateException(expensive-err-string)", it 
> invokes "Preconditions.checkState(complete, expensive-err-string)". The 
> check is done in a loop over all blocks, so more blocks means more penalty. The 
> expensive string should only be computed when an error actually occurs. A 
> telltale sign is seeing this in a stack trace:
> {noformat}
>at java.lang.Class.getEnclosingMethod0(Native Method)
> at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
> at java.lang.Class.getEnclosingClass(Class.java:1272)
> at java.lang.Class.getSimpleBinaryName(Class.java:1443)
> at java.lang.Class.getSimpleName(Class.java:1309)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246)
> {noformat}






[jira] [Created] (HDFS-11592) Closing a file has a wasteful preconditions

2017-03-29 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11592:
--

 Summary: Closing a file has a wasteful preconditions
 Key: HDFS-11592
 URL: https://issues.apache.org/jira/browse/HDFS-11592
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


When a file is closed, the NN checks whether all of its blocks are complete. Instead of 
a simple "if (!complete) throw new IllegalStateException(expensive-err-string)", it 
invokes "Preconditions.checkState(complete, expensive-err-string)". The check 
is done in a loop over all blocks, so more blocks means more penalty. The expensive 
string should only be computed when an error actually occurs. A telltale sign 
is seeing this in a stack trace:
{noformat}
   at java.lang.Class.getEnclosingMethod0(Native Method)
at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)
at java.lang.Class.getEnclosingClass(Class.java:1272)
at java.lang.Class.getSimpleBinaryName(Class.java:1443)
at java.lang.Class.getSimpleName(Class.java:1309)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:246)
{noformat}
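
The cost difference described above can be sketched in plain Java. This is a minimal illustration, not the INodeFile code: the class name, the checkState helper (a stand-in with the same eager-argument shape as a precondition call that takes a ready-made message), and expensiveMessage are all hypothetical. The counter records how often the costly message gets built when every block is complete:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyErrorMessageSketch {
    // Counts how many times the costly message is built.
    private static final AtomicInteger MESSAGE_BUILDS = new AtomicInteger();

    // Hypothetical stand-in for an expensive error string.
    public static String expensiveMessage(int blockIndex) {
        MESSAGE_BUILDS.incrementAndGet();
        return "blocks[" + blockIndex + "] is not complete in "
            + LazyErrorMessageSketch.class.getSimpleName();
    }

    // Precondition-style helper: the caller has to build the message before the call.
    public static void checkState(boolean expression, String message) {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    // Eager variant: the message is built once per block, even when nothing is wrong.
    public static int eagerChecks(boolean[] complete) {
        MESSAGE_BUILDS.set(0);
        for (int i = 0; i < complete.length; i++) {
            checkState(complete[i], expensiveMessage(i));
        }
        return MESSAGE_BUILDS.get();
    }

    // Lazy variant: the message is built only on the failure path.
    public static int lazyChecks(boolean[] complete) {
        MESSAGE_BUILDS.set(0);
        for (int i = 0; i < complete.length; i++) {
            if (!complete[i]) {
                throw new IllegalStateException(expensiveMessage(i));
            }
        }
        return MESSAGE_BUILDS.get();
    }

    public static void main(String[] args) {
        boolean[] complete = new boolean[1000];
        Arrays.fill(complete, true);
        System.out.println("eager message builds: " + eagerChecks(complete));
        System.out.println("lazy message builds:  " + lazyChecks(complete));
    }
}
```

For 1000 complete blocks the eager variant builds the message 1000 times and the lazy variant builds it 0 times, which is why testing the condition up front is cheaper on the common, error-free path.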






[jira] [Updated] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum

2017-03-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11512:
---
Attachment: HDFS-11512.001.patch

Uploading patch to increase timeout to 60s

> Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum
> 
>
> Key: HDFS-11512
> URL: https://issues.apache.org/jira/browse/HDFS-11512
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11512.001.patch
>
>
> Looks like I missed this test when I increased the timeout in HDFS-11404






[jira] [Updated] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum

2017-03-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11512:
---
Status: Patch Available  (was: Open)

> Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum
> 
>
> Key: HDFS-11512
> URL: https://issues.apache.org/jira/browse/HDFS-11512
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11512.001.patch
>
>
> Looks like I missed this test when I increased the timeout in HDFS-11404






[jira] [Created] (HDFS-11512) Increase timeout on TestShortCircuitLocalRead.testSkipWithVerifyChecksum

2017-03-08 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11512:
--

 Summary: Increase timeout on 
TestShortCircuitLocalRead.testSkipWithVerifyChecksum
 Key: HDFS-11512
 URL: https://issues.apache.org/jira/browse/HDFS-11512
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


Looks like I missed this test when I increased the timeout in HDFS-11404






[jira] [Commented] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-23 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880650#comment-15880650
 ] 

Eric Badger commented on HDFS-11404:


Thanks, [~eepayne]!

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11404:
---
Attachment: HDFS-11404.001.patch

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11404:
---
Attachment: (was: HDFS-11404.001.patch)

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>







[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11404:
---
Attachment: HDFS-11404.001.patch

All of the tests that start up a MiniDFSCluster have a timeout of 60s (via 
HDFS-6610) except for this one. I saw it time out recently in a local build.

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11404:
---
Status: Patch Available  (was: Open)

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Created] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-10 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11404:
--

 Summary: Increase timeout on 
TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
 Key: HDFS-11404
 URL: https://issues.apache.org/jira/browse/HDFS-11404
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger









[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-20 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765376#comment-15765376
 ] 

Eric Badger commented on HDFS-11094:


[~liuml07], can we cherry-pick this to 2.8? I'm seeing test failures from 
{{TestLargeBlockReport.testBlockReportSucceedsWithLargerLengthLimit}} due to a 
race condition in getActiveNN() that this will fix. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, 
> HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, 
> HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, 
> HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, 
> HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 
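As a rough illustration of the gap described above, here is a toy Python model (not Hadoop code). The class name DatanodeModel, its methods, and the string states are all made up for the sketch. The only idea taken from this issue is that registration may optionally carry the NN's HA state.

```python
class DatanodeModel:
    """Toy stand-in for a datanode tracking which NN is active (hypothetical)."""

    def __init__(self):
        self.active_nn = None  # unknown until some NN reports ACTIVE

    def register(self, nn_id, ha_state=None):
        # ha_state is optional: an NN without the change would not send it.
        self._update(nn_id, ha_state)

    def heartbeat(self, nn_id, ha_state):
        self._update(nn_id, ha_state)

    def _update(self, nn_id, ha_state):
        if ha_state == "ACTIVE":
            self.active_nn = nn_id
        elif ha_state is not None and self.active_nn == nn_id:
            # The NN we believed active now reports another state.
            self.active_nn = None


# Without the optional state, the active NN stays unknown after registration.
old_style = DatanodeModel()
old_style.register("nn1")

# With the optional state, it is known before the first heartbeat.
new_style = DatanodeModel()
new_style.register("nn1", ha_state="ACTIVE")
```

In this model, anything that reads active_nn between register() and the first heartbeat() sees None in the old style, which is the window where the description says getActiveNN() callers can hit an NPE.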



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-15 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752852#comment-15752852
 ] 

Eric Badger commented on HDFS-11094:


Thanks, [~liuml07]!

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, 
> HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, 
> HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, 
> HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, 
> HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-15 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094-branch-2.011.patch

Uploading branch-2 patch. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094-branch-2.011.patch, HDFS-11094.001.patch, 
> HDFS-11094.002.patch, HDFS-11094.003.patch, HDFS-11094.004.patch, 
> HDFS-11094.005.patch, HDFS-11094.006.patch, HDFS-11094.007.patch, 
> HDFS-11094.008.patch, HDFS-11094.009-b2.patch, HDFS-11094.009.patch, 
> HDFS-11094.010-b2.patch, HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-13 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746006#comment-15746006
 ] 

Eric Badger commented on HDFS-11094:


[~liuml07], can you take a look at the latest patch?

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-12 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743081#comment-15743081
 ] 

Eric Badger commented on HDFS-11094:


The test failure looks unrelated; it doesn't fail for me locally.

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-09 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.011.patch

The test was racy: heartbeats were resetting the active NN to null after the 
test had set it. Fixed the test by turning off heartbeats. The other unit test 
is also failing elsewhere and is not related to this patch.

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch, HDFS-11094.011.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-09 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735593#comment-15735593
 ] 

Eric Badger commented on HDFS-11094:


The TestBPOfferService failure is definitely relevant here. I'm not sure about 
the other one. Let me take a look.

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.010.patch

Attaching the trunk patch again so that Jenkins runs against it instead of the 
branch-2 patch.

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: (was: HDFS-11094.010.patch)

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11207) Revert HDFS-5079. Cleaning up NNHAStatusHeartbeat.State DatanodeProtocolProtos.

2016-12-08 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733672#comment-15733672
 ] 

Eric Badger commented on HDFS-11207:


Thanks, [~kihwal]!

> Revert HDFS-5079. Cleaning up NNHAStatusHeartbeat.State 
> DatanodeProtocolProtos.
> ---
>
> Key: HDFS-11207
> URL: https://issues.apache.org/jira/browse/HDFS-11207
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Critical
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11207.001.patch
>
>
> HDFS-5079 changed the meaning of state in {{NNHAStatusHeartbeat}} when it 
> added in the {{INITIALIZING}} state via {{HAServiceStateProto}}.
> Before change:
> {noformat}
> enum State {
>ACTIVE = 0;
>STANDBY = 1;
> }
> {noformat}
> After change:
> {noformat}
> enum HAServiceStateProto {
>   INITIALIZING = 0;
>   ACTIVE = 1;
>   STANDBY = 2;
> }
> {noformat}
> So the new {{INITIALIZING}} state will be interpreted as {{ACTIVE}}, new 
> {{ACTIVE}} interpreted as {{STANDBY}} and new {{STANDBY}} interpreted as 
> unknown. Any rolling upgrade to 3.0.0 will break because the datanodes that 
> haven't been updated will misinterpret the NN state. 
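The misreading described above can be checked with a small Python sketch. The two enums copy the numbering quoted in the description; the helper names (decode_with_old_enum, misreadings) are made up for illustration, and real protobuf decoding is only approximated here by an integer lookup.

```python
from enum import IntEnum


class OldState(IntEnum):
    # Numbering before HDFS-5079, as quoted in the description.
    ACTIVE = 0
    STANDBY = 1


class HAServiceStateProto(IntEnum):
    # Numbering after HDFS-5079, as quoted in the description.
    INITIALIZING = 0
    ACTIVE = 1
    STANDBY = 2


def decode_with_old_enum(wire_value):
    """Return the name a reader using the old enum would see, or None if unknown."""
    try:
        return OldState(wire_value).name
    except ValueError:
        return None


def misreadings():
    """Map each new state to what a not-yet-upgraded reader would take it for."""
    return {s.name: decode_with_old_enum(int(s)) for s in HAServiceStateProto}
```

Calling misreadings() gives INITIALIZING read as ACTIVE, ACTIVE read as STANDBY, and STANDBY read as unknown (None), matching the description.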






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Status: Patch Available  (was: Reopened)

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.010-b2.patch

Attaching associated branch-2/branch-2.8 patch since it won't cherry-pick 
cleanly from trunk.

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010-b2.patch, 
> HDFS-11094.010.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-08 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.010.patch

Attaching new trunk patch after the revert

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch, HDFS-11094.010.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726682#comment-15726682
 ] 

Eric Badger commented on HDFS-11094:


Ok, yes, that sounds good. Thanks!

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726624#comment-15726624
 ] 

Eric Badger commented on HDFS-11207:


I agree that we should revert HDFS-5079. If we intend to do that, we should 
revert HDFS-11094 first so that the build is not broken. Afterwards, we can 
work on putting HDFS-11094 back in with a new patch. This should make 
everything look clean in the change logs.

> Unnecessary incompatible change of NNHAStatusHeartbeat.state in 
> DatanodeProtocolProtos
> --
>
> Key: HDFS-11207
> URL: https://issues.apache.org/jira/browse/HDFS-11207
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Critical
> Attachments: HDFS-11207.001.patch
>
>
> HDFS-5079 changed the meaning of state in {{NNHAStatusHeartbeat}} when it 
> added in the {{INITIALIZING}} state via {{HAServiceStateProto}}.
> Before change:
> {noformat}
> enum State {
>ACTIVE = 0;
>STANDBY = 1;
> }
> {noformat}
> After change:
> {noformat}
> enum HAServiceStateProto {
>   INITIALIZING = 0;
>   ACTIVE = 1;
>   STANDBY = 2;
> }
> {noformat}
> So the new {{INITIALIZING}} state will be interpreted as {{ACTIVE}}, new 
> {{ACTIVE}} interpreted as {{STANDBY}} and new {{STANDBY}} interpreted as 
> unknown. Any rolling upgrade to 3.0.0 will break because the datanodes that 
> haven't been updated will misinterpret the NN state. 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726425#comment-15726425
 ] 

Eric Badger commented on HDFS-11094:


[~liuml07], actually hold off on committing that branch-2/branch-2.8 patch. Can 
you instead revert the trunk commit? HDFS-11207 looks like it will probably 
revert HDFS-5079. However, we will need to revert this jira first to avoid 
breaking the build. After HDFS-5079 gets reverted, we should be able to use 1 
patch (the branch-2/branch-2.8 patch) to commit all the way through from trunk. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-06 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.009-b2.patch

[~liuml07], attaching a branch-2/branch-2.8 patch. Just had to change around 
the type definitions of some things. Also moved {{NNHAStatusHeartbeatProto}} 
from DatanodeProtocol.proto to HdfsServer.proto (which is imported by 
DatanodeProtocol.proto) so that it could be used by {{NamespaceInfoProto}}. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009-b2.patch, HDFS-11094.009.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Comment Edited] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723622#comment-15723622
 ] 

Eric Badger edited comment on HDFS-11207 at 12/5/16 10:54 PM:
--

It looks like {{HAServiceState}} is used in more places than just 
DatanodeProtocolProtos. Because of that, we can't simply change 
{{HAServiceState}} or else we will have the exact same problem that we're 
trying to fix. Moral of the story, we need 2 enums that will define {{ACTIVE}} 
and {{STANDBY}} differently. Cancelling the patch.


was (Author: ebadger):
It looks like {{HAServiceState}} is used in more places than just 
DatanodeProtocolProtos. Because of that, we can't simply change 
{{HAServiceState}} or else we will have the exact same problem that we're 
trying to fix. Moral of the story, we need 2 enums that will define {{ACTIVE}} 
and {{STANDBY}} differently. 

> Unnecessary incompatible change of NNHAStatusHeartbeat.state in 
> DatanodeProtocolProtos
> --
>
> Key: HDFS-11207
> URL: https://issues.apache.org/jira/browse/HDFS-11207
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11207.001.patch
>
>
> HDFS-5079 changed the meaning of state in {{NNHAStatusHeartbeat}} when it 
> added in the {{INITIALIZING}} state via {{HAServiceStateProto}}.
> Before change:
> {noformat}
> enum State {
>ACTIVE = 0;
>STANDBY = 1;
> }
> {noformat}
> After change:
> {noformat}
> enum HAServiceStateProto {
>   INITIALIZING = 0;
>   ACTIVE = 1;
>   STANDBY = 2;
> }
> {noformat}
> So the new {{INITIALIZING}} state will be interpreted as {{ACTIVE}}, new 
> {{ACTIVE}} interpreted as {{STANDBY}} and new {{STANDBY}} interpreted as 
> unknown. Any rolling upgrade to 3.0.0 will break because the datanodes that 
> haven't been updated will misinterpret the NN state. 






[jira] [Commented] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723622#comment-15723622
 ] 

Eric Badger commented on HDFS-11207:


It looks like {{HAServiceState}} is used in more places than just 
DatanodeProtocolProtos. Because of that, we can't simply change 
{{HAServiceState}} or else we will have the exact same problem that we're 
trying to fix. In short, we need two enums that assign different values to 
{{ACTIVE}} and {{STANDBY}}. 
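
One possible reading of "2 enums" is sketched below. This is an illustration under stated assumptions, not the attached patch: the names {{TwoEnumSketch}}, {{HeartbeatState}}, {{ServiceState}} and {{toHeartbeat}} are hypothetical. The heartbeat-only enum keeps the old numbers, the general-purpose enum keeps the new ones, and translation goes by meaning rather than by number.

```java
// Hypothetical sketch: two enums with independent wire numbers.
public class TwoEnumSketch {
    // Heartbeat-only enum that keeps the pre-HDFS-5079 numbers.
    enum HeartbeatState {
        ACTIVE(0), STANDBY(1);
        final int wire;
        HeartbeatState(int wire) { this.wire = wire; }
    }

    // General-purpose enum with the newer numbering, used elsewhere.
    enum ServiceState {
        INITIALIZING(0), ACTIVE(1), STANDBY(2);
        final int wire;
        ServiceState(int wire) { this.wire = wire; }
    }

    // Translate by meaning, never by number. Reporting INITIALIZING as
    // STANDBY is an assumption of this sketch, because the heartbeat enum
    // has no value for it.
    static HeartbeatState toHeartbeat(ServiceState s) {
        return s == ServiceState.ACTIVE ? HeartbeatState.ACTIVE : HeartbeatState.STANDBY;
    }

    public static void main(String[] args) {
        for (ServiceState s : ServiceState.values()) {
            HeartbeatState h = toHeartbeat(s);
            System.out.println(s + "(" + s.wire + ") -> " + h + "(" + h.wire + ")");
        }
    }
}
```

With this split, changing the numbering of one enum cannot silently change what the other one means on the wire.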







[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11207:
---
Status: Open  (was: Patch Available)







[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11207:
---
Status: Patch Available  (was: Open)







[jira] [Updated] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11207:
---
Attachment: HDFS-11207.001.patch

Attaching a patch that adds a new value to the enum without changing the 
meaning of the existing values. This will still break datanodes that are not 
equipped to handle the {{INITIALIZING}} state.
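
The remaining risk can be illustrated with a small sketch. It is not the patch itself: the class {{AppendOnlySketch}}, the helper {{oldReaderDecode}} and the number 2 chosen for {{INITIALIZING}} are all assumptions made for illustration. Existing numbers stay put, but a reader built before the new value existed still has to cope with a number it does not know.

```java
import java.util.Optional;

// Hypothetical sketch of an append-only enum and a reader that predates it.
public class AppendOnlySketch {
    // Existing numbers are untouched; the new value takes the next free one.
    // The number 2 for INITIALIZING is an assumption of this sketch.
    enum State {
        ACTIVE(0), STANDBY(1), INITIALIZING(2);
        final int wire;
        State(int wire) { this.wire = wire; }
    }

    // A reader built before INITIALIZING existed only knows 0 and 1.
    static Optional<State> oldReaderDecode(int wire) {
        switch (wire) {
            case 0: return Optional.of(State.ACTIVE);
            case 1: return Optional.of(State.STANDBY);
            default: return Optional.empty(); // unknown to the old reader
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + oldReaderDecode(s.wire));
        }
    }
}
```

The old values decode correctly, while the new one comes back empty, so the old reader needs an explicit fallback for that case.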







[jira] [Created] (HDFS-11207) Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos

2016-12-05 Thread Eric Badger (JIRA)
Eric Badger created HDFS-11207:
--

Summary: Unnecessary incompatible change of NNHAStatusHeartbeat.state in DatanodeProtocolProtos
Key: HDFS-11207
URL: https://issues.apache.org/jira/browse/HDFS-11207
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


HDFS-5079 changed the meaning of the state field in {{NNHAStatusHeartbeat}} 
when it added the {{INITIALIZING}} state via {{HAServiceStateProto}}.

Before change:
{noformat}
enum State {
  ACTIVE = 0;
  STANDBY = 1;
}
{noformat}

After change:
{noformat}
enum HAServiceStateProto {
  INITIALIZING = 0;
  ACTIVE = 1;
  STANDBY = 2;
}
{noformat}

As a result, a datanode that still uses the old enum will interpret the new 
{{INITIALIZING}} as {{ACTIVE}}, the new {{ACTIVE}} as {{STANDBY}}, and the new 
{{STANDBY}} as unknown. Any rolling upgrade to 3.0.0 will break because 
datanodes that have not yet been updated will misinterpret the NN state. 
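
The mismatch can be reproduced without protobuf. The sketch below is a simplified illustration, not Hadoop code: the class {{EnumShiftDemo}} and its helpers are hypothetical, and the only facts taken from this issue are the two numberings. It decodes each number that an upgraded NN would send using the old {{State}} table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; only the two numberings come from the issue.
public class EnumShiftDemo {
    // Old numbering understood by datanodes that have not been upgraded.
    static final String[] OLD_STATE = {"ACTIVE", "STANDBY"};
    // New numbering sent by an upgraded namenode.
    static final String[] NEW_STATE = {"INITIALIZING", "ACTIVE", "STANDBY"};

    // Decode a wire number the way an old datanode would.
    static String decodeWithOld(int wireNumber) {
        return wireNumber >= 0 && wireNumber < OLD_STATE.length
                ? OLD_STATE[wireNumber] : "UNKNOWN";
    }

    // Map each state the new namenode means to what the old datanode sees.
    static Map<String, String> misreadings() {
        Map<String, String> result = new LinkedHashMap<>();
        for (int n = 0; n < NEW_STATE.length; n++) {
            result.put(NEW_STATE[n], decodeWithOld(n));
        }
        return result;
    }

    public static void main(String[] args) {
        misreadings().forEach((sent, seen) ->
                System.out.println("NN sends " + sent + " -> old DN reads " + seen));
    }
}
```

Every state is shifted by one position, which is why no value survives the upgrade with its original meaning.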






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-12-02 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15716729#comment-15716729
 ] 

Eric Badger commented on HDFS-11094:


bq. I think the existing tests are quite adequate. I understand that a 
full-blown mini cluster is sometimes needed to test the distributed file 
system. However, we should avoid adding such end-to-end tests if it is possible 
to have reasonable unit tests.

Upon looking at this again, I agree with [~kihwal]. I don't think that it is 
necessary for us to use a minicluster in this case. The current tests are 
adequate IMO since they test the methods that are directly used on either side 
of the version request. Additionally, the minicluster is expensive and creating 
a unit test with the minicluster would be difficult in this case since it 
requires a heartbeat to get out of its build() method (though difficulty is not 
my main objection).

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch, 
> HDFS-11094.009.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-29 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705818#comment-15705818
 ] 

Eric Badger commented on HDFS-11094:


{quote}
1. I discussed this with Arpit Agarwal offline and he suggested that we use 
the same logic as in updateActorStatesFromHeartbeat to update the active NN 
bpServiceToActive, since that logic already handles several cases carefully. 
Moreover, if we are updating bpServiceToActive we should likely also update 
lastActiveClaimTxId. To achieve this, I think we can pass 
NNHAStatusHeartbeatProto instead of HAServiceStateProto in NamespaceInfoProto.
{quote}
[~liuml07], I actually did it this way in the patch on purpose. The entire 
logic of updating {{bpServiceToActive}} will occur before any heartbeats start, 
since we are doing this during the handshake between the DN and the NN. If we 
send an {{NNHAStatusHeartbeatProto}} instead of a {{HAServiceStateProto}}, 
then we will have to deal with the {{lastActiveClaimTxId}}, as you have 
mentioned. However, this would require more serious changes to the code. We 
would either have to set and send along a TxId on the NN side (an extra code 
change for what I see as negligible benefit), or we would need to arbitrarily 
create one on the DN side (it would need to be below the first heartbeat TxId, 
so it would have to be a negative number or would require extra changes).

At this point, we want the DN to have an active NN before it starts trying to 
do anything with it (the whole point of this fix). If, for whatever reason, 
both NNs declare themselves as active, then the DN will choose the first one 
and ignore the second. If that choice is wrong, the DN will talk to the 
standby, get a simple standby exception, and then switch to the correct active 
once the next heartbeat comes. So in the worst case we get a standby exception 
and retry, which is still far better than the NPE that we were getting before. 
Since this is such a small window, I think it is unnecessary to make more 
changes for the TxId. 
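
The behaviour described above can be sketched as follows. This is a simplified model, not the patch: the class {{ActiveTracker}} and its methods are hypothetical, NNs are identified by plain strings, and the TxId comparison that the real heartbeat path performs is deliberately left out.

```java
// Hypothetical sketch of choosing an active NN during the handshake.
public class ActiveTracker {
    private String active; // null until some NN claims to be active

    // Called once per NN during the handshake, before any heartbeat.
    void onHandshake(String nnId, boolean claimsActive) {
        if (claimsActive && active == null) {
            active = nnId; // first claim wins; later claims are ignored
        }
    }

    // Called on every heartbeat. In this sketch the heartbeat is simply
    // authoritative; the real code also compares transaction ids.
    void onHeartbeat(String nnId, boolean isActive) {
        if (isActive) {
            active = nnId;
        }
    }

    String getActive() {
        return active;
    }

    public static void main(String[] args) {
        ActiveTracker tracker = new ActiveTracker();
        tracker.onHandshake("nn1", true);
        tracker.onHandshake("nn2", true);  // ignored, nn1 already chosen
        System.out.println("after handshake: " + tracker.getActive());
        tracker.onHeartbeat("nn2", true);  // heartbeat corrects the choice
        System.out.println("after heartbeat: " + tracker.getActive());
    }
}
```

The point of the design is that {{getActive()}} is non-null as soon as any NN claims to be active, and a wrong early choice lasts only until the next heartbeat.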

[~daryn] may have more thoughts on this.

{quote}
2. For the unit test, can we set a very large heartbeat interval in 
configuration, and check the active NN is not null after 
cluster.waitForActive()? Mocked tests are useful as well and can be kept. 
Another idea is to drop heartbeat request against a spied HeartbeatManager.
{quote}
This should be fairly easy to do. I'll put up a patch shortly with this added 
test. 








[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-23 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.009.patch

Addressing checkstyle issues







[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-22 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.008.patch

[~liuml07], attaching a patch that includes unit tests both on the DN and NN 
side of the change. I mocked out most of it, so the tests should be pretty 
simple. But a review to make sure that I'm testing what I think I'm testing 
would be appreciated. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch, HDFS-11094.008.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-18 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.007.patch

I'm addressing the logging and checkstyle warnings in this patch. However, I 
will need some time to figure out how to do the unit testing. [~liuml07], do 
you have any suggestions? It seems like this will be quite difficult to mock 
out. 

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch, HDFS-11094.007.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Commented] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-16 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670530#comment-15670530
 ] 

Eric Badger commented on HDFS-11094:


The test failures are unrelated to the patch and do not fail for me locally. 

[~liuml07], [~daryn], [~arpitagarwal], could you please review the latest 
patch? Thanks

> Send back HAState along with NamespaceInfo during a versionRequest as an 
> optional parameter
> ---
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch, 
> HDFS-11094.003.patch, HDFS-11094.004.patch, HDFS-11094.005.patch, 
> HDFS-11094.006.patch
>
>
> The datanode should know which NN is active when it is connecting/registering 
> to the NN. Currently, it only figures this out during its first (and 
> subsequent) heartbeat(s) and so there is a period of time where the datanode 
> is alive and registered, but can't actually do anything because it doesn't 
> know which NN is active. A byproduct of this is that the MiniDFSCluster will 
> become active before it knows what NN is active, which can lead to NPEs when 
> calling getActiveNN(). 






[jira] [Updated] (HDFS-11094) Send back HAState along with NamespaceInfo during a versionRequest as an optional parameter

2016-11-15 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-11094:
---
Attachment: HDFS-11094.006.patch

The new patch adds the {{INITIALIZING}} state to the convert() methods to fix 
the test failures. It also removes redundant code in the convert() methods.
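
For readers unfamiliar with the convert() pattern, a minimal sketch follows. It is not the code from the patch: the class {{ConvertSketch}} and both nested enums are stand-ins defined here, not the generated protobuf classes. It shows a pair of conversions that cover all three states, so that {{INITIALIZING}} no longer falls through to an error.

```java
// Hypothetical sketch of paired convert() methods; both enums are stand-ins.
public class ConvertSketch {
    enum HAServiceState { INITIALIZING, ACTIVE, STANDBY }      // Java-side stand-in
    enum HAServiceStateProto { INITIALIZING, ACTIVE, STANDBY } // wire-side stand-in

    // Java-side state to wire-side state.
    static HAServiceStateProto convert(HAServiceState s) {
        switch (s) {
            case INITIALIZING: return HAServiceStateProto.INITIALIZING;
            case ACTIVE:       return HAServiceStateProto.ACTIVE;
            case STANDBY:      return HAServiceStateProto.STANDBY;
            default:
                throw new IllegalArgumentException("Unexpected state: " + s);
        }
    }

    // Wire-side state back to Java-side state.
    static HAServiceState convert(HAServiceStateProto s) {
        switch (s) {
            case INITIALIZING: return HAServiceState.INITIALIZING;
            case ACTIVE:       return HAServiceState.ACTIVE;
            case STANDBY:      return HAServiceState.STANDBY;
            default:
                throw new IllegalArgumentException("Unexpected state: " + s);
        }
    }

    public static void main(String[] args) {
        for (HAServiceState s : HAServiceState.values()) {
            System.out.println(s + " round-trips to " + convert(convert(s)));
        }
    }
}
```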






