FW: Scripts to make the SSH password generation easier while trying to start a cluster

2011-05-02 Thread Ramkrishna S Vasudevan
 

Hi

 

Sometimes, when we have a large cluster, we have to specify the password for
each of the machines that we are using as slaves (datanodes).

 

If the cluster is very large, we have to repeat this every time.

So we would like to suggest a way to avoid this:

 

1. Generate an SSH key pair on the name node machine.

2. Read the entries from the conf/slaves file and, for every entry, append the
public key generated in step 1 to the authorized_keys file on that slave
machine.

3. Repeat the same for the masters file.

 

When you execute these steps, you will be prompted for each machine's
password, but only this first time.

 

After that, whenever you start the cluster, the passwords need not be
specified.
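The steps above could be sketched roughly as follows. This is a minimal
sketch, not the actual proposed utility: it assumes OpenSSH's ssh-keygen and
ssh-copy-id are available on the name node, and that conf/slaves and
conf/masters contain one hostname per line.

```shell
#!/bin/sh
# Sketch of the proposed utility: push the name node's public key to every
# machine listed in conf/slaves and conf/masters.

list_hosts() {
    # One hostname per line in each file; ignore blank lines.
    cat conf/slaves conf/masters 2>/dev/null | grep -v '^$'
}

setup_keys() {
    key="$HOME/.ssh/id_rsa"
    # Step 1: generate a key pair on the name node (skip if one exists).
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key"
    # Steps 2 and 3: append the public key to ~/.ssh/authorized_keys on
    # every slave and master; ssh-copy-id prompts for each password once.
    for host in $(list_hosts); do
        ssh-copy-id -i "$key.pub" "$host"
    done
}
```

After setup_keys has run once, subsequent start-dfs.sh invocations would log
in to the slaves without password prompts.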

 

This scenario is valid when we are sure of the cluster we will be maintaining
and we know the credentials of the machines.

 

This will help the cluster administrator.

 

Please provide your comments.

 

If this sounds OK, I can raise a JIRA and contribute the utility.

 

Regards

Ram

 



Hadoop-Hdfs-trunk - Build # 654 - Still Failing

2011-05-02 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/654/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 795522 lines...]
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 2011-05-02 12:33:31,232 INFO  datanode.DataNode 
(DataNode.java:shutdown(1638)) - Waiting for threadgroup to exit, active 
threads is 0
[junit] 2011-05-02 12:33:31,233 WARN  datanode.DataNode 
(DataNode.java:offerService(1065)) - BPOfferService for block 
pool=BP-884505841-127.0.1.1-1304339610140 received 
exception:java.lang.InterruptedException
[junit] 2011-05-02 12:33:31,233 WARN  datanode.DataNode 
(DataNode.java:run(1218)) - DatanodeRegistration(127.0.0.1:35270, 
storageID=DS-2053464677-127.0.1.1-35270-1304339610717, infoPort=39784, 
ipcPort=35326, storageInfo=lv=-35;cid=testClusterID;nsid=1139630309;c=0) ending 
block pool service for: BP-884505841-127.0.1.1-1304339610140
[junit] 2011-05-02 12:33:31,233 INFO  datanode.DataBlockScanner 
(DataBlockScanner.java:removeBlockPool(277)) - Removed 
bpid=BP-884505841-127.0.1.1-1304339610140 from blockPoolScannerMap
[junit] 2011-05-02 12:33:31,233 INFO  datanode.DataNode 
(FSDataset.java:shutdownBlockPool(2547)) - Removing block pool 
BP-884505841-127.0.1.1-1304339610140
[junit] 2011-05-02 12:33:31,233 INFO  datanode.FSDatasetAsyncDiskService 
(FSDatasetAsyncDiskService.java:shutdown(133)) - Shutting down all async disk 
service threads...
[junit] 2011-05-02 12:33:31,233 INFO  datanode.FSDatasetAsyncDiskService 
(FSDatasetAsyncDiskService.java:shutdown(142)) - All async disk service threads 
have been shut down.
[junit] 2011-05-02 12:33:31,234 INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdownDataNodes(1041)) - Shutting down DataNode 0
[junit] 2011-05-02 12:33:31,234 WARN  datanode.DirectoryScanner 
(DirectoryScanner.java:shutdown(297)) - DirectoryScanner: shutdown has been 
called
[junit] 2011-05-02 12:33:31,234 INFO  datanode.BlockPoolSliceScanner 
(BlockPoolSliceScanner.java:startNewPeriod(591)) - Starting a new period : work 
left in prev period : 100.00%
[junit] 2011-05-02 12:33:31,335 INFO  ipc.Server (Server.java:stop(1626)) - 
Stopping server on 60780
[junit] 2011-05-02 12:33:31,335 INFO  ipc.Server (Server.java:run(1459)) - 
IPC Server handler 0 on 60780: exiting
[junit] 2011-05-02 12:33:31,335 INFO  ipc.Server (Server.java:run(487)) - 
Stopping IPC Server listener on 60780
[junit] 2011-05-02 12:33:31,335 INFO  datanode.DataNode 
(DataNode.java:shutdown(1638)) - Waiting for threadgroup to exit, active 
threads is 1
[junit] 2011-05-02 12:33:31,336 WARN  datanode.DataNode 
(DataXceiverServer.java:run(143)) - 127.0.0.1:60473:DataXceiveServer: 
java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:136)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 2011-05-02 12:33:31,335 INFO  ipc.Server (Server.java:run(691)) - 
Stopping IPC Server Responder
[junit] 2011-05-02 12:33:31,338 INFO  datanode.DataNode 
(DataNode.java:shutdown(1638)) - Waiting for threadgroup to exit, active 
threads is 0
[junit] 2011-05-02 12:33:31,338 WARN  datanode.DataNode 
(DataNode.java:offerService(1065)) - BPOfferService for block 
pool=BP-884505841-127.0.1.1-1304339610140 received 
exception:java.lang.InterruptedException
[junit] 2011-05-02 12:33:31,338 WARN  datanode.DataNode 
(DataNode.java:run(1218)) - DatanodeRegistration(127.0.0.1:60473, 
storageID=DS-140658759-127.0.1.1-60473-1304339610593, infoPort=58360, 
ipcPort=60780, storageInfo=lv=-35;cid=testClusterID;nsid=1139630309;c=0) ending 
block pool service for: BP-884505841-127.0.1.1-1304339610140
[junit] 2011-05-02 12:33:31,438 INFO  datanode.DataBlockScanner 
(DataBlockScanner.java:removeBlockPool(277)) - Removed 
bpid=BP-884505841-127.0.1.1-1304339610140 from blockPoolScannerMap
[junit] 2011-05-02 12:33:31,438 INFO  datanode.DataNode 
(FSDataset.java:shutdownBlockPool(2547)) - Removing block pool 
BP-884505841-127.0.1.1-1304339610140
[junit] 2011-05-02 12:33:31,439 INFO  datanode.FSDatasetAsyncDiskService 
(FSDatasetAsyncDiskService.java:shutdown(133)) - Shutting down all async disk 
service threads...
[junit] 2011-05-02 12:33:31,439 INFO  datanode.FSDatasetAsyncDiskService 
(FSDatasetAsyncDiskService.java:shutdown(142)) - All async disk service threads 
have been shut down.
[junit] 2011-05-02 12:33:31,440 WARN  namenode.FSNamesystem 
(FSNamesystem.java:run(3009)) - 

Hadoop-Hdfs-trunk-Commit - Build # 617 - Still Failing

2011-05-02 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/617/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2828 lines...]
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 8.643 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestInterDatanodeProtocol
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.622 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestSimulatedFSDataset
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.719 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBackupNode
[junit] Tests run: 2, Failures: 2, Errors: 0, Time elapsed: 13.046 sec
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 29.146 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestComputeInvalidateWork
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.617 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestDatanodeDescriptor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.156 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestEditLog
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 11.965 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestFileLimit
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 4.151 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHeartbeatHandling
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.991 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHost2NodesMap
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.074 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.623 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestOverReplicatedBlocks
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.485 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestPendingReplication
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.317 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestReplicationPolicy
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.067 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestSafeMode
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.203 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStartup
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 9.473 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.002 sec
[junit] Running org.apache.hadoop.net.TestNetworkTopology
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.098 sec
[junit] Running org.apache.hadoop.security.TestPermission
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.86 sec

checkfailure:
[touch] Creating 
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build/test/testsfailed

BUILD FAILED
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:705:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:662:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:730:
 Tests failed!

Total time: 8 minutes 29 seconds
[FINDBUGS] Skipping publisher since build result is FAILURE
Recording fingerprints
Archiving artifacts
Recording test results
Publishing Javadoc
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: null
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:152)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.__CLR3_0_2xuql33xs5(TestBackupNode.java:103)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:101)


FAILED:  
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testBackupRegistration

Error Message:
Only one backup node should be able to start

Stack Trace:

[jira] [Created] (HDFS-1875) MiniDFSCluster hard-codes dfs.datanode.address to localhost

2011-05-02 Thread Eric Payne (JIRA)
MiniDFSCluster hard-codes dfs.datanode.address to localhost
---

 Key: HDFS-1875
 URL: https://issues.apache.org/jira/browse/HDFS-1875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Eric Payne
Assignee: Eric Payne
 Fix For: 0.23.0


When creating the RPC addresses that represent the communication sockets for 
each simulated DataNode, the MiniDFSCluster class hard-codes the address of 
the dfs.datanode.address port to 127.0.0.1:0.

The DataNodeCluster test tool uses the MiniDFSCluster class to create a 
selected number of simulated datanodes on a single host. In the DataNodeCluster 
setup, the NameNode is not simulated but is started as a separate daemon.

The problem is that if the write requests into the simulated datanodes 
originate on a host other than the one running the simulated datanodes, the 
connections are refused. This is because the RPC sockets started by 
MiniDFSCluster are for localhost (127.0.0.1) and are not accessible from 
outside that machine.

It is proposed that the MiniDFSCluster.setupDatanodeAddress() method be 
overloaded in order to accommodate an environment where the NameNode is on one 
host, the client is on another host, and the simulated DataNodes are on yet 
another host (or even multiple hosts simulating multiple DataNodes each).

The overloaded API would add a parameter that would be used as the basis for 
creating the RPC sockets. By default, it would remain 127.0.0.1.
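A rough sketch of the proposed overload follows. This is illustrative only:
the class name and signature are assumptions based on the description above,
not the actual MiniDFSCluster code or patch.

```java
// Illustrative sketch of the proposed MiniDFSCluster change (hypothetical
// class and signatures; the real method lives inside MiniDFSCluster).
public class DatanodeAddressSketch {

    // Existing behavior: hard-coded loopback address, ephemeral port (:0).
    static String setupDatanodeAddress() {
        return setupDatanodeAddress("127.0.0.1");
    }

    // Proposed overload: the caller supplies the bind host, so simulated
    // DataNodes can be reachable from other machines.
    static String setupDatanodeAddress(String bindHost) {
        return bindHost + ":0";
    }

    public static void main(String[] args) {
        // Default remains loopback-only, preserving current behavior.
        System.out.println(setupDatanodeAddress());
        // A multi-host test environment would pass a routable address.
        System.out.println(setupDatanodeAddress("0.0.0.0"));
    }
}
```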


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-1773) Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists

2011-05-02 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang reopened HDFS-1773:



Port this jira to trunk.  Mark HDFS-1867 as a duplicate of this jira.

 Remove a datanode from cluster if include list is not empty and this datanode 
 is removed from both include and exclude lists
 

 Key: HDFS-1773
 URL: https://issues.apache.org/jira/browse/HDFS-1773
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20.203.1
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.204.0

 Attachments: HDFS-1773-2.patch, HDFS-1773-3.patch, HDFS-1773.patch


 Our service engineering team, who operate the clusters on a daily basis, 
 find it confusing that after a data node is decommissioned, there is no way 
 to make the cluster forget about this data node; it always remains in the 
 dead node list.



[jira] [Resolved] (HDFS-1867) Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists

2011-05-02 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang resolved HDFS-1867.


Resolution: Duplicate

Resolved as a duplicate of HDFS-1773.

 Remove a datanode from cluster if include list is not empty and this datanode 
 is removed from both include and exclude lists
 

 Key: HDFS-1867
 URL: https://issues.apache.org/jira/browse/HDFS-1867
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tanping Wang
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1867.patch


 This jira is to port HDFS-1773 into trunk.  HDFS-1773 was originally fixed 
 on branch-20-security.



Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-05-02 Thread Todd Lipcon
Apparently this merge wasn't tested against MapReduce trunk at all -- MR
trunk has been failing to compile for several days. Please see
MAPREDUCE-2465. I attempted to fix it myself but don't have enough
background in the new federation code or in RAID.

-Todd

On Thu, Apr 28, 2011 at 11:30 PM, Konstantin Shvachko
shv.had...@gmail.comwrote:

 Thanks for clarifying, Owen.
 Should we have the bylaws somewhere on wiki?
 --Konstantin


 On Thu, Apr 28, 2011 at 1:33 PM, Owen O'Malley omal...@apache.org wrote:

  On Apr 27, 2011, at 10:12 PM, Konstantin Shvachko wrote:
 
   The question is whether this is a
   * Code Change,
   which requires Lazy consensus of active committers or a
   * Adoption of New Codebase,
   which needs Lazy 2/3 majority of PMC members
 
  This is a code change, just like all of our jiras. The standard rules of
 at
  least one +1 on the jira and no -1's apply.
 
  Adoption of new codebase is adopting a new subproject or completely
  replacing trunk.
 
   Lazy consensus requires 3 binding +1 votes and no binding vetoes.
 
  This was clarified in the bylaws back in November.
 
 
 
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3c159e99c4-b71c-437e-9640-aa24c50d6...@apache.org%3E
 
  Where it was modified to:
 
  Lazy consensus of active committers, but with a minimum of
  one +1. The code can be committed after the first +1.
 
  -- Owen




-- 
Todd Lipcon
Software Engineer, Cloudera


Hadoop-Hdfs-trunk-Commit - Build # 618 - Still Failing

2011-05-02 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/618/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2828 lines...]
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 8.524 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestInterDatanodeProtocol
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.646 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestSimulatedFSDataset
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.712 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBackupNode
[junit] Tests run: 2, Failures: 2, Errors: 0, Time elapsed: 12.916 sec
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 28.875 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestComputeInvalidateWork
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.623 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestDatanodeDescriptor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.167 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestEditLog
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 12.185 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestFileLimit
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.346 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHeartbeatHandling
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.743 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHost2NodesMap
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.075 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.689 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestOverReplicatedBlocks
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.582 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestPendingReplication
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.291 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestReplicationPolicy
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.069 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestSafeMode
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.208 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStartup
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 9.38 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.097 sec
[junit] Running org.apache.hadoop.net.TestNetworkTopology
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.103 sec
[junit] Running org.apache.hadoop.security.TestPermission
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.422 sec

checkfailure:
[touch] Creating 
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build/test/testsfailed

BUILD FAILED
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:705:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:662:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:730:
 Tests failed!

Total time: 8 minutes 37 seconds
[FINDBUGS] Skipping publisher since build result is FAILURE
Recording fingerprints
Archiving artifacts
Recording test results
Publishing Javadoc
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: null
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:152)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.__CLR3_0_2xuql33xs5(TestBackupNode.java:103)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:101)


FAILED:  
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testBackupRegistration

Error Message:
Only one backup node should be able to start

Stack Trace:

Hadoop-Hdfs-trunk-Commit - Build # 619 - Still Failing

2011-05-02 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/619/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2829 lines...]
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 8.532 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestInterDatanodeProtocol
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.981 sec
[junit] Running 
org.apache.hadoop.hdfs.server.datanode.TestSimulatedFSDataset
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.719 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBackupNode
[junit] Tests run: 2, Failures: 2, Errors: 0, Time elapsed: 12.837 sec
[junit] Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 29.018 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestComputeInvalidateWork
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.554 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestDatanodeDescriptor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.162 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestEditLog
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 11.983 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestFileLimit
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.941 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHeartbeatHandling
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.89 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestHost2NodesMap
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.085 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.715 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestOverReplicatedBlocks
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.552 sec
[junit] Running 
org.apache.hadoop.hdfs.server.namenode.TestPendingReplication
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.3 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestReplicationPolicy
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.06 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestSafeMode
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.397 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStartup
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 9.577 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.738 sec
[junit] Running org.apache.hadoop.net.TestNetworkTopology
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.114 sec
[junit] Running org.apache.hadoop.security.TestPermission
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.612 sec

checkfailure:
[touch] Creating 
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build/test/testsfailed

BUILD FAILED
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:705:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:662:
 The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/build.xml:730:
 Tests failed!

Total time: 8 minutes 47 seconds
[FINDBUGS] Skipping publisher since build result is FAILURE
Recording fingerprints
Archiving artifacts
Recording test results
Publishing Javadoc
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: null
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:152)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.__CLR3_0_2xuql33xuw(TestBackupNode.java:103)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:101)


FAILED:  
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testBackupRegistration

Error Message:
Only one backup node should be able to start

Stack Trace:

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-05-02 Thread suresh srinivas
We have been testing federation regularly with MapReduce with yahoo-merge
branches. With trunk we missed the contrib (raid). The dependency with
project splits has been crazy. Not sure how large changes can keep on top of
all these things.

I am working on fixing the raid contrib.

On Mon, May 2, 2011 at 2:44 PM, Todd Lipcon t...@cloudera.com wrote:

 Apparently this merge wasn't tested against MapReduce trunk at all -- MR
 trunk has been failing to compile for several days. Please see
 MAPREDUCE-2465. I attempted to fix it myself but don't have enough
 background in the new federation code or in RAID.

 -Todd

 On Thu, Apr 28, 2011 at 11:30 PM, Konstantin Shvachko
 shv.had...@gmail.comwrote:

  Thanks for clarifying, Owen.
  Should we have the bylaws somewhere on wiki?
  --Konstantin
 
 
  On Thu, Apr 28, 2011 at 1:33 PM, Owen O'Malley omal...@apache.org
 wrote:
 
   On Apr 27, 2011, at 10:12 PM, Konstantin Shvachko wrote:
  
The question is whether this is a
* Code Change,
which requires Lazy consensus of active committers or a
* Adoption of New Codebase,
which needs Lazy 2/3 majority of PMC members
  
   This is a code change, just like all of our jiras. The standard rules
 of
  at
   least one +1 on the jira and no -1's apply.
  
   Adoption of new codebase is adopting a new subproject or completely
   replacing trunk.
  
Lazy consensus requires 3 binding +1 votes and no binding vetoes.
  
   This was clarified in the bylaws back in November.
  
  
  
 
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3c159e99c4-b71c-437e-9640-aa24c50d6...@apache.org%3E
  
   Where it was modified to:
  
   Lazy consensus of active committers, but with a minimum of
   one +1. The code can be committed after the first +1.
  
   -- Owen
 



 --
 Todd Lipcon
 Software Engineer, Cloudera




-- 
Regards,
Suresh


[jira] [Created] (HDFS-1878) race condition in FSNamesystem.close() causes NullPointerException without serious consequence - TestHDFSServerPorts unit test failure

2011-05-02 Thread Matt Foley (JIRA)
race condition in FSNamesystem.close() causes NullPointerException without 
serious consequence - TestHDFSServerPorts unit test failure
--

 Key: HDFS-1878
 URL: https://issues.apache.org/jira/browse/HDFS-1878
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.204.0
Reporter: Matt Foley
Assignee: Matt Foley
Priority: Minor
 Fix For: 0.20.205.0


TestHDFSServerPorts was observed to intermittently throw a 
NullPointerException.  This only happens when FSNamesystem.close() is called, 
which means system termination for the Namenode, so this is not a serious bug 
for .204.  TestHDFSServerPorts is more likely than normal execution to 
stimulate the race, because it runs two Namenodes in the same JVM, causing more 
interleaving and more potential to see a race condition.

The race is in FSNamesystem.close(), line 566, where we have:
  if (replthread != null) replthread.interrupt();
  if (replmon != null) replmon = null;

Since the interrupted replthread is not waited on, there is a potential race: 
replmon may be nulled before replthread is dead, but replthread references 
replmon in computeDatanodeWork(), which is where the NullPointerException 
occurs.

The solution is either to wait on replthread or simply not to null replmon.  
The latter is preferred, since none of the sibling Namenode processing threads 
are waited on in close().

I'll attach a patch for .205.
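The race and the preferred fix can be sketched in isolation. This is a
hedged, self-contained analogue, not the real FSNamesystem code: replmon and
replthread here are plain stand-ins mirroring the field names above.

```java
// Self-contained analogue of the HDFS-1878 race: a worker thread reads a
// monitor field that close() used to null out.  The preferred fix shown
// here interrupts the worker but leaves the monitor non-null.
public class ReplMonitorSketch {
    static Object replmon = new Object();   // monitor read by the worker
    static Thread replthread;               // the replication worker

    static void startWorker() {
        replthread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // The racy version would NPE here if replmon were nulled
                // concurrently; keeping it non-null avoids that.
                synchronized (replmon) { }
            }
        });
        replthread.start();
    }

    // Preferred fix: interrupt the worker but do NOT null replmon, since
    // none of the sibling Namenode threads are joined in close().
    static void close() {
        if (replthread != null) replthread.interrupt();
        // replmon = null;  // removed: this assignment is what raced
    }

    public static void main(String[] args) throws InterruptedException {
        startWorker();
        close();
        replthread.join();  // worker exits cleanly, no NPE possible
    }
}
```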


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira