Re: FileStatus.getLen(): bug in documentation or bug in implementation?

2009-06-26 Thread Konstantin Shvachko

Documentation is wrong. Implementation wins.
Could you please file a bug.

Thanks,
--Konstantin
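
For anyone who finds this thread later, here is a minimal sketch of the actual
behavior (the class name and path argument are hypothetical; it assumes a
reachable file system configured via fs.default.name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowFileLen {
  public static void main(String[] args) throws Exception {
    // Uses whatever file system fs.default.name points to (HDFS or local).
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    // getLen() returns the file length in bytes, not in blocks,
    // despite what the javadoc said at the time.
    System.out.println(status.getLen() + " bytes");
  }
}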

Dima Rzhevskiy wrote:

Hi all,
I am trying to get the length of a file in Hadoop (RawFileSystem or HDFS).
The javadoc for the method org.apache.hadoop.fs.FileStatus.getLen() says that
this method returns the length of this file, in blocks,
but the method returns the size in bytes.

Is this a bug in the documentation or in the implementation?
I use hadoop-0.18.3.


Dmitry Rzhevskiy.



Re: Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread Konstantin Shvachko

These links should help you to rebalance the nodes:

http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer
http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer
http://issues.apache.org/jira/secure/attachment/12368261/RebalanceDesign6.pdf

--Konstantin

asif md wrote:

@Alex

 Thanks.

http://wiki.apache.org/hadoop/FAQ#6

Has anyone any experience with this?

Please suggest.

On Wed, Jun 24, 2009 at 5:44 PM, Alex Loddengaard a...@cloudera.com wrote:


Hi,

Running the rebalancer script (by the way, you only need to run it once)
redistributes all of your data for you.  That is, after you've run the
rebalancer, your data should be stored evenly among your 10 nodes.

Alex

On Wed, Jun 24, 2009 at 2:50 PM, asif md asif.d...@gmail.com wrote:


hello everyone,

I have added 7 nodes to my 3-node cluster. I followed these steps
to do this:

1. added each node's IP to conf/slaves on the master
2. ran bin/start-balance.sh on each node

I loaded the data when the size of the cluster was three, and it is now
TEN. Can I do anything to redistribute the data among all the nodes?

Any ideas appreciated.

Thanks and Regards

Asif.





Re: Max. Possible No. of Files

2009-06-05 Thread Konstantin Shvachko

There are some name-node memory estimates in this jira.
http://issues.apache.org/jira/browse/HADOOP-1687

With 16 GB you can normally have 60 million objects (files
+ blocks) on the name-node. The number of files would depend
on the file to block ratio.

--Konstantin
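
As a rough back-of-the-envelope reading of those numbers (not an exact model):
16 GB for 60 million objects is on the order of 280 bytes of heap per object,
and if, say, each file averages 1.5 blocks, 60 million objects would correspond
to roughly 24 million files plus 36 million blocks.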


Brian Bockelman wrote:


On Jun 5, 2009, at 11:51 AM, Wasim Bari wrote:


Hi,
Does anyone have some data regarding the maximum possible number of 
files in HDFS?




Hey Wasim,

I don't think that there is a maximum limit.  Remember:
1) Less is better.  HDFS is optimized for big files.
2) The amount of memory the HDFS namenode needs is a function of the 
number of files.  If you have a huge number of files, you get a huge 
memory requirement.


1-2 million files is fairly safe if you have a normal-looking namenode 
server (8-16GB RAM).  I know some of our UCSD colleagues just ran a test 
where they were able to put more than .5M files in a single directory 
and still have a usable file system.


Brian

My second question is: I created small files with a small block size, up 
to one lakh (100,000), and read the files from HDFS; reading performance remains 
almost unaffected as the number of files increases.


The possible reasons I could think are:

1. One lakh isn't a big enough number to disturb HDFS performance (I used 1 
namenode and 4 datanodes).


2. Since reading is done directly from the datanodes, with only the initial 
interaction going through the namenode, reading from different nodes doesn't 
affect the performance.



If someone could add to or correct this information, it will be highly 
appreciated.


Cheers,
Wasim





Re: Setting up another machine as secondary node

2009-05-27 Thread Konstantin Shvachko

I don't think you will find any step-by-step instructions.
Somebody has already mentioned in replies below that secondary node is NOT
a fail-over node. You can read about it here:
http://wiki.apache.org/hadoop/FAQ#7
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode

In fact, Secondary NameNode is a checkpointer only: it cannot process
heartbeats from data-nodes or ls commands from hdfs clients.
What you probably meant to do after the NameNode failed on node1 is:
stop the Secondary node on node2 and then start the real
NameNode on node2. You will also have to restart the data-nodes to redirect
them to the new name-node.

Another way to model a fail-over is to play with the Backup node, which
is only available in trunk (not in 0.19, which you seem to be using), and
is supposed to replace secondary node in 0.21.

Backup node is a real name-node and it can start processing heartbeats
and client commands if you redirect them to the Backup node.
I guess nobody has tried it yet, so please share your experience.

Regards,
--Konstantin


Rakhi Khatwani wrote:

Hi,
  Thanks for the suggestions, but my scenario is a little different:
I am doing a POC on namenode failover.

I have a 5-node cluster setup in which one node acts as the master, 3 act as
slaves and the last one is the secondary node.

I start my Hadoop DFS, write something into it... and later kill my
namenode (trying to reproduce a real-world scenario where my namenode fails
due to some hardware error).

So my aim is to bring up the secondary node as the primary machine,
so that the DFS is intact (by copying the checkpoint info)
and all the slave machines become slaves of the secondary namenode.

1. Can this be achieved without shutting down the cluster? I have read
this somewhere... but couldn't achieve it.

2. What are the step-by-step instructions to achieve it? I have googled it, got a
lot of different opinions and am now totally confused.

Thanks,
Raakhi




On Tue, May 26, 2009 at 11:27 PM, Konstantin Shvachko s...@yahoo-inc.com wrote:


Hi Rakhi,

This is because your name-node is trying to -importCheckpoint from a
directory,
which is locked by secondary name-node.
The secondary node is also running in your case, right?
You should use -importCheckpoint as the last resort, when name-node's
directories
are damaged.
In the regular case you start the name-node with
./hadoop-daemon.sh start namenode

Thanks,
--Konstantin


Rakhi Khatwani wrote:


Hi,
  I followed the instructions suggested by you all, but I still
come across this exception when I use the following command:
./hadoop-daemon.sh start namenode -importCheckpoint

the exception is as follows:
2009-05-26 14:43:48,004 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = germapp/192.168.0.1
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
/
2009-05-26 14:43:48,147 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=4
2009-05-26 14:43:48,154 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
germapp/192.168.0.1:4
2009-05-26 14:43:48,160 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-05-26 14:43:48,166 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,316 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=ithurs,ithurs
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-05-26 14:43:48,343 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,347 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/tmp/hadoop-ithurs/dfs/name is not formatted.
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2009-05-26 14:43:48,457 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
2009-05-26 14:43:48,460 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked

Re: Setting up another machine as secondary node

2009-05-26 Thread Konstantin Shvachko

Hi Rakhi,

This is because your name-node is trying to -importCheckpoint from a directory,
which is locked by secondary name-node.
The secondary node is also running in your case, right?
You should use -importCheckpoint as the last resort, when name-node's 
directories
are damaged.
In the regular case you start the name-node with
./hadoop-daemon.sh start namenode

Thanks,
--Konstantin

Rakhi Khatwani wrote:

Hi,
   I followed the instructions suggested by you all, but I still
come across this exception when I use the following command:
./hadoop-daemon.sh start namenode -importCheckpoint

the exception is as follows:
2009-05-26 14:43:48,004 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = germapp/192.168.0.1
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
/
2009-05-26 14:43:48,147 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=4
2009-05-26 14:43:48,154 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
germapp/192.168.0.1:4
2009-05-26 14:43:48,160 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-05-26 14:43:48,166 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,316 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=ithurs,ithurs
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-05-26 14:43:48,343 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,347 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/tmp/hadoop-ithurs/dfs/name is not formatted.
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2009-05-26 14:43:48,457 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
2009-05-26 14:43:48,460 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:208)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:194)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2009-05-26 14:43:48,464 INFO org.apache.hadoop.ipc.Server: Stopping
server on 4
2009-05-26 14:43:48,466 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
Cannot lock storage /tmp/hadoop-ithurs/dfs/namesecondary. The
directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 

Re: HDFS read/write speeds, and read optimization

2009-04-10 Thread Konstantin Shvachko

I just wanted to add to this one other published benchmark
http://developer.yahoo.net/blogs/hadoop/2008/09/scaling_hadoop_to_4000_nodes_a.html
In this example on a very busy cluster of 4000 nodes both read and write 
throughputs
were close to the local disk bandwidth.
This benchmark (called TestDFSIO) uses large sequential writes and reads.
You can run it yourself on your hardware to compare.


Is it more efficient to unify the disks into one volume (RAID or LVM), and
then present them as a single space? Or it's better to specify each disk
separately?


There was a discussion recently on this list about RAID0 vs separate disks.
Please search the archives. Separate disks turn out to perform better.


Reliability-wise, the latter sounds more correct, as a single/several (up to
3) disks going down won't take the whole node with them. But perhaps there
is a performance penalty?


You always have block replicas on other nodes, so one node going down should 
not be a problem.

Thanks,
--Konstantin


Re: Not a host:port pair when running balancer

2009-03-11 Thread Konstantin Shvachko

Clarifying: port # is missing in your configuration, should be
<property>
  <name>fs.default.name</name>
  <value>hdfs://hvcwydev0601:8020</value>
</property>

where 8020 is your port number.

--Konstantin



Hairong Kuang wrote:

Please try using the port number 8020.

Hairong

On 3/11/09 9:42 AM, Stuart White stuart.whi...@gmail.com wrote:


I've been running hadoop-0.19.0 for several weeks successfully.

Today, for the first time, I tried to run the balancer, and I'm receiving:

java.lang.RuntimeException: Not a host:port pair: hvcwydev0601

In my hadoop-site.xml, I have this:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hvcwydev0601/</value>
</property>

What do I need to change to get the balancer to work?  It seems I need
to add a port to fs.default.name.  If so, what port?  Can I just pick
any port?  If I specify a port, do I need to specify any other parms
accordingly?

I searched the forum, and found a few posts on this topic, but it
seems that the configuration parms have changed over time, so I'm not
sure what the current correct configuration is.

Also, if fs.default.name is supposed to have a port, I'll point out
that the docs don't say so:
http://hadoop.apache.org/core/docs/r0.19.1/cluster_setup.html

The example given for fs.default.name is hdfs://hostname/.

Thanks!





Re: Not a host:port pair when running balancer

2009-03-11 Thread Konstantin Shvachko

This is not about the default port.
The port was not specified at all in the original configuration.

--Konstantin

Doug Cutting wrote:

Konstantin Shvachko wrote:

Clarifying: port # is missing in your configuration, should be
<property>
  <name>fs.default.name</name>
  <value>hdfs://hvcwydev0601:8020</value>
</property>

where 8020 is your port number.


That's the work-around, but it's a bug.  One should not need to specify 
the default port number (8020).  Please file an issue in Jira.


Doug



Re: HDFS issues in 0.17.2.1 and 0.19.0 versions

2009-02-02 Thread Konstantin Shvachko

Are you sure you were using 0.19 not 0.20 ?

For 0.17 please check that the configuration file hadoop-site.xml exists
in your configuration directory, is not empty, and points to HDFS rather
than the local file system, which is what it defaults to.
In 0.17 all config variables were in a common file; 0.19 was the same.
0.20 changed this, so now we have hdfs-site.xml, core-site.xml and mapred-site.xml.
See
https://issues.apache.org/jira/browse/HADOOP-4631

Hope this helps.
--Konstantin

Shyam Sarkar wrote:

Hello,

I am trying to understand the clustering inside 0.17.2.1 as opposed to
the 0.19.0 version. I am trying to
create a directory inside 0.17.2.1 HDFS, but it gets created in the Linux FS.
However, I can do that in 0.19.0
without any problem.

Can someone suggest what I should do for 0.17.2.1 so that I can create
a directory in HDFS?

Thanks,
shyam.s.sar...@gmail.com



Re: HDFS losing blocks or connection error

2009-01-23 Thread Konstantin Shvachko

Yes guys, we observed such problems.
They are common for 0.18.2 and 0.19.0, exactly as you
described, when data-nodes become unstable.

There were several issues, please take a look
HADOOP-4997 workaround for tmp file handling on DataNodes
HADOOP-4663 - links to other related
HADOOP-4810 Data lost at cluster startup
HADOOP-4702 Failed block replication leaves an incomplete block


We run 0.18.3 now and it does not have these problems.
0.19.1 should be the same.

Thanks,
--Konstantin

Zak, Richard [USA] wrote:

It happens right after the MR job (though once or twice it's happened
during).  I am not using EBS, just HDFS between the machines.  As for tasks,
there are 4 mappers and 0 reducers.


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday, January 23, 2009 13:24
To: core-user@hadoop.apache.org
Subject: Re: HDFS losing blocks or connection error

xlarge is good. Is it normally happening during a MR job? If so, how many
tasks do you have running at the same moment overall? Also, is your data
stored on EBS?

Thx,

J-D

On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA]
zak_rich...@bah.comwrote:


4 slaves, 1 master, all are the m1.xlarge instance type.


Richard J. Zak

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of 
Jean-Daniel Cryans

Sent: Friday, January 23, 2009 12:34
To: core-user@hadoop.apache.org
Subject: Re: HDFS losing blocks or connection error

Richard,

This happens when the datanodes are too slow and eventually all 
replicas for a single block are tagged as bad.  What kind of 
instances are you using?

How many of them?

J-D

On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA]
zak_rich...@bah.comwrote:

 Might there be a reason why this seems to routinely happen to 
me when using Hadoop 0.19.0 on Amazon EC2?


09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node:  java.io.IOException: 
No live nodes contain current block

09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException:
Could not obtain block: blk_-1757733438820764312_6736 file=/stats.txt

It seems HDFS isn't as robust or reliable as the
website says, and/or I have a configuration issue.



 Richard J. Zak



Re: Hadoop 0.17.1 => EOFException reading FSEdits file, what causes this? How to prevent it?

2009-01-15 Thread Konstantin Shvachko

Joe,

It looks like your edits file is corrupted or truncated.
Most probably the last modification was not written to it
when the name-node was turned off. This may happen if the
node crashes, depending on the underlying local file system, I guess.

Here are some options for you to consider:
- Try an alternative replica of the image directory if you have one.
- Try to edit the edits file if you know the internal format.
- Try to modify your local copy of the name-node code so that it
catches EOFException and ignores it.
- Use a checkpointed image if you can afford to lose the latest modifications to 
the fs.
- Formatting, of course, is the last resort, since you lose everything.

Thanks,
--Konstantin

Joe Montanez wrote:

Hi:

 


I'm using Hadoop 0.17.1 and I'm encountering an EOFException reading the
FSEdits file.  I don't have a clear understanding of what is causing this
or how to prevent it.  Has anyone seen this and can advise?

 


Thanks in advance,

Joe

 


2009-01-12 22:51:45,573 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException

at java.io.DataInputStream.readFully(DataInputStream.java:180)

at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)

at
org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)

at
org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)

at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)

at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)

at
org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)

at
org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)

at
org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)

at
org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:255)

at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:178)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:164)

at
org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)

at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)

 


2009-01-12 22:51:45,574 INFO org.apache.hadoop.dfs.NameNode:
SHUTDOWN_MSG:

 





Re: Locks in hadoop

2009-01-15 Thread Konstantin Shvachko

Did you look at ZooKeeper?
Thanks,
--Konstantin

Sagar Naik wrote:

I would like to implement a locking mechanism across the HDFS cluster;
I assume there is no inherent support for it.

I was going to do it with files. To my knowledge, file 
creation is an atomic operation, so a file-based lock should work.
I need to think through all the conditions, but if someone has a better 
idea/solution, please share.


Thanks
-Sagar
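
For what it's worth, a minimal sketch of the file-based approach described
above (the class and lock path are hypothetical; it relies on
create(path, false) failing when the file already exists, and it does nothing
about stale locks left behind by crashed clients):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileLock {
  private final FileSystem fs;
  private final Path lockPath;  // e.g. /locks/my-job.lock (hypothetical)

  public HdfsFileLock(FileSystem fs, Path lockPath) {
    this.fs = fs;
    this.lockPath = lockPath;
  }

  // Returns true only if we created the lock file ourselves.
  public boolean tryLock() {
    try {
      fs.create(lockPath, false).close();  // fails if the file already exists
      return true;
    } catch (IOException alreadyLocked) {
      return false;
    }
  }

  public void unlock() throws IOException {
    fs.delete(lockPath, false);
  }
}

A holder that crashes never releases the lock, which is exactly the kind of
corner case ZooKeeper's ephemeral nodes are designed to handle.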




Re: TestDFSIO delivers bad values of throughput and average IO rate

2009-01-14 Thread Konstantin Shvachko

In TestDFSIO we want each task to create only one file.
It is a one-to-one mapping from files to map tasks.
And splits are defined so that each map gets only
one file name, which it creates or reads.

--Konstantin

tienduc_dinh wrote:

I don't understand why the parameter -nrFiles of TestDFSIO should override
mapred.map.tasks.
-nrFiles is the number of files which will be created, and

mapred.map.tasks is the number of splits that will be made from the input
file.

Thanks


Konstantin Shvachko wrote:

Hi tienduc_dinh,

Just a bit of a background, which should help to answer your questions.
TestDFSIO mappers perform one operation (read or write) each, measure
the time taken by the operation and output the following three values:
(I am intentionally omitting some other output stuff.)
- size(i)
- time(i)
- rate(i) = size(i) / time(i)
i is the index of the map task, 0 <= i < N, and N is the -nrFiles value,
which equals the number of maps.

Then the reduce sums those values and writes them into part-0.
That is you get three fields in it
size = size(0) + ... + size(N-1)
time = time(0) + ... + time(N-1)
rate = rate(0) + ... + rate(N-1)

Then we calculate
throughput = size / time
averageIORate = rate / N

So answering your questions
- There should be only one reduce task, otherwise you will have to
manually sum corresponding values in part-0 and part-1.
- The value of the :rate after the reduce equals the sum of individual
rates of each operation. So if you want to have an average you should
divide it by the number tasks rather than multiply.

Now, in your case you create only one file -nrFiles 1, which means
you run only one map task.
Setting mapred.map.tasks to 10 in hadoop-site.xml defines the default
number of tasks per job. See here
http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
In case of TestDFSIO it will be overridden by -nrFiles.

Hope this answers your questions.
Thanks,
--Konstantin



tienduc_dinh wrote:

Hello,

I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and
4
slaves. In hadoop-site.xml the value of mapred.map.tasks is 10. Because
the values throughput and average IO rate are similar, I just post
the
values of throughput of the same command with 3 times running

-  hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048
-nrFiles 1

+ with dfs.replication = 1 => 33,60 / 31,48 / 30,95

+ with dfs.replication = 2 => 26,40 / 20,99 / 21,70

I find something strange while reading the source code. 

- The value of mapred.reduce.tasks is always set to 1 


job.setNumReduceTasks(1) in the function runIOTest()  and reduceFile =
new
Path(WRITE_DIR, part-0) in analyzeResult().

So I think, if we properly have mapred.reduce.tasks = 2, we will have on
the
file system 2 Paths to part-0 and part-1, e.g.
/benchmarks/TestDFSIO/io_write/part-0

- And i don't understand the line with double med = rate / 1000 /
tasks.
Is it not double med = rate * tasks / 1000 






Re: NameNode fatal crash - 0.18.1

2009-01-09 Thread Konstantin Shvachko

Hi, Jonathan.
The problem is that the local drive(s) you use for dfs.name.dir became
inaccessible. So the name-node was not able to persist name-space modifications
anymore, and therefore terminated itself.
The rest are consequences.
This is the core message
 2008-12-15 01:49:31,178 FATAL org.apache.hadoop.fs.FSNamesystem: Fatal Error
 : All storage directories are inaccessible.
Could you please check the drives.
--Konstantin


Jonathan Gray wrote:

I have a 10+1 node cluster, each slave running DataNode/TaskTracker/HBase
RegionServer.

At the time of this crash, NameNode and SecondaryNameNode were both hosted
on same master.

We do a nightly backup and about 95% of the way through, HDFS crashed
with...

NameNode shows:

2008-12-15 01:49:31,178 ERROR org.apache.hadoop.fs.FSNamesystem: Unable to
sync edit log. Fatal Error.
2008-12-15 01:49:31,178 FATAL org.apache.hadoop.fs.FSNamesystem: Fatal Error
: All storage directories are inaccessible.
2008-12-15 01:49:31,179 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:

Every single DataNode shows:

2008-12-15 01:49:32,340 WARN org.apache.hadoop.dfs.DataNode:
java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy4.sendHeartbeat(Unknown Source)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:655)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441)


This is virtually all of the information I have.  At the same time as the
backup, we have normal HBase traffic and our hourly batch MR jobs.  So slave
nodes were pretty heavily loaded, but don't see anything in DN logs besides
this Call failed.  There are no space issues or anything else, Ganglia shows
high CPU load around this time which has been typical every night, but I
don't see any issues in DN's or NN about expired leases/no heartbeats/etc. 


Is there a way to prevent this failure from happening in the first place?  I
guess just reduce total load across cluster?

Second question is about how to recover once NameNode does fail...

When trying to bring HDFS back up, we get hundreds of:

2008-12-15 07:54:13,265 ERROR org.apache.hadoop.dfs.LeaseManager: XXX not
found in lease.paths

And then

2008-12-15 07:54:13,267 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.


Is there a way to recover from this?  As of time of this crash, we had
SecondaryNameNode on the same node.  Moving it to another node with
sufficient memory now, but would that even prevent this kind of FS botching?

Also, my SecondaryNameNode is telling me it cannot connect when trying to do
a checkpoint:

2008-12-15 09:59:48,017 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2008-12-15 09:59:48,018 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)

I changed my masters file to just contain the hostname of the
secondarynamenode, this seems to have properly started the NameNode where I
launched the ./bin/start-dfs.sh from and started SecondaryNameNode on
correct node as well.  But it seems to be unable to connect back to primary.
I have hadoop-site.xml pointing to fs.default.name of primary, but otherwise
there are not links back.  Where would I specify to the secondary where
primary is located?

We're also upgrading to Hadoop 0.19.0 at this time.

Thank you for any help.

Jonathan Gray




Re: 0.18.1 datanode pseudo-deadlock problem

2009-01-09 Thread Konstantin Shvachko

Hi Jason,

2 million blocks per data-node is not going to work.
There were discussions about it previously, please
check the mail archives.

This means you have a lot of very small files, which
HDFS is not designed to support. A general recommendation
is to group small files into large ones, introducing
some kind of record structure delimiting those small files,
and to control it at the application level.

Thanks,
--Konstantin
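
One possible way to do the packing, sketched with a SequenceFile keyed by the
original file name (paths and class name are hypothetical, and each file is
assumed small enough to buffer in memory):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path srcDir = new Path(args[0]);  // directory full of small files
    Path packed = new Path(args[1]);  // single output SequenceFile
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
    try {
      for (FileStatus st : fs.listStatus(srcDir)) {
        if (st.isDir()) continue;
        byte[] buf = new byte[(int) st.getLen()];  // fine for small files only
        FSDataInputStream in = fs.open(st.getPath());
        try {
          in.readFully(0, buf);
        } finally {
          in.close();
        }
        // Original file name becomes the record key, its contents the value.
        writer.append(new Text(st.getPath().getName()), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}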


Jason Venner wrote:
The problem we are having is that datanodes periodically stall for 10-15 
minutes and drop off the active list and then come back.


What is going on is that a long operation set is holding the lock on 
FSDataset.volumes, and all of the other block service requests stall 
behind this lock.


DataNode: [/data/dfs-video-18/dfs/data] daemon prio=10 tid=0x4d7ad400 
nid=0x7c40 runnable [0x4c698000..0x4c6990d0]

  java.lang.Thread.State: RUNNABLE
   at java.lang.String.lastIndexOf(String.java:1628)
   at java.io.File.getName(File.java:399)
   at 
org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148) 

   at 
org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511) 


   - locked 0x551e8d48 (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
   at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
   at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
   at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
   at java.lang.Thread.run(Thread.java:619)

This is basically taking a stat on every HDFS block on the datanode, 
which in our case is ~2 million blocks, and can take 10+ minutes (we may be 
experiencing problems with our RAID controller but have no visibility 
into it); at the OS level the file system seems fine and operations 
eventually finish.


It appears that a couple of different data structures are being locked 
with the single object FSDataset$Volume.


Then this happens:
org.apache.hadoop.dfs.datanode$dataxcei...@1bcee17 daemon prio=10 
tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry 
[0x459fe000..0x459ff0d0]

  java.lang.Thread.State: BLOCKED (on object monitor)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473) 

   - waiting to lock 0x551e8d48 (a 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet)

   at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
   - locked 0x54e550e0 (a org.apache.hadoop.dfs.FSDataset)
   at 
org.apache.hadoop.dfs.DataNode$BlockReceiver.init(DataNode.java:2322)
   at 
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)

   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
   at java.lang.Thread.run(Thread.java:619)

which locks the FSDataset while waiting on the volume object

and now all of the Datanode operations stall waiting on the FSDataset 
object.

--

Our particular installation doesn't use multiple directories for hdfs, 
so a first simple hack for a local fix would be to modify getNextVolume 
to just return the single volume and not be synchronized


A richer alternative would be to make the locking more fine grained on 
FSDataset$FSVolumeSet.


Of course we are also trying to fix the file system performance and dfs 
block loading that results in the block report taking a long time.


Any suggestions or warnings?

Thanks.





Re: TestDFSIO delivers bad values of throughput and average IO rate

2009-01-07 Thread Konstantin Shvachko



tienduc_dinh wrote:

Hi Konstantin,

thanks so much for your help. I was a little bit confused about why my
setting of mapred.map.tasks = 10 in hadoop-site.xml didn't make Hadoop map
accordingly. So your answer,

In case of TestDFSIO it will be overridden by -nrFiles.

is the key.

I now need your confirmation that I've understood it right.


That is correct.


+ If I want to write 2 GB with 1 map task, I should use the following
command.


hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles
1 


The values of throughput are, e.g. 33,60 / 31,48 / 30,95. 


+ If I want to write 2 GB with 4 map tasks, I should use the following
command.


hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 5012 -nrFiles
4


You are writing 20GB not 2GB.
Should be 512 instead of 5012.
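(For the arithmetic: 4 maps x 512 MB = 2048 MB = 2 GB, while 4 x 5012 MB is
about 20 GB.)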

The values of throughput are, e.g. 31,50 / 32,09 / 30,56. 


Can you please explain to me why the values in case 2 are not much better? I have
1 master and 4 slaves, and if I calculate it right, they should be about 4 times
higher, right?


throughput is mb/sec per client.
It is great that you get the same numbers for 1 write and 4 parallel writes.
This means that Hadoop on your cluster scales well! :-)


Sorry for my poor English skills, and thanks very much for your help.

Tien Duc Dinh


Konstantin Shvachko wrote:

Hi tienduc_dinh,

Just a bit of a background, which should help to answer your questions.
TestDFSIO mappers perform one operation (read or write) each, measure
the time taken by the operation and output the following three values:
(I am intentionally omitting some other output stuff.)
- size(i)
- time(i)
- rate(i) = size(i) / time(i)
i is the index of the map task, 0 <= i < N, and N is the -nrFiles value,
which equals the number of maps.

Then the reduce sums those values and writes them into part-0.
That is you get three fields in it
size = size(0) + ... + size(N-1)
time = time(0) + ... + time(N-1)
rate = rate(0) + ... + rate(N-1)

Then we calculate
throughput = size / time
averageIORate = rate / N

So answering your questions
- There should be only one reduce task, otherwise you will have to
manually sum corresponding values in part-0 and part-1.
- The value of the :rate after the reduce equals the sum of individual
rates of each operation. So if you want to have an average you should
divide it by the number tasks rather than multiply.

Now, in your case you create only one file -nrFiles 1, which means
you run only one map task.
Setting mapred.map.tasks to 10 in hadoop-site.xml defines the default
number of tasks per job. See here
http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
In case of TestDFSIO it will be overridden by -nrFiles.

Hope this answers your questions.
Thanks,
--Konstantin



tienduc_dinh wrote:

Hello,

I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and
4
slaves. In hadoop-site.xml the value of mapred.map.tasks is 10. Because
the values throughput and average IO rate are similar, I just post
the
values of throughput of the same command with 3 times running

-  hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048
-nrFiles 1

+ with dfs.replication = 1 => 33,60 / 31,48 / 30,95

+ with dfs.replication = 2 => 26,40 / 20,99 / 21,70

I find something strange while reading the source code. 

- The value of mapred.reduce.tasks is always set to 1 


job.setNumReduceTasks(1) in the function runIOTest()  and reduceFile =
new
Path(WRITE_DIR, part-0) in analyzeResult().

So I think, if we properly have mapred.reduce.tasks = 2, we will have on
the
file system 2 Paths to part-0 and part-1, e.g.
/benchmarks/TestDFSIO/io_write/part-0

- And i don't understand the line with double med = rate / 1000 /
tasks.
Is it not double med = rate * tasks / 1000 






Re: TestDFSIO delivers bad values of throughput and average IO rate

2009-01-06 Thread Konstantin Shvachko

Hi tienduc_dinh,

Just a bit of a background, which should help to answer your questions.
TestDFSIO mappers perform one operation (read or write) each, measure
the time taken by the operation and output the following three values:
(I am intentionally omitting some other output stuff.)
- size(i)
- time(i)
- rate(i) = size(i) / time(i)
i is the index of the map task, 0 <= i < N, and N is the -nrFiles value,
which equals the number of maps.

Then the reduce sums those values and writes them into part-0.
That is you get three fields in it
size = size(0) + ... + size(N-1)
time = time(0) + ... + time(N-1)
rate = rate(0) + ... + rate(N-1)

Then we calculate
throughput = size / time
averageIORate = rate / N

So answering your questions
- There should be only one reduce task, otherwise you will have to
manually sum corresponding values in part-0 and part-1.
- The value of the :rate after the reduce equals the sum of individual
rates of each operation. So if you want to have an average you should
divide it by the number tasks rather than multiply.

Now, in your case you create only one file -nrFiles 1, which means
you run only one map task.
Setting mapred.map.tasks to 10 in hadoop-site.xml defines the default
number of tasks per job. See here
http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
In case of TestDFSIO it will be overridden by -nrFiles.

Hope this answers your questions.
Thanks,
--Konstantin
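
As a purely hypothetical illustration of the formulas above: suppose you run
with -nrFiles 4 and each map writes 512 MB, taking 16, 16, 20 and 12 seconds
respectively. Then size = 2048 MB, time = 64 sec, and rate = 32 + 32 + 25.6 +
42.7 = ~132.3 MB/sec, so throughput = 2048 / 64 = 32 MB/sec while
averageIORate = 132.3 / 4 = ~33.1 MB/sec.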



tienduc_dinh wrote:

Hello,

I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and 4
slaves. In hadoop-site.xml the value of mapred.map.tasks is 10. Because
the values of throughput and average IO rate are similar, I just post the
throughput values of the same command run 3 times:

-  hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048
-nrFiles 1

+ with dfs.replication = 1 => 33,60 / 31,48 / 30,95

+ with dfs.replication = 2 => 26,40 / 20,99 / 21,70

I find something strange while reading the source code.

- The value of mapred.reduce.tasks is always set to 1:

job.setNumReduceTasks(1) in the function runIOTest(), and reduceFile = new
Path(WRITE_DIR, part-0) in analyzeResult().

So I think, if we instead set mapred.reduce.tasks = 2, we will have on the
file system 2 paths, part-0 and part-1, e.g.
/benchmarks/TestDFSIO/io_write/part-0

- And I don't understand the line with double med = rate / 1000 / tasks.
Should it not be double med = rate * tasks / 1000?


Re: DFS replication and Error Recovery on failure

2008-12-29 Thread Konstantin Shvachko



1) If I set the value of dfs.replication to 3 only in hadoop-site.xml of the
namenode (master) and
then restart the cluster, will this take effect, or do I have to change
hadoop-site.xml on all slaves?


dfs.replication is the name-node parameter, so you need to restart
only the name-node in order to reset the value.
I should mention that setting a new value will not immediately change
the replication of existing blocks, because replication is per file,
and you need to use setReplication to change it.
For new files, though, the replication will be set to the new value
automatically.
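
A minimal sketch of resetting replication for a file that already exists
(the class name and path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Existing blocks keep their old replication until it is reset per file.
    fs.setReplication(new Path(args[0]), (short) 3);
  }
}

The fs shell's -setrep command does the same thing.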


2)
What can be possible cause of following error at a datanode. ?
ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible
namespaceIDs in
/mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/dfs/data:
namenode namespaceID = 1396640905; datanode namespaceID = 820259954


namespaceID provides cluster integrity; name- and data-nodes share the same 
value.
This either means you ran the data-nodes with another name-node,
or you reformatted the name-node recently.
It is better to have a dedicated directory for data-node storage rather
than use tmp.


If my data node goes down due to the above error, what should I do in
the following scenarios?
1) I have some data on the corrupted data node that I need to recover;
how can I recover that data?


You should make sure first which cluster it belongs to.


2) If I don't care about the data, but I want the node back on the
cluster, can I just delete /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp
and include the node back in the cluster?


Yes, you can remove the directory if you don't need the data.

Thanks,
--Konstantin


Re: Detect Dead DataNode

2008-12-29 Thread Konstantin Shvachko



Sandeep Dhawan wrote:

Hi,

I have a 2-node Hadoop cluster running on Windows using Cygwin. 
When I open up the web GUI to view the number of Live Nodes, it shows 2. 
But when I kill the slave node and refresh the GUI, it still shows the
number of Live Nodes as 2.

It's only after some 20-30 mins,


It should be 10 minutes by default.


that the master node is able to detect the
failure, which is then reflected in the GUI. It then shows:

Live Node : 1
Dead Node : 1

Also, after killing the slave datanode if I try to copy a file from the
local file system, it fails. 


1. Is there a way by which we can configure the time interval after which
the master node declares a datanode dead?


You can modify heartbeat.recheck.interval, which by default is set to 5 min.
The expiration time is twice this, that is 10 min.
So if you set heartbeat.recheck.interval = 1 min, then your nodes will be
expiring in 2 minutes.


2. Why does the file transfer fail when one of the slave nodes is dead and
the master node is alive?


There could be different reasons, you need to read the message returned.

Thanks,
--Konstantin


Re: Datanode handling of single disk failure

2008-12-19 Thread Konstantin Shvachko


Brian Bockelman wrote:

Hello all,

I'd like to take the datanode's capability to handle multiple 
directories to a somewhat-extreme, and get feedback on how well this 
might work.


We have a few large RAID servers (12 to 48 disks) which we'd like to 
transition to Hadoop.  I'd like to mount each of the disks individually 
(i.e., /mnt/disk1, /mnt/disk2, ) and take advantage of Hadoop's 
replication - instead of pay the overhead to set up a RAID and still 
have to pay the overhead of replication.


In my experience this is the right way to go.

However, we're a bit concerned about how well Hadoop might handle one of 
the directories disappearing from underneath it.  If a single volume, 
say, /mnt/disk1 starts returning I/O errors, is Hadoop smart enough to 
figure out that this whole volume is broken?  Or will we have to restart 
the datanode after any disk failure for it to search the directory and 
realize everything is broken?  What happens if you start up the datanode 
with a data directory that it can't write into?


In the current implementation, if at any point the Datanode detects an unwritable
or unreadable drive, it shuts itself down, logging a message about what went wrong
and reporting the problem to the name-node.
So yes, if such a thing happens, you will have to restart the data-node.
But since the cluster takes care of data-node failures by re-replicating
lost blocks, that should not be a problem.

Is anyone running in this fashion (i.e., multiple data directories 
corresponding to different disk volumes ... even better if you're doing 
it with more than a few disks)?


We have a large experience running 4 drives per data-node (no RAID).
So this is not something new or untested.

Thanks,
--Konstantin


Re: question: NameNode hanging on startup as it intends to leave safe mode

2008-12-10 Thread Konstantin Shvachko

This is probably related to HADOOP-4795.
http://issues.apache.org/jira/browse/HADOOP-4795

We are testing it on 0.18 now. Should be committed soon.
Please let us know if it is something else.

Thanks,
--Konstantin

Karl Kleinpaste wrote:

We have a cluster comprised of 21 nodes holding a total capacity of
about 55T where we have had a problem twice in the last couple weeks on
startup of NameNode.  We are running 0.18.1.  DFS space is currently
just below the halfway point of actual occupation, about 25T.

Symptom is that there is normal startup logging on NameNode's part,
where it self-analyzes its expected DFS content, reports #files known,
and begins to accept reports from slaves' DataNodes about blocks they
hold.  During this time, NameNode is in safe mode pending adequate block
discovery from slaves.  As the fraction of reported blocks rises,
eventually it hits the required 0.9990 threshold and announces that it
will leave safe mode in 30 seconds.

The problem occurs when, at the point of logging 0 seconds to leave
safe mode, NameNode hangs: It uses no more CPU; it logs nothing
further; it stops responding on its port 50070 web interface; hadoop
fs commands report no contact with NameNode; netstat -atp shows a
number of open connections on 9000 and 50070, indicating the connections
are being accepted, but NameNode never processes them.

This has happened twice in the last 2 weeks and it has us fairly
concerned.  Both times, it has been adequate simply to start over again,
and NameNode successfully comes to life the 2nd time around.  Is anyone
else familiar with this sort of hang, and do you know of any solutions?




Re: When is decomissioning done?

2008-12-04 Thread Konstantin Shvachko

Just for the reference these links:
http://wiki.apache.org/hadoop/FAQ#17
http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command

Decommissioning does not happen at once.
-refreshNodes just starts the process, but does not complete it.
There could be a lot of blocks on the nodes you want to decommission,
and replication takes time.
The progress can be monitored on the name-node web UI:
right after -refreshNodes you will see the nodes you chose for
decommissioning in the state Decommission In Progress; you should wait until it
changes to Decommissioned and then turn the node off.

--Konstantin


David Hall wrote:

I'm starting to think I'm doing things wrong.

I have an absolute path to dfs.hosts.exclude that includes what i want
decommissioned, and a dfs.hosts which includes those i want to remain
commissioned (this points to the slaves file).

Nothing seems to do anything...

What am I missing?

-- David

On Thu, Dec 4, 2008 at 12:48 AM, David Hall [EMAIL PROTECTED] wrote:

Hi,

I'm trying to decommission some nodes. The process I tried to follow is:

1) add them to conf/excluding (hadoop-site points there)
2) invoke hadoop dfsadmin -refreshNodes

This returns immediately, so I thought it was done, so I killed off
the cluster and rebooted without the new nodes, but then fsck was very
unhappy...

Is there some way to watch the progress of decomissioning?

Thanks,
-- David





Re: Hadoop Development Status

2008-11-20 Thread Konstantin Shvachko

This is very nice.
A suggestion, if it is related to the development status:
do you think you guys could analyze which questions are
discussed most often on the mailing lists, so that we could
update our FAQs based on that?
Thanks,
--Konstantin


Alex Loddengaard wrote:

Some engineers here at Cloudera have been working on a website to report on
Hadoop development status, and we're happy to announce that the website is
now available!  We've written a blog post describing its usefulness, goals,
and future, so take a look if you're interested:


http://www.cloudera.com/blog/2008/11/18/introducing-hadoop-development-status/

The tool is hosted here:

http://community.cloudera.com

Please give us any feedback or suggestions off-list, to avoid polluting the
list.

Enjoy!

Alex, Jeff, and Tom



Re: The Case of a Long Running Hadoop System

2008-11-17 Thread Konstantin Shvachko

Bagri,

According to the numbers you posted, your cluster has 6,000,000 block replicas
and only 12 data-nodes. The blocks are small, on average about 78KB according
to fsck, so each node contains about 40GB worth of block data.
But the number of blocks is really huge, 500,000 per node. Is my math correct?
I haven't seen data-nodes that big yet.
The problem here is that a data-node keeps a map of all its blocks in memory.
The map is a HashMap; with 500,000 entries you can get long lookup times, I 
guess.
And block reports can also take a long time.
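(Checking that math: 6,000,000 replicas / 12 nodes = 500,000 blocks per node,
and 500,000 x 78 KB is roughly 39 GB per node, so the 40 GB figure holds.)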

So I believe restarting the name-node will not help you.
You should somehow pack your small files into larger ones.
Alternatively, you can increase your cluster size, probably to 5 to 10 times 
larger.
I don't remember whether we have had any optimization patches related to the
data-node block map since 0.15. Please advise if anybody remembers.

Thanks,
--Konstantin


Abhijit Bagri wrote:
We do not have a secondary namenode because 0.15.3 has a serious bug which 
truncates the namenode image if there is a failure while the namenode 
fetches the image from the secondary namenode. See HADOOP-3069.

I have a patched version of 0.15.3 for this issue. From the patch for 
HADOOP-3069, the changes are on the namenode _and_ the secondary namenode, which 
means I can't just fire up a secondary namenode.


- Bagri


On Nov 15, 2008, at 11:36 PM, Billy Pearson wrote:

If I understand correctly, the secondary namenode merges the edits log into the 
fsimage and reduces the edit log size,
which is likely the root of your problems: 8.5G seems large and is likely 
putting a strain on your master server's memory and I/O bandwidth.

Why do you not have a secondary namenode?

If you do not have the memory on the master, I would look into 
stopping a datanode/tasktracker on a server and loading the secondary 
namenode on it.

Let it run for a while and watch the log for the secondary namenode; 
you should see your edit log get smaller.

I am not an expert, but that would be my first action.

Billy



Abhijit Bagri [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

Hi,

This is a long mail, as I have tried to put in as much detail as might
help any of the Hadoop devs/users to help us out. The gist is this:

We have a long running Hadoop system (masters not restarted for about
3 months). We have recently started seeing the DFS responding very
slowly, which has resulted in failures on a system which depends on
Hadoop. Further, the DFS seems to be in an unstable state (i.e. if fsck is
a good representation, which I believe it is). The edits file

These are the details (skip/return here later and jump to the
questions at the end of the mail for a quicker read) :

Hadoop Version: 0.15.3 on 32 bit systems.

Number of slaves: 12
Slaves heap size: 1G
Namenode heap: 2G
Jobtracker heap: 2G

The namenode and jobtrackers have not been restarted for about 3
months. We did restart the slaves (all of them within a few hours) a few
times for some maintenance in between, though. We do not have a
secondary namenode in place.

There is another system X which talks to this Hadoop cluster. X writes
to the Hadoop DFS and submits jobs to the Jobtracker. The number of
jobs submitted to Hadoop so far is over 650,000 (I am using the job
id for this); each job may read/write multiple files and
has several dependent libraries which it loads from the Distributed Cache.

Recently, we started seeing several timeouts happening
while X tries to read/write to the DFS. This in turn results in the DFS
becoming very slow to respond. The writes are especially slow. The
traces we get in the logs are:

java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at java.net.SocketInputStream.read(SocketInputStream.java:182)
   at java.io.DataInputStream.readShort(DataInputStream.java:284)
   at org.apache.hadoop.dfs.DFSClient
$DFSOutputStream.endBlock(DFSClient.java:1660)
   at org.apache.hadoop.dfs.DFSClient
$DFSOutputStream.close(DFSClient.java:1733)
   at org.apache.hadoop.fs.FSDataOutputStream
$PositionCache.close(FSDataOutputStream.java:49)
   at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:
64)
   ...

Also, datanode logs show a lot of traces like these:

2008-11-14 21:21:49,429 ERROR org.apache.hadoop.dfs.DataNode:
DataXceiver: java.io.IOException: Block blk_-1310124865741110666 is
valid, and cannot be written to.
   at
org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:551)
   at org.apache.hadoop.dfs.DataNode
$BlockReceiver.init(DataNode.java:1257)
   at org.apache.hadoop.dfs.DataNode
$DataXceiver.writeBlock(DataNode.java:901)
   at org.apache.hadoop.dfs.DataNode
$DataXceiver.run(DataNode.java:804)
   at java.lang.Thread.run(Thread.java:595)

and these

2008-11-14 21:21:50,695 WARN org.apache.hadoop.dfs.DataNode:
java.io.IOException: Error in deleting 

Re: SecondaryNameNode on separate machine

2008-11-03 Thread Konstantin Shvachko

You can either do what you just described with dfs.name.dir = dirX,
or you can start the name-node with the -importCheckpoint option,
which automates copying the image files from the secondary to the primary.

See here:
http://hadoop.apache.org/core/docs/current/commands_manual.html#namenode
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode
http://issues.apache.org/jira/browse/HADOOP-2585#action_12584755

--Konstantin

Tomislav Poljak wrote:

Hi,
Thank you all for your time and your answers!

Now SecondaryNameNode connects to the NameNode (after I configured
dfs.http.address to the NN's http server - NN hostname on port 50070)
and creates (transfers) the edits and fsimage from the NameNode.

Can you explain to me a little bit more how NameNode failover should work
now?

For example, SecondaryNameNode now stores fsimage and edits in (SNN's)
dirX, and let's say the NameNode goes down (its disk becomes unreadable). Now I
create/dedicate a new machine for the NameNode (and also change DNS to point to
this new NameNode machine as the NameNode host), take the data in dirX from
the SNN and copy it to the new NameNode. How do I configure the new NameNode to
use the data from dirX (do I configure dfs.name.dir to point to dirX and start
the new NameNode)?

Thanks,
Tomislav



On Fri, 2008-10-31 at 11:38 -0700, Konstantin Shvachko wrote:

True, dfs.http.address is the NN Web UI address.
This is where the NN http server runs. Besides the Web UI there is also
a servlet running on that server which is used to transfer the image
and edits from the NN to the secondary using http get.
So SNN uses both addresses fs.default.name and dfs.http.address.

When SNN finishes the checkpoint the primary needs to transfer the
resulting image back. This is done via the http server running on SNN.

Answering Tomislav's question:
The difference between fs.default.name and dfs.http.address is that
fs.default.name is the name-node's RPC address, where clients and
data-nodes connect to, while dfs.http.address is the NN's http server
address where our browsers connect to, but it is also used for
transferring image and edits files.

--Konstantin

Otis Gospodnetic wrote:

Konstantin & Co, please correct me if I'm wrong, but looking at 
hadoop-default.xml makes me think that dfs.http.address is only the URL for the NN 
*Web UI*.  In other words, this is where we people go look at the NN.

The secondary NN must then be using only the Primary NN URL specified in fs.default.name. 
 This URL looks like hdfs://name-node-hostname-here/.  Something in Hadoop then knows the 
exact port for the Primary NN based on the URI schema (e.g. hdfs://) in this 
URL.

Is this correct?


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Tomislav Poljak [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Thursday, October 30, 2008 1:52:18 PM
Subject: Re: SecondaryNameNode on separate machine

Hi,
Can you please explain the difference between fs.default.name and
dfs.http.address (like how and when SecondaryNameNode uses
fs.default.name, and how/when dfs.http.address)? I have set them both to the
same (namenode's) hostname:port. Is this correct (or does dfs.http.address
need some other port)?


Thanks,

Tomislav

On Wed, 2008-10-29 at 16:10 -0700, Konstantin Shvachko wrote:

SecondaryNameNode uses the http protocol to transfer the image and the edits
from the primary name-node and vice versa.
So the secondary does not access local files on the primary directly.
The primary NN should know the secondary's http address.
And the secondary NN needs to know both fs.default.name and dfs.http.address of
the primary.

In general we usually create one configuration file hadoop-site.xml
and copy it to all other machines. So you don't need to set up different
values for all servers.

Regards,
--Konstantin

Tomislav Poljak wrote:

Hi,
I'm not clear on how SecondaryNameNode communicates with NameNode
(if deployed on a separate machine). Does SecondaryNameNode use a direct
connection (over some port and protocol) or is it enough for
SecondaryNameNode to have access to data which NameNode writes locally
on disk?

Tomislav

On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:

I think a lot of the confusion comes from this thread :
http://www.nabble.com/NameNode-failover-procedure-td11711842.html

Particularly because the wiki was updated with wrong information, not
maliciously I'm sure. This information is now gone for good.

Otis, your solution is pretty much like the one given by Dhruba Borthakur
and augmented by Konstantin Shvachko later in the thread but I never did it
myself.

One thing should be clear though, the NN is and will remain a SPOF (just
like HBase's Master) as long as a distributed manager service (like
Zookeeper) is not plugged into Hadoop to help with failover.

J-D

On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:


Hi,
So what is the recipe for avoiding NN SPOF using only what comes with
Hadoop

Re: SecondaryNameNode on separate machine

2008-10-29 Thread Konstantin Shvachko

SecondaryNameNode uses the http protocol to transfer the image and the edits
from the primary name-node and vice versa.
So the secondary does not access local files on the primary directly.
The primary NN should know the secondary's http address.
And the secondary NN needs to know both fs.default.name and dfs.http.address of
the primary.

In general we usually create one configuration file hadoop-site.xml
and copy it to all other machines. So you don't need to set up different
values for all servers.

Regards,
--Konstantin

Tomislav Poljak wrote:

Hi,
I'm not clear on how SecondaryNameNode communicates with NameNode
(if deployed on a separate machine). Does SecondaryNameNode use a direct
connection (over some port and protocol) or is it enough for
SecondaryNameNode to have access to data which NameNode writes locally
on disk?

Tomislav

On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:

I think a lot of the confusion comes from this thread :
http://www.nabble.com/NameNode-failover-procedure-td11711842.html

Particularly because the wiki was updated with wrong information, not
maliciously I'm sure. This information is now gone for good.

Otis, your solution is pretty much like the one given by Dhruba Borthakur
and augmented by Konstantin Shvachko later in the thread but I never did it
myself.

One thing should be clear though, the NN is and will remain a SPOF (just
like HBase's Master) as long as a distributed manager service (like
Zookeeper) is not plugged into Hadoop to help with failover.

J-D

On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:


Hi,
So what is the recipe for avoiding NN SPOF using only what comes with
Hadoop?

From what I can tell, I think one has to do the following two things:

1) configure primary NN to save namespace and xa logs to multiple dirs, one
of which is actually on a remotely mounted disk, so that the data actually
lives on a separate disk on a separate box.  This saves namespace and xa
logs on multiple boxes in case of primary NN hardware failure.

2) configure secondary NN to periodically merge fsimage+edits and create
the fsimage checkpoint.  This really is a second NN process running on
another box.  It sounds like this secondary NN has to somehow have access to
fsimage & edits files from the primary NN server.
http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
does not describe the best practice around that - the recommended way to
give secondary NN access to primary NN's fsimage and edits files.  Should
one mount a disk from the primary NN box to the secondary NN box to get
access to those files?  Or is there a simpler way?
In any case, this checkpoint is just a merge of fsimage+edits files and
again is there in case the box with the primary NN dies.  That's what's
described on
http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
more or less.

Is this sufficient, or are there other things one has to do to eliminate NN
SPOF?


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Jean-Daniel Cryans [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Tuesday, October 28, 2008 8:14:44 PM
Subject: Re: SecondaryNameNode on separate machine

Tomislav.

Contrary to popular belief the secondary namenode does not provide

failover,

it's only used to do what is described here :


http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode

So the term secondary does not mean a second one but is more like a
second part of.

J-D

On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:


Hi,
I'm trying to implement NameNode failover (or at least NameNode local
data backup), but it is hard since there is no official documentation.
Pages on this subject are created, but still empty:

http://wiki.apache.org/hadoop/NameNodeFailover
http://wiki.apache.org/hadoop/SecondaryNameNode

I have been browsing the web and hadoop mailing list to see how this
should be implemented, but I got even more confused. People are asking
do we even need SecondaryNameNode etc. (since NameNode can write local
data to multiple locations, so one of those locations can be a mounted
disk from another machine). I think I understand the motivation for
SecondaryNameNode (to create a snapshot of NameNode data every n
seconds/hours), but setting (deploying and running) SecondaryNameNode on a
different machine than NameNode is not as trivial as I expected. First I
found that if I need to run SecondaryNameNode on another machine than
NameNode I should change the masters file on NameNode (change localhost to
the SecondaryNameNode host) and set some properties in hadoop-site.xml on
SecondaryNameNode (fs.default.name, fs.checkpoint.dir,
fs.checkpoint.period etc.)

This was enough to start SecondaryNameNode when starting NameNode with
bin/start-dfs.sh, but it didn't create an image on SecondaryNameNode. Then
I found that I need to set

Re: adding more datanode

2008-10-21 Thread Konstantin Shvachko
You just start the new data-node as the cluster is running using
bin/hadoop datanode
The configuration on the new data-node should be the same as on other nodes.
The data-node should join the cluster automatically.
Formatting will destroy your file system.
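
As a sketch, on the new machine (assuming the same hadoop-site.xml as the
rest of the cluster is already in place):

  # do NOT run "bin/hadoop namenode -format" here
  bin/hadoop-daemon.sh start datanode      # start a tasktracker the same way if needed
  bin/hadoop dfsadmin -report              # run from any node to confirm the new node joined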

--Konstantin

David Wei wrote:
 Well, in my cluster, I do this:
 1. Adding new machines into conf/slaves on master machine
 2. On the new nodes, run format command
 3. Back to master, run start-all.sh
 4. Run start-balancer.sh , still on master
 
 Then I got the new nodes inside my cluster and no need to reboot the
 whole system.
 
 Hopefully this will help. ;=)
 
 
 Ski Gh3 wrote:
 I'm not sure I get this.
 1. If you format the filesystem (which I thought is usually executed
 on the master node, but anyway)
 don't you erase all your data?
 2. I guess I need to add the new machine to the conf/slaves file,
 but then I run the start-all.sh again from the master node while my
 cluster is already running?

 Thanks!
 On Mon, Oct 20, 2008 at 5:59 PM, David Wei [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:

 this is quite easy. U can just config your new datanodes as others
 and format the filesystem before u start it.
 Remember to make it ssh-able for your master and run
 ./bin/start-all.sh on the master machine if you want to start all
 the deamons. This will start and add the new datanodes to the
 up-and-running cluster.

 hopefully my info will be help.


 Ski Gh3 wrote:

 hi,

 I am wondering how to add more datanodes to an up-and-running
 hadoop
 instance?
 Couldn't find instructions on this from the wiki page.

 Thanks!





 
 
 


Re: dfs i/o stats

2008-09-29 Thread Konstantin Shvachko

We use TestDFSIO for measuring IO performance on our clusters.
It is called a test, but in fact it's a benchmark.
It runs a map-reduce job, which either writes to or reads from files
and collects statistics.
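
For example, a typical invocation (the exact jar name depends on the release;
the sizes are just placeholders to adjust to your cluster):

  # write 10 files of 1000 MB each, then read them back, then clean up
  bin/hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  bin/hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
  bin/hadoop jar hadoop-*-test.jar TestDFSIO -clean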

Another thing is that Hadoop automatically collects metrics.
Like number of creates, deletes, ls's etc.
Here are some links:
http://wiki.apache.org/hadoop/GangliaMetrics
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNodeMetrics.html
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/FSNamesystemMetrics.html

Hope this is helpful.
--Konstantin

Shirley Cohen wrote:

Hi,

I would like to measure the disk i/o performance of our hadoop cluster. 
However, running iostat on 16 nodes is rather cumbersome. Does dfs keep 
track of any stats like the number of blocks or bytes read and written? 
 From scanning the api, I found a class called 
org.apache.hadoop.fs.FileSystem.Statistics that could be relevant. 
Does anyone know if this is what I'm looking for?


Thanks,

Shirley



Re: Small Filesizes

2008-09-15 Thread Konstantin Shvachko

Peter,

You are likely to hit memory limitations on the name-node.
With 100 million small files it will need to support 200 mln objects,
which will require roughly 30 GB of RAM on the name-node.
You may also consider hadoop archives or present your files as a
collection of records and use Pig, Hive etc.
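
(Rough arithmetic behind that estimate, assuming the commonly cited figure of
roughly 150 bytes of name-node heap per namespace object:
200,000,000 objects * ~150 bytes ≈ 30 GB.)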

--Konstantin

Brian Vargas wrote:


Peter,

In my testing with files of that size (well, larger, but still well
below the block size) it was impossible to achieve any real throughput
on the data because of the overhead of looking up the locations to all
those files on the NameNode.  Your application spends so much time
looking up file names that most of the CPUs sit idle.

A simple solution is to just load all of the small files into a sequence
file, and process the sequence file instead.

Brian

Peter McTaggart wrote:

Hi All,



I am considering using HDFS for an application that potentially has many
small files – ie  10-100 million files with an estimated average filesize of
50-100k (perhaps smaller) and is an online interactive application.

All of the documentation I have seen suggests that a block size of 64-128MB
works best for Hadoop/HDFS and it is best used for batch oriented
applications.



Does anyone have any experience using it for files of this size  in an
online application environment?

Is it worth pursuing HDFS for this type of application?



Thanks

Peter





Re: is SecondaryNameNode in support for the NameNode?

2008-09-08 Thread Konstantin Shvachko

NameNodeFailover http://wiki.apache.org/hadoop/NameNodeFailover, with a
SecondaryNameNode http://wiki.apache.org/hadoop/SecondaryNameNode hosted

I think it is wrong, please correct it.


You probably look at some cached results. Both pages do not exist.
The first one was a cause of confusion and was removed.

Regards,
--Konstantin


2008/9/6, Jean-Daniel Cryans [EMAIL PROTECTED]:

Hi,

See http://wiki.apache.org/hadoop/FAQ#7 and


http://hadoop.apache.org/core/docs/r0.17.2/hdfs_user_guide.html#Secondary+Namenode

Regards,

J-D

On Sat, Sep 6, 2008 at 5:26 AM, ??? [EMAIL PROTECTED] wrote:


Hi all!

The NameNode is a Single Point of Failure for the HDFS Cluster. There
is support for NameNodeFailover, with a SecondaryNameNode hosted on a
separate machine being able to stand in for the original NameNode if
it goes down.

Is it right? is SecondaryNameNode in support  for the NameNode?

Sorry for my englist!!
?






Re: Hadoop over Lustre?

2008-09-03 Thread Konstantin Shvachko

Great!
If you decide to run TestDFSIO on your cluster, please let me know.
I'll run the same on the same scale with hdfs and we can compare the numbers.
--Konstantin

Joel Welling wrote:

That seems to have done the trick!  I am now running Hadoop 0.18
straight out of Lustre, without an intervening HDFS.  The unusual things
about my hadoop-site.xml are:

<property>
  <name>fs.default.name</name>
  <value>file:///bessemer/welling</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
  <description>The shared directory where MapReduce stores control
  files.</description>
</property>

where /bessemer/welling is a directory on a mounted Lustre filesystem.
I then do 'bin/start-mapred.sh' (without starting dfs), and I can run
Hadoop programs normally.  I do have to specify full input and output
file paths- they don't seem to be relative to fs.default.name .  That's
not too troublesome, though.

Thanks very much!  
-Joel

 [EMAIL PROTECTED]

On Fri, 2008-08-29 at 10:52 -0700, Owen O'Malley wrote:

Check the setting for mapred.system.dir. This needs to be a path that is on
a distributed file system. In old versions of Hadoop, it had to be on the
default file system, but that is no longer true. In recent versions, the
system dir only needs to be configured on the JobTracker and it is passed to
the TaskTrackers and clients.





Re: restarting datanode corrupts the hdfs

2008-09-03 Thread Konstantin Shvachko

I can see 3 reasons for that:
1. dfs.data.dir is pointing to a wrong data-node storage directory, or
2. somebody manually moved directory hadoop into /home/hadoop/dfs/tmp/,
which is supposed to contain only block files named blk_number
3. There is some collision of configuration variables so that the same directory
/home/hadoop/dfs/ is used by different servers (e.g. data-node and task tracker)
on your single node cluster.

To save hdfs data you can manually remove hadoop from /home/hadoop/dfs/tmp/
and then restart the data-node.
Or you can also manually remove tmp from /home/hadoop/dfs/.
In the latter case you risk losing some of the latest blocks, but not the whole
system.
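
For example (paths taken from Barry's report below; adjust them to your own
dfs.data.dir):

  bin/hadoop-daemon.sh stop datanode
  mv /home/hadoop/dfs/tmp/hadoop /some/backup/location   # or remove it
  bin/hadoop-daemon.sh start datanode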

--Konstantin

Barry Haddow wrote:
Hi 

Since upgrading to 0.18.0 I've noticed that restarting the datanode corrupts 
the hdfs so that the only option is to delete it and start again. I'm running 
hadoop in distributed mode, on a single host. It runs as the user hadoop and 
the hdfs is contained in a directory /home/hadoop/dfs.


When I restart hadoop using start-all.sh the datanode fails with the following 
message:


STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.0
STARTUP_MSG:   build = 
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 686010; 
compiled by 'hadoopqa' on Thu Aug 14 19:48:33 UTC 2008

/
2008-09-01 12:06:55,871 ERROR org.apache.hadoop.dfs.DataNode: 
java.io.IOException: Found /home/hadoop/dfs/tmp/hadoop 
in /home/hadoop/dfs/tmp but it is not a file.
at 
org.apache.hadoop.dfs.FSDataset$FSVolume.recoverDetachedBlocks(FSDataset.java:437)

at org.apache.hadoop.dfs.FSDataset$FSVolume.init(FSDataset.java:310)
at org.apache.hadoop.dfs.FSDataset.init(FSDataset.java:671)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:277)
at org.apache.hadoop.dfs.DataNode.init(DataNode.java:190)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2987)
at 
org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2942)

at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2950)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3072)

2008-09-01 12:06:55,872 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:

Running an fsck on the hdfs shows that it is corrupt, and the only way to fix 
it seems to be to delete it and reformat.


Any suggestions?
regards
Barry



Re: Hadoop over Lustre?

2008-08-25 Thread Konstantin Shvachko

mapred.job.tracker is the address and port of the JobTracker - the main server 
that controls map-reduce jobs.
Every task tracker needs to know the address in order to connect.
Do you follow the docs, e.g. that one
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
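
For example (the host name is just a placeholder for the machine running the
JobTracker):

  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>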

Can you start one node cluster?

 Are there standard tests of hadoop performance?

There is the sort benchmark. We also run DFSIO benchmark for read and write 
throughputs.

--Konstantin

Joel Welling wrote:

So far no success, Konstantin- the hadoop job seems to start up, but
fails immediately leaving no logs.  What is the appropriate setting for
mapred.job.tracker ?  The generic value references hdfs, but it also has
a port number- I'm not sure what that means.

My cluster is small, but if I get this working I'd be very happy to run
some benchmarks.  Are there standard tests of hadoop performance?

-Joel
 [EMAIL PROTECTED]

On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:

I think the solution should be easier than Arun and Steve advise.
Lustre is already mounted as a local directory on each cluster machines, right?
Say, it is mounted on /mnt/lustre.
Then you configure hadoop-site.xml and set
<property>
   <name>fs.default.name</name>
   <value>file:///mnt/lustre</value>
</property>
And then you start map-reduce only without hdfs using start-mapred.sh

By this you basically redirect all FileSystem requests to Lustre and you don't 
need
data-nodes or the name-node.

Please let me know if that works.

Also it would be very interesting to have your experience shared on this list.
Problems, performance - everything is quite interesting.

Cheers,
--Konstantin

Joel Welling wrote:
2. Could you set up symlinks from the local filesystem, so point every 
node at a local dir

  /tmp/hadoop
with each node pointing to a different subdir in the big filesystem?

Yes, I could do that!  Do I need to do it for the log directories as
well, or can they be shared?

-Joel

On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:

Joel Welling wrote:

Thanks, Steve and Arun.  I'll definitely try to write something based on
the KFS interface.  I think that for our applications putting the mapper
on the right rack is not going to be that useful.  A lot of our
calculations are going to be disordered stuff based on 3D spatial
relationships like nearest-neighbor finding, so things will be in a
random access pattern most of the time.

Is there a way to set up the configuration for HDFS so that different
datanodes keep their data in different directories?  That would be a big
help in the short term.

yes, but you'd have to push out a different config to each datanode.

1. I have some stuff that could help there, but its not ready for 
production use yet [1].


2. Could you set up symlinks from the local filesystem, so point every 
node at a local dir

  /tmp/hadoop
with each node pointing to a different subdir in the big filesystem?


[1] 
http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf







Re: Hadoop over Lustre?

2008-08-22 Thread Konstantin Shvachko

I think the solution should be easier than Arun and Steve advise.
Lustre is already mounted as a local directory on each cluster machines, right?
Say, it is mounted on /mnt/lustre.
Then you configure hadoop-site.xml and set
<property>
  <name>fs.default.name</name>
  <value>file:///mnt/lustre</value>
</property>
And then you start map-reduce only without hdfs using start-mapred.sh

By this you basically redirect all FileSystem requests to Lustre and you don't 
need
data-nodes or the name-node.

Please let me know if that works.

Also it would be very interesting to have your experience shared on this list.
Problems, performance - everything is quite interesting.

Cheers,
--Konstantin

Joel Welling wrote:
2. Could you set up symlinks from the local filesystem, so point every 
node at a local dir

  /tmp/hadoop
with each node pointing to a different subdir in the big filesystem?


Yes, I could do that!  Do I need to do it for the log directories as
well, or can they be shared?

-Joel

On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:

Joel Welling wrote:

Thanks, Steve and Arun.  I'll definitely try to write something based on
the KFS interface.  I think that for our applications putting the mapper
on the right rack is not going to be that useful.  A lot of our
calculations are going to be disordered stuff based on 3D spatial
relationships like nearest-neighbor finding, so things will be in a
random access pattern most of the time.

Is there a way to set up the configuration for HDFS so that different
datanodes keep their data in different directories?  That would be a big
help in the short term.

yes, but you'd have to push out a different config to each datanode.

1. I have some stuff that could help there, but its not ready for 
production use yet [1].


2. Could you set up symlinks from the local filesystem, so point every 
node at a local dir

  /tmp/hadoop
with each node pointing to a different subdir in the big filesystem?


[1] 
http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf





Re: When will hadoop version 0.18 be released?

2008-08-14 Thread Konstantin Shvachko

I don't think HADOOP-3781 will be fixed.

Here is the complete list of what is going to be fixed in 0.18
https://issues.apache.org/jira/secure/IssueNavigator.jspa?fixfor=12312972

--Konstantin

Thibaut_ wrote:

Will this bug (https://issues.apache.org/jira/browse/HADOOP-3781) also be
fixed, which makes it impossible to use the distributed jar file with any
external application? (Works only with a local recompile)

Thibaut


Konstantin Shvachko wrote:

But you won't get append in 0.18. It was committed for 0.19.
--konstantin

Arun C Murthy wrote:

On Aug 12, 2008, at 11:51 PM, 11 Nov. wrote:


Hi colleagues,
   As you know, the append writer will be available in version 0.18. We are
here waiting for the feature and want to know the rough time of release.
It's currently under vote, it should be released by the end of the week 
if it passes.


Arun







Re: When will hadoop version 0.18 be released?

2008-08-13 Thread Konstantin Shvachko

But you won't get append in 0.18. It was committed for 0.19.
--konstantin

Arun C Murthy wrote:


On Aug 12, 2008, at 11:51 PM, 11 Nov. wrote:


Hi colleagues,
   As you know, the append writer will be available in version 0.18. We are
here waiting for the feature and want to know the rough time of release.


It's currently under vote, it should be released by the end of the week 
if it passes.


Arun



Confusing NameNodeFailover page in Hadoop Wiki

2008-08-05 Thread Konstantin Shvachko

I was wandering around the Hadoop wiki and found this page dedicated to name-node
failover.
http://wiki.apache.org/hadoop/NameNodeFailover

I think it is confusing, contradicts other documentation on the subject and 
contains incorrect facts. See
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+Namenode
http://wiki.apache.org/hadoop/FAQ#7

Besides it contains some kind of discussion.
It is not that I am against discussions, lets have them on this list.
But I was trying to understand where all the confusion about secondary-node
issues comes from lately...

Imho we either need to correct it or remove it.

Thanks,
--Konstantin


Re: Hadoop 4 disks per server

2008-07-30 Thread Konstantin Shvachko

On hdfs see
http://wiki.apache.org/hadoop/FAQ#15
In addition to James's suggestion you can also specify dfs.name.dir
for the name-node to store extra copies of the namespace.
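
For example (the directory names are placeholders; one of them could be an
NFS mount so a copy of the namespace lives on another box):

  <property>
    <name>dfs.name.dir</name>
    <value>/drive1/dfs/name,/drive2/dfs/name,/remote/nfs/dfs/name</value>
  </property>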


James Moore wrote:

On Tue, Jul 29, 2008 at 6:37 PM, Rafael Turk [EMAIL PROTECTED] wrote:

Hi All,

 I´m setting up a cluster with 4 disks per server. Is there any way to make
Hadoop aware of this setup and take benefits from that?


I believe all you need to do is give four directories (one on each
drive) as  the value for dfs.data.dir and mapred.local.dir.  Something
like:

<property>
  <name>dfs.data.dir</name>
  <value>/drive1/myDfsDir,/drive2/myDfsDir,/drive3/myDfsDir,/drive4/myDfsDir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>



Re: corrupted fsimage and edits

2008-07-30 Thread Konstantin Shvachko

You should also run a secondary name-node, which does namespace checkpoints and 
shrinks the edits log file.
And this is exactly the case when the checkpoint image comes handy.
http://wiki.apache.org/hadoop/FAQ#7
In the recent release you can start the primary node using the secondary image 
directly.
In the old releases you need to move some files around.
--Konstantin

Raghu Angadi wrote:

Torsten Curdt wrote:


On Jul 30, 2008, at 20:35, Raghu Angadi wrote:

You should always have more than one location (preferably on 
different disks) for fsimage and editslog.


On production we do frequent backups. Is there a mechanism from inside 
hadoop now to do something like that now? The more than one location 
bit sounds a little like that.


You can specify multiple directories for dfs.name.dir, in which case 
fsimage and editslog are written to multiple places. If one of these 
goes bad, you can use the other one.


See http://wiki.apache.org/hadoop/FAQ#15

Raghu.

A few months back I had a proposal to keep checksums for each record 
on fsimage and editslog and NameNode would recover transparently from 
such corruptions when there are more than one copies available. It 
didn't come up in priority since there were no such failures observed.


You should certainly report these cases and will help the feature 
gain more traction.


Will file a bug report tomorrow.

cheers
--
Torsten





Re: Inconsistency in namenode's and datanode's namespaceID

2008-07-03 Thread Konstantin Shvachko

Yes this is a known bug.
http://issues.apache.org/jira/browse/HADOOP-1212
You should manually remove current directory from every data-node
after reformatting the name-node and start the cluster again.
I do not believe there is any other way.
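
For example, on each data-node (the path is a placeholder; use your own
dfs.data.dir setting):

  bin/hadoop-daemon.sh stop datanode
  rm -rf /path/to/dfs/data/current     # removes the stale namespaceID together with the old blocks
  bin/hadoop-daemon.sh start datanode
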
Thanks,
--Konstantin

Taeho Kang wrote:

No, I don't think it's a bug.

Your datanodes' data partition/directory was probably used in other HDFS
setup and thus had other namespaceID.

Or you could've used other partition/directory for your new HDFS setup by
setting different values for dfs.data.dir on your datanode. But in this
case, you can't access your old HDFS's data.


On Thu, Jul 3, 2008 at 4:21 AM, Xuan Dzung Doan [EMAIL PROTECTED]
wrote:


I was following the quickstart guide to run pseudo-distributed operations
with Hadoop 0.16.4. I got it to work successfully the first time. But I
failed to repeat the steps (I tried to re-do everything from re-formating
the HDFS). Then by looking at the log files of the daemons, I found out the
datanode failed to start because its namespaceID didn't match with the
namenode's. I after that found that the namespaceID is stored in the text
file VERSION under dfs/data/current and dfs/name/current for the datanode
and the namenode, respectively. The reformatting step does change
namespaceID of the namenode, but not for the datanode, and that's the cause
for the inconsistency. So after reformatting, if I manually update
namespaceID for the datanode, things will work totally fine again.

I guess there are probably others who had this same experience. Is it a bug
in Hadoop 0.16.4? If so, has it been taken care of in later versions?

Thanks,
David.








Re: HDFS blocks

2008-06-27 Thread Konstantin Shvachko



lohit wrote:

1. Can we have multiple files in DFS use different block sizes ?

No, currently this might not be possible, we have fixed-size blocks.


Actually you can. HDFS provides api to specify block size
when you create a file. Here is the link
http://hadoop.apache.org/core/docs/r0.17.0/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long,%20org.apache.hadoop.util.Progressable)
This should probably be in H-FAQ.
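
For example, a small sketch using that overload (the path and the values are
only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // 4 KB io buffer, replication 3, 8 MB block size for this one file
    FSDataOutputStream out = fs.create(new Path("/user/me/small-blocks.dat"),
        true, 4096, (short) 3, 8L * 1024 * 1024);
    out.write("just an example payload".getBytes());
    out.close();
  }
}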


2. If we use default block size for these small chunks, is the DFS space
wasted ?
 
DFS space is not wasted, all the blocks are stored on individual datanode's filesystem as is. But you would be wasting NameNode's namespace. NameNode holds the entire namespace in memory, so, instead of using 1 file with 128M block if you do multiple files of size 6M you would be having so many entries.



If not then does it mean that a single DFS block can hold data from
more than one file ?

DFS Block cannot hold data from more than one file. If your file size say 5M 
which is less than your default block size say 128M, then the block stored in 
DFS would be 5M alone.

To overcome this, people usually run a map/reduce job with 1 reducer and an Identity
mapper, which basically merges all small files into one file. In hadoop 0.18 we
have archives and once HADOOP-1700 is done, one could open the file to append
to it.

Thanks,
Lohit


- Original Message 
From: Goel, Ankur [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Friday, June 27, 2008 2:27:57 AM
Subject: HDFS blocks


Hi Folks,
I have a setup wherein I am streaming data into HDFS from a
remote location and creating a new file every X min. The files generated
are of a very small size (512 KB - 6 MB). Since that is the size
range, the streaming code sets the block size to 6 MB whereas the default that
we have set for the cluster is 128 MB. The idea behind such a thing is
to generate small temporal data chunks from multiple sources and merge
them periodically into a big chunk with our default (128 MB) block size.

The webUI for DFS reports the block size for these files to be 6 MB. My
questions are.
1. Can we have multiple files in DFS use different block sizes ?
2. If we use default block size for these small chunks, is the DFS space
wasted ? 
   If not then does it mean that a single DFS block can hold data from

more than one file ?

Thanks
-Ankur




Re: realtime hadoop

2008-06-23 Thread Konstantin Shvachko

 Also HDFS might be critical since to access your data you need to close the 
file

Not anymore. Since 0.16 files are readable while being written to.

 it as fast as possible. I need to be able to maintain some guaranteed
 max. processing time, for example under 3 minutes.

It looks like you do not need very strict guarantees.
I think you can use hdfs as a data-storage.
Don't know what kind of data-processing you do, but I agree with Stefan
that map-reduce is designed for batch tasks rather than for real-time 
processing.



Stefan Groschupf wrote:

Hadoop might be the wrong technology for you.
Map Reduce is a batch processing mechanism. Also HDFS might be critical
since to access your data you need to close the file - which means you might
have many small files, a situation where hdfs is not very strong
(the namespace is held in memory).
Hbase might be an interesting tool for you, also zookeeper if you want 
to do something home grown...




On Jun 23, 2008, at 11:31 PM, Vadim Zaliva wrote:


Hi!

I am considering using Hadoop for (almost) realtime data processing. I
have data coming every second and I would like to use hadoop cluster
to process
it as fast as possible. I need to be able to maintain some guaranteed
max. processing time, for example under 3 minutes.

Does anybody have experience with using Hadoop in such manner? I will
appreciate if you can share your experience or give me pointers
to some articles or pages on the subject.

Vadim



~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com





Re: hadoop file system error

2008-06-18 Thread Konstantin Shvachko

Did you close those files?
If not they may be empty.
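
For example, a sketch of the pattern inside the reduce step (fs, outputDir,
bucketName and recordBytes stand for your own objects):

  FSDataOutputStream out = fs.create(new Path(outputDir, bucketName));
  try {
    out.write(recordBytes);
  } finally {
    out.close();   // without the close the last buffered data may never reach HDFS
  }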


??? wrote:

Dears,

I use hadoop-0.16.4 to do some work and found an error for which I can't find
the reason.

The scenario is like this: In the reduce step, instead of using
OutputCollector to write results, I use FSDataOutputStream to write results to
files on HDFS (because I want to split the result by some rules). After the
job finished, I found that *some* files (but not all) are empty on HDFS. But
I'm sure in the reduce step the files are not empty since I added some logs
to read the generated files. It seems that some files' contents are lost
after the reduce step. Has anyone happened to face such errors? Or is it a
hadoop bug?

Please help me to find the reason if some of you guys know.

Thanks & Regards
Guangfeng



Re: dfs put fails

2008-06-17 Thread Konstantin Shvachko

Looks like the client machine from which you call -put cannot connect to the 
data-nodes.
It could be firewall or wrong configuration parameters that you use for the 
client.

Alexander Arimond wrote:
hi, 

i'm new in hadoop and im just testing it at the moment. 
i set up a cluster with 2 nodes and it seems like they are running
normally, 
the log files of the namenode and the datanodes dont show errors. 
Firewall should be set right. 
but when i try to upload a file to the dfs i get following message: 

[EMAIL PROTECTED]:~/hadoop$ bin/hadoop dfs -put file.txt file.txt 
08/06/12 14:44:19 INFO dfs.DFSClient: Exception in
createBlockOutputStream java.net.ConnectException: Connection refused 
08/06/12 14:44:19 INFO dfs.DFSClient: Abandoning block
blk_5837981856060447217 
08/06/12 14:44:28 INFO dfs.DFSClient: Exception in
createBlockOutputStream java.net.ConnectException: Connection refused 
08/06/12 14:44:28 INFO dfs.DFSClient: Abandoning block
blk_2573458924311304120 
08/06/12 14:44:37 INFO dfs.DFSClient: Exception in
createBlockOutputStream java.net.ConnectException: Connection refused 
08/06/12 14:44:37 INFO dfs.DFSClient: Abandoning block
blk_1207459436305221119 
08/06/12 14:44:46 INFO dfs.DFSClient: Exception in
createBlockOutputStream java.net.ConnectException: Connection refused 
08/06/12 14:44:46 INFO dfs.DFSClient: Abandoning block
blk_-8263828216969765661 
08/06/12 14:44:52 WARN dfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block. 
08/06/12 14:44:52 WARN dfs.DFSClient: Error Recovery for block
blk_-8263828216969765661 bad datanode[0] 



dont know what that means and didnt found something about that.. 
Hope somebody can help with that. 


Thank you!




Re: Best practices for handling many small files

2008-04-25 Thread Konstantin Shvachko

Would the new archive feature HADOOP-3307 that is currently being developed 
help this problem?
http://issues.apache.org/jira/browse/HADOOP-3307

--Konstantin

Subramaniam Krishnan wrote:


We have actually written a custom Multi File Splitter that collapses all
the small files to a single split till the DFS Block Size is hit.
We also take care of handling big files by splitting them on Block Size
and adding up all the remainders (if any) to a single split.


It works great for us:-)
We are working on optimizing it further to club all the small files in a 
single data node together so that the Map can have maximum local data.


We plan to share this(provided it's found acceptable, of course) once 
this is done.


Regards,
Subru

Stuart Sierra wrote:


Thanks for the advice, everyone.  I'm going to go with #2, packing my
million files into a small number of SequenceFiles.  This is slow, but
only has to be done once.  My datacenter is Amazon Web Services :),
so storing a few large, compressed files is the easiest way to go.

My code, if anyone's interested, is here:
http://stuartsierra.com/2008/04/24/a-million-little-files

-Stuart
altlaw.org


On Wed, Apr 23, 2008 at 11:55 AM, Stuart Sierra 
[EMAIL PROTECTED] wrote:
 


Hello all, Hadoop newbie here, asking: what's the preferred way to
 handle large (~1 million) collections of small files (10 to 100KB) in
 which each file is a single record?

 1. Ignore it, let Hadoop create a million Map processes;
 2. Pack all the files into a single SequenceFile; or
 3. Something else?

 I started writing code to do #2, transforming a big tar.bz2 into a
 BLOCK-compressed SequenceFile, with the file names as keys.  Will that
 work?

 Thanks,
 -Stuart, altlaw.org







Re: TestDU.testDU() throws assertionfailederror

2008-04-18 Thread Konstantin Shvachko

Edward,

testDU() writes a 32K file to the local fs and then verifies whether the value 
reported by du
changes exactly to the amount written.
Although this is true for most block oriented file systems it might not be true 
for some.
I suspect that in your case the file is written to tmpfs, which is a memory fs
and thus creates an inode, a directory entry, and maybe something else in
memory (4K total) in addition to the actual data. That is why du returns a
different value and the test fails.
Although TestDU is not universal we still want it to run in order to prevent bugs in DU 
on real file systems.

Thanks,
--Konstantin


Edward J. Yoon wrote:

Hi, community.

In my local computer (CentOS 5), TestDU.testDU() throws assertionfailederror.
What is the 4096 byte?

Testsuite: org.apache.hadoop.fs.TestDU
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.143 sec

Testcase: testDU took 5.138 sec
   FAILED
expected:32768 but was:36864
junit.framework.AssertionFailedError: expected:32768 but was:36864
   at org.apache.hadoop.fs.TestDU.testDU(TestDU.java:77)

[EMAIL PROTECTED] hadoop]# df -T
FilesystemType   1K-blocks  Used Available Use% Mounted on
/dev/sda1 ext352980472  11103380  39185808  23% /
none tmpfs  516924 0516924   0% /dev/shm

Thanks.


Re: secondary namenode web interface

2008-04-08 Thread Konstantin Shvachko

Yuri,

The NullPointerException should be fixed as Dhruba proposed.
We do not have any secondary nn web interface as of today.
The http server is used for transferring data between the primary and the 
secondary.
I don't see we can display anything useful on the secondary web UI except for 
the
current status, config values, and the last checkpoint date/time.
If you have anything in mind that can be displayed on the UI please let us know.
You can also find a jira for the issue, it would be good if this discussion
is reflected in it.

Thanks,
--Konstantin

dhruba Borthakur wrote:

The secondary Namenode uses the HTTP interface to pull the fsimage from
the primary. Similarly, the primary Namenode uses the
dfs.secondary.http.address to pull the checkpointed-fsimage back from
the secondary to the primary. So, the definition of
dfs.secondary.http.address is needed.

However, the servlet dfshealth.jsp should not be served from the
secondary Namenode. This servet should be setup in such a way that only
the primary Namenode invokes this servlet.

Thanks,
dhruba

-Original Message-
From: Yuri Pradkin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 08, 2008 10:11 AM

To: core-user@hadoop.apache.org
Subject: Re: secondary namenode web interface

I'd be happy to file a JIRA for the bug, I just want to make sure I understand
what the bug is: is it the misleading null pointer message or is it that
someone is listening on this port and not doing anything useful?  I mean,
what is the configuration parameter dfs.secondary.http.address for?  Unless
there are plans to make this interface work, this config parameter should go
away, and so should the listening thread, shouldn't they?


Thanks,
  -Yuri

On Friday 04 April 2008 03:30:46 pm dhruba Borthakur wrote:


Your configuration is good. The secondary Namenode does not publish a
web interface. The null pointer message in the secondary Namenode log
is a harmless bug but should be fixed. It would be nice if you can open
a JIRA for it.

Thanks,
Dhruba


-Original Message-
From: Yuri Pradkin [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 2:45 PM
To: core-user@hadoop.apache.org
Subject: Re: secondary namenode web interface

I'm re-posting this in hope that someone would help.  Thanks!

On Wednesday 02 April 2008 01:29:45 pm Yuri Pradkin wrote:


Hi,

I'm running Hadoop (latest snapshot) on several machines and in our setup
namenode and secondarynamenode are on different systems.  I see from the
logs that secondary namenode regularly checkpoints fs from primary
namenode.

But when I go to the secondary namenode HTTP (dfs.secondary.http.address)
in my browser I see something like this:

HTTP ERROR: 500
init
RequestURI=/dfshealth.jsp
Powered by Jetty://

And in secondary's log I find these lines:

2008-04-02 11:26:25,357 WARN /: /dfshealth.jsp:
java.lang.NullPointerException
   at org.apache.hadoop.dfs.dfshealth_jsp.init(dfshealth_jsp.java:21)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:539)
   at java.lang.Class.newInstance0(Class.java:373)
   at java.lang.Class.newInstance(Class.java:326)
   at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:199)
   at org.mortbay.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:326)
   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:405)
   at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
   at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
   at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
   at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
   at org.mortbay.http.HttpServer.service(HttpServer.java:954)
   at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
   at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
   at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
   at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
   at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

Is something missing from my configuration?  Anybody else seen these?


Thanks,

 -Yuri







Re: secondary namenode web interface

2008-04-08 Thread Konstantin Shvachko

Unfortunately we do not have an api for the secondary nn that would allow 
browsing the checkpoint.
I agree it would be nice to have one.
Thanks for filing the issue.
--Konstantin

Yuri Pradkin wrote:

On Tuesday 08 April 2008 11:54:35 am Konstantin Shvachko wrote:


If you have anything in mind that can be displayed on the UI please let us
know. You can also find a jira for the issue, it would be good if this
discussion is reflected in it.



Well, I guess we could have an interface to browse the checkpointed image
(actually this is what I was expecting to see), but it's not that big of a
deal.


Filed 
https://issues.apache.org/jira/browse/HADOOP-3212


Thanks,

  -Yuri



Hadoop-Patch build is not progressing for 6 hours

2008-03-17 Thread Konstantin Shvachko

Usually a build takes 2 hours or less.
This one is stuck and I don't see changes in the QUEUE OF PENDING PATCHES when 
I submit a patch.
I guess something is wrong with Hudson.
Could anybody please check.
--Konstantin


Re: Namenode not a host port pair

2008-03-11 Thread Konstantin Shvachko

You should use host:port rather than just port.
See HADOOP-2404, and HADOOP-2185.
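
For example, the dfs.secondary.http.address entry from the listing below
would become (host name copied from the fs.default.name value):

  <property>
    <name>dfs.secondary.http.address</name>
    <value>ved-desktop:50003</value>
  </property>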

Ved Prakash wrote:

Hi friends,

I have been trying to start hadoop on the master but it doesn't start the
name node on it, checking the logs I found the following error



hadoop-site.xml listing

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ved-desktop:50001</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>ved-desktop:50002</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>50003</value>
  </property>


Should be host:port not just port.


  <property>
    <name>dfs.http.address</name>
    <value>50004</value>
  </property>


Same here.


  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>50005</value>
  </property>


Same here.


  <property>
    <name>tasktracker.http.address</name>
    <value>50006</value>
  </property>


Same here.



</configuration>

Yesterday I could start namenode, tasktracker, jobtracker, secondarynamenode
properly but today it's giving me problems. What could be the reason, can
anyone help me with this?

Thanks



Re: File size and number of files considerations

2008-03-11 Thread Konstantin Shvachko



Naama Kraus wrote:

Hi,

Thanks all for the input.

Here are my further questions:

I can consolidate data off-line to have big enough files (64M) or copy to
dfs smaller files and then consolidate using MapReduce.


You can also let MapReduce read your original small local files and write them
into large hdfs file consolidating them so to speak on the fly.
Something like distcp (see related issues in Jira) but with custom processing 
of your inputs.


1. If I choose the first option, would the copy of a 64M file into dfs from
a local file system perform well ?

2. If I choose the second option, how would one suggest to implement it ? I
am not sure how I control the size of the reduce output files.

3. I had the impression that dfs splits large files and distributes splits
around. Is that true ? 


Just wanted to mention that splits are logical not physical. It is not like
hdfs cuts files into pieces and moves them around. You can think of splits as
file ranges.


If so, why should I mind if my files are extremely
large ? Say Gigas or even Teras ? Doesn't dfs take care of it internally and
thus scales up in terms of file size ? I am quoting from the HDFS
architecture document in
http://hadoop.apache.org/core/docs/current/hdfs_design.html#Large+Data+Sets
Applications that run on HDFS have large data sets. A typical file in HDFS
is gigabytes to terabytes in size. Thus, HDFS is tuned to support large
files.

4. Is there further recommended material to read about these issues ?

Thanks, Naama

On Mon, Mar 10, 2008 at 6:43 PM, Amar Kamat [EMAIL PROTECTED] wrote:



By chunks I meant basic unit of processing i.e a dfs block. Sorry for
the confusion, I should have mentioned it clearly. What I meant was in
case of files smaller than the default block size, the file becomes the
basic
unit for computation. Now one can have a very huge file and rely on the
dfs block size but a simpler approach would be create small files in the
beginning itself (if possible). This avoids playing around with the block
size and adds lesser confusion in terms of record boundaries etc. I dont
have any specific values for the file sizes but files with very small
sizes will cause lots of maps which will cause reducers to be slower. So
make sure to have files that form the logical unit of computation and good
enough size.
Thanks Ted for pointing it out.
Amar


On Mon, 10 Mar 2008, Ted Dunning wrote:

Amar's comments are a little strange.

Replication occurs at the block level, not the file level.  Storing data in
a small number of large files or a large number of small files will have
less than a factor of two effect on number of replicated blocks if the small
files are 64MB.  Files smaller than that will hurt performance due to seek
costs.

To address Naama's question, you should consolidate your files so that you
have files of at least 64 MB and preferably a bit larger than that.  This
helps because it allows the reading of the files to proceed in a nice
sequential manner which can greatly increase throughput.

If consolidating these files off-line is difficult, it is easy to do in a
preliminary map-reduce step.  This will incur a one-time cost, but if you
are doing multiple passes over the data later, it will be worth it.


On 3/10/08 3:12 AM, Amar Kamat [EMAIL PROTECTED] wrote:



On Mon, 10 Mar 2008, Naama Kraus wrote:



Hi,

In our system, we plan to upload data into Hadoop from external sources and
use it later on for analysis tasks. The interface to the external
repositories allows us to fetch pieces of data in chunks. E.g. get n records
at a time. Records are relatively small, though the overall amount of data
is assumed to be large. For each repository, we fetch pieces of data in a
serial manner. Number of repositories is small (few of them).

My first step is to put the data in plain files in HDFS. My question is what
are the optimal file sizes to use. Many small files (to the extent of each
record in a file) ? - guess not. Few huge files each holding all data of
same type ? Or maybe put each chunk we get in a separate file, and close it
right after a chunk was uploaded ?


I think it should be more based on the size of the data you want to
process in a map which I think here is the chunk size, no?
Larger the file less the replicas and hence more the network transfers in
case of more maps. In case of smaller file size the NN will be bottleneck
but you will end up having more replicas for each map task and hence more
locality.
Amar


How would HDFS perform best, with few large files or more smaller files ? As
I wrote we plan to run MapReduce jobs over the data in the files in order to
organize the data and analyze it.

Thanks for any help,
Naama











Re: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread Konstantin Shvachko

André,
You can try to rollback.
You did use upgrade when you switched to the new trunk, right?
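
For example (only meaningful if the cluster was originally started with the
-upgrade option; a rollback discards the changes made since that upgrade):

  bin/stop-dfs.sh
  bin/start-dfs.sh -rollback
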
--Konstantin

Raghu Angadi wrote:

André Martin wrote:


Hi Raghu,
done: https://issues.apache.org/jira/browse/HADOOP-2873
Subsequent tries did not succeed - so it looks like I need to 
re-format the cluster :-(



Please back up the log files and name node image files if you can before 
re-format.


Raghu.



Re: dfsadmin reporting wrong disk usage numbers

2008-02-15 Thread Konstantin Shvachko

Yes, please file a bug.
There are file systems with different block sizes out there, on Linux or Solaris.

Thanks,
--Konstantin

Martin Traverso wrote:

I think I found the issue. The class org.apache.hadoop.fs.DU assumes
1024-byte blocks when reporting usage information:

   this.used = Long.parseLong(tokens[0])*1024;

This works fine in linux, but in Solaris and Mac OS the reported number of
blocks is based on 512-byte blocks.

The solution is simple: DU should use du -sk instead of du -s.
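
A sketch of what the fix amounts to (illustrative, not the actual patch):

  // run "du -sk <dir>" so the first token is kilobytes on every platform,
  // then convert kilobytes to bytes
  this.used = Long.parseLong(tokens[0]) * 1024L;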

Should I file a bug for this?

Martin