Re: Copying files from HDFS to remote database

2009-04-21 Thread Dhruba Borthakur
You can use any of these:

1. bin/hadoop dfs -get <hdfsfile> <localfile>
2. Thrift API : http://wiki.apache.org/hadoop/HDFS-APIs
3. use fuse-mount to mount HDFS as a regular file system on the remote machine:
http://wiki.apache.org/hadoop/MountableHDFS
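
If you would rather drive the copy from code, here is a minimal sketch that streams a text output file out of HDFS and inserts one row per line into a remote database over JDBC. The paths, table name and JDBC URL are illustrative assumptions, not something from this thread:

---
// Sketch: read a text output file from HDFS and load it into a remote
// database via JDBC. Requires the appropriate JDBC driver on the classpath.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsToDb {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical database and table.
    Connection db = DriverManager.getConnection(
        "jdbc:mysql://dbhost/mydb", "user", "password");
    PreparedStatement insert =
        db.prepareStatement("INSERT INTO results (line) VALUES (?)");

    // Hypothetical job output file.
    BufferedReader in = new BufferedReader(new InputStreamReader(
        fs.open(new Path("/user/me/output/part-00000"))));
    String line;
    while ((line = in.readLine()) != null) {
      insert.setString(1, line);
      insert.executeUpdate();
    }
    in.close();
    insert.close();
    db.close();
  }
}
---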

thanks,
dhruba



On Mon, Apr 20, 2009 at 9:40 PM, Parul Kudtarkar <
parul_kudtar...@hms.harvard.edu> wrote:

>
> Our application is using hadoop to parallelize jobs across ec2 cluster.
> HDFS
> is used to store output files. How would you ideally copy output files from
> HDFS to remote databases?
>
> Thanks,
> Parul V. Kudtarkar
> --
> View this message in context:
> http://www.nabble.com/Copying-files-from-HDFS-to-remote-database-tp23149085p23149085.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
>


Re: File Modification timestamp

2009-01-08 Thread Dhruba Borthakur
Hi sandeep,

This should have worked. Please look at a sample program that uses
modification times: src/test/org/apache/hadoop/hdfs/TestModTime.java.

If you can write any sort of test that demonstrates the exact problem you
are seeing, it will help in debugging.
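
For reference, a small sketch of the kind of check the second application could perform once the modification time behaves as expected; the path and threshold are illustrative:

---
// Sketch: process the file only if it has not been modified for 10 minutes.
// Remember that HDFS updates the modification time when the file is closed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ModTimeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path("/user/me/incoming.txt")); // hypothetical
    long ageMs = System.currentTimeMillis() - stat.getModificationTime();
    if (ageMs > 10 * 60 * 1000L) {
      System.out.println("File quiet for 10+ minutes; safe to process");
    }
  }
}
---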

thanks,
dhruba


On Tue, Dec 30, 2008 at 11:35 PM, Sandeep Dhawan  wrote:

>
> Hi Dhruba,
>
> The file is being closed properly but the timestamp does not get modified.
> The modification timestamp
> still shows the file creation time.
> I am creating a new file and writing data into this file.
>
> Thanks,
> Sandeep
>
>
> Dhruba Borthakur-2 wrote:
> >
> > I believe that file modification times are updated only when the file is
> > closed. Are you "appending" to a preexisting file?
> >
> > thanks,
> > dhruba
> >
> >
> > On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan  wrote:
> >
> >>
> >> Hi,
> >>
> >> I have a application which creates a simple text file on hdfs. There is
> a
> >> second application which processes this file. The second application
> >> picks
> >> up the file for processing only when the file has not been modified for
> >> 10
> >> mins. In this way, the second application is sure that this file is
> ready
> >> for processing.
> >>
> >> But, what is happening is that the Hadoop is not updating the
> >> modification
> >> timestamp of the file even when the file is being written into. The
> >> modification timestamp of the file is same as the timestamp when the
> file
> >> was created.
> >>
> >> I am using hadoop 0.18.2.
> >>
> >> 1. Is this a bug in hadoop or is this way hadoop works
> >> 2. Is there way by which I can programmitically set the modification
> >> timestamp of the file
> >>
> >> Thanks,
> >> Sandeep
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/File-Modification-timestamp-tp21215824p21215824.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/File-Modification-timestamp-tp21215824p21228299.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: File Modification timestamp

2008-12-30 Thread Dhruba Borthakur
I believe that file modification times are updated only when the file is
closed. Are you "appending" to a preexisting file?

thanks,
dhruba


On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan  wrote:

>
> Hi,
>
> I have a application which creates a simple text file on hdfs. There is a
> second application which processes this file. The second application picks
> up the file for processing only when the file has not been modified for 10
> mins. In this way, the second application is sure that this file is ready
> for processing.
>
> But, what is happening is that the Hadoop is not updating the modification
> timestamp of the file even when the file is being written into. The
> modification timestamp of the file is same as the timestamp when the file
> was created.
>
> I am using hadoop 0.18.2.
>
> 1. Is this a bug in hadoop or is this way hadoop works
> 2. Is there way by which I can programmitically set the modification
> timestamp of the file
>
> Thanks,
> Sandeep
>
> --
> View this message in context:
> http://www.nabble.com/File-Modification-timestamp-tp21215824p21215824.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: 64 bit namenode and secondary namenode & 32 bit datanode

2008-11-25 Thread Dhruba Borthakur
The design is such that running multiple secondary namenodes should not
corrupt the image (modulo any bugs). Are you seeing image corruptions when
this happens?

You can run all or any daemons in 32-bit mode or 64-bit mode, and you can
mix-and-match. If you have many millions of files, then you might want to
allocate more than 3GB of heap space to the namenode and secondary namenode. In
that case, you will have to run the namenode and secondary namenode using a
64-bit JVM.

dhruba


On Tue, Nov 25, 2008 at 4:39 PM, lohit <[EMAIL PROTECTED]> wrote:

> Well, if I think about,  image corruption might not happen, since each
> checkpoint initiation would have unique number.
>
> I was just wondering what would happen in this case
> Consider this scenario.
> Time 1 <-- SN1 asks NN image and edits to merge
> Time 2 <-- SN2 asks NN image and edits to merge
> Time 2 <-- SN2 returns new image
> Time 3 <-- SN1 returns new image.
> I am not sure what happens here, but its best to test it out before setting
> up something like this.
>
> And if you have multiple entries in NN file, then one SNN checkpoint would
> update all NN entries, so redundant SNN isnt buying you much.
>
> Thanks,
> Lohit
>
>
>
> - Original Message 
> From: Sagar Naik <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, November 25, 2008 4:32:26 PM
> Subject: Re: 64 bit namenode and secondary namenode & 32 bit datanode
>
>
>
> lohit wrote:
> > I might be wrong, but my assumption is running SN either in 64/32
> shouldn't matter.
> > But I am curious how two instances of Secondary namenode is setup, will
> both of them talk to same NN and running in parallel?
> > what are the advantages here.
> >
> I just have multiple entries master file. I am not aware of image
> corruption (did not take look into it). I did for SNN redundancy
> Pl correct me if I am wrong
> Thanks
> Sagar
> > Wondering if there are chances of image corruption.
> >
> > Thanks,
> > lohit
> >
> > - Original Message 
> > From: Sagar Naik <[EMAIL PROTECTED]>
> > To: core-user@hadoop.apache.org
> > Sent: Tuesday, November 25, 2008 3:58:53 PM
> > Subject: 64 bit namenode and secondary namenode & 32 bit datanode
> >
> > I am trying to migrate from 32 bit jvm and 64 bit for namenode only.
> > *setup*
> > NN - 64 bit
> > Secondary namenode (instance 1) - 64 bit
> > Secondary namenode (instance 2)  - 32 bit
> > datanode- 32 bit
> >
> > From the mailing list I deduced that NN-64 bit and Datanode -32 bit combo
> works
> > But, I am not sure if S-NN-(instance 1--- 64 bit ) and S-NN (instance 2
> -- 32 bit) will work with this setup.
> >
> > Also, do shud I be aware of any other issues for migrating over to 64 bit
> namenode
> >
> > Thanks in advance for all the suggestions
> >
> >
> > -Sagar
> >
> >
>


Re: Block placement in HDFS

2008-11-25 Thread Dhruba Borthakur
Hi Dennis,

There were some discussions on this topic earlier:

http://issues.apache.org/jira/browse/HADOOP-3799

Do you have any specific use-case for this feature?

thanks,
dhruba

On Mon, Nov 24, 2008 at 10:22 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

>
> On Nov 24, 2008, at 8:44 PM, Mahadev Konar wrote:
>
>  Hi Dennis,
>>  I don't think that is possible to do.
>>
>
> No, it is not possible.
>
>   The block placement is determined
>> by HDFS internally (which is local, rack local and off rack).
>>
>
> Actually, it was changed in 0.17 or so to be node-local, off-rack, and a
> second node off rack.
>
> -- Owen
>


Re: Anything like RandomAccessFile in Hadoop FS ?

2008-11-13 Thread Dhruba Borthakur
One can open a file and then seek to an offset and then start reading
from there. For writing, one can write only to the end of an existing
file using FileSystem.append().
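
For illustration, a minimal sketch of both operations; the path is made up, and append() needs a release that ships FileSystem.append (0.19 or later, as noted below):

---
// Sketch: positioned reads via seek(), and writes only at the end via append().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekAndAppend {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/me/data.bin");   // hypothetical file

    // Read from an arbitrary offset.
    FSDataInputStream in = fs.open(p);
    in.seek(1024L);
    byte[] buf = new byte[512];
    int n = in.read(buf, 0, buf.length);
    System.out.println("read " + n + " bytes at offset 1024");
    in.close();

    // Writes can only go to the end of the existing file.
    FSDataOutputStream out = fs.append(p);
    out.write("more data".getBytes());
    out.close();
  }
}
---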

hope this helps,
dhruba

On Thu, Nov 13, 2008 at 1:24 PM, Tsz Wo (Nicholas), Sze
<[EMAIL PROTECTED]> wrote:
> Append is going to be available in 0.19 (not yet released).  There are new 
> FileSystem APIs for append, e.g.
>
> //FileSysetm.java
>  public abstract FSDataOutputStream append(Path f, int bufferSize,
>  Progressable progress) throws IOException;
>
> Nicholas Sze
>
>
>
>
> - Original Message 
>> From: Bryan Duxbury <[EMAIL PROTECTED]>
>> To: Wasim Bari <[EMAIL PROTECTED]>
>> Cc: [EMAIL PROTECTED]
>> Sent: Thursday, November 13, 2008 1:11:57 PM
>> Subject: Re: Anything like RandomAccessFile in Hadoop FS ?
>>
>> I'm not sure off hand. Maybe someone else can point you in the right 
>> direction?
>>
>> On Nov 13, 2008, at 1:09 PM, Wasim Bari wrote:
>>
>> > Hi,
>> >Thanks for reply.
>> >
>> > HDFS supports append file. How can I do this ?
>> > I tried to look API under fileSystem create method but couldn't find.
>> >
>> > Thanks for ur help.
>> >
>> > Wasim
>> >
>> > --
>> > From: "Bryan Duxbury"
>> > Sent: Thursday, November 13, 2008 9:48 PM
>> > To:
>> > Subject: Re: Anything like RandomAccessFile in Hadoop FS ?
>> >
>> >> If you mean a file where you can write anywhere, then no. HDFS is  
>> >> streaming
>> only. If you want to read from anywhere, then no problem -  just use seek() 
>> and
>> then read.
>> >> On Nov 13, 2008, at 11:40 AM, Wasim Bari wrote:
>> >>> Hi,
>> >>>  Is there any Utility for Hadoop files which can work same as
>> RandomAccessFile in Java ?
>> >>> Thanks,
>> >>>
>> >>> Wasim
>
>


Re: Best way to handle namespace host failures

2008-11-10 Thread Dhruba Borthakur
Couple of things that one can do:

1. dfs.name.dir should have at least two locations, one on the local
disk and one on NFS. This means that all transactions are
synchronously logged into two places.

2. Create a virtual IP, say name.xx.com that points to the real
machine name of the machine on which the namenode runs.

If the namenode machine burns, then change the virtual IP to point to
a new machine. Copy the namenode metadata from the NFS location to the
local disk on this new machine. Then start namenode on this new
machine.

Done!
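
For point 1, dfs.name.dir accepts a comma-separated list of directories and the namenode logs to all of them synchronously. A small sketch of the setting (directory paths are illustrative; normally you would put this in hadoop-site.xml rather than code):

---
// Sketch: one local directory plus one NFS-mounted directory for the
// namenode image and edit log. Paths are made up for illustration.
import org.apache.hadoop.conf.Configuration;

public class NameDirSetting {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.name.dir", "/local/hadoop/name,/mnt/nfs/hadoop/name");
    System.out.println("dfs.name.dir = " + conf.get("dfs.name.dir"));
  }
}
---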
-dhruba


On Mon, Nov 10, 2008 at 12:24 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Hi Folks,
>
> I am looking for some advice on some the ways / techniques
> that people are using to get around namenode failures (Both disk and
> host).
>
> We have a small cluster with several job scheduled for periodic
> execution on the same host where name server runs. What we would like to
> have is an automatic failover mechanism in hadoop so that a secondary
> namenode automatically takes the roll of a master.
>
>
>
> I can move this discussion to a JIRA if people are interested.
>
>
>
> Thanks
>
> -Ankur
>
>


Re: Can FSDataInputStream.read return 0 bytes and if so, what does that mean?

2008-11-08 Thread Dhruba Borthakur
It can return 0 if and only if the requested size was zero. For EOF,
it should return -1.

dhruba

On Fri, Nov 7, 2008 at 8:09 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> Just want to ensure 0 iff EOF or the requested #of bytes was 0.
>
> On 11/7/08 6:13 PM, "Pete Wyckoff" <[EMAIL PROTECTED]> wrote:
>
>
>
> The javadocs says reads up to size bytes. What happens if it returns < 0
> (presumably an error) or 0 bytes (??)
>
> Thanks, pete
>
>
>
>


Re: Question on opening file info from namenode in DFSClient

2008-11-08 Thread Dhruba Borthakur
Hi Taeho,

Thanks for your explanation. If your application opens a DFS file and
does not close it, then the DFSClient will automatically keep block
locations cached. So, you could achieve your desired goal by
developing a cache layer (above HDFS) that does not close the HDFS
file even if the user has closed it. This cache layer needs to manage
this cache-pool of HDFS file handles.
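
A rough sketch of such a layer is below; it is illustrative only, with eviction, concurrency across readers and error handling left out:

---
// Sketch: keep HDFS streams open past the user-visible "close" so the
// DFSClient keeps its cached block locations. Illustrative only.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileCache {
  private final FileSystem fs;
  private final Map<String, FSDataInputStream> pool =
      new HashMap<String, FSDataInputStream>();

  public OpenFileCache(FileSystem fs) {
    this.fs = fs;
  }

  public synchronized FSDataInputStream open(Path p) throws IOException {
    FSDataInputStream in = pool.get(p.toString());
    if (in == null) {
      in = fs.open(p);              // block locations are fetched and cached here
      pool.put(p.toString(), in);
    }
    return in;
  }

  // The application-level "close" just rewinds the stream; the underlying
  // handle (and its cached block locations) stays open in the pool.
  public synchronized void release(Path p) throws IOException {
    FSDataInputStream in = pool.get(p.toString());
    if (in != null) {
      in.seek(0);
    }
  }
}
---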

does this help?
thanks,
dhruba




On Fri, Nov 7, 2008 at 12:53 AM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> Hi, thanks for your reply Dhruba,
>
> One of my co-workers is writing a BigTable-like application that could be
> used for online, near-real-time, services. So since the application could be
> hooked into online services, there would times when a large number of users
> (e.g. 1000 users) request to access few files in a very short time.
>
> Of course, in a batch process job, this is a rare case, but for online
> services, it's quite a common case.
> I think HBase developers would have run into similar issues as well.
>
> Is this enough explanation?
>
> Thanks in advance,
>
> Taeho
>
>
>
> On Tue, Nov 4, 2008 at 3:12 AM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>
>> In the current code, details about block locations of a file are
>> cached on the client when the file is opened. This cache remains with
>> the client until the file is closed. If the same file is re-opened by
>> the same DFSClient, it re-contacts the namenode and refetches the
>> block locations. This works ok for most map-reduce apps because it is
>> rare that the same DFSClient re-opens the same file again.
>>
>> Can you pl explain your use-case?
>>
>> thanks,
>> dhruba
>>
>>
>> On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
>> > Dear Hadoop Users and Developers,
>> >
>> > I was wondering if there's a plan to add "file info cache" in DFSClient?
>> >
>> > It could eliminate network travelling cost for contacting Namenode and I
>> > think it would greatly improve the DFSClient's performance.
>> > The code I was looking at was this
>> >
>> > ---
>> > DFSClient.java
>> >
>> >/**
>> > * Grab the open-file info from namenode
>> > */
>> >synchronized void openInfo() throws IOException {
>> >  /* Maybe, we could add a file info cache here! */
>> >  LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>> >  if (newInfo == null) {
>> >throw new IOException("Cannot open filename " + src);
>> >  }
>> >  if (locatedBlocks != null) {
>> >        Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
>> >        Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
>> >while (oldIter.hasNext() && newIter.hasNext()) {
>> >  if (!
>> oldIter.next().getBlock().equals(newIter.next().getBlock()))
>> > {
>> >throw new IOException("Blocklist for " + src + " has
>> changed!");
>> >  }
>> >}
>> >  }
>> >  this.locatedBlocks = newInfo;
>> >  this.currentNode = null;
>> >}
>> > ---
>> >
>> > Does anybody have an opinion on this matter?
>> >
>> > Thank you in advance,
>> >
>> > Taeho
>> >
>>
>


Re: Question on opening file info from namenode in DFSClient

2008-11-03 Thread Dhruba Borthakur
In the current code, details about block locations of a file are
cached on the client when the file is opened. This cache remains with
the client until the file is closed. If the same file is re-opened by
the same DFSClient, it re-contacts the namenode and refetches the
block locations. This works ok for most map-reduce apps because it is
rare that the same DFSClient re-opens the same file again.

Can you pl explain your use-case?

thanks,
dhruba


On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> Dear Hadoop Users and Developers,
>
> I was wondering if there's a plan to add "file info cache" in DFSClient?
>
> It could eliminate network travelling cost for contacting Namenode and I
> think it would greatly improve the DFSClient's performance.
> The code I was looking at was this
>
> ---
> DFSClient.java
>
>/**
> * Grab the open-file info from namenode
> */
>synchronized void openInfo() throws IOException {
>  /* Maybe, we could add a file info cache here! */
>  LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>  if (newInfo == null) {
>throw new IOException("Cannot open filename " + src);
>  }
>  if (locatedBlocks != null) {
>        Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
>        Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
>while (oldIter.hasNext() && newIter.hasNext()) {
>  if (! oldIter.next().getBlock().equals(newIter.next().getBlock()))
> {
>throw new IOException("Blocklist for " + src + " has changed!");
>  }
>}
>  }
>  this.locatedBlocks = newInfo;
>  this.currentNode = null;
>}
> ---
>
> Does anybody have an opinion on this matter?
>
> Thank you in advance,
>
> Taeho
>


Re: [hive-users] Hive Roadmap (Some information)

2008-10-27 Thread Dhruba Borthakur
Hi Ben,

And, if I may add, if you would like to contribute the code to make this 
happen, that will be awesome! In that case, we can move this discussion to a 
JIRA.

Thanks,
dhruba


On 10/27/08 1:41 PM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:

We did have some discussions around it a while back but we put it on the back 
burner considering that there were a lot of algorithmic improvements that we 
could make in the current code itself. We reckoned that we could make 
significant improvements there first and then measure the improvements we could 
get out of byte code generation. What kind of performance speedups have you 
seen with byte code generation in data processing applications?

Ashish

-Original Message-
From: Ben Maurer [mailto:[EMAIL PROTECTED]
Sent: Monday, October 27, 2008 1:08 PM
To: Ashish Thusoo
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; core-user@hadoop.apache.org
Subject: Re: [hive-users] Hive Roadmap (Some information)

Have you guys considered translating the syntax tree for queries into Java 
bytecode? Java bytecode is great for this type of process because it's 
extremely high level -- the code generation mostly focuses on type checking and 
name resolution. However, it enables the JIT to perform register allocation and 
other low level optimizations for good performance.

-b

On Mon, 27 Oct 2008, Ashish Thusoo wrote:

> Folks,
>
> Here are some of the things that we are working on internally at Facebook. We 
> thought it would be a good idea to let everyone know what is going on with 
> Hive development. We will put this up on the wiki as well.
>
> 1. Integrating Dynamic SerDe with the DDL. (Zheng/Pete) - This allows
> the users to create typed tables along with list and map types from
> the DDL 2. Support for Statistics. (Ashish) - These stats are needed
> to make optimization decisions 3. Join Optimizations. (Prasad) -
> Mapside joins, semi join techniques etc to do the join faster 4.
> Predicate Pushdown Optimizations. (Namit) - pushing predicates just above the 
> table scan for certain situations in joins as well as ensuring that only 
> required columns are sent across map/reduce boundaries 5. Group By 
> Optimizations. (Joydeep) - various optimizations to make group by faster 6. 
> Optimizations to reduce the number of map files created by filter operations. 
> (Dhrubha) - Filters with a large number of mappers produces a lot of files 
> which slows down the following operations. This tries to address problems 
> with that.
> 7. Transformations in LOAD. (Joydeep) - LOAD currently does not transform the 
> input data if it is not in the format expected by the destination table.
> 8. Schemaless map/reduce. (Zheng) - TRANSFORM needs schema while map/reduce 
> is schema less.
> 9. Improvements to TRANSFORM. (Zheng) - Make this more intuitive to 
> map/reduce developers - evaluate some other keywords etc..
> 10. Error Reporting Improvements. (Pete) - Make error reporting for
> parse errors better 11. Help on CLI. (Joydeep) - add help to the CLI
> 12. Explode and Collect Operators. (Zheng) - Explode and collect operators to 
> convert collections to individual items and vice versa.
> 13. Propagating sort properties to destination tables. (Prasad) - If the 
> query produces sorted we want to capture that in the destination table's 
> metadata so that downstream optimizations can be enabled.
>
> Other contributions from outside FB ...
> 1. JDBC driver (Michi Mutsuzaki @ stanford.edu, Raghu @ stanford.edu)
> 2. Fixes to CLI driver (Jeremy Huylebroeck) 3. Web interface...
>
> Most of these have a JIRA associated. A lot of focus is on running things 
> faster in Hive considering that we have a good feature set now...
>
> Comments/contributions are welcome. Please go to the JIRA and check out 
> contrib/hive...
>
> Thanks,
> Ashish
> ___
> hive-users mailing list
> [EMAIL PROTECTED]
> http://publists.facebook.com/mailman/listinfo/hive-users
>
>
___
hive-users mailing list
[EMAIL PROTECTED]
http://publists.facebook.com/mailman/listinfo/hive-users



Re: Thinking about retriving DFS metadata from datanodes!!!

2008-09-11 Thread Dhruba Borthakur
My opinion is to not store file-namespace-related metadata on the
datanodes. When a file is renamed, one has to contact all datanodes to
update this metadata. Worse still, if one renames an entire
subdirectory, all blocks that belong to all files in the subdirectory
have to be updated. Similarly, if in the future a file can have multiple
paths to it (links), a block may belong to two filenames.

In the future, if HDFS wants to implement any kind of de-duplication
(i.e. if the same block data appears in multiple files, the file
system can intelligently keep only one copy of the block).. it will be
difficult to do.

thanks,
dhruba



On Wed, Sep 10, 2008 at 7:40 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> Thanks Ari Rabkin!
>
> 1. I think the cost is very low, if the block's size is 10m, 1k/10m almost
> 0.01% of the disk space.
>
> 2. Actually, if two of racks lose and replication <= 3, it seem that we
> can't recover all data. But in the situation of losing one rack of two racks
> and replication >=2, we can recover all data.
>
> 3. Suppose we recover 87.5% of data. I am not sure whether or not the random
> 87.5% of the data is usefull for every user. But in the situation of the
> size of most file is less than block'size, we can recover  so much data,.Any
> recovered data may be  valuable for some user.
>
> 4. I guess most small companies or organizations just have a cluster with
> 10-100 nodes, and they can not afford a second HDFS cluster in a different
> place or SAN. And it is a simple way to I think they would be pleased to
> ensure data safety for they.
>
> 5. We can config to turn on when someone need it, or turn it off otherwise.
>
> Glad to discuss with you!
>
>
> 2008/9/11 Ariel Rabkin <[EMAIL PROTECTED]>
>
>> I don't understand this use case.
>>
>> Suppose that you lose half the nodes in the cluster.  On average,
>> 12.5% of your blocks were exclusively stored on the half the cluster
>> that's dead.  For many (most?) applications, a random 87.5% of the
>> data isn't really useful.  Storing metadata in more places would let
>> you turn a dead cluster into a corrupt cluster, but not into a working
>> one.   If you need to survive major disasters, you want a second HDFS
>> cluster in a different place.
>>
>> The thing that might be useful to you, if you're worried about
>> simultaneous namenode and secondary NN failure, is to store the edit
>> log and fsimage on a SAN, and get fault tolerance that way.
>>
>> --Ari
>>
>> On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
>> > Thanks for paying attention  to my tentative idea!
>> >
>> > What I thought isn't how to store the meradata, but the final (or last)
>> way
>> > to recover valuable data in the cluster when something worst (which
>> destroy
>> > the metadata in all multiple NameNode) happen. i.e. terrorist attack  or
>> > natural disasters destroy half of cluster nodes within all NameNode, we
>> can
>> > recover as much data as possible by this mechanism, and hava big chance
>> to
>> > recover entire data of cluster because fo original replication.
>> >
>> > Any suggestion is appreciate!
>> >
>> > 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
>> >
>> >> +1 -
>> >>
>> >> from the perspective of the data nodes, dfs is just a block-level store
>> and
>> >> is thus much more robust and scalable.
>> >>
>> >>
>> >>
>> >> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > This isn't a very stable direction. You really don't want multiple
>> >> distinct
>> >> > methods for storing the metadata, because discrepancies are very bad.
>> >> High
>> >> > Availability (HA) is a very important medium term goal for HDFS, but
>> it
>> >> will
>> >> > likely be done using multiple NameNodes and ZooKeeper.
>> >> >
>> >> > -- Owen
>> >>
>>
>> --
>> Ari Rabkin [EMAIL PROTECTED]
>> UC Berkeley Computer Science Department
>>
>
>
>
> --
> Sorry for my english!!  明
> Please help me to correct my english expression and error in syntax
>


Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-07 Thread Dhruba Borthakur
The DFS errors might have been caused by

http://issues.apache.org/jira/browse/HADOOP-4040

thanks,
dhruba

On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> These exceptions are apparently coming from the dfs side of things. Could
> someone from the dfs side please look at these?
>
>
> On 9/5/08 3:04 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Thanks!
>> The patch applies without change to hadoop-0.18.0, and should be
>> included in a 0.18.1.
>>
>> However, I'm still seeing:
>> in hadoop.log:
>> 2008-09-05 11:13:54,805 WARN  dfs.DFSClient - Exception while reading
>> from blk_3428404120239503595_2664 of
>> /user/trank/segments/20080905102650/crawl_generate/part-00010 from
>> somehost:50010: java.io.IOException: Premeture EOF from in
>> putStream
>>
>> in datanode.log:
>> 2008-09-05 11:15:09,554 WARN  dfs.DataNode -
>> DatanodeRegistration(somehost:50010,
>> storageID=DS-751763840-somehost-50010-1219931304453, infoPort=50075,
>> ipcPort=50020):Got exception while serving
>> blk_-4682098638573619471_2662 to
>> /somehost:
>> java.net.SocketTimeoutException: 48 millis timeout while waiting
>> for channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/somehost:50010
>> remote=/somehost:45244]
>>
>> These entries in datanode.log happens a few minutes apart repeatedly.
>> I've reduced # map-tasks so load on this node is below 1.0 with 5GB of
>> free memory (so it's not resource starvation).
>>
>> Espen
>>
>> On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
 I started a profile of the reduce-task. I've attached the profiling output.
 It seems from the samples that ramManager.waitForDataToMerge() doesn't
 actually wait.
 Has anybody seen this behavior.
>>>
>>> This has been fixed in HADOOP-3940
>>>
>>>
>>> On 9/4/08 6:36 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote:
>>>
 I have the same problem on our cluster.

 It seems the reducer-tasks are using all cpu, long before there's anything
 to
 shuffle.

 I started a profile of the reduce-task. I've attached the profiling output.
 It seems from the samples that ramManager.waitForDataToMerge() doesn't
 actually wait.
 Has anybody seen this behavior.

 Espen

 On Thursday 28 August 2008 06:11:42 wangxu wrote:
> Hi,all
> I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
> and running hadoop on one namenode and 4 slaves.
> attached is my hadoop-site.xml, and I didn't change the file
> hadoop-default.xml
>
> when data in segments are large,this kind of errors occure:
>
> java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129
> file=/user/root/crawl_debug/segments/20080825053518/content/part-2/data
> at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.jav
> a:1462) at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1
> 312) at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417) 
> at
> java.io.DataInputStream.readFully(DataInputStream.java:178)
> at
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64
> ) at 
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> at
> org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
> at
> org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.ja
> va:1712) at
> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:
> 1787) at
> org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceF
> ileRecordReader.java:104) at
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordRe
> ader.java:79) at
> org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.
> java:112) at
> org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReade
> r.java:130) at
> org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(Compo
> siteRecordReader.java:398) at
> org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:5
> 6) at
> org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:3
> 3) at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
>
>
> how can I correct this?
> thanks.
> Xu

>>>
>>>
>>>
>
>
>


Re: Setting up a Hadoop cluster where nodes are spread over the Internet

2008-08-09 Thread Dhruba Borthakur
In almost all Hadoop configurations, host names can be specified
as IP addresses. So, in your hadoop-site.xml, please specify the IP
address of the namenode (instead of its hostname).
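
For illustration only, this is the setting being referred to; the IP and port come from this thread, and in practice you would put the value in conf/hadoop-site.xml:

---
// Sketch: point fs.default.name at the namenode by IP address instead of
// hostname. Address and port are taken from the thread purely as an example.
import org.apache.hadoop.conf.Configuration;

public class NamenodeByIp {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://172.1.23.2:9000");
    System.out.println(conf.get("fs.default.name"));
  }
}
---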

-dhruba

2008/8/8 Lucas Nazário dos Santos <[EMAIL PROTECTED]>:
> Thanks Andreas. I'll try it.
>
>
> On Fri, Aug 8, 2008 at 5:47 PM, Andreas Kostyrka <[EMAIL PROTECTED]>wrote:
>
>> On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
>> > You are completely right. It's not safe at all. But this is what I have
>> for
>> > now:
>> > two computers distributed across the Internet. I would really appreciate
>> if
>> > anyone could give me spark on how to configure the namenode's IP in a
>> > datanode. As I could identify in log files, the datanode keeps trying to
>> > connect
>> > to the IP 10.1.1.5, which is the internal IP of the namenode. I just
>> need a
>> > way
>> > to say to the datanode "Hey, could you instead connect to the IP
>> 172.1.23.2
>> > "?
>>
>> Your only bet is to set it up in a VPNed environment. That would make it
>> securitywise okay too.
>>
>> Andreas
>>
>> >
>> > Lucas
>> >
>> > On Fri, Aug 8, 2008 at 10:25 AM, Lukáš Vlček <[EMAIL PROTECTED]>
>> wrote:
>> > > HI,
>> > >
>> > > I am not an expert on Hadoop configuration but is this safe? As far as
>> I
>> > > understand the IP address is public and connection to the datanode port
>> > > is not secured. Am I correct?
>> > >
>> > > Lukas
>> > >
>> > > On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos <
>> > >
>> > > [EMAIL PROTECTED]> wrote:
>> > > > Hello again,
>> > > >
>> > > > In fact I can get the cluster up and running with two nodes in
>> > > > different LANs. The problem appears when executing a job.
>> > > >
>> > > > As you can see in the piece of log bellow, the datanode tries to
>> > >
>> > > comunicate
>> > >
>> > > > with the namenode using the IP 10.1.1.5. The issue is that the
>> datanode
>> > > > should be using a valid IP, and not 10.1.1.5.
>> > > >
>> > > > Is there a way of manually configuring the datanode with the
>> namenode's
>> > >
>> > > IP,
>> > >
>> > > > so I can change from 10.1.1.5 to, say 189.11.131.172?
>> > > >
>> > > > Thanks,
>> > > > Lucas
>> > > >
>> > > >
>> > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
>> > > > TaskTracker up at: localhost/127.0.0.1:60394
>> > > > 2008-08-08 02:34:23,335 INFO org.apache.hadoop.mapred.TaskTracker:
>> > >
>> > > Starting
>> > >
>> > > > tracker tracker_localhost:localhost/127.0.0.1:60394
>> > > > 2008-08-08 02:34:23,589 INFO org.apache.hadoop.mapred.TaskTracker:
>> > >
>> > > Starting
>> > >
>> > > > thread: Map-events fetcher for all reduce tasks on
>> > > > tracker_localhost:localhost/127.0.0.1:60394
>> > > > 2008-08-08 03:06:43,239 INFO org.apache.hadoop.mapred.TaskTracker:
>> > > > LaunchTaskAction: task_200808080234_0001_m_00_0
>> > > > 2008-08-08 03:07:43,989 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 1 time(s).
>> > > > 2008-08-08 03:08:44,999 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 2 time(s).
>> > > > 2008-08-08 03:09:45,999 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 3 time(s).
>> > > > 2008-08-08 03:10:47,009 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 4 time(s).
>> > > > 2008-08-08 03:11:48,009 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 5 time(s).
>> > > > 2008-08-08 03:12:49,026 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 6 time(s).
>> > > > 2008-08-08 03:13:50,036 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 7 time(s).
>> > > > 2008-08-08 03:14:51,046 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 8 time(s).
>> > > > 2008-08-08 03:15:52,056 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 9 time(s).
>> > > > 2008-08-08 03:16:53,066 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >
>> > > connect
>> > >
>> > > > to server: /10.1.1.5:9000. Already tried 10 time(s).
>> > > > 2008-08-08 03:17:54,077 WARN org.apache.hadoop.mapred.TaskTracker:
>> > > > Error initializing task_200808080234_0001_m_00_0:
>> > > > java.net.SocketTimeoutException
>> > > >at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
>> > > >at
>> > > >
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
>> > > >at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
>> > > >at org.apache.hadoop.ipc.Client.call(

Re: What will happen if two processes writes the same HDFS file

2008-08-09 Thread Dhruba Borthakur
When the first one contacts the namenode to open the file for writing,
the namenode records this info in a "lease". When the second process
contacts the namenode to  open the same file for writing, the namenode
sees that a "lease" already exists for the file and rejects the
request from the second process.
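
To make the behaviour concrete, a small sketch; in practice the two create() calls come from two separate processes, and the path is made up:

---
// Sketch: the first create() acquires the lease; a second create() on the
// same path while the lease is held is rejected by the namenode and shows
// up at the client as an IOException.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoWriters {
  static final Path FILE = new Path("/user/me/shared.txt"); // hypothetical

  // Process 1: opens the file for writing and holds the lease.
  static FSDataOutputStream firstWriter(FileSystem fs) throws IOException {
    return fs.create(FILE);
  }

  // Process 2: tries to open the same file for writing while the lease is held.
  static void secondWriter(FileSystem fs) {
    try {
      fs.create(FILE);
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = firstWriter(fs);
    secondWriter(fs);
    out.close();
  }
}
---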



On Sat, Aug 9, 2008 at 8:30 PM, 11 Nov. <[EMAIL PROTECTED]> wrote:
> IMHO, there must be one process that fails the write.
>


Re: java.io.IOException: Could not get block locations. Aborting...

2008-08-08 Thread Dhruba Borthakur
It is possible that your namenode is overloaded and is not able to
respond to RPC requests from clients. Please check the namenode logs
to see if you see lines of the form "discarding calls...".

dhruba

On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
<[EMAIL PROTECTED]> wrote:
> I come across the same issue and also with hadoop 0.17.1
>
> would be interesting if someone say the cause of the issue.
>
> Alex
>
> 2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
>
>> Piotr Kozikowski wrote:
>>
>>> Hi there:
>>>
>>> We would like to know what are the most likely causes of this sort of
>>> error:
>>>
>>> Exception closing
>>> file
>>> /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_22_0/part-00022
>>> java.io.IOException: Could not get block locations. Aborting...
>>>at
>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
>>>at
>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
>>>at
>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
>>>
>>> Our map-reduce job does not fail completely but over 50% of the map tasks
>>> fail with this same error.
>>> We recently migrated our cluster from 0.16.4 to 0.17.1, previously we
>>> didn't have this problem using the same input data in a similar map-reduce
>>> job
>>>
>>> Thank you,
>>>
>>> Piotr
>>>
>>>
>> When I see this, its because the filesystem isnt completely up: there are
>> no locations for a specific file, meaning the client isn't getting back the
>> names of any datanodes holding the data from the name nodes.
>>
>> I've got a patch in JIRA that prints out the name of the file in question,
>> as that could be useful.
>>
>
>
>
> --
> Best Regards
> Alexander Aristov
>


Re: NameNode failover procedure

2008-07-30 Thread Dhruba Borthakur
I agree that NFS could have problems. We should ideally solve the
namenode HA issue without depending on NFS.

You can run the secondary namenode process on the same machine as the
primary. The primary namenode and the secondary namenode would
typically require the same amount of heap memory. If you allocate 8GB
of RAM to the primary, then you would ideally allocate another 8GB of
RAM to the secondary namenode.

-dhruba

On Wed, Jul 30, 2008 at 11:48 AM, Himanshu Sharma <[EMAIL PROTECTED]> wrote:
>
> The NFS seems to be having problem as NFS locking causes namenode hangup.
> Can't be there any other way, say if namenode starts writing synchronously
> to secondary namenode apart from local directories, then in case of namenode
> failover, we can start the primary namenode process on secondary namenode
> and the latest checkpointed fsimage is already there on secondary namenode.
>
> This also raises a fundamental question, whether we can run secondary
> namenode process on the same node as primary namenode process without any
> out of memory / heap exceptions ? Also ideally what should be the memory
> size of primary namenode if alone and when with secondary namenode process ?
>
>
> Andrzej Bialecki wrote:
>>
>> Dhruba Borthakur wrote:
>>> A good way to implement failover is to make the Namenode log transactions
>>> to
>>> more than one directory, typically a local directory and a NFS mounted
>>> directory. The Namenode writes transactions to both directories
>>> synchronously.
>>>
>>> If the Namenode machine dies, copy the fsimage and fsiedits from the NFS
>>> server and you will have recovered *all* committed transactions.
>>>
>>> The SecondaryNamenode pulls the fsimage and fsedits once every configured
>>> period, typically ranging from a few minutes to an hour. If you use the
>>> image from the SecondaryNamenode, you might lose the last few minutes of
>>> transactions.
>>
>> That's a good idea. But then, what's the purpose of running a secondary
>> namenode, if it can't guarantee that the data loss is minimal ???
>> Should't edits be written synchronously to a secondary namenode, and
>> fsimage updated synchronously whenever a primary namenode performs a
>> checkpoint?
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>>   ___. ___ ___ ___ _ _   __
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/NameNode-failover-procedure-tp11711842p18740089.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
>


Re: Text search on a PDF file using hadoop

2008-07-23 Thread Dhruba Borthakur
One option for you is to use a pdf-to-text converter (many of them are
available online) and then run map-reduce on the txt file.

-dhruba

On Wed, Jul 23, 2008 at 1:07 AM, GaneshG
<[EMAIL PROTECTED]> wrote:
>
> Thanks Lohit, i am using only defalult reader and i am very new to hadoop.
> This is my map method
>
>  public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
>String line = value.toString();
>StringTokenizer tokenizer = new StringTokenizer(line);
>while (tokenizer.hasMoreTokens()) {
>
>String val = tokenizer.nextToken();
>try {
>
>if (val != null && val.contains("the")) {
>word.set(line);
>FileSplit spl = (FileSplit)reporter.getInputSplit();
>output.collect(word, new 
> Text(spl.getPath().getName()));
>}
>} catch (Exception e) {
>System.out.println(e);
>}
>}
>  }
>}
>
> I have a pdf file in my dfs input folder. can you tell me what i have to do
> to read pdf files?
>
> Thanks
> Ganesh.G
>
>
> lohit-2 wrote:
>>
>> Can you provide more information. How are you passing your input, are you
>> passing raw pdf files? If so, are you using your own record reader.
>> Default record reader wont read pdf files and you wont get the text out of
>> it as is.
>> Thanks,
>> Lohit
>>
>>
>>
>> - Original Message 
>> From: GaneshG <[EMAIL PROTECTED]>
>> To: core-user@hadoop.apache.org
>> Sent: Wednesday, July 23, 2008 1:51:52 AM
>> Subject: Text search on a PDF file using hadoop
>>
>>
>> while i search a text in a pdf file using hadoop, the results are not
>> coming
>> properly. i tried to debug my program, i could see the lines red from pdf
>> file is not formatted. please help me to resolve this.
>> --
>> View this message in context:
>> http://www.nabble.com/Text-search-on-a-PDF-file-using-hadoop-tp18606475p18606475.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Re%3A-Text-search-on-a-PDF-file-using-hadoop-tp18606558p18606703.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: All datanodes getting marked as dead

2008-06-18 Thread Dhruba Borthakur
You are running out of file handles on the namenode.  When this
happens, the namenode cannot receive heartbeats from datanodes because
these heartbeats arrive on a tcp/ip socket connection and the namenode
does not have any free file descriptors to accept these socket
connections. Your data is still safe with the datanodes. If you
increase the number of handles on the namenode, all datanodes will
re-join the cluster and things should be fine.

what OS platform is the namenode running on?

thanks,
dhruba

On Sun, Jun 15, 2008 at 5:47 AM, Murali Krishna <[EMAIL PROTECTED]> wrote:
> Hi,
>
>I was running some M/R job on a 90+ node cluster. While the
> job was running the entire data nodes seems to have become dead. Only
> major error I saw in the name node log is 'java.io.IOException: Too many
> open files'. The job might try to open thousands of file.
>
>After some time, there are lot of exceptions saying 'could
> only be replicated to 0 nodes instead of 1'. So looks like all the data
> nodes are not responding now; job has failed since it couldn't write. I
> can see the following in the data nodes logs:
>
>2008-06-15 02:38:28,477 WARN org.apache.hadoop.dfs.DataNode:
> java.net.SocketTimeoutException: timed out waiting for rpc response
>
>at org.apache.hadoop.ipc.Client.call(Client.java:484)
>
>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>
>at org.apache.hadoop.dfs.$Proxy0.sendHeartbeat(Unknown Source)
>
>
>
> All processes (datanodes + namenodes) are still running..(dfs health
> status page shows all nodes as dead)
>
>
>
> Some questions:
>
> * Is this kind of behavior expected when name node runs out of
> file handles?
>
> * Why the data nodes are not able to send the heart beat (is it
> related to name node not having enough handles?)
>
> * What happens to the data in the hdfs when all the data nodes
> fail to send the heart beat and name node is in this state?
>
> * Is the solution is to just increase the number of file handles
> and restart the cluster?
>
>
>
> Thanks,
>
> Murali
>
>


Re: data locality in HDFS

2008-06-18 Thread Dhruba Borthakur
HDFS uses the network topology to distribute and replicate data. An
admin has to configure a script that describes the network topology to
HDFS. This is specified by using the parameter
"topology.script.file.name" in the Configuration file. This has been
tested when nodes are on different subnets in the same data center.

This code might not be generic enough (and is not yet tested) to support
multiple data centers.

One can extend this topology by implementing one's own implementation
and specifying the new jar using the config parameter
topology.node.switch.mapping.impl. You will find more details at
http://hadoop.apache.org/core/docs/current/cluster_setup.html#Hadoop+Rack+Awareness
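
A skeletal example of such an implementation is below; the interface is the one referenced by topology.node.switch.mapping.impl, but the host-naming rule and rack paths are made up:

---
// Sketch: a custom topology mapper. The dcA-/dcB- naming convention and the
// rack paths are invented purely for illustration.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;

public class MyRackMapping implements DNSToSwitchMapping {
  public List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<String>(names.size());
    for (String host : names) {
      if (host.startsWith("dcA-")) {
        racks.add("/dcA/rack1");
      } else {
        racks.add("/dcB/rack1");
      }
    }
    return racks;
  }
}
---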

thanks,
dhruba


On Tue, Jun 17, 2008 at 10:18 PM, Ian Holsman (Lists) <[EMAIL PROTECTED]> wrote:
> hi.
>
> I want to run a distributed cluster, where i have say 20 machines/slaves in
> 3 seperate data centers that belong to the same cluster.
>
> Ideally I would like the other machines in the data center to be able to
> upload files (apache log files in this case) onto the local slaves and then
> have map/red tasks do their magic without having to move data until the
> reduce phase where the amount of data will be smaller.
>
> does Hadoop have this functionality?
> how do people handle multi-datacenter logging with hadoop in this case? do
> you just copy the data into a centeral location?
>
> regards
> Ian
>


Re: Maximum number of files in hadoop

2008-06-08 Thread Dhruba Borthakur
The maximum number of files in HDFS depends on the amount of memory
available to the namenode. Each file object and each block object
takes about 150 bytes of memory. Thus, if you have 1 million files,
each with one block, then you would need about 3GB of
memory for the namenode.

thanks
dhruba


On Fri, Jun 6, 2008 at 11:51 PM, karthik raman <[EMAIL PROTECTED]> wrote:
> Hi,
>What is the maximum number of files that can be stored on HDFS? Is it 
> dependent on namenode memory configuration? Also does this impact on the 
> performance of namenode anyway?
> thanks in advance
> Karthik
>
>
>  From Chandigarh to Chennai - find friends all over India. Go to 
> http://in.promos.yahoo.com/groups/citygroups/


Re: HDFS Question re adding additional storage

2008-05-29 Thread Dhruba Borthakur
Hi Prasana,

Hadoop has had a rebalancing feature since the 0.16 release. You can find
more details about it at

http://issues.apache.org/jira/browse/HADOOP-1652

You will find the Rebalancing User guide and the admin guide.
thanks
dhruba


On Thu, May 29, 2008 at 8:38 AM, prasana.iyengar
<[EMAIL PROTECTED]> wrote:
>
> Dhruba: I am a newbie; in my search for this capability I came across your
> post
>
> 1. does 0.16.0 have this capability ?
> 2. does this take place lazily - that's what it'd seem to me based on
> running it in our cluster.
> 3. is there way to force the rebalancing operation
>
> thanks,
> -prasana
>
> Dhruba Borthakur wrote:
>>
>>  What that means is that the new nodes will be relatively empty
>> till new data arrives into the cluster. It might take a while for the new
>> nodes to get filled up.
>>
>> Work is in progress to facilitate cluster-data rebalance when new
>> Datanodes
>> are added.
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Error-reporting-from-map-function-tp11883675p17537223.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
>


Re: "firstbadlink is/as" messages in 0.16.4

2008-05-24 Thread Dhruba Borthakur
This "firstbadlink" was an mis-configured log message in the code. It
is innocuous and has since been fixed in 0.17 release.
http://issues.apache.org/jira/browse/HADOOP-3029

thanks,
dhruba

On Sat, May 24, 2008 at 7:03 PM, C G <[EMAIL PROTECTED]> wrote:
> Hi All:
>
>  So far, running 0.16.4 has been a bit of a nightmare.  The latest problem 
> I'm seeing concerns a series of odd messages concerning "firstbadlink".   
> I've attached one of these message sequences.  I'm curious what it's trying 
> to tell me.  This is from the master node, but the same pattern also occurs 
> on all the slaves as well.
>
>  Thanks for any insight...
>  C G
>
>  2008-05-24 22:35:00,170 INFO org.apache.hadoop.dfs.DataNode: Receiving block 
> blk_-5166040941538436352 src: /10.2.13.1:41942 dest: /10.2.13.1:50010
> 2008-05-24 22:35:00,283 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got 
> response for connect ack  from downstream datanode with firstbadlink as
> 2008-05-24 22:35:00,283 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 
> forwarding connect ack to upstream firstbadlink is
> 2008-05-24 22:35:00,313 INFO org.apache.hadoop.dfs.DataNode: Received block 
> blk_-5166040941538436352 of size 427894 from /10.2.13.1
> 2008-05-24 22:35:00,313 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 
> 2 for block blk_-5166040941538436352 terminating
>
>
>
>


Re: 0.16.4 DFS dropping blocks, then won't retart...

2008-05-23 Thread Dhruba Borthakur
If you look at the log message starting with "STARTUP_MSG:   build
=..." you will see that the namenode and the good datanode were built by CG
whereas the bad datanodes were compiled by hadoopqa!

thanks,
dhruba

On Fri, May 23, 2008 at 9:01 AM, C G <[EMAIL PROTECTED]> wrote:
> 2008-05-23 11:53:25,377 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = primary/10.2.13.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.16.4-dev
> STARTUP_MSG:   build = svn+ssh://[EMAIL 
> PROTECTED]/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; 
> compiled
>  by 'cg' on Mon May 19 17:47:05 EDT 2008
> /
> 2008-05-23 11:53:26,107 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=NameNode, port=54310
> 2008-05-23 11:53:26,136 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: 
> overdrive1-node-primary/10.2.13.1:54310
> 2008-05-23 11:53:26,146 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-05-23 11:53:26,149 INFO org.apache.hadoop.dfs.NameNodeMetrics: 
> Initializing NameNodeMeterics using context object:org.apache.hadoop.metr
> ics.spi.NullContext
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=cg,cg
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: 
> supergroup=supergroup
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: 
> isPermissionEnabled=true
> 2008-05-23 11:53:36,064 INFO org.apache.hadoop.fs.FSNamesystem: Finished 
> loading FSImage in 9788 msecs
> 2008-05-23 11:53:36,079 INFO org.apache.hadoop.dfs.StateChange: STATE* 
> SafeModeInfo.enter: Safe mode is ON.
> Safe mode will be turned off automatically.
> 2008-05-23 11:53:36,115 INFO org.apache.hadoop.fs.FSNamesystem: Registered 
> FSNamesystemStatusMBean
> 2008-05-23 11:53:36,339 INFO org.mortbay.util.Credential: Checking Resource 
> aliases
> 2008-05-23 11:53:36,410 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started 
> HttpContext[/static,/static]
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started 
> HttpContext[/logs,/logs]
> 2008-05-23 11:53:36,752 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:36,925 INFO org.mortbay.util.Container: Started 
> WebApplicationContext[/,/]
> 2008-05-23 11:53:36,926 INFO org.mortbay.http.SocketListener: Started 
> SocketListener on 0.0.0.0:50070
> 2008-05-23 11:53:36,926 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:36,926 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up 
> at: 0.0.0.0:50070
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 2 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 4 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 5 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 6 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 7 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 8 on 54310: starting
> 2008-05-23 11:53:36,940 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 54310: starting
>  2008-05-23 11:53:37,096 INFO org.apache.hadoop.dfs.NameNode: Error report 
> from worker9:50010: Incompatible build versions: na
> menode BV = 2182; datanode BV = 652614
> 2008-05-23 11:53:37,097 INFO org.apache.hadoop.dfs.NameNode: Error report 
> from worker12:50010: Incompatible build versions: n
> amenode BV = 2182; datanode BV = 652614
>  [error above repeated for all nodes in system]
> 2008-05-23 11:53:42,082 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
> NameSystem.registerDatanode: node registration from 10.2.13.1:50010 st
> orage DS-1855907496-10.2.13.1-50010-1198767012191
> 2008-05-23 11:53:42,094 INFO org.apache.hadoop.net.NetworkTopology: Adding a 
> new node: /default-rack/10.2.13.1:50010
>
>  Oddly enough, the DataNode associated with the master node is up and running:
>
>  2008-05-23 11:53:25,380 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /***

Re: dfs.block.size vs avg block size

2008-05-18 Thread Dhruba Borthakur
There isn't a way to change the block size of an existing file. The
block size of a file can be specified only at the time of file
creation and cannot be changed later.

There isn't any wasted space in your system. If the block size is
128MB but you create an HDFS file of, say, 10MB, then that file will
contain one block and that block will occupy only 10MB on HDFS
storage. No space gets wasted.
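
Since the block size is fixed at creation, the usual way to "change" it is to rewrite the data into a new file created with the desired block size. A hedged sketch (paths, replication and buffer size are illustrative):

---
// Sketch: copy an existing file into a new file created with a 64MB block size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RewriteWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    FSDataInputStream in = fs.open(new Path("/data/old-file"));     // hypothetical
    FSDataOutputStream out = fs.create(new Path("/data/new-file"),  // hypothetical
        true,               // overwrite
        64 * 1024,          // io buffer size
        (short) 3,          // replication
        64L * 1024 * 1024); // block size: 64MB
    IOUtils.copyBytes(in, out, conf, true);  // copies and closes both streams
  }
}
---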

hope this helps,
dhruba

On Fri, May 16, 2008 at 4:42 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Hello,
>
> I checked the ML archives and the Wiki, as well as the HDFS user guide, but 
> could not find information about how to change block size of an existing HDFS.
>
> After running fsck I can see that my avg. block size is 12706144 B (cca 
> 12MB), and that's a lot smaller than what I have configured: 
> dfs.block.size=67108864 B
>
> Is the difference between the configured block size and actual (avg) block 
> size results effectively wasted space?
> If so, is there a way to change the DFS block size and have Hadoop shrink all 
> the existing blocks?
> I am OK with not running any jobs on the cluster for a day or two if I can do 
> something to free up the wasted disk space.
>
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>


Re: java.io.IOException: Could not obtain block / java.io.IOException: Could not get block locations

2008-05-16 Thread Dhruba Borthakur
What version of Java are you using? How many threads are you running on
the namenode? How many cores do your machines have?

thanks,
dhruba

On Fri, May 16, 2008 at 6:02 AM, André Martin <[EMAIL PROTECTED]> wrote:
> Hi Hadoopers,
> we are experiencing a lot of "Could not obtain block / Could not get block
> locations IOExceptions" when processing a 400 GB large Map/Red job using our
> 6 nodes DFS & MapRed (v. 0.16.4) cluster. Each node is equipped with a 400GB
> Sata HDD and running Suse Linux Enterprise Edition. While processing this
> "huge" MapRed job, the name node doesn't seem to receive heartbeats from
> datanodes for up to a couple of minutes and thus marks those nodes as dead
> even they are still alive and serving blocks according to their logs. We
> first suspected network congestion and measured the inter-node bandwidth
> using scp - receiving throughputs of 30MB/s. CPU utilization is about 100%
> when processing the job, however, the tasktracker instances shouldn't cause
> such datanode drop outs?
> In the datanode logs, we see a lot of java.io.IOException: Block
> blk_-7943096461180653598 is valid, and cannot be written to. errors...
>
> Any ideas? Thanks in advance.
>
> Cu on the 'net,
>   Bye - bye,
>
>  < André   èrbnA >
>
>
>


Re: HDFS corrupt...how to proceed?

2008-05-11 Thread Dhruba Borthakur
Is it possible that new files were being created by running
applications between the first and second fsck runs?

thanks,
dhruba


On Sun, May 11, 2008 at 8:55 PM, C G <[EMAIL PROTECTED]> wrote:
> The system hosting the namenode experienced an OS panic and shut down, we 
> subsequently rebooted it.  Currently we don't believe there is/was a bad disk 
> or other hardware problem.
>
>   Something interesting:  I've ran fsck twice, the first time it gave the 
> result I posted.  The second time I still declared the FS to be corrupt, but 
> said:
>   [many rows of periods deleted]
>   ..Status: CORRUPT
>   Total size:4900076384766 B
>   Total blocks:  994492 (avg. block size 4927215 B)
>   Total dirs:47404
>   Total files:   952310
>   Over-replicated blocks:0 (0.0 %)
>   Under-replicated blocks:   0 (0.0 %)
>   Target replication factor: 3
>   Real replication factor:   3.0
>
>
>  The filesystem under path '/' is CORRUPT
>
>   So it seems like it's fixing some problems on its own?
>
>   Thanks,
>   C G
>
>
>  Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>   Did one datanode fail or did the namenode fail? By "fail" do you mean
>  that the system was rebooted or was there a bad disk that caused the
>  problem?
>
>  thanks,
>  dhruba
>
>  On Sun, May 11, 2008 at 7:23 PM, C G
>
>
> wrote:
>  > Hi All:
>  >
>  > We had a primary node failure over the weekend. When we brought the node 
> back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure 
> how best to proceed. Any advice is greatly appreciated. If I've missed a Wiki 
> page or documentation somewhere please feel free to tell me to RTFM and let 
> me know where to look.
>  >
>  > Specific question: how to clear under and over replicated files? Is the 
> correct procedure to copy the file locally, delete from HDFS, and then copy 
> back to HDFS?
>  >
>  > The fsck output is long, but the final summary is:
>  >
>  > Total size: 4899680097382 B
>  > Total blocks: 994252 (avg. block size 4928006 B)
>  > Total dirs: 47404
>  > Total files: 952070
>  > 
>  > CORRUPT FILES: 2
>  > MISSING BLOCKS: 24
>  > MISSING SIZE: 1501009630 B
>  > 
>  > Over-replicated blocks: 1 (1.0057812E-4 %)
>  > Under-replicated blocks: 14958 (1.5044476 %)
>  > Target replication factor: 3
>  > Real replication factor: 2.9849212
>  >
>  > The filesystem under path '/' is CORRUPT
>  >
>  >
>  >
>  > -
>  > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it 
> now.
>
>
>
>  -
>  Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it 
> now.


Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-11 Thread Dhruba Borthakur
You bring up an interesting point. A big chunk of the code in the
Namenode runs inside a global lock, although there are pieces
(e.g. a portion of the code that chooses datanodes for a newly allocated
block) that do execute outside this lock. But it is probably the case
that the namenode does not benefit from more than 4 cores or so (with
the current code).

If you have 8 cores, you can experiment with running map-reduce jobs
on the other 4 cores.

How much memory does your machine have and how many files does your
HDFS have? One possibility is that the memory pressure of the
map-reduce jobs causes more GC runs for the namenode process.

thanks,
dhruba


On Fri, May 9, 2008 at 7:54 PM, James Moore <[EMAIL PROTECTED]> wrote:
> On Fri, May 9, 2008 at 12:00 PM, Hairong Kuang <[EMAIL PROTECTED]> wrote:
>  >> I'm using the machine running the namenode to run maps as well.
>  > Please do not run maps on the machine that is running the namenode. This
>  > would cause CPU contention and slow down namenode. Thus more easily to see
>  > SocketTimeoutException.
>  >
>  > Hairong
>
>  I've turned off running tasks on the master, and I'm not seeing those errors.
>
>  The behavior was interesting.  On one job, I saw a total of 11 timeout
>  failures (where the map was reported as a failure), but all of them
>  happened in the first few minutes.  After that it worked well and
>  completed correctly.
>
>  I'm wondering if it's worth it, though.  If the number of maps/reduces
>  that the master machine can run is substantially greater than the
>  number of failures due to timeouts, isn't it worth having the master
>  run tasks?  It seems like there's probably a point where the number of
>  machines in the cluster makes having a separate master a requirement,
>  but at 20 8-core machines, it's not clear that dedicating a box to
>  being the master is a win.  (And having a smaller machine dedicated to
>  being the master is cheaper, but annoying.  I'd rather have N
>  identical boxes running the same AMI, etc.)
>
>  To anyone using amazon - definitely upgrade to the new kernels.  I now
>  have have very few instances of the 'Exception in
>  createBlockOutputStream' error that started this thread in my logs.
>  (These are different than the 11 timeouts I mentioned above, FYI).
>
>  The ones that are there all happened in one burst at  03:59:22 this 
> afternoon:
>
>  [EMAIL PROTECTED]:~/dev/hadoop$ bin/slaves.sh grep -r
>  'Exception in createBlockOutputStream' ~/dev/hadoop/logs/
>  domU-12-31-38-00-04-51.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_24_0/syslog:2008-05-09
>  03:59:22,713 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-D6-21.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_48_0/syslog:2008-05-09
>  03:59:22,989 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.22.111:50010
>  domU-12-31-38-00-D6-21.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_61_0/syslog:2008-05-09
>  03:59:22,398 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-60-D1.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_17_0/syslog:2008-05-09
>  03:59:22,880 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.217.203:50010
>  domU-12-31-38-00-CD-41.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_51_0/syslog:2008-05-09
>  03:59:23,012 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.34.31:50010
>  domU-12-31-38-00-D5-E1.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_26_0/syslog:2008-05-09
>  03:59:24,551 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.15.47:50010
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_56_0/syslog:2008-05-09
>  03:59:23,504 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.11.159:50010
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_50_0/syslog:2008-05-09
>  03:59:22,454 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  
> /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_09_0/syslog:2008-05-09
>  03:59:22,944 I

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread Dhruba Borthakur
Did one datanode fail or did the namenode fail? By "fail" do you mean
that the system was rebooted or was there a bad disk that caused the
problem?

thanks,
dhruba

On Sun, May 11, 2008 at 7:23 PM, C G <[EMAIL PROTECTED]> wrote:
> Hi All:
>
>   We had a primary node failure over the weekend.  When we brought the node 
> back up and I ran Hadoop fsck, I see the file system is corrupt.  I'm unsure 
> how best to proceed.  Any advice is greatly appreciated.   If I've missed a 
> Wiki page or documentation somewhere please feel free to tell me to RTFM and 
> let me know where to look.
>
>   Specific question:  how to clear under and over replicated files?  Is the 
> correct procedure to copy the file locally, delete from HDFS, and then copy 
> back to HDFS?
>
>   The fsck output is long, but the final summary is:
>
>Total size:4899680097382 B
>   Total blocks:  994252 (avg. block size 4928006 B)
>   Total dirs:47404
>   Total files:   952070
>   
>   CORRUPT FILES:2
>   MISSING BLOCKS:   24
>   MISSING SIZE: 1501009630 B
>   
>   Over-replicated blocks:1 (1.0057812E-4 %)
>   Under-replicated blocks:   14958 (1.5044476 %)
>   Target replication factor: 3
>   Real replication factor:   2.9849212
>
>  The filesystem under path '/' is CORRUPT
>
>
>
>  -
>  Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it 
> now.


Re: HDFS: fault tolerance to block losses with namenode failure

2008-05-06 Thread Dhruba Borthakur
Starting in 0.17 release, an application can invoke
DFSOutputStream.fsync() to persist block locations for a file even
before the file is closed.
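
A minimal sketch of the usage pattern, assuming the call is exposed on the
returned output stream as sync() in your release (the method name, the loop
and the path below are assumptions, not taken from the 0.17 API docs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PersistWhileWriting {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/logs/current"));  // made-up path
    byte[] chunk = new byte[8 * 1024 * 1024];
    for (int i = 0; i < 100; i++) {
      out.write(chunk);
      // Assumed API: persist block locations written so far at the namenode,
      // so earlier blocks survive even if the writer dies before close().
      out.sync();
    }
    out.close();
  }
}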

thanks,
dhruba


On Tue, May 6, 2008 at 8:11 AM, Cagdas Gerede <[EMAIL PROTECTED]> wrote:
> If you are writing 10 blocks for a file and let's say in 10th block namenode
>  fails, all previous 9 blocks are lost because you were not able to close the
>  file and therefore namenode did not persist the information about 9 blocks
>  to the fsimage file.
>
>  How would you solve this problem in the application? Why does the hdfs
>  client make namenode persist every block once the block is written and
>  instead waits until closing of the file? Then, don't you need to keep a copy
>  of all the blocks in your application until you close the file successfully
>  to prevent data loss. Does it make sense to have this semantics with the
>  assumption of very large files with multiple blocks?
>
>  Thanks for your response,
>
>  --
>  
>  Best Regards, Cagdas Evren Gerede
>  Home Page: http://cagdasgerede.info
>


RE: Block reports: memory vs. file system, and Dividing offerService into 2 threads

2008-04-30 Thread dhruba Borthakur
From a code perspective, the Namenode and Datanode are in sync in all
critical matters. But there is a possibility that the request from a
Namenode to a Datanode to delete a block might not have been received by
the Datanode because of a bad connection. This means that there could be
a leakage of storage. However, the current processing of block reports
every hour is too heavy-weight a cost to solve this problem. My
assumption is to make block reports occur very infrequently (maybe once
every day).

But when blocks get removed from under the Datanode, we would like to
detect this situation as soon as possible. Thus, it makes sense to
compare the Datanode data structures with what is on the disk once every
hour or so.

Implementing partial incremental block reports as you suggested is a good
idea too, but maybe we do not need it if we do the above. Since full block
reports will be sent only very rarely (maybe once every 1 day), maybe we
can live with the current implementation for the daily block reports?

Thanks,
dhruba

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 30, 2008 9:07 AM
To: core-user@hadoop.apache.org
Subject: Re: Block reports: memory vs. file system, and Dividing
offerService into 2 threads

dhruba Borthakur wrote:
> My current thinking is that "block report processing" should compare
the
> blkxxx files on disk with the data structure in the Datanode memory.
If
> and only if there is some discrepancy between these two, then a block
> report be sent to the Namenode. If we do this, then we will
practically
> get rid of 99% of block reports.

Doesn't this assume that the namenode and datanode are 100% in sync? 
Another purpose of block reports is to make sure that the namenode and 
datanode agree, since failed RPCs, etc. might have permitted them to 
slip out of sync.  Or are we now confident that these are never out of 
sync?  Perhaps we should start logging whenever a block report
surprises?

Long ago we talked of implementing partial, incremental block reports. 
We'd divide blockid space into 64 sections.  The datanode would ask the 
namenode for the hash of its block ids in a section.  Full block lists 
would then only be sent when the hash differs.  Both sides would 
maintain hashes of all sections in memory.  Then, instead of making a 
block report every hour, we'd make a 1/64 block id check every minute.

Doug


RE: Block reports: memory vs. file system, and Dividing offerService into 2 threads

2008-04-30 Thread dhruba Borthakur
You bring up a good point. Creating and processing block reports
does take a lot of resources. It affects DFS scalability and performance
to some extent. Here are some more details:
http://issues.apache.org/jira/browse/HADOOP-1079

 

There is one thread in the Datanode that sends block confirmations to
the Namenode. The same thread computes the block report and sends it to
Namenode. This sequential nature is critical in ensuring that there is
no erroneous race condition in the Namenode. See
http://issues.apache.org/jira/browse/HADOOP-1135

 

The reason block reports are there is to detect inconsistencies if
blocks get deleted underneath the Datanode process. For example, an
administrator can erroneously remove blk_xxx files. Also, a disk
corruption can make a section of blk_xxx files unreadable. This is the
reason block reports exists.

 

My current thinking is that "block report processing" should compare the
blkxxx files on disk with the data structure in the Datanode memory. If
and only if there is some discrepancy between these two, then a block
report be sent to the Namenode. If we do this, then we will practically
get rid of 99% of block reports.

 

Thanks,

dhruba

 



From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 29, 2008 11:32 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Block reports: memory vs. file system, and Dividing
offerService into 2 threads

 

Currently,

 

Block reports are computed by scanning all the files and folders in the
local disk.

This happens not only in startup, but also in periodic block reports.

 

What is the intention behind doing this way instead of creating the
block report from the data structure already located in the memory?

One issue with the current approach is that it takes quite some time.
For instance, the system I am working with has 1 million blocks in each
data node and every time block report is computed, it takes about 4
minutes. You might expect this may not have any effect, but it does. The
problem is that the computation and sending of block reports and the sending
of the list of newly received blocks are part of the same thread. As a
result, when the block report computation takes a long time, it delays
the reporting of new received blocks. 

As you know when you are writing a multi block file and you request a
new block from the namenode, the namenode checks whether the very
previous block is replicated enough number of times. If at this point
the data node is computing the block report, it is very likely that it
didn't inform the namenode about the very previous block yet. As a
result, the namenode rejects this. Then client tries to repeat this 5
more times while doubling the wait time in between up to about 6 seconds
(starts with 200ms and doubles it 5 times). Then, client raises an
exception to the application. Given that the block report computation
takes minutes, this situation is very likely to occur.

 

Do you have any suggestions on how to handle this situation?

Are there any plans to take block reporting code segment and reporting
of new received blocks to separate threads (ref: offerService method of
DataNode.java file)?

 

Thanks for your response,

 


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info <http://cagdasgerede.info/>  



RE: Please Help: Namenode Safemode

2008-04-24 Thread dhruba Borthakur
Ok, cool. The random delay is used to ensure that the Namenode does not
have to process a large number of simultaneous block reports; otherwise
the situation becomes really bad when the Namenode restarts and all
Datanodes send their block reports at the same time. This becomes worse
if the number of Datanodes is large.

 

-dhruba

 



From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 24, 2008 11:56 AM
To: dhruba Borthakur
Cc: core-user@hadoop.apache.org
Subject: Re: Please Help: Namenode Safemode

 

Hi Dhruba,
Thanks for your answer. But I think you missed what I mentioned. I
mentioned that the extension is already 0 in my configuration file.

After spending quite some time on the code, I found the reason. The
reason is dfs.blockreport.initialDelay.
If you do not set this in your config file, then it is 60,000 by
default. In datanodes, a random number between 0-60,000 is chosen.
Then, each datanode delays as long as this random value (in miliseconds)
to send the block report when they register with the namenode. As a
result, this value can be as much as 1 minute. If you want your namenode
start quicker, then you should put a smaller number for
dfs.blockreport.initialDelay.

When I set it to 0, the namenode now starts up in 1-2 seconds.


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info 



On Wed, Apr 23, 2008 at 4:44 PM, dhruba Borthakur <[EMAIL PROTECTED]>
wrote:

By default, there is a variable called dfs.safemode.extension set in
hadoop-default.xml that is set to 30 seconds. This means that once the
Namenode has one replica of every block, it still waits for 30 more
seconds before exiting Safemode.

 

dhruba

 



From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 23, 2008 4:37 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Please Help: Namenode Safemode

 

I have a hadoop distributed file system with 3 datanodes. I only have
150 blocks in each datanode. It takes a little more than a minute for
namenode to start and pass safemode phase.

The steps for namenode start, as much as I understand, are:
1) Datanode send a heartbeat to namenode. Namenode tells datanode to
send blockreport as a piggyback to heartbeat.
2) Datanode computes the block report. 
3) Datanode sends it to Namenode.
4) Namenode processes the block report.
5) Namenode safe mode thread monitor checks for exiting, and namenode
exits if the threshold is reached and the extension time has passed.

Here are my numbers:
Step 1) Datanodes send heartbeats every 3 seconds. 
Step 2) Datanode computes the block report. (this takes about 20
miliseconds - as shown in the datanodes' logs)
Step 3) No idea? (Depends on the size of blockreport. I suspect this
should not be more than a couple of seconds).
Step 4) No idea? Shouldn't be more than a couple of seconds.
Step 5) Thread checks every second. The extension value in my
configuration is 0. So there is no wait if threshold is achieved.

Given these numbers, can any body explain where does one minute come
from? Shouldn't this step take 10-20 seconds? 
Please help. I am very confused.



-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info 







RE: Please Help: Namenode Safemode

2008-04-23 Thread dhruba Borthakur
By default, there is a variable called dfs.safemode.extension set in
hadoop-default.xml that is set to 30 seconds. This means that once the
Namenode has one replica of every block, it still waits for 30 more
seconds before exiting Safemode.

 

dhruba

 



From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 23, 2008 4:37 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Please Help: Namenode Safemode

 

I have a hadoop distributed file system with 3 datanodes. I only have
150 blocks in each datanode. It takes a little more than a minute for
namenode to start and pass safemode phase.

The steps for namenode start, as much as I understand, are:
1) Datanode send a heartbeat to namenode. Namenode tells datanode to
send blockreport as a piggyback to heartbeat.
2) Datanode computes the block report. 
3) Datanode sends it to Namenode.
4) Namenode processes the block report.
5) Namenode safe mode thread monitor checks for exiting, and namenode
exits if the threshold is reached and the extension time has passed.

Here are my numbers:
Step 1) Datanodes send heartbeats every 3 seconds. 
Step 2) Datanode computes the block report. (this takes about 20
miliseconds - as shown in the datanodes' logs)
Step 3) No idea? (Depends on the size of blockreport. I suspect this
should not be more than a couple of seconds).
Step 4) No idea? Shouldn't be more than a couple of seconds.
Step 5) Thread checks every second. The extension value in my
configuration is 0. So there is no wait if threshold is achieved.

Given these numbers, can any body explain where does one minute come
from? Shouldn't this step take 10-20 seconds? 
Please help. I am very confused.



-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info 



RE: regarding a query on the support of hadoop on windows.

2008-04-22 Thread dhruba Borthakur
As far as I know, you need Cygwin to install and run hadoop. The fact
that you are using Cygwin to run hadoop has almost negligible impact on
the performance and efficiency of the hadoop cluster. Cygwin is mostly
needed for the install and configuration scripts. There are a few small
portions of cluster code that need Cygwin (e.g. creating snapshots,
detaching blocks, etc), but their overall impact on cluster performance
and throughput is negligible.

New contributions and Contributors to Hadoop are always welcome. The
project "issue" database can be found at
http://hadoop.apache.org/core/issue_tracking.html. This would be a
starting point for new developers to browse existing bugs/enhancements
that are being worked on. You can create a new issue and propose the
contribution(s) that you plan to make.

Hope this helps,
dhruba

-Original Message-
From: Anish Damodaran [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 22, 2008 12:38 AM
To: core-user@hadoop.apache.org
Subject: RE: regarding a query on the support of hadoop on windows.

hello Akshar,
I would like to know if we can use hadoop without cygwin and, if we use
cygwin, what kind of impact will it have on the performance?

Regards,
Anish

From: Akshar [EMAIL PROTECTED]
Sent: Tuesday, April 22, 2008 12:50 PM
To: core-user@hadoop.apache.org
Subject: Re: regarding a query on the support of hadoop on windows.

Please see the 'Windows Users' section on
http://wiki.apache.org/hadoop/QuickStart.

On Mon, Apr 21, 2008 at 11:48 PM, Anish Damodaran <
[EMAIL PROTECTED]> wrote:

>
> Hello Sir,
> I'm currently evaluating hadoop for windows. I would like to know the
> following
> 1. Is it possible for us to use hadoop without Cygwin as of now? If not,
> how feasible is it to modify the scripts to support windows?
> 2. Does the efficiency decrease on account of the fact that hadoop on
> windows runs on Cygwin?
> 3. Is some work happening with Hadoop to make it production level on
> windows platform?
> 4. I would like to know if there are any hadoop deployments on Windows
> platform?
> 5. Is there a way for me to contribute to Hadoop for windows platform?
>
> Hoping to get answers to these questions.
>
> Thanks & Regards,
> Anish
>
>  CAUTION - Disclaimer *
> This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended
> solely for the use of the addressee(s). If you are not the intended
> recipient, please notify the sender by e-mail and delete the original
> message. Further, you are not to copy, disclose, or distribute this
e-mail
> or its contents to any other person and any such actions are unlawful.
This
> e-mail may contain viruses. Infosys has taken every reasonable
precaution to
> minimize this risk, but is not liable for any damage you may sustain
as a
> result of any virus in this e-mail. You should carry out your own
virus
> checks before opening the e-mail or attachment. Infosys reserves the
right
> to monitor and review the content of all messages sent to or from this
> e-mail address. Messages sent to or from this e-mail address may be
stored
> on the Infosys e-mail system.
> ***INFOSYS End of Disclaimer INFOSYS***
>


RE: datanode files list

2008-04-21 Thread dhruba Borthakur
You should be able to run "bin/hadoop fsck -files -blocks -locations /"
and get a listing of all files and the datanode(s) that each block of
the file resides in.

Thanks,
dhruba

-Original Message-
From: Shimi K [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 21, 2008 2:12 AM
To: core-user@hadoop.apache.org
Subject: datanode files list

Is there a way to get the list of files on each datanode?
I need to be able to get all the names of the files on a specific
datanode?
is there a way to do it?


RE: Help: When is it safe to discard a block in the application layer

2008-04-17 Thread dhruba Borthakur
The DFSClient caches small packets (e.g. 64K write buffers) and they are
lazily flushed to the datanodes in the pipeline. So, when an application
completes an out.write() call, it is definitely not guaranteed that data
has been sent to even one datanode. 

 

One option would be to retrieve cache hints from the Namenode and
determine if the block has three locations.
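
A rough sketch of that polling approach, assuming a release that exposes the
locations through FileSystem.getFileBlockLocations (older releases expose the
same information as cache hints); the file name and sleep interval are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
  // True once every block of the file is reported on at least 'want' datanodes.
  static boolean fullyReplicated(FileSystem fs, Path p, int want) throws Exception {
    FileStatus st = fs.getFileStatus(p);
    BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
    for (BlockLocation b : blocks) {
      if (b.getHosts().length < want) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/app/datablock");  // made-up file name
    while (!fullyReplicated(fs, p, 3)) {
      Thread.sleep(3000);  // locations lag behind: block confirmations take a few heartbeats
    }
    // only now tell the user the data can be discarded
  }
}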

 

Thanks,

dhruba

 



From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 16, 2008 7:40 PM
To: core-user@hadoop.apache.org
Subject: Help: When is it safe to discard a block in the application
layer

 

I am working on an application on top of Hadoop Distributed File System.

High level flow goes like this: User data block arrives to the
application server. The application server uses DistributedFileSystem
api of Hadoop and write the data block to the file system. Once the
block is replicated three times, the application server will notify the
user so that the user can get rid of the data since it is now in a
persistent fault tolerant storage.

I couldn't figure out the following. Let's say, this is my sample
program to write a block.

byte data[] = new byte[blockSize];
out.write(data, 0, data.length);
...

where out is
out = fs.create(outFile, FsPermission.getDefault(), true, 4096,
(short)replicationCount, blockSize, progress);


My application writes the data to the stream and then it goes to the
next line. At this point, I believe I cannot be sure that the block is
replicated
at least, say 3, times. Possibly, under the hood, the DFSClient is still
trying to push this data to others.

Given this, how is my application going to know that the data block is
replicated 3 times and it is safe to discard this data?

There are a couple of things you might think:
1) Set the minimum replica property to 3: Even if you do this, the
application still goes to the next line before the data actually
replicated 3 times.
2) Right after you write, you continuously get cache hints from master
and check if master is aware of 3 replicas of this block: My problem
with this approach is that the application will wait for a while for
every block it needs to store since it will take some time for datanodes
to report and master to process the blockreports. What is worse, if some
datanode in the pipeline fails, we have no way of knowing the error.

To sum-up, I am not sure when is the right time to discard a block of
data with the guarantee that it is replicated certain number of times.

Please help,

Thanks,
Cagdas



RE: Lease expired on open file

2008-04-16 Thread dhruba Borthakur
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a short sleep every few seconds while unzipping.
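
A minimal sketch of that suggestion (the local path and the sleep interval are
made-up values; the point is only to give the lease-renewal thread a chance to run):

import java.io.FileInputStream;
import java.util.zip.GZIPInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GunzipToHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/user/luca/testfile"));
    GZIPInputStream in =
        new GZIPInputStream(new FileInputStream("/local/big-input.gz"));  // made-up local path
    byte[] buf = new byte[64 * 1024];
    int n, chunks = 0;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
      if (++chunks % 4096 == 0) {  // roughly every 256 MB with a 64 KB buffer
        Thread.sleep(100);         // let the DFSClient lease-renewal thread get some CPU
      }
    }
    in.close();
    out.close();
  }
}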

Thanks,
dhruba

-Original Message-
From: Luca Telloli [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 16, 2008 9:43 AM
To: core-user@hadoop.apache.org
Subject: Lease expired on open file

Hello everyone,
I wrote a small application that directly gunzip files from a
local 
filesystem to an installation of HDFS, writing on a FSDataOutputStream. 
Nevertheless, while expanding a very big file, I got this exception:

org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
/user/luca/testfile File is not open for writing. [Lease.  Holder: 44 46

53 43 6c 69 65 6e 74 5f 2d 31 39 31 34 34 39 36 31 34 30, heldlocks: 0, 
pendingcreates: 1]

I wonder what the cause would be for this Exception and if there's a way

  to know the default lease for a file and to possibly prolongate it.

Ciao,
Luca


RE: multiple datanodes in the same machine

2008-04-15 Thread dhruba Borthakur
Yes, just point the Datanodes to different config files, different sets
of ports, different data directories, etc.

Thanks,
dhruba

-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 15, 2008 11:21 AM
To: core-user@hadoop.apache.org
Subject: multiple datanodes in the same machine

Is there a way to run multiple datanodes in the same machine?


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


RE: secondary namenode web interface

2008-04-08 Thread dhruba Borthakur
The secondary Namenode uses the HTTP interface to pull the fsimage from
the primary. Similarly, the primary Namenode uses the
dfs.secondary.http.address to pull the checkpointed-fsimage back from
the secondary to the primary. So, the definition of
dfs.secondary.http.address is needed.

However, the servlet dfshealth.jsp should not be served from the
secondary Namenode. This servlet should be set up in such a way that only
the primary Namenode invokes it.

Thanks,
dhruba

-Original Message-
From: Yuri Pradkin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 08, 2008 10:11 AM
To: core-user@hadoop.apache.org
Subject: Re: secondary namenode web interface

I'd be happy to file a JIRA for the bug, I just want to make sure I
understand 
what the bug is: is it the misleading "null pointer" message or is it
that 
someone is listening on this port and not doing anything useful?  I
mean,
what is the configuration parameter dfs.secondary.http.address for?
Unless 
there are plans to make this interface work, this config parameter
should go 
away, and so should the listening thread, shouldn't they?

Thanks,
  -Yuri

On Friday 04 April 2008 03:30:46 pm dhruba Borthakur wrote:
> Your configuration is good. The secondary Namenode does not publish a
> web interface. The "null pointer" message in the secondary Namenode
log
> is a harmless bug but should be fixed. It would be nice if you can
open
> a JIRA for it.
>
> Thanks,
> Dhruba
>
>
> -Original Message-
> From: Yuri Pradkin [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 04, 2008 2:45 PM
> To: core-user@hadoop.apache.org
> Subject: Re: secondary namenode web interface
>
> I'm re-posting this in hope that someone would help.  Thanks!
>
> On Wednesday 02 April 2008 01:29:45 pm Yuri Pradkin wrote:
> > Hi,
> >
> > I'm running Hadoop (latest snapshot) on several machines and in our
>
> setup
>
> > namenode and secondarynamenode are on different systems.  I see from
>
> the
>
> > logs than secondary namenode regularly checkpoints fs from primary
> > namenode.
> >
> > But when I go to the secondary namenode HTTP
>
> (dfs.secondary.http.address)
>
> > in my browser I see something like this:
> >
> > HTTP ERROR: 500
> > init
> > RequestURI=/dfshealth.jsp
> > Powered by Jetty://
> >
> > And in secondary's log I find these lines:
> >
> > 2008-04-02 11:26:25,357 WARN /: /dfshealth.jsp:
> > java.lang.NullPointerException
> > at
> > org.apache.hadoop.dfs.dfshealth_jsp.(dfshealth_jsp.java:21) at
> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
>
> at
>
>
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorA
> cce
>
> >ssorImpl.java:57) at
>
>
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingCons
> tru
>
> >ctorAccessorImpl.java:45) at
> > java.lang.reflect.Constructor.newInstance(Constructor.java:539) at
> > java.lang.Class.newInstance0(Class.java:373)
> > at java.lang.Class.newInstance(Class.java:326)
> > at
>
> org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:199)
>
> > at
>
>
org.mortbay.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:32
> 6)
>
> > at
>
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:405)
>
> > at
>
>
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationH
> and
>
> >ler.java:475) at
>
>
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> at
>
> > org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at
>
>
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationCon
> tex
>
> >t.java:635) at
>
> org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at
>
> > org.mortbay.http.HttpServer.service(HttpServer.java:954) at
> > org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at
> > org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at
> > org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at
>
>
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244
> )
>
> > at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at
> > org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> >
> > Is something missing from my configuration?  Anybody else seen
these?
> >
> > Thanks,
> >
> >   -Yuri




RE: secondary namenode web interface

2008-04-04 Thread dhruba Borthakur
Your configuration is good. The secondary Namenode does not publish a
web interface. The "null pointer" message in the secondary Namenode log
is a harmless bug but should be fixed. It would be nice if you can open
a JIRA for it.

Thanks,
Dhruba


-Original Message-
From: Yuri Pradkin [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 04, 2008 2:45 PM
To: core-user@hadoop.apache.org
Subject: Re: secondary namenode web interface

I'm re-posting this in hope that someone would help.  Thanks!

On Wednesday 02 April 2008 01:29:45 pm Yuri Pradkin wrote:
> Hi,
>
> I'm running Hadoop (latest snapshot) on several machines and in our
setup
> namenode and secondarynamenode are on different systems.  I see from
the
> logs than secondary namenode regularly checkpoints fs from primary
> namenode.
>
> But when I go to the secondary namenode HTTP
(dfs.secondary.http.address)
> in my browser I see something like this:
>
>   HTTP ERROR: 500
>   init
>   RequestURI=/dfshealth.jsp
>   Powered by Jetty://
>
> And in secondary's log I find these lines:
>
> 2008-04-02 11:26:25,357 WARN /: /dfshealth.jsp:
> java.lang.NullPointerException
> at
> org.apache.hadoop.dfs.dfshealth_jsp.(dfshealth_jsp.java:21) at
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
>
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorA
cce
>ssorImpl.java:57) at
>
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingCons
tru
>ctorAccessorImpl.java:45) at
> java.lang.reflect.Constructor.newInstance(Constructor.java:539) at
> java.lang.Class.newInstance0(Class.java:373)
> at java.lang.Class.newInstance(Class.java:326)
> at
org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:199)
> at
>
org.mortbay.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:32
6)
> at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:405)
> at
>
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationH
and
>ler.java:475) at
>
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at
> org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at
>
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationCon
tex
>t.java:635) at
org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at
> org.mortbay.http.HttpServer.service(HttpServer.java:954) at
> org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at
> org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at
> org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at
>
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244
)
> at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at
> org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
>
> Is something missing from my configuration?  Anybody else seen these?
>
> Thanks,
>
>   -Yuri




RE: Append data in hdfs_write

2008-03-26 Thread dhruba Borthakur
HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
for more details.

Thanks,
dhruba

-Original Message-
From: Raghavendra K [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 26, 2008 11:29 PM
To: core-user@hadoop.apache.org
Subject: Append data in hdfs_write

Hi,
  I am using
hdfsWrite to write data onto a file.
Whenever I close the file and re open it for writing it will start
writing
from the position 0 (rewriting the old data).
Is there any way to append data onto a file using hdfsWrite.
I cannot use hdfsTell because it works only when opened in RDONLY mode
and
also I dont know the number of bytes written onto the file previously.
Please throw some light onto it.

-- 
Regards,
Raghavendra K


RE: Performance / cluster scaling question

2008-03-21 Thread dhruba Borthakur
The namenode lazily instructs a Datanode to delete blocks. As a response to 
every heartbeat from a Datanode, the Namenode instructs it to delete a maximum 
of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat 
thread in the Datanode deletes the block files synchronously before it can send 
the next heartbeat. That's the reason a small number (like 100) was chosen.

If you have 8 datanodes, your system will probably delete about 800 blocks 
every 3 seconds.

Thanks,
dhruba

-Original Message-
From: André Martin [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block number 
and "DFS Used" space seems to go down...
My question is: is the hardware simply too weak/slow to send the block 
deletion request to the datanodes in a timely manner, or do simply those 
"crappy" HDDs cause the delay, since I noticed that I can take up to 40 
minutes when deleting ~400.000 files at once manually using "rm -r"...
Actually - my main concern is why the performance à la the throughput 
goes down - any ideas?



RE: HDFS: Flash Application and Available APIs

2008-03-20 Thread dhruba Borthakur
There is a C-language based API to access HDFS. You can find more
details at:

http://wiki.apache.org/hadoop/LibHDFS

If you download the Hadoop source code from
http://hadoop.apache.org/core/releases.html, you will see this API in
src/c++/libhdfs/hdfs.c

hope this helps,
dhruba

-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 18, 2008 10:54 AM
To: core-user@hadoop.apache.org
Subject: HDFS: Flash Application and Available APIs

I have two questions:

- I was wondering if an HDFS client can be invoked from a Flash
application.
- What are the available APIs for HDFS? (I read that there is a C/C++
api for Hadoop Map/Reduce but is there a C/C++ API for HDFS? or Can it
only be invoked from a Java application?


Thanks for your help,
Cagdas


RE: Trash option in hadoop-site.xml configuration.

2008-03-20 Thread dhruba Borthakur
Actually, the fs.trash.interval number has no significance on the client. If it 
is non-zero, then the client does a rename instead of a delete. The value 
specified in fs.trash.interval is used only by the namenode to periodically 
remove files from Trash: the periodicity is the value specified by 
fs.trash.interval on the namenode.
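
For illustration, this is roughly what a client-side "move to trash" looks
like. It is a sketch only; it assumes the Trash helper class and the
fs.trash.interval key behave this way in your release, and the path is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class ClientSideTrash {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side setting: non-zero means this client renames into Trash instead of deleting.
    conf.set("fs.trash.interval", "60");
    Trash trash = new Trash(conf);
    trash.moveToTrash(new Path("/user/taeho/old-data"));  // made-up path
  }
}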

hope this helps,
dhruba


-Original Message-
From: Taeho Kang [mailto:[EMAIL PROTECTED]
Sent: Thu 3/20/2008 1:53 AM
To: core-user@hadoop.apache.org
Subject: Re: Trash option in hadoop-site.xml configuration.
 
Thank you for the clarification.

Here is my another question.
If two different clients ordered "move to trash" with different interval,
(e.g. client #1 with fs.trash.interval = 60; client #2 with
fs.trash.interval = 120)
what would happen?

Does namenode keep track of all these info?

/Taeho


On 3/20/08, dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>
> The "trash" feature is a client side option and depends on the client
> configuration file. If the client's configuration specifies that "Trash"
> is enabled, then the HDFS client invokes a "rename to Trash" instead of
> a "delete". Now, if "Trash" is enabled on the Namenode, then the
> Namenode periodically removes contents from the Trash directory.
>
> This design might be confusing to some users. But it provides the
> flexibility that different clients in the cluster can have either Trash
> enabled or disabled.
>
> Thanks,
> dhruba
>
> -Original Message-
> From: Taeho Kang [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 19, 2008 3:13 AM
> To: [EMAIL PROTECTED]; core-user@hadoop.apache.org;
> [EMAIL PROTECTED]
> Subject: Trash option in hadoop-site.xml configuration.
>
> Hello,
>
> I have these two machines that acts as a client to HDFS.
>
> Node #1 has Trash option enabled (e.g. fs.trash.interval set to 60)
> and Node #2 has Trash option off (e.g. fs.trash.interval set to 0)
>
> When I order file deletion from Node #2, the file gets deleted right
> away.
> while the file gets moved to trash when I do the same from Node #1.
>
> This is a bit of surprise to me,
> because I thought Trash option that I have set in the master node's
> config
> file
> applies to everyone who connects to / uses the HDFS.
>
> Was there any reason why Trash option was implemented in this way?
>
> Thank you in advance,
>
> /Taeho
>



RE: Trash option in hadoop-site.xml configuration.

2008-03-19 Thread dhruba Borthakur
The "trash" feature is a client side option and depends on the client
configuration file. If the client's configuration specifies that "Trash"
is enabled, then the HDFS client invokes a "rename to Trash" instead of
a "delete". Now, if "Trash" is enabled on the Namenode, then the
Namenode periodically removes contents from the Trash directory.

This design might be confusing to some users. But it provides the
flexibility that different clients in the cluster can have either Trash
enabled or disabled.

Thanks,
dhruba

-Original Message-
From: Taeho Kang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 19, 2008 3:13 AM
To: [EMAIL PROTECTED]; core-user@hadoop.apache.org;
[EMAIL PROTECTED]
Subject: Trash option in hadoop-site.xml configuration.

Hello,

I have these two machines that acts as a client to HDFS.

Node #1 has Trash option enabled (e.g. fs.trash.interval set to 60)
and Node #2 has Trash option off (e.g. fs.trash.interval set to 0)

When I order file deletion from Node #2, the file gets deleted right
away.
while the file gets moved to trash when I do the same from Node #1.

This is a bit of surprise to me,
because I thought Trash option that I have set in the master node's
config
file
applies to everyone who connects to / uses the HDFS.

Was there any reason why Trash option was implemented in this way?

Thank you in advance,

/Taeho


RE: HDFS: how to append

2008-03-18 Thread dhruba Borthakur
HDFS files, once created, cannot be modified in any way. Appends to HDFS
files will probably be supported in a future release in the next couple
of months.

Thanks,
dhruba

-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 18, 2008 9:53 AM
To: core-user@hadoop.apache.org
Subject: HDFS: how to append

The HDFS documentation says it is possible to append to an HDFS file.

In org.apache.hadoop.dfs.DistributedFileSystem class,
there is no method to open an existing file for writing (there are
methods for reading).
Only similar methods are "create" methods which return
FSDataOutputStream.
When I look at FSDataOutputStream class, it seems there is no "append"
method, and
all "write" methods overwrite an existing file or return an error if
such a file exists.

Does anybody know how to append to a file in HDFS?
I appreciate your help.
Thanks,

Cagdas


RE: Question about recovering from a corrupted namenode 0.16.0

2008-03-13 Thread dhruba Borthakur
Your procedure is right:

1. Copy edit.tmp from secondary to edit on primary
2. Copy srcimage from secondary to fsimage on primary 
3. remove edits.new on primary
4. restart cluster, put in Safemode, fsck /

However, the above steps are not foolproof because the transactions that
occurred between the time when the last checkpoint was taken by the
secondary and when the disk became full are lost. This could cause some
blocks to go missing too, because the last checkpoint might refer to
blocks that are no longer present. If the fsck does not report any
missing blocks, then you are good to go.

Thanks,
dhruba

-Original Message-
From: Jason Venner [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 13, 2008 1:37 PM
To: core-user@hadoop.apache.org
Subject: Question about recovering from a corrupted namenode 0.16.0

The namenode ran out of disk space and on restart was throwing the error

at the end of this message.

We copied in the edit.tmp to edit from the secondary, and copied in 
srcimage to fsimage, and removed edit.new and our file system started up
and /appears/ to be intact.

What is the proper procedure, we didn't find any details on the wiki.

Namenode error:
2008-03-13 13:19:32,493 ERROR org.apache.hadoop.dfs.NameNode: 
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
at
org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:507)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:744)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:624)
at
org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
at
org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
at
org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:235)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
at org.apache.hadoop.dfs.NameNode.(NameNode.java:175)
at org.apache.hadoop.dfs.NameNode.(NameNode.java:161)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)



-- 
Jason Venner
Attributor - Publish with Confidence 
Attributor is hiring Hadoop Wranglers, contact if interested


RE: HDFS interface

2008-03-11 Thread dhruba Borthakur
HDFS can be accessed using the FileSystem API

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html

The HDFS Namenode protocol can be found in 

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNode.html

thanks,
dhruba

-Original Message-
From: Naama Kraus [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 11, 2008 12:17 AM
To: hadoop-core
Subject: HDFS interface

Hi,

I'd be interested in information about interfaces to HDFS other then the
DFSShell commands. I've seen threads about dfs and fuse, dfs and WebDav.
Could anyone provide more details or point me to related resources ?
What's
the status of these ?

Thanks, Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00
oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If
you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)


RE: zombie data nodes, not alive but not dead

2008-03-10 Thread dhruba Borthakur
The following issues might be impacting you (from release notes)

http://issues.apache.org/jira/browse/HADOOP-2185

HADOOP-2185.  RPC Server uses any available port if the specified
port is zero. Otherwise it uses the specified port. Also combines
the configuration attributes for the servers' bind address and
port from "x.x.x.x" and "y" to "x.x.x.x:y".
Deprecated configuration variables:
  dfs.info.bindAddress
  dfs.info.port
  dfs.datanode.bindAddress
  dfs.datanode.port
  dfs.datanode.info.bindAdress
  dfs.datanode.info.port
  dfs.secondary.info.bindAddress
  dfs.secondary.info.port
  mapred.job.tracker.info.bindAddress
  mapred.job.tracker.info.port
  mapred.task.tracker.report.bindAddress
  tasktracker.http.bindAddress
  tasktracker.http.port
New configuration variables (post HADOOP-2404):
  dfs.secondary.http.address
  dfs.datanode.address
  dfs.datanode.http.address
  dfs.http.address
  mapred.job.tracker.http.address
  mapred.task.tracker.report.address
  mapred.task.tracker.http.address

-Original Message-
From: Dave Coyle [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 10:01 PM
To: core-user@hadoop.apache.org
Subject: Re: zombie data nodes, not alive but not dead

On 2008-03-10 23:37:36 -0400, [EMAIL PROTECTED] wrote:
> I can leave the cluster running  for hours and this slave will never 
> "register" itself with the namenode. I've been messing with this
problem 
> for three days now and I'm out of ideas. Any suggestions?

I had a similar-sounding problem with a 0.16.0 setup I had...
namenode thinks datanodes are dead, but the datanodes complain if
namenode is unreachable so there must be *some* connectivity.
Admittedly I haven't had the time yet to recreate what I did to see if
I had just mangled some config somewhere, but I was eventually able to
sort out my problem by...and yes, this sounds a bit wacky... running
a given datanode interactively, suspending it, then bringing it back
to the foreground.  E.g. (assuming your namenode is already running):

$ bin/hadoop datanode

$ fg

and the datanode then magically registered with the namenode.

Give it a shot... I'm curious to hear if it works for you, too.

-Coyle


RE: org.apache.hadoop.dfs.NameNode: java.lang.NullPointerException

2008-03-02 Thread dhruba Borthakur
Hi Andre,

Is it possible for you to let me look at your entire Namenode log?

Thanks,
dhruba

-Original Message-
From: André Martin [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 01, 2008 4:32 PM
To: core-user@hadoop.apache.org
Subject: org.apache.hadoop.dfs.NameNode: java.lang.NullPointerException

Hi everyone,
the namenode doesn't re-start properly:

> 2008-03-02 01:25:25,120 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = se09/141.76.xxx.xxx
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 2008-02-28_11-01-44
> STARTUP_MSG:   build = 
> http://svn.apache.org/repos/asf/hadoop/core/trunk -r 631915; compiled 
> by 'hudson' on Thu Feb 28 11:11:52 UTC 2008
> /
> 2008-03-02 01:25:25,247 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing RPC Metrics with serverName=NameNode, port=8000
> 2008-03-02 01:25:25,254 INFO org.apache.hadoop.dfs.NameNode: Namenode 
> up at: se09.inf.tu-dresden.de/141.76.44.xxx:xxx
> 2008-03-02 01:25:25,257 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-03-02 01:25:25,260 INFO org.apache.hadoop.dfs.NameNodeMetrics: 
> Initializing NameNodeMeterics using context 
> object:org.apache.hadoop.metrics.spi.NullContext
> 2008-03-02 01:25:25,358 INFO org.apache.hadoop.fs.FSNamesystem: 
> fsOwner=amartin,students
> 2008-03-02 01:25:25,359 INFO org.apache.hadoop.fs.FSNamesystem: 
> supergroup=supergroup
> 2008-03-02 01:25:25,359 INFO org.apache.hadoop.fs.FSNamesystem: 
> isPermissionEnabled=true
> 2008-03-02 01:25:29,887 ERROR org.apache.hadoop.dfs.NameNode: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.dfs.FSImage.readINodeUnderConstruction(FSImage.java:950)
> at 
> org.apache.hadoop.dfs.FSImage.loadFilesUnderConstruction(FSImage.java:919)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:749)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:634)
> at 
> org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:261)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:242)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:176)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:162)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:851)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:860)
>
> 2008-03-02 01:25:29,888 INFO org.apache.hadoop.dfs.NameNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at se09/141.76.xxx.xxx
> /
Any ideas? Looks like a bug...

Cu on the 'net,
Bye - bye,

   < André   èrbnA >




RE: long write operations and data recovery

2008-02-29 Thread dhruba Borthakur
It would be nice if a layer on top of the dfs client could be built to handle
disconnected operation. That layer could cache files on local disk if HDFS
is unavailable. It could then upload those files into HDFS when the HDFS
service comes back online. I think such a service would be helpful for
most HDFS installations.
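
A very rough sketch of what such a layer could look like (entirely
hypothetical; nothing like this ships with Hadoop, and the class name and
directory locations are made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical layer: park files in a local spool directory while HDFS is
// unreachable, and drain the spool into HDFS once the namenode is back.
public class SpoolingUploader {
  private final Configuration conf;
  private final Path spoolDir = new Path("file:///var/spool/hdfs-upload");  // made-up location
  private final Path targetDir = new Path("/logs/incoming");                // made-up location

  public SpoolingUploader(Configuration conf) {
    this.conf = conf;
  }

  public void store(Path localFile) throws IOException {
    try {
      FileSystem.get(conf).copyFromLocalFile(localFile, targetDir);
    } catch (IOException hdfsUnavailable) {
      // HDFS unavailable: park the file locally and try again later.
      FileSystem local = FileSystem.getLocal(conf);
      local.mkdirs(spoolDir);
      FileUtil.copy(local, localFile, local, spoolDir, false, conf);
    }
  }

  // Call periodically; the copy simply fails with IOException if HDFS is still down.
  public void drainSpool() throws IOException {
    FileSystem hdfs = FileSystem.get(conf);
    FileSystem local = FileSystem.getLocal(conf);
    FileStatus[] spooled = local.listStatus(spoolDir);
    if (spooled == null) {
      return;  // nothing spooled yet
    }
    for (FileStatus f : spooled) {
      hdfs.copyFromLocalFile(true, f.getPath(), targetDir);  // move into HDFS, delete local copy
    }
  }
}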

Thanks,
dhruba

-Original Message-
From: Ted Dunning [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 29, 2008 11:33 AM
To: core-user@hadoop.apache.org
Subject: Re: long write operations and data recovery


Unless your volume is MUCH higher than ours, I think you can get by with
a
relatively small farm of log consolidators that collect and concatenate
files.

If each log line is 100 bytes after compression (that is huge really)
and
you have 10,000 events per second (also pretty danged high) then you are
only writing 1MB/s.  If you need a day of buffering (=100,000 seconds),
then
you need 100GB of buffer storage.  These are very, very moderate
requirements for your ingestion point.


On 2/29/08 11:18 AM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
> 
>> In our case, we looked at the problem and decided that Hadoop wasn't
>> feasible for our real-time needs in any case.  There were several
>> issues,
>> 
>> - first, of all, map-reduce itself didn't seem very plausible for
>> real-time applications.  That left hbase and hdfs as the capabilities
>> offered by hadoop (for real-time stuff)
> 
> We'll be using map-reduce batch mode, so we're okay there.
> 
>> The upshot is that we use hadoop extensively for batch operations
>> where it really shines.  The other nice effect is that we don't have
>> to worry all that much about HA (at least not real-time HA) since we
>> don't do real-time with hadoop.
> 
> What I'm struggling with is the write side of things.  We'll have a
huge
> amount of data to write that's essentially a log format.  It would
seem
> that writing that outside of HDFS then trying to batch import it would
> be a losing battle -- that you would need the distributed nature of
HDFS
> to do very large volume writes directly and wouldn't easily be able to
take
> some other flat storage model and feed it in as a secondary step
without
> having the HDFS side start to lag behind.
> 
> The realization is that Name Node could go down so we'll have to have
a
> backup store that might be used during temporary outages, but that
> most of the writes would be direct HDFS updates.
> 
> The alternative would seem to be to end up with a set of distributed
files
> without some unifying distributed file system (e.g., like lots of
Apache
> web logs on many many individual boxes) and then have to come up with
> some way to funnel those back into HDFS.



RE: long write operations and data recovery

2008-02-28 Thread dhruba Borthakur
I agree with Joydeep. For batch processing, it is sufficient to make the
application not assume that HDFS is always up and active. However, for
real-time applications that are not batch-centric, it might not be
sufficient. There are a few things that HDFS could do to better handle
Namenode outages:

1. Make Clients handle transient Namenode downtime. This requires that
Namenode restarts are fast, clients can handle long Namenode outages,
etc.etc.
2. Design HDFS Namenode to be a set of two, an active one and a passive
one. The active Namenode could continuously forward transactions to the
passive one. In case of failure of the active Namenode, the passive
could take over. This type of High-Availability would probably be very
necessary for non-batch-type-applications.

Thanks,
dhruba

-Original Message-
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 28, 2008 6:06 PM
To: core-user@hadoop.apache.org
Subject: RE: long write operations and data recovery

We have had a lot of peace of mind by building a data pipeline that does
not assume that hdfs is always up and running. If the application is
primarily non real-time log processing - I would suggest
batch/incremental copies of data to hdfs that can catch up automatically
in case of failures/downtimes.

We have an rsync-like map-reduce job that monitors log directories and
keeps pulling new data in (and I suspect a lot of other users do similar
stuff as well). It might be a useful notion to generalize and put in
contrib.
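
For anyone rolling their own, the catch-up idea can start as small as the
sketch below: scan a local spool directory and copy whatever HDFS does not
have yet. The directory names are placeholders, files are assumed to be
rotated/closed before being copied, and the job described above is a
map-reduce job rather than a single-process loop like this:

    // Single-process sketch of an incremental "catch up" copy into HDFS.
    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CatchUpCopy {
        public static void main(String[] args) throws Exception {
            File localSpool = new File("/var/spool/applogs");   // hypothetical local log dir
            Path hdfsDir    = new Path("/logs/incoming");       // hypothetical HDFS target dir

            FileSystem fs = FileSystem.get(new Configuration());
            fs.mkdirs(hdfsDir);

            File[] files = localSpool.listFiles();
            if (files == null) return;                          // spool dir missing or unreadable
            for (File f : files) {
                Path dst = new Path(hdfsDir, f.getName());
                if (!fs.exists(dst)) {                          // only copy what HDFS is missing
                    fs.copyFromLocalFile(new Path(f.getAbsolutePath()), dst);
                }
            }
        }
    }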


-Original Message-
From: Steve Sapovits [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 28, 2008 4:54 PM
To: core-user@hadoop.apache.org
Subject: Re: long write operations and data recovery


> How does replication affect this?  If there's at least one replicated
>  client still running, I assume that takes care of it?

Never mind -- I get this now after reading the docs again.

My remaining point-of-failure question concerns name nodes.  The docs say
manual intervention is still required if a name node goes down.  How is this
typically managed in production environments?  It would seem even a short
name node outage in a data-intensive environment would lead to data loss
(no name node to give the data to).

-- 
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]



RE: long write operations and data recovery

2008-02-26 Thread dhruba Borthakur
The Namenode maintains a lease for every open file that is being written
to. If the client that was writing to the file disappears, the Namenode
will do "lease recovery" after expiry of the lease timeout (1 hour). The
lease recovery process (in most cases) will remove the last block from
the file (it was not fully written because the client crashed before it
could fill up the block) and close the file.

Thanks,
dhruba
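
The practical consequence for a long-running writer is to keep each open file
short-lived, since a crash exposes at most the data written since the last
close. A rough sketch (the hourly roll interval, path naming and record
format are all made up):

    // Writer that rolls its HDFS output periodically so that a crash loses at
    // most one roll interval of data to lease recovery. Sketch only.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingHdfsWriter {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            long rollMillis = 60L * 60 * 1000;                  // assumed: roll hourly

            while (true) {
                Path p = new Path("/logs/events." + System.currentTimeMillis());
                FSDataOutputStream out = fs.create(p);
                long rollAt = System.currentTimeMillis() + rollMillis;
                while (System.currentTimeMillis() < rollAt) {
                    out.writeBytes("one log record\n");         // stand-in for real records
                    Thread.sleep(100);
                }
                out.close();                                    // data is only safe once closed
            }
        }
    }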

-Original Message-
From: Steve Sapovits [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 25, 2008 5:13 PM
To: core-user@hadoop.apache.org
Subject: long write operations and data recovery


If I have a write operation that takes a while between opening and closing
the file, what is the effect of a node doing that writing crashing in the
middle?  For example, suppose I have large logs that I write to continually,
rolling them every N minutes (say every hour for the sake of discussion).
If I have the file opened and am 90% done with my writes and things crash,
what happens to the data I've "written" ... realizing that at some level,
data isn't visible to the rest of the cluster until the file is closed.

-- 
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]



RE: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread dhruba Borthakur
If your file system metadata is in /tmp, then you are likely to see
these kinds of problems. It would be nice if you could move the location
of your metadata files away from /tmp. If you still see the problem, can
you please send us the logs from the log directory?

Thanks a bunch,
Dhruba
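
For reference, pointing the metadata (and block data) somewhere persistent is
a hadoop-site.xml change along these lines - the directory paths below are
just placeholders:

    <property>
      <name>dfs.name.dir</name>
      <value>/var/hadoop/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/var/hadoop/dfs/data</value>
    </property>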


-Original Message-
From: Steve Sapovits [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 22, 2008 2:34 PM
To: core-user@hadoop.apache.org
Subject: Re: Namenode fails to re-start after cluster shutdown

Raghu Angadi wrote:

> Please report such problems if you think it was because of HDFS, as 
> opposed to some hardware or disk failures.

Will do.  I suspect it's something else.  I'm testing on a notebook in
pseudo-distributed mode (per the quick start guide).  My IP changes when I
take that box between home and work, so that could be it -- even though I'm
running everything on localhost, I've seen other issues if my hostname can't
get properly resolved.  Also, with everything in /tmp by default, shutdowns
of that box may be removing files.

-- 
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]



RE: Namenode fails to re-start after cluster shutdown

2008-02-22 Thread dhruba Borthakur
Reformatting should never be necessary if you are using a released version
of Hadoop. HADOOP-2783 refers to a bug that got introduced into trunk
(not in any released versions).

Thanks,
Dhruba


-Original Message-
From: Steve Sapovits [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 22, 2008 1:43 PM
To: core-user@hadoop.apache.org
Subject: Re: Namenode fails to re-start after cluster shutdown


What are the situations that make reformatting necessary?  In testing, we
seem to hit a lot of cases where we have to reformat.  We're wondering how
much of a real production issue this is.

-- 
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]


RE: Python access to HDFS

2008-02-21 Thread dhruba Borthakur
Hi Pete,

If you are referring to the ability to re-open a file and append to it,
then this feature is not in 0.16. Please see:
http://issues.apache.org/jira/browse/HADOOP-1700

Thanks,
dhruba

-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 21, 2008 4:09 PM
To: core-user@hadoop.apache.org
Subject: Re: Python access to HDFS


We're profiling and tuning read performance for fuse dfs and have writes
implemented, but I haven't been able to test them since I haven't tried
0.16 yet - it requires the ability to create the file, close it and then
re-open it to start writing - which can't be done till 0.16.


--pete



On 2/21/08 3:50 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:

> Jeff Hammerbacher wrote:
> 
>> maybe the dfs could expose a thrift interface in future releases?
> 
> ThruDB exposes Lucene via Thrift but not the underlying HDFS.   I just
> need HDFS access in Python for now.
> 
>> you could also use the FUSE module to mount the dfs and just write to it
>> like any other filesystem...
> 
> Good point.  I'll check that avenue.  Would FUSE add much overhead for
> writing lots of data?   I see a Python binding for it.



RE: Namenode fails to replicate file

2008-02-07 Thread dhruba Borthakur
You have to use the -w parameter to the setrep command to make it wait
till the replication is complete. The following command

bin/hadoop dfs -setrep -w 10 filename

will block till all blocks of the file achieve a replication factor of
10.

Thanks,
dhruba
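
If the increase needs to be driven from code rather than the shell,
FileSystem.setReplication does the same thing; the wait loop below is a
sketch that polls block locations (and assumes a release that provides
FileSystem.getFileBlockLocations), not a built-in facility:

    // Raise the replication factor of one file and roughly wait for it to take
    // effect by polling how many datanodes report each block. Sketch only.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaiseReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path(args[0]);
            short target = 10;

            fs.setReplication(p, target);       // ask the Namenode for more replicas

            boolean done = false;
            while (!done) {
                FileStatus st = fs.getFileStatus(p);
                BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
                done = true;
                for (BlockLocation b : blocks) {
                    if (b.getHosts().length < target) { done = false; break; }
                }
                if (!done) Thread.sleep(10 * 1000);
            }
        }
    }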

-Original Message-
From: Tim Wintle [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 07, 2008 11:05 PM
To: core-user@hadoop.apache.org
Subject: Re: Namenode fails to replicate file

Doesn't the -setrep command force the replication to be increased
immediately?

./hadoop dfs -setrep [replication] path

(I may have misunderstood)


On Thu, 2008-02-07 at 17:05 -0800, Ted Dunning wrote:
> 
> Chris Kline reported a problem in early January where a file which had too
> few replicated blocks did not get replicated until a DFS restart.
> 
> I just saw a similar issue.  I had a file that had a block with 1 replica
> (2 required) that did not get replicated.  I changed the number of required
> replicates, but nothing caused any action.  Changing the number of required
> replicas on other files got them to be replicated.
> 
> I eventually copied the file to temp, deleted the original and moved the
> copy back to the original place.  I was also able to read the entire file
> which shows that the problem was not due to slow reporting from a down
> datanode.
> 
> This happened just after I had a node failure which was why I was messing
> with replication at all.  Since I was in the process of increasing the
> replication on nearly 10,000 large files, my log files are full of other
> stuff, but I am pretty sure that there is a bug here.
> 
> This was on a relatively small cluster with 13 data nodes.
> 
> It also brings up a related issue that has come up before in that there are
> times when you may want to increase the number of replicas of a file right
> NOW.  I don't know of any way to force this replication.  Is there such a
> way?
> 
> 
> 



RE: question aboutc++ libhdfs and a bug comment on the libdfs test case

2008-01-22 Thread dhruba Borthakur
Hi Jason,

Good catch. It would be great if you can create a JIRA issue and submit
your code change as a patch for this problem.

There are some big sites (about 1000 node clusters) that use libhdfs to
access HDFS.

Thanks,
Dhruba

-Original Message-
From: Jason Venner [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 2:38 PM
To: [EMAIL PROTECTED]
Subject: question aboutc++ libhdfs and a bug comment on the libdfs test
case

We were wondering how stable it is for use. We have some 10 GB data
sets that need a little translation before our Java-based Hadoop jobs
can run on them, and the translation library is in C++. We are thinking
of using libhdfs to do the translation while copying the data into HDFS.

Any gotchas with libhdfs that we should know about?

While looking at the HEAD version of

  /lucene/hadoop/trunk/src/c++/libhdfs/hdfs_test.c,

I noticed a small typo:


int main(int argc, char **argv) {

    hdfsFS fs = hdfsConnect("default", 0);
    if(!fs) {
        fprintf(stderr, "Oops! Failed to connect to hdfs!\n");
        exit(-1);
    }

    hdfsFS lfs = hdfsConnect(NULL, 0);
    if(!fs) {
        fprintf(stderr, "Oops! Failed to connect to 'local' hdfs!\n");
        exit(-1);
    }

I believe that test should be if(!lfs) {