You can use any of these:
1. bin/hadoop dfs -get <hdfs file> <remote filename>
2. Thrift API : http://wiki.apache.org/hadoop/HDFS-APIs
3. use fuse-mount to mount hdfs as a regular file system on the remote machine:
http://wiki.apache.org/hadoop/MountableHDFS
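If you want to do option 1 programmatically, here is a minimal sketch using the FileSystem API (class name and paths are placeholders of mine):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up hadoop-site.xml
        FileSystem fs = FileSystem.get(conf);       // connect to the configured HDFS
        // Copy an HDFS file to the local file system of this machine
        fs.copyToLocalFile(new Path("/user/foo/hdfsfile"),
                           new Path("/tmp/localcopy"));
        fs.close();
    }
}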
thanks,
dhruba
On Mon, Apr 20, 2009 at
I believe that file modification times are updated only when the file is
closed. Are you appending to a preexisting file?
thanks,
dhruba
On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan dsand...@hcl.in wrote:
Hi,
I have an application which creates a simple text file on hdfs. There is a
Hi Dennis,
There were some discussions on this topic earlier:
http://issues.apache.org/jira/browse/HADOOP-3799
Do you have any specific use-case for this feature?
thanks,
dhruba
On Mon, Nov 24, 2008 at 10:22 PM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Nov 24, 2008, at 8:44 PM, Mahadev
The design is such that running multiple secondary namenodes should not
corrupt the image (modulo any bugs). Are you seeing image corruptions when
this happens?
You can run all or any daemons in 32-bit mode or 64-bit mode. You can
mix-and-match. If you have many millions of files, then you might
One can open a file and then seek to an offset and then start reading
from there. For writing, one can write only to the end of an existing
file using FileSystem.append().
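A minimal sketch of both operations (paths, offsets, and buffer sizes are placeholders; append() assumes a build where it is enabled):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekAndAppend {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/foo/data.txt");

        // Random read: open, seek to an offset, read from there
        FSDataInputStream in = fs.open(file);
        in.seek(1024);                         // jump to byte offset 1024
        byte[] buf = new byte[512];
        int n = in.read(buf, 0, buf.length);   // bytes actually read, or -1 at EOF
        in.close();

        // Writes can go only to the end of an existing file
        FSDataOutputStream out = fs.append(file);
        out.write("more data\n".getBytes());
        out.close();
    }
}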
hope this helps,
dhruba
On Thu, Nov 13, 2008 at 1:24 PM, Tsz Wo (Nicholas), Sze
[EMAIL PROTECTED] wrote:
Append is going to
Couple of things that one can do:
1. dfs.name.dir should have at least two locations, one on the local
disk and one on NFS (see the sketch after this list). This means that
all transactions are synchronously logged into two places.
2. Create a virtual IP, say name.xx.com that points to the real
machine name of the machine on
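For item 1, a hadoop-site.xml entry might look like this (a minimal sketch; the two paths are placeholders):

<property>
  <name>dfs.name.dir</name>
  <value>/local/disk/dfs/name,/mnt/nfs/dfs/name</value>
  <!-- comma-separated list: the Namenode logs every transaction to both -->
</property>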
services, it's quite a common case.
I think HBase developers would have run into similar issues as well.
Is this enough explanation?
Thanks in advance,
Taeho
On Tue, Nov 4, 2008 at 3:12 AM, Dhruba Borthakur [EMAIL PROTECTED] wrote:
In the current code, details about block locations
It can return 0 if and only if the requested size was zero. For EOF,
it should return -1.
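In code, that contract looks like this (a self-contained sketch; the class and method names are mine):

import java.io.IOException;
import java.io.InputStream;

public class ReadUntilEof {
    // Drains a stream, relying on the contract above:
    // read() returns -1 at EOF, and 0 only if the requested length was 0.
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            total += n;   // n > 0 here because buf.length > 0
        }
        return total;
    }
}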
dhruba
On Fri, Nov 7, 2008 at 8:09 PM, Pete Wyckoff [EMAIL PROTECTED] wrote:
Just want to ensure 0 iff EOF or the requested #of bytes was 0.
On 11/7/08 6:13 PM, Pete Wyckoff [EMAIL PROTECTED] wrote:
Hi Ben,
And, if I may add, if you would like to contribute the code to make this
happen, that will be awesome! In that case, we can move this discussion to a
JIRA.
Thanks,
dhruba
On 10/27/08 1:41 PM, Ashish Thusoo [EMAIL PROTECTED] wrote:
We did have some discussions around it a while back
My opinion is to not store file-namespace related metadata on the
datanodes. When a file is renamed, one has to contact all datanodes to
change this new metadata. Worse still, if one renames an entire
subdirectory, all blocks that belong to all files in the subdirectory
have to be updated.
In almost all hadoop configurations, all host names can be specified
as IP addresses. So, in your hadoop-site.xml, please specify the IP
address of the namenode (instead of its hostname).
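For example, assuming the usual fs.default.name parameter of this era (the address and port below are placeholders):

<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.10:9000</value>
</property>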
-dhruba
2008/8/8 Lucas Nazário dos Santos [EMAIL PROTECTED]:
Thanks Andreas. I'll try it.
On Fri, Aug 8,
It is possible that your namenode is overloaded and is not able to
respond to RPC requests from clients. Please check the namenode logs
to see if you see lines of the form "discarding calls"
dhruba
On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
[EMAIL PROTECTED] wrote:
I come across the
wrote:
Dhruba Borthakur wrote:
A good way to implement failover is to make the Namenode log transactions
to more than one directory, typically a local directory and an NFS-mounted
directory. The Namenode writes transactions to both directories
synchronously.
If the Namenode machine dies
One option for you is to use a pdf-to-text converter (many of them are
available online) and then run map-reduce on the txt file.
-dhruba
On Wed, Jul 23, 2008 at 1:07 AM, GaneshG
[EMAIL PROTECTED] wrote:
Thanks Lohit, I am using only the default reader and I am very new to hadoop.
This is my map
HDFS uses the network topology to distribute and replicate data. An
admin has to configure a script that describes the network topology to
HDFS. This is specified by using the parameter
topology.script.file.name in the Configuration file. This has been
tested when nodes are on different subnets in
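For example, a minimal hadoop-site.xml sketch (the script path is a placeholder; the script must map each IP or hostname to a rack path such as /rack1):

<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>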
This "firstbadlink" was a misleading log message in the code. It
is innocuous and has since been fixed in the 0.17 release.
http://issues.apache.org/jira/browse/HADOOP-3029
thanks,
dhruba
On Sat, May 24, 2008 at 7:03 PM, C G [EMAIL PROTECTED] wrote:
Hi All:
So far, running 0.16.4 has been a
There isn't a way to change the block size of an existing file. The
block size of a file can be specified only at the time of file
creation and cannot be changed later.
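The per-file block size is passed at create time. A minimal sketch (the path, replication factor, and 64MB value are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        long blockSize = 64L * 1024 * 1024;   // fixed for the file's lifetime
        FSDataOutputStream out = fs.create(
                new Path("/user/foo/newfile"),
                true,        // overwrite if it exists
                4096,        // io buffer size
                (short) 3,   // replication factor
                blockSize);
        out.write("hello\n".getBytes());
        out.close();
    }
}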
There isn't any wasted space in your system. If the block size is
128MB but you create an HDFS file of, say, 10MB, then that
Did one datanode fail or did the namenode fail? By fail do you mean
that the system was rebooted or was there a bad disk that caused the
problem?
thanks,
dhruba
On Sun, May 11, 2008 at 7:23 PM, C G [EMAIL PROTECTED] wrote:
Hi All:
We had a primary node failure over the weekend. When we
You bring up an interesting point. A big chunk of the code in the
Namenode is being done inside a global lock although there are pieces
(e.g. a portion of code that chooses datanodes for a newly allocated
block) that do execute outside this lock. But, it is probably the case
that the namenode does
: 3.0
The filesystem under path '/' is CORRUPT
So it seems like it's fixing some problems on its own?
Thanks,
C G
Dhruba Borthakur [EMAIL PROTECTED] wrote:
Did one datanode fail or did the namenode fail? By fail do you mean
that the system was rebooted or was there a bad disk
Starting in 0.17 release, an application can invoke
DFSOutputStream.fsync() to persist block locations for a file even
before the file is closed.
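A sketch of how an application might reach that method (the cast through getWrappedStream() is my assumption about the 0.17 API, not something confirmed in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.DFSClient;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsyncSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/user/foo/log"));  // placeholder path
        out.write("record\n".getBytes());
        // Persist block locations before the file is closed (0.17+).
        // Assumption: the wrapped stream is the DFSOutputStream mentioned above.
        ((DFSClient.DFSOutputStream) out.getWrappedStream()).fsync();
        out.close();
    }
}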
thanks,
dhruba
On Tue, May 6, 2008 at 8:11 AM, Cagdas Gerede [EMAIL PROTECTED] wrote:
If you are writing 10 blocks for a file and let's say in 10th
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 29, 2008 11:32 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Block reports: memory vs. file system, and Dividing
offerService into 2 threads
Currently,
dhruba Borthakur wrote:
My current thinking is that block report processing should compare
the blkxxx files on disk with the data structure in the Datanode
memory. If and only if there is some discrepancy between these two
of Datanodes is large.
-dhruba
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 24, 2008 11:56 AM
To: dhruba Borthakur
Cc: core-user@hadoop.apache.org
Subject: Re: Please Help: Namenode Safemode
Hi Dhruba,
Thanks for your answer. But I
As far as I know, you need cygwin to install and run hadoop. The fact
that you are using cygwin to run hadoop has almost negligible impact on
the performance and efficiency of the hadoop cluster. Cygwin is mostly
needed for the install and configuration scripts. There are a few small
portions of
The DFSClient caches small packets (e.g. 64K write buffers) and they are
lazily flushed to the datanodes in the pipeline. So, when an application
completes an out.write() call, it is definitely not guaranteed that data
has been sent to even one datanode.
One option would be to retrieve cache hints
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a sleep() every few seconds while unzipping.
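A minimal sketch of that throttling idea (the chunk threshold and sleep interval are arbitrary values of mine):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class ThrottledGunzip {
    public static void main(String[] args) throws IOException, InterruptedException {
        GZIPInputStream in = new GZIPInputStream(new FileInputStream(args[0]));
        byte[] buf = new byte[64 * 1024];
        long sinceLastPause = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // ... hand the decompressed bytes to the HDFS writer here ...
            sinceLastPause += n;
            if (sinceLastPause > 64L * 1024 * 1024) {  // every ~64MB of output
                Thread.sleep(1000);   // yield CPU so the lease-renewal thread can run
                sinceLastPause = 0;
            }
        }
        in.close();
    }
}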
Thanks,
dhruba
mean,
what is the configuration parameter dfs.secondary.http.address for? Unless
there are plans to make this interface work, this config parameter should
go away, and so should the listening thread, shouldn't they?
Thanks,
-Yuri
On Friday 04 April 2008 03:30:46 pm dhruba Borthakur wrote:
Your configuration is good. The secondary Namenode does not publish a
web interface. The null pointer message in the secondary Namenode log
is a harmless bug but should be fixed. It would be nice if you can open
a JIRA for it.
Thanks,
Dhruba
-Original Message-
From: Yuri Pradkin
The namenode lazily instructs a Datanode to delete blocks. As a response to
every heartbeat from a Datanode, the Namenode instructs it to delete a maximum
of 100 blocks. Typically, the heartbeat periodicity is 3 seconds, so a Datanode
deletes at most about 33 blocks per second. The heartbeat
thread in the Datanode deletes the block files synchronously
is my other question.
If two different clients ordered a move to trash with different intervals
(e.g. client #1 with fs.trash.interval = 60; client #2 with
fs.trash.interval = 120),
what would happen?
Does namenode keep track of all these info?
/Taeho
On 3/20/08, dhruba Borthakur [EMAIL PROTECTED
HDFS files, once created, cannot be modified in any way. Appends to HDFS
files will probably be supported in a future release in the next couple
of months.
Thanks,
dhruba
-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 18, 2008 9:53 AM
To:
Your procedure is right:
1. Copy edit.tmp from secondary to edit on primary
2. Copy srcimage from secondary to fsimage on primary
3. remove edits.new on primary
4. restart cluster, put in Safemode, fsck /
However, the above steps are not foolproof because the transactions that
occurred between
HDFS can be accessed using the FileSystem API:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html
The HDFS Namenode protocol can be found in:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNode.html
thanks,
dhruba
-Original
I agree with Joydeep. For batch processing, it is sufficient to make the
application not assume that HDFS is always up and active. However, for
real-time applications that are not batch-centric, it might not be
sufficient. There are a few things that HDFS could do to better handle
Namenode
The Namenode maintains a lease for every open file that is being written
to. If the client that was writing to the file disappears, the Namenode
will do lease recovery after expiry of the lease timeout (1 hour). The
lease recovery process (in most cases) will remove the last block from
the file
If your file system metadata is in /tmp, then you are likely to see
these kinds of problems. It would be nice if you can move the location
of your metadata files away from /tmp. If you still see the problem, can
you please send us the logs from the log directory?
Thanks a bunch,
Dhruba