You can use any of these:
1. bin/hadoop dfs -get hdfsfile (or the equivalent FileSystem API call; see the sketch after this list)
2. Thrift API : http://wiki.apache.org/hadoop/HDFS-APIs
3. use fuse-mount to mount HDFS as a regular file system on the remote machine:
http://wiki.apache.org/hadoop/MountableHDFS
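For option 1's programmatic equivalent, here is a minimal Java sketch
using the FileSystem API; the paths below are made-up placeholders:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsGet {
    public static void main(String[] args) throws Exception {
      // Picks up the cluster settings from the usual config files.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // Copy an HDFS file to the local filesystem, like "dfs -get".
      fs.copyToLocalFile(new Path("/user/foo/hdfsfile"),
                         new Path("/tmp/hdfsfile"));
      fs.close();
    }
  }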
thanks,
dhruba
On Mon, Apr 20, 2009 at 9:40 PM, Parul K
, 2008 at 11:35 PM, Sandeep Dhawan wrote:
>
> Hi Dhruba,
>
> The file is being closed properly but the timestamp does not get modified.
> The modification timestamp
> still shows the file creation time.
> I am creating a new file and writing data into this file.
>
> Tha
I believe that file modification times are updated only when the file is
closed. Are you "appending" to a preexisting file?
thanks,
dhruba
On Tue, Dec 30, 2008 at 3:14 AM, Sandeep Dhawan wrote:
>
> Hi,
>
> I have a application which creates a simple text file on hdfs. There is a
> second appli
The design is such that running multiple secondary namenodes should not
corrupt the image (modulo any bugs). Are you seeing image corruptions when
this happens?
You can run all or any daemons in 32-bit mode or 64-bit mode. You can
mix-and-match. If you have many millions of files, then you might w
Hi Dennis,
There were some discussions on this topic earlier:
http://issues.apache.org/jira/browse/HADOOP-3799
Do you have any specific use-case for this feature?
thanks,
dhruba
On Mon, Nov 24, 2008 at 10:22 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Nov 24, 2008, at 8:44 PM, Mahadev
One can open a file, seek to an offset, and start reading from there.
For writing, one can write only to the end of an existing file using
FileSystem.append().
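A minimal sketch of both operations (the path and offset are made up,
and append() assumes a release where HADOOP-1700 is available):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SeekAndAppend {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path p = new Path("/user/foo/data.txt"); // made-up path

      // Positioned read: seek to a byte offset, then read from there.
      FSDataInputStream in = fs.open(p);
      in.seek(1024);
      byte[] buf = new byte[512];
      int n = in.read(buf, 0, buf.length); // returns -1 at EOF
      in.close();

      // Writes can go only to the end of an existing file.
      FSDataOutputStream out = fs.append(p);
      out.write("more data\n".getBytes());
      out.close();
    }
  }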
hope this helps,
dhruba
On Thu, Nov 13, 2008 at 1:24 PM, Tsz Wo (Nicholas), Sze
<[EMAIL PROTECTED]> wrote:
> Append is going
Couple of things that one can do:
1. dfs.name.dir should have at least two locations, one on the local
disk and one on NFS. This means that all transactions are
synchronously logged into two places.
2. Create a virtual IP, say name.xx.com that points to the real
machine name of the machine on whi
It can return 0 if and only if the requested size was zero. For EOF,
it should return -1.
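A small sketch of a read loop that relies on this contract (plain
java.io types, for illustration only):

  import java.io.IOException;
  import java.io.InputStream;

  public class ReadLoop {
    // Drains a stream: read() returns -1 at EOF and can return 0
    // only when 0 bytes were requested.
    static long drain(InputStream in) throws IOException {
      byte[] buf = new byte[4096];
      long total = 0;
      int n;
      while ((n = in.read(buf, 0, buf.length)) != -1) {
        total += n; // n > 0 here because buf.length > 0
      }
      return total;
    }
  }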
dhruba
On Fri, Nov 7, 2008 at 8:09 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> Just want to ensure 0 iff EOF or the requested #of bytes was 0.
>
> On 11/7/08 6:13 PM, "Pete Wyckoff" <[EMAIL PROTECTED]> wro
, this is a rare case, but for online
> services, it's quite a common case.
> I think HBase developers would have run into similar issues as well.
>
> Is this enough explanation?
>
> Thanks in advance,
>
> Taeho
>
>
>
> On Tue, Nov 4, 2008 at 3:12 AM, Dhruba
In the current code, details about block locations of a file are
cached on the client when the file is opened. This cache remains with
the client until the file is closed. If the same file is re-opened by
the same DFSClient, it re-contacts the namenode and refetches the
block locations. This works
Hi Ben,
And, if I may add, if you would like to contribute the code to make this
happen, that will be awesome! In that case, we can move this discussion to a
JIRA.
Thanks,
dhruba
On 10/27/08 1:41 PM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:
We did have some discussions around it a while ba
My opinion is to not store file-namespace related metadata on the
datanodes. When a file is renamed, one has to contact all datanodes to
change this new metadata. Worse still, if one renames an entire
subdirectory, all blocks that belong to all files in the subdirectory
have to be updated. Similar
The DFS errors might have been caused by
http://issues.apache.org/jira/browse/HADOOP-4040
thanks,
dhruba
On Sat, Sep 6, 2008 at 6:59 AM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> These exceptions are apparently coming from the dfs side of things. Could
> someone from the dfs side please look at t
In almost all hadoop configurations, all host names can be specified
as IP address. So, in your hadoop-site.xml, please specify the IP
address of the namenode (instead of its hostname).
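For example, a hadoop-site.xml entry along these lines (the IP address
and port here are made up):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.10:9000</value>
  </property>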
-dhruba
2008/8/8 Lucas Nazário dos Santos <[EMAIL PROTECTED]>:
> Thanks Andreas. I'll try it.
>
>
> On Fri, Aug
When the first one contacts the namenode to open the file for writing,
the namenode records this info in a "lease". When the second process
contacts the namenode to open the same file for writing, the namenode
sees that a "lease" already exists for the file and rejects the
request from the second
It is possible that your namenode is overloaded and is not able to
respond to RPC requests from clients. Please check the namenode logs
to see if you see lines of the form "discarding calls...".
dhruba
On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
<[EMAIL PROTECTED]> wrote:
> I come across the
one and when with secondary namenode process ?
>
>
> Andrzej Bialecki wrote:
>>
>> Dhruba Borthakur wrote:
>>> A good way to implement failover is to make the Namenode log transactions
>>> to
>>> more than one directory, typically a local directory and an NFS m
One option for you is to use a pdf-to-text converter (many of them are
available online) and then run map-reduce on the txt file.
-dhruba
On Wed, Jul 23, 2008 at 1:07 AM, GaneshG
<[EMAIL PROTECTED]> wrote:
>
> Thanks Lohit, i am using only the default reader and i am very new to hadoop.
> This is my
You are running out of file handles on the namenode. When this
happens, the namenode cannot receive heartbeats from datanodes because
these heartbeats arrive on a tcp/ip socket connection and the namenode
does not have any free file descriptors to accept these socket
connections. Your data is stil
HDFS uses the network topology to distribute and replicate data. An
admin has to configure a script that describes the network topology to
HDFS. This is specified by using the parameter
"topology.script.file.name" in the Configuration file. This has been
tested when nodes are on different subnets i
The maximum number of files in HDFS depends on the amount of memory
available for the namenode. Each file object and each block object
takes about 150 bytes of the memory. Thus, if you have 1 million files
and each file has one block, then you would need about 300MB of memory
for the namenode (2 million objects x 150 bytes is roughly 300MB).
e way to force the rebalancing operation
>
> thanks,
> -prasana
>
> Dhruba Borthakur wrote:
>>
>> What that means is that the new nodes will be relatively empty
>> till new data arrives into the cluster. It might take a while for the new
>> nodes to get fil
This "firstbadlink" was an mis-configured log message in the code. It
is innocuous and has since been fixed in 0.17 release.
http://issues.apache.org/jira/browse/HADOOP-3029
thanks,
dhruba
On Sat, May 24, 2008 at 7:03 PM, C G <[EMAIL PROTECTED]> wrote:
> Hi All:
>
> So far, running 0.16.4 has be
If you look at the log message starting with "STARTUP_MSG: build
=..." you will see that the namenode and the good datanode were built by
CG whereas the bad datanodes were compiled by hadoopqa!
thanks,
dhruba
On Fri, May 23, 2008 at 9:01 AM, C G <[EMAIL PROTECTED]> wrote:
> 2008-05-23 11:53:25,377 I
There isn't a way to change the block size of an existing file. The
block size of a file can be specified only at the time of file
creation and cannot be changed later.
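A minimal sketch of fixing the block size at creation time, assuming the
FileSystem.create() overload that takes a block size (the path and the
values are made up):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // The block size is fixed here, at creation time, for good.
      FSDataOutputStream out = fs.create(
          new Path("/user/foo/newfile"), // made-up path
          true,                // overwrite
          4096,                // io buffer size
          (short) 3,           // replication
          64L * 1024 * 1024);  // block size in bytes (64MB example)
      out.write("hello".getBytes());
      out.close();
    }
  }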
There isn't any wasted space in your system. If the block size is
128MB but you create a HDFS file of say size 10MB, then that fi
What version of java are you using? How many threads are you running on
the namenode? How many cores do your machines have?
thanks,
dhruba
On Fri, May 16, 2008 at 6:02 AM, André Martin <[EMAIL PROTECTED]> wrote:
> Hi Hadoopers,
> we are experiencing a lot of "Could not obtain block / Could not g
0 (0.0 %)
> Target replication factor: 3
> Real replication factor: 3.0
>
>
> The filesystem under path '/' is CORRUPT
>
> So it seems like it's fixing some problems on its own?
>
> Thanks,
> C G
>
>
> Dhruba Borthakur <[EMA
You bring up an interesting point. A big chunk of the code in the
Namenode executes inside a global lock, although there are pieces
(e.g. a portion of the code that chooses datanodes for a newly allocated
block) that execute outside this lock. But, it is probably the case
that the namenode does
Did one datanode fail or did the namenode fail? By "fail" do you mean
that the system was rebooted or was there a bad disk that caused the
problem?
thanks,
dhruba
On Sun, May 11, 2008 at 7:23 PM, C G <[EMAIL PROTECTED]> wrote:
> Hi All:
>
> We had a primary node failure over the weekend. When
Starting in the 0.17 release, an application can invoke
DFSOutputStream.fsync() to persist block locations for a file even
before the file is closed.
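A rough sketch from the application side; the exact method name
(fsync/sync) has varied across releases, so treat this as illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PersistEarly {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataOutputStream out = fs.create(new Path("/user/foo/log")); // made-up
      out.write("record 1\n".getBytes());
      // Persist what has been written so far, before close().
      out.sync();
      out.write("record 2\n".getBytes());
      out.close();
    }
  }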
thanks,
dhruba
On Tue, May 6, 2008 at 8:11 AM, Cagdas Gerede <[EMAIL PROTECTED]> wrote:
> If you are writing 10 blocks for a file and let's say in 10t
ock reports: memory vs. file system, and Dividing offerService into 2 threads
dhruba Borthakur wrote:
> My current thinking is that "block report processing" should compare the
> blkxxx files on disk with the data structure in the Datanode memory. If
> and only if there is some dis
thanks,
dhruba
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 29, 2008 11:32 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Block reports: memory vs. file system, and Dividing
offerService into 2 threads
Currently,
Blo
of Datanodes is large.
-dhruba
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 24, 2008 11:56 AM
To: dhruba Borthakur
Cc: core-user@hadoop.apache.org
Subject: Re: Please Help: Namenode Safemode
Hi Dhruba,
Thanks for your answer. But I
: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 23, 2008 4:37 PM
To: core-user@hadoop.apache.org
Cc: dhruba Borthakur
Subject: Please Help: Namenode Safemode
I have a hadoop distributed file system with 3 datanodes. I only have
150 blocks in each datanode. It takes a little more
As far as I know, you need cygwin to install and run hadoop. The fact
that you are using cygwin to run hadoop has almost negligible impact on
the performance and efficiency of the hadoop cluster. Cygwin is mostly
needed for the install and configuration scripts. There are a few small
portions of cl
You should be able to run "bin/hadoop fsck -files -blocks -locations /"
and get a listing of all files and the datanode(s) that each block of
the file resides in.
Thanks,
dhruba
-Original Message-
From: Shimi K [mailto:[EMAIL PROTECTED]
Sent: Monday, April 21, 2008 2:12 AM
To: core-user@
The DFSClient caches small packets (e.g. 64K write buffers) and they are
lazily flushed to the datanodes in the pipeline. So, when an application
completes an out.write() call, it is definitely not guaranteed that data
has been sent to even one datanode.
One option would be to retrieve cache hints fr
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a sleep() every few seconds while unzipping.
Thanks,
dhruba
-
Yes, just point the Datanodes to different config files, different sets
of ports, different data directories, etc.
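For instance, the second Datanode's config file might override entries
like these (the key names and values are illustrative and can vary by
release):

  <property>
    <name>dfs.data.dir</name>
    <value>/disk2/dfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
  </property>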
Thanks,
dhruba
-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 15, 2008 11:21 AM
To: core-user@hadoop.apache.org
Subject: multiple
hing useful? I mean, what is the configuration parameter
dfs.secondary.http.address for? Unless there are plans to make this
interface work, this config parameter should go away, and so should the
listening thread, shouldn't they?
Thanks,
-Yuri
On Friday 04 April 2008 03:30:46 pm dh
Your configuration is good. The secondary Namenode does not publish a
web interface. The "null pointer" message in the secondary Namenode log
is a harmless bug but should be fixed. It would be nice if you can open
a JIRA for it.
Thanks,
Dhruba
-Original Message-
From: Yuri Pradkin [mailt
HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
for more details.
Thanks,
dhruba
-Original Message-
From: Raghavendra K [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 26, 2008 11:29 PM
To: core-user@hadoop.apache.org
Subject: Append data in hdfs_write
Hi,
I
The namenode lazily instructs a Datanode to delete blocks. As a response to
every heartbeat from a Datanode, the Namenode instructs it to delete a maximum
of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat
thread in the Datanode deletes the block files synchronously
There is a C-language based API to access HDFS. You can find more
details at:
http://wiki.apache.org/hadoop/LibHDFS
If you download the Hadoop source code from
http://hadoop.apache.org/core/releases.html, you will see this API in
src/c++/libhdfs/hdfs.c
hope this helps,
dhruba
-Original Mess
my another question.
If two different clients ordered "move to trash" with different interval,
(e.g. client #1 with fs.trash.interval = 60; client #2 with
fs.trash.interval = 120)
what would happen?
Does namenode keep track of all these info?
/Taeho
On 3/20/08, dhruba Borthakur <[EM
The "trash" feature is a client side option and depends on the client
configuration file. If the client's configuration specifies that "Trash"
is enabled, then the HDFS client invokes a "rename to Trash" instead of
a "delete". Now, if "Trash" is enabled on the Namenode, then the
Namenode periodical
HDFS files, once created, cannot be modified in any way. Appends to HDFS
files will probably be supported in a future release in the next couple
of months.
Thanks,
dhruba
-Original Message-
From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 18, 2008 9:53 AM
To: core-user@
Your procedure is right:
1. Copy edit.tmp from secondary to edit on primary
2. Copy srcimage from secondary to fsimage on primary
3. remove edits.new on primary
4. restart cluster, put in Safemode, fsck /
However, the above steps are not foolproof because the transactions that
occurred between th
HDFS can be accessed using the FileSystem API:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html
The HDFS Namenode protocol can be found in:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNode.html
thanks,
dhruba
-Original Messag
The following issues might be impacting you (from release notes)
http://issues.apache.org/jira/browse/HADOOP-2185
HADOOP-2185. RPC Server uses any available port if the specified
port is zero. Otherwise it uses the specified port. Also combines
the configuration attributes for the se
Hi Andre,
Is it possible for you to let me look at your entire Namenode log?
Thanks,
dhruba
-Original Message-
From: André Martin [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 01, 2008 4:32 PM
To: core-user@hadoop.apache.org
Subject: org.apache.hadoop.dfs.NameNode: java.lang.NullPoint
It would be nice if a layer on top of the dfs client could be built to
handle disconnected operation. That layer could cache files on local disk
if HDFS is unavailable, and then upload those files into HDFS when the
HDFS service comes back online. I think such a service will be helpful for
most HDFS instal
I agree with Joydeep. For batch processing, it is sufficient to make the
application not assume that HDFS is always up and active. However, for
real-time applications that are not batch-centric, it might not be
sufficient. There are a few things that HDFS could do to better handle
Namenode outages:
The Namenode maintains a lease for every open file that is being written
to. If the client that was writing to the file disappears, the Namenode
will do "lease recovery" after expiry of the lease timeout (1 hour). The
lease recovery process (in most cases) will remove the last block from
the file (
If your file system metadata is in /tmp, then you are likely to see
these kinds of problems. It would be nice if you can move the location
of your metadata files away from /tmp. If you still see the problem, can
you please send us the logs from the log directory?
Thanks a bunch,
Dhruba
-Original
Reformatting should never be necessary if you are using a released
version of hadoop. HADOOP-2783 refers to a bug that got introduced into
trunk (not into any released version).
Thanks,
Dhruba
-Original Message-
From: Steve Sapovits [mailto:[EMAIL PROTECTED]
Sent: Friday, February 22, 2008
Hi Pete,
If you are referring to the ability to re-open a file and append to it,
then this feature is not in 0.16. Please see:
http://issues.apache.org/jira/browse/HADOOP-1700
Thanks,
dhruba
-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 21, 200
You have to use the -w parameter to the setrep command to make it wait
till the replication is complete. The following command
bin/hadoop dfs -setrep -w 10 filename
will block till all blocks of the file achieve a replication factor of 10.
Thanks,
dhruba
-Original Message-
From: Tim Wi
Hi Jason,
Good catch. It would be great if you can create a JIRA issue and submit
your code change as a patch for this problem.
There are some big sites (about 1000 node clusters) that use libhdfs to
access HDFS.
Thanks,
Dhruba
-Original Message-
From: Jason Venner [mailto:[EMAIL PROTEC