Quick question for the Hadoop / Linux masters out there:
I recently observed a stalled tasktracker daemon on our production cluster,
and was wondering if there are common tests to detect failures so that
administration tools (e.g. monit) can automatically restart the daemon. The
particular
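For what it's worth, a minimal monit stanza along these lines is a common starting point. This is only a sketch: the pid file path, install prefix, and daemon user are assumptions about a typical setup, not something from this thread. Port 50060 is the TaskTracker's default HTTP port, which is useful here because a stalled daemon often stops answering HTTP even while the process is still alive.

```
# Sketch only: paths and pid file location are assumptions about your setup.
check process tasktracker with pidfile "/tmp/hadoop-hadoop-tasktracker.pid"
  start program = "/usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker"
  stop program  = "/usr/local/hadoop/bin/hadoop-daemon.sh stop tasktracker"
  # A hung TaskTracker typically stops serving its status page.
  if failed port 50060 protocol http then restart
```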
Hello everyone,
I have a problem with Hadoop startup: every time I try to start Hadoop, the
name node does not start, and when I tried to stop the name node it gives an
error: no name node to stop.
I tried to format the name node and it works well, but now I have data in
Hadoop and formatting the name node will
Hi,
I am new to Hadoop. I just configured it based on the documentation. While
I was running the example program wordcount.java, I am getting errors. When
I gave the command
$ /bin/hadoop dfs -mkdir santhosh , I get the error
09/10/08 13:30:12 INFO ipc.Client: Retrying connect to server: /
Hi.
I'm using stock Ext3 as the most tested option, but I wonder: has anyone
ever tried, or is even using these days in production, another file system,
like JFS, XFS, or maybe even Ext4?
I'm exploring ways to boost the performance of the DataNodes, and this seems
like one possible avenue.
Thanks for
It sounds like the name node is crashing on startup. What kind of
errors are there in the name node log?
On Thu, Oct 8, 2009 at 4:01 AM, asmaa.atef sw_as...@hotmail.com wrote:
hello everyone,
i have a problem in hadoop startup ,every time i try to start hadoop name
node doesnot start and when
I have used xfs pretty extensively; it seemed to be somewhat faster than
ext3.
The only trouble we had related to some machines running PAE 32-bit
kernels, where the filesystems would lock up. That is an obscure use case,
however.
Running JBOD with your dfs.data.dir listing a directory on each
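The JBOD point above is typically wired up in hdfs-site.xml like this (a sketch; the mount points are hypothetical, one per physical disk):

```
<!-- Sketch for the 0.20-era config: one directory per disk; the DataNode
     spreads blocks across all of them. Paths are made up. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
</property>
```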
Hi.
Thanks for the info; the question is whether XFS's performance justifies
switching from the more common Ext3.
JBOD is a great approach indeed.
Regards.
2009/10/8 Jason Venner jason.had...@gmail.com
I have used xfs pretty extensively, it seemed to be somewhat faster than
ext3.
The only
As an aside, there's a short article comparing the two in the latest
edition of Linux Journal. It was hardly scientific, but the main
points were:
- XFS is faster than ext3, especially for large files
- XFS is currently unsupported on Red Hat Enterprise, but apparently
will be soon.
On Thu,
Busy datanodes become bound by the metadata lookup times for the directory
and inode entries required to open a block.
Anything that optimizes that will help substantially.
We are thinking of playing with btrfs, and using a small SSD for our file
system metadata, and the spinning disks for the
Check out the bottom of this page:
http://wiki.apache.org/hadoop/DiskSetup
noatime is all we've done in our environment. I haven't found it worth the
time to optimize further since we're CPU bound in most of our jobs.
-paul
On Thu, Oct 8, 2009 at 3:26 PM, Stas Oskin stas.os...@gmail.com
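As a concrete illustration of the noatime tweak mentioned above (the device and mount point here are made up, adjust to your own layout):

```
# /etc/fstab entry for a DataNode data disk -- hypothetical paths:
/dev/sdb1  /data/dfs  ext3  defaults,noatime  0  2

# Apply without a reboot:
mount -o remount,noatime /data/dfs
```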
I've used XFS on Silicon Graphics machines and JFS on AIX systems --
both were quite fast and extremely reliable, though this long predates
my use of Hadoop.
To your question, I recently came across a blog that compares
performance of several Linux filesystems:
On Thu, Oct 8, 2009 at 4:00 PM, Jason Venner jason.had...@gmail.com wrote:
noatime is absolutely essential; I forgot to mention it because it is
automatic for me now.
I have a fun story about atime: I have some Solaris machines with ZFS file
systems, and I was doing a find on a 6-level
FYI---the University of Maryland is seeking an assistant professor in
cloud computing. See job description below.
=
College of Information Studies, Maryland's iSchool
University of Maryland, College Park
Assistant Professor in Cloud Computing
The recently-formed Cloud
Hi,
I need to get the position of the key being processed in a mapper task.
My input file is a sequence file.
I tried the Context, but the best I could get was the input split position
and the file name.
My other option is to start recording the position in the key value while
generating the
Hi Jason.
Btrfs is cool; I read that it performs about 10% better than any other FS
coming close to it.
Can you post the results of any of your findings here?
Regards.
2009/10/8 Jason Venner jason.had...@gmail.com
Busy datanodes become bound by the metadata lookup times for the directory
and
On Thu, Oct 8, 2009 at 9:15 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I heard about this option before, but never actually tried it.
There is also another option, called relatime, which is described as being
more compatible than noatime.
Can anyone comment on this?
Regards.
2009/10/8
Hi James,
This doesn't quite answer your original question, but if you want to help
track down these kinds of bugs, you should grab a stack trace next time this
happens.
You can do this either by using jstack from the command line, by visiting
/stacks on the HTTP interface, or by sending the process
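For reference, the first two options look roughly like this (the pid and host are placeholders; 50060 is the TaskTracker's default HTTP port, so substitute whichever daemon is stalled):

```
# jstack ships with the JDK; 12345 is a placeholder pid:
jstack 12345 > tasktracker.stack

# Or fetch the /stacks servlet from the daemon's web interface:
curl http://tt-host:50060/stacks

# Sending SIGQUIT makes the JVM write a thread dump to the daemon's
# stdout log:
kill -QUIT 12345
```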
On Thu, Oct 8, 2009 at 9:20 PM, Todd Lipcon t...@cloudera.com wrote:
Hi James,
This doesn't quite answer your original question, but if you want to help
track down these kinds of bugs, you should grab a stack trace next time this
happens.
You can do this either using jstack from the command
Hi Santosh,
Check whether all the datanodes are up and running, using the command
'bin/hadoop dfsadmin -report'.
On Thu, Oct 8, 2009 at 4:24 AM, santosh gandham santhosh...@gmail.com wrote:
Hi,
I am new to Hadoop. I just configured it based on the documentation. While
I
Hi Ishwar,
You can implement a custom MapRunner and retrieve the position from the
reader before calling your map function. Be aware, though, that for
block-compressed files, the position returned represents the block start
position, not the individual record position.
Ahad.
On Thu, Oct 8, 2009 at
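A custom MapRunner in the old (org.apache.hadoop.mapred) API that was current at the time might look roughly like this. This is a sketch, not Ahad's actual code: the class name and the LongWritable/Text key/value types are illustrative, and per his follow-up, getPos() on a block-compressed SequenceFile reflects compressed-block boundaries rather than individual records.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical example class; wire it in with JobConf.setMapRunnerClass().
public class PositionMapRunner
    implements MapRunnable<LongWritable, Text, LongWritable, Text> {

  public void configure(JobConf job) {}

  public void run(RecordReader<LongWritable, Text> input,
                  OutputCollector<LongWritable, Text> output,
                  Reporter reporter) throws IOException {
    LongWritable key = input.createKey();
    Text value = input.createValue();
    while (true) {
      // Ask the reader where it is *before* reading the next record.
      // For block-compressed SequenceFiles this is a block position,
      // not the individual record offset.
      long pos = input.getPos();
      if (!input.next(key, value)) {
        break;
      }
      // Here the position simply replaces the key; a real job would
      // call its map function instead.
      output.collect(new LongWritable(pos), value);
    }
  }
}
```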
Oops, memory fails me. To correct my previous statement, for block
compressed files, getPosition reflects the position in the input stream of
the NEXT compressed block of data, so you have to watch for the change in
position after reading the key/value to capture a block transition.
Ahad.
On Thu,