Mark Kerzner wrote:
Hi,
why is hadoop suddenly telling me
Retrying connect to server: localhost/127.0.0.1:8020
with this configuration
fs.default.name = hdfs://localhost:9000
mapred.job.tracker = localhost:9001
Shouldn't this be
hdfs://localhost:9001?
Amar
Hi,
why is hadoop suddenly telling me
Retrying connect to server: localhost/127.0.0.1:8020
with this configuration
fs.default.name = hdfs://localhost:9000
mapred.job.tracker = localhost:9001
dfs.replication = 1
and both this http://localhost:50070/dfs
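For reference, the settings quoted above normally live in hadoop-site.xml in roughly the form below; 8020 is HDFS's built-in default namenode RPC port, so an attempt to reach it usually suggests the client is not picking up the file with the 9000 setting:

<!-- hadoop-site.xml with the values from the message above. The 8020 in the
     error is the default namenode port, which hints that this file (and its
     9000 setting) is not on the client's classpath. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>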
Stefan Will wrote:
Hi,
I'm using the new persistent job state feature in 0.19.0, and it's worked
really well so far. However, this morning my JobTracker died with an OOM
error (even though the heap size is set to 768M). So I killed it and all the
TaskTrackers.
Any specific reason why you kille
I am starting to wonder if Hadoop 0.19 is stable enough for production?
Vadim
On 2/9/09, Vadim Zaliva wrote:
> Yes, I can access DFS from the cluster. The namenode status seems to be OK
> and I see no errors in the namenode log files.
>
> Initially all trackers were visible, and 9433 maps completed
> succes
I tried modifying the settings, and I'm still running into the same
issue. I increased the xceivers count (dfs.datanode.max.xcievers) in
the hadoop-site.xml file. I also checked to make sure the file
handles were increased, but they were fairly high to begin with.
I don't think I'm dealing
It is a good and useful overview, thank you. It also mentions Stuart Sierra's
post, where Stuart mentions that the process is slow. Does anybody know why?
I have written code to write from the PC file system to HDFS, and I also
noticed that it is very slow. Instead of 40M/sec, as promised by the To
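For reference, a minimal sketch of that kind of local-to-HDFS copy through the FileSystem API (the paths are placeholders, not taken from the message):

// Minimal sketch: copy one local file into HDFS. Paths are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);       // fs.default.name decides the target file system
    fs.copyFromLocalFile(new Path("/tmp/local.dat"), new Path("/user/demo/local.dat"));
    fs.close();
  }
}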
Yo,
I don't want to sound all spammy, but Tom White wrote a pretty nice blog
post about small files in HDFS recently that you might find helpful. The
post covers some potential solutions, including Hadoop Archives:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.
Later,
Jeff
On M
> I am planning to add the individual files initially, and after a while (let's
> say 2 days after insertion) will make a SequenceFile out of each directory
> (I am currently looking into SequenceFile) and delete the previous files of
> that directory from HDFS. That way in future, I can access any
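A rough sketch of the packing step described in the quoted plan above: file names become Text keys and file contents BytesWritable values (all paths and names are illustrative, not from the thread):

// Rough sketch: pack many small HDFS files from one directory into a single
// SequenceFile, keyed by file name. Paths are illustrative placeholders.
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackDirectory {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path dir = new Path("/data/incoming/dir-0001");      // hypothetical input directory
    Path packed = new Path("/data/packed/dir-0001.seq"); // hypothetical output file

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
    try {
      for (FileStatus stat : fs.listStatus(dir)) {
        byte[] contents = new byte[(int) stat.getLen()];
        InputStream in = fs.open(stat.getPath());
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          in.close();
        }
        writer.append(new Text(stat.getPath().getName()), new BytesWritable(contents));
      }
    } finally {
      writer.close();
    }
  }
}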
Which version of Hadoop are you using?
I think from 0.18 or 0.19, copyFromLocal accepts multiple files as input, but
the destination should be a directory.
Lohit
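Per the note above, something along these lines should work from the shell (paths are placeholders; the local shell expands the wildcard into multiple source arguments):

# Placeholder paths. The local shell expands the glob, and copyFromLocal
# accepts several sources as long as the destination is a directory.
hadoop fs -copyFromLocal /local/logs/*.log /user/demo/logs/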
- Original Message
From: S D
To: Hadoop Mailing List
Sent: Monday, February 9, 2009 3:34:22 PM
Subject: copyFromLocal *
I'm us
We copy over selected files from HDFS to KFS and use an instance of KFS as
backup file system.
We use distcp to take backup.
Lohit
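A sketch of the kind of distcp invocation that implies (hostnames, ports, and paths below are made up):

# Hypothetical backup run: mirror an HDFS tree into the KFS instance.
# All hostnames, ports, and paths here are placeholders.
hadoop distcp hdfs://namenode:9000/data/important kfs://kfs-metaserver:20000/backup/important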
- Original Message
From: Allen Wittenauer
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 5:22:38 PM
Subject: Re: Backing up HDFS?
On 2/9/
On Feb 9, 2009, at 7:50 PM, jason hadoop wrote:
The other issue you may run into, with many files in your HDFS, is that you
may end up with more than a few 100k worth of blocks on each of your
datanodes. At present this can lead to instability due to the way the
periodic block reports to the n
Hey Hadoop Fans, I wanted to call your attention to an event we're
putting on next month that would be great for your academic contacts.
Please take a moment and forward this to any faculty you think might
be interested.
http://www.cloudera.com/sigcse-2009-disc-workshop
One of the big challenges
The other issue you may run into, with many files in your HDFS, is that you
may end up with more than a few 100k worth of blocks on each of your
datanodes. At present this can lead to instability due to the way the
periodic block reports to the namenode are handled. The more blocks per
datanode, the
Hey,
There's also a ticket open to enable global snapshots for a single HDFS
instance: https://issues.apache.org/jira/browse/HADOOP-3637. While this
doesn't solve the multi-site backup issue, it does provide stronger
protection against programmatic deletion of data in a single cluster.
Regards,
J
On 2/9/09 4:41 PM, "Amandeep Khurana" wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data, so the reliability of the system shouldn't be a
> concern (if at all it is)...
I'm reminded of a previous job where a site administrator refused to make
tape
Correct.
+1 to Jason's more unix file handles suggestion. That's a must-have.
-Bryan
On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
This would be an addition to the hadoop-site.xml file, to up
dfs.datanode.max.xcievers?
Thanks.
On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
Smal
On Feb 9, 2009, at 6:41 PM, Amandeep Khurana wrote:
Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data, so the reliability of the system shouldn't be a
concern (if at all it is)...
It should be. HDFS is not an archival system. Multiple replicas
does
Replication only protects against single node failure. If there's a
fire and we lose the whole cluster, replication doesn't help. Or if
there's human error and someone accidentally deletes data, then it's
deleted from all the replicas. We want our backups to protect against
all these scenar
Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data, so the reliability of the system shouldn't be a
concern (if at all it is)...
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Mon, Feb 9, 2009 at 4:17 PM
How do people back up their data that they keep on HDFS? We have many
TB of data which we need to get backed up but are unclear on how to do
this efficiently/reliably.
Hey Amit,
That plan sounds much better. I think you will find the system much
more scalable.
From our experience, it takes a while to get the right amount of
monitoring and infrastructure in place to have a very dependable
system with 2 replicas. I would recommend using 3 replicas until
I'm using the Hadoop FS shell to move files into my data store (either HDFS
or S3Native). I'd like to use wildcards with copyFromLocal, but this doesn't
seem to work. Is there any way I can get that kind of functionality?
Thanks,
John
This would be an addition to the hadoop-site.xml file, to up
dfs.datanode.max.xcievers?
Thanks.
On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
Small files are bad for hadoop. You should avoid keeping a lot of
small files if possible.
That said, that error is something I've seen a lot.
Small files are bad for hadoop. You should avoid keeping a lot of
small files if possible.
That said, that error is something I've seen a lot. It usually
happens when the number of xcievers hasn't been adjusted upwards from
the default of 256. We run with 8000 xcievers, and that seems to
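For reference, the property being discussed lives in hadoop-site.xml on the datanodes; a sketch using the 8000 value mentioned above:

<!-- hadoop-site.xml on each datanode. Note the historically misspelled
     property name; 8000 is the value mentioned above, not a universal default. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8000</value>
</property>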
You will have to increase the per user file descriptor limit.
For most linux machines the file /etc/security/limits.conf controls this on
a per user basis.
You will need to log in to a fresh shell session after making the changes to
see them. Any login shells started before the change and process sta
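A minimal sketch of that limits.conf change, assuming the daemons run as a user named hadoop (the user name and limit are placeholders):

# /etc/security/limits.conf -- raise the open-file limit for the assumed
# "hadoop" user; only shells started after the edit pick it up.
hadoop  soft  nofile  65536
hadoop  hard  nofile  65536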
Thanks, Brian, for your input.
I am eventually targeting storing 200k directories in this storage system,
each containing 75 files on average, with the average directory size being
300MB (ranging from 50MB to 650MB).
It will mostly be an archival storage from where I should be able to access
any of th
Hi all -
I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient
+1 on something like getValidBytes(). Just the existence of this would
warn many programmers about getBytes().
Raghu.
Owen O'Malley wrote:
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
Hey Tom,
I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes?
There is a bug that when we restart the TaskTrackers they get counted twice.
The problem is that the name is generated from the hostname and port number. When
TaskTrackers restart they get a new port number and get counted again. The
problem goes away when the old TaskTrackers time out in 10 minutes or
Hi,
I'm using the new persistent job state feature in 0.19.0, and it's worked
really well so far. However, this morning my JobTracker died with an OOM
error (even though the heap size is set to 768M). So I killed it and all the
TaskTrackers. After starting everything up again, all my nodes were s
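For what it's worth, the daemon heap is usually raised through conf/hadoop-env.sh; a minimal sketch (the 1536 value is only an example, not a recommendation):

# conf/hadoop-env.sh -- maximum heap, in MB, applied to the Hadoop daemons
# started from this configuration directory. The value is just an example.
export HADOOP_HEAPSIZE=1536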
Hey Amit,
Your current thoughts on keeping block size larger and removing the
very small files are along the right line. Why not choose the default
size of 64MB or larger? You don't seem too concerned about the number
of replicas.
However, you're still fighting against the tide. You've
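For reference, the block size is set through dfs.block.size in hadoop-site.xml; a sketch with the 64MB value the message refers to:

<!-- hadoop-site.xml: default block size for newly written files, in bytes (64MB here). -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
</property>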
Wednesday Feb 11, Mountain View, CA
info/registration:
http://www.meetup.com/CIO-IT-Executives/calendar/9528874/
Speaker:
Rob Weltman has been Director of Engineering in Enterprise Software at Netscape,
Chief Architect at AOL, and Director of Engineering for Yahoo's data warehouse
technology. He
On Feb 8, 2009, at 11:26 PM, Taeho Kang wrote:
Dear All,
With Hadoop 0.19.0, the Reduce stage does not start until the Map stage
gets to 100% completion.
Has anyone faced a similar situation?
How many maps and reduces does your job have?
Arun
I believe that in Hadoop 0.19, scheduling was changed so that reduces don't
start until 5% of maps have completed. The reasoning for this is that
reduces can't do anything until there is some map output to copy over the
network. So, if your job has very few map tasks, you won't see reduces start
un
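If memory serves, that 5% threshold is exposed as a configurable property around this era; treat the name and default below as my recollection rather than something verified against 0.19:

<!-- Fraction of maps that must complete before reduces are scheduled.
     Property name and 0.05 default are recalled, not verified for 0.19. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>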
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
Hey Tom,
I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes()
kind of function?
It does it because continually resizing the array to the "valid"
length
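A small sketch of the usual workaround in the meantime: copy only the first getLength() bytes out of getBytes() (the validBytes() helper below is hypothetical, standing in for the proposed getValidBytes()):

// Only the first getLength() bytes of getBytes() are meaningful; the rest of
// the backing array is slack from buffer reuse. validBytes() is a hypothetical
// helper doing what the proposed getValidBytes() would do.
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

public class ValidBytes {
  static byte[] validBytes(BytesWritable w) {
    return Arrays.copyOf(w.getBytes(), w.getLength());
  }

  public static void main(String[] args) {
    BytesWritable w = new BytesWritable(new byte[] {1, 2, 3});
    w.setSize(2);                                // simulate a reused, partially filled buffer
    System.out.println(validBytes(w).length);    // prints 2
    System.out.println(w.getBytes().length);     // the backing array may be longer
  }
}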
Thanks everyone. I found the solution for this one: in my main method, I call
setNumReduceTasks() on JobConf with the value I want.
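A minimal sketch of that driver-side call (the class name, job name, and count of 4 are placeholders):

// Driver-side sketch using the old mapred API: request several reduce tasks
// instead of the single default. Names and the count are placeholders.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class IndexDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(IndexDriver.class);
    conf.setJobName("index");
    conf.setNumReduceTasks(4);  // e.g. one reduce per node on a 4-machine cluster
    // ... set mapper/reducer classes and input/output paths here ...
    JobClient.runJob(conf);
  }
}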
2009/2/9 Owen O'Malley
>
> On Feb 7, 2009, at 11:52 PM, Nick Cen wrote:
>
> Hi,
>>
>> I have a Hadoop cluster with 4 PCs, and I want to integrate Hadoop and
>> l
Yes, I can access DFS from the cluster. The namenode status seems to be OK
and I see no errors in the namenode log files.
Initially all trackers were visible, and 9433 maps completed
successfully. Then this was followed by 65975 maps which were killed. In the
log they all show the same error:
Error initializing atte
Hi,
I think the number of your job's reduce tasks is 1, because if the number
of reduce tasks is 1 then the reduce stage does not start until the Map
stage is 100% complete.
zhuweimin
-Original Message-
From: Taeho Kang [mailto:tka...@gmail.com]
Sent: Monday, February 09, 2009 4:26 PM
To: hadoop-u..
On Feb 7, 2009, at 11:52 PM, Nick Cen wrote:
Hi,
I have a Hadoop cluster with 4 PCs, and I want to integrate Hadoop and
Lucene together, so I copied some of the source code from Nutch's Indexer
class. But when I run my job, I found that there is only 1 reducer running
on 1 PC, so the perfor