On 6/22/09 10:12 AM, Kris Jirapinyo kjirapi...@biz360.com wrote:
Hi all,
How does one handle a mount running out of space for HDFS? We have two
disks mounted on /mnt and /mnt2 respectively on one of the machines that are
used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is
On 6/22/09 12:15 PM, Qin Gao q...@cs.cmu.edu wrote:
Do you know if the tmp directory on every map/reduce task will be deleted
automatically after the map task finishes, or do I have to delete them?
I mean the tmp directory that is automatically created in the current directory.
Past
On 6/19/09 3:49 AM, Harish Mallipeddi harish.mallipe...@gmail.com wrote:
Why do you want to do this in the first place? It seems like you want
cluster1 to be a plain HDFS cluster and cluster2 to be a mapred cluster.
Doing something like that will be disastrous - Hadoop is all about sending
On 6/15/09 11:16 PM, Palleti, Pallavi pallavi.pall...@corp.aol.com
wrote:
We have chown command in hadoop dfs to make a particular directory own
by a person. Do we have something similar to create user with some space
limit/restrict the disk usage by a particular user?
Quotas are
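The reply is cut off, but for reference, a sketch of how HDFS name and space quotas are set in releases that have them (space quotas landed around 0.19); /user/alice is a hypothetical directory, and since the commands need a live namenode they are only printed here:

```shell
# Hypothetical directory; these need a running namenode, so just print them.
echo 'hadoop dfsadmin -setQuota 100000 /user/alice    # cap number of files/dirs'
echo 'hadoop dfsadmin -setSpaceQuota 1t /user/alice   # cap disk usage (counts replicas)'
echo 'hadoop fs -count -q /user/alice                 # inspect current quotas'
```

Note that the space quota is charged against raw bytes on disk, so a 3x-replicated file costs three times its logical size.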
On 6/13/09 9:00 AM, PORTO aLET portoa...@gmail.com wrote:
I am just wondering what do facebook/yahoo do with the data in hdfs after
they finish processing the log files or whatever that are in hdfs?
Are they simply deleted, or backed up to tape?
whats the typical process?
The grid
On 5/26/09 3:40 AM, Steve Loughran ste...@apache.org wrote:
HDFS is as secure as NFS: you are trusted to be who you say you are.
Which means that you have to run it on a secured subnet - access
restricted to trusted hosts and/or one or two front-end servers - or accept
that your dataset is
On 5/18/09 11:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Do not forget 'tune2fs -m 2'. By default this value gets set at 5%.
With 1 TB disks we got 33 GB more usable space. Talk about instant
savings!
Yup. Although, I think we're using -m 1.
On Mon, May 18, 2009 at 1:31 PM,
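For context, a sketch of the tune2fs tweak under discussion; /dev/sdX1 is a placeholder, and the tune2fs commands are shown as comments since they need root and a real ext2/3 device. The arithmetic shows roughly where the quoted ~33 GB figure comes from:

```shell
# Placeholder device; needs root, so the real commands are left as comments:
#   tune2fs -l /dev/sdX1 | grep -i 'reserved block count'   # inspect current reservation
#   tune2fs -m 2 /dev/sdX1                                  # reserve 2% instead of the 5% default
# Rough savings on a 1 TB (decimal) disk when dropping from 5% to 2% reserved:
echo "reclaimed: $(( 1000 * (5 - 2) / 100 )) GB"
```

Going all the way to -m 1, as mentioned above, reclaims another ~10 GB per disk; the reserved blocks only matter on the root filesystem, where they keep root able to write when the disk fills.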
On 5/15/09 11:38 AM, Owen O'Malley o...@yahoo-inc.com wrote:
We have observed that the default jvm on RedHat 5
I'm sure some people are scratching their heads at this.
The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly
incapable of running Hadoop. We [and, really,
On 4/24/09 9:31 AM, Marc Limotte mlimo...@feeva.com wrote:
I've heard that HDFS starts to slow down after it's been running for a long
time. And I believe I've experienced this.
We did an upgrade (== complete restart) of a 2000 node instance in ~20
minutes on Wednesday. I wouldn't really
On 3/13/09 11:25 AM, Vadim Zaliva kroko...@gmail.com wrote:
When you stripe you automatically make every disk in the system have the
same speed as the slowest disk. In our experiences, systems are more likely
to have a 'slow' disk than a dead one and detecting that is really
really
On 2/9/09 4:41 PM, Amandeep Khurana ama...@gmail.com wrote:
Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data, so the reliability of the system shouldn't be a
concern (if at all it is)...
I'm reminded of a previous job where a site administrator refused
On 1/28/09 7:42 PM, Andy Liu andyliu1...@gmail.com wrote:
I'm running Hadoop 0.19.0 on Solaris (SunOS 5.10 on x86) and many jobs are
failing with this exception:
Error initializing attempt_200901281655_0004_m_25_0:
java.io.IOException: Cannot run program chmod: error=12, Not enough space
On 12/27/08 12:18 AM, Arun Venugopal arunvenugopa...@gmail.com wrote:
Yes, I was able to run this on AIX as well with a minor change to the
DF.java code. But this was more of a proof of concept than on a
production system.
There are lots of places where Hadoop (esp. in contrib) interprets the
On 11/25/08 3:58 PM, Sagar Naik [EMAIL PROTECTED] wrote:
I am trying to migrate from 32 bit jvm and 64 bit for namenode only.
*setup*
NN - 64 bit
Secondary namenode (instance 1) - 64 bit
Secondary namenode (instance 2) - 32 bit
datanode- 32 bit
On 11/21/08 6:03 AM, Alexander Aristov [EMAIL PROTECTED]
wrote:
Trying hadoop-0.18.2 I got the following output:
[root]# hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2008-11-21 08:08 /mnt
drwxr-xr-x - root supergroup 0 2008-11-21 08:19 /repos
... which
On 11/10/08 10:42 PM, Dhruba Borthakur [EMAIL PROTECTED] wrote:
2. Create a virtual IP, say name.xx.com that points to the real
machine name of the machine on which the namenode runs.
Everyone doing this should be aware of the discussion happening in
On 11/10/08 1:30 AM, Aaron Kimball [EMAIL PROTECTED] wrote:
It sounds like you think the 64- and 32-bit environments are effectively
interchangeable. May I ask why you are using both? The 64-bit environment
gives you access to more memory; do you see faster performance for the TT's
in 32-bit
On 11/10/08 6:18 AM, Brian MacKay [EMAIL PROTECTED] wrote:
I had a similar problem when I upgraded... not sure of details why, but
I had permissions problems trying to develop and run on windows out of
cygwin.
At Apachecon, we think we identified a case where someone forgot to copy
the
On 11/10/08 12:21 PM, Rick Hangartner [EMAIL PROTECTED] wrote:
But is there a proper way to allow developers to specify a remote_username
they legitimately have access to on the cluster if it is not the same
as the local_username of the account on their own machine they are
using to submit
On 11/6/08 10:17 PM, C G [EMAIL PROTECTED] wrote:
I've got a grid which has been up and running for some time. It's been using a
32 bit JVM. I am hitting the wall on memory within NameNode and need to
specify a max heap size of 4G. Is it possible to switch seamlessly from a 32-bit
JVM to a 64-bit one?
On 11/4/08 2:16 AM, Arijit Mukherjee [EMAIL PROTECTED]
wrote:
* 1-5 TB external storage
I'm curious to find out what sort of specs people normally use. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need an external storage in
On 10/21/08 3:33 AM, Jean-Adrien [EMAIL PROTECTED] wrote:
I expected to keep 3.75 Gb free.
But free space goes under 1 Gb, as if I kept the default settings
I noticed that you're running on /. In general, this is a bad idea, as
space can disappear in various ways and you'll never know.
On 10/13/08 11:06 AM, Tarandeep Singh [EMAIL PROTECTED] wrote:
I want to push third party jar files that are required to execute my job, on
slave machines. What is the best way to do this?
Use a DistributedCache as part of your job submission.
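A sketch of what that looks like from the command line, assuming the job's driver goes through ToolRunner/GenericOptionsParser (which is what makes -libjars work); the jar names and paths are hypothetical, and the command is only printed since it needs a live cluster:

```shell
# Hypothetical jar names; -libjars ships them via the DistributedCache and
# puts them on each task's classpath. Printed only, as it needs a cluster:
echo 'hadoop jar myjob.jar com.example.MyJob \
  -libjars /opt/jars/thirdparty-a.jar,/opt/jars/thirdparty-b.jar \
  input/ output/'
```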
On 10/9/08 6:46 PM, Songting Chen [EMAIL PROTECTED] wrote:
Does that mean I have to rebuild the native library?
Also, the LZO installation puts liblzo2.a and liblzo2.la under /usr/local/lib.
There is no liblzo2.so there. Do I need to rename them to liblzo2.so somehow?
You need to
On 10/6/08 6:39 AM, Steve Loughran [EMAIL PROTECTED] wrote:
Edward Capriolo wrote:
You bring up some valid points. This would be a great topic for a
white paper.
-a wiki page would be a start too
I was thinking about doing Deploying Hadoop Securely for an ApacheCon EU
talk, as by that
On 10/2/08 11:33 PM, Frank Singleton [EMAIL PROTECTED] wrote:
Just to clarify, this is for when the chown will modify all files' owner
attributes
eg: toggle all from frank:frank to hadoop:hadoop (see below)
When we converted from 0.15 to 0.16, we chown'ed all of our files. The
local
On 9/21/08 9:40 AM, Guilherme Menezes [EMAIL PROTECTED]
wrote:
We currently have 4 nodes (16GB of
ram, 6 * 750 GB disks, Quad-Core AMD Opteron processor). Our initial plans
are to perform a Web crawl for academic purposes (something between 500
million and 1 billion pages), and we need to
On 9/11/08 2:39 AM, Alex Loddengaard [EMAIL PROTECTED] wrote:
I've never dealt with a large cluster, though I'd imagine it is managed the
same way as small clusters:
Maybe. :)
-Use hostnames or ips, whichever is more convenient for you
Use hostnames. Seriously. Who are you people
On 9/5/08 5:53 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
Another idea would be a tool or namenode startup mode that would make it
ignore EOFExceptions to recover as much of the edits as possible.
We clearly need to change the how to configure docs to make sure
people put at least
On 9/2/08 8:33 AM, Camilo Gonzalez [EMAIL PROTECTED] wrote:
I was wondering if there is a way to Hot-Swap Slave machines, for example,
in case a Slave machine fails while the Cluster is running and I want to
mount a new Slave machine to replace the old one, is there a way to tell the
the amount of work the name node will use to re-replicate
the file in case of failure and the total amount of disk space used... So
the extra bandwidth isn't free.
On 8/27/08 12:54 AM, Mork0075 [EMAIL PROTECTED] wrote:
I'm planning to use HDFS as a DFS in a web application environment.
There are two requirements: fault tolerance, which is ensured by the
replicas and load balancing.
There is a SPOF in the form of the name node. So depending
On 8/17/08 10:56 AM, Filippo Spiga [EMAIL PROTECTED] wrote:
I read the tutorial about HOD (Hadoop on Demand), but HOD uses Torque only for
initial node allocation. I would use TORQUE also for computation, allowing
users to load data into HDFS, submit a TORQUE JOB that execute a Map/Reduce
task
On 8/12/08 12:07 PM, lohit [EMAIL PROTECTED] wrote:
- why RAID5?
- If running RAID 5, why is this necessary?
Not absolutely necessary.
I'd be afraid of the write penalty of RAID5 vs, say, RAID10 or even just
plain RAID1.
For the record, I don't think we have any production systems
On 8/8/08 1:25 PM, James Graham (Greywolf) [EMAIL PROTECTED] wrote:
226GB of available disk space on each one;
4 processors (2 x dualcore)
8GB of RAM each.
Some simple stuff:
(Assuming SATA):
Are you using AHCI?
Do you have the write cache enabled?
Is the topologyProgram providing proper
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
You can put the same hadoop-site.xml on all machines. Yes, you do want a
secondary NN - a single NN is a SPOF. Browse the archives a few days back to
find an email from Paul about DRBD (disk replication) to avoid this SPOF.
On 8/4/08 11:10 AM, Meng Mao [EMAIL PROTECTED] wrote:
I suppose I could, for each datanode, symlink things to point to the actual
Hadoop installation. But really, I would like the setup that is hinted as
possible by statement 1). Is there a way I could do it, or should that bit
of
On 7/29/08 6:37 PM, Rafael Turk [EMAIL PROTECTED] wrote:
I'm setting up a cluster with 4 disks per server. Is there any way to make
Hadoop aware of this setup and take benefits from that?
This is how we run our nodes. You just need to list the four file
systems in the configuration files
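Concretely, that configuration looks something like the fragment below (mount points are hypothetical); HDFS round-robins block writes across the listed directories, and mapred.local.dir can be spread across the same disks for shuffle space:

```shell
# Hypothetical mount points; emit a hadoop-site.xml fragment for four data disks.
cat <<'EOF'
<property>
  <name>dfs.data.dir</name>
  <value>/mnt1/hdfs/data,/mnt2/hdfs/data,/mnt3/hdfs/data,/mnt4/hdfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mnt1/mapred/local,/mnt2/mapred/local,/mnt3/mapred/local,/mnt4/mapred/local</value>
</property>
EOF
```

Listing the disks individually (a "JBOD" layout) rather than striping them also matches the RAID discussion elsewhere in this thread list: one slow or dead disk then only costs you that disk.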
On 7/17/08 3:33 PM, Theocharis Ian Athanasakis [EMAIL PROTECTED] wrote:
What's the recommended way to restrict access to job submissions and
HDFS access, besides a firewall?
We basically put bastion hosts (we call them gateways) next to hadoop
that users use to submit jobs, access the
On 6/11/08 5:17 PM, Chris Collins [EMAIL PROTECTED] wrote:
The finer point to this is that in development you may be logged in as
user x and have a shared hdfs instance that a number of people are
using. In that mode it's not practical to sudo as you have all your
development tools setup
On 6/5/08 11:38 AM, Ted Dunning [EMAIL PROTECTED] wrote:
We use encryption on log files using standard AES. I wrote an input format
to deal with it.
Key distribution should be done better than we do it. My preference would
be to insert an auth key into the job conf which is then used by
On 6/5/08 11:57 AM, Ted Dunning [EMAIL PROTECTED] wrote:
Can you suggest an alternative way to communicate a secret to hadoop tasks
short of embedding it into source code?
This is one of the reasons why we use hod--job isolation such that it
helps prevent data leaks from one job to the
On 5/28/08 1:22 PM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
I just wondered what other people use to access the hadoop webservers,
when running on EC2?
While we don't run on EC2 :), we do protect the hadoop web processes by
putting a proxy in front of it. A user connects to the proxy,
On 5/15/08 8:56 AM, Steve Loughran [EMAIL PROTECTED] wrote:
Allen Wittenauer wrote:
On 5/15/08 5:05 AM, Steve Loughran [EMAIL PROTECTED] wrote:
I have a question for users: how do they ensure their client apps have
configuration XML files that are kept up to date?
We control
On 5/2/08 7:22 AM, Andre Gauthier [EMAIL PROTECTED] wrote:
Also I was thinking of
modifying HOD to run on Grid Engine. I haven't really begun to pore
over all the code for HOD, but my question is this: can I just write a
python module similar to that of torque.py under hod/schedulers/ for sge
On 5/1/08 5:00 PM, Bradford Stephens [EMAIL PROTECTED] wrote:
*Very* cool information. As someone who's leading the transition to
open-source and cluster-orientation at a company of about 50 people,
finding good tools for the IT staff to use is essential. Thanks so much for
the continued
On 4/22/08 7:12 AM, [EMAIL PROTECTED] [EMAIL PROTECTED]
wrote:
I am getting this annoying error message every time I start
bin/start-all.sh with one single node
command-line: line 0: Bad configuration option: ConnectTimeout
Do you know what the issue could be? I cannot find it in the FAQs,
On 4/22/08 12:23 PM, Mika Joukainen [EMAIL PROTECTED] wrote:
All right, I have to rephrase: I'd like to have a storage system for files which
are inserted by the users. Users are going to use normal human operable sw
entities ;) System is going to have: fault tolerance, parallelism etc. ==
HDFS,
On 4/21/08 3:36 AM, vikas [EMAIL PROTECTED] wrote:
Most of your questions have been answered by Luca, from what I can see,
so let me tackle the rest a bit...
4) Let us suppose I want to shutdown one datanode for maintenance purpose.
is there any way to inform Hadoop saying that this
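The reply is truncated, but the usual mechanism is HDFS decommissioning: list the node in the excludes file referenced by dfs.hosts.exclude, then ask the namenode to re-read it. Hostname and steps below are a sketch; the dfsadmin command needs a live namenode, so it is only printed:

```shell
# Hypothetical hostname. Step 1 is the line added to the dfs.hosts.exclude
# file; step 2 needs a live namenode, so it is only printed here.
echo 'datanode07.example.com'           # add to the excludes file
echo 'hadoop dfsadmin -refreshNodes'    # then make the namenode re-read it
# Watch the namenode web UI until the node shows Decommissioned, then stop it.
```

Decommissioning drains the node's blocks onto the rest of the cluster first, so you avoid a window of under-replication while the machine is down.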
On 4/21/08 2:18 PM, Ted Dunning [EMAIL PROTECTED] wrote:
I agree with the fair and balanced part. I always try to keep my clusters
fair and balanced!
Joydeep should mention his background. In any case, I agree that high-end
filers may provide good enough NFS service, but I would also
On 2/21/08 11:34 AM, Jeff Hammerbacher [EMAIL PROTECTED]
wrote:
yeah, i've heard those facebook groups can be a great way to get the word
out...
anyways, just got approval yesterday for a 320 node cluster. each node has
8 cores and 4 TB of raw storage so this guy is gonna be pretty
On 2/7/08 11:01 PM, Tim Wintle [EMAIL PROTECTED] wrote:
it's
useful to be able to connect from nodes that aren't in the slaves file
so that you can put in input data directly from another machine that's not
part of the cluster,
I'd actually recommend this as a best practice. We've been
On 2/8/08 9:32 AM, Jeff Eastman [EMAIL PROTECTED] wrote:
I noticed that phenomenon right off the bat. Is that a designed feature
or just an unhappy consequence of how blocks are allocated?
My understanding is that this is by design--when you are running a MR
job, you want the output, temp
53 matches