We use XFS for our data drives, and we've had somewhat mixed results.
One of the biggest pros is that XFS has more free space than ext3,
even with the reserved space settings turned all the way to 0.
Another is that you can format a 1TB drive as XFS in about 0 seconds,
versus minutes for
I thought it a conspicuous omission to not discuss the cost of
various approaches. Hadoop is free, though you have to spend
developer time; how much does Vertica cost on 100 nodes?
-Bryan
On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote:
(Hadoop is used in the benchmarks)
Hey all,
I was trying to copy some data from our cluster on 0.19.2 to a new
cluster on 0.18.3 by using distcp and the hftp:// filesystem.
Everything seemed to be going fine for a few hours, but then a few
tasks failed because a few files returned 500 errors when being
read from the 19
out at the same time?
Thanks
-Todd
On Wed, Apr 8, 2009 at 11:39 PM, Bryan Duxbury br...@rapleaf.com
wrote:
Hey all,
I was trying to copy some data from our cluster on 0.19.2 to a new
cluster
on 0.18.3 by using distcp and the hftp:// filesystem. Everything
seemed to
be going fine for a few
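For anyone trying the same thing: the usual pattern for copying between
clusters on different versions is to run distcp on the destination cluster
and read the source over hftp, roughly like the sketch below (the hostnames
and paths are placeholders; 50070 is the default namenode HTTP port):

  # run from the destination (0.18.3) cluster, reading the 0.19 source over hftp
  hadoop distcp hftp://source-namenode:50070/path/to/data /path/to/data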
I don't really see what the downside of reading it from disk is. A
list of word counts should be pretty small on disk so it shouldn't
take long to read it into a HashMap. Doing anything else is going to
cause you to go a long way out of your way to end up with the same
result.
-Bryan
On
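A minimal sketch of the "just read it into a HashMap" approach, assuming a
plain text file of word<TAB>count lines (the file format and class name here
are made up for illustration):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  public class WordCountLoader {
    // Load a small word-count file into memory for lookups during a job.
    public static Map<String, Long> load(String path) throws IOException {
      Map<String, Long> counts = new HashMap<String, Long>();
      BufferedReader in = new BufferedReader(new FileReader(path));
      try {
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.split("\t");
          counts.put(parts[0], Long.parseLong(parts[1]));
        }
      } finally {
        in.close();
      }
      return counts;
    }
  }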
Is there some area of the codebase that deals with aggregating
counters that I should be looking at?
-Bryan
On Mar 17, 2009, at 10:20 PM, Owen O'Malley wrote:
On Mar 17, 2009, at 7:44 PM, Bryan Duxbury wrote:
There is no compression in the mix for us, so that's not the culprit.
I'd
I believe the last word on appends right now is that the patch that
was committed broke a lot of other things, so it's been disabled. As
such, there is no working append in HDFS, and certainly not in
hadoop-17.x.
-Bryan
On Mar 17, 2009, at 4:50 PM, Steve Gao wrote:
Thanks, but I was told
Hey all,
In looking at the stats for a number of our jobs, the amount of data
that the UI claims we've read from or written to HDFS is vastly
larger than the amount of data that should be involved in the job.
For instance, we have a job that combines small files into big files
that we're
No. There isn't *any* version of Hadoop with a (stable) append command.
On Mar 17, 2009, at 5:08 PM, Steve Gao wrote:
Thanks, Bryan. Does 0.18.3 have a built-in append command?
--- On Tue, 3/17/09, Bryan Duxbury br...@rapleaf.com wrote:
From: Bryan Duxbury br...@rapleaf.com
Subject: Re: Does
if it is due to all the disk activity that happens while
processing spills in the mapper and the copy/shuffle/sort phase in the
reducer. It would certainly be nice if all the byte counts were
reported in a way that makes them comparable.
-- Stefan
From: Bryan Duxbury br...@rapleaf.com
Reply
I've used YourKit Java Profiler pretty successfully. There's a
JobConf parameter you can flip on that will cause a few maps and
reduces to start with profiling on, so you won't be overwhelmed with
info.
-Bryan
On Feb 27, 2009, at 11:12 AM, Sandy wrote:
Hello,
Could anyone recommend any
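The knobs Bryan is referring to are, as far as I know, the
mapred.task.profile* properties exposed through JobConf; a rough sketch
(check the property names and defaults against your Hadoop version):

  import org.apache.hadoop.mapred.JobConf;

  public class ProfilingConfExample {
    public static JobConf configure(JobConf conf) {
      // Turn profiling on for only a few tasks so you aren't flooded with output.
      conf.setBoolean("mapred.task.profile", true);
      conf.set("mapred.task.profile.maps", "0-2");    // profile map tasks 0 through 2
      conf.set("mapred.task.profile.reduces", "0-2"); // profile reduce tasks 0 through 2
      // mapred.task.profile.params can be pointed at a different agent string
      // (e.g. a YourKit agent) instead of the default hprof options.
      return conf;
    }
  }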
On occasion, I've deleted a few TB of stuff in DFS at once. I've
noticed that when I do this, datanodes start taking a really long
time to check in and ultimately get marked dead. Some time later,
they'll get done deleting stuff and come back and get unmarked.
I'm wondering, why do
(Repost from the dev list)
I noticed some really odd behavior today while reviewing the job
history of some of our jobs. Our Ganglia graphs showed really long
periods of inactivity across the entire cluster, which should
definitely not be the case - we have a really long string of jobs in
We didn't customize this value, to my knowledge, so I'd suspect it's
the default.
-Bryan
On Feb 20, 2009, at 5:00 PM, Ted Dunning wrote:
How often do your reduce tasks report status?
On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com
wrote:
(Repost from the dev list)
I
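For reference, a reduce that does a lot of work between output records can
hit the task timeout unless it reports progress. A rough sketch using the
old mapred API (the key/value types are chosen only for illustration):

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;

  public class SlowReducer extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
        // Tell the TaskTracker we're alive even before emitting any output.
        reporter.progress();
      }
      output.collect(key, new LongWritable(sum));
    }
  }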
Hey all,
Does anyone have any experience trying to measure IO time spent in
their map/reduce jobs? I know how to profile a sample of map and
reduce tasks, but that appears to exclude IO time. Just subtracting
the total cpu time from the total run time of a task seems like too
coarse an
Small files are bad for hadoop. You should avoid keeping a lot of
small files if possible.
That said, that error is something I've seen a lot. It usually
happens when the number of xcievers hasn't been adjusted upwards from
the default of 256. We run with 8000 xcievers, and that seems to
Correct.
+1 to Jason's more unix file handles suggestion. That's a must-have.
-Bryan
On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
This would be an addition to the hadoop-site.xml file, to up
dfs.datanode.max.xcievers?
Thanks.
On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote
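For reference, the setting under discussion would look roughly like this in
hadoop-site.xml on each datanode (8000 is just the value mentioned above,
not a general recommendation):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8000</value>
  </property>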
This sounds good enough for a JIRA ticket to me.
-Bryan
On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:
Chris,
For my specific use cases, it would be best to be able to set N
mappers/reducers per job per node (so I can explicitly say, run at
most 2 at
a time of this CPU bound task on any
Ext2 by default reserves 5% of the drive for use by root only.
That'd be 45MB of your 907GB capacity which would account for most
of the discrepancy. You can adjust this with tune2fs.
Doug
Bryan Duxbury wrote:
There are no non-dfs files on the partitions in question.
df -h indicates
it was mostly
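The tune2fs adjustment Doug mentions is a one-liner; the device name below
is only an example:

  # set the reserved-blocks percentage to 0 on an ext2/ext3 data partition
  tune2fs -m 0 /dev/sdb1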
full (ext4 was not tested)... so, if you are thinking of pushing
things to the limits, that might be something worth considering.
Brian
On Jan 30, 2009, at 11:18 AM, stephen mulcahy wrote:
Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose
Hey all,
I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There
are literally zero
files.
Hairong
On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:
Hey all,
I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB
If you are considering using it as a conventional filesystem from a
few clients, then it most resembles NAS. However, I don't think it
makes sense to try and classify it as SAN or NAS. HDFS is a
distributed filesystem designed to be consumed in a massively
distributed fashion, so it does
My app isn't a map/reduce job.
On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote:
Do you have speculative execution enabled? I've seen error messages
like this caused by speculative execution.
David
Bryan Duxbury wrote:
I have an app that runs for a long time with no problems, but when I
to your
problem.
On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote:
Do you have speculative execution enabled? I've seen error messages
like this caused by speculative execution.
David
Bryan Duxbury wrote:
I have an app that runs for a long time with no problems, but when I
signal it to shut
I have an app that runs for a long time with no problems, but when I
signal it to shut down, I get errors like this:
java.io.IOException: Filesystem closed
at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
at
Comments inline.
On Nov 6, 2008, at 9:29 AM, Ricky Ho wrote:
Hi,
While exploring how Hadoop fits in our usage scenarios, there are 4
recurring issues that keep popping up. I don't know if they are real
issues or just our misunderstanding of Hadoop. Can any expert shed
some light here ?
Agree, we use Thrift at Rapleaf for this purpose. It's trivial to
make a ThriftWritable if you want to be crafty, but you can also just
use byte[]s and do the serialization and deserialization yourself.
-Bryan
On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote:
Take a look at Thrift:
We do this with some of our Thrift-serialized types. We account for
this behavior explicitly in the ThriftWritable class and make it so
that we can read the serialized version off the wire completely by
prepending the size. Then, we can read in the raw bytes and hang on
to them for later
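The ThriftWritable code itself isn't shown here, but the prepend-the-size
idea looks roughly like the generic sketch below (the class and method names
are invented for illustration):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Writable;

  public class BytesPayloadWritable implements Writable {
    private byte[] payload = new byte[0];

    public void set(byte[] serialized) { this.payload = serialized; }
    public byte[] get() { return payload; }

    public void write(DataOutput out) throws IOException {
      // Length prefix first, so readers know exactly how many bytes follow.
      out.writeInt(payload.length);
      out.write(payload);
    }

    public void readFields(DataInput in) throws IOException {
      // Read the raw bytes in full; they can be deserialized lazily later.
      payload = new byte[in.readInt()];
      in.readFully(payload);
    }
  }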
Hey all,
Why is it that FileSystem.rename returns true or false instead of
throwing an exception? It seems incredibly inconvenient to get a
false result and then have to go poring over the namenode logs
looking for the actual error message. I had this case recently where
I'd forgotten to
if it did, it's not clear to FileSystem that the failure to
rename is fatal/exceptional to the application. -C
On Sep 30, 2008, at 1:37 PM, Bryan Duxbury wrote:
Hey all,
Why is it that FileSystem.rename returns true or false instead of
throwing an exception? It seems incredibly inconvenient
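Until the API changes, the usual workaround is to promote the boolean to an
exception yourself; a minimal sketch:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RenameUtil {
    // Wrap FileSystem.rename so a false return surfaces as an exception.
    public static void renameOrDie(FileSystem fs, Path src, Path dst) throws IOException {
      if (!fs.rename(src, dst)) {
        throw new IOException("rename failed: " + src + " -> " + dst
            + " (check the namenode log for the underlying cause)");
      }
    }
  }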
Ok, so, what might I do next to try and diagnose this? Does it sound
like it might be an HDFS/mapreduce bug, or should I pore over my own
code first?
Also, did any of the other exceptions look interesting?
-Bryan
On Sep 29, 2008, at 10:40 AM, Raghu Angadi wrote:
Raghu Angadi wrote:
Doug
Hey all.
We've been running into a very annoying problem pretty frequently
lately. We'll be running some job, for instance a distcp, and it'll
be moving along quite nicely, until all of a sudden, it sort of
freezes up. It takes a while, and then we'll get an error like this one:
From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
Sent: Fri 9/26/2008 4:36 PM
To: core-user@hadoop.apache.org
Subject: Could not get block locations. Aborting... exception
Hey all.
We've been running into a very annoying problem pretty frequently
lately. We'll be running some job, for instance a distcp
I encountered an interesting situation today. I'm running Hadoop
0.17.1. What happened was that 3 jobs started simultaneously, which
is expected in my workflow, but then resources got very mixed up.
One of the jobs grabbed all the available reducers (5) and got one
map task in before the
On May 23, 2008, at 9:51 AM, Ted Dunning wrote:
Relative to thrift, JSON has the advantage of not requiring a
schema as well
as the disadvantage of not having a schema. The advantage is that
the data
is more fluid and I don't have to generate code to handle the
records. The
disadvantage
Nobody has any ideas about this?
-Bryan
On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote:
I'm trying to create a java application that writes to HDFS. I have
it set up such that hadoop-0.16.3 is on my machine, and the env
variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct
is present in your classpath. Make sure
your generated classpath matches it. And I hope the conf dir
(/Users/bryanduxbury/hadoop-0.16.3/conf) is the same as the one you
are using for your Hadoop installation.
Thanks,
lohit
- Original Message
From: Bryan Duxbury [EMAIL
I'm trying to create a java application that writes to HDFS. I have
it set up such that hadoop-0.16.3 is on my machine, and the env
variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct
respective directories. My app lives elsewhere, but generates its
classpath by looking in
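A minimal sketch of the kind of client being described. This only works if
the Hadoop jars and the conf dir (and hence hadoop-site.xml) really are on
the application's classpath; otherwise fs.default.name falls back to the
local filesystem:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsClientExample {
    public static void main(String[] args) throws IOException {
      // Picks up fs.default.name from hadoop-site.xml on the classpath.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      System.out.println("root exists: " + fs.exists(new Path("/")));
    }
  }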
I think what you're saying is that you are mostly interested in data
locality. I don't think it's done yet, but it would be pretty easy to
make HBase provide start keys as well as region locations for splits
for a MapReduce job. In theory, that would give you all the pieces
you need to run
HBASE-493 was created, and seems similar. It's a write-if-not-
modified-since.
I would guess that you probably don't want to use HBase to maintain a
distributed auto-increment. You need to think of some other approach
that produces unique ids across concurrent access, like hash or GUID
or
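For what it's worth, the GUID route is a one-liner in plain Java; whether
such keys suit your scan patterns is a separate question:

  import java.util.UUID;

  public class RowKeys {
    // Globally unique row key with no coordination between writers.
    public static String newRowKey() {
      return UUID.randomUUID().toString();
    }
  }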
To connect to HBase from PHP, you should use either REST or Thrift
integration.
-Bryan
On Mar 11, 2008, at 4:20 AM, Ved Prakash wrote:
I have seen examples to connect to hbase using php, which mentions of
hshellconnect.class.php, I would like to know where can I download
this
file, or is
Ved,
At the moment you're stuck loading the data via one of the APIs
(Java, REST or Thrift) yourself. We would like to have import tools
for HBase, but we haven't gotten around to it yet.
Also, there's now a separate HBase mailing list at hbase-
[EMAIL PROTECTED] Your questions about
There's nothing stopping you from storing doubles in HBase. All you
have to do is convert your double into a byte array.
-Bryan
On Jan 30, 2008, at 4:31 PM, Chanwit Kaewkasi wrote:
Hi Edward,
On 29/01/2008, edward yoon [EMAIL PROTECTED] wrote:
Did you mean the MATLAB-like scientific
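A plain-JDK sketch of the double-to-byte-array conversion being described
(HBase-specific helper classes are deliberately left out):

  import java.nio.ByteBuffer;

  public class DoubleBytes {
    // Encode a double as an 8-byte array for storage in an HBase cell.
    public static byte[] toBytes(double value) {
      return ByteBuffer.allocate(8).putDouble(value).array();
    }

    // Decode the stored bytes back into a double.
    public static double fromBytes(byte[] bytes) {
      return ByteBuffer.wrap(bytes).getDouble();
    }
  }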