Currently, Hadoop does round-robin allocation of blocks and data
across multiple JBOD disks. We did some testing and found that there
weren't significant differences between RAID-0 and JBOD. We went with
JBOD because we figured that RAID-0 has a higher failure rate than
JBOD -- any disk
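The round-robin placement described above is easy to picture with a small sketch (illustrative Python, not the actual DataNode code — the real datanode cycles through its configured data directories when writing new blocks):

```python
from itertools import cycle

class RoundRobinVolumePicker:
    """Toy model of round-robin block placement across JBOD data directories."""
    def __init__(self, volumes):
        self._volumes = cycle(volumes)

    def next_volume(self):
        # Each new block lands on the next disk in rotation.
        return next(self._volumes)

picker = RoundRobinVolumePicker(["/data/1", "/data/2", "/data/3"])
placements = [picker.next_volume() for _ in range(6)]
# Each disk receives every third block, so load spreads evenly by block count.
```

With JBOD, a single disk failure costs only the blocks on that disk; with RAID-0 striping, one disk failure takes out the whole volume.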
Hi Edward,
At Metaweb, we're experimenting with storing raw triples in HDFS flat
files, and have written a simple query language and planner that
executes the queries with chained map-reduce jobs. This approach works
well for warehousing triple data, and doesn't require HBase. Queries
may
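The chained map-reduce approach over flat triple files can be sketched in miniature (an in-memory model with made-up example triples, not Metaweb's actual planner — each `run_mapreduce` call stands in for one Hadoop job, and the second job consumes the first job's output):

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory model of one Hadoop map-reduce pass."""
    grouped = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            grouped[key].append(value)
    out = []
    for key, values in grouped.items():
        out.extend(reducer(key, values))
    return out

triples = [
    ("colin", "works_at", "metaweb"),
    ("colin", "uses", "hadoop"),
    ("edward", "uses", "hbase"),
]

# Job 1: select triples with a given predicate.
job1 = run_mapreduce(
    triples,
    mapper=lambda t: [(t[0], t)] if t[1] == "uses" else [],
    reducer=lambda k, vs: vs,
)

# Job 2, chained on job 1's output: count matching triples per subject.
job2 = run_mapreduce(
    job1,
    mapper=lambda t: [(t[0], 1)],
    reducer=lambda k, vs: [(k, sum(vs))],
)
```

A query planner along these lines turns each query operator (filter, join, aggregate) into one such job and chains them by feeding each job's output files to the next.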
Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
TEL : +82-2-3290-3580
-
On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans [EMAIL PROTECTED] wrote:
Hi Edward,
At Metaweb, we're experimenting
At Freebase, we're mapping our large graphs into very large files of
triples in HDFS and running large queries over them.
Hadoop is optimized for processing streaming data off of disk, and we've
found that trying to load a multi-GB graph and then access it in a
Hadoop task has scaling
There's a patch to get the native targets to build on Mac OS X:
http://issues.apache.org/jira/browse/HADOOP-3659
You probably will need to monkey with LDFLAGS as well to get it to work,
but we've been able to build the native libs for the Mac without too
much trouble.
Doug Cutting wrote:
[exec] make[2]: *** [LzoCompressor.lo] Error 1
[exec] make[1]: *** [all-recursive] Error 1
[exec] make: *** [all] Error 2
Any ideas?
On Sep 30, 2008, at 11:53 AM, Colin Evans wrote:
There's a patch to get the native targets to build on Mac OS X:
http://issues.apache.org/jira
wrote:
Unfortunately, setting those environment variables did not help my
issue. It appears that the HADOOP_LZO_LIBRARY variable is not
defined in both LzoCompressor.c and LzoDecompressor.c. Where is this
variable supposed to be set?
On Sep 30, 2008, at 12:33 PM, Colin Evans wrote:
Hi Nathan,

Freebase is finally open-sourcing our Jython-based framework for writing
map-reduce jobs on Hadoop. Happy tightly embeds Jython into the Hadoop
APIs, files off a lot of the sharp edges, and makes writing map-reduce
programs a breeze. This is the 0.1 release, but we've been using Happy
at
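Happy's actual interface isn't shown in this thread; as a rough sketch of the style of job such a framework aims at — a job class with `map` and `reduce` methods in a scripting language — here is a pure-Python illustration (all names here are hypothetical, not Happy's real API):

```python
from collections import defaultdict

# Hypothetical scripting-style job class; method names are illustrative only.
class WordCountJob:
    def map(self, record, collector):
        for word in record.split():
            collector.append((word, 1))

    def reduce(self, key, values, collector):
        collector.append((key, sum(values)))

# A tiny local driver standing in for the framework's job runner,
# which on a real cluster would bridge these methods into the Hadoop APIs.
def run_local(job, records):
    mapped = []
    for rec in records:
        job.map(rec, mapped)
    grouped = defaultdict(list)
    for k, v in mapped:
        grouped[k].append(v)
    reduced = []
    for k, vs in sorted(grouped.items()):
        job.reduce(k, vs, reduced)
    return reduced

counts = run_local(WordCountJob(), ["hadoop jython", "jython"])
```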
from the SEC in Freebase, a talk by Kurt Bollacker on data mining
Wikipedia, and a talk by Kirrily Robert on new features in Freebase.
Sign up if you're planning on coming - space is limited.
http://upcoming.yahoo.com/event/760574
Thanks
Colin Evans
We're building a cluster of 40 machines with 5 drives each, and I'm
curious what people's experiences have been for using RAID-0 for HDFS
vs. configuring separate partitions (JBOD) and having the datanode
balance between them.
I took a look at the datanode code, and datanodes appear to write
Here's the code. If folks are interested, I can submit it as a patch as
well.
Prasan Ary wrote:
Colin,
Is it possible that you share some of the code with us?
thx,
Prasan
Colin Evans [EMAIL PROTECTED] wrote:
We ended up subclassing TextInputFormat and adding a custom
Colin Evans [EMAIL PROTECTED] wrote:
The big question for me is how well a dual-CPU 4-core (8 cores per box)
configuration will do. Has anyone tried out this configuration with
Intel or AMD CPUs? Is the memory throughput sufficient?
Because of acquiring servers of different capacities at different times,
we have 2 servers with 1TB of disk each, and 11 servers with ~300GB
each. The 1TB servers tend to be under-utilized by HDFS given their
capacity. This makes sense, as block replicas need to be relatively
evenly
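The under-utilization is easy to quantify: when replicas are spread evenly by block count rather than by capacity, every node ends up holding roughly the same number of bytes, so percentage utilization falls inversely with disk size. A quick illustration with hypothetical numbers matching the cluster described:

```python
# Hypothetical cluster: 2 nodes with 1 TB, 11 nodes with ~300 GB (sizes in GB).
capacities = [1000] * 2 + [300] * 11
total_data_gb = 2600  # assumed total stored data, spread evenly by block count

per_node = total_data_gb / len(capacities)  # 200 GB lands on every node
utilization = [per_node / c for c in capacities]

# The 1 TB nodes sit at 20% full while the 300 GB nodes are near 67% full.
```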
Hi Ted,
I've been building out a similar framework in JavaScript (Rhino) for
work that I've been doing at MetaWeb, and we've been thinking about
open-sourcing it too. It's pretty clear that there are major benefits to
using a dynamic scripting language with Hadoop.
I'd love to see how