Hmm... sounds promising :) How do you distribute the data? Do you use
HDFS? Pass the data directly to the individual nodes? We really only need
to do the map operation, like you. We need to distribute a matrix * vector
operation, so we want rows of the matrix distributed across different
I have been using HDFS, setting the block size to some appropriate level
and the replication factor as well. When submitting the job, keep in mind that
each block of the file in HDFS will be passed into your mapping script as
standard input. The data file reads will be done locally where possible.
An identity reduce is nice because the result values come out sorted.
On 3/12/08 8:21 AM, Jason Rennie [EMAIL PROTECTED] wrote:
Map could perform all the dot-products, which is the heavy lifting
in what we're trying to do. Might want to do a reduce after that, not
sure...
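For concreteness, here is a minimal sketch of that kind of mapper against the
old mapred API. The input line format ("rowIndex v1 v2 ... vn", one matrix row
per line) and the matvec.vector configuration key are assumptions for
illustration, not anything from this thread. An identity reduce after this map
would leave the (row, dot product) pairs sorted by row.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Each input line is assumed to be "rowIndex v1 v2 ... vn" (one matrix row
// per line); the vector is assumed small enough to hold in memory.
public class RowDotProductMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

  private double[] vector;

  public void configure(JobConf job) {
    // Hypothetical convention: the vector is passed as a comma-separated
    // string in the job configuration. Fine for small vectors; a side file
    // or the DistributedCache would be better for large ones.
    String[] parts = job.get("matvec.vector", "").split(",");
    vector = new double[parts.length];
    for (int i = 0; i < parts.length; i++) {
      vector[i] = Double.parseDouble(parts[i]);
    }
  }

  public void map(LongWritable offset, Text line,
                  OutputCollector<LongWritable, DoubleWritable> out,
                  Reporter reporter) throws IOException {
    String[] tok = line.toString().split("\\s+");
    long row = Long.parseLong(tok[0]);
    double dot = 0.0;
    for (int i = 1; i < tok.length; i++) {
      dot += Double.parseDouble(tok[i]) * vector[i - 1];
    }
    out.collect(new LongWritable(row), new DoubleWritable(dot));
  }
}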
Ahhh...
There is an old saying for this. I think you are pulling fly specks out of
pepper.
Unless your input format is very, very strange, doing the split again for
two jobs does, indeed, lead to some small inefficiency, but this cost should
be so low compared to other inefficiencies that you
The short answer is no - you can't do this.
There are some special cases:
if the map output key for the same xml record is the same for both jobs
(i.e. sort/partition/grouping is based on the same value) - then you can do this in
the application layer.
if the map output keys differ - then there's
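As a sketch of that application-layer approach in the shared-key case (the
tab-separated record layout and the "A"/"B" tags below are illustrative
assumptions, not from this thread): the map tags each output value with the
logical job it belongs to, and the reduce dispatches on the tag.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// One physical job serving two logical jobs that share the same map key.
public class TaggedMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, Text> out, Reporter r) throws IOException {
    // Assumes every record is key<TAB>payload.
    String[] fields = record.toString().split("\t", 2);
    Text key = new Text(fields[0]);
    out.collect(key, new Text("A\t" + fields[1])); // value for logical job A
    out.collect(key, new Text("B\t" + fields[1])); // value for logical job B
  }
}

class TaggedReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter r) throws IOException {
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("A\t")) {
        out.collect(key, new Text("jobA:" + v.substring(2))); // job A's logic
      } else {
        out.collect(key, new Text("jobB:" + v.substring(2))); // job B's logic
      }
    }
  }
}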
I would like to use the HDFS component of Hadoop but am not interested in
MapReduce.
All the Hadoop examples I have seen so far use MapReduce classes, and in
these examples there is no reference to HDFS classes, including Hadoop's
FileSystem API.
Is there an easy way to search this email list?
I couldn't find any web interface.
Please help.
CEG
On 12/03/2008 4:18 PM, Cagdas Gerede wrote:
Is there an easy way to search this email list?
I couldn't find any web interface.
Please help.
http://wiki.apache.org/hadoop/MailingListArchives
Daryl
Hi Pete, Joydeep,
These sound like thoughts that could lead to excellent suggestions
with a little more investment of your time.
We'd love it if you could invest some effort into contributing to the
release process! Hadoop is open source and becoming active
contributors is the best
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
Hairong
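(The example on that page boils down to something like the following sketch;
the file path and the string written here are illustrative, not from the wiki.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write a string into HDFS and read it back - no MapReduce involved.
public class HdfsReadWrite {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration(); // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/tmp/example.txt"); // hypothetical path
    FSDataOutputStream out = fs.create(file);
    out.writeUTF("hello hdfs");
    out.close();

    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();
  }
}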
On 3/12/08 1:21 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html
Arun
On Mar 12, 2008, at 1:16 PM, Cagdas Gerede wrote:
I would like to use HDFS component of
I used this code inside a Tomcat web application. It
works. Shared webserver filesystem :)
On Wed, Mar 12, 2008 at 4:50 PM, Hairong Kuang [EMAIL PROTECTED] wrote:
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
Hairong
On 3/12/08 1:21 PM, Arun C Murthy [EMAIL
I have two Map/Reduce jobs and both of them output a file each. Is there a way
I can name these output files something other than the default part- names?
thanks.
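One possible workaround, sketched here rather than taken from the thread: let
the job write the default part- files, then rename them afterwards with the
FileSystem API. The output directory and target name below are hypothetical.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// After the job completes, rename its part file to a friendlier name.
public class RenameOutput {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path outDir = new Path("/user/me/job1-out"); // hypothetical output dir
    // With a single reduce there is exactly one output file, part-00000.
    fs.rename(new Path(outDir, "part-00000"), new Path(outDir, "scores.txt"));
  }
}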
Hi Hadoop mavens-
I'm hoping someone out there will have a quick solution for me. I'm
trying to run some very basic scaling experiments for a rapidly
approaching paper deadline on a Hadoop 0.16.0 cluster that has ~20 nodes
with 2 procs/node. Ideally, I would want to run my code on clusters
of
Thanks-- that should work. I'll follow up with the cluster
administrators to see if I can get this to happen. To rebalance the
file storage can I just set the replication factor using hadoop dfs?
Chris
On Wed, Mar 12, 2008 at 6:36 PM, Ted Dunning [EMAIL PROTECTED] wrote:
What about just
Yes.
Increase the replication. Wait. Drop the replication.
On 3/12/08 3:44 PM, Chris Dyer [EMAIL PROTECTED] wrote:
Thanks-- that should work. I'll follow up with the cluster
administrators to see if I can get this to happen. To rebalance the
file storage can I just set the replication
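In code, that recipe might look like the sketch below; note that replication is
a per-file setting, and the path and factors here are hypothetical. The same
can be done from the shell with bin/hadoop dfs -setrep.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Spread a file's blocks across more nodes by temporarily raising its
// replication factor, then dropping it back once re-replication finishes.
public class Rebalance {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path data = new Path("/user/chris/data.txt"); // hypothetical file
    fs.setReplication(data, (short) 8); // increase; the namenode re-replicates
    // ... wait for re-replication to finish (watch the namenode web UI) ...
    fs.setReplication(data, (short) 3); // drop back; extra copies get removed
  }
}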
I see the following paragraphs in the wiki
(http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample):
Create a FileSystem
(http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html)
instance by passing a new
I found the solution. Please let me know if you have a better idea.
I added the following addResource lines.
Configuration conf = new Configuration();
conf.addResource(new Path("location_of_hadoop-default.xml")); // placeholder path
conf.addResource(new Path("location_of_hadoop-site.xml"));    // placeholder path
If you add the configuration directory to the class path, the configuration
files will be automatically loaded.
Hairong
On 3/12/08 5:32 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:
I found the solution. Please let me know if you have a better idea.
I added the following addResource lines.
Hi Johannes,
Which version of hadoop are you using? There is a known bug in some nightly
builds.
Nicholas
- Original Message
From: Johannes Zillmann [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Wednesday, March 12, 2008 5:47:27 PM
Subject: file permission problem
Hi,
I
Hi Nicholas,
I'm using the 0.16.0 distribution.
Johannes
[EMAIL PROTECTED] wrote:
Hi Johannes,
Which version of hadoop are you using? There is a known bug in some nightly
builds.
Nicholas
- Original Message
From: Johannes Zillmann [EMAIL PROTECTED]
To:
I tried to specify WordCountInputFormat as the input format; here is the
command line:
bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input
inputdata/ -output outputdata -inputformat
org.apache.hadoop.mapred.pipes.WordCountInputFormat
The process of mapreduce seems not really