that is cleaned after the
Map job finishes? I'm sure there must be a better way to do this ;-)
I'm using Hadoop version 0.20.2 (the Cloudera distribution)
Thanks in advance!
Eric
(I'm using the deprecated libraries)
I cannot retrieve this setting in my mapper, however. Is there another way, or
am I doing something wrong?
Best regards,
Eric
I don't know if there is such a thing in Hadoop; I'm guessing not, since
MapReduce is designed to have independent mappers and reducers. However,
I can see the need for and the usefulness of some form of shared memory.
I'm just suggesting something here: you could write a small server
yourself. Say you
I can't answer your question, but have you looked at HadoopDB? Maybe it
fits your needs.
Op 19-12-10 19:23, Jane Chen schreef:
Suppose that the output is written to a database, that only runs on
certain nodes. It will be desirable to schedule the reducer tasks to
run on the nodes local or clo
or HBase. Or am I wrong? Are you
guys patching old releases, or are you keeping up with new releases instead?
Are there advantages to running Cloudera's packages instead of the Apache
releases (besides being slightly easier to install)?
Thank you in advance. All comments and suggestions are welcome!
--
Eric
2010/12/22 Chase Bradford
> If you want a tmp file on a task's local host, just use Java's
> createTempFile from the File class. It creates a file under java.io.tmpdir,
> which the task runner sets up in the task's workspace and which the TT
> cleans up even if the child JVM exits badly.
>
>
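That suggestion can be sketched in plain Java (no Hadoop classes involved; the file-name prefix and contents here are arbitrary):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempFileDemo {
    public static void main(String[] args) throws IOException {
        // Creates a file under java.io.tmpdir; inside a running task this
        // directory is the task's local workspace, which the TaskTracker
        // cleans up even if the child JVM dies.
        File tmp = File.createTempFile("mapper-scratch-", ".tmp");
        Files.write(tmp.toPath(), "intermediate data".getBytes("UTF-8"));
        System.out.println(tmp.getAbsolutePath());
    }
}
```

The same call inside a map task lands in the task's working directory, because the task runner points java.io.tmpdir there; no explicit cleanup code is needed.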
Thank you f
aries are
deprecated is confusing too. Someone should write this down for newcomers:
use the old libraries; they are deprecated, but they are the best choice for
now since they are complete and well tested.
2010/12/22 Todd Lipcon
> Hi Eric,
>
> Some thoughts inline below:
>
> On Wed, Dec
e are inserted into the same region
causing slowdowns.
Is there a way to randomize the keys in these sequence files? I can simply
put a random value before the key (like "%RND-keyname"), but I'm wondering
if there is a less dirty method, like a random partitioner class ;-)
--
Eric
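A minimal sketch of the salting idea from the question above (the two-digit prefix format and the bucket count are arbitrary choices of mine, not a standard scheme):

```java
import java.util.Random;

public class KeySalter {
    private static final int BUCKETS = 16;   // number of distinct prefixes
    private static final Random RND = new Random();

    // Prepend a random bucket id so sequential keys spread across regions.
    static String salt(String key) {
        return String.format("%02d-%s", RND.nextInt(BUCKETS), key);
    }

    // Strip the salt when reading the key back.
    static String unsalt(String salted) {
        return salted.substring(salted.indexOf('-') + 1);
    }

    public static void main(String[] args) {
        String s = salt("keyname");
        System.out.println(s + " -> " + unsalt(s));
    }
}
```

The trade-off is on the read side: a lookup by the original key now has to try all buckets, so this only pays off when the workload is write-heavy.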
With hyperthreading, the CPU tries to avoid sitting idle by running that
extra thread when it has some cycles left. It can do so cheaply, since
switching between hardware threads is much faster than a full context
switch. So as Arun suggests, it probably won't hurt as long as you have
enough memory in your nodes. Your
cpu
Cloudera offers a nice distribution and decent documentation of how to
install.
When you get started, use the "old", deprecated API, as others have
pointed out before. It's the most complete and most stable for now.
2011/2/23 MONTMORY Alain
> Hi,
>
>
>
> From my point of view it is not a triv
Note that Hadoop will put your files in a trash bin. You can use the
-skipTrash option to really delete the data and free up space. See the
command "hadoop dfs" for more details.
2011/2/28 Ondřej Nevělík
> hadoop dfs -rmr "directory"
>
> 2011/2/28 real great..
>
> Hi,
>> How do you delete a dir
Hi,
I have a job that processes raw data inside tarballs. As job input I have a
text file listing the full HDFS path of the files that need to be processed,
e.g.:
...
/user/eric/file451.tar.gz
/user/eric/file452.tar.gz
/user/eric/file453.tar.gz
...
Each mapper gets one line of input at a time
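For reference, a one-line-per-mapper setup like this with the old (0.20, deprecated) API typically uses NLineInputFormat; a sketch, where the listing-file path is a placeholder:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class TarballJobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf(TarballJobSetup.class);
        // Each split contains one line of the listing file, so each map
        // task processes exactly one tarball path.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);
        FileInputFormat.addInputPath(conf, new Path("/user/eric/tarball-list.txt"));
        return conf;
    }
}
```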
ency on Spring which
for me isn't a problem.
You can replace Spring with your DI framework of choice, of course, but
this pattern works well for me. Hope this helps!
Best regards.
--
Eric Sammer
e...@lifless.net
http://esammer.blogspot.com
th capacity scheduler. Thanks in
advance.
Regards,
Eric
MapProcessorFactory
The MapProcessorFactory is in the parsers.jar file, and the file permissions
are the same for the running user and the parsers.jar file.
Any idea? What is the difference between addArchiveToClassPath and
addFileToClassPath?
Regards,
Eric
On 12/31/09 3:25 PM, "Philip Zeyliger"
run the job. It didn't work either.
Is there any procedure I could follow to debug this further?
Regards,
Eric
rt-mapred.sh]
> "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
> "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
[1] - http://wiki.apache.org/hadoop/
[2] - http://www.cloudera.com/hadoop-training-mapreduce-hdfs
Hope this helps.
--
Eric Sammer
e...@lifeless.net
http://esammer.blogspot.com
ou can roll your own
start up scripts and invoke the underlying hadoop-daemon.sh scripts on
each node over whatever communication channel you'd like. You may have
to do a little environment setup first if you choose to go this route.
Take a look at the source of start-*.sh; they're pre
impact performance and add the
requirement that all values for a given key fit in memory.
Hope this helps.
--
Eric Sammer
e...@lifeless.net
http://esammer.blogspot.com
ably correct myself and say that it depends on the application. In
general, the assumption made by the framework is that all reduce values
for a given key may not fit in memory. In specific implementations it
may be fine (or even necessary) for the user to do buffering like this.
Thanks and sorry
Hi Victor,
Thanks for the detailed examination. I will make sure to remove the URI
prefix in my code for now.
Regards,
Eric
On 1/20/10 5:36 AM, "Victor Hsieh" wrote:
> BTW, this issue has been reported:
> http://issues.apache.org/jira/browse/MAPREDUCE-752
>
> On Wed, J
's configuration? If not, does anyone else feel
like there should be?
I completely understand the correct answer is to fix the hosts file or
not depend on it at all, deferring to DNS. But, it does seem like this
bit of the code is overly complicated and brittle.
Thoughts?
Thanks.
--
Eric Sammer
e.
Hi Chris,
what you need to do is
(with Hadoop 0.20+)
job.setMapOutputValueClass(Text.class); // Mapper
job.setMapOutputKeyClass(LongWritable.class); // Mapper
This will do the trick.
regards,
Eric Arenas
From: "Wilkes, Chris"
To: mapr
decides
to add a new type/class. For example, today it is String and Long, but
tomorrow it might be float, double, varchar and others...
(2) is easier to implement
Regards,
Eric Arenas
From: Eric Arenas
To: mapreduce-user@hadoop.apache.org
Sent: Wed
s into the specific
class names, etc.).
Hope this helps. If I've said anything wrong, I'm very happy to have
people correct me.
Regards.
--
Eric Sammer
e...@lifeless.net
http://esammer.blogspot.com
t back to the "old" APIs and use MTOF or MO as you've mentioned.
I believe CDH3 has (or will have) updated versions of MTOF and MO for the
new APIs but don't quote me on that.
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
Yes Raghava,
I have experienced that issue before, and the solution you mentioned also
solved my issue (adding a context.progress() or context.setStatus() call to
tell the JobTracker that my tasks are still running).
regards
Eric Arenas
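A minimal sketch of that fix with the new API (the mapper's types and the inner loop stand in for the real long-running work):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int i = 0; i < 1000; i++) {
            // ... long-running work per record ...
            // Report liveness so the JobTracker does not kill the task
            // after mapred.task.timeout (default 10 minutes) of silence.
            context.progress();
        }
    }
}
```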
From: Raghava Mutharaju
To: common-u
't mean in private computers, all of them in different
> places, rather a collection of datacenters, connected to each other over
> the Internet.
>
> Would that fail? If yes, how and why? What issues would arise?
>
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
u assume that the bandwidth of the participants is
> abundant?
>
> @Eric Sammers
> Could you elaborate on pipe line replication a bit more? The way I
> understood it, the input is copied to one DataNode from the client, and
> then to another from the first DataNode and so on.
.hadoop.mapred.ReduceTask.run(ReduceTask.java:395)
> at org.apache.hadoop.mapred.Child.main(Child.java:194)
>
> I would like to debug this thread in a IDE but I don't know how to do it.
> Should I define properties to do this? Is there a way to do it?
>
> Thanks
>
> --
> PSC
>
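One common way to get a child task into an IDE debugger is to launch the child JVM with a JDWP agent; a sketch against the old JobConf API (the port and the extra knobs are my choices, not required values):

```java
import org.apache.hadoop.mapred.JobConf;

public class DebugConf {
    // Configure a job so its child JVMs suspend and wait for a remote
    // debugger to attach on port 8000.
    static void enableChildDebugging(JobConf conf) {
        conf.set("mapred.child.java.opts",
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000");
        conf.setInt("mapred.task.timeout", 0); // don't kill the suspended task
        conf.setNumReduceTasks(1);             // a single reduce JVM to attach to
    }
}
```

With that in place, attach the IDE's remote-debug configuration to the slave node running the task.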
o run the mapreduce program from another
> java program. I need some mechanism for submitting the job not from the
> command line but some other java program should launch the job.
>
> Nishant Sonar
>
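A sketch of submitting a job from another Java program with the 0.20 API (class names and paths are placeholders; the mapper line is commented out since it depends on your job):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Launcher {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "launched-from-java");
        job.setJarByClass(Launcher.class);
        // job.setMapperClass(MyMapper.class);  // your mapper goes here
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // submit() returns immediately; waitForCompletion(true) blocks
        // and prints progress to the caller's stdout.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```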
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
x tasks, but
that's cluster wide, not per host so I don't think that will be
helpful. A better option is to pack more work into each task in the
"lighter" of your two jobs so they have similar performance
characteristics, if possible. Of course, easier said than done, I
know.
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
conf.set("mapred.reduce.tasks.speculative.execution", "false");
>
> What am I missing here?
>
> cheers
> --
> Torsten
>
--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
> }
> String athString = otherArgs[otherArgs.length - 1];
> File out = new File(athString);
> if (out.exists()) {
> FileUtilities.expungeDirectory(out);
> out.delete();
> }
> Path outputDir = new Path(athString);
>
> FileOutputFormat.setOutputPath(job, outputDir);
>
> boolean ans = job.waitForCompletion(true);
> int ret = ans ? 0 : 1;
> System.exit(ret);
> }
> }
> --
> Steven M. Lewis PhD
> Institute for Systems Biology
> Seattle WA
>
--
Eric Sammer
twitter: esammer
data: www.cloudera.com
10 at 9:25 PM, Mohamed Riadh Trad
wrote:
> Dear All;
>
> I had to emit final Key/Values in the Mapper Close Method but I can't get the
> context.
>
> Any suggestion?
>
> Regard.
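With the new API, one common answer is to override cleanup(Context), which plays the role of the old close() but does receive the context; a sketch with placeholder types:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FinalEmitMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {
    private long count = 0;

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        count++;  // accumulate state across the whole input split
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Runs once after the last map() call; unlike the old API's
        // close(), it has the Context, so final key/values can be emitted.
        context.write(new Text("total"), new LongWritable(count));
    }
}
```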
--
Eric Sammer
twitter: esammer
data: www.cloudera.com
--
Eric Sammer
twitter: esammer
data: www.cloudera.com
he in the next release.
Feedback is welcome.
Regards,
Eric
On 2/23/11 5:01 AM, "real great.." wrote:
Thanks a lot..
On Wed, Feb 23, 2011 at 3:12 PM, Eric wrote:
Cloudera offers a nice distribution and decent documentation of how to install.
When you get started, start using the "
other gotchas about trying to
use a ram disk like this? It seems like a quick and dirty way to get
some performance.
Thanks,
Eric
I've seen it too. When I get this, I restart the NM, RM, and HS, and it stops
happening.
I don't have a cause yet.
-Eric
From: Jeffrey Naisbitt [mailto:jnais...@yahoo-inc.com]
Sent: Monday, September 12, 2011 12:23 PM
To: mapreduce-...@hadoop.apache.org
Subject: Failing to contact
hanks for your time
>
> Marco Didonna
>
> PS: I use both locally and on the cloud latest version of cloudera
> distribution for hadoop
>
--
*Eric Fiala*
*Fiala Consulting*
T: 403.828.1117
E: e...@fiala.ca
http://www.fiala.ca
Hoot, these are big numbers - some thoughts
1) does your machine have 1024 GB to spare for each Java child process (each
mapper + each reducer)? mapred.child.java.opts / -Xmx1048576m
2) does each of your daemons need / have 10G? HADOOP_HEAPSIZE=1
hth
EF
> # The maximum amount of heap to use, i