Getting a temporary directory in map jobs

2010-12-10 Thread Eric
that is cleaned after the Map job finishes? I'm sure there must be a better way to do this ;-) I'm using Hadoop version 0.20.2 (the Cloudera distribution) Thanks in advance! Eric

Passing configuration options to the Map class

2010-12-13 Thread Eric
(I'm using the deprecated libraries) I can not retrieve this setting in my mapper however. Is there another way, or am I doing something wrong? Best regards, Eric

Re: Passing messages

2010-12-19 Thread Eric
I don't know if there is such a thing in Hadoop, I'm guessing not since MapReduce is designed to have independent mappers and reducers. However, I can see the need and usefulness of some form of shared memory. I'm just suggesting something here: you could write a small server yourself. Say you

Re: How to Influence Reduce Task Location.

2010-12-19 Thread Eric
I can't answer your question, but have you looked at HadoopDB? Maybe it fits your needs. Op 19-12-10 19:23, Jane Chen schreef: Suppose that the output is written to a database, that only runs on certain nodes. It will be desirable to schedule the reducer tasks to run on the nodes local or clo

Which version to choose

2010-12-22 Thread Eric
or HBase. Or am I wrong? Are you guys patching old releases or are you keeping up with new releases instead? Are there advantages to running Cloudera's packages instead of the Apache releases (besides that it is slightly easier to install)? Thank you in advance. All comments and suggestions are welcome! -- Eric

Re: Getting a temporary directory in map jobs

2010-12-22 Thread Eric
2010/12/22 Chase Bradford > If you want a tmp file on a task's local host, just use Java's > createTempFile from the File class. It creates a file in java.io.tmpdir, which > the task runner sets up in the task's workspace and is cleaned by the TT > even if the child JVM exits badly. > > Thank you f
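A minimal sketch of the createTempFile suggestion above, in plain Java with no Hadoop classes. The class name, the helper `writeScratch`, and the `map-scratch-` prefix are made up for illustration. Where the file ends up depends on the `java.io.tmpdir` system property of the JVM that runs it.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of Hadoop: shows File.createTempFile only.
final class TaskTempFileSketch {

    // Creates a scratch file under java.io.tmpdir and writes the payload into it.
    static File writeScratch(String payload) {
        try {
            File tmp = File.createTempFile("map-scratch-", ".tmp");
            tmp.deleteOnExit(); // best-effort cleanup when the JVM exits normally
            Files.write(tmp.toPath(), payload.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File tmp = writeScratch("intermediate data");
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("scratch file   = " + tmp.getAbsolutePath());
    }
}
```

Inside a mapper the same call would go wherever the scratch data is produced. `deleteOnExit` only covers a clean JVM exit, so the cleanup after a crash described in the reply would have to come from the task tracker.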

Re: Which version to choose

2010-12-22 Thread Eric
aries are deprecated is confusing too. Someone should write this down for newcomers: use the old libraries, they are deprecated but are the best choice for now since they are complete and well tested. 2010/12/22 Todd Lipcon > Hi Eric, > > Some thoughts inline below: > > On Wed, Dec

Randomizing keys in sequence file output format

2010-12-27 Thread Eric
e are inserted into the same region causing slowdowns. Is there a way to randomize the keys in these sequence files? I can simply put a random value before the key (like "%RND-keyname"), but I'm wondering if there is a less dirty method, like a random partitioner class ;-) -- Eric
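One way to sketch the key-prefix idea from this question, in plain Java. Note the swap: instead of a truly random value, this derives the prefix from a hash of the key, so the same key always gets the same prefix and can be located again later. That choice, the bucket count, and the names `salt` and `unsalt` are assumptions for illustration, not something proposed in the thread.

```java
import java.util.Locale;

// Hypothetical helper: spreads sequential keys over a fixed number of prefixes.
final class SaltedKeySketch {

    // Prepends a bucket number derived from the key's hash, e.g. "NN-key".
    static String salt(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, key);
    }

    // Strips the prefix again; the first '-' is always the separator
    // because the prefix consists of digits only.
    static String unsalt(String salted) {
        return salted.substring(salted.indexOf('-') + 1);
    }

    public static void main(String[] args) {
        String[] keys = {"row-000001", "row-000002", "row-000003"};
        for (String key : keys) {
            String salted = salt(key, 16);
            System.out.println(salted + " -> " + unsalt(salted));
        }
    }
}
```

The trade-off is that range scans over the original key order no longer work directly, since neighbouring keys land under different prefixes.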

Re: Choosing number of map/reduce slots (with hyperthreading)

2011-01-10 Thread Eric
With hyperthreading, the cpu tries to prevent being idle by running that extra thread when it has some cycles left. It can do so cheaply, since hyperthreading is much faster than context switching. So as Arun suggests, it probably won't hurt as long as you have enough memory in your nodes. Your cpu

Re: easiest way to install hadoop

2011-02-23 Thread Eric
Cloudera offers a nice distribution and decent documentation of how to install. When you get started, start using the "old", deprecated API as others have pointed out before. It's most complete and most stable for now. 2011/2/23 MONTMORY Alain > Hi, > > > > For my point of view it is not a triv

Re: command to delete a directory in hadoop

2011-02-28 Thread Eric
Note that Hadoop will put your files in a trash bin. You can use the -skipTrash option to really delete the data and free up space. See the command "hadoop dfs" for more details. 2011/2/28 Ondřej Nevělík > hadoop dfs -rmr "directory" > > 2011/2/28 real great.. > > Hi, >> How do you delete a dir

Improve data locality for MR job processing tar.gz files

2011-05-09 Thread Eric
Hi, I have a job that processes raw data inside tarballs. As job input I have a text file listing the full HDFS path of the files that need to be processed, e.g.: ... /user/eric/file451.tar.gz /user/eric/file452.tar.gz /user/eric/file453.tar.gz ... Each mapper gets one line of input at a time

Re: passing dependencies to my Mapper

2009-09-08 Thread Eric Sammer
ency on Spring which for me isn't a problem. You can replace Spring with your DI framework of choice, of course, but this pattern works well for me. Hope this helps! Best regards. -- Eric Sammer e...@lifless.net http://esammer.blogspot.com

Add user jars to mapreduce

2009-12-31 Thread Eric Yang
th capacity scheduler. Thanks in advance. Regards, Eric

Re: Add user jars to mapreduce

2009-12-31 Thread Eric Yang
MapProcessorFactory The MapProcessorFactory is in the parsers.jar file, and the file permissions are the same for the running user and the parsers.jar file. Any idea? What is the difference between addArchiveToClassPath and addFileToClassPath? Regards, Eric On 12/31/09 3:25 PM, "Philip Zeyliger"

Re: Add user jars to mapreduce

2010-01-04 Thread Eric Yang
run the job. It didn't work either. Is there any procedure that I could follow to debug this further? Regards, Eric

Re: Questions about JobTracker and TaskTracker

2010-01-11 Thread Eric Sammer
rt-mapred.sh] > "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker > "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker [1] - http://wiki.apache.org/hadoop/ [2] - http://www.cloudera.com/hadoop-training-mapreduce-hdfs Hope this helps. -- Eric Sammer e...@lifeless.net http://esammer.blogspot.com

Re: How to use an alternative connector to SSH ?

2010-01-12 Thread Eric Sammer
ou can roll your own start up scripts and invoke the underlying hadoop-daemon.sh scripts on each node over whatever communication channel you'd like. You may have to do a little environment setup first if you choose to go this route. Take a look at the source of start-*.sh; they're pre

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-12 Thread Eric Sammer
impact performance and add the requirement that all values for a given key fit in memory. Hope this helps. -- Eric Sammer e...@lifeless.net http://esammer.blogspot.com

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-13 Thread Eric Sammer
ably correct myself and say that it depends on the application. In general, the assumption made by the framework is that all reduce values for a given key may not fit in memory. In specific implementations it may be fine (or even necessary) for the user to do buffering like this. Thanks and sorry

Re: Add user jars to mapreduce

2010-01-20 Thread Eric Yang
Hi Victor, Thanks for the detailed examination. I will make sure to remove the URI prefix in my code for now. Regards, Eric On 1/20/10 5:36 AM, "Victor Hsieh" wrote: > BTW, this issue has been reported: > http://issues.apache.org/jira/browse/MAPREDUCE-752 > > On Wed, J

Task tracker reported machine name / IP

2010-01-20 Thread Eric Sammer
's configuration? If not, does anyone else feel like there should be? I completely understand the correct answer is to fix the hosts file or not depend on it at all, deferring to DNS. But, it does seem like this bit of the code is overly complicated and brittle. Thoughts? Thanks. -- Eric Sammer e.

Re: Does the class of the Mapper output need to match the exact class of the specified output?

2010-01-27 Thread Eric Arenas
Hi Chris, what you need to do is (with Hadoop 0.20+): job.setMapOutputValueClass(Text.class); //Mapper job.setMapOutputKeyClass(LongWritable.class); //Mapper This will do the trick. Regards, Eric Arenas From: "Wilkes, Chris" To: mapr

Re: Does the class of the Mapper output need to match the exact class of the specified output?

2010-01-27 Thread Eric Arenas
decides to add a new type/class. For example today it is String and Long, but tomorrow it might be float, doubles, varchar and others... (2) is easier to implement Regards, Eric Arenas From: Eric Arenas To: mapreduce-user@hadoop.apache.org Sent: Wed

Re: MapRed ports

2010-02-09 Thread Eric Sammer
s into the specific class names, etc.). Hope this helps. If I've said anything wrong, I'm very happy to have people correct me. Regards. -- Eric Sammer e...@lifeless.net http://esammer.blogspot.com

Re: Partitioning Reducer Output

2010-04-05 Thread Eric Sammer
t back to the "old" APIs and use MTOF or MO as you've mentioned. I believe CDH3 has (or will have) updated versions of MTOF and MO for the new APIs but don't quote me on that. -- Eric Sammer phone: +1-917-287-2675 twitter: esammer data: www.cloudera.com

Re: Reduce gets struck at 99%

2010-04-08 Thread Eric Arenas
Yes Raghava, I have experienced that issue before, and the solution that you mentioned also solved my issue (adding a context.progress or setcontext to tell the JT that my jobs are still running). Regards, Eric Arenas From: Raghava Mutharaju To: common-u

Re: Hadoop over the internet

2010-04-17 Thread Eric Sammer
't mean in private computers, all of them in different > places, rather a collection of datacenters, connected to each other over > the Internet. > > Would that fail? If yes, how and why? What issues would arise? > -- Eric Sammer phone: +1-917-287-2675 twitter: esammer data: www.cloudera.com

Re: Hadoop over the internet

2010-04-20 Thread Eric Sammer
u assume that the bandwidth of the participants is > abundant? > > @Eric Sammers > Could you elaborate on pipe line replication a bit more? The way I > understood it, the input is copied to one DataNode from the client, and > then to another from the first DataNode and so on.

Re: How to debug reducer thread?

2010-04-27 Thread Eric Sammer
.hadoop.mapred.ReduceTask.run(ReduceTask.java:395) > at org.apache.hadoop.mapred.Child.main(Child.java:194) > > I would like to debug this thread in a IDE but I don't know how to do it. > Should I define properties to do this? Is there a way to do it? > > Thanks > > -- > PSC >

Re: Running Mapreduce program apart from command prompt

2010-05-27 Thread Eric Sammer
o run the mapreduce program from another > java program. I need some mechanism for submitting the job not from the > command line but some other java program should launch the job. > > Nishant Sonar > -- Eric Sammer phone: +1-917-287-2675 twitter: esammer data: www.cloudera.com

Re: how to set max map tasks individually for each job?

2010-06-04 Thread Eric Sammer
x tasks, but that's cluster wide, not per host so I don't think that will be helpful. A better option is to pack more work into each task in the "lighter" of your two jobs so they have similar performance characteristics, if possible. Of course, easier said than done, I know. -- Eric Sammer phone: +1-917-287-2675 twitter: esammer data: www.cloudera.com

Re: number of reducers

2010-06-06 Thread Eric Sammer
     conf.set("mapred.reduce.tasks.speculative.execution", "false"); > > What am I missing here? > > cheers > -- > Torsten > -- Eric Sammer phone: +1-917-287-2675 twitter: esammer data: www.cloudera.com

Re: Need help with exception when mapper emits different key class from reducer

2010-06-18 Thread Eric Sammer
       FileInputFormat.addInputPath(job, new Path(otherArgs[0])); >         } >         String athString = otherArgs[otherArgs.length - 1]; >         File out = new File(athString); >         if (out.exists()) { >             FileUtilities.expungeDirectory(out); >             out.delete(); >         } >         Path outputDir = new Path(athString); > >         FileOutputFormat.setOutputPath(job, outputDir); > >         boolean ans = job.waitForCompletion(true); >         int ret = ans ? 0 : 1; >         System.exit(ret); >     } > } > -- > Steven M. Lewis PhD > Institute for Systems Biology > Seattle WA > -- Eric Sammer twitter: esammer data: www.cloudera.com

Re: How to get context in Close() method in hadoop Pipes

2010-06-26 Thread Eric Sammer
10 at 9:25 PM, Mohamed Riadh Trad wrote: > Dear All; > > I had to emit final Key/Values in the Mapper Close Method but I can't get the > context. > > Any suggestion? > > Regard. -- Eric Sammer twitter: esammer data: www.cloudera.com

Re: Setting the number of mappers to 0

2010-07-09 Thread Eric Sammer
ice that any use, disclosure, copying or distribution of this > message, in any form, is strictly prohibited. If you have received this > message in error, please immediately notify the sender and/or Syncsort and > destroy all copies of this message in your possession, custody or control. -- Eric Sammer twitter: esammer data: www.cloudera.com

Re: easiest way to install hadoop

2011-02-23 Thread Eric Yang
he in the next release. Feedback is welcome. Regards, Eric On 2/23/11 5:01 AM, "real great.." wrote: Thanks a lot.. On Wed, Feb 23, 2011 at 3:12 PM, Eric wrote: Cloudera offers a nice distribution and decent documentation of how to install. When you get started, start using the "

Using ram disk for cluster.local.dir

2011-07-13 Thread Eric Caspole
other gotchas about trying to use a ram disk like this? It seems like a quick and dirty way to get some performance. Thanks, Eric

RE: Failing to contact Am/History for jobs

2011-09-13 Thread Eric Payne
I've seen it too. When I get this, I restart the NM, RM, and HS, and it stops happening. I don't have a cause yet. -Eric From: Jeffrey Naisbitt [mailto:jnais...@yahoo-inc.com] Sent: Monday, September 12, 2011 12:23 PM To: mapreduce-...@hadoop.apache.org Subject: Failing to contact

Re: Weird NPE at TaskLogAppender.flush()

2011-10-28 Thread Eric Fiala
hanks for your time > > Marco Didonna > > PS: I use both locally and on the cloud latest version of cloudera > distribution for hadoop > -- *Eric Fiala* *Fiala Consulting* T: 403.828.1117 E: e...@fiala.ca http://www.fiala.ca

Re: Mapreduce heap size error

2011-11-13 Thread Eric Fiala
Hoot, these are big numbers - some thoughts 1) does your machine have 1000GB to spare for each java child thread (each mapper + each reducer)? mapred.child.java.opts / -Xmx1048576m 2) does each of your daemons need / have 10G? HADOOP_HEAPSIZE=1 hth EF > # The maximum amount of heap to use, i
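The arithmetic behind point 1 can be checked with a few lines of Java. The value 1048576 comes from the -Xmx setting quoted above. The slot counts (4 map, 2 reduce) are hypothetical and only there to show how the per-child figure multiplies across a node.

```java
// Hypothetical helper: converts heap settings and sums them over task slots.
final class HeapBudgetSketch {

    // Converts megabytes to whole gigabytes (1 GB = 1024 MB).
    static long mbToGb(long mb) {
        return mb / 1024;
    }

    // Worst-case heap if every slot runs a child JVM with the given -Xmx.
    static long totalChildHeapMb(int mapSlots, int reduceSlots, long childHeapMb) {
        return (long) (mapSlots + reduceSlots) * childHeapMb;
    }

    public static void main(String[] args) {
        long childHeapMb = 1_048_576L; // from -Xmx1048576m
        System.out.println("per child: " + mbToGb(childHeapMb) + " GB");  // per child: 1024 GB

        // Assumed slot counts, not taken from the thread.
        long totalMb = totalChildHeapMb(4, 2, childHeapMb);
        System.out.println("per node:  " + mbToGb(totalMb) + " GB");      // per node:  6144 GB
    }
}
```

So -Xmx1048576m asks for 1024 GB per child JVM, which matches the "1000GB" estimate in the reply, and any realistic slot count pushes the per-node total far beyond physical memory.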