Re: "Lookup" HashMap available within the Map

2008-11-30 Thread tim robertson
Hi Shane, I can't explain that, but I can say that with 0.19.0 I am using setNumTasksToExecutePerJvm(-1) and then initializing statically declared data in the Map configure successfully now. It really is educated guesswork for the tuning parameters though - I am profiling the app for memory usage
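A minimal sketch of the pattern described above, in plain Java with no Hadoop dependency. It assumes tasks of the same job are run one after another in a reused JVM (which is what setNumTasksToExecutePerJvm(-1) asks for), so a static field survives from one task to the next. SharedLookup, ReusedJvmMapper and the sample entries are hypothetical names for illustration; configure() here only stands in for Mapper.configure().

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch; SharedLookup and ReusedJvmMapper are hypothetical, not Hadoop classes.
final class SharedLookup {
    private static Map<Integer, String> data; // one copy per JVM (per class loader)
    private static int loads = 0;             // counts real loads, for illustration only

    // Builds the lookup on the first call only; later calls in the same JVM reuse it.
    static synchronized Map<Integer, String> get() {
        if (data == null) {
            Map<Integer, String> m = new HashMap<>();
            m.put(1, "polygon-1"); // placeholder entries; real data would be parsed from a file
            m.put(2, "polygon-2");
            data = Collections.unmodifiableMap(m);
            loads++;
        }
        return data;
    }

    static synchronized int loadCount() {
        return loads;
    }
}

final class ReusedJvmMapper {
    private Map<Integer, String> lookup;

    // Stand-in for Mapper.configure(): every task calls it, but only the first one loads.
    void configure() {
        lookup = SharedLookup.get();
    }

    // Stand-in for map(): looks up one record's key.
    String map(int id) {
        return lookup.get(id);
    }
}
```

If five ReusedJvmMapper instances are configured in the same JVM, SharedLookup.loadCount() stays at 1. Without JVM reuse each task starts in a fresh JVM, so the static field is empty again and the load repeats.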

Re: "Lookup" HashMap available within the Map

2008-11-30 Thread Shane Butler
Given the goal of shared data accessible across the Map instances, can someone please explain some of the differences between using: - setNumTasksToExecutePerJvm() and then having statically declared data initialised in Mapper.configure(); and - a MultithreadedMapRunner? Regards, Shane On Wed,

Re: "Lookup" HashMap available within the Map

2008-11-28 Thread Saptarshi Guha
The more I use it, the more I realize Hadoop is not built around shared memory. For these types of things, use TSpaces (IBM); that way you can have a flag to load it once and allow for sharing. Regards Saptarshi On Tue, Nov 25, 2008 at 3:42 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote: > cool. If you need

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread Chris K Wensel
cool. If you need a hand with Cascading stuff, feel free to ping me on the mail list or #cascading irc. lots of other friendly folk there already. ckw On Nov 25, 2008, at 12:35 PM, tim robertson wrote: Thanks Chris, I have a different test running, then will implement that. Might give ca

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread tim robertson
Thanks Chris, I have a different test running, then will implement that. Might give cascading a shot for what I am doing. Cheers Tim On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote: > Hey Tim > > The .configure() method is what you are looking for i believe. > > It i

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread Chris K Wensel
Hey Tim The .configure() method is what you are looking for, I believe. It is called once per task, which in the default case is once per JVM. Note that jobs are broken into parallel tasks, and each task handles a portion of the input data. So you may create your map 100 times, because there are 100
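The once-per-task behaviour can be illustrated with a small plain-Java simulation. This is a sketch, not Hadoop code: LookupMapper, TaskSimulation and the sample entries are hypothetical, and configure() and map() only stand in for the Mapper methods of the same name. Each simulated task gets its own mapper instance, so the lookup is rebuilt once per task.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation; LookupMapper and TaskSimulation are hypothetical, not Hadoop classes.
final class LookupMapper {
    static int loads = 0; // how many times the lookup was built, for illustration only
    private Map<Integer, String> lookup;

    // Stand-in for Mapper.configure(): runs once per task, before any records.
    void configure() {
        lookup = new HashMap<>();
        lookup.put(1, "polygon-1"); // placeholder entries
        lookup.put(2, "polygon-2");
        loads++;
    }

    // Stand-in for map(): runs once per input record.
    String map(int polygonId) {
        return lookup.get(polygonId);
    }
}

final class TaskSimulation {
    // Runs `tasks` tasks, each with its own mapper instance handling `recordsPerTask` records,
    // and returns how many times the lookup was built.
    static int run(int tasks, int recordsPerTask) {
        LookupMapper.loads = 0;
        for (int t = 0; t < tasks; t++) {
            LookupMapper mapper = new LookupMapper();
            mapper.configure();
            for (int r = 0; r < recordsPerTask; r++) {
                mapper.map(1 + (r % 2));
            }
        }
        return LookupMapper.loads;
    }
}
```

Calling TaskSimulation.run(100, 10) builds the lookup 100 times, once per task, regardless of how many records each task handles.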

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread tim robertson
Hi Doug, Thanks - it is not so much that I want to run in a single JVM - I do want a bunch of machines doing the work, it is just that I want them all to have this in-memory lookup index, configured once per job. Is there some hook somewhere that I can trigger a read from the distributed cache, or

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread Doug Cutting
tim robertson wrote: Thanks Alex - this will allow me to share the shapefile, but I need to "one time only per job per jvm" read it, parse it and store the objects in the index. Is the Mapper.configure() the best place to do this? E.g. will it only be called once per job? In 0.19, with HADOOP-

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread tim robertson
Hi Thanks Alex - this will allow me to share the shapefile, but I need to "one time only per job per jvm" read it, parse it and store the objects in the index. Is the Mapper.configure() the best place to do this? E.g. will it only be called once per job? Thanks Tim On Tue, Nov 25, 2008 at 8:1

Re: "Lookup" HashMap available within the Map

2008-11-25 Thread Alex Loddengaard
You should use the DistributedCache: < http://www.cloudera.com/blog/2008/11/14/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/ > and < http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache > Hope this helps! Alex On Tue, Nov 25, 2008 at 11:09 AM, tim robert

"Lookup" HashMap available within the Map

2008-11-25 Thread tim robertson
Hi all, If I want to have an in-memory "lookup" HashMap that is available in my Map class, where is the best place to initialise this please? I have a shapefile with polygons, and I wish to create the polygon objects in memory on each node's JVM and have the map able to pull back the objects by i
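For reference, the kind of structure being asked about might look like the sketch below: a HashMap keyed by an integer id that returns a polygon object. Polygon and PolygonLookup are hypothetical names, the integer key is an assumption, and the name field is a placeholder for whatever geometry a real shapefile parser would produce.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parsed shapefile polygon; a real one would hold geometry.
final class Polygon {
    final int id;
    final String name;

    Polygon(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// In-memory lookup from polygon id to polygon object.
final class PolygonLookup {
    private final Map<Integer, Polygon> byId = new HashMap<>();

    void add(Polygon p) {
        byId.put(p.id, p);
    }

    // Returns the polygon for the id, or null if none was added.
    Polygon get(int id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }
}
```

A map task would then call get(id) for each record instead of re-reading the shapefile.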