Re: Lookup HashMap available within the Map
Given the goal of shared data accessible across the Map instances, can someone please explain some of the differences between using:

- setNumTasksToExecutePerJvm() and then having statically declared data initialised in Mapper.configure(); and
- a MultithreadedMapRunner?

Regards, Shane

On Wed, Nov 26, 2008 at 6:41 AM, Doug Cutting [EMAIL PROTECTED] wrote:

tim robertson wrote:

Thanks Alex - this will allow me to share the shapefile, but I need to read it, parse it and store the objects in the index only once per job per JVM. Is Mapper.configure() the best place to do this? E.g. will it only be called once per job?

In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM. So, yes, you could access a static cache from Mapper.configure().

Doug
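The static-cache idea Doug describes can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under stated assumptions: LookupMapper, loadLookup() and the sample entries are hypothetical stand-ins, and configure() here merely mimics the role of Mapper.configure(). The point is that the lookup table lives in a static field, so every instance created in the same JVM reuses it, and the guarded initialisation keeps the load to once per JVM even if several threads reach configure() at the same time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a Mapper that fills a JVM-wide lookup table. */
class LookupMapper {
    // Shared by every LookupMapper instance created in this JVM.
    private static volatile Map<String, String> lookup;

    // Counts how often the expensive load really ran (for illustration only).
    static final AtomicInteger loadCount = new AtomicInteger();

    /** Plays the role of Mapper.configure(): may be called per task, loads once per JVM. */
    public void configure() {
        if (lookup == null) {                      // fast path once loaded
            synchronized (LookupMapper.class) {
                if (lookup == null) {              // re-check under the lock
                    lookup = loadLookup();
                }
            }
        }
    }

    // Assumption: a real job would read and parse its side file here.
    private static Map<String, String> loadLookup() {
        loadCount.incrementAndGet();
        Map<String, String> m = new HashMap<>();
        m.put("DK", "Denmark");
        m.put("NZ", "New Zealand");
        return Collections.unmodifiableMap(m);     // read-only after publication
    }

    /** Looks a key up in the shared table; configure() must have run first. */
    public String map(String key) {
        return lookup.getOrDefault(key, "unknown");
    }

    public static void main(String[] args) {
        LookupMapper first = new LookupMapper();
        LookupMapper second = new LookupMapper();
        first.configure();
        second.configure();                        // reuses the table loaded above
        System.out.println(second.map("NZ"));
        System.out.println("loads: " + loadCount.get());
    }
}
```

Because the table is read-only once published, the same pattern should also be safe when map() is called from several threads in one JVM. Whether JVM reuse actually occurs in a given job still depends on the job configuration, for example the setNumTasksToExecutePerJvm() setting mentioned above.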
Re: HDFS directory listing from the Java API?
Got it! Thanks Jürgen and Lohit!

On Wed, Nov 26, 2008 at 11:10 PM, Jürgen Broß [EMAIL PROTECTED] wrote:

Hi Shane,

I think what you are looking for is the following:

    Path dirPath = new Path("path to dir");
    FileStatus[] files = FileSystem.get(conf).listStatus(dirPath);

Each FileStatus entry in the above array contains a Path reference (files[i].getPath()) to a file or directory contained in dirPath.

Greetings, Jürgen

Shane Butler wrote:

Hi all, can someone please guide me on how to get a directory listing of files on HDFS using the Java API (0.19.0)?

Regards, Shane

--
Jürgen Broß
Institute of Computer Science
Databases and Information Systems
Freie Universität Berlin
Takustr. 9
D-14195 Berlin, Germany
phone: +49 30 838-75108
email: [EMAIL PROTECTED]
HDFS directory listing from the Java API?
Hi all, can someone please guide me on how to get a directory listing of files on HDFS using the Java API (0.19.0)?

Regards, Shane