RE: Moving Files to Distributed Cache in MapReduce

2011-08-01 Thread Michael Segel

Yeah,

I'll write something up and post it on my web site. Definitely not InfoQ stuff, 
just a simple tips-and-tricks piece.

-Mike


 Subject: Re: Moving Files to Distributed Cache in MapReduce
 From: a...@apache.org
 Date: Sun, 31 Jul 2011 19:21:14 -0700
 To: common-user@hadoop.apache.org
 
 
 We really need to add a working example to the wiki and link to it from the 
 FAQ page.  Any volunteers?
 

Re: Moving Files to Distributed Cache in MapReduce

2011-07-31 Thread Allen Wittenauer

We really need to add a working example to the wiki and link to it from the 
FAQ page.  Any volunteers?


Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
Hi all,

Does anybody have examples of how to move files from the local file system
or HDFS to the distributed cache in MapReduce? A Google search turned up
examples for Pig but not for MR.

-- 
Roger Chen
UC Davis Genome Center


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
Slight modification: I now know how to add files to the distributed cache,
which can be done with this call placed in the main or run method:

DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);

However, I am still having trouble locating the file in the distributed
cache. *How do I get the path of thefile.dat in the distributed cache
as a string?* I am using Hadoop 0.20.2.



-- 
Roger Chen
UC Davis Genome Center


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Mapred Learn
Did you try using the -files option in your hadoop jar command, as in:

/usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>



Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
After moving it to the distributed cache, how would I call it within my
MapReduce program?


-- 
Roger Chen
UC Davis Genome Center


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Mapred Learn
OK, to access it in the mapper code, you can do something like:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

String fileName = "";
for (Path p : cacheFiles) {
    if (p != null) {
        // getName() is only the file name; p.toString() is the full local path
        fileName = p.getName();
    }
}


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Mapred Learn
I hope my previous reply helps...


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Arindam Khaled

Please unsubscribe me.


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
Thanks for the response! However, I'm having an issue with this line:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

because conf has private access in org.apache.hadoop.conf.Configured.
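
For reference, that compiler message is what javac reports when a subclass touches a private field of its superclass. Hadoop's Configured keeps its conf field private and exposes it through getConf(); inside a new-API mapper the equivalent is context.getConfiguration(), as in Michael Segel's sample elsewhere in this thread. A minimal sketch of the pattern follows. It assumes a hypothetical stand-in class and java.util.Properties in place of the real Hadoop types, so that it compiles without Hadoop on the classpath; the key "cache.file" is also made up.

```java
import java.util.Properties;

// Hypothetical stand-in for org.apache.hadoop.conf.Configured:
// the field is private, the accessor is public.
class ConfiguredStandIn {
    private Properties conf = new Properties();

    public Properties getConf() {
        return conf;
    }
}

// Hypothetical driver class that extends the stand-in.
class CacheDriverSketch extends ConfiguredStandIn {
    String cachePathSetting() {
        // return conf.getProperty("cache.file");   // javac: conf has private access
        return getConf().getProperty("cache.file"); // go through the public getter
    }
}
```

With the real classes, the same idea would mean passing getConf() (in a driver that extends Configured) or context.getConfiguration() (in a mapper) to getLocalCacheFiles, rather than naming the conf field directly.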


-- 
Roger Chen
UC Davis Genome Center


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Mohit Anchlia
Is this what you are looking for?

http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

Search for JobConf.


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
JobConf is deprecated in 0.20.2, I believe; you're supposed to use
Configuration for that instead.


-- 
Roger Chen
UC Davis Genome Center


Re: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Roger Chen
Hi all, I have now resolved my issue by using a try/catch statement. Thanks
for all the help!
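
The thread does not say which call needed the try/catch. One candidate in the snippets posted here is new URI(String), which declares the checked URISyntaxException and so will not compile unless that exception is caught or declared; DistributedCache.getLocalCacheFiles is likewise declared to throw IOException. A small sketch of the URI part only, using nothing but java.net.URI (the class and method names are invented for illustration):

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheUriSketch {
    // Returns the URI for a cache file path, or null if the string is malformed.
    static URI toCacheUri(String path) {
        try {
            return new URI(path);
        } catch (URISyntaxException e) {
            System.err.println("Bad cache file URI: " + e.getMessage());
            return null;
        }
    }

    // Unchecked variant: URI.create throws IllegalArgumentException instead,
    // which suits hard-coded strings that are known to be valid.
    static URI toCacheUriUnchecked(String path) {
        return URI.create(path);
    }
}
```

The resulting URI is what would be handed to DistributedCache.addCacheFile together with the job's Configuration.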


-- 
Roger Chen
UC Davis Genome Center


RE: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Michael Segel

I could have sworn that I gave an example earlier this week on how to push and 
pull stuff from the distributed cache.



RE: Moving Files to Distributed Cache in MapReduce

2011-07-29 Thread Michael Segel

Here's the meat of my earlier post...

Sample code for putting a file on the cache:

DistributedCache.addCacheFile(new URI(path + MyFileName), conf);

Sample code for pulling data off the cache:

Path[] localFiles =
    DistributedCache.getLocalCacheFiles(context.getConfiguration());
boolean exitProcess = false;
int i = 0;
while (!exitProcess) {
    String fileName = localFiles[i].getName();
    if (fileName.equalsIgnoreCase("model.txt")) {
        // Build your input file reader on localFiles[i].toString()
        exitProcess = true;
    }
    i++;
}

Note that this is SAMPLE code. I didn't trap the exit condition where the file 
isn't there and you run past the end of the array localFiles[].
Also, I set exitProcess to false because it's easier to read this as "do this 
loop until the condition exitProcess is true."

When you build your file reader you need the full path, not just the file name. 
The path will vary when the job runs.
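
A sketch of the same lookup with the missing exit condition trapped: the loop stops at the end of the array and returns null when the file isn't there. To keep it runnable without Hadoop on the classpath, it assumes java.nio.file.Path as a stand-in for org.apache.hadoop.fs.Path (so getFileName() plays the role of getName()); the class and method names are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalCacheLookupSketch {
    // Returns the full path of the first entry whose file name matches
    // (ignoring case), or null if there is no such entry.
    static Path findByName(Path[] localFiles, String wanted) {
        if (localFiles == null) {
            return null; // defensive: no array to search
        }
        for (Path p : localFiles) {
            if (p != null && p.getFileName() != null
                    && p.getFileName().toString().equalsIgnoreCase(wanted)) {
                return p;
            }
        }
        return null; // reached the end of the array without a match
    }

    // Opens the reader on the full path, not on the bare file name.
    static String firstLine(Path fullPath) throws IOException {
        try (BufferedReader in =
                 Files.newBufferedReader(fullPath, StandardCharsets.UTF_8)) {
            return in.readLine();
        }
    }
}
```

A caller would check the result of findByName for null before building the reader, which is the case the sample loop above leaves untrapped.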
 
HTH
 
-Mike
 
