Hi!
I am working on applying the WordCount example to the entire Wikipedia dump.
The entire English Wikipedia is around 200 GB, which I have stored in HDFS on
a cluster to which I have access.
The problem: the Wikipedia dump contains many directories (it has a very big
directory structure) containing
I wish to give the path of a jar file as an argument when executing the
hadoop jar command, as my mapper uses that jar file for its operation. I
found that the -libjars option can be used, but it is not working for me; it
throws an exception. Can anyone tell me how to use the -libjars generic option
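For reference, the usual shape of the invocation is sketched below (jar names
and paths are placeholders, not taken from this thread). -libjars is a
generic option, so it must appear after the main class and before the
application's own arguments, and it is only honored when the driver parses
its arguments via GenericOptionsParser (e.g. through Tool/ToolRunner):

```shell
# sketch only; requires a configured Hadoop client and cluster
hadoop jar myjob.jar org.example.MyDriver \
    -libjars /home/akhil1988/extlib.jar \
    /user/akhil1988/input /user/akhil1988/output
```

The jars listed in -libjars are shipped to each task's classpath via the
distributed cache.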
Hi All,
I am porting a machine learning application to Hadoop using MapReduce. The
architecture of the application goes like this:
1. Run a number of server processes, which take around 2-3 minutes to start
and then remain as daemons waiting for a client to request a connection.
During the
Can anyone help me with this issue? I have an account on the cluster and I
cannot go and start a server process on each tasktracker.
Akhil
akhil1988 wrote:
Hi All,
I am porting a machine learning application to Hadoop using MapReduce. The
architecture of the application goes
Hi All,
I am running my mapred program in local mode by setting
mapred.job.tracker to local so that I can debug my code.
The mapred program is a direct port of my original sequential code. There
is no reduce phase; basically, I have just put my program in the map class.
My program
(JobShell.java:68)
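In the classic configuration, local in-process execution is selected by
setting mapred.job.tracker to local (and, typically, pointing the filesystem
at local disk too). A minimal override, assuming the old pre-0.20 property
names:

```xml
<!-- mapred-site.xml override for single-JVM debugging -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
<!-- optional: read inputs from the local filesystem while debugging -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```

The same effect can be had in code with conf.set("mapred.job.tracker",
"local").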
akhil1988 wrote:
Thank you Jason for your reply.
My Map class is an inner class and it is a static class. Here is the
structure of my code:
public class NerTagger {
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {
DistributedCache.addCacheFile(new
URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf); Data is a directory
which contains some text as well as some binary files. In the statement
Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); I can
see (in the output messages) that it is able to read
:
DistributedCache.addCacheFile(new
URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
DistributedCache.addCacheFile(new
URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
DistributedCache.createSymlink(conf);
The program executes till the same point as before and then terminates.
What I would like to ask you is: can we use DistributedCache
for transferring directories to the local cache of the tasks?
Thanks,
Akhil
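As far as I understand the old org.apache.hadoop.filecache API: addCacheFile
is intended for individual files, and a directory is normally shipped by
packing it into a .zip/.tar/.tgz and registering it with addCacheArchive,
which the tasktracker unpacks on local disk. A sketch with placeholder paths
(the archive must already be in HDFS):

```java
// hypothetical path; Data.zip is the zipped directory, already uploaded to HDFS
DistributedCache.addCacheArchive(new URI("/home/akhil1988/Data.zip"), conf);
// make the unpacked archive reachable from the task's working directory
DistributedCache.createSymlink(conf);
```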
akhil1988 wrote:
Hi Jason!
Thanks for going with me to solve my problem.
To restate things and make them easier to understand: I am working
wordcount_classes_dir.jar
org.uiuc.upcrc.extClasses.WordCount /home/akhil1988/input
/home/akhil1988/output
JO
09/06/22 19:19:01 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.ipc.RemoteException: java.io.FileNotFoundException
it is
only created at cluster start time.
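The WARN line above is relevant to the earlier -libjars trouble as well:
generic options are applied only when the driver runs through ToolRunner. A
minimal driver skeleton (class name and job setup are placeholders):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars, -files, -D, etc.
        // ... configure and submit the job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner/GenericOptionsParser strip the generic options
        // before run() sees the remaining application arguments
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}
```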
On Mon, Jun 22, 2009 at 6:19 PM, akhil1988 akhilan...@gmail.com wrote:
Hi All!
I have been running Hadoop jobs through my user account on a cluster for a
while now. But now I am getting this strange exception when I try to execute
a job
Hi All!
I want a directory to be present in the local working directory of the task
for which I am using the following statements:
DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"),
conf);
DistributedCache.createSymlink(conf);
Here Config is a directory which I have zipped
Please ask any questions if I am not clear above about the problem I am
facing.
Thanks,
Akhil
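Incidentally, the internal layout of the zip matters: whether entry names
carry the top-level Config/ prefix determines what the task sees once the
archive is unpacked. A self-contained sketch of the packing step using only
the JDK (all names here are illustrative, not the original ones):

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class ZipDir {
    // Recursively add every regular file under `root` to the zip, keeping
    // the directory prefix so the archive unpacks as Config/... locally.
    static void zipDirectory(Path root, Path zipFile) throws IOException {
        try (ZipOutputStream zos =
                 new ZipOutputStream(Files.newOutputStream(zipFile))) {
            Files.walk(root)
                 .filter(Files::isRegularFile)
                 .forEach(p -> {
                     try {
                         // entry name relative to the parent of `root`,
                         // e.g. "Config/allLayer1.config"
                         String name = root.getParent().relativize(p)
                                           .toString().replace('\\', '/');
                         zos.putNextEntry(new ZipEntry(name));
                         Files.copy(p, zos);
                         zos.closeEntry();
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("demo");
        Path config = Files.createDirectories(dir.resolve("Config"));
        Files.write(config.resolve("allLayer1.config"), "key=value".getBytes());
        Path zip = dir.resolve("Config.zip");
        zipDirectory(config, zip);
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            zf.stream().forEach(e -> System.out.println(e.getName()));
        }
    }
}
```

The resulting Config.zip would then be uploaded to HDFS before being
registered with addCacheArchive.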
akhil1988 wrote:
Hi All!
I want a directory to be present in the local working directory of the
task for which I am using the following statements:
DistributedCache.addCacheArchive(new URI
Thanks Amareshwari for your reply!
The file Config.zip is in HDFS; if it were not, the
error would have been reported by the jobtracker itself while executing the
statement:
DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"),
conf);
But I get the error in the map
Yes, my HDFS paths are of the form /home/user-name/
And I have used these in DistributedCache's addCacheFiles method
successfully.
Thanks,
Akhil
Amareshwari Sriramadasu wrote:
Is your HDFS path /home/akhil1988/Config.zip? Usually an HDFS path is of the
form /user/akhil1988/Config.zip.
Just
, akhil1988 akhilan...@gmail.com wrote:
Please ask any questions if I am not clear above about the problem I am
facing.
Thanks,
Akhil
akhil1988 wrote:
Hi All!
I want a directory to be present in the local working directory of the
task for which I am using the following statements
Hi All,
I am using DistributedCache.addCacheArchive() to distribute a tar file to
the tasktrackers, using the following statement:
DistributedCache.addCacheArchive(new URI("/home/akhil1988/sample.tar"),
conf);
According to the documentation it should get unarchived at the tasktrackers
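Before involving the cluster, the archive's internal layout can be checked
locally; after unarchiving, paths appear exactly as tar -tf lists them (file
names below are illustrative):

```shell
# build a directory and pack it so it unpacks as sample/...
mkdir -p sample/data
echo "some input" > sample/data/part-0.txt
tar -cf sample.tar sample

# entries should be prefixed with sample/
tar -tf sample.tar

# then upload it for DistributedCache (needs a running cluster):
# hadoop fs -put sample.tar /home/akhil1988/sample.tar
```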