Hi Andrey,
I think that is a classpath problem.
Can you try using the patch at
https://issues.apache.org/jira/browse/HADOOP-2622 and see if you still have
the problem?
Thanks
Amareshwari.
Andrey Pankov wrote:
Hi all,
I'm still new to Hadoop. I'd like to use Hadoop streaming in order to
combine
Hi Andreas,
Looks like your mapper is not available to the streaming jar. Where is
your mapper script? Did you use the distributed cache to distribute the mapper?
You can use -file mapper-script-path (a path on the local fs) to make it part
of the job jar, or use -cacheFile /dist/workloadmf#workloadmf to distribute the
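A full streaming invocation using either option might look something like this (a sketch only; the streaming-jar path, input/output paths, and the workloadmf script name are placeholders):

```sh
# Ship a script from the local fs as part of the job jar:
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -input in -output out1 \
    -mapper workloadmf \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
    -file /local/path/to/workloadmf

# Or distribute a copy already on the DFS, symlinked into the task's cwd:
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -input in -output out2 \
    -mapper workloadmf \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
    -cacheFile /dist/workloadmf#workloadmf
```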
Norbert Burger wrote:
I'm trying to use the cacheArchive command-line options with the
hadoop-0.15.3-streaming.jar. I'm using the option as follows:
-cacheArchive hdfs://host:50001/user/root/lib.jar#lib
Unfortunately, my Perl scripts fail with an error consistent with not being
able to find
LineRecordReader.readLine() is deprecated by
HADOOP-2285 (http://issues.apache.org/jira/browse/HADOOP-2285) because it was
slow.
But streaming still uses the method. HADOOP-2826
(http://issues.apache.org/jira/browse/HADOOP-2826) will remove the usage in
streaming.
This change should improve
Arun C Murthy wrote:
On Apr 3, 2008, at 5:36 PM, Jason Venner wrote:
For the first day or so, when the jobs are viewable via the main page
of the job tracker web interface, the jobs specific counters are also
visible. Once the job is only visible in the history page, the
counters are not
You can have a look at TextInputFormat, KeyValueTextInputFormat etc at
http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/
coneybeare wrote:
I want to alter the default key, line input format to be key, line
number: + line so that my mapper can have a reference
You can put your external jar in the DistributedCache and symlink the
jar in the current working directory of the task by setting
mapred.create.symlink to true. More details can be found at
http://issues.apache.org/jira/browse/HADOOP-1660.
The jar can also be added to the classpath
Taeho Kang wrote:
Set mapred.tasktracker.tasks.maximum
and each node will be able to process N tasks (map and/or reduce).
Please note that once you set mapred.tasktracker.tasks.maximum, the
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum settings will not
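In hadoop-site.xml the setting above would look something like this (the value 4 is only illustrative):

```xml
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>4</value>
</property>
```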
The proposal on http://issues.apache.org/jira/browse/HADOOP-3386 takes
care of this.
Thanks
Amareshwari
Amareshwari Sriramadasu wrote:
If the task tracker didn't receive a KillJobAction, it's true that the job
directory will not be removed.
And your observation is correct that some task trackers didn't
Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if
it makes sense?
Thanks
Amareshwari
Paco NATHAN wrote:
We have a need to access data found in the JobTracker History link.
Specifically in the Analyse This Job analysis. Must be run in Java,
between jobs, in the same code
Amareshwari
Paco NATHAN wrote:
Thank you, Amareshwari -
That helps. Hadn't noticed HistoryViewer before. It has no JavaDoc.
What is a typical usage? In other words, what would be the
outputDir value in the context of ToolRunner, JobClient, etc. ?
Paco
On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari
for the same,
https://issues.apache.org/jira/browse/HADOOP-3850. You can give your
inputs there.
Thanks
Amareshwari
Paco
On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu
[EMAIL PROTECTED] wrote:
HistoryViewer is used in JobClient to view the history files in the
directory provided
Hi Srilatha,
You can download hadoop release tar ball from
http://hadoop.apache.org/core/releases.html
You will find hadoop-*-examples.jar when you untar it.
Thanks,
Amareshwari
us latha wrote:
HI All,
Trying to run the wordcount example on single node hadoop setup.
Could anyone please
Arv Mistry wrote:
I'll try again: can anyone tell me whether it should be possible to run Hadoop
in pseudo-distributed mode (i.e. everything on one machine) and then
submit a mapred job using the ToolRunner from another machine against that
Hadoop configuration?
Cheers Arv
Yes. It is possible to do.
The error Could not find any valid local directory for task means that
the task could not find a local directory to write a file to, mostly because
there is not enough space on any of the disks.
Thanks
Amareshwari
Shirley Cohen wrote:
Hi,
Does anyone know what the following error means?
You can get the file name accessed by the mapper using the config
property map.input.file
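A minimal old-API sketch of reading that property in a mapper (the class name and output types here are illustrative, not from the original thread):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private String inputFile;

  @Override
  public void configure(JobConf job) {
    // Name of the file backing this task's input split.
    inputFile = job.get("map.input.file");
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(new Text(inputFile), value);
  }
}
```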
Thanks
Amareshwari
Deyaa Adranale wrote:
Hi,
I need to know inside my mapper, the name of the file that contains
the current record.
I saw that I can access the name of the input directories inside
The Mapred framework kills map/reduce tasks if they don't report
status within 10 minutes. If your mapper/reducer needs more time, it
should report status using
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html
More documentation at
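A sketch of a map() body that reports status during long per-record work; this is a fragment of an old-API Mapper implementation, and processRecord() is a hypothetical stand-in for the application's expensive step:

```java
public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
    throws IOException {
  String result = processRecord(value.toString()); // may take many minutes
  reporter.setStatus("processed record at offset " + key.get());
  reporter.progress(); // tells the framework the task is still alive
  output.collect(new Text(result), value);
}
```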
You can add more paths to input using
FileInputFormat.addInputPath(JobConf, Path).
You can also specify comma separated filenames as input path using
FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths)
More details at
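For example (the job class and input paths are illustrative):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputSetup {
  public static JobConf configure() {
    JobConf conf = new JobConf(InputSetup.class);
    // Add paths one at a time:
    FileInputFormat.addInputPath(conf, new Path("/data/part1"));
    FileInputFormat.addInputPath(conf, new Path("/data/part2"));
    // Or equivalently, set them all at once:
    // FileInputFormat.setInputPaths(conf, "/data/part1,/data/part2");
    return conf;
  }
}
```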
Per Jacobsson wrote:
Hi all.
I've got a beginner question: Are there any best practices for how to do
logging from a task? Essentially I want to log warning messages under
certain conditions in my map and reduce tasks, and be able to review them
later.
stdout, stderr and the logs using
Dennis Kubes wrote:
If I understand what you are asking you can use the -cacheArchive with
the path to the jar to including the jar file in the classpath of your
streaming job.
Dennis
You can also use the -cacheArchive option to include a jar file and symlink
the unjarred directory from the cwd by
in the local running directory, correct?
Just like the cacheFile option? If not, how can I then specify which
class to use?
cheers,
Christian
Amareshwari Sriramadasu wrote:
Dennis Kubes wrote:
If I understand what you are asking you can use the -cacheArchive
with the path to the jar
Are you seeing HADOOP-2009?
Thanks
Amareshwari
Nathan Marz wrote:
Unfortunately, setting those environment variables did not help my
issue. It appears that the HADOOP_LZO_LIBRARY variable is not
defined in either LzoCompressor.c or LzoDecompressor.c. Where is this
variable supposed to be set?
This is because a non-zero exit status of the streaming process was not
treated as failure until 0.17. In 0.17, you can specify the
configuration property stream.non.zero.exit.is.failure as true to
consider a non-zero exit as failure. From 0.18, the default value
for
Hi Naama,
Yes. It is possible to specify using the apis
FileInputFormat#setInputPaths(), FileOutputFormat#setOutputPath().
You can specify the FileSystem uri for the path.
Thanks,
Amareshwari
Naama Kraus wrote:
Hi,
I wanted to know if it is possible to use different file systems for Map
Hi,
From 0.19, the jars added using -libjars are available on the client
classpath also, fixed by HADOOP-3570.
Thanks
Amareshwari
Mahadev Konar wrote:
HI Tarandeep,
the -libjars option does not add the jar on the client side. There is an
open jira for that (I don't remember which one)...
Has your task-tracker started? I mean, do you see non-zero nodes on your
job tracker UI?
-Amareshwari
John Babilon wrote:
Hello,
I've been trying to get Hadoop up and running on a Windows Desktop running
Windows XP. I've installed Cygwin and Hadoop. I run the start-all.sh script,
it
Hi,
How are you passing your classes to the pipes job? If you are passing
them as a jar file, you can use -libjars option. From branch 0.19, the
libjar files are added to the client classpath also.
Thanks
Amareshwari
Zhengguo 'Mike' SUN wrote:
Hi,
I implemented customized classes for
ways to do that?
Thanks
Mike
From: Amareshwari Sriramadasu [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Tuesday, October 28, 2008 11:58:33 PM
Subject: Re: How do I include customized InputFormat, InputSplit and
RecordReader in a C++ pipes job?
Hi,
How
Some more links:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Other+Useful+Features
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Debugging
-Amareshwari
Arun C Murthy wrote:
On Oct 30, 2008, at 1:16 PM, Scott Whitecross wrote:
Is the presentation online
Nathan Marz wrote:
Hello all,
Occasionally when running jobs, Hadoop fails to clean up the
_temporary directories it has left behind. This only appears to
happen when a task is killed (aka a speculative execution), and the
data that task has outputted so far is not cleaned up. Is this a
some speed wrote:
I was wondering if it was possible to read the input for a map function from
2 different files:
1st file --- user-input file from a particular location(path)
2nd file --- A resultant file (has just one key,value pair) from a
previous MapReduce job. (I am implementing a chain
Jeremy Pinkham wrote:
We are using the distributed cache in one of our jobs and have noticed
that the local copies on all of the task nodes never seem to get cleaned
up. Is there a mechanism in the API to tell the framework that those
copies are no longer needed so they can be deleted. I've
Hi Rahul,
How did you set the configuration property mapred.line.input.format.linespermap
and your input format? You have to set them in hadoop-site.xml or pass
them via the -D option to the job.
NLineInputFormat will split N lines of input as one split. So, each map
gets N lines.
But the RecordReader
that it returns the
value as N Lines?
Setting the Configuration in the run() method will also work. You have to extend
LineRecordReader and override the next() method to return N lines as the value
instead of one line.
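A rough sketch of that idea against the old mapred API (the class name and constructor shape are illustrative, and error handling is omitted):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.LineRecordReader;

public class NLinesRecordReader extends LineRecordReader {
  private final int n;

  public NLinesRecordReader(Configuration job, FileSplit split, int n)
      throws IOException {
    super(job, split);
    this.n = n;
  }

  @Override
  public synchronized boolean next(LongWritable key, Text value)
      throws IOException {
    StringBuilder lines = new StringBuilder();
    Text line = new Text();
    int read = 0;
    // Accumulate up to N physical lines into one logical record.
    while (read < n && super.next(key, line)) {
      if (read++ > 0) lines.append('\n');
      lines.append(line.toString());
    }
    value.set(lines.toString());
    return read > 0;
  }
}
```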
Thanks
Amareshwari
Thanks
Rahul
On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu
[EMAIL
that it returns the
value as N Lines?
Thanks
Rahul
On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:
Hi Rahul,
How did you set the configuration
mapred.line.input.format.linespermap and your input format? You
have to set them
tim robertson wrote:
Hi all,
I am running MR which is scanning 130M records and then trying to
group them into around 64,000 files.
The Map does the grouping of the record by determining the key, and
then I use a MultipleTextOutputFormat to write the file based on the
key:
@Override
Jeremy Chow wrote:
Hi list,
I added a property dfs.hosts.exclude to my conf/hadoop-site.xml. Then
refreshed my cluster with command
bin/hadoop dfsadmin -refreshNodes
It showed that it can only shut down the DataNode process, but it did not
shut down the TaskTracker process on each
-----Original Message-----
From: Amareshwari Sriramadasu [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 10:56 AM
To: core-user@hadoop.apache.org
Subject: Re: Error with Sequence File in hadoop-18
It got fixed in 0.18.3 (HADOOP-4499).
-Amareshwari
Palleti, Pallavi wrote:
Hi,
I am getting Check sum
Hi Aayush,
Do you want one map to run one command? You can give an input file
consisting of lines of the form file outputfile. Use NLineInputFormat, which
splits N lines of input as one split, i.e. it gives N lines to one map for
processing. By default, N is one. Then your map can just run the shell
command
Arv Mistry wrote:
I'm using hadoop 0.17.0. Unfortunately I can't upgrade to 0.19.0 just
yet.
I'm trying to control the number of extraneous files. I noticed
the following log files are produced by hadoop:
On Slave
- userlogs (for each map/reduce job)
You can set the configuration property
mapred.task.tracker.http.address to 0.0.0.0:0. If the port is given
as 0, then the server will start on a free port.
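In hadoop-site.xml that would be something like:

```xml
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:0</value>
</property>
```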
Thanks
Amareshwari
Sagar Naik wrote:
- Check hadoop-default.xml:
in there you will find all the ports used. Copy the xml nodes from
You can report status from streaming job by emitting
reporter:status:message in stderr.
See documentation @
http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F
But from the exception trace, it doesn't look like lack of
org.apache.hadoop.mapred.JobTracker:
Removed completed task 'attempt_200812221742_0075_r_00_2' from
'tracker_hnode1.cor.mystrands.in:localhost/127.0.0.1:37971'
Thanks,
RDH
On Dec 23, 2008, at 1:00 AM, Amareshwari Sriramadasu wrote:
You can report status from streaming job by emitting
reporter:status:message
Saptarshi Guha wrote:
Caught it in action.
Running ps -e -o 'vsz pid ruser args' |sort -nr|head -5
on a machine where the map task was running
04812 16962 sguha/home/godhuli/custom/jdk1.6.0_11/jre/bin/java
Sean Shanny wrote:
To all,
Version: hadoop-0.17.2.1-core.jar
I have created a MapFile.
What I don't seem to be able to do is correctly place the MapFile in
the DistributedCache and then make use of it in a map method.
I need the following info please:
1. How and where to place the
Saptarshi Guha wrote:
Hello,
I had previously emailed regarding a heap size issue and have discovered
that the hadoop-site.xml is not loading completely, i.e.
Configuration defaults = new Configuration();
JobConf jobConf = new JobConf(defaults, XYZ.class);
jobOutputDir is the location specified by the configuration property
hadoop.job.history.user.location. If you don't specify anything for the property,
the job history logs will be created in the job's output directory. So, to view
your history, give your jobOutputDir, if you haven't specified any
You can also have a look at NLineInputFormat.
@http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
Thanks
Amareshwari
Abdul Qadeer wrote:
Dmitry,
If you are talking about Text data, then the splits can be anywhere. But
LineRecordReader will take
You can use Job Control.
See
http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Job+Control
http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html
and
From the exception you pasted, it looks like your io.serializations did
not set the SerializationFactory properly. Do you see any logs on your
console for adding serialization class?
Can you try running your app on pseudo distributed mode, instead of
LocalJobRunner ?
You can find pseudo
Saptarshi Guha wrote:
Sorry, I see - every line is now a map task - one split, one task (in
this case N=1 line per split).
Is that correct?
Saptarshi
You are right. NLineInputFormat splits N lines of input as one split, and
each split is given to a map task.
By default, N is 1. N can be configured
patektek wrote:
Hello list, I am trying to add some functionality to Hadoop-core and I am
having serious issues
debugging it. I have searched in the list archive and still have not been
able to resolve the issues.
Simple question:
If I want to insert LOG.INFO() statements in Hadoop code is not
Edwin wrote:
Hi
I am looking for a way to interrupt a thread that entered
JobClient.runJob(). The runJob() method keeps polling the JobTracker until
the job is completed. After reading the source code, I know that the
InterruptException is caught in runJob(). Thus, I can't interrupt it using
You can use NLineInputFormat for this, which splits one line (N=1, by
default) as one split.
So, each map task processes one line.
See
http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
-Amareshwari
S D wrote:
Hello,
I have a clarifying
Kris Jirapinyo wrote:
Hi all,
I am using counters in Hadoop via the reporter. I can see this custom
counter fine after I run my job. However, if somehow I restart the cluster,
then when I look into the Hadoop Job History, I can't seem to find the
information of my previous counter values
Anum Ali wrote:
Hi,
Need some kind of guidance on getting started with Hadoop installation and
system setup. I am a newbie regarding Hadoop. Our system OS is Fedora 8;
should I start from a stable release of Hadoop or get the development version
from svn (from the contribute site)?
Thank You
approach, can you point me to an example of what kind of
param should be specified? I appreciate your help.
Thanks,
SD
On Thu, Jan 29, 2009 at 10:49 PM, Amareshwari Sriramadasu
amar...@yahoo-inc.com wrote:
You can use NLineInputFormat for this, which splits one line (N=1, by
default) as one
.)
-Amareshwari
Any thoughts?
John
On Sun, Feb 1, 2009 at 11:00 PM, Amareshwari Sriramadasu
amar...@yahoo-inc.com wrote:
Which version of hadoop are you using?
You can directly use -inputformat
org.apache.hadoop.mapred.lib.NLineInputFormat for your streaming job. You
need not include
Andrew wrote:
I've noticed that task tracker moves all unpacked jars into
${hadoop.tmp.dir}/mapred/local/taskTracker.
We are using a lot of external libraries that are deployed via the -libjars
option. The total number of files after unpacking is about 20 thousand.
After running a number of
Nick Cen wrote:
Hi,
I have a hadoop cluster with 4 PCs, and I want to integrate hadoop and
lucene together, so I copied some of the source code from nutch's Indexer
class. But when I run my job, I found that there is only 1 reducer running
on 1 PC, so the performance is not as good as expected.
Nathan Marz wrote:
I have some unit tests which run MapReduce jobs and test the
inputs/outputs in standalone mode. I recently started using
DistributedCache in one of these jobs, but now my tests fail with
errors such as:
Caused by: java.io.IOException: Incomplete HDFS URI, no host:
Bill Au wrote:
I have enabled persistent completed jobs status and can see them in HDFS.
However, they are not listed in the jobtracker's UI after the jobtracker is
restarted. I thought that the jobtracker would automatically look in HDFS if it
does not find a job in its memory cache. What am I
Yes. The configuration is read only when the taskTracker starts.
You can see more discussion on jira HADOOP-5170
(http://issues.apache.org/jira/browse/HADOOP-5170) for making it per job.
-Amareshwari
jason hadoop wrote:
I certainly hope it changes but I am unaware that it is in the todo queue
You should implement Tool interface and submit jobs.
For example see org.apache.hadoop.examples.WordCount
-Amareshwari
Wu Wei wrote:
Hi,
I used to submit Hadoop jobs with the utility RunJar.main() on hadoop
0.18. On hadoop 0.19, because the commandLineConfig of JobClient was
null, I got a
Arun C Murthy wrote:
On Feb 23, 2009, at 2:01 AM, Bing TANG wrote:
Hi, everyone,
Could someone tell me the principle of -file when using Hadoop
Streaming. I want to ship a big file to the slaves, so how does it work?
Does Hadoop use SCP to copy? How does Hadoop deal with the -file option?
No, -file just
Nathan Marz wrote:
I have a large job operating on over 2 TB of data, with about 5
input splits. For some reason (as yet unknown), tasks started failing
on two of the machines (which got blacklisted). 13 mappers failed in
total. Of those 13, 8 of the tasks were able to execute on another
Are you hitting HADOOP-2771?
-Amareshwari
Sandy wrote:
Hello all,
For the sake of benchmarking, I ran the standard hadoop wordcount example on
an input file using 2, 4, and 8 mappers and reducers for my job.
In other words, I do:
time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m
Is your job a streaming job?
If so, which version of hadoop are you using? What is the configured
value for stream.non.zero.exit.is.failure? Can you set
stream.non.zero.exit.is.failure to true and try again?
Thanks
Amareshwari
Saptarshi Guha wrote:
Hello,
I have given a case where my mapper
This is due to HADOOP-5233. Got fixed in branch 0.19.2
-Amareshwari
Nathan Marz wrote:
Every now and then, I have jobs that stall forever with one map task
remaining. The last map task remaining says it is at 100% and in the
logs, it says it is in the process of committing. However, the task
Till 0.18.x, files are not added to the client-side classpath. Use 0.19,
and run the following command to use a custom input format:
bin/hadoop jar contrib/streaming/hadoop-0.19.0-streaming.jar -mapper
mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input
test.data -output test-output
into future releases.
cheers,
ckw
On Mar 12, 2009, at 8:20 PM, Amareshwari Sriramadasu wrote:
Are you seeing reducers getting spawned from the web UI? Then it is a bug.
If not, there won't be reducers spawned; it could be the job-setup/
job-cleanup task that is running on a reduce slot. See HADOOP-3150
Saptarshi Guha wrote:
Hello,
I would like to produce side effect files which will be later copied
to the outputfolder.
I am using FileOutputFormat, and in the Map's close() method I copy
files (from the local tmp/ folder) to
FileOutputFormat.getWorkOutputPath(job);
Can you look for an exception from Jetty in the JT logs and report it here?
That would tell us the cause of ERROR 500.
Thanks
Amareshwari
Nathan Marz wrote:
Sometimes I am unable to access a job's details and instead only see.
I am seeing this on 0.19.2 branch.
HTTP ERROR: 500
Internal Server Error
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours
to higher values. By default, both are 24 hours. These might be
the reason for the failure, though I'm not sure.
Thanks
Amareshwari
Billy Pearson wrote:
I am seeing on one of my long running jobs about 50-60 hours
Elia Mazzawi wrote:
Is there a command that I can run from the shell that says whether a job
passed or failed?
I found these, but they don't really say pass/fail; they only say what
is running and the percent complete.
This shows what is running:
./hadoop job -list
and this shows the completion:
./hadoop
You can add your jar to distributed cache and add it to classpath by
passing it in configuration propery - mapred.job.classpath.archives.
-Amareshwari
Peter Skomoroch wrote:
If I need to use a custom streaming combiner jar in Hadoop 18.3, is there a
way to add it to the classpath without the
Hi Sandhya,
Which version of Hadoop are you using? There could be attempt_id
directories in mapred/local pre 0.17. Now, there should not be any such
directories.
From version 0.17 onwards, the attempt directories will be present only
at mapred/local/taskTracker/jobCache/jobid/attemptid. If
give some pointers on how to debug it further.
Regards
Sandhya
On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
amar...@yahoo-inc.com wrote:
Hi Sandhya,
Which version of HADOOP are you using? There could be attempt_id
directories in mapred/local, pre 0.17. Now, there should not be any
Hi Lance,
Where are you passing the -libjars parameter? It is now a generic option;
it is no longer a parameter of the jar command.
Thanks
Amareshwari
Lance Riedel wrote:
We are trying to upgrade to 0.20 from 0.19.1 due to several issues we are
having. Now our jobs are failing with class not found
HRoger wrote:
Hi
As you know, in org.apache.hadoop.mapred.jobcontrol.Job there is a
method called addDependingJob, but not in
org.apache.hadoop.mapreduce.Job. Is there some method that works like
addDependingJob in the mapreduce package?
org.apache.hadoop.mapred.jobcontrol.Job is moved to
one job ran after the other job in one class with the new
api?
Amareshwari Sriramadasu wrote:
HRoger wrote:
Hi
As you know in the org.apache.hadoop.mapred.jobcontrol.Job there is a
method called addDependingJob but not in
org.apache.hadoop.mapreduce.Job. Is there some method works like
Is your jar file in local file system or hdfs?
The jar file should be in local fs.
Thanks
Amareshwari
Shravan Mahankali wrote:
I am having a similar issue as well... there is no solution yet!
Thank You,
Shravan Kumar. M
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]
-
Hi Akhil,
DistributedCache.addCacheArchive takes a path on HDFS. From your code, it looks
like you are passing a local path.
Also, if you want to create a symlink, you should pass the URI as hdfs://path#linkname, besides calling
DistributedCache.createSymlink(conf);
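Put together, a sketch of the setup (the namenode address and archive path are illustrative; the archive must already be on HDFS):

```java
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  public static JobConf configure() throws Exception {
    JobConf conf = new JobConf(CacheSetup.class);
    // Allow #linkname fragments to appear as symlinks in the task cwd.
    DistributedCache.createSymlink(conf);
    DistributedCache.addCacheArchive(
        new URI("hdfs://namenode:9000/user/akhil1988/Config.zip#Config"), conf);
    // Map/reduce code can then read ./Config/... relative to its cwd.
    return conf;
  }
}
```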
Thanks
Amareshwari
akhil1988
but still getting the same error:
DistributedCache.addCacheArchive(new
URI("/home/akhil1988/Config.zip#Config"), conf);
Do you think there could be any problem in distributing a zipped
directory and having hadoop unzip it recursively?
Thanks!
Akhil
Amareshwari Sriramadasu wrote:
Hi