Re: Hadoop streaming question

2008-03-11 Thread Amareshwari Sriramadasu
Hi Andrey, I think that is a classpath problem. Can you try the patch at https://issues.apache.org/jira/browse/HADOOP-2622 and see if you still have the problem? Thanks Amareshwari. Andrey Pankov wrote: Hi all, I'm still new to Hadoop. I'd like to use Hadoop streaming in order to combine

Re: streaming problem

2008-03-18 Thread Amareshwari Sriramadasu
Hi Andreas, It looks like your mapper is not available to the streaming jar. Where is your mapper script? Did you use the distributed cache to distribute the mapper? You can use -file mapper-script-path on the local fs to make it part of the jar, or use -cacheFile /dist/wordloadmf#workloadmf to distribute the

Re: Hadoop streaming cacheArchive

2008-03-20 Thread Amareshwari Sriramadasu
Norbert Burger wrote: I'm trying to use the cacheArchive command-line option with hadoop-0.15.3-streaming.jar. I'm using the option as follows: -cacheArchive hdfs://host:50001/user/root/lib.jar#lib Unfortunately, my Perl scripts fail with an error consistent with not being able to find

Re: Hadoop streaming performance problem

2008-04-01 Thread Amareshwari Sriramadasu
LineRecordReader.readLine() is deprecated by HADOOP-2285 (http://issues.apache.org/jira/browse/HADOOP-2285) because it was slow, but streaming still uses the method. HADOOP-2826 (http://issues.apache.org/jira/browse/HADOOP-2826) will remove its usage in streaming. This change should improve

Re: Question on how to view the counters of jobs in the job tracker history

2008-04-07 Thread Amareshwari Sriramadasu
Arun C Murthy wrote: On Apr 3, 2008, at 5:36 PM, Jason Venner wrote: For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the job-specific counters are also visible. Once the job is only visible in the history page, the counters are not

Re: Newbie InputFormat Question

2008-05-08 Thread Amareshwari Sriramadasu
You can have a look at TextInputFormat, KeyValueTextInputFormat etc. at http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/ coneybeare wrote: I want to alter the default <key, line> input format to be <key, line number: + line> so that my mapper can have a reference

Re: External Jar

2008-05-29 Thread Amareshwari Sriramadasu
You can put your external jar in the DistributedCache and symlink it in the current working directory of the task by setting mapred.create.symlink to true. More details can be found at http://issues.apache.org/jira/browse/HADOOP-1660. The jar can also be added to the classpath
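For reference, a minimal sketch of that setup with the old org.apache.hadoop.mapred API; the HDFS path and class name are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class ExternalJarSetup {
      public static JobConf configure() throws Exception {
        JobConf conf = new JobConf(ExternalJarSetup.class);
        // Ship a jar that is already on HDFS (hypothetical path) and symlink it
        // as "external.jar" in each task's current working directory.
        DistributedCache.addCacheFile(
            new URI("hdfs://namenode:9000/libs/external.jar#external.jar"), conf);
        DistributedCache.createSymlink(conf);  // same effect as mapred.create.symlink=true
        return conf;
      }
    }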

Re: Why is there a seperate map and reduce task capacity?

2008-06-16 Thread Amareshwari Sriramadasu
Taeho Kang wrote: Set mapred.tasktracker.tasks.maximum and each node will be able to process N tasks - map and/or reduce. Please note that once you set mapred.tasktracker.tasks.maximum, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings will not

Re: Tasktrackers job cache directories not always cleaned up

2008-07-09 Thread Amareshwari Sriramadasu
The proposal on http://issues.apache.org/jira/browse/HADOOP-3386 takes care of this. Thanks Amareshwari Amareshwari Sriramadasu wrote: If the task tracker didn't receive a KillJobAction, it's true that the job directory will not be removed. And your observation is correct that some task trackers didn't

Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it makes sense? Thanks Amareshwari Paco NATHAN wrote: We need to access data found in the JobTracker History link, specifically in the Analyse This Job analysis. It must be run in Java, between jobs, in the same code

Re: JobTracker History data+analysis

2008-07-28 Thread Amareshwari Sriramadasu
Paco NATHAN wrote: Thank you, Amareshwari - that helps. I hadn't noticed HistoryViewer before. It has no Javadoc. What is a typical usage? In other words, what would the outputDir value be in the context of ToolRunner, JobClient, etc.? Paco On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari

Re: JobTracker History data+analysis

2008-07-28 Thread Amareshwari Sriramadasu
for the same, https://issues.apache.org/jira/browse/HADOOP-3850. You can give your inputs there. Thanks Amareshwari Paco On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote: HistoryViewer is used in JobClient to view the history files in the directory provided

Re: Where can i download hadoop-0.17.1-examples.jar

2008-07-30 Thread Amareshwari Sriramadasu
Hi Srilatha, You can download the Hadoop release tarball from http://hadoop.apache.org/core/releases.html You will find hadoop-*-examples.jar when you untar it. Thanks, Amareshwari us latha wrote: Hi All, I am trying to run the wordcount example on a single-node Hadoop setup. Could anyone please

Re: Running mapred job from remote machine to a pseudo-distributed hadoop

2008-08-03 Thread Amareshwari Sriramadasu
Arv Mistry wrote: I'll try again, can anyone tell me should it be possible to run hadoop in a pseudo-distributed mode (i.e. everything on one machine) and then submit a mapred job using the ToolRunner from another machine on that hadoop configuration? Cheers Arv Yes. It is possible to do.

Re: Could not find any valid local directory for task

2008-08-03 Thread Amareshwari Sriramadasu
The error Could not find any valid local directory for task means that the task could not find a local directory to write a file to, mostly because there is not enough space on any of the disks. Thanks Amareshwari Shirley Cohen wrote: Hi, Does anyone know what the following error means?

Re: mapper input file name

2008-08-03 Thread Amareshwari Sriramadasu
You can get the file name accessed by the mapper using the config property map.input.file Thanks Amareshwari Deyaa Adranale wrote: Hi, I need to know inside my mapper, the name of the file that contains the current record. I saw that I can access the name of the input directories inside
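For reference, a sketch of reading that property inside a mapper with the old API; the class and output layout are illustrative, not from the original thread:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private String inputFile;

      public void configure(JobConf job) {
        inputFile = job.get("map.input.file");   // file backing this map's split
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        out.collect(new Text(inputFile), value); // tag each record with its source file
      }
    }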

Re: help,error ...failed to report status for xxx seconds...

2008-08-03 Thread Amareshwari Sriramadasu
The MapReduce framework kills map/reduce tasks if they don't report status within 10 minutes. If your mapper/reducer needs more time, it should report status using http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html More documentation at
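A sketch of keeping a long-running map alive with the Reporter API; the work loop is purely illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        for (int step = 0; step < 100; step++) {
          // ... expensive work on this record ...
          reporter.setStatus("processed step " + step); // updates the task status string
          reporter.progress();                          // tells the framework the task is alive
        }
      }
    }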

Re: input files

2008-08-20 Thread Amareshwari Sriramadasu
You can add more paths to input using FileInputFormat.addInputPath(JobConf, Path). You can also specify comma separated filenames as input path using FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths) More details at
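For example, with the old FileInputFormat helpers (the paths are hypothetical):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class InputPathsExample {
      public static void setInputs(JobConf conf) {
        // Add paths one at a time ...
        FileInputFormat.addInputPath(conf, new Path("/data/logs/2008-08-01"));
        FileInputFormat.addInputPath(conf, new Path("/data/logs/2008-08-02"));
        // ... or set them all at once as a comma-separated list.
        FileInputFormat.setInputPaths(conf, "/data/logs/2008-08-01,/data/logs/2008-08-02");
      }
    }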

Re: Logging best practices?

2008-09-08 Thread Amareshwari Sriramadasu
Per Jacobsson wrote: Hi all. I've got a beginner question: Are there any best practices for how to do logging from a task? Essentially I want to log warning messages under certain conditions in my map and reduce tasks, and be able to review them later. stdout, stderr and the logs using

Re: streaming question

2008-09-14 Thread Amareshwari Sriramadasu
Dennis Kubes wrote: If I understand what you are asking, you can use -cacheArchive with the path to the jar to include the jar file in the classpath of your streaming job. Dennis You can also use the -cacheArchive option to include the jar file and symlink the unjarred directory from the cwd by

Re: streaming question

2008-09-16 Thread Amareshwari Sriramadasu
in the local running directory, correct? Just like the cacheFile option? If not how can i then specify which class to use? cheers, Christian Amareshwari Sriramadasu wrote: Dennis Kubes wrote: If I understand what you are asking you can use the -cacheArchive with the path to the jar

Re: LZO and native hadoop libraries

2008-09-30 Thread Amareshwari Sriramadasu
Are you seeing HADOOP-2009? Thanks Amareshwari Nathan Marz wrote: Unfortunately, setting those environment variables did not help my issue. It appears that the HADOOP_LZO_LIBRARY variable is not defined in both LzoCompressor.c and LzoDecompressor.c. Where is this variable supposed to be set?

Re: streaming silently failing when executing binaries with unresolved dependencies

2008-10-02 Thread Amareshwari Sriramadasu
This is because a non-zero exit status of the streaming process was not treated as a failure until 0.17. In 0.17, you can set the configuration property stream.non.zero.exit.is.failure to true to treat a non-zero exit as a failure. From 0.18, the default value for

Re: Using different file systems for Map Reduce job input and output

2008-10-06 Thread Amareshwari Sriramadasu
Hi Naama, Yes, it is possible, using the APIs FileInputFormat#setInputPaths() and FileOutputFormat#setOutputPath(). You can specify the FileSystem URI in the path. Thanks, Amareshwari Naama Kraus wrote: Hi, I wanted to know if it is possible to use different file systems for Map
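A sketch of mixing file systems by embedding the FS URI in the paths; the URIs are hypothetical:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MixedFileSystems {
      public static void configure(JobConf conf) {
        // Read from HDFS ...
        FileInputFormat.setInputPaths(conf, new Path("hdfs://namenode:9000/user/naama/input"));
        // ... and write to the local file system (or any other supported FS).
        FileOutputFormat.setOutputPath(conf, new Path("file:///tmp/job-output"));
      }
    }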

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Amareshwari Sriramadasu
Hi, From 0.19, the jars added using -libjars are also available on the client classpath, fixed by HADOOP-3570. Thanks Amareshwari Mahadev Konar wrote: Hi Tarandeep, the libjars option does not add the jar on the client side. There is an open jira for that (I don't remember which one)...

Re: Problems running the Hadoop Quickstart

2008-10-20 Thread Amareshwari Sriramadasu
Has your tasktracker started? That is, do you see a non-zero number of nodes on your jobtracker UI? -Amareshwari John Babilon wrote: Hello, I've been trying to get Hadoop up and running on a Windows desktop running Windows XP. I've installed Cygwin and Hadoop. When I run the start-all.sh script, it

Re: How do I include customized InputFormat, InputSplit and RecordReader in a C++ pipes job?

2008-10-28 Thread Amareshwari Sriramadasu
Hi, How are you passing your classes to the pipes job? If you are passing them as a jar file, you can use -libjars option. From branch 0.19, the libjar files are added to the client classpath also. Thanks Amareshwari Zhengguo 'Mike' SUN wrote: Hi, I implemented customized classes for

Re: How do I include customized InputFormat, InputSplit and RecordReader in a C++ pipes job?

2008-10-29 Thread Amareshwari Sriramadasu
ways to do that? Thanks Mike From: Amareshwari Sriramadasu [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Tuesday, October 28, 2008 11:58:33 PM Subject: Re: How do I include customized InputFormat, InputSplit and RecordReader in a C++ pipes job? Hi, How

Re: Debugging / Logging in Hadoop?

2008-10-31 Thread Amareshwari Sriramadasu
Some more links: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Other+Useful+Features http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Debugging -Amareshwari Arun C Murthy wrote: On Oct 30, 2008, at 1:16 PM, Scott Whitecross wrote: Is the presentation online

Re: _temporary directories not deleted

2008-11-04 Thread Amareshwari Sriramadasu
Nathan Marz wrote: Hello all, Occasionally when running jobs, Hadoop fails to clean up the _temporary directories it has left behind. This only appears to happen when a task is killed (aka a speculative execution), and the data that task has outputted so far is not cleaned up. Is this a

Re: reading input for a map function from 2 different files?

2008-11-09 Thread Amareshwari Sriramadasu
some speed wrote: I was wondering if it was possible to read the input for a map function from 2 different files: 1st file --- user-input file from a particular location (path); 2nd file --- a resultant file (has just one key,value pair) from a previous MapReduce job. (I am implementing a chain

Re: distributed cache

2008-11-11 Thread Amareshwari Sriramadasu
Jeremy Pinkham wrote: We are using the distributed cache in one of our jobs and have noticed that the local copies on all of the task nodes never seem to get cleaned up. Is there a mechanism in the API to tell the framework that those copies are no longer needed so they can be deleted. I've

Re: NLine Input Format

2008-11-16 Thread Amareshwari Sriramadasu
Hi Rahul, How did you set the configuration mapred.line.input.format.linespermap and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job. NLineInputFormat splits N lines of input as one split, so each map gets N lines. But the RecordReader
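For reference, a sketch of wiring that up in the job configuration, assuming org.apache.hadoop.mapred.lib.NLineInputFormat; the value 10 is just an example:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class NLineSetup {
      public static void configure(JobConf conf) {
        conf.setInputFormat(NLineInputFormat.class);
        // Each map task receives a split of (at most) 10 input lines.
        conf.setInt("mapred.line.input.format.linespermap", 10);
      }
    }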

Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu
that it returns the value as N lines? Setting the configuration in the run() method will also work. You have to extend LineRecordReader and override the next() method to return N lines as the value instead of one line. Thanks Amareshwari Thanks Rahul On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu [EMAIL

Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu
that it returns the value as N lines? Thanks Rahul On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote: Hi Rahul, How did you set the configuration mapred.line.input.format.linespermap and your input format? You have to set them

Re: Newbie: error=24, Too many open files

2008-11-23 Thread Amareshwari Sriramadasu
tim robertson wrote: Hi all, I am running MR which is scanning 130M records and then trying to group them into around 64,000 files. The Map does the grouping of the record by determining the key, and then I use a MultipleTextOutputFormat to write the file based on the key: @Override

Re: how can I decommission nodes on-the-fly?

2008-11-25 Thread Amareshwari Sriramadasu
Jeremy Chow wrote: Hi list, I added a property dfs.hosts.exclude to my conf/hadoop-site.xml, then refreshed my cluster with the command bin/hadoop dfsadmin -refreshNodes It showed that it can only shut down the DataNode process, but not the TaskTracker process, on each

Re: Error with Sequence File in hadoop-18

2008-11-27 Thread Amareshwari Sriramadasu
It got fixed in 0.18.3 (HADOOP-4499). -Amareshwari Palleti, Pallavi wrote: Hi, I am getting Check sum

Re: Optimized way

2008-12-04 Thread Amareshwari Sriramadasu
Hi Aayush, Do you want one map to run one command? You can give an input file consisting of lines of the form file outputfile. Use NLineInputFormat, which splits N lines of input as one split, i.e., gives N lines to one map for processing. By default, N is one. Then your map can just run the shell command
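A sketch of a mapper along those lines; it assumes each input line is itself the shell command to run, which is only one possible layout:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CommandRunnerMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      public void map(LongWritable key, Text line,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        // With NLineInputFormat and linespermap=1, each map gets exactly one line.
        Process p = Runtime.getRuntime().exec(new String[] {"sh", "-c", line.toString()});
        try {
          int exitCode = p.waitFor();
          out.collect(line, new Text("exit=" + exitCode));
        } catch (InterruptedException e) {
          throw new IOException(e.toString());
        }
      }
    }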

Re: Reducing Hadoop Logs

2008-12-09 Thread Amareshwari Sriramadasu
Arv Mistry wrote: I'm using hadoop 0.17.0. Unfortunately I cant upgrade to 0.19.0 just yet. I'm trying to control the amount of extraneous files. I noticed there are the following log files produced by hadoop; On Slave - userlogs (for each map/reduce job)

Re: Failed to start TaskTracker server

2008-12-22 Thread Amareshwari Sriramadasu
You can set the configuration property mapred.task.tracker.http.address to 0.0.0.0:0. If the port is given as 0, then the server will start on a free port. Thanks Amareshwari Sagar Naik wrote: - check hadoop-default.xml; in it you will find all the ports used. Copy the xml-nodes from

Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
You can report status from streaming job by emitting reporter:status:message in stderr. See documentation @ http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F But from the exception trace, it doesn't look like lack of

Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0075_r_00_2' from 'tracker_hnode1.cor.mystrands.in:localhost/127.0.0.1:37971' Thanks, RDH On Dec 23, 2008, at 1:00 AM, Amareshwari Sriramadasu wrote: You can report status from streaming job by emitting reporter:status:message

Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Caught it in action. Running ps -e -o 'vsz pid ruser args' | sort -nr | head -5 on a machine where the map task was running: 04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java

Re: Does anyone have a working example for using MapFiles on the DistributedCache?

2008-12-28 Thread Amareshwari Sriramadasu
Sean Shanny wrote: To all, Version: hadoop-0.17.2.1-core.jar I have created a MapFile. What I don't seem to be able to do is correctly place the MapFile in the DistributedCache and the make use of it in a map method. I need the following info please: 1.How and where to place the

Re: Problem loading hadoop-site.xml - dumping parameters

2008-12-29 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Hello, I had previously emailed regarding heap size issue and have discovered that the hadoop-site.xml is not loading completely, i.e Configuration defaults = new Configuration(); JobConf jobConf = new JobConf(defaults, XYZ.class);

Re: hadoop job -history

2009-01-15 Thread Amareshwari Sriramadasu
jobOutputDir is the location specified by the configuration property hadoop.job.history.user.location. If you don't specify anything for the property, the job history logs will be created in the job's output directory. So, to view your history, give your jobOutputDir; if you haven't specified any

Re: streaming question.

2009-01-18 Thread Amareshwari Sriramadasu
You can also have a look at NLineInputFormat. @http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html Thanks Amareshwari Abdul Qadeer wrote: Dmitry, If you are talking about Text data, then the splits can be anywhere. But LineRecordReader will take

Re: Calling a mapreduce job from inside another

2009-01-18 Thread Amareshwari Sriramadasu
You can use Job Control. See http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Job+Control http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html and
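For reference, a minimal sketch of chaining two jobs with the old jobcontrol API; conf1 and conf2 are assumed to be fully configured JobConfs:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainedJobs {
      public static void run(JobConf conf1, JobConf conf2) throws Exception {
        Job first = new Job(conf1);
        Job second = new Job(conf2);
        second.addDependingJob(first);       // second starts only after first succeeds

        JobControl control = new JobControl("chain");
        control.addJob(first);
        control.addJob(second);

        Thread runner = new Thread(control); // JobControl implements Runnable
        runner.start();
        while (!control.allFinished()) {
          Thread.sleep(1000);
        }
        control.stop();
      }
    }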

Re: How to debug a MapReduce application

2009-01-18 Thread Amareshwari Sriramadasu
From the exception you pasted, it looks like your io.serializations property did not set up the SerializationFactory properly. Do you see any logs on your console for adding the serialization class? Can you try running your app in pseudo-distributed mode instead of the LocalJobRunner? You can find pseudo

Re: NLineInputFormat and very high number of maptasks

2009-01-20 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Sorry, I see - every line is now a map task - one split, one task (in this case N=1 line per split). Is that correct? Saptarshi You are right. NLineInputFormat splits N lines of input as one split, and each split is given to a map task. By default, N is 1. N can be configured

Re: Debugging in Hadoop

2009-01-26 Thread Amareshwari Sriramadasu
patektek wrote: Hello list, I am trying to add some functionality to Hadoop-core and I am having serious issues debugging it. I have searched in the list archive and still have not been able to resolve the issues. Simple question: If I want to insert LOG.INFO() statements in Hadoop code is not

Re: Interrupting JobClient.runJob

2009-01-27 Thread Amareshwari Sriramadasu
Edwin wrote: Hi I am looking for a way to interrupt a thread that entered JobClient.runJob(). The runJob() method keep polling the JobTracker until the job is completed. After reading the source code, I know that the InterruptException is caught in runJob(). Thus, I can't interrupt it using

Re: Hadoop Streaming Semantics

2009-01-29 Thread Amareshwari Sriramadasu
You can use NLineInputFormat for this, which splits one line (N=1, by default) as one split. So, each map task processes one line. See http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html -Amareshwari S D wrote: Hello, I have a clarifying

Re: Counters in Hadoop

2009-01-29 Thread Amareshwari Sriramadasu
Kris Jirapinyo wrote: Hi all, I am using counters in Hadoop via the reporter. I can see this custom counter fine after I run my job. However, if somehow I restart the cluster, then when I look into the Hadoop Job History, I can't seem to find the information of my previous counter values
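A sketch of defining and incrementing a custom counter via the Reporter and reading it back from the completed job; the counter name and class are illustrative:

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterExample {
      public enum MyCounters { BAD_RECORDS }

      // Call from inside a map() or reduce() method:
      static void markBadRecord(Reporter reporter) {
        reporter.incrCounter(MyCounters.BAD_RECORDS, 1);
      }

      // After the job completes, read the counter back from the RunningJob:
      static long readCounter(JobConf conf) throws Exception {
        RunningJob job = JobClient.runJob(conf);
        Counters counters = job.getCounters();
        return counters.getCounter(MyCounters.BAD_RECORDS);
      }
    }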

Re: [ANNOUNCE] Hadoop release 0.18.3 available

2009-01-30 Thread Amareshwari Sriramadasu
Anum Ali wrote: Hi, I need some guidance related to getting started with Hadoop installation and system setup. I am a newbie regarding Hadoop. Our system OS is Fedora 8; should I start from a stable release of Hadoop or get the development version from svn (from the contribute site)? Thank You

Re: Hadoop Streaming Semantics

2009-02-01 Thread Amareshwari Sriramadasu
approach, can you point me to an example of what kind of param should be specified? I appreciate your help. Thanks, SD On Thu, Jan 29, 2009 at 10:49 PM, Amareshwari Sriramadasu amar...@yahoo-inc.com wrote: You can use NLineInputFormat for this, which splits one line (N=1, by default) as one

Re: Hadoop Streaming Semantics

2009-02-02 Thread Amareshwari Sriramadasu
.) -Amareshwari Any thoughts? John On Sun, Feb 1, 2009 at 11:00 PM, Amareshwari Sriramadasu amar...@yahoo-inc.com wrote: Which version of hadoop are you using? You can directly use -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat for your streaming job. You need not include

Re: Task tracker archive contains too many files

2009-02-04 Thread Amareshwari Sriramadasu
Andrew wrote: I've noticed that the task tracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker. We are using a lot of external libraries that are deployed via the -libjars option. The total number of files after unpacking is about 20 thousand. After running a number of

Re: only one reducer running in a hadoop cluster

2009-02-08 Thread Amareshwari Sriramadasu
Nick Cen wrote: Hi, I have a Hadoop cluster with 4 PCs. I want to integrate Hadoop and Lucene together, so I copied some of the source code from Nutch's Indexer class, but when I run my job, I find that there is only 1 reducer running on 1 PC, so the performance is not as good as expected.

Re: Testing with Distributed Cache

2009-02-10 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host:

Re: Persistent completed jobs status not showing in jobtracker UI

2009-02-18 Thread Amareshwari Sriramadasu
Bill Au wrote: I have enabled persistent completed jobs status and can see them in HDFS. However, they are not listed in the jobtracker's UI after the jobtracker is restarted. I thought that jobtracker will automatically look in HDFS if it does not find a job in its memory cache. What am I

Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf

2009-02-18 Thread Amareshwari Sriramadasu
Yes. The configuration is read only when the taskTracker starts. You can see more discussion on jira HADOOP-5170 (http://issues.apache.org/jira/browse/HADOOP-5170) for making it per job. -Amareshwari jason hadoop wrote: I certainly hope it changes but I am unaware that it is in the todo queue

Re: How to use Hadoop API to submit job?

2009-02-20 Thread Amareshwari Sriramadasu
You should implement the Tool interface and submit jobs through it. For an example, see org.apache.hadoop.examples.WordCount -Amareshwari Wu Wei wrote: Hi, I used to submit Hadoop jobs with the utility RunJar.main() on hadoop 0.18. On hadoop 0.19, because the commandLineConfig of JobClient was null, I got a
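A skeleton of submitting a job through the Tool interface, along the lines of the WordCount example; the job name, paths, and class name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyJob.class);
        conf.setJobName("my-job");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // set mapper, reducer, and output types here ...
        JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
    }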

Re: Hadoop Streaming -file option

2009-02-24 Thread Amareshwari Sriramadasu
Arun C Murthy wrote: On Feb 23, 2009, at 2:01 AM, Bing TANG wrote: Hi everyone, Could someone tell me the principle of -file when using Hadoop streaming? I want to ship a big file to the slaves, so how does it work? Does Hadoop use SCP to copy? How does Hadoop deal with the -file option? No, -file just

Re: FAILED_UNCLEAN?

2009-02-24 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have a large job operating on over 2 TB of data, with about 5 input splits. For some reason (as yet unknown), tasks started failing on two of the machines (which got blacklisted). 13 mappers failed in total. Of those 13, 8 of the tasks were able to execute on another

Re: wordcount getting slower with more mappers and reducers?

2009-03-05 Thread Amareshwari Sriramadasu
Are you hitting HADOOP-2771? -Amareshwari Sandy wrote: Hello all, For the sake of benchmarking, I ran the standard hadoop wordcount example on an input file using 2, 4, and 8 mappers and reducers for my job. In other words, I do: time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m

Re: Throwing an IOException in Map, yet task does not fail

2009-03-05 Thread Amareshwari Sriramadasu
Is your job a streaming job? If so, which version of hadoop are you using? What is the configured value of stream.non.zero.exit.is.failure? Can you set stream.non.zero.exit.is.failure to true and try again? Thanks Amareshwari Saptarshi Guha wrote: Hello, I have given a case where my mapper

Re: Jobs stalling forever

2009-03-10 Thread Amareshwari Sriramadasu
This is due to HADOOP-5233. Got fixed in branch 0.19.2 -Amareshwari Nathan Marz wrote: Every now and then, I have jobs that stall forever with one map task remaining. The last map task remaining says it is at 100% and in the logs, it says it is in the process of committing. However, the task

Re: streaming inputformat: class not found

2009-03-11 Thread Amareshwari Sriramadasu
Until 0.18.x, files are not added to the client-side classpath. Use 0.19, and run the following command to use a custom input format: bin/hadoop jar contrib/streaming/hadoop-0.19.0-streaming.jar -mapper mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test.data -output test-output

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-15 Thread Amareshwari Sriramadasu
into future releases. cheers, ckw On Mar 12, 2009, at 8:20 PM, Amareshwari Sriramadasu wrote: Are you seeing reducers getting spawned in the web UI? Then it is a bug. If not, there won't be reducers spawned; it could be the job-setup/job-cleanup task that is running on a reduce slot. See HADOOP-3150

Re: Task Side Effect files and copying(getWorkOutputPath)

2009-03-16 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Hello, I would like to produce side effect files which will be later copied to the outputfolder. I am using FileOuputFormat, and in the Map's close() method i copy files (from the local tmp/ folder) to FileOutputFormat.getWorkOutputPath(job);

Re: Unable to access job details

2009-03-22 Thread Amareshwari Sriramadasu
Can you look for the exception from Jetty in the JT logs and report it here? That would tell us the cause of ERROR 500. Thanks Amareshwari Nathan Marz wrote: Sometimes I am unable to access a job's details and instead only see the following. I am seeing this on the 0.19.2 branch. HTTP ERROR: 500 Internal Server Error

Re: reduce task failing after 24 hours waiting

2009-03-25 Thread Amareshwari Sriramadasu
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to a higher value. By default, their values are 24 hours. These might be the reason for the failure, though I'm not sure. Thanks Amareshwari Billy Pearson wrote: I am seeing on one of my long running jobs about 50-60 hours

Re: job status from command prompt

2009-04-05 Thread Amareshwari Sriramadasu
Elia Mazzawi wrote: Is there a command that I can run from the shell that says whether a job passed or failed? I found these, but they don't really say pass/fail; they only say what is running and the percent complete. This shows what is running: ./hadoop job -list and this shows the completion: ./hadoop

Re: Hadoop streaming performance: elements vs. vectors

2009-04-05 Thread Amareshwari Sriramadasu
You can add your jar to the distributed cache and add it to the classpath by passing it in the configuration property mapred.job.classpath.archives. -Amareshwari Peter Skomoroch wrote: If I need to use a custom streaming combiner jar in Hadoop 18.3, is there a way to add it to the classpath without the
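A sketch of that idea under the assumption that the jar is already on HDFS; the path is hypothetical and the exact semantics of the property may differ by release:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CombinerClasspathSetup {
      public static void configure(JobConf conf) throws Exception {
        // Ship the jar through the distributed cache ...
        DistributedCache.addCacheArchive(new URI("hdfs://namenode:9000/libs/combiner.jar"), conf);
        // ... and name it in the property so the framework puts it on the task classpath.
        conf.set("mapred.job.classpath.archives", "/libs/combiner.jar");
      }
    }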

Re: intermediate files of killed tasks not purged

2009-04-28 Thread Amareshwari Sriramadasu
Hi Sandhya, Which version of Hadoop are you using? There could be attempt_id directories in mapred/local pre-0.17; now, there should not be any such directories. From version 0.17 onwards, the attempt directories will be present only at mapred/local/taskTracker/jobCache/jobid/attemptid. If

Re: intermediate files of killed tasks not purged

2009-04-28 Thread Amareshwari Sriramadasu
give some pointers on how to debug it further. Regards Sandhya On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu amar...@yahoo-inc.com wrote: Hi Sandhya, Which version of HADOOP are you using? There could be attempt_id directories in mapred/local, pre 0.17. Now, there should not be any

Re: external jars in .20

2009-06-01 Thread Amareshwari Sriramadasu
Hi Lance, Where are you passing the -libjars parameter? It is now a generic option; it is no longer a parameter of the jar command. Thanks Amareshwari Lance Riedel wrote: We are trying to upgrade to .20 from 19.1 due to several issues we are having. Now our jobs are failing with class not found

Re: where is the addDependingJob?

2009-06-24 Thread Amareshwari Sriramadasu
HRoger wrote: Hi, as you know, org.apache.hadoop.mapred.jobcontrol.Job has a method called addDependingJob, but org.apache.hadoop.mapreduce.Job does not. Is there some method that works like addDependingJob in the mapreduce package? org.apache.hadoop.mapred.jobcontrol.Job is moved to

Re: where is the addDependingJob?

2009-06-24 Thread Amareshwari Sriramadasu
one job ran after the other job in one class with the new api? Amareshwari Sriramadasu wrote: HRoger wrote: Hi As you know in the org.apache.hadoop.mapred.jobcontrol.Job there is a method called addDependingJob but not in org.apache.hadoop.mapreduce.Job.Is there some method works like

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Amareshwari Sriramadasu
Is your jar file on the local file system or on HDFS? The jar file should be on the local fs. Thanks Amareshwari Shravan Mahankali wrote: I am having a similar problem as well... there is no solution yet!!! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company] -

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Hi Akhil, DistributedCache.addCacheArchive takes a path on HDFS. From your code, it looks like you are passing a local path. Also, if you want to create a symlink, you should pass the URI as hdfs://path#linkname, besides calling DistributedCache.createSymlink(conf); Thanks Amareshwari akhil1988
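A sketch of what the reply suggests; the HDFS location is hypothetical and the archive must already be on HDFS:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheArchiveSetup {
      public static void configure(JobConf conf) throws Exception {
        // The archive lives on HDFS; "#Config" symlinks the unpacked
        // directory as "Config" in each task's working directory.
        DistributedCache.addCacheArchive(
            new URI("hdfs://namenode:9000/user/akhil1988/Config.zip#Config"), conf);
        DistributedCache.createSymlink(conf);
      }
    }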

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
but still getting the same error: DistributedCache.addCacheArchive(new URI(/home/akhil1988/Config.zip#Config), conf); Do you think there could be any problem with distributing a zipped directory and then having Hadoop unzip it recursively? Thanks! Akhil Amareshwari Sriramadasu wrote: Hi