Re: Finding where the file blocks are

2009-05-19 Thread Arun C Murthy
On May 19, 2009, at 12:13 AM, Foss User wrote: I know that if a file is very large, it will be split into blocks and the blocks would be spread out in various data nodes. I want to know whether I can find out through GUI or logs exactly which data nodes contain which file blocks of a

Winning a sixty second dash with a yellow elephant

2009-05-11 Thread Arun C Murthy
... oh, and getting it to run a marathon too! http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html Owen Arun

Re: specific block size for a file

2009-05-05 Thread Arun C Murthy
On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote: Hi all, I have a job that creates very big local files so I need to split it to as many mappers as possible. Now the DFS block size I'm using means that this job is only split to 3 mappers. I don't want to change the hdfs wide
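
The answer above is truncated; one approach consistent with the question, as a hedged Java sketch (the class name and numbers are hypothetical): raise the map-count hint, since FileInputFormat of this era computes splitSize = max(minSplitSize, min(totalSize/numMapTasks, blockSize)), so a high enough hint shrinks splits below a full DFS block without touching the cluster-wide setting.

    // Hedged sketch: ask for more maps on this one job only
    JobConf conf = new JobConf(MyJob.class);   // MyJob is hypothetical
    conf.setNumMapTasks(30);                   // hint: goal split = totalSize/30
    // Alternative: write the input with a smaller per-file block size via
    // FileSystem.create(path, overwrite, bufferSize, replication, blockSize)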

Re: What's the difference of RawLocalFileSystem and LocalFileSystem?

2009-04-20 Thread Arun C Murthy
On Apr 20, 2009, at 7:49 PM, Xie, Tao wrote: I am new to hadoop and now begin to look into the code. I want to know the difference between RawLocalFileSystem and LocalFileSystem. I know the latter one has the capability to do checksum. Is that all? Pretty much. Arun

Re: Performance question

2009-04-20 Thread Arun C Murthy
On Apr 20, 2009, at 9:56 AM, Mark Kerzner wrote: Hi, I ran a Hadoop MapReduce task in the local mode, reading and writing from HDFS, and it took 2.5 minutes. Essentially the same operations on the local file system without MapReduce took 1/2 minute. Is this to be expected? Hmm...

Re: Reduce task attempt retry strategy

2009-04-14 Thread Arun C Murthy
On Apr 14, 2009, at 9:11 AM, Jothi Padmanabhan wrote: 2. Framework kills the task because it did not progress enough That should count as a 'failed' task, not 'killed' - it is a bug if we are not counting timed-out tasks against the job... Arun

Re: why SequenceFile cannot run without native GZipCodec?

2009-04-05 Thread Arun C Murthy
On Apr 4, 2009, at 7:05 AM, Zheng Shao wrote: I guess the performance will be bad, but we should still be able to read/write the file. Correct? Why do we throw an Exception? java.util.zip.GzipCodec doesn't expose the underlying codec... that's critical to do a *reset*. The native

Re: wordcount getting slower with more mappers and reducers?

2009-03-05 Thread Arun C Murthy
I assume you have only 2 map and 2 reduce slots per tasktracker - which totals to 2 maps/reduces for your cluster. This means with more maps/reduces they are serialized to 2 at a time. Also, the -m is only a hint to the JobTracker, you might see fewer/more than the number of maps you have

Re: OutOfMemory error processing large amounts of gz files

2009-02-26 Thread Arun C Murthy
On Feb 24, 2009, at 4:03 PM, bzheng wrote: 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker: java.lang.OutOfMemoryError: Java heap space That tells you that your TaskTracker is running out of memory, not your reduce tasks. I think you are hitting

Re: Hadoop Streaming -file option

2009-02-23 Thread Arun C Murthy
On Feb 23, 2009, at 2:01 AM, Bing TANG wrote: Hi, everyone, Could someone tell me the principle of -file when using Hadoop Streaming. I want to ship a big file to slaves, so how does it work? Does Hadoop use SCP to copy? How does Hadoop deal with the -file option? No, -file just copies the file from

Re: Loading native libraries

2009-02-11 Thread Arun C Murthy
On Feb 10, 2009, at 12:24 PM, Mimi Sun wrote: I see UnsatisfiedLinkError. Also I'm calling System.getProperty(java.library.path) in the reducer and logging it. The only thing that prints out is ...hadoop-0.18.2/bin/../lib/native/Mac_OS_X-i386-32 I'm using Cascading, not sure if that

Re: Loading native libraries

2009-02-10 Thread Arun C Murthy
On Feb 10, 2009, at 11:06 AM, Mimi Sun wrote: Hi, I'm new to Hadoop and I'm wondering what the recommended method is for using native libraries in mapred jobs. I've tried the following separately: 1. set LD_LIBRARY_PATH in .bashrc 2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in

Re: Reduce won't start until Map stage reaches 100%?

2009-02-09 Thread Arun C Murthy
On Feb 8, 2009, at 11:26 PM, Taeho Kang wrote: Dear All, With Hadoop 0.19.0, the Reduce stage does not start until the Map stage gets to 100% completion. Has anyone faced a similar situation? How many maps and reduces does your job have? Arun

Re: Completed jobs not finishing, errors in jobtracker logs

2009-02-07 Thread Arun C Murthy
On Feb 6, 2009, at 12:39 PM, Bryan Duxbury wrote: I'm seeing some strange behavior on my cluster. Jobs will be done (that is, all tasks completed), but the job will still be running. This state seems to persist for minutes, and is really killing my throughput. I'm seeing errors

Re: Reporter for Hadoop Streaming?

2009-02-05 Thread Arun C Murthy
On Feb 5, 2009, at 1:40 PM, S D wrote: Is there a way to use the Reporter interface (or something similar such as Counters) with Hadoop streaming? Alternatively, how could STDOUT be intercepted for the purpose of updates? If anyone could point me to documentation or examples that cover

Re: job management in Hadoop

2009-01-30 Thread Arun C Murthy
On Jan 30, 2009, at 2:41 PM, Bill Au wrote: Is there any way to cancel a job after it has been submitted? bin/hadoop job -kill jobid Arun

Re: getting null from CompressionCodecFactory.getCodec(Path file)

2009-01-13 Thread Arun C Murthy
On Jan 13, 2009, at 7:29 AM, Gert Pfeifer wrote: Hi, I want to use an lzo file as input for a mapper. The record reader determines the codec using a CompressionCodecFactory, like this: (Hadoop version 0.19.0) http://hadoop.apache.org/core/docs/r0.19.0/native_libraries.html hth, Arun

Re: A reporter thread during the reduce stage for a long running line

2009-01-09 Thread Arun C Murthy
On Jan 9, 2009, at 12:09 AM, Saptarshi Guha wrote: Hello, Sorry for the puzzling subject. I have a single long running /statement/ in my reduce method, so the framework might assume my reduce is not responding and kill it. I solved the problem in the map method by subclassing MapRunner,
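
A hedged sketch of the reduce-side analogue (runLongStatement and the interval are hypothetical): keep a background thread pinging the Reporter while the long statement runs.

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, final Reporter reporter)
        throws IOException {
      Thread heartbeat = new Thread() {
        public void run() {
          while (!Thread.currentThread().isInterrupted()) {
            reporter.progress();   // tell the framework this task is alive
            try { Thread.sleep(30 * 1000); } catch (InterruptedException e) { return; }
          }
        }
      };
      heartbeat.setDaemon(true);
      heartbeat.start();
      try {
        runLongStatement(key, values, output);   // hypothetical long-running call
      } finally {
        heartbeat.interrupt();                   // always stop the heartbeat
      }
    }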

Re: Out of Memory error in reduce shuffling phase when compression is turned on

2008-12-18 Thread Arun C Murthy
On Dec 18, 2008, at 2:09 PM, Zheng Shao wrote: mapred.compress.map.output is set to true, and the job has 6860 mappers and 300 reducers. Several reducers failed because:out of memory error in the shuffling phase. Error log: 2008-12-18 11:42:46,593 WARN org.apache.hadoop.mapred.ReduceTask:

Re: Reset hadoop servers

2008-12-09 Thread Arun C Murthy
On Dec 9, 2008, at 10:37 AM, Owen O'Malley wrote: On Dec 9, 2008, at 2:22 AM, Devaraj Das wrote: I know that the tasktracker/jobtracker doesn't have any command for re-reading the configuration. There is built-in support for restart/shut-down but those are via external scripts that

Re: getting Configuration object in mapper

2008-12-06 Thread Arun C Murthy
On Dec 5, 2008, at 12:32 PM, Craig Macdonald wrote: I have a related question - I have a class which is both mapper and reducer. How can I tell in configure() if the current task is map or a reduce task? Parse the taskid? Get the taskid, then use
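
A minimal sketch of the taskid approach (assuming the old-API property name mapred.task.id): the id carries an _m_ or _r_ infix.

    public void configure(JobConf job) {
      // e.g. attempt_200812060101_0001_m_000000_0 (map) vs ..._r_... (reduce)
      String taskId = job.get("mapred.task.id");
      boolean isMap = taskId != null && taskId.contains("_m_");
      // ...branch map- vs reduce-specific setup on isMap...
    }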

Re: slow shuffle

2008-12-06 Thread Arun C Murthy
On Dec 5, 2008, at 2:43 PM, Songting Chen wrote: To summarize the slow shuffle issue: 1. I think one problem is that the Reducer starts very late in the process, slowing the entire job significantly. Is there a way to let reducer start earlier?

Re: JobTracker Failing to respond with OutOfMemory error

2008-12-06 Thread Arun C Murthy
On Dec 5, 2008, at 10:58 AM, charles du wrote: Any update on this? What is the available heapsize for the JobTracker? (HADOOP_HEAPSIZE or set it in HADOOP_JOBTRACKER_OPTS in conf/hadoop-env.sh). Do you remember how many total tasks (across all jobs) were executed before the OOM?

Re: JobTracker Failing to respond with OutOfMemory error

2008-12-06 Thread Arun C Murthy
On Dec 6, 2008, at 11:40 AM, charles du wrote: I used the default value, which I believe is 1000 MB. My cluster has about 30 machines. Each machine is configured to run up to 5 tasks. We run hourly, daily jobs on the cluster. When OOM happened, I was running a job with 1500 - 1600

Re: Can I ignore some errors in map step?

2008-12-03 Thread Arun C Murthy
On Dec 3, 2008, at 5:49 AM, Zhou, Yunqing wrote: I'm running a job on a data set of size 5TB. But currently it reports a checksum error block in the file. That causes a map task failure and then the whole job fails. But the lack of a 64MB block will almost not affect the final result. So

Re: Quickstart Docs

2008-11-23 Thread Arun C Murthy
On Nov 23, 2008, at 6:09 AM, Tim Williams wrote: The Quickstart[1] suggests the minimum java version is 1.5.x but I was only successful getting the examples running after using 1.6. Thanks, --tim [1] - http://hadoop.apache.org/core/docs/current/quickstart.html Thanks for pointing this

Re: Dynamically terminate a job once Reporter hits a threshold

2008-11-07 Thread Arun C Murthy
On Nov 7, 2008, at 12:12 PM, Brian MacKay wrote: Looking for a way to dynamically terminate a job once Reporter in a Map job hits a threshold. Example: public void map(WritableComparable key, Text values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException { if(
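
One hedged way to get this effect (not necessarily what the thread settled on): keep a per-task count and, past a threshold, look the job up through JobClient and kill it. The predicate, field names, and threshold are hypothetical.

    private JobConf conf;
    private long hits = 0;
    private static final long THRESHOLD = 1000;   // hypothetical

    public void configure(JobConf conf) { this.conf = conf; }

    public void map(WritableComparable key, Text values,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      if (isInteresting(values)) hits++;          // hypothetical predicate
      if (hits > THRESHOLD) {
        // find our own running job and kill it
        RunningJob self = new JobClient(conf).getJob(conf.get("mapred.job.id"));
        self.killJob();
        return;
      }
      // ...normal map work...
    }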

Re: Debugging / Logging in Hadoop?

2008-10-31 Thread Arun C Murthy
On Oct 30, 2008, at 1:16 PM, Scott Whitecross wrote: Is the presentation online as well? (Hard to see some of the slides in the video) http://wiki.apache.org/hadoop/HadoopPresentations Arun On Oct 30, 2008, at 1:34 PM, Alex Loddengaard wrote: Arun gave a great talk about debugging

Re: TaskTrackers disengaging from JobTracker

2008-10-29 Thread Arun C Murthy
It's possible that the JobTracker is under duress and unable to respond to the TaskTrackers... what do the JobTracker logs say? Arun On Oct 29, 2008, at 12:33 PM, Aaron Kimball wrote: Hi all, I'm working with a 30 node Hadoop cluster that has just started demonstrating some weird behavior.

Re: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1

2008-10-28 Thread Arun C Murthy
On Oct 27, 2008, at 7:05 PM, Grant Ingersoll wrote: Hi, Over in Mahout (lucene.a.o/mahout), we are seeing an oddity with some of our clustering code and Hadoop 0.18.1. The thread in context is at: http://mahout.markmail.org/message/vcyvlz2met7fnthr The problem seems to occur when

Re: Passing Constants from One Job to the Next

2008-10-22 Thread Arun C Murthy
On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote: I like to hear some good ways of passing constants from one job to the next. Unless I'm missing something: JobConf? A HDFS file? DistributedCache? Arun These are some ways that I can think of: 1) The obvious solution is to carry the
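
A minimal sketch of the JobConf route (key and variable names hypothetical): set the constant on the second job's conf in the driver, read it back in configure().

    // Driver: job 1 computed vocabSize; hand it to job 2 through its conf
    JobConf job2 = new JobConf(MyDriver.class);
    job2.set("myapp.vocab.size", Long.toString(vocabSize));

    // Inside job 2's mapper or reducer:
    public void configure(JobConf conf) {
      long vocabSize = Long.parseLong(conf.get("myapp.vocab.size"));
    }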

Re: Gets a number of reduce_output_records

2008-10-14 Thread Arun C Murthy
On Oct 10, 2008, at 12:52 AM, Edward J. Yoon wrote: Hi, To get the number of reduce_output_records, I wrote code like: long rows = rJob.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", 8, "REDUCE_OUTPUT_RECORDS").getCounter();

Re: How to make LZO work?

2008-10-10 Thread Arun C Murthy
There is no liblzo2.so there. Do I need to rename them to liblzo2.so somehow? --- On Thu, 10/9/08, Arun C Murthy [EMAIL PROTECTED] wrote: From: Arun C Murthy [EMAIL PROTECTED] Subject: Re: How to make LZO work? To: core-user@hadoop.apache.org Date: Thursday, October 9, 2008, 6:35 PM On Oct 9

Re: How to make LZO work?

2008-10-10 Thread Arun C Murthy
= SequenceFile.createWriter(fileSys, jobConf, file, LongWritable.class, BytesWritable.class, SequenceFile.CompressionType.BLOCK, new LzoCodec()); Rebuilding the library gave some weird error too. --- On Fri, 10/10/08, Arun C Murthy [EMAIL PROTECTED] wrote: From: Arun C

Re: How to make LZO work?

2008-10-09 Thread Arun C Murthy
On Oct 9, 2008, at 5:58 PM, Songting Chen wrote: Hi, I have installed lzo-2.03 to my Linux box. But still my code for writing a SequenceFile using LZOcodec returns the following error: util.NativeCodeLoader: Loaded the native-hadoop library java.lang.UnsatisfiedLinkError: Cannot load

Re: Sharing an object across mappers

2008-10-03 Thread Arun C Murthy
On Oct 3, 2008, at 1:10 AM, Devajyoti Sarkar wrote: Hi Alan, Thanks for your message. The object can be read-only once it is initialized - I do not need to modify Please take a look at DistributedCache: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache An

Re: Turning off FileSystem statistics during MapReduce

2008-10-03 Thread Arun C Murthy
Nathan, On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote: Hello, We have been doing some profiling of our MapReduce jobs, and we are seeing about 20% of the time of our jobs is spent calling FileSystem$Statistics.incrementBytesRead when we interact with the FileSystem. Is there a way to

Re: architecture diagram

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote: I am trying to plan out my map-reduce implementation and I have some questions of where computation should be split in order to take advantage of the distributed nodes. Looking at the architecture diagram

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Arun C Murthy
On Oct 1, 2008, at 11:07 AM, Per Jacobsson wrote: I ran a job last night with Hadoop 0.18.0 on EC2, using the standard small AMI. The job was producing gzipped output, otherwise I haven't changed the configuration. The final reduce steps failed with this error that I haven't seen before:

Re: LZO and native hadoop libraries

2008-10-01 Thread Arun C Murthy
/jira/browse/HADOOP-3659 You probably will need to monkey with LDFLAGS as well to get it to work, but we've been able to build the native libs for the Mac without too much trouble. Doug Cutting wrote: Arun C Murthy wrote: You need to add libhadoop.so to your java.library.patch. libhadoop.so

Re: Merging of the local FS files threw an exception

2008-10-01 Thread Arun C Murthy
, Oct 1, 2008 at 11:23 AM, Arun C Murthy [EMAIL PROTECTED] wrote: Do you still have the task logs for the reduce? I suspect you are running into http://issues.apache.org/jira/browse/HADOOP-3647 which we could never reproduce reliably enough to pin down or fix. However, in light of http

Re: LZO and native hadoop libraries

2008-09-30 Thread Arun C Murthy
Nathan, You need to add libhadoop.so to your java.library.path. libhadoop.so is available in the corresponding release in the lib/native directory. Arun On Sep 30, 2008, at 11:14 AM, Nathan Marz wrote: I am trying to use SequenceFiles with LZO compression outside the context of a

Re: LZO and native hadoop libraries

2008-09-30 Thread Arun C Murthy
On Sep 30, 2008, at 11:46 AM, Doug Cutting wrote: Arun C Murthy wrote: You need to add libhadoop.so to your java.library.path. libhadoop.so is available in the corresponding release in the lib/native directory. I think he needs to first build libhadoop.so, since he appears

Re: rename return values

2008-09-30 Thread Arun C Murthy
On Sep 30, 2008, at 1:37 PM, Bryan Duxbury wrote: Hey all, Why is it that FileSystem.rename returns true or false instead of throwing an exception? It seems incredibly inconvenient to get a false result and then have to go poring over the namenode logs looking for the actual error
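
Until that changes, a small caller-side wrapper restores exception semantics; a hedged sketch:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsUtil {
      // Throw instead of returning false, so failures carry context
      public static void renameOrThrow(FileSystem fs, Path src, Path dst)
          throws IOException {
        if (!fs.rename(src, dst)) {
          throw new IOException("rename failed: " + src + " -> " + dst);
        }
      }
    }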

Re: Doubt in MultiFileWordCount.java

2008-09-29 Thread Arun C Murthy
On Sep 29, 2008, at 3:11 AM, Geethajini C wrote: Hi everyone, In the example MultiFileWordCount.java (hadoop-0.17.0), what happens when the statement JobClient.runJob(job);is executed. What methods will be called in sequence? This might help:

Re: Jobtracker config?

2008-09-29 Thread Arun C Murthy
On Sep 29, 2008, at 2:52 PM, Saptarshi Guha wrote: Setup: I am running the namenode on A, the sec. namenode on B and the jobtracker on C. The datanodes and tasktrackers are on Z1,Z2,Z3. Problem: However, the jobtracker is starting up on A. Here are my configs for Jobtracker This would

Re: The reduce copier failed

2008-09-26 Thread Arun C Murthy
2008-09-25 17:12:18,250 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200809180916_0027_r_07_2: Got 2 new map-outputs number of known map outputs is 21 2008-09-25 17:12:18,251 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200809180916_0027_r_07_2 Merge of the inmemory files

Re: The reduce copier failed

2008-09-25 Thread Arun C Murthy
On Sep 25, 2008, at 2:26 PM, Joe Shaw wrote: Hi, I'm trying to build an index using the index contrib in Hadoop 0.18.0, but the reduce tasks are consistently failing. What did the logs for the task-attempt 'attempt_200809180916_0027_r_07_2' look like? Did the TIP/Job succeed?

Re: Problems increasing number of tasks per node

2008-09-23 Thread Arun C Murthy
On Sep 23, 2008, at 11:41 AM, Joel Welling wrote: Hi folks; I have a small cluster, but each node is big- 8 cores each, with lots of IO bandwidth. I'd like to increase the number of simultaneous map and reduce tasks scheduled per node from the default of 2 to something like 8. My

Re: setting a different input/output class for combiner function than map and reduce functions

2008-09-23 Thread Arun C Murthy
==map phase== input: key = LongWritable, value = Text; output: key = Text, value = LongWritable ==combiner== input: key = Text, value = iterator<LongWritable>; output: key = Text, value = Text The combiner is a pure optimization and *cannot* change the output types of the map i.e. the combiner
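
For illustration, a type-correct combiner for a map that emits <Text, LongWritable> (a hedged sketch; any conversion to Text belongs in the reducer instead):

    // Combiner: <Text, LongWritable> in, <Text, LongWritable> out
    public static class SumCombiner extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
      public void reduce(Text key, Iterator<LongWritable> values,
                         OutputCollector<Text, LongWritable> output,
                         Reporter reporter) throws IOException {
        long sum = 0;
        while (values.hasNext()) sum += values.next().get();  // partial aggregate
        output.collect(key, new LongWritable(sum));
      }
    }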

Re: Adding $CLASSPATH to Map/Reduce tasks

2008-09-21 Thread Arun C Murthy
On Sep 21, 2008, at 2:05 PM, David Hall wrote: (New to this list) Hi, My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase/development cycle, and in particular we'd like to be able to access user $CLASSPATHs

Re: reduce task progress above 100%?

2008-09-16 Thread Arun C Murthy
On Sep 16, 2008, at 12:26 PM, pvvpr wrote: Hello, A strange thing happened in my job. In reduce phase, one of the tasks status shows 101.44% complete and runs till some 102% and successfully finished back to 100%. Is this a right behavior? Which version of Hadoop are you running; and are

Re: Reduce task failed: org.apache.hadoop.fs.FSError: java.io.IOException

2008-09-11 Thread Arun C Murthy
On Sep 11, 2008, at 9:10 AM, pvvpr wrote: Hello, Never came across this error before. Upgraded to 0.18.0 this morning and ran a nutch fetch job. Got this exception in both the reduce attempts of a task and they failed. All other reducers seemed to work fine, except for one task. Any

Re: Failing MR jobs!

2008-09-09 Thread Arun C Murthy
On Sep 7, 2008, at 12:26 PM, Erik Holstad wrote: Hi! I'm trying to run a MR job, but it keeps on failing and I can't understand why. Sometimes it shows output at 66% and sometimes 98% or so. I had a couple of exceptions before that I didn't catch that made the job fail. The log file

Re: Customize job name in command line

2008-08-22 Thread Arun C Murthy
On Aug 22, 2008, at 2:15 PM, Kevin wrote: Why -jobconf is not recognized, and -D is overwritten by the program code? For Hadoop Streaming: -jobconf mapred.job.name=myjob For java Map-Reduce applications: -Dmapred.job.name=myjob Arun Best, -Kevin On Fri, Aug 22, 2008 at 2:05 PM,

Re: Beginners Questions....

2008-08-22 Thread Arun C Murthy
On Aug 22, 2008, at 11:15 AM, Chris Gray wrote: All, I am using Hadoop using a test case set up by Michael Noll found on his web page (http://www.michael-noll.com). I have successfully run a job on a single Node Cluster from his examples. I am trying to add an additional machine to the

Re: Hadoop over Lustre?

2008-08-21 Thread Arun C Murthy
It wouldn't be too much of a stretch to use Lustre directly... although it isn't trivial either. You'd need to implement the 'FileSystem' interface for Lustre, define a URI scheme (e.g. lfs://) etc. Please take a look at the KFS/S3 implementations. Arun On Aug 21, 2008, at 9:59 AM,

Re: Cannot read reducer values into a list

2008-08-19 Thread Arun C Murthy
On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote: Hello list, Thought I would share this tidbit that frustrated me for a couple of hours. Beware! Hadoop reuses the Writable objects given to the reducer. For example: Yes. http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in
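
The fix, as a minimal sketch: copy each value before stashing it, since the framework hands the reducer the same object on every iteration.

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      List<Text> saved = new ArrayList<Text>();
      while (values.hasNext()) {
        saved.add(new Text(values.next()));  // copy; adding values.next() directly
      }                                      // leaves N references to one reused object
      // ...use saved...
    }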

Re: map tasks and processes

2008-08-15 Thread Arun C Murthy
On Tue, Aug 12, 2008 at 5:07 PM, Arun C Murthy [EMAIL PROTECTED] wrote: On Aug 12, 2008, at 11:21 AM, charles du wrote: Hi: Does hadoop always start a new process for each map task? Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to optimize that. Till HADOOP-249 is fixed

Re: When will hadoop version 0.18 be released?

2008-08-13 Thread Arun C Murthy
On Aug 12, 2008, at 11:51 PM, 11 Nov. wrote: Hi colleagues, As you know, the append writer will be available in version 0.18. We are here waiting for the feature and want to know the rough time of release. It's currently under vote; it should be released by the end of the week if it

Re: sort failing, help?

2008-08-12 Thread Arun C Murthy
io.sort.mb and fs.inmemory.size.mb are way too high given you are using the default -Xmx200m. Bump both down to 100-200 and up -Xmx to 512M via mapred.child.java.opts. Arun On Aug 12, 2008, at 1:26 PM, James Graham (Greywolf) wrote: Environment specifications: Hadoop 0.16.4 (stable)
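
Spelled out as a hedged sketch of that advice (class name hypothetical):

    JobConf conf = new JobConf(MyJob.class);
    conf.set("mapred.child.java.opts", "-Xmx512m");  // heap per map/reduce child
    conf.setInt("io.sort.mb", 100);                  // sort buffer must fit in that heap
    conf.setInt("fs.inmemory.size.mb", 100);         // likewise for the shuffle buffer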

Re: Parameter to clean jobcache???

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 2:35 PM, steph wrote: Is there a tunable in hadoop to make it clean the jobcache entries? JobCache can be extremely greedy in terms of inodes considering that for each task it starts by unjarring the jars -- which can be big. Which version of Hadoop are you running?

Re: Finding per-task log files on the cluster

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 11:52 AM, Stuart Sierra wrote: Hello, list, I've seen this question before, but haven't found an answer. If I run a Hadoop job on a cluster (EC2), how can I download the stdout/stderr logs from each mapper/reducer task? Are they stored somewhere in HDFS, or just on the

Re: hadoop running out of inodes

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 11:01 AM, stephanebrossier wrote: Hi, I am using hadoop 0.16.3 in a production environment. We have been using our system for a few weeks already and are constantly running out of inodes. We've fixed related bugs in 0.17.2 - it should be released in the next couple of

Re: Difference between Hadoop Streaming and Normal mode

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 3:15 PM, Ashish Venugopal wrote: There is definitely functionality in normal mode that is not available in streaming, like the ability to write counters to instrument jobs. I personally just use streaming, so I am interested to see if there are further key differences...

Re: map tasks and processes

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 11:21 AM, charles du wrote: Hi: Does hadoop always start a new process for each map task? Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to optimize that. Till HADOOP-249 is fixed, you could try and launch fewer, fatter maps by doing more work on

Re: Bean Scripting Framework?

2008-07-25 Thread Arun C Murthy
On Jul 25, 2008, at 3:53 PM, Joydeep Sen Sarma wrote: Just as an aside - there is probably a general perception that streaming is really slow (at least I had it). The last I did some profiling (in 0.15) - the primary overheads from streaming came from the scripting language (python is

Re: [PIG LATIN] how to get the size of a data bag

2008-07-18 Thread Arun C Murthy
Charles, The right forum for Pig is [EMAIL PROTECTED], I'm redirecting you there... good luck! Arun On Jul 18, 2008, at 11:51 AM, charles du wrote: Hi: Just started learning hadoop and pig latin. How can I get the number of elements in a data bag? For example, a data bag like the following has

Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Arun C Murthy
On Jul 18, 2008, at 4:53 PM, Steve Gao wrote: Hi All, I am using Hadoop Streaming. I am confused by streaming options: -file and -CacheFile. Seems that they mean the same thing, right? The difference is that -file will 'ship' your file (local file) to the cluster, while

Re: Logging and JobTracker

2008-07-16 Thread Arun C Murthy
On Jul 16, 2008, at 4:09 PM, Kylie McCormick wrote: Hello (Again): I've managed to get Map/Reduce on its feet and running, but the JobClient runs the Map() to 100% then idles. At least, I think it's idling. It's certainly not updating, and I let it run 10+ minutes. I tried to get the

Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Arun C Murthy
# bin/hadoop dfs -put conf input 08/06/29 09:38:42 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead of 1 Looks like your datanode didn't come up, anything in the logs?

Re: RandomTextWriter

2008-07-07 Thread Arun C Murthy
On Jul 7, 2008, at 9:46 AM, Chris K Wensel wrote: Hey all Has anyone had success with RandomTextWriter? I'm finding it fairly unstable on 0.16.x, haven't tried 0.17 yet though. What problems are you seeing? It seems to work fine for me... Arun

Re: Combiner is optional though it is specified?

2008-07-01 Thread Arun C Murthy
On Jul 1, 2008, at 4:04 AM, novice user wrote: Hi all, I have a query regarding the functionality of the combiner. Is it possible for some of the mapper's outputs to skip the combiner and be sent directly to the reducer even though a combiner is specified in the job configuration? Because, I

Re: Help! How to overcome a RemoteException:

2008-07-01 Thread Arun C Murthy
On Jul 1, 2008, at 5:49 AM, boris starchev wrote: a.io.IOException: File /tmp/hadoop-bstarchev/mapred/system/job_200807011532_0001/job.jar could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145) Looks like

Re: hadoop map reduce: additional heap space?

2008-06-27 Thread Arun C Murthy
On Jun 27, 2008, at 4:26 PM, Mori Bellamy wrote: hey all, I was wondering if there's a way to allocate more heap space for each mapper and reducer process that hadoop spawns. I'm getting this error: Use the 'mapred.child.java.opts' parameter:

Re: Issue loading a native library through the DistributedCache

2008-06-12 Thread Arun C Murthy
On Jun 12, 2008, at 6:47 AM, montag wrote: Hi, I'm a new Hadoop user, so if this question is blatantly obvious, I apologize. I'm trying to load a native shared library using the DistributedCache as outlined in https://issues.apache.org/jira/browse/HADOOP-1660?

Re: hadoop benchmarked, too slow to use

2008-06-11 Thread Arun C Murthy
On Jun 11, 2008, at 11:53 AM, Elia Mazzawi wrote: we concatenated the files to bring them close to and less than 64mb and the difference was huge without changing anything else we went from 214 minutes to 3 minutes ! *smile* How many reduces are you running now? 1 or more? Arun Elia

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Arun C Murthy
On Jun 10, 2008, at 2:48 PM, Meng Mao wrote: I'm interested in the same thing -- is there a recommended way to batch Hadoop jobs together? Hadoop Map-Reduce JobControl: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Control and
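
A minimal JobControl sketch (org.apache.hadoop.mapred.jobcontrol; conf1/conf2 are hypothetical JobConfs, in a driver main() that throws Exception):

    Job first = new Job(conf1);
    Job second = new Job(conf2);
    second.addDependingJob(first);         // second starts only after first succeeds

    JobControl control = new JobControl("pipeline");
    control.addJob(first);
    control.addJob(second);

    new Thread(control).start();           // JobControl is a Runnable
    while (!control.allFinished()) {
      Thread.sleep(5000);                  // poll until both jobs are done
    }
    control.stop();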

Re: Streaming --counters question

2008-06-10 Thread Arun C Murthy
On Jun 10, 2008, at 3:16 PM, Miles Osborne wrote: Is there support for counters in streaming? In particular, it would be nice to be able to access these after a job has run. Yes. Streaming applications can update counters in hadoop-0.18: http://issues.apache.org/jira/browse/HADOOP-1328

Re: Questions on how to use DistributedCache

2008-05-22 Thread Arun C Murthy
On May 21, 2008, at 10:45 PM, Taeho Kang wrote: Dear all, I am trying to use the DistributedCache class for distributing files required for running my jobs. While the API documentation provides good guidelines, are there any tips or usage examples (e.g. sample code)?
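
For what it's worth, a minimal usage sketch (paths and class names hypothetical): register an HDFS file in the driver, then open the task-local copy in configure().

    // Driver (main() throws Exception):
    JobConf conf = new JobConf(MyJob.class);
    DistributedCache.addCacheFile(new URI("/user/me/lookup.txt"), conf);

    // Mapper/Reducer:
    public void configure(JobConf job) {
      try {
        Path[] local = DistributedCache.getLocalCacheFiles(job);
        BufferedReader in = new BufferedReader(new FileReader(local[0].toString()));
        // ...read the lookup data...
        in.close();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }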

Re: Node stops working

2008-05-20 Thread Arun C Murthy
On May 20, 2008, at 2:21 AM, Marianne Spiller wrote: Can anyone please give me a hint? The BindExceptions below tell me that there is something already running on those ports... maybe you need to kill them if they are Hadoop daemons. Arun 2008-05-15 12:51:55,077 FATAL

Re: Meaning of Data-local map tasks in the web status gui to MapReduce

2008-05-20 Thread Arun C Murthy
On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote: Hello, Does the Data-local map tasks counter mean the number of tasks that had the input data already present on the machine they are running on, i.e. there wasn't a need to ship the data to them? Yes. Your understanding is

Re: Master Failure

2008-05-19 Thread Arun C Murthy
On May 19, 2008, at 1:37 AM, Fabrizio detto Mario wrote: How does Hadoop manage the failure of the JobTracker (Master Node)? For example, Google's Map/Reduce version aborts the MapReduce computation if the master fails. As with the NameNode and SecondaryNameNode, does a SecondaryJobTracker exist?

Re: why it stopped at Reduce phase?

2008-05-13 Thread Arun C Murthy
Wang, On May 13, 2008, at 8:12 AM, wangxiaowei wrote: hi all: I use two computers A and B as a hadoop cluster; A is the JobTracker and NameNode, and both A and B are slaves. The input data size is about 80 MB, including 100,000 records. The job is to read one record at a time and find some useful

Re: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_200805131750_0005_m_000001_0/file.out.index in any of the configured local directories

2008-05-13 Thread Arun C Murthy
<property> <name>hadoop.tmp.dir</name> <value>tmp_storage</value> </property> Could you try and change the above to an absolute path and check? That path should be relevant on each of the tasktrackers. Of course, you can configure each tasktracker independently by editing its hadoop-site.xml.

Re: How to write simple programs using Hadoop?

2008-05-07 Thread Arun C Murthy
On May 7, 2008, at 12:33 AM, Hadoop wrote: Is there any chance to see some simple programs for Hadoop (such as Hello world, counting numbers 1-10, reading two numbers and printing the larger one, other number, string and file processing examples,...etc) written in Java/C++. It seems

Re: Not allow file split

2008-05-07 Thread Arun C Murthy
On May 7, 2008, at 6:30 AM, Roberto Zandonati wrote: Hi at all, I'm a newbie and I have the following problem. I need to implement an InputFormat such that isSplitable always returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question no. 10). And here there is the problem. I
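
The standard pattern, as a minimal sketch: subclass an existing FileInputFormat and override the protected isSplitable() (class name hypothetical).

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class WholeFileTextInputFormat extends TextInputFormat {
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;   // one map task per file, however many blocks it spans
      }
    }

    // driver: conf.setInputFormat(WholeFileTextInputFormat.class);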

Re: multi-outputs and join-input

2008-05-04 Thread Arun C Murthy
Yi, On May 3, 2008, at 1:02 AM, Yi Wang (王益) wrote: It seems org.apache.hadoop.mapred.join implements the function of joined-inputs. I am wondering whether Hadoop allows Mapper outputs to multiple output channels? There is a MultipleOutputCollector being currently worked on:

Re: DistributedCache on Java Objects?

2008-04-30 Thread Arun C Murthy
On Apr 30, 2008, at 8:14 AM, ncardoso wrote: Hello, I'm using Hadoop for distributed text mining of large collection of documents, and in my optimizing process, I want to speed things up a bit, and I want to know how can I do this step with Hadoop... Each Map process takes a group of

Re: tracking down mapper error

2008-04-24 Thread Arun C Murthy
On Apr 23, 2008, at 8:14 PM, Ashish Venugopal wrote: 2008-04-23 19:43:23,848 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess failed with code 14 Looks like your streaming command failed with error_code of 14, could you check why your command failed?

Re: reducer outofmemoryerror

2008-04-24 Thread Arun C Murthy
On Apr 23, 2008, at 7:51 AM, Apurva Jadhav wrote: There are six reducers and 24000 mappers because there are 24000 files. The number of tasks per node is 2. mapred.child.java.opts is the default value 200m. What is a good value for this? My mappers and reducers are fairly simple and do

Re: Question on how to view the counters of jobs in the job tracker history

2008-04-03 Thread Arun C Murthy
On Apr 3, 2008, at 5:36 PM, Jason Venner wrote: For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the jobs specific counters are also visible. Once the job is only visible in the history page, the counters are not visible. Is it

Re: Hadoop: Multiple map reduce or some better way

2008-03-26 Thread Arun C Murthy
On Mar 26, 2008, at 9:39 AM, Aayush Garg wrote: HI, I am developing a simple inverted index program with hadoop. My map function has the output: <word, doc> and the reducer has: <word, list(docs)> Now I want to use one more mapreduce to remove stop and scrub words from this output.

Re: Hadoop cookbook / snippets site?

2008-03-26 Thread Arun C Murthy
On Mar 26, 2008, at 10:08 AM, Parand Darugar wrote: Hello, Is there a hadoop recipes / snippets / cookbook site? I'm thinking something like the Python Cookbook (http://aspn.activestate.com/ ASPN/Python/Cookbook/) or Django Snippets (http:// www.djangosnippets.org/), where people can post

Re: Hadoop: Multiple map reduce or some better way

2008-03-26 Thread Arun C Murthy
On Mar 26, 2008, at 11:05 AM, Arun C Murthy wrote: On Mar 26, 2008, at 9:39 AM, Aayush Garg wrote: HI, I am developing a simple inverted index program with hadoop. My map function has the output: <word, doc> and the reducer has: <word, list(docs)> Now I want to use one more mapreduce

Re: walkthrough of developing first hadoop app from scratch

2008-03-22 Thread Arun C Murthy
On Mar 21, 2008, at 6:35 PM, Stephen J. Barr wrote: Hello, I am working on developing my first hadoop app from scratch. It is a Monte-Carlo simulation, and I am using the PiEstimator code from the examples as a reference. I believe I have what I want in a .java file. However, I couldn't

Re: libhdfs working for test program when run from ant but failing when run individually

2008-03-18 Thread Arun C Murthy
On Mar 14, 2008, at 11:48 PM, Raghavendra K wrote: Hi, My apologies for bugging the forum again and again. I am able to get the sample program for libhdfs working. I followed these steps. --- compiled using ant --- modified the test-libhdfs.sh to include CLASSPATH, HADOOP_HOME,

Re: Hadoop Quickstart page

2008-03-10 Thread Arun C Murthy
On Mar 10, 2008, at 3:18 PM, Jason Rennie wrote: I just ran through this as a new user and had trouble w/ the JAVA_HOME setting. Per the instructions, I had JAVA_HOME set appropriately (via my .bashrc), but not in conf/hadoop-env.sh. Would be good if 1. under Required Software specified

Re: Problem with LibHDFS

2008-02-22 Thread Arun C Murthy
On Feb 21, 2008, at 3:29 AM, Raghavendra K wrote: Hi, I am able to get Hadoop running and also able to compile the libhdfs. But when I run the hdfs_test program it is giving Segmentation Fault. Unfortunately the documentation for using libhdfs is sparse, our apologies. You'll need
