Re: different input/output formats

2012-05-29 Thread Mark question
at same Application . >I am not getting any error. > > > On Wed, May 30, 2012 at 1:27 AM, Mark question wrote: > >> Hi guys, this is a very simple program, trying to use TextInputFormat and >> SequenceFileOutputFormat. Should be easy but I get the same error. >

Re: different input/output formats

2012-05-29 Thread Mark question
age 1 to 1.0f > then it will work. >} > > let me know the status after the change > > > On Wed, May 30, 2012 at 1:27 AM, Mark question > wrote: > > > Hi guys, this is a very simple program, trying to use TextInputFormat > and > > SequenceFileOutputFormat.

Re: How to add debugging to map- red code

2012-04-20 Thread Mark question
I'm interested in this too, but could you tell me where to apply the patch, and is the following the right command to apply it: patch < MAPREDUCE-336_0_20090818.patch
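For reference, a self-contained sketch of the patch mechanics (the file names below are made up for illustration; a real JIRA patch such as the one above is applied the same way, from the source root it was generated against, usually with -p0 so paths are used exactly as written in the diff):

```shell
# Demo of diff/patch round-tripping; Foo.java and demo.patch are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"
printf 'public class Foo {}\n' > Foo.java
cp Foo.java Foo.java.orig
printf 'public class Foo { /* patched */ }\n' > Foo.java
diff -u Foo.java.orig Foo.java > demo.patch || true  # diff exits 1 when files differ
cp Foo.java.orig Foo.java                            # reset to the unpatched state
patch -p0 Foo.java < demo.patch                      # -p0 keeps paths as written
grep 'patched' Foo.java                              # shows the applied change
```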

Has anyone installed HCE and built it successfully?

2012-04-18 Thread Mark question
Hey guys, I've been stuck with the HCE installation for two days now and can't figure out the problem. The error I get from running (sh build.sh) is "can not execute binary file". I tried setting my JAVA_HOME and ANT_HOME manually and using the script build.sh, no luck. So, please, if you've used HCE cou

Re: Hadoop streaming or pipes ..

2012-04-06 Thread Mark question
ad of sending all of the > data out to a separate process just to read it back in again. > > > > --Bobby Evans > > > > > > On 4/5/12 1:54 PM, "Mark question" wrote: > > > > Hi guys, > > quick question: > > Are there any performance

Re: Hadoop streaming or pipes ..

2012-04-05 Thread Mark question
again. > > --Bobby Evans > > > On 4/5/12 1:54 PM, "Mark question" wrote: > > Hi guys, > quick question: > Are there any performance gains from hadoop streaming or pipes over > Java? From what I've read, it's only to ease testing by using your favo

Hadoop streaming or pipes ..

2012-04-05 Thread Mark question
Hi guys, quick question: Are there any performance gains from hadoop streaming or pipes over Java? From what I've read, it's only to ease testing by using your favorite language. So I guess it is eventually translated to bytecode then executed. Is that true? Thank you, Mark

Hadoop pipes and streaming ..

2012-04-05 Thread Mark question
Hi guys, Two quick questions: 1. Are there any performance gains from hadoop streaming or pipes ? As far as I read, it is to ease testing using your favorite language. Which I think implies that everything is translated to bytecode eventually and executed.

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
Unfortunately, "public" didn't change my error ... Any other ideas? Has anyone run Hadoop on Eclipse with custom sequence inputs? Thank you, Mark On Mon, Mar 5, 2012 at 9:58 AM, Mark question wrote: > Hi Madhu, it has the following line: > > TermDocFreqArrayWritable (

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
stomWritable has a default constructor. > > On Sat, Mar 3, 2012 at 4:56 AM, Mark question wrote: > > > Hello, > > > > I'm trying to debug my code through eclipse, which worked fine with > > given Hadoop applications (eg. wordcount), but as soon as I run it on my

Re: Streaming Hadoop using C

2012-03-01 Thread Mark question
Starfish worked great for wordcount .. I didn't run it on my application because I have only map tasks. Mark On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl wrote: > How was your experience of starfish? > C > On Mar 1, 2012, at 12:35 AM, Mark question wrote: > > > Than

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
gain insight by configuring hadoop to run > absolute minimum number of tasks? > Perhaps the discussion > > http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task > might be relevant? > On Feb 29, 2012, at 3:53 PM, Mark question wrote

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
rles Earl wrote: > Mark, > So if I understand, it is more the memory management that you are > interested in, rather than a need to run an existing C or C++ application > in MapReduce platform? > Have you done profiling of the application? > C > On Feb 29, 2012, at 2:19 PM, Mark

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
ory even though it's running on VM eventually? Thanks, Mark On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl wrote: > Mark, > Both streaming and pipes allow this, perhaps more so pipes at the level of > the mapreduce task. Can you provide more details on the application? > O

Streaming Hadoop using C

2012-02-29 Thread Mark question
Hi guys, thought I should ask this before I use it ... will using C over Hadoop give me the usual C memory management? For example, malloc() , sizeof() ? My guess is no since this all will eventually be turned into bytecode, but I need more control on memory which obviously is hard for me to do wit

Re: memory of mappers and reducers

2012-02-16 Thread Mark question
es. > > ** value of mapred.child.ulimit > value of mapred.child.java.opts > > On Thu, Feb 16, 2012 at 12:38 AM, Mark question > wrote: > > Thanks for the reply Srinivas, so option 2 will be enough, however, when > I > > tried setting it to 512MB, I see through the system monitor

Re: memory of mappers and reducers

2012-02-15 Thread Mark question
> memory based scheduling decisions may be affected. > > On Wed, Feb 15, 2012 at 5:57 PM, Mark question > wrote: > > Hi, > > > > My question is what's the difference between the following two settings: > > > > 1. mapred.task.default.maxvmem > &g

memory of mappers and reducers

2012-02-15 Thread Mark question
Hi, My question is what's the difference between the following two settings: 1. mapred.task.default.maxvmem 2. mapred.child.java.opts The first one is used by the TT to monitor the memory usage of tasks, while the second one is the maximum heap space assigned for each task. I want to limit eac

Namenode no lease exception ... what does it mean?

2012-02-09 Thread Mark question
Hi guys, Even though there is enough space on HDFS as shown by -report ... I get the following 2 errors, the first shown in the log of a datanode and the second in the Namenode log: 1)2012-02-09 10:18:37,519 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_844811798682217395

Re: Too many open files Error

2012-01-27 Thread Mark question
re technically allowing DN to run 1 million block transfer > > (in/out) threads by doing that. It does not take up resources by > > default sure, but now it can be abused with requests to make your DN > > run out of memory and crash cause its not bound to proper limits now. > > &

Re: Too many open files Error

2012-01-26 Thread Mark question
nd not > other jobs (or other task attempts). > > On Fri, Jan 27, 2012 at 1:10 AM, Raj V wrote: > > Mark > > > > You have this "Connection reset by peer". Why do you think this problem > is related to too many open files? > > > > Raj

Re: Too many open files Error

2012-01-26 Thread Mark question
No worries ... thanks guys .. I set it to 100M and it worked :) Thanks, Mark On Thu, Jan 26, 2012 at 11:10 AM, Mark question wrote: > Hi again, > I've tried : > > dfs.datanode.max.xcievers > 1048576 > > but I'm still getting the s

Re: Too many open files Error

2012-01-26 Thread Mark question
Hi again, I've tried : dfs.datanode.max.xcievers 1048576 but I'm still getting the same error ... how high can I go?? Thanks, Mark On Thu, Jan 26, 2012 at 9:29 AM, Mark question wrote: > Thanks for the reply I have nothing about dfs.datanode.ma
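The setting discussed in this thread lives in hdfs-site.xml; a sketch (the value is illustrative, and note that the property name really is misspelled "xcievers" in this Hadoop generation):

```xml
<!-- hdfs-site.xml sketch: cap on concurrent block transfer threads per datanode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```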

Re: Too many open files Error

2012-01-26 Thread Mark question
apred or hdfs what do you see when you do a ulimit > -a? > >> That should give you the number of open files allowed by a single > user... > >> > >> > >> Sent from a remote device. Please excuse any typos... > >> > >>

Re: connection between slaves and master

2012-01-11 Thread Mark question
t when the master and slaves are on different machines. > > Praveen > > On Mon, Jan 9, 2012 at 11:41 PM, Mark question > wrote: > > > Hello guys, > > > > I'm requesting from a PBS scheduler a number of machines to run Hadoop > > and even though all hadoop

connection between slaves and master

2012-01-09 Thread Mark question
Hello guys, I'm requesting from a PBS scheduler a number of machines to run Hadoop and even though all hadoop daemons start normally on the master and slaves, the slaves don't have worker tasks in them. Digging into that, there seems to be some blocking between nodes (?), but I don't know how to descri

Re: Expected file://// error

2012-01-08 Thread Mark question
oop/conf) in there or > it won't pick up the correct configs. > > -Joey > > On Sun, Jan 8, 2012 at 12:59 PM, Mark question > wrote: > > mapred-site.xml: > > > > > >mapred.job.tracker > >

Re: Expected file://// error

2012-01-08 Thread Mark question
mapred-site.xml sets: mapred.job.tracker = localhost:10001, mapred.child.java.opts = -Xmx1024m, mapred.tasktracker.map.tasks.maximum = 10. The command is running a script which runs a java program that submits two jobs consecutively, waiting for the first job
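The quoted settings correspond roughly to this mapred-site.xml (a sketch reconstructed from the values in the message):

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:10001</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
</configuration>
```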

Re: Expected file://// error

2012-01-06 Thread Mark question
t to? It should be set to hdfs://host:port > and not just host:port. Can you ensure this and retry? > > On 06-Jan-2012, at 5:45 PM, Mark question wrote: > > > Hello, > > > > I'm running two jobs on Hadoop-0.20.2 consecutively, such that the > second > >

Expected file://// error

2012-01-06 Thread Mark question
Hello, I'm running two jobs on Hadoop-0.20.2 consecutively, such that the second one reads the output of the first which would look like: outputPath/part-0 outputPath/_logs But I get the error: 12/01/06 03:29:34 WARN fs.FileSystem: "localhost:12123" is a deprecated filesystem name. U

Connection reset by peer Error

2011-11-20 Thread Mark question
Hi, I've been getting this error multiple times now. The namenode mentions something about a peer resetting the connection, but I don't know why this is happening, because I'm running on a single machine with 12 cores. Any ideas? The job started running normally; it contains about 200 mappers e

reading Hadoop output messages

2011-11-16 Thread Mark question
Hi all, I'm wondering if there is a way to get the output messages that are printed from the main class of a Hadoop job. Usually "2>&1>> out.log" would work, but in this case it only saves the output messages printed in the main class before starting the job. What I want is the output messages th
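As an aside, redirection order matters in POSIX shells: `cmd > out.log 2>&1` captures both streams, while `cmd 2>&1 > out.log` points stderr at the terminal before stdout is redirected. A generic sketch (the hadoop command itself is hypothetical; any program writing to both streams behaves the same, and Hadoop's console logging typically goes to stderr):

```shell
# Redirection-order demo: "> out.log 2>&1" captures both streams, whereas
# "2>&1 > out.log" would leave stderr on the terminal.
( echo "from stdout"; echo "from stderr" 1>&2 ) > out.log 2>&1
grep -c "from" out.log   # prints 2: both lines landed in the file
```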

Re: Cannot access JobTracker GUI (port 50030) via web browser while running on Amazon EC2

2011-10-24 Thread Mark question
les restart > iptables: Flushing firewall rules: [ OK ] > iptables: Setting chains to policy ACCEPT: filter [ OK ] > iptables: Unloading modules: [ OK ] > iptables: Applying firewall rules: [

Re: Cannot access JobTracker GUI (port 50030) via web browser while running on Amazon EC2

2011-10-24 Thread Mark question
I have the same issue and the output of "curl localhost:50030" is like yours, and I'm running on a remote cluster in pseudo-distributed mode. Can anyone help? Thanks, Mark On Mon, Oct 24, 2011 at 11:02 AM, Sameer Farooqui wrote: > Hi guys, > > I'm running a 1-node Hadoop 0.20.2 pseudo-distribute

Remote Blocked Transfer count

2011-10-21 Thread Mark question
Hello, I wonder if there is a way to measure how many of the data blocks have been transferred over the network? Or more generally, how many times was there a connection/contact between different machines? I thought of checking the Namenode log file which usually shows blk_ from src= to dst .

fixing the mapper percentage viewer

2011-10-19 Thread Mark question
Hi all, I've written a custom MapRunner, but it seems to have ruined the percentage shown for maps on the console. I want to know which part of the code is responsible for adjusting the percentage of maps ... Is it the following in MapRunner: if(incrProcCount) { reporter.incrCounter(Sk

Re: hadoop input buffer size

2011-10-10 Thread Mark question
either read one line each time, nor fetching > > dfs.block.size of lines > > into a buffer, > > Actually, for the TextInputFormat, it read io.file.buffer.size > > bytes of text > > into a buffer each time, > > this can be seen from the hadoop source file LineReader.ja

hadoop input buffer size

2011-10-04 Thread Mark question
Hello, Correct me if I'm wrong, but when a program opens n files at the same time to read from, and starts reading from each file one line at a time, isn't hadoop actually fetching dfs.block.size of lines into a buffer, and not actually one line? If this is correct, I set up my dfs.blo

Mapper Progress

2011-07-21 Thread Mark question
Hi, I have my custom MapRunner, which apparently seems to affect the progress report of the mapper, showing 100% while the mapper is still reading files to process. Where can I change/add a progress object to be shown in the UI? Thank you, Mark

Re: One file per mapper

2011-07-05 Thread Mark question
Hi Govind, You should override your FileInputFormat's isSplitable function in a class, say myFileInputFormat extends FileInputFormat, as follows: @Override public boolean isSplitable(FileSystem fs, Path filename){ return false; } Then you use your myFileInputFormat class. To

One node with Rack-local mappers ?!!!

2011-06-16 Thread Mark question
Hi, this is weird ... I'm running a job on single node with 32 mappers, running one at a time. Output says this: .. 11/06/16 00:59:43 INFO mapred.JobClient: Rack-local map tasks=18 == 11/06/16 00:59:43 INFO mapred.JobClient: Launched map tasks=32 11/06/16 00:59:43 INFO mapred

Hadoop Runner

2011-06-11 Thread Mark question
Hi, 1) Where can I find the "main" class of hadoop? The one that calls the InputFormat, then the MapperRunner and ReducerRunner and others? This will help me understand what is in memory or still on disk, and the exact flow of data between splits and mappers. My problem is, assuming I have a TextI

Re: org.apache.hadoop.mapred.Utils can not be resolved

2011-06-10 Thread Mark question
om > lib/ too, like avro, commons-cli or so.. there was a discussion on > this, can't find it in search right now - you may have better luck). > > On Fri, Jun 10, 2011 at 4:22 AM, Mark question > wrote: > > Hi, > > > > My question here is general to this pro

DiskUsage class DU Error

2011-06-09 Thread Mark question
Hi, Has anyone tried using the DU class to report hdfs-file sizes? Both of the following lines are causing errors, running on Mac: DU DiskUsage = new DU(new File(outDir.getPath()),12L); DU DiskUsage = new DU(new File(outDir.getName()),(Configuration)conf); where, Path outDir = SequenceFileOut

org.apache.hadoop.mapred.Utils can not be resolved

2011-06-09 Thread Mark question
Hi, My question here is general to this problem. How can you know which jar file will solve such an error: org.apache.hadoop.mapred.Utils can not be resolved. I don't plan to include all hadoop jars ... Well, hope so .. Can you tell me your techniques? Thanks, Mark

Re: re-reading

2011-06-08 Thread Mark question
I assumed before reading the split API that it was the actual split, my bad. Thanks a lot, Harsh, it's working great! Mark

Re: re-reading

2011-06-08 Thread Mark question
? Thanks, Mark On Wed, Jun 8, 2011 at 9:13 AM, Mark question wrote: > Thanks for the replies, but input doesn't have 'clone' I don't know why ... > so I'll have to write my custom inputFormat ... I was hoping for an easier > way though. > > Thank you, >

Re: re-reading

2011-06-08 Thread Mark question
son (haven't tried it really), try > writing your own InputFormat wrapper where in you can have direct > access to the InputSplit object to do what you want to (open two > record readers, and manage them separately). > > On Wed, Jun 8, 2011 at 1:48 PM, Stefan Wienert wrote

re-reading

2011-06-07 Thread Mark question
Hi, I'm trying to read the inputSplit over and over using the following function in MapperRunner: @Override public void run(RecordReader input, OutputCollector output, Reporter reporter) throws IOException { RecordReader copyInput = input; //First read while(input.next(key,value));

Re: Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Great! Thanks guys :) Mark 2011/6/6 Panayotis Antonopoulos > > Hi Mark, > > Check: > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html > > I think that setMaxInputSplitSize(Job job, > long size) > > > will do what you

Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Hi, Does anyone have a way to reduce the InputSplit size in general? By default, the minimum size chunk that map input should be split into is set to 0 (i.e. mapred.min.split.size). Can I change dfs.block.size or some other configuration to reduce the split size and spawn many mappers? Thanks, Mark
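Note that mapred.min.split.size only raises the lower bound on split size, so lowering it cannot shrink splits. A hedged configuration sketch (values are illustrative; which knob applies depends on the API in use in this Hadoop generation):

```xml
<!-- With the old mapred.* FileInputFormat, requesting more map tasks shrinks
     the per-split goal size; the new-API FileInputFormat also honors a hard
     cap on split size. Both values below are made up for illustration. -->
<property>
  <name>mapred.map.tasks</name>
  <value>64</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>16777216</value> <!-- 16 MB cap per split (new API) -->
</property>
```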

Re: SequenceFile.Reader

2011-06-02 Thread Mark question
) would skip reading value from disk. Mark On Thu, Jun 2, 2011 at 6:20 PM, Mark question wrote: > Hi John, thanks for the reply. But I'm not asking about the key memory > allocation here. I'm just saying what's the difference between: > > Next(key,value) and Next(key

Re: SequenceFile.Reader

2011-06-02 Thread Mark question
the recordSize skips to the next key? Thanks, Mark On Thu, Jun 2, 2011 at 3:49 PM, John Armstrong wrote: > On Thu, 2 Jun 2011 15:43:37 -0700, Mark question > wrote: > > Does anyone knows if : SequenceFile.next(key) is actually not reading > > value into memory >

SequenceFile.Reader

2011-06-02 Thread Mark question
Hi, Does anyone know if SequenceFile.next(key) is actually not reading the value into memory: next(Writable

UI not working

2011-05-28 Thread Mark question
Hi, My UI for hadoop 20.2 on a single machine suddenly is giving the following errors for NN and JT web-sites respectively: HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp *Powered by Jetty:// * HTTP ERROR: 503 SERVICE_UNAVAILABLE RequestURI=/jobtracke

Re: web site doc link broken

2011-05-27 Thread Mark question
I also got the following from "learn about" : Not Found The requested URL /common/docs/stable/ was not found on this server. -- Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at hadoop.apache.org Port 80 Mark On Fri, May 27, 2011 at 8:03 AM, Harsh J wrote:

Re: How to copy over using dfs

2011-05-27 Thread Mark question
I don't think so, because I read somewhere that this is to ensure the safety of the produced data. Hence Hadoop will force you to do this to know what exactly is happening. Mark On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia wrote: > If I have to overwrite a file I generally use > > hadoop dfs -

Increase node-mappers capacity in single node

2011-05-26 Thread Mark question
Hi, I tried changing "mapreduce.job.maps" to be more than 2 , but since I'm running in pseudo distributed mode, JobTracker is local and hence this property is not changed. I'm running on a 12 core machine and would like to make use of that ... Is there a way to trick Hadoop? I also tried usi

Re: one question about hadoop

2011-05-26 Thread Mark question
web.xml is in: hadoop-releaseNo/webapps/job/WEB-INF/web.xml Mark On Thu, May 26, 2011 at 1:29 AM, Luke Lu wrote: > Hadoop embeds jetty directly into hadoop servers with the > org.apache.hadoop.http.HttpServer class for servlets. For jsp, web.xml > is auto generated with the jasper compiler d

Re: Sorting ...

2011-05-26 Thread Mark question
st and a lot less dev work to > get going you might want to look at pig. They can do a distributed order by > that is fairly good. > > --Bobby Evans > > On 5/26/11 2:45 AM, "Luca Pireddu" wrote: > > On May 25, 2011 22:15:50 Mark question wrote: > > I'm

Re: UI not working ..

2011-05-25 Thread Mark question
Hi, > > My UI for hadoop 20.2 on a single machine suddenly is giving the > following errors for NN and JT web-sites respectively: > > HTTP ERROR: 404 > > /dfshealth.jsp > > RequestURI=/dfshealth.jsp > > *Powered by Jetty:// * > > > HTTP ERROR: 503 > > SERVICE_UNAVAILAB

UI not working ..

2011-05-25 Thread Mark question
Hi, My UI for hadoop 20.2 on a single machine suddenly is giving the following errors for NN and JT web-sites respectively: HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp *Powered by Jetty:// * HTTP ERROR: 503 SERVICE_UNAVAILABLE RequestURI=/jobtracke

Re: Sorting ...

2011-05-25 Thread Mark question
I'm using SequenceFileInputFormat, but then what do I write in my mappers? Each mapper takes a split from the SequenceInputFile and then sorts its split?! I don't want that.. Thanks, Mark On Wed, May 25, 2011 at 2:09 AM, Luca Pireddu wrote: > On May 25, 2011 01:43:22 Mark q

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
Mark On Tue, May 24, 2011 at 9:26 PM, Mapred Learn wrote: > Do u Hv right permissions on the new dirs ? > Try stopping n starting cluster... > > -JJ > > On May 24, 2011, at 9:13 PM, Mark question wrote: > > > Well, you're right ... moving it to hdfs-site.xml had

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
warning, if you use /tmp to store your HDFS data, you risk > data loss. On many operating systems, files and directories in /tmp > are automatically deleted. > > -Joey > > On Tue, May 24, 2011 at 10:22 PM, Mark question > wrote: > > Hi guys, > > > > I&

I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /cs/student/ma

Cannot lock storage, directory is already locked

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /cs/student/mar

Re: Sorting ...

2011-05-24 Thread Mark question
Thanks Luca, but what other way is there to sort a directory of sequence files? I don't plan to write a sorting algorithm in mappers/reducers, but I am hoping to use the sequenceFile.sorter instead. Any ideas? Mark On Mon, May 23, 2011 at 12:33 AM, Luca Pireddu wrote: > > On May 22, 2011 03

Re: get name of file in mapper output directory

2011-05-24 Thread Mark question
this: > > > > attempt_200707121733_0003_m_05_0 > > > > You're interested in the m_05 part, This gets translated into the > > output file name part-m-5. > > > > -Joey > > > > On Sat, May 21, 2011 at 8:03 PM, Mark question >
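The reply above can be illustrated with a plain-Java sketch (no Hadoop dependency; the attempt ID below is hypothetical) that derives a task's part-file name from its attempt ID:

```java
// Plain-Java sketch: derive a map task's output file name from its attempt
// ID, as the reply describes. The attempt-ID layout and the part-m-NNNNN
// convention follow Hadoop 0.20, but this parser is an illustrative helper,
// not a Hadoop API.
public class AttemptToPartName {
    static String partName(String attemptId) {
        // attempt_<jobTimestamp>_<jobSeq>_<type>_<taskNum>_<tryNum>
        String[] f = attemptId.split("_");
        String type = f[3];                 // "m" for map, "r" for reduce
        int task = Integer.parseInt(f[4]);  // task number, zero-padded to 5 digits
        return String.format("part-%s-%05d", type, task);
    }

    public static void main(String[] args) {
        // prints part-m-00005
        System.out.println(partName("attempt_200707121733_0003_m_000005_0"));
    }
}
```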

Re: How hadoop parse input files into (Key,Value) pairs ??

2011-05-22 Thread Mark question
The case you're talking about is when you use FileInputFormat ... So usually the InputFormat interface is the one responsible for that. FileInputFormat uses a LineRecordReader, which will take your text file and assign the key to be the offset within your text file and the value to be the line (unti
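The offset/line pairing described above can be sketched in plain Java (no Hadoop dependency; this mimics what LineRecordReader hands to the mapper, assuming single-byte characters so character offsets equal byte offsets):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of TextInputFormat's record model:
// key = offset of the line's first character, value = the line's text.
public class OffsetKeys {
    static Map<Long, String> recordize(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n", -1)) {
            // skip the empty tail produced by a trailing newline
            if (offset < text.length()) {
                records.put(offset, line);
            }
            offset += line.length() + 1; // +1 for the '\n'
        }
        return records;
    }

    public static void main(String[] args) {
        // prints {0=foo, 4=bar}
        System.out.println(recordize("foo\nbar\n"));
    }
}
```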

I didn't see my email sent yesterday ... So here is the question again ..

2011-05-22 Thread Mark question
Hi, I'm running a job with maps only and I want, by the end of each map (ie. in its Close() function), to open the file that the current map has written using its output.collector. I know "job.getWorkingDirectory()" would give me the parent path of the file written, but how to get the full path or

Re: current line number as key?

2011-05-21 Thread Mark question
What if you run a MapReduce program to generate a Sequence File from your text file where key is the line number and value is the whole line, then for the second job, the splits are done record wise hence, each mapper will be getting a split/block of records [] ~Cheers, Mark On Wed, May 18, 2011 a

Sorting ...

2011-05-21 Thread Mark question
I'm trying to sort Sequence files using the Hadoop-Example TeraSort. But after taking a couple of minutes .. output is empty. HDFS has the following Sequence files: -rw-r--r-- 1 Hadoop supergroup 196113760 2011-05-21 12:16 /user/Hadoop/out/part-0 -rw-r--r-- 1 Hadoop supergroup 250935096

get name of file in mapper output directory

2011-05-21 Thread Mark question
Hi, I'm running a job with maps only and I want, by the end of each map (ie. Close() function), to open the file that the current map has written using its output.collector. I know "job.getWorkingDirectory()" would give me the parent path of the file written, but how to get the full path or the name

Re: outputCollector vs. Localfile

2011-05-20 Thread Mark question
I thought it was, because of FileBytesWritten counter. Thanks for the clarification. Mark On Fri, May 20, 2011 at 4:23 AM, Harsh J wrote: > Mark, > > On Fri, May 20, 2011 at 10:17 AM, Mark question > wrote: > > This is puzzling me ... > > > > With a mapper pro

outputCollector vs. Localfile

2011-05-19 Thread Mark question
This is puzzling me ... With a mapper producing output of size ~ 400 MB ... which one is supposed to be faster? 1) output collector: which will write to local file then copy to HDFS since I don't have reducers. 2) Open a unique local file inside "mapred.local.dir" for each mapper. I tho

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
or conf.setBoolean("mapred.task.profile", true); Mark On Tue, May 17, 2011 at 4:49 PM, Mark question wrote: > I usually do this setting inside my java program (in run function) as > follows: > > JobConf conf = new JobConf(this.getConf(),My.class); >

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
I usually do this setting inside my java program (in the run function) as follows: JobConf conf = new JobConf(this.getConf(),My.class); conf.set("mapred.task.profile", "true"); then I'll see some output files in that same working directory. Hope that helps, Mark On Tue,
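The conf.set(...) call quoted above can also be expressed in configuration-file form; a hedged sketch (the map-task range and HPROF option string are illustrative, not prescribed by the thread):

```xml
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>0-2</value> <!-- which map task IDs to profile -->
</property>
<property>
  <name>mapred.task.profile.params</name>
  <value>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</value>
</property>
```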

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
wrote: > > > > On May 17, 2011, at 3:11 PM, Mark question wrote: > > > >> So what other memory consumption tools do you suggest? I don't want to > do it > >> manually and dump statistics into file because IO will affect > performance > >> to

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
So what other memory consumption tools do you suggest? I don't want to do it manually and dump statistics into file because IO will affect performance too. Thanks, Mark On Tue, May 17, 2011 at 2:58 PM, Allen Wittenauer wrote: > > On May 17, 2011, at 1:01 PM, Mark question wrot

Again ... Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Sorry for the spam, but I didn't see my previous email yet. I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set a property "mapred.performance.

Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Hi I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set the property "mapred.performance.diagnose" to true in mapred-site.xml. but I don't se

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
n Thu, May 12, 2011 at 9:23 PM, Mark question > wrote: > > > So there is no way I can see the other possible splits (start+length)? > > like > > some function that returns strings of map.input.file and map.input.offset > > of > > the other mappers ? > > &g

Re: how to get user-specified Job name from hadoop for running jobs?

2011-05-12 Thread Mark question
you mean by "user-specified" is when you write your job name via JobConf.setJobName("myTask") ? Then using the same object you can recall your name as follows: JobConf conf ; conf.getJobName() ; ~Cheers Mark On Tue, May 10, 2011 at 10:16 AM, Mark Zand wrote: > While I can get JobStatus with th

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
alley wrote: > On Thu, May 12, 2011 at 8:59 PM, Mark question > wrote: > > > Hi > > > > I'm using FileInputFormat which will split files logically according to > > their sizes into splits. Can the mapper get a pointer to these splits? > and > &g

I can't see my messages immediately, and sometimes they don't even arrive. Why?!

2011-05-12 Thread Mark question

Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Hi, I'm using FileInputFormat, which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits, and know which split it is assigned? I tried looking up the Reporter class to see how it prints the logical splits on the UI for each mapp

Space needed to use SequenceFile.Sorter

2011-04-28 Thread Mark question
I don't know why I can't see my emails immediately after sending them to the group ... anyways, I'm sorting a sequenceFile using its sorter on my local filesystem. The inputFile size is 1937690478 bytes, but after 14 minutes of sorting I get: TEST SORTING .. java.io.FileNotFoundException: File does not ex

Required Space for SequenceFile.Sorter

2011-04-28 Thread Mark question
Hi, Does anyone know how much space (memory, heap, disk, ..) is needed for the SequenceFile.Sorter to sort an input of size, say, Y bytes? Like a formula in Y, for example? Thanks, Mark

Re: Reading from File

2011-04-27 Thread Mark question
On Tue, Apr 26, 2011 at 11:49 PM, Harsh J wrote: > Hello Mark, > > On Wed, Apr 27, 2011 at 12:19 AM, Mark question > wrote: > > Hi, > > > > My mapper opens a file and read records using next() . However, I want > to > > stop reading if there is no mem

Reading from File

2011-04-26 Thread Mark question
Hi, My mapper opens a file and reads records using next(). However, I want to stop reading if there is no memory available. What confuses me here is that even though I'm reading record by record with next(), hadoop actually reads them in dfs.block.size. So, I have two questions: 1. Is it true

Re: Configured Memory Capacity

2011-04-25 Thread Mark question
I think I changed it using: dfs.datanode.du.reserved = 1073741824, final = true. Hope that helps, Mark On Mon, Apr 25, 2011 at 9:41 AM, maha wrote: > Hi, > > I'm running out of memory as shown by the fsck -report > > Decommission Status : Normal > Configured Capacity: 4227530752 (3.9
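The setting quoted above takes this form in hdfs-site.xml (reserving ~1 GB per volume for non-HDFS use; the value is the one from the message):

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
  <final>true</final>
</property>
```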

Re: Sequence.Sorter Performance

2011-04-25 Thread Mark question
Thanks Owen ! Mark On Mon, Apr 25, 2011 at 11:43 AM, Owen O'Malley wrote: > The SequenceFile sorter is ok. It used to be the sort used in the shuffle. > *grin* > > Make sure to set io.sort.factor and io.sort.mb to appropriate values for > your hardware. I'd usually use io.sort.factor as 25 * dri
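The two knobs Owen mentions can be set in the job or site configuration; an illustrative sketch (SequenceFile's sorter reads io.sort.mb for its in-memory buffer and io.sort.factor for the merge fan-in; the best values are hardware-dependent, and these numbers are examples only):

```xml
<property>
  <name>io.sort.factor</name>
  <value>25</value> <!-- streams merged at once; rule of thumb above: ~25 per drive -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value> <!-- in-memory sort buffer, in MB -->
</property>
```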

SequenceFile.Sorter

2011-04-24 Thread Mark question
Hi guys, I'm trying to sort a 2.5 GB sequence file in one mapper using its implemented Sort function, but it's taking so long that the map is killed for not reporting. I would increase the default time to get reports from the mapper, but I'll do this only if sorting using SequenceFile.sorter is

SequenceFile.Sorter performance

2011-04-24 Thread Mark question
Hi guys, I'm trying to sort a 2.5 GB sequence file in one mapper using its implemented Sort function, but it's taking so long that the map is killed for not reporting. I would increase the default time to get reports from the mapper, but I'll do this only if sorting using SequenceFile.sorter is