Changing the replication factor
Hi folks, right now I have a replication factor of 2, but I want to make it 3 for some tables. How can I do that for specific tables, so that whenever data is loaded into those tables it is automatically replicated to three nodes? Or do I need to change replication for all tables? And can I do that simply by changing the parameter to 3 and running -*refreshNodes*, or is there another way? Regards hadoopHive
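A minimal sketch of how this is usually handled: HDFS replication is set per file, so you can raise it for the directories backing specific tables with `hadoop fs -setrep` (the warehouse path below is an assumption; adjust it to wherever your table data lives). Note that `dfs.replication` only affects files written afterwards, and `-refreshNodes` only re-reads the datanode include/exclude lists; it does not change replication.

```shell
# Raise replication to 3, recursively, for one table's data only
hadoop fs -setrep -R 3 /user/hive/warehouse/my_table

# To make NEW files default to 3 replicas, set this on the client
# writing the data (in hdfs-site.xml); existing files keep their
# old replication factor until you run -setrep on them:
#   <property>
#     <name>dfs.replication</name>
#     <value>3</value>
#   </property>
```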
Re: Writing to SequenceFile fails
1. It is important to ensure your clients are on the same major version jars as your server. 2. You are probably looking for "hadoop fs -chown" and "hadoop fs -chmod" tools to modify permissions. On Wed, Feb 22, 2012 at 3:15 AM, Mohit Anchlia wrote: > I am past this error. Looks like I needed to use CDH libraries. I changed > my maven repo. Now I am stuck at > > *org.apache.hadoop.security.AccessControlException *since I am not writing > as user that owns the file. Looking online for solutions > > > On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia wrote: > >> I am trying to write to the sequence file and it seems to be failing. Not >> sure why, Is there something I need to do >> >> String uri="hdfs://db1:54310/examples/testfile1.seq"; >> >> FileSystem fs = FileSystem.*get*(URI.*create*(uri), conf); //Fails >> on this line >> >> >> Caused by: >> *java.io.EOFException* >> >> at java.io.DataInputStream.readInt( >> *DataInputStream.java:375*) >> >> at org.apache.hadoop.ipc.Client$Connection.receiveResponse( >> *Client.java:501*) >> >> at org.apache.hadoop.ipc.Client$Connection.run(*Client.java:446*) >> -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
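As a concrete sketch of point 2, reusing the path from this thread (the owner and group names are made up for illustration):

```shell
# Hand the file to the user the client runs as
# (chown on HDFS must be run by the superuser)
hadoop fs -chown mohit:supergroup /examples/testfile1.seq

# Or loosen the permission bits instead
hadoop fs -chmod 664 /examples/testfile1.seq
```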
Re: Writing small files to one big file in hdfs
Finally figured it out. I needed to use SequenceFileAsTextInputFormat. There is just a lack of examples, which makes it difficult when you start. On Tue, Feb 21, 2012 at 4:50 PM, Mohit Anchlia wrote:
> [quoted thread trimmed]
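For anyone landing here later, a minimal driver sketch showing where SequenceFileAsTextInputFormat plugs in. Hedged: the class and package names below are from the new-API `org.apache.hadoop.mapreduce` tree of later releases; on 0.20's old API the equivalent class lives in `org.apache.hadoop.mapred` — check your version.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadSeqFile {

    // With SequenceFileAsTextInputFormat both key and value arrive as Text
    // (the Writables stored in the file are converted via toString()).
    public static class IdentityTextMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "read-seqfile");
        job.setJarByClass(ReadSeqFile.class);
        job.setMapperClass(IdentityTextMapper.class);
        job.setInputFormatClass(SequenceFileAsTextInputFormat.class); // the crucial line
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```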
Re: Did DFSClient cache the file data into a temporary local file
thanks a lot. 2012/2/21 Harsh J > Seven, > > Yes that strategy has changed since long ago, but the doc on it was > only recently updated: https://issues.apache.org/jira/browse/HDFS-1454 > (and some more improvements followed later IIRC) > > 2012/2/21 seven garfee : > > hi,all > > As this Page( > > http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Staging) > > said,"In fact, initially the HDFS client caches the file data into a > > temporary local file". > > But I read the DFSClient.java in 0.20.2,and found nothing about storing > > data in tmp local file. > > Did I miss something or That strategy has been removed? > > > > -- > Harsh J > Customer Ops. Engineer > Cloudera | http://tiny.cloudera.com/about >
Re: Writing small files to one big file in hdfs
It looks like in the mapper the values are coming as binary instead of Text. Is this expected from a sequence file? I initially wrote the SequenceFile with Text values. On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia wrote:
> [quoted thread trimmed]
Re: Writing small files to one big file in hdfs
Need some more help. I wrote a sequence file using the code below, but now when I run the mapreduce job I get "java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text" even though I didn't use LongWritable when I originally wrote to the sequence file.

//Code to write to the sequence file. There is no LongWritable here

org.apache.hadoop.io.Text key = new org.apache.hadoop.io.Text();
BufferedReader buffer = new BufferedReader(new FileReader(filePath));
String line = null;
org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();
try {
  writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
      value.getClass(), SequenceFile.CompressionType.RECORD);
  int i = 1;
  long timestamp = System.currentTimeMillis();
  while ((line = buffer.readLine()) != null) {
    key.set(String.valueOf(timestamp));
    value.set(line);
    writer.append(key, value);
    i++;
  }

On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:
> [quoted thread trimmed]
Re: Dynamic changing of slaves
Yeah, I'm not sure how you can actually do it, as I haven't done it before, but from a logical perspective you'd probably have to do a lot of configuration changes and maybe even write up some complicated M/R code, coordination/rules-engine logic, and change how the heartbeat & scheduler operate to do what you want. There might be an easier way, I'm not sure though. Peter J On 2/21/12 3:16 PM, "Merto Mertek" wrote:
> [quoted thread trimmed]
Re: Dynamic changing of slaves
I think that the job configuration does not allow you such a setup, though maybe I missed something. I would probably tackle this problem from the scheduler source. The default one is JobQueueTaskScheduler, which keeps a FIFO-based queue. When a tasktracker (your slave) tells the jobtracker that it has some free slots, the JT in its heartbeat method calls the scheduler's assignTasks method, where tasks are assigned on a local basis. In other words, the scheduler tries to find tasks on the tasktracker on which the data resides. If the scheduler does not find a local map/reduce task to run, it will try to find a non-local one. This is probably the point where you should do something with your jobs and wait for the tasktracker's heartbeat. Instead of waiting for the TT heartbeat, maybe there is a way to force a heartbeatResponse even though the TT has not sent a heartbeat, but I am not aware of one. On 21 February 2012 19:27, theta wrote: > > Hi, > > I am working on a project which requires a setup as follows: > > One master with four slaves. However, when a map-only program is run, the > master dynamically selects the slave to run the map. For example, when the > program is run for the first time, slave 2 is selected to run the map and > reduce programs, and the output is stored on dfs. When the program is run > the second time, slave 3 is selected and so on. > > I am currently using Hadoop 0.20.2 with Ubuntu 11.10. > > Any ideas on creating the setup as described above? > > Regards > > -- > View this message in context: > http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html > Sent from the Hadoop core-user mailing list archive at Nabble.com. > >
Re: WAN-based Hadoop high availability (HA)?
For high availability? The issue is the NameNode; going forward there is a federated NameNode environment, but I haven't used it and am not sure if it's an active-active NameNode environment or just a sharded environment. DR/BR is always an issue when you have petabytes of data across clusters. There are secondary NameNode options: back up certain pieces and not others, clone the box, etc. Peter J On 2/21/12 1:23 PM, "Saqib Jang -- Margalla Communications" wrote:
> [quoted thread trimmed]
Re: Writing to SequenceFile fails
I am past this error. It looks like I needed to use the CDH libraries; I changed my maven repo. Now I am stuck at org.apache.hadoop.security.AccessControlException, since I am not writing as the user that owns the file. Looking online for solutions. On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia wrote:
> [quoted message trimmed]
WAN-based Hadoop high availability (HA)?
Hello, I'm a market analyst involved in researching the Hadoop space, and I had a quick question. I was wondering what types of requirements there may be for WAN-based high availability in Hadoop configurations, e.g. for disaster recovery, and what types of solutions may be available for such applications? thanks, Saqib

Saqib Jang
Principal/Founder
Margalla Communications, Inc.
1339 Portola Road, Woodside, CA 94062
(650) 274 8745
www.margallacomm.com
Re: Writing small files to one big file in hdfs
Hi, I think the following link will help: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Cheers Arko On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia wrote: > Sorry, maybe it's something obvious, but I was wondering: when map or reduce > gets called, what would be the class used for key and value? If I used > "org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();" would > the map be called with the Text class? > > public void map(LongWritable key, Text value, Context context) throws > IOException, InterruptedException {
Re: Writing small files to one big file in hdfs
Sorry, maybe it's something obvious, but I was wondering: when map or reduce gets called, what would be the class used for key and value? If I used "org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();" would the map be called with the Text class?

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee < arkoprovomukher...@gmail.com> wrote: > Hi Mohit, > > I am not sure that I understand your question. > > But you can write into a file using: > BufferedWriter output = new BufferedWriter > (new OutputStreamWriter(fs.create(my_path, true))); > output.write(data); > > Then you can pass that file as the input to your MapReduce program. > > FileInputFormat.addInputPath(jobconf, new Path(my_path)); > > From inside your Map/Reduce methods, I think you should NOT be tinkering > with the input / output paths of that Map/Reduce job. > Cheers > Arko
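The short answer is that the classes handed to map() are fixed by the job's configured InputFormat, not by whatever wrote the data: TextInputFormat supplies LongWritable byte offsets and Text lines, while SequenceFileInputFormat supplies the key/value classes stored in the sequence file. A minimal sketch of the sequence-file case (class name and path handling are illustrative, assuming the file was written with Text keys and Text values):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SeqFileJob {

    // The type parameters must match what was stored in the sequence file --
    // here Text keys and Text values.
    public static class MyMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value); // identity map, just to show the types
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "seqfile-example");
        job.setJarByClass(SeqFileJob.class);
        job.setMapperClass(MyMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Because of this line, map() is called with (Text, Text), not with
        // the (LongWritable, Text) that TextInputFormat would supply.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A ClassCastException from LongWritable to Text usually means the job is still reading the input through the default TextInputFormat.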
Re: Writing small files to one big file in hdfs
Hi Mohit, I am not sure that I understand your question. But you can write into a file using:

BufferedWriter output = new BufferedWriter(new OutputStreamWriter(fs.create(my_path, true)));
output.write(data);

Then you can pass that file as the input to your MapReduce program.

FileInputFormat.addInputPath(jobconf, new Path(my_path));

From inside your Map/Reduce methods, I think you should NOT be tinkering with the input / output paths of that Map/Reduce job. Cheers Arko On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia wrote: > Thanks. How does mapreduce work on a sequence file? Is there an example I > can look at?
Re: Writing small files to one big file in hdfs
Thanks. How does mapreduce work on a sequence file? Is there an example I can look at? On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee < arkoprovomukher...@gmail.com> wrote: > Hi, > > Let's say all the smaller files are in the same directory.
Re: Writing small files to one big file in hdfs
Hi, Let's say all the smaller files are in the same directory. Then you can do:

BufferedWriter output = new BufferedWriter(
        new OutputStreamWriter(fs.create(output_path, true))); // Output path

FileStatus[] input_files = fs.listStatus(new Path(input_path)); // Input directory

for (int i = 0; i < input_files.length; i++) {
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(input_files[i].getPath())));
    String data = reader.readLine();
    while (data != null) {
        output.write(data);
        output.newLine();
        data = reader.readLine(); // read the next line, or the loop never ends
    }
    reader.close();
}
output.close();

In case you have the files in multiple directories, call the code for each of them with different input paths. Hope this helps! Cheers Arko On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia wrote: > I am trying to look for examples that demonstrate using sequence files, > including writing to them and then running mapred on them, but am unable to > find one. Could you please point me to some examples of sequence files?
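For anyone who wants to try that read-and-append pattern outside a cluster, here is a self-contained version using plain java.io (local files stand in for the fs.open/fs.create streams; the class and method names are just for illustration):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Concat {
    // Append the contents of each input file to the output file, line by line.
    public static void concat(String[] inputs, String output) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(output));
        try {
            for (String in : inputs) {
                BufferedReader reader = new BufferedReader(new FileReader(in));
                try {
                    String line = reader.readLine();
                    while (line != null) {
                        out.write(line);
                        out.newLine();
                        line = reader.readLine(); // re-read inside the loop
                    }
                } finally {
                    reader.close();
                }
            }
        } finally {
            out.close();
        }
    }
}
```

With HDFS, the FileReader/FileWriter would be replaced by InputStreamReader/OutputStreamWriter wrapped around fs.open() and fs.create(), as in the message above.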
Re: Writing small files to one big file in hdfs
I am trying to look for examples that demonstrate using sequence files, including writing to them and then running mapred on them, but am unable to find one. Could you please point me to some examples of sequence files? On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks wrote: > Hi Mohit > AFAIK XMLLoader in pig won't be suited for Sequence Files. Please > post the same to the Pig user group for some workaround over the same.
Re: Writing small files to one big file in hdfs
You might want to check out File Crusher: http://www.jointhegrid.com/hadoop_filecrush/index.jsp I've never used it, but it sounds like it could be helpful. On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks wrote: > Hi Mohit > AFAIK XMLLoader in pig won't be suited for Sequence Files. Please > post the same to the Pig user group for some workaround over the same. -- Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.
Dynamic changing of slaves
Hi, I am working on a project which requires a setup as follows: one master with four slaves. However, when a map-only program is run, the master dynamically selects the slave to run the map. For example, when the program is run for the first time, slave 2 is selected to run the map and reduce programs, and the output is stored on dfs. When the program is run the second time, slave 3 is selected, and so on. I am currently using Hadoop 0.20.2 with Ubuntu 11.10. Any ideas on creating the setup as described above? Regards -- View this message in context: http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Writing small files to one big file in hdfs
Hi Mohit AFAIK XMLLoader in pig won't be suited for Sequence Files. Please post the same to the Pig user group for some workaround over the same. SequenceFile is a preferred option when we want to store small files in hdfs that need to be processed by MapReduce, as it stores data in key-value format. Since SequenceFileInputFormat is available at your disposal, you don't need any custom input formats for processing the same using map reduce. It is a cleaner and better approach compared to just appending small xml file contents into a big file. On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia wrote: > Thanks. I was planning to use pig's > org.apache.pig.piggybank.storage.XMLLoader > for processing. Would it work with a sequence file? > > This text file that I was referring to would be in hdfs itself. Is it still > different than using a sequence file?
Re: Writing small files to one big file in hdfs
On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks wrote: > Mohit > Rather than just appending the content into a normal text file or > so, you can create a sequence file with the individual smaller file content > as values. Thanks. I was planning to use pig's org.apache.pig.piggybank.storage.XMLLoader for processing. Would it work with a sequence file? This text file that I was referring to would be in hdfs itself. Is it still different than using a sequence file?
Re: Writing small files to one big file in hdfs
Mohit Rather than just appending the content into a normal text file or so, you can create a sequence file with the individual smaller file contents as values. Regards Bejoy.K.S On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia wrote: > We have small xml files. Currently I am planning to append these small > files to one file in hdfs so that I can take advantage of splits, larger > blocks and sequential IO. What I am unsure of is if it's ok to append one > file at a time to this hdfs file. > > Could someone suggest if this is ok? Would like to know how others do it.
Re: Writing small files to one big file in hdfs
I'd recommend making a SequenceFile[1] to store each XML file as a value. -Joey [1] http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/SequenceFile.html On Tue, Feb 21, 2012 at 12:15 PM, Mohit Anchlia wrote: > We have small xml files. Currently I am planning to append these small > files to one file in hdfs so that I can take advantage of splits, larger > blocks and sequential IO. What I am unsure of is if it's ok to append one > file at a time to this hdfs file. > > Could someone suggest if this is ok? Would like to know how others do it. > -- Joseph Echeverria Cloudera, Inc. 443.305.9434
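For anyone searching the archives, a bare-bones writer along the lines Joey and Bejoy describe might look like the following (the file-name key, BytesWritable values, and path handling are illustrative choices, not the only way to do it):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class XmlToSeqFile {
    // args[0]: directory of small XML files, args[1]: output sequence file.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path(args[1]), Text.class, BytesWritable.class,
                SequenceFile.CompressionType.BLOCK);
        try {
            for (FileStatus stat : fs.listStatus(new Path(args[0]))) {
                if (stat.isDir()) continue;            // skip subdirectories
                byte[] buf = new byte[(int) stat.getLen()];
                FSDataInputStream in = fs.open(stat.getPath());
                try {
                    in.readFully(buf);                 // whole small file as one value
                } finally {
                    in.close();
                }
                writer.append(new Text(stat.getPath().getName()),
                              new BytesWritable(buf));
            }
        } finally {
            writer.close();
        }
    }
}
```

Each small file becomes one record, so the downstream MapReduce job sees one key/value pair per original file.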
Re: Number of Under-Replicated Blocks ?
Have you had any Name Node failures lately? I had them every couple of days and found that there were files being left in hdfs /log/hadoop/tmp/mapred/staging/... when communication with the Name Node was lost. Not sure why they never got replicated correctly (maybe because they are in /log?). I went in and removed the old files (say 2 days old or older) and saw the # of blocks drop to 0. Hope this helps, Chris On Mon, Feb 20, 2012 at 1:25 AM, praveenesh kumar wrote: > I recently added a new DN/TT to my cluster. Could it be the reason for such > behaviour? > > Thanks, > Praveenesh > > On Mon, Feb 20, 2012 at 11:51 AM, Harsh J wrote: > > > Hi, > > > > The tool "hadoop fsck" will tell you which files are under-replicated, > > with a count of what was expected instead; just run it over /. > > > > While it isn't a 'normal' thing to see it come up suddenly, it is still > > in the safe zone, and is most likely an indicator that either one of > > your DNs or one of its disks has gone bad, or you have a bad > > mapred.submit.replication value for your cluster size (default is 10 > > replicas for all MR job submit data), or bit rot of existing blocks on > > HDDs around the cluster, etc. -- You can mostly spot the pattern of > > files causing it by running the fsck and obtaining the listing. > > > > On Mon, Feb 20, 2012 at 11:43 AM, praveenesh kumar wrote: > > > Hi, > > > > > > I am suddenly seeing some under-replicated blocks on my cluster. Although > > > it's not causing any problems, it seems like a few blocks are not > > > replicated properly. > > > > > > Number of Under-Replicated Blocks : 147 > > > > > > Is this okay behavior on hadoop? If not, how can I know which files have > > > under-replicated blocks, and how can I configure it properly to reduce > > > the number of under-replicated blocks? > > > > > > Thanks, > > > Praveenesh > > > > -- > > Harsh J > > Customer Ops. Engineer > > Cloudera | http://tiny.cloudera.com/about
Re: access hbase table from hadoop mapreduce
It sounds to me like you just need to include your HBase jars in your compiler's classpath, like so: javac -classpath $HADOOP_HOME Example.java where $HADOOP_HOME includes all your base hadoop jars as well as your hbase jars. Then you would want to put the resulting Example.class file into its own jar with something like this: jar cvf Example.jar Example.class Then you can execute the program with this: hadoop jar Example.jar Example The manual for running the hadoop CLI is here: http://hadoop.apache.org/common/docs/current/commands_manual.html Hope that helps, Clint On Tue, Feb 21, 2012 at 1:26 AM, amsal wrote: > hi.. > I want to access an hbase table from hadoop mapreduce. I am using windows XP > and cygwin. > I am using hadoop-0.20.2 and hbase-0.92.0. > The hadoop cluster is working fine. I am able to run the mapreduce wordcount > successfully on 3 pc's. > hbase is also working. I can create a table from the shell. > > I have tried many examples but they are not working. When I try to compile > one using > javac Example.java > > it gives errors: > org.apache.hadoop.hbase.client does not exist > org.apache.hadoop.hbase does not exist > org.apache.hadoop.hbase.io does not exist > > Please can anyone help me with this? > - plz give me some example code to access hbase from hadoop map reduce > - also guide me how I should compile and execute it > > thanx in advance > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/access-hbase-table-from-hadoop-mapreduce-tp3762847p3762847.html > Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
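One detail that trips people up here: passing a bare directory to -classpath does not pick up the jars inside it; you either list each jar or use the Java 6 '/*' wildcard. A sketch of the compile/package/run steps (the install locations are assumptions — adjust them to your machine):

```shell
# Assumed, illustrative install locations -- adjust to your setup.
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
HBASE_HOME=${HBASE_HOME:-/usr/lib/hbase}

# Java 6+ expands the /* wildcard, so every jar in these directories is used.
CP="$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/*:$HBASE_HOME/lib/*"

if [ -f Example.java ]; then
  javac -classpath "$CP" Example.java
  jar cvf Example.jar Example.class
  hadoop jar Example.jar Example
fi
```

The "package does not exist" errors above are exactly what javac prints when the hbase jars are missing from this classpath.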
HDFS problem in hadoop 0.20.203
Hi Hadoopers, We are experiencing a strange problem on Hadoop 0.20.203. Our cluster has 58 nodes; everything is started from a fresh HDFS (we deleted all local folders on datanodes and reformatted the namenode). After running some small jobs, HDFS starts behaving abnormally and the jobs become very slow. The namenode log is flooded with gigabytes of errors like this:

2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.31:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.18:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.32:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.35:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.27:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.21:50010 is added to blk_-6843171124277753504_2279882 size 124490
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000313_0/result_stem-m-00313. blk_-6379064588594672168_2279890
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.26:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.29:50010 is added to blk_-977828927900581074_2279887 size 13818
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000364_0/result_stem-m-00364 is closed by DFSClient_attempt_201202202043_0013_m_000364_0
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.23:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.20:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000364_0/result_suffix-m-00364. blk_1921685366929756336_2279890
2012-02-21 00:00:38,634 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000279_0/result_suffix-m-00279 is closed by DFSClient_attempt_201202202043_0013_m_000279_0
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.20:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.25:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000284_0/result_stem-m-00284. blk_8796188324642771330_2279891
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.34:50010 is added to blk_-977828927900581074_2279887 size 13818
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000296_0/result_stem-m-00296. blk_-6800409224007034579_2279891
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.29:50010 is added to blk_1921685366929756336_2279890 size 1511
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.25:50010 is added to blk_-2982099629304436976_2279752 size 569

In Map/Reduce
Re: Problem in installation
Dheeraj, In most homogeneous cluster environments, people do keep the configs synced. However, that isn't necessary. It is fine to have different *-site.xml contents on each slave, tailored to its available resources. For instance, if you have 3 slaves with 3 disks and 1 slave with 2, you can have a different "dfs.data.dir" configuration on the 2-disk one. Managing configurations this way can get a bit painful, though, unless you use a configuration manager that eases handling the config entries for you. On Tue, Feb 21, 2012 at 4:06 PM, Dheeraj Kv wrote: > Hi > > > I am installing hadoop cluster of 5 nodes. > I decided to make 1 node as master (namenode and jobtracker) and rest of the > 4 nodes as slaves( datanode and task tracker). > I m skeptical about the configuration file location. Does the same site.xml > files reside in all the cluster nodes? > If so I will have different hdfs mount points on different nodes, and when > the same site.xml files are available on all nodes > will it cause any problem if it don't find the mount point on one node (which > is available on other node) . > > > Regards > > Dheeraj KV > > -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
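As a concrete illustration of the per-slave override described above, the 2-disk slave's hdfs-site.xml could list only its two directories while the other slaves list three. The directory paths below are made-up examples, not values from this thread:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml on the 2-disk slave only; paths are hypothetical. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- Comma-separated list: one directory per physical disk. -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
  </property>
</configuration>
```

The datanode simply spreads its block storage over whatever directories its own copy of the file names, so the lists do not have to match across slaves.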
Problem in installation
Hi, I am installing a Hadoop cluster of 5 nodes. I decided to make 1 node the master (namenode and jobtracker) and the remaining 4 nodes slaves (datanode and tasktracker). I am unsure about the configuration file location: do the same *-site.xml files reside on all the cluster nodes? I will have different HDFS mount points on different nodes, so if the same site.xml files are present on every node, will it cause a problem when a mount point that exists on one node is missing on another? Regards Dheeraj KV
Application Submission using ClientRMProtocol in Hadoop 0.23
Hi, I followed the steps given in the link below to submit an application on Hadoop 0.23: http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html It didn't work for me. It may be because 1) ClientRMProtocol is not a VersionedProtocol 2) GetNewApplicationRequest does not implement the Writable interface Have the steps in the above link worked for anyone? Regards, Abhishek -- View this message in context: http://hadoop-common.472056.n3.nabble.com/Application-Submission-using-ClientRMProtocol-in-Hadoop-0-23-tp3763037p3763037.html Sent from the Users mailing list archive at Nabble.com.
access hbase table from hadoop mapreduce
hi.. I want to access an HBase table from Hadoop MapReduce. I am using Windows XP and Cygwin, with hadoop-0.20.2 and hbase-0.92.0. The Hadoop cluster is working fine; I am able to run the MapReduce WordCount successfully on 3 PCs. HBase is also working; I can create tables from the shell. I have tried many examples but they are not working. When I try to compile with javac Example.java it gives errors:

org.apache.hadoop.hbase.client does not exist
org.apache.hadoop.hbase does not exist
org.apache.hadoop.hbase.io does not exist

Please, can anyone help me with this:
- please give me some example code to access HBase from Hadoop MapReduce
- also guide me on how I should compile and execute it

Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/access-hbase-table-from-hadoop-mapreduce-tp3762847p3762847.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
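Those "does not exist" errors just mean the HBase jars are not on javac's classpath; they are not a cluster problem. A minimal sketch of building such a classpath follows -- the install paths and jar names are assumptions based on the versions mentioned above, so adjust them to your actual layout:

```shell
# Assumed install locations -- change these to match your machine.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-0.20.2}
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase-0.92.0}

# Build a classpath containing the Hadoop core jar, the HBase jar,
# and everything under HBase's lib/ directory.
CP="$HADOOP_HOME/hadoop-0.20.2-core.jar:$HBASE_HOME/hbase-0.92.0.jar"
for jar in "$HBASE_HOME"/lib/*.jar; do
  CP="$CP:$jar"
done
echo "$CP"

# Then compile and run against it:
# javac -classpath "$CP" Example.java
# java  -classpath ".:$CP" Example
```

Under Cygwin, paths handed to javac must be Windows-style (`cygpath -wp "$CP"` converts a colon-separated Unix classpath), which is a frequent stumbling block on that setup.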
Re: Pydoop 0.5 released
awesome, guys! -Alex sent via my mobile device On Feb 20, 2012, at 11:59 PM, Luca Pireddu wrote: > Hello everyone, > > we're happy to announce that we have just released Pydoop 0.5.0 > (http://pydoop.sourceforge.net). > > The main changes with respect to the previous version are: > * Pydoop now works with Hadoop 1.0.0. > * Support for multiple Hadoop versions with the same Pydoop installation > * Easy Pydoop scripting with pydoop_script > * Python version requirement bumped to 2.7 > * Dropped support for Hadoop 0.21 > > > Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++ > Pipes and the C libhdfs APIs, that allows to write full-fledged MapReduce > applications with HDFS access. Pydoop has been maturing nicely and is > currently in production use at CRS4 as we have a few scientific projects that > are based on it, including Seal > (https://sourceforge.net/projects/biodoop-seal/), Biodoop and Biodoop-BLAST > (https://sourceforge.net/projects/biodoop/), and a new project for > high-throughput genotyping that is about to be released by CRS4. > > > Links: > > * download page: http://sourceforge.net/projects/pydoop/files > * full release notes: > http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Release_Notes > > > Happy pydooping! > > > The Pydoop Team
Re: Did DFSClient cache the file data into a temporary local file
Seven, Yes, that strategy changed long ago, but the doc on it was only recently updated: https://issues.apache.org/jira/browse/HDFS-1454 (and some more improvements followed later, IIRC) 2012/2/21 seven garfee : > hi,all > As this Page( > http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Staging) > said,"In fact, initially the HDFS client caches the file data into a > temporary local file". > But I read the DFSClient.java in 0.20.2,and found nothing about storing > data in tmp local file. > Did I miss something or That strategy has been removed? -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
Tasktracker fails
Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start the cluster, all daemons start except the tasktracker; I don't know why it fails with the following error logs. The cluster is in a private network, and my /etc/hosts file contains all the IP/hostname mappings on all nodes.

2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:225)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

Any comments on the issue? Thanks
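An "Invalid argument" from bind() in a stack like the one above often means the address the TaskTracker resolved for its own hostname is not actually assigned to that machine (a stale or wrong /etc/hosts entry, for example). A quick sketch for checking resolution before blaming Hadoop -- `check_host` is a made-up helper name here, not a Hadoop tool:

```shell
# Print what a hostname resolves to via the system resolver (NSS),
# which is what Java's InetAddress consults on Linux.
check_host() {
  addr=$(getent hosts "$1" | awk '{ print $1; exit }')
  if [ -z "$addr" ]; then
    echo "$1: UNRESOLVED"
  else
    echo "$1: resolves to $addr"
  fi
}

# Run it against every host in the slaves file, e.g.:
#   while read h; do check_host "$h"; done < conf/slaves
check_host localhost
```

If a slave's own hostname resolves to an address not present in `ifconfig` output on that slave, the bind will fail exactly as shown in the log.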
Consistent "register getProtocolVersion" error due to "Duplicate metricsName:getProtocolVersion" during cluster startup -- then various other errors during job execution
Hi, I've got a pseudo-distributed Hadoop (v0.20.2) setup with 1 machine (running Ubuntu 10.04 LTS) hosting all the Hadoop processes (NN + SNN + JT + TT + DN). I've also configured the files under conf/ so that the master is referred to by its actual machine name (in this case, bali) instead of localhost (however, the issue below is seen regardless). I was able to successfully format the HDFS (by running hadoop namenode -format). However, right after I deploy the cluster using bin/start-all.sh, I see the following error in the NameNode's log file. It is logged at INFO level, but I believe it is the root cause behind various other errors I encounter when executing actual Hadoop jobs. (For instance, at one point I see errors that the datanode and namenode were communicating using different protocol versions ... 3 vs 6 etc.). Anyway, here is the initial error:

2012-02-21 09:01:42,015 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
    at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
    at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
    at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

I've scoured the web searching for other instances of this error, but none of the hits were helpful or relevant to my setup. My hunch is that this is preventing the cluster from correctly initializing.
I would have switched to a later version of Hadoop, but the Nutch v1.4 distribution I'm trying to run on top of Hadoop is, AFAIK, only compatible with Hadoop v0.20. I have included all my Hadoop config files with this email (config.rar), in case you need to take a quick look. Below is my /etc/hosts configuration, in case the issue lies there. I believe this is a Hadoop-specific issue, not related to Nutch, hence am posting to the Hadoop mailing list.

/etc/hosts:

127.0.0.1 localhost
#127.0.1.1 bali

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.21 bali

File-system layout:

Here's my filesystem layout. I've got all my Hadoop configs pointing to folders under a root folder called /private/user/hadoop, with the following permissions:

ls -l /private/user/
total 4
drwxrwxrwx 7 user alt 4096 Feb 21 09:06 hadoop

ls -l /private/user/hadoop/
total 20
drwxr-xr-x 5 user alt 4096 Feb 21 09:01 data
drwxr-xr-x 3 user alt 4096 Feb 21 09:07 mapred
drwxr-xr-x 4 user alt 4096 Feb 21 08:59 name
drwxr-xr-x 2 user alt 4096 Feb 21 08:59 pids
drwxr-xr-x 3 user alt 4096 Feb 21 09:01 tmp

Shortly after the getProtocolVersion error above, I start seeing these errors in the namenode log:

2012-02-21 09:06:47,895 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
java.io.IOException: Server returned HTTP response code: 503 for URL: http://192.168.1.21:50090/getimage?getimage=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:58)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseN
Pydoop 0.5 released
Hello everyone, we're happy to announce that we have just released Pydoop 0.5.0 (http://pydoop.sourceforge.net). The main changes with respect to the previous version are: * Pydoop now works with Hadoop 1.0.0. * Support for multiple Hadoop versions with the same Pydoop installation * Easy Pydoop scripting with pydoop_script * Python version requirement bumped to 2.7 * Dropped support for Hadoop 0.21 Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++ Pipes and the C libhdfs APIs, that lets you write full-fledged MapReduce applications with HDFS access. Pydoop has been maturing nicely and is currently in production use at CRS4, as we have a few scientific projects that are based on it, including Seal (https://sourceforge.net/projects/biodoop-seal/), Biodoop and Biodoop-BLAST (https://sourceforge.net/projects/biodoop/), and a new project for high-throughput genotyping that is about to be released by CRS4. Links: * download page: http://sourceforge.net/projects/pydoop/files * full release notes: http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Release_Notes Happy pydooping! The Pydoop Team