Changing the replication factor

2012-02-21 Thread hadoop hive
Hi folks,

Right now I have a replication factor of 2, but I want to make it 3 for
some tables. How can I do that for specific tables, so that whenever data
is loaded into those tables it is automatically replicated to three nodes?

Or do I need to change replication for all the tables?

And can I do that by simply changing the parameter to 3 and running
-refreshNodes, or is there another way to do it?


Regards
hadoopHive
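
(For reference, a rough sketch of the usual approach, assuming the table data
lives under an illustrative path such as /user/hive/warehouse/mytable; note
that -refreshNodes only re-reads the datanode include/exclude lists and does
not change replication.)

  # Raise replication of files already loaded for one table:
  hadoop fs -setrep -R 3 /user/hive/warehouse/mytable

  # For future loads, set dfs.replication=3 in the client/job configuration
  # (for example, in Hive: SET dfs.replication=3;) so new files are written
  # with three replicas; other tables keep the default of 2.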


Re: Writing to SequenceFile fails

2012-02-21 Thread Harsh J
1. It is important to ensure your clients use the same major-version
jars as your server.
2. You are probably looking for the "hadoop fs -chown" and "hadoop fs
-chmod" tools to modify ownership and permissions.

On Wed, Feb 22, 2012 at 3:15 AM, Mohit Anchlia  wrote:
> I am past this error. Looks like I needed to use CDH libraries. I changed
> my maven repo. Now I am stuck at
>
> *org.apache.hadoop.security.AccessControlException *since I am not writing
> as user that owns the file. Looking online for solutions
>
>
> On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia wrote:
>
>> I am trying to write to the sequence file and it seems to be failing. Not
>> sure why, Is there something I need to do
>>
>> String uri="hdfs://db1:54310/examples/testfile1.seq";
>>
>> FileSystem fs = FileSystem.*get*(URI.*create*(uri), conf);      //Fails
>> on this line
>>
>>
>> Caused by:
>> *java.io.EOFException*
>>
>> at java.io.DataInputStream.readInt(
>> *DataInputStream.java:375*)
>>
>> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(
>> *Client.java:501*)
>>
>> at org.apache.hadoop.ipc.Client$Connection.run(*Client.java:446*)
>>



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Finally figured it out. I needed to use SequenceFileAsTextInputFormat.
There is just a lack of examples, which makes it difficult when you start.
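
(For anyone following along, a hedged sketch of the relevant job setup. The
new-API class org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat
may not exist on every 0.20-era release; on older versions the old-API
org.apache.hadoop.mapred.SequenceFileAsTextInputFormat with
JobConf.setInputFormat(...) plays the same role.)

  // With SequenceFileAsTextInputFormat both key and value reach the mapper
  // as Text, whatever Writable types were stored in the file, so the mapper
  // signature becomes map(Text key, Text value, Context context).
  job.setInputFormatClass(
      org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat.class);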

On Tue, Feb 21, 2012 at 4:50 PM, Mohit Anchlia wrote:

> It looks like in mapper values are coming as binary instead of Text. Is
> this expected from sequence file? I initially wrote SequenceFile with Text
> values.
>
>
> On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia wrote:
>
>> Need some more help. I wrote sequence file using below code but now when
>> I run mapreduce job I get "file.*java.lang.ClassCastException*:
>> org.apache.hadoop.io.LongWritable cannot be cast to
>> org.apache.hadoop.io.Text" even though I didn't use LongWritable when I
>> originally wrote to the sequence
>>
>> //Code to write to the sequence file. There is no LongWritable here
>>
>> org.apache.hadoop.io.Text key =
>> *new* org.apache.hadoop.io.Text();
>>
>> BufferedReader buffer =
>> *new* BufferedReader(*new* FileReader(filePath));
>>
>> String line =
>> *null*;
>>
>> org.apache.hadoop.io.Text value =
>> *new* org.apache.hadoop.io.Text();
>>
>> *try* {
>>
>> writer = SequenceFile.*createWriter*(fs, conf, path, key.getClass(),
>>
>> value.getClass(), SequenceFile.CompressionType.
>> *RECORD*);
>>
>> *int* i = 1;
>>
>> *long* timestamp=System.*currentTimeMillis*();
>>
>> *while* ((line = buffer.readLine()) != *null*) {
>>
>> key.set(String.*valueOf*(timestamp));
>>
>> value.set(line);
>>
>> writer.append(key, value);
>>
>> i++;
>>
>> }
>>
>>
>>   On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <
>> arkoprovomukher...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think the following link will help:
>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>>>
>>> Cheers
>>> Arko
>>>
>>> On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia >> >wrote:
>>>
>>> > Sorry may be it's something obvious but I was wondering when map or
>>> reduce
>>> > gets called what would be the class used for key and value? If I used
>>> > "org.apache.hadoop.io.Text
>>> > value = *new* org.apache.hadoop.io.Text();" would the map be called
>>> with
>>>  > Text class?
>>> >
>>> > public void map(LongWritable key, Text value, Context context) throws
>>> > IOException, InterruptedException {
>>> >
>>> >
>>> > On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
>>> > arkoprovomukher...@gmail.com> wrote:
>>> >
>>> > > Hi Mohit,
>>> > >
>>> > > I am not sure that I understand your question.
>>> > >
>>> > > But you can write into a file using:
>>> > > *BufferedWriter output = new BufferedWriter
>>> > > (new OutputStreamWriter(fs.create(my_path,true)));*
>>> > > *output.write(data);*
>>> > > *
>>> > > *
>>> > > Then you can pass that file as the input to your MapReduce program.
>>> > >
>>> > > *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
>>> > >
>>> > > From inside your Map/Reduce methods, I think you should NOT be
>>> tinkering
>>> > > with the input / output paths of that Map/Reduce job.
>>> > > Cheers
>>> > > Arko
>>> > >
>>> > >
>>> > > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia <
>>> mohitanch...@gmail.com
>>> > > >wrote:
>>> > >
>>> > > > Thanks How does mapreduce work on sequence file? Is there an
>>> example I
>>> > > can
>>> > > > look at?
>>> > > >
>>> > > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
>>> > > > arkoprovomukher...@gmail.com> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Let's say all the smaller files are in the same directory.
>>> > > > >
>>> > > > > Then u can do:
>>> > > > >
>>> > > > > *BufferedWriter output = new BufferedWriter
>>> > > > > (newOutputStreamWriter(fs.create(output_path,
>>> > > > > true)));  // Output path*
>>> > > > >
>>> > > > > *FileStatus[] output_files = fs.listStatus(new
>>> Path(input_path));  //
>>> > > > Input
>>> > > > > directory*
>>> > > > >
>>> > > > > *for ( int i=0; i < output_files.length; i++ )  *
>>> > > > >
>>> > > > > *{*
>>> > > > >
>>> > > > > *   BufferedReader reader = new
>>> > > > >
>>> > >
>>> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
>>> > > > > *
>>> > > > >
>>> > > > > *   String data;*
>>> > > > >
>>> > > > > *   data = reader.readLine();*
>>> > > > >
>>> > > > > *   while ( data != null ) *
>>> > > > >
>>> > > > > *  {*
>>> > > > >
>>> > > > > *output.write(data);*
>>> > > > >
>>> > > > > *  }*
>>> > > > >
>>> > > > > *reader.close*
>>> > > > >
>>> > > > > *}*
>>> > > > >
>>> > > > > *output.close*
>>> > > > >
>>> > > > >
>>> > > > > In case you have the files in multiple directories, call the
>>> code for
>>> > > > each
>>> > > > > of them with different input paths.
>>> > > > >
>>> > > > > Hope this helps!
>>> > > > >
>>> > > > > Cheers
>>> > > > >
>>> > > > > Arko
>>> > > > >
>>> > > > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <
>>> > mohitanch...@gmail.com
>>> > > > > >wrote:
>>> > > > >
>>> > > > > > I am trying to look for examples that demonstrates using
>>> sequence
>>> > > files
>>> > > > > > including writing to it and then running 

Re: Did DFSClient cache the file data into a temporary local file

2012-02-21 Thread seven garfee
thanks a lot.


2012/2/21 Harsh J 

> Seven,
>
> Yes that strategy has changed since long ago, but the doc on it was
> only recently updated: https://issues.apache.org/jira/browse/HDFS-1454
> (and some more improvements followed later IIRC)
>
> 2012/2/21 seven garfee :
> > hi,all
> > As this Page(
> > http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Staging)
> >  said,"In fact, initially the HDFS client caches the file data into a
> > temporary local file".
> > But I read the DFSClient.java in 0.20.2,and found nothing about storing
> > data in tmp local file.
> > Did I miss something or That strategy has been removed?
>
>
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Edward Capriolo
On Tue, Feb 21, 2012 at 7:50 PM, Mohit Anchlia  wrote:
> It looks like in mapper values are coming as binary instead of Text. Is
> this expected from sequence file? I initially wrote SequenceFile with Text
> values.
>
> On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia wrote:
>
>> Need some more help. I wrote sequence file using below code but now when I
>> run mapreduce job I get "file.*java.lang.ClassCastException*:
>> org.apache.hadoop.io.LongWritable cannot be cast to
>> org.apache.hadoop.io.Text" even though I didn't use LongWritable when I
>> originally wrote to the sequence
>>
>> //Code to write to the sequence file. There is no LongWritable here
>>
>> org.apache.hadoop.io.Text key =
>> *new* org.apache.hadoop.io.Text();
>>
>> BufferedReader buffer =
>> *new* BufferedReader(*new* FileReader(filePath));
>>
>> String line =
>> *null*;
>>
>> org.apache.hadoop.io.Text value =
>> *new* org.apache.hadoop.io.Text();
>>
>> *try* {
>>
>> writer = SequenceFile.*createWriter*(fs, conf, path, key.getClass(),
>>
>> value.getClass(), SequenceFile.CompressionType.
>> *RECORD*);
>>
>> *int* i = 1;
>>
>> *long* timestamp=System.*currentTimeMillis*();
>>
>> *while* ((line = buffer.readLine()) != *null*) {
>>
>> key.set(String.*valueOf*(timestamp));
>>
>> value.set(line);
>>
>> writer.append(key, value);
>>
>> i++;
>>
>> }
>>
>>
>>   On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <
>> arkoprovomukher...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think the following link will help:
>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>>>
>>> Cheers
>>> Arko
>>>
>>> On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia >> >wrote:
>>>
>>> > Sorry may be it's something obvious but I was wondering when map or
>>> reduce
>>> > gets called what would be the class used for key and value? If I used
>>> > "org.apache.hadoop.io.Text
>>> > value = *new* org.apache.hadoop.io.Text();" would the map be called with
>>>  > Text class?
>>> >
>>> > public void map(LongWritable key, Text value, Context context) throws
>>> > IOException, InterruptedException {
>>> >
>>> >
>>> > On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
>>> > arkoprovomukher...@gmail.com> wrote:
>>> >
>>> > > Hi Mohit,
>>> > >
>>> > > I am not sure that I understand your question.
>>> > >
>>> > > But you can write into a file using:
>>> > > *BufferedWriter output = new BufferedWriter
>>> > > (new OutputStreamWriter(fs.create(my_path,true)));*
>>> > > *output.write(data);*
>>> > > *
>>> > > *
>>> > > Then you can pass that file as the input to your MapReduce program.
>>> > >
>>> > > *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
>>> > >
>>> > > From inside your Map/Reduce methods, I think you should NOT be
>>> tinkering
>>> > > with the input / output paths of that Map/Reduce job.
>>> > > Cheers
>>> > > Arko
>>> > >
>>> > >
>>> > > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia <
>>> mohitanch...@gmail.com
>>> > > >wrote:
>>> > >
>>> > > > Thanks How does mapreduce work on sequence file? Is there an
>>> example I
>>> > > can
>>> > > > look at?
>>> > > >
>>> > > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
>>> > > > arkoprovomukher...@gmail.com> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Let's say all the smaller files are in the same directory.
>>> > > > >
>>> > > > > Then u can do:
>>> > > > >
>>> > > > > *BufferedWriter output = new BufferedWriter
>>> > > > > (newOutputStreamWriter(fs.create(output_path,
>>> > > > > true)));  // Output path*
>>> > > > >
>>> > > > > *FileStatus[] output_files = fs.listStatus(new Path(input_path));
>>>  //
>>> > > > Input
>>> > > > > directory*
>>> > > > >
>>> > > > > *for ( int i=0; i < output_files.length; i++ )  *
>>> > > > >
>>> > > > > *{*
>>> > > > >
>>> > > > > *   BufferedReader reader = new
>>> > > > >
>>> > >
>>> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
>>> > > > > *
>>> > > > >
>>> > > > > *   String data;*
>>> > > > >
>>> > > > > *   data = reader.readLine();*
>>> > > > >
>>> > > > > *   while ( data != null ) *
>>> > > > >
>>> > > > > *  {*
>>> > > > >
>>> > > > > *        output.write(data);*
>>> > > > >
>>> > > > > *  }*
>>> > > > >
>>> > > > > *    reader.close*
>>> > > > >
>>> > > > > *}*
>>> > > > >
>>> > > > > *output.close*
>>> > > > >
>>> > > > >
>>> > > > > In case you have the files in multiple directories, call the code
>>> for
>>> > > > each
>>> > > > > of them with different input paths.
>>> > > > >
>>> > > > > Hope this helps!
>>> > > > >
>>> > > > > Cheers
>>> > > > >
>>> > > > > Arko
>>> > > > >
>>> > > > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <
>>> > mohitanch...@gmail.com
>>> > > > > >wrote:
>>> > > > >
>>> > > > > > I am trying to look for examples that demonstrates using
>>> sequence
>>> > > files
>>> > > > > > including writing to it and then running mapred on it, but
>>> unable
>>> > to
>>> > > > find
>>> > > > > > one. Could you please point me to some examples of sequence
>>> files?
>>> > > > >

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
It looks like in mapper values are coming as binary instead of Text. Is
this expected from sequence file? I initially wrote SequenceFile with Text
values.

On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia wrote:

> Need some more help. I wrote sequence file using below code but now when I
> run mapreduce job I get "file.*java.lang.ClassCastException*:
> org.apache.hadoop.io.LongWritable cannot be cast to
> org.apache.hadoop.io.Text" even though I didn't use LongWritable when I
> originally wrote to the sequence
>
> //Code to write to the sequence file. There is no LongWritable here
>
> org.apache.hadoop.io.Text key =
> *new* org.apache.hadoop.io.Text();
>
> BufferedReader buffer =
> *new* BufferedReader(*new* FileReader(filePath));
>
> String line =
> *null*;
>
> org.apache.hadoop.io.Text value =
> *new* org.apache.hadoop.io.Text();
>
> *try* {
>
> writer = SequenceFile.*createWriter*(fs, conf, path, key.getClass(),
>
> value.getClass(), SequenceFile.CompressionType.
> *RECORD*);
>
> *int* i = 1;
>
> *long* timestamp=System.*currentTimeMillis*();
>
> *while* ((line = buffer.readLine()) != *null*) {
>
> key.set(String.*valueOf*(timestamp));
>
> value.set(line);
>
> writer.append(key, value);
>
> i++;
>
> }
>
>
>   On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <
> arkoprovomukher...@gmail.com> wrote:
>
>> Hi,
>>
>> I think the following link will help:
>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>>
>> Cheers
>> Arko
>>
>> On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia > >wrote:
>>
>> > Sorry may be it's something obvious but I was wondering when map or
>> reduce
>> > gets called what would be the class used for key and value? If I used
>> > "org.apache.hadoop.io.Text
>> > value = *new* org.apache.hadoop.io.Text();" would the map be called with
>>  > Text class?
>> >
>> > public void map(LongWritable key, Text value, Context context) throws
>> > IOException, InterruptedException {
>> >
>> >
>> > On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
>> > arkoprovomukher...@gmail.com> wrote:
>> >
>> > > Hi Mohit,
>> > >
>> > > I am not sure that I understand your question.
>> > >
>> > > But you can write into a file using:
>> > > *BufferedWriter output = new BufferedWriter
>> > > (new OutputStreamWriter(fs.create(my_path,true)));*
>> > > *output.write(data);*
>> > > *
>> > > *
>> > > Then you can pass that file as the input to your MapReduce program.
>> > >
>> > > *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
>> > >
>> > > From inside your Map/Reduce methods, I think you should NOT be
>> tinkering
>> > > with the input / output paths of that Map/Reduce job.
>> > > Cheers
>> > > Arko
>> > >
>> > >
>> > > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia <
>> mohitanch...@gmail.com
>> > > >wrote:
>> > >
>> > > > Thanks How does mapreduce work on sequence file? Is there an
>> example I
>> > > can
>> > > > look at?
>> > > >
>> > > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
>> > > > arkoprovomukher...@gmail.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Let's say all the smaller files are in the same directory.
>> > > > >
>> > > > > Then u can do:
>> > > > >
>> > > > > *BufferedWriter output = new BufferedWriter
>> > > > > (newOutputStreamWriter(fs.create(output_path,
>> > > > > true)));  // Output path*
>> > > > >
>> > > > > *FileStatus[] output_files = fs.listStatus(new Path(input_path));
>>  //
>> > > > Input
>> > > > > directory*
>> > > > >
>> > > > > *for ( int i=0; i < output_files.length; i++ )  *
>> > > > >
>> > > > > *{*
>> > > > >
>> > > > > *   BufferedReader reader = new
>> > > > >
>> > >
>> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
>> > > > > *
>> > > > >
>> > > > > *   String data;*
>> > > > >
>> > > > > *   data = reader.readLine();*
>> > > > >
>> > > > > *   while ( data != null ) *
>> > > > >
>> > > > > *  {*
>> > > > >
>> > > > > *output.write(data);*
>> > > > >
>> > > > > *  }*
>> > > > >
>> > > > > *reader.close*
>> > > > >
>> > > > > *}*
>> > > > >
>> > > > > *output.close*
>> > > > >
>> > > > >
>> > > > > In case you have the files in multiple directories, call the code
>> for
>> > > > each
>> > > > > of them with different input paths.
>> > > > >
>> > > > > Hope this helps!
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > Arko
>> > > > >
>> > > > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <
>> > mohitanch...@gmail.com
>> > > > > >wrote:
>> > > > >
>> > > > > > I am trying to look for examples that demonstrates using
>> sequence
>> > > files
>> > > > > > including writing to it and then running mapred on it, but
>> unable
>> > to
>> > > > find
>> > > > > > one. Could you please point me to some examples of sequence
>> files?
>> > > > > >
>> > > > > > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks <
>> bejoy.had...@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi Mohit
>> > > > > > >  AFAIK XMLLoader in pig won't be suited for Sequence
>> Files.
>> > > >

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Need some more help. I wrote a sequence file using the code below, but now
when I run the mapreduce job I get "java.lang.ClassCastException:
org.apache.hadoop.io.LongWritable cannot be cast to
org.apache.hadoop.io.Text", even though I didn't use LongWritable when I
originally wrote to the sequence file.

//Code to write to the sequence file. There is no LongWritable here

org.apache.hadoop.io.Text key = new org.apache.hadoop.io.Text();
BufferedReader buffer = new BufferedReader(new FileReader(filePath));
String line = null;
org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();

try {
    writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
        value.getClass(), SequenceFile.CompressionType.RECORD);

    int i = 1;
    long timestamp = System.currentTimeMillis();

    while ((line = buffer.readLine()) != null) {
        key.set(String.valueOf(timestamp));
        value.set(line);
        writer.append(key, value);
        i++;
    }
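
(A side note with a hedged sketch: the LongWritable usually comes from the
job's input format rather than from the write path, since the default
TextInputFormat hands the mapper LongWritable byte offsets as keys. One way
to confirm what the file actually contains is to open it with
SequenceFile.Reader; fs, conf and path below are assumed to be the same
objects used in the writer snippet above.)

  // Inspect the key/value classes recorded in the sequence file header.
  SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
  try {
      System.out.println("key class   = " + reader.getKeyClassName());   // expect Text
      System.out.println("value class = " + reader.getValueClassName()); // expect Text
      Text k = new Text();
      Text v = new Text();
      while (reader.next(k, v)) {        // iterate the appended records
          System.out.println(k + "\t" + v);
      }
  } finally {
      reader.close();
  }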


On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> Hi,
>
> I think the following link will help:
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>
> Cheers
> Arko
>
> On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia  >wrote:
>
> > Sorry may be it's something obvious but I was wondering when map or
> reduce
> > gets called what would be the class used for key and value? If I used
> > "org.apache.hadoop.io.Text
> > value = *new* org.apache.hadoop.io.Text();" would the map be called with
>  > Text class?
> >
> > public void map(LongWritable key, Text value, Context context) throws
> > IOException, InterruptedException {
> >
> >
> > On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
> > arkoprovomukher...@gmail.com> wrote:
> >
> > > Hi Mohit,
> > >
> > > I am not sure that I understand your question.
> > >
> > > But you can write into a file using:
> > > *BufferedWriter output = new BufferedWriter
> > > (new OutputStreamWriter(fs.create(my_path,true)));*
> > > *output.write(data);*
> > > *
> > > *
> > > Then you can pass that file as the input to your MapReduce program.
> > >
> > > *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
> > >
> > > From inside your Map/Reduce methods, I think you should NOT be
> tinkering
> > > with the input / output paths of that Map/Reduce job.
> > > Cheers
> > > Arko
> > >
> > >
> > > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia  > > >wrote:
> > >
> > > > Thanks How does mapreduce work on sequence file? Is there an example
> I
> > > can
> > > > look at?
> > > >
> > > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
> > > > arkoprovomukher...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Let's say all the smaller files are in the same directory.
> > > > >
> > > > > Then u can do:
> > > > >
> > > > > *BufferedWriter output = new BufferedWriter
> > > > > (newOutputStreamWriter(fs.create(output_path,
> > > > > true)));  // Output path*
> > > > >
> > > > > *FileStatus[] output_files = fs.listStatus(new Path(input_path));
>  //
> > > > Input
> > > > > directory*
> > > > >
> > > > > *for ( int i=0; i < output_files.length; i++ )  *
> > > > >
> > > > > *{*
> > > > >
> > > > > *   BufferedReader reader = new
> > > > >
> > >
> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
> > > > > *
> > > > >
> > > > > *   String data;*
> > > > >
> > > > > *   data = reader.readLine();*
> > > > >
> > > > > *   while ( data != null ) *
> > > > >
> > > > > *  {*
> > > > >
> > > > > *output.write(data);*
> > > > >
> > > > > *  }*
> > > > >
> > > > > *reader.close*
> > > > >
> > > > > *}*
> > > > >
> > > > > *output.close*
> > > > >
> > > > >
> > > > > In case you have the files in multiple directories, call the code
> for
> > > > each
> > > > > of them with different input paths.
> > > > >
> > > > > Hope this helps!
> > > > >
> > > > > Cheers
> > > > >
> > > > > Arko
> > > > >
> > > > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <
> > mohitanch...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > I am trying to look for examples that demonstrates using sequence
> > > files
> > > > > > including writing to it and then running mapred on it, but unable
> > to
> > > > find
> > > > > > one. Could you please point me to some examples of sequence
> files?
> > > > > >
> > > > > > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks <
> bejoy.had...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Mohit
> > > > > > >  AFAIK XMLLoader in pig won't be suited for Sequence Files.
> > > > Please
> > > > > > > post the same to Pig user group for some workaround over the
> > same.
> > > > > > > SequenceFIle is a preferred option when we want to
> store
> > > > small
> > > > > > > files in hdfs and needs to be processed by MapReduce as it
> stores
> > > > data
> > > > > in
> > > > > > > key value format.Since SequenceFileInputFormat is available at
> > your
> > > > > > > disposal you don't need any custom input formats for processing

Re: Dynamic changing of slaves

2012-02-21 Thread Jamack, Peter
Yeah, I'm not sure how you can actually do it, as I haven't done it
before, but from a logical perspective you'd probably have to make a lot
of configuration changes and maybe even write some complicated M/R code
and coordination/rules-engine logic, and change how the heartbeat and
scheduler operate to do what you want.
 There might be an easier way, I'm not sure though.

Peter J

On 2/21/12 3:16 PM, "Merto Mertek"  wrote:

>I think that job configuration does not allow you such setup, however
>maybe
>I missed something..
>
> Probably I would tackle this problem from the scheduler source. The
>default one is JobQueueTaskScheduler which preserves a fifo based queue.
>When a tasktracker (your slave) tells the jobtracker that it has some free
>slots to run, JT in the heartbeat method calls the scheduler assignTasks
>method where tasks are assigned on local basis. In other words, scheduler
>tries to find tasks on the tasktracker which data resides on it. If the
>scheduler will not find a local map/reduce task to run it will try to find
>a non local one. Probably here is the point where you should do something
>with your jobs and wait for the tasktrackers heartbeat.. Instead of
>waiting
>for the TT heartbeat, maybe there is another option to force an
>heartbeatResponse, despite the TT has not send a heartbeat but I am not
>aware of it..
>
>
>On 21 February 2012 19:27, theta  wrote:
>
>>
>> Hi,
>>
>> I am working on a project which requires a setup as follows:
>>
>> One master with four slaves.However, when a map only program is run, the
>> master dynamically selects the slave to run the map. For example, when
>>the
>> program is run for the first time, slave 2 is selected to run the map
>>and
>> reduce programs, and the output is stored on dfs. When the program is
>>run
>> the second time, slave 3 is selected and son on.
>>
>> I am currently using Hadoop 0.20.2 with Ubuntu 11.10.
>>
>> Any ideas on creating the setup as described above?
>>
>> Regards
>>
>> --
>> View this message in context:
>> 
>>http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>



Re: Dynamic changing of slaves

2012-02-21 Thread Merto Mertek
I think the job configuration does not allow such a setup, but maybe I
missed something.

 I would probably tackle this problem from the scheduler source. The
default one is JobQueueTaskScheduler, which maintains a FIFO-based queue.
When a tasktracker (your slave) tells the jobtracker that it has some free
slots, the JT, in its heartbeat method, calls the scheduler's assignTasks
method, where tasks are assigned on a locality basis. In other words, the
scheduler tries to find tasks on the tasktracker on which the data resides;
if it cannot find a local map/reduce task to run, it will try to find a
non-local one. That is probably the point where you should do something
with your jobs and wait for the tasktracker's heartbeat. Instead of waiting
for the TT heartbeat, there may be another option to force a
heartbeatResponse even though the TT has not sent a heartbeat, but I am not
aware of one.


On 21 February 2012 19:27, theta  wrote:

>
> Hi,
>
> I am working on a project which requires a setup as follows:
>
> One master with four slaves.However, when a map only program is run, the
> master dynamically selects the slave to run the map. For example, when the
> program is run for the first time, slave 2 is selected to run the map and
> reduce programs, and the output is stored on dfs. When the program is run
> the second time, slave 3 is selected and son on.
>
> I am currently using Hadoop 0.20.2 with Ubuntu 11.10.
>
> Any ideas on creating the setup as described above?
>
> Regards
>
> --
> View this message in context:
> http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: WAN-based Hadoop high availability (HA)?

2012-02-21 Thread Jamack, Peter
For high availability?
The issue is the NameNode. Going forward there is a federated NameNode
environment, but I haven't used it and am not sure if it's an
active-active NameNode environment or just a sharded one.

  DR/BR is always an issue when you have petabytes of data across clusters.
There are secondary NameNode options, backing up certain pieces and not
others, cloning the box, etc.

Peter J

On 2/21/12 1:23 PM, "Saqib Jang -- Margalla Communications"
 wrote:

>Hello,
>
>I'm a market analyst involved in researching the Hadoop space, had
>
>a quick question. I was wondering if and what type of requirements may
>
>there be for WAN-based high availability for Hadoop configurations
>
>e.g. for disaster recovery and what type of solutions may be available
>
>for such applications?
>
> 
>
>thanks,
>
>Saqib
>
> 
>
>Saqib Jang
>
>Principal/Founder
>
>Margalla Communications, Inc.
>
>1339 Portola Road, Woodside, CA 94062
>
>(650) 274 8745
>
>www.margallacomm.com
>
> 
>
> 
>



Re: Writing to SequenceFile fails

2012-02-21 Thread Mohit Anchlia
I am past this error. Looks like I needed to use the CDH libraries; I changed
my Maven repo. Now I am stuck at

org.apache.hadoop.security.AccessControlException, since I am not writing
as the user that owns the file. Looking online for solutions.


On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia wrote:

> I am trying to write to the sequence file and it seems to be failing. Not
> sure why, Is there something I need to do
>
> String uri="hdfs://db1:54310/examples/testfile1.seq";
>
> FileSystem fs = FileSystem.*get*(URI.*create*(uri), conf);  //Fails
> on this line
>
>
> Caused by:
> *java.io.EOFException*
>
> at java.io.DataInputStream.readInt(
> *DataInputStream.java:375*)
>
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(
> *Client.java:501*)
>
> at org.apache.hadoop.ipc.Client$Connection.run(*Client.java:446*)
>


WAN-based Hadoop high availability (HA)?

2012-02-21 Thread Saqib Jang -- Margalla Communications
Hello,

I'm a market analyst involved in researching the Hadoop space and had
a quick question. I was wondering whether, and what type of, requirements
there may be for WAN-based high availability in Hadoop configurations,
e.g. for disaster recovery, and what type of solutions may be available
for such applications?

 

thanks,

Saqib

 

Saqib Jang

Principal/Founder

Margalla Communications, Inc.

1339 Portola Road, Woodside, CA 94062

(650) 274 8745

www.margallacomm.com

 

 



Re: Writing small files to one big file in hdfs

2012-02-21 Thread Arko Provo Mukherjee
Hi,

I think the following link will help:
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

Cheers
Arko

On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia wrote:

> Sorry may be it's something obvious but I was wondering when map or reduce
> gets called what would be the class used for key and value? If I used
> "org.apache.hadoop.io.Text
> value = *new* org.apache.hadoop.io.Text();" would the map be called with
> Text class?
>
> public void map(LongWritable key, Text value, Context context) throws
> IOException, InterruptedException {
>
>
> On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
> arkoprovomukher...@gmail.com> wrote:
>
> > Hi Mohit,
> >
> > I am not sure that I understand your question.
> >
> > But you can write into a file using:
> > *BufferedWriter output = new BufferedWriter
> > (new OutputStreamWriter(fs.create(my_path,true)));*
> > *output.write(data);*
> > *
> > *
> > Then you can pass that file as the input to your MapReduce program.
> >
> > *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
> >
> > From inside your Map/Reduce methods, I think you should NOT be tinkering
> > with the input / output paths of that Map/Reduce job.
> > Cheers
> > Arko
> >
> >
> > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia  > >wrote:
> >
> > > Thanks How does mapreduce work on sequence file? Is there an example I
> > can
> > > look at?
> > >
> > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
> > > arkoprovomukher...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Let's say all the smaller files are in the same directory.
> > > >
> > > > Then u can do:
> > > >
> > > > *BufferedWriter output = new BufferedWriter
> > > > (newOutputStreamWriter(fs.create(output_path,
> > > > true)));  // Output path*
> > > >
> > > > *FileStatus[] output_files = fs.listStatus(new Path(input_path));  //
> > > Input
> > > > directory*
> > > >
> > > > *for ( int i=0; i < output_files.length; i++ )  *
> > > >
> > > > *{*
> > > >
> > > > *   BufferedReader reader = new
> > > >
> > BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
> > > > *
> > > >
> > > > *   String data;*
> > > >
> > > > *   data = reader.readLine();*
> > > >
> > > > *   while ( data != null ) *
> > > >
> > > > *  {*
> > > >
> > > > *output.write(data);*
> > > >
> > > > *  }*
> > > >
> > > > *reader.close*
> > > >
> > > > *}*
> > > >
> > > > *output.close*
> > > >
> > > >
> > > > In case you have the files in multiple directories, call the code for
> > > each
> > > > of them with different input paths.
> > > >
> > > > Hope this helps!
> > > >
> > > > Cheers
> > > >
> > > > Arko
> > > >
> > > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <
> mohitanch...@gmail.com
> > > > >wrote:
> > > >
> > > > > I am trying to look for examples that demonstrates using sequence
> > files
> > > > > including writing to it and then running mapred on it, but unable
> to
> > > find
> > > > > one. Could you please point me to some examples of sequence files?
> > > > >
> > > > > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks  >
> > > > wrote:
> > > > >
> > > > > > Hi Mohit
> > > > > >  AFAIK XMLLoader in pig won't be suited for Sequence Files.
> > > Please
> > > > > > post the same to Pig user group for some workaround over the
> same.
> > > > > > SequenceFIle is a preferred option when we want to store
> > > small
> > > > > > files in hdfs and needs to be processed by MapReduce as it stores
> > > data
> > > > in
> > > > > > key value format.Since SequenceFileInputFormat is available at
> your
> > > > > > disposal you don't need any custom input formats for processing
> the
> > > > same
> > > > > > using map reduce. It is a cleaner and better approach compared to
> > > just
> > > > > > appending small xml file contents into a big file.
> > > > > >
> > > > > > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <
> > > > mohitanch...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks <
> > bejoy.had...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Mohit
> > > > > > > >   Rather than just appending the content into a normal
> text
> > > > file
> > > > > or
> > > > > > > > so, you can create a sequence file with the individual
> smaller
> > > file
> > > > > > > content
> > > > > > > > as values.
> > > > > > > >
> > > > > > > >  Thanks. I was planning to use pig's
> > > > > > > org.apache.pig.piggybank.storage.XMLLoader
> > > > > > > for processing. Would it work with sequence file?
> > > > > > >
> > > > > > > This text file that I was referring to would be in hdfs itself.
> > Is
> > > it
> > > > > > still
> > > > > > > different than using sequence file?
> > > > > > >
> > > > > > > > Regards
> > > > > > > > Bejoy.K.S
> > > > > > > >
> > > > > > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> > > > > > mohitanch...@gmail.com
> > > > > > > > >wrote:
> > > > > > > >
> > > > > > > > > We have small xml files. Currently I am plannin

Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Sorry, maybe it's something obvious, but I was wondering: when map or reduce
gets called, what would be the class used for the key and value? If I used
"org.apache.hadoop.io.Text
value = new org.apache.hadoop.io.Text();" would the map be called with the
Text class?

public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {
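
(A small hedged sketch of what that implies: with SequenceFileInputFormat the
mapper's input types should match the Writable classes that were written, here
Text/Text, whereas LongWritable keys only appear with inputs such as the
default TextInputFormat. The class name below is illustrative, and imports
from org.apache.hadoop.io and org.apache.hadoop.mapreduce are assumed.)

  // Mapper for a sequence file written with Text keys and Text values.
  public static class XmlLineMapper extends Mapper<Text, Text, Text, Text> {
      @Override
      protected void map(Text key, Text value, Context context)
              throws IOException, InterruptedException {
          context.write(key, value);  // pass-through, just to show the signature
      }
  }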


On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> Hi Mohit,
>
> I am not sure that I understand your question.
>
> But you can write into a file using:
> *BufferedWriter output = new BufferedWriter
> (new OutputStreamWriter(fs.create(my_path,true)));*
> *output.write(data);*
> *
> *
> Then you can pass that file as the input to your MapReduce program.
>
> *FileInputFormat.addInputPath(jobconf, new Path (my_path) );*
>
> From inside your Map/Reduce methods, I think you should NOT be tinkering
> with the input / output paths of that Map/Reduce job.
> Cheers
> Arko
>
>
> On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia  >wrote:
>
> > Thanks How does mapreduce work on sequence file? Is there an example I
> can
> > look at?
> >
> > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
> > arkoprovomukher...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Let's say all the smaller files are in the same directory.
> > >
> > > Then u can do:
> > >
> > > *BufferedWriter output = new BufferedWriter
> > > (newOutputStreamWriter(fs.create(output_path,
> > > true)));  // Output path*
> > >
> > > *FileStatus[] output_files = fs.listStatus(new Path(input_path));  //
> > Input
> > > directory*
> > >
> > > *for ( int i=0; i < output_files.length; i++ )  *
> > >
> > > *{*
> > >
> > > *   BufferedReader reader = new
> > >
> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
> > > *
> > >
> > > *   String data;*
> > >
> > > *   data = reader.readLine();*
> > >
> > > *   while ( data != null ) *
> > >
> > > *  {*
> > >
> > > *output.write(data);*
> > >
> > > *  }*
> > >
> > > *reader.close*
> > >
> > > *}*
> > >
> > > *output.close*
> > >
> > >
> > > In case you have the files in multiple directories, call the code for
> > each
> > > of them with different input paths.
> > >
> > > Hope this helps!
> > >
> > > Cheers
> > >
> > > Arko
> > >
> > > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia  > > >wrote:
> > >
> > > > I am trying to look for examples that demonstrates using sequence
> files
> > > > including writing to it and then running mapred on it, but unable to
> > find
> > > > one. Could you please point me to some examples of sequence files?
> > > >
> > > > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks 
> > > wrote:
> > > >
> > > > > Hi Mohit
> > > > >  AFAIK XMLLoader in pig won't be suited for Sequence Files.
> > Please
> > > > > post the same to Pig user group for some workaround over the same.
> > > > > SequenceFIle is a preferred option when we want to store
> > small
> > > > > files in hdfs and needs to be processed by MapReduce as it stores
> > data
> > > in
> > > > > key value format.Since SequenceFileInputFormat is available at your
> > > > > disposal you don't need any custom input formats for processing the
> > > same
> > > > > using map reduce. It is a cleaner and better approach compared to
> > just
> > > > > appending small xml file contents into a big file.
> > > > >
> > > > > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <
> > > mohitanch...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks <
> bejoy.had...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Mohit
> > > > > > >   Rather than just appending the content into a normal text
> > > file
> > > > or
> > > > > > > so, you can create a sequence file with the individual smaller
> > file
> > > > > > content
> > > > > > > as values.
> > > > > > >
> > > > > > >  Thanks. I was planning to use pig's
> > > > > > org.apache.pig.piggybank.storage.XMLLoader
> > > > > > for processing. Would it work with sequence file?
> > > > > >
> > > > > > This text file that I was referring to would be in hdfs itself.
> Is
> > it
> > > > > still
> > > > > > different than using sequence file?
> > > > > >
> > > > > > > Regards
> > > > > > > Bejoy.K.S
> > > > > > >
> > > > > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> > > > > mohitanch...@gmail.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > We have small xml files. Currently I am planning to append
> > these
> > > > > small
> > > > > > > > files to one file in hdfs so that I can take advantage of
> > splits,
> > > > > > larger
> > > > > > > > blocks and sequential IO. What I am unsure is if it's ok to
> > > append
> > > > > one
> > > > > > > file
> > > > > > > > at a time to this hdfs file
> > > > > > > >
> > > > > > > > Could someone suggest if this is ok? Would like to know how
> > other
> > > > do
> > > > > > it.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Arko Provo Mukherjee
Hi Mohit,

I am not sure that I understand your question.

But you can write into a file using:

BufferedWriter output = new BufferedWriter(
    new OutputStreamWriter(fs.create(my_path, true)));
output.write(data);

Then you can pass that file as the input to your MapReduce program.

FileInputFormat.addInputPath(jobconf, new Path(my_path));

From inside your Map/Reduce methods, I think you should NOT be tinkering
with the input / output paths of that Map/Reduce job.
Cheers
Arko


On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia wrote:

> Thanks How does mapreduce work on sequence file? Is there an example I can
> look at?
>
> On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
> arkoprovomukher...@gmail.com> wrote:
>
> > Hi,
> >
> > Let's say all the smaller files are in the same directory.
> >
> > Then u can do:
> >
> > *BufferedWriter output = new BufferedWriter
> > (newOutputStreamWriter(fs.create(output_path,
> > true)));  // Output path*
> >
> > *FileStatus[] output_files = fs.listStatus(new Path(input_path));  //
> Input
> > directory*
> >
> > *for ( int i=0; i < output_files.length; i++ )  *
> >
> > *{*
> >
> > *   BufferedReader reader = new
> > BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
> > *
> >
> > *   String data;*
> >
> > *   data = reader.readLine();*
> >
> > *   while ( data != null ) *
> >
> > *  {*
> >
> > *output.write(data);*
> >
> > *  }*
> >
> > *reader.close*
> >
> > *}*
> >
> > *output.close*
> >
> >
> > In case you have the files in multiple directories, call the code for
> each
> > of them with different input paths.
> >
> > Hope this helps!
> >
> > Cheers
> >
> > Arko
> >
> > On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia  > >wrote:
> >
> > > I am trying to look for examples that demonstrates using sequence files
> > > including writing to it and then running mapred on it, but unable to
> find
> > > one. Could you please point me to some examples of sequence files?
> > >
> > > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks 
> > wrote:
> > >
> > > > Hi Mohit
> > > >  AFAIK XMLLoader in pig won't be suited for Sequence Files.
> Please
> > > > post the same to Pig user group for some workaround over the same.
> > > > SequenceFIle is a preferred option when we want to store
> small
> > > > files in hdfs and needs to be processed by MapReduce as it stores
> data
> > in
> > > > key value format.Since SequenceFileInputFormat is available at your
> > > > disposal you don't need any custom input formats for processing the
> > same
> > > > using map reduce. It is a cleaner and better approach compared to
> just
> > > > appending small xml file contents into a big file.
> > > >
> > > > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <
> > mohitanch...@gmail.com
> > > > >wrote:
> > > >
> > > > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks 
> > > > wrote:
> > > > >
> > > > > > Mohit
> > > > > >   Rather than just appending the content into a normal text
> > file
> > > or
> > > > > > so, you can create a sequence file with the individual smaller
> file
> > > > > content
> > > > > > as values.
> > > > > >
> > > > > >  Thanks. I was planning to use pig's
> > > > > org.apache.pig.piggybank.storage.XMLLoader
> > > > > for processing. Would it work with sequence file?
> > > > >
> > > > > This text file that I was referring to would be in hdfs itself. Is
> it
> > > > still
> > > > > different than using sequence file?
> > > > >
> > > > > > Regards
> > > > > > Bejoy.K.S
> > > > > >
> > > > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> > > > mohitanch...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > We have small xml files. Currently I am planning to append
> these
> > > > small
> > > > > > > files to one file in hdfs so that I can take advantage of
> splits,
> > > > > larger
> > > > > > > blocks and sequential IO. What I am unsure is if it's ok to
> > append
> > > > one
> > > > > > file
> > > > > > > at a time to this hdfs file
> > > > > > >
> > > > > > > Could someone suggest if this is ok? Would like to know how
> other
> > > do
> > > > > it.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
Thanks. How does MapReduce work on a sequence file? Is there an example I can
look at?
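
(Since an example was asked for, here is a minimal hedged sketch of a map-only
job that reads a sequence file with the new mapreduce API and dumps its
records; the class name, paths and the Job constructor choice are
illustrative, not from the thread.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class SeqFileDump {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "seqfile-dump");  // Job.getInstance(conf) on newer releases
          job.setJarByClass(SeqFileDump.class);
          // Keys/values arrive exactly as written to the file (Text/Text here).
          job.setInputFormatClass(SequenceFileInputFormat.class);
          // Map-only; with no mapper class set, the default identity Mapper
          // simply passes each (key, value) record through to the output.
          job.setNumReduceTasks(0);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));    // the .seq file or its directory
          FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not exist yet
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }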

On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> Hi,
>
> Let's say all the smaller files are in the same directory.
>
> Then u can do:
>
> *BufferedWriter output = new BufferedWriter
> (newOutputStreamWriter(fs.create(output_path,
> true)));  // Output path*
>
> *FileStatus[] output_files = fs.listStatus(new Path(input_path));  // Input
> directory*
>
> *for ( int i=0; i < output_files.length; i++ )  *
>
> *{*
>
> *   BufferedReader reader = new
> BufferedReader(new InputStreamReader(fs.open(output_files[i].getPath())));
> *
>
> *   String data;*
>
> *   data = reader.readLine();*
>
> *   while ( data != null ) *
>
> *  {*
>
> *output.write(data);*
>
> *  }*
>
> *reader.close*
>
> *}*
>
> *output.close*
>
>
> In case you have the files in multiple directories, call the code for each
> of them with different input paths.
>
> Hope this helps!
>
> Cheers
>
> Arko
>
> On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia  >wrote:
>
> > I am trying to look for examples that demonstrates using sequence files
> > including writing to it and then running mapred on it, but unable to find
> > one. Could you please point me to some examples of sequence files?
> >
> > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks 
> wrote:
> >
> > > Hi Mohit
> > >  AFAIK XMLLoader in pig won't be suited for Sequence Files. Please
> > > post the same to Pig user group for some workaround over the same.
> > > SequenceFIle is a preferred option when we want to store small
> > > files in hdfs and needs to be processed by MapReduce as it stores data
> in
> > > key value format.Since SequenceFileInputFormat is available at your
> > > disposal you don't need any custom input formats for processing the
> same
> > > using map reduce. It is a cleaner and better approach compared to just
> > > appending small xml file contents into a big file.
> > >
> > > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <
> mohitanch...@gmail.com
> > > >wrote:
> > >
> > > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks 
> > > wrote:
> > > >
> > > > > Mohit
> > > > >   Rather than just appending the content into a normal text
> file
> > or
> > > > > so, you can create a sequence file with the individual smaller file
> > > > content
> > > > > as values.
> > > > >
> > > > >  Thanks. I was planning to use pig's
> > > > org.apache.pig.piggybank.storage.XMLLoader
> > > > for processing. Would it work with sequence file?
> > > >
> > > > This text file that I was referring to would be in hdfs itself. Is it
> > > still
> > > > different than using sequence file?
> > > >
> > > > > Regards
> > > > > Bejoy.K.S
> > > > >
> > > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> > > mohitanch...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > We have small xml files. Currently I am planning to append these
> > > small
> > > > > > files to one file in hdfs so that I can take advantage of splits,
> > > > larger
> > > > > > blocks and sequential IO. What I am unsure is if it's ok to
> append
> > > one
> > > > > file
> > > > > > at a time to this hdfs file
> > > > > >
> > > > > > Could someone suggest if this is ok? Would like to know how other
> > do
> > > > it.
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Arko Provo Mukherjee
Hi,

Let's say all the smaller files are in the same directory.

Then you can do:

BufferedWriter output = new BufferedWriter(
    new OutputStreamWriter(fs.create(output_path, true)));  // Output path

FileStatus[] output_files = fs.listStatus(new Path(input_path));  // Input directory

for (int i = 0; i < output_files.length; i++) {

    BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(output_files[i].getPath())));

    String data = reader.readLine();

    while (data != null) {
        output.write(data);
        data = reader.readLine();  // advance, otherwise the loop never ends
    }

    reader.close();
}

output.close();


In case you have the files in multiple directories, call the code for each
of them with different input paths.

Hope this helps!

Cheers

Arko

On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia wrote:

> I am trying to look for examples that demonstrates using sequence files
> including writing to it and then running mapred on it, but unable to find
> one. Could you please point me to some examples of sequence files?
>
> On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks  wrote:
>
> > Hi Mohit
> >  AFAIK XMLLoader in pig won't be suited for Sequence Files. Please
> > post the same to Pig user group for some workaround over the same.
> > SequenceFIle is a preferred option when we want to store small
> > files in hdfs and needs to be processed by MapReduce as it stores data in
> > key value format.Since SequenceFileInputFormat is available at your
> > disposal you don't need any custom input formats for processing the same
> > using map reduce. It is a cleaner and better approach compared to just
> > appending small xml file contents into a big file.
> >
> > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia  > >wrote:
> >
> > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks 
> > wrote:
> > >
> > > > Mohit
> > > >   Rather than just appending the content into a normal text file
> or
> > > > so, you can create a sequence file with the individual smaller file
> > > content
> > > > as values.
> > > >
> > > >  Thanks. I was planning to use pig's
> > > org.apache.pig.piggybank.storage.XMLLoader
> > > for processing. Would it work with sequence file?
> > >
> > > This text file that I was referring to would be in hdfs itself. Is it
> > still
> > > different than using sequence file?
> > >
> > > > Regards
> > > > Bejoy.K.S
> > > >
> > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> > mohitanch...@gmail.com
> > > > >wrote:
> > > >
> > > > > We have small xml files. Currently I am planning to append these
> > small
> > > > > files to one file in hdfs so that I can take advantage of splits,
> > > larger
> > > > > blocks and sequential IO. What I am unsure is if it's ok to append
> > one
> > > > file
> > > > > at a time to this hdfs file
> > > > >
> > > > > Could someone suggest if this is ok? Would like to know how other
> do
> > > it.
> > > > >
> > > >
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
I am trying to find examples that demonstrate using sequence files,
including writing to one and then running mapred on it, but I am unable to
find any. Could you please point me to some examples of sequence files?
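
(For the write side, a hedged sketch using SequenceFile.Writer; the URI, paths
and file name are placeholders, and imports from org.apache.hadoop.conf, .fs
and .io are assumed. Unlike the snippet quoted elsewhere in the thread, it
gives each record a distinct key and closes the writer explicitly.)

  // Write each line of a local file into a SequenceFile as (Text, Text) records.
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);  // placeholder URI
  Path seqPath = new Path("/examples/combined.seq");                          // placeholder path
  SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, seqPath,
      Text.class, Text.class, SequenceFile.CompressionType.RECORD);
  BufferedReader in = new BufferedReader(new FileReader("local-input.xml"));  // placeholder file
  try {
      Text key = new Text();
      Text value = new Text();
      String line;
      long recordNo = 0;
      while ((line = in.readLine()) != null) {
          key.set("local-input.xml#" + recordNo++);  // unique key per record
          value.set(line);
          writer.append(key, value);
      }
  } finally {
      in.close();
      writer.close();  // without close() the file may be left incomplete
  }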

On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks  wrote:

> Hi Mohit
>  AFAIK XMLLoader in pig won't be suited for Sequence Files. Please
> post the same to Pig user group for some workaround over the same.
> SequenceFIle is a preferred option when we want to store small
> files in hdfs and needs to be processed by MapReduce as it stores data in
> key value format.Since SequenceFileInputFormat is available at your
> disposal you don't need any custom input formats for processing the same
> using map reduce. It is a cleaner and better approach compared to just
> appending small xml file contents into a big file.
>
> On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia  >wrote:
>
> > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks 
> wrote:
> >
> > > Mohit
> > >   Rather than just appending the content into a normal text file or
> > > so, you can create a sequence file with the individual smaller file
> > content
> > > as values.
> > >
> > >  Thanks. I was planning to use pig's
> > org.apache.pig.piggybank.storage.XMLLoader
> > for processing. Would it work with sequence file?
> >
> > This text file that I was referring to would be in hdfs itself. Is it
> still
> > different than using sequence file?
> >
> > > Regards
> > > Bejoy.K.S
> > >
> > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> mohitanch...@gmail.com
> > > >wrote:
> > >
> > > > We have small xml files. Currently I am planning to append these
> small
> > > > files to one file in hdfs so that I can take advantage of splits,
> > larger
> > > > blocks and sequential IO. What I am unsure is if it's ok to append
> one
> > > file
> > > > at a time to this hdfs file
> > > >
> > > > Could someone suggest if this is ok? Would like to know how other do
> > it.
> > > >
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bill Graham
You might want to check out File Crusher:
http://www.jointhegrid.com/hadoop_filecrush/index.jsp

I've never used it, but it sounds like it could be helpful.

On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks  wrote:

> Hi Mohit
>  AFAIK XMLLoader in pig won't be suited for Sequence Files. Please
> post the same to Pig user group for some workaround over the same.
> SequenceFIle is a preferred option when we want to store small
> files in hdfs and needs to be processed by MapReduce as it stores data in
> key value format.Since SequenceFileInputFormat is available at your
> disposal you don't need any custom input formats for processing the same
> using map reduce. It is a cleaner and better approach compared to just
> appending small xml file contents into a big file.
>
> On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia  >wrote:
>
> > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks 
> wrote:
> >
> > > Mohit
> > >   Rather than just appending the content into a normal text file or
> > > so, you can create a sequence file with the individual smaller file
> > content
> > > as values.
> > >
> > >  Thanks. I was planning to use pig's
> > org.apache.pig.piggybank.storage.XMLLoader
> > for processing. Would it work with sequence file?
> >
> > This text file that I was referring to would be in hdfs itself. Is it
> still
> > different than using sequence file?
> >
> > > Regards
> > > Bejoy.K.S
> > >
> > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <
> mohitanch...@gmail.com
> > > >wrote:
> > >
> > > > We have small xml files. Currently I am planning to append these
> small
> > > > files to one file in hdfs so that I can take advantage of splits,
> > larger
> > > > blocks and sequential IO. What I am unsure is if it's ok to append
> one
> > > file
> > > > at a time to this hdfs file
> > > >
> > > > Could someone suggest if this is ok? Would like to know how other do
> > it.
> > > >
> > >
> >
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*


Dynamic changing of slaves

2012-02-21 Thread theta

Hi,

I am working on a project which requires a setup as follows:

One master with four slaves. However, when a map-only program is run, the
master dynamically selects the slave to run the map. For example, when the
program is run for the first time, slave 2 is selected to run the map and
reduce programs, and the output is stored on DFS. When the program is run
the second time, slave 3 is selected, and so on.

I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

Any ideas on creating the setup as described above?

Regards

-- 
View this message in context: 
http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bejoy Ks
Hi Mohit
  AFAIK XMLLoader in Pig won't be suited for sequence files; please
post the same to the Pig user group for a workaround.
 A SequenceFile is the preferred option when we want to store small
files in HDFS that need to be processed by MapReduce, as it stores data in
key-value format. Since SequenceFileInputFormat is available at your
disposal, you don't need any custom input formats to process the same
with map reduce. It is a cleaner and better approach compared to just
appending small XML file contents into a big file.

On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia wrote:

> On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks  wrote:
>
> > Mohit
> >   Rather than just appending the content into a normal text file or
> > so, you can create a sequence file with the individual smaller file
> content
> > as values.
> >
> >  Thanks. I was planning to use pig's
> org.apache.pig.piggybank.storage.XMLLoader
> for processing. Would it work with sequence file?
>
> This text file that I was referring to would be in hdfs itself. Is it still
> different than using sequence file?
>
> > Regards
> > Bejoy.K.S
> >
> > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia  > >wrote:
> >
> > > We have small xml files. Currently I am planning to append these small
> > > files to one file in hdfs so that I can take advantage of splits,
> larger
> > > blocks and sequential IO. What I am unsure is if it's ok to append one
> > file
> > > at a time to this hdfs file
> > >
> > > Could someone suggest if this is ok? Would like to know how other do
> it.
> > >
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Mohit Anchlia
On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks  wrote:

> Mohit
>   Rather than just appending the content into a normal text file or
> so, you can create a sequence file with the individual smaller file content
> as values.
>
>  Thanks. I was planning to use pig's 
> org.apache.pig.piggybank.storage.XMLLoader
for processing. Would it work with sequence file?

This text file that I was referring to would be in hdfs itself. Is it still
different than using sequence file?

> Regards
> Bejoy.K.S
>
> On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia  >wrote:
>
> > We have small xml files. Currently I am planning to append these small
> > files to one file in hdfs so that I can take advantage of splits, larger
> > blocks and sequential IO. What I am unsure is if it's ok to append one
> file
> > at a time to this hdfs file
> >
> > Could someone suggest if this is ok? Would like to know how other do it.
> >
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Bejoy Ks
Mohit
   Rather than just appending the content into a normal text file or
so, you can create a sequence file with the individual smaller file content
as values.

Regards
Bejoy.K.S

On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia wrote:

> We have small xml files. Currently I am planning to append these small
> files to one file in hdfs so that I can take advantage of splits, larger
> blocks and sequential IO. What I am unsure is if it's ok to append one file
> at a time to this hdfs file
>
> Could someone suggest if this is ok? Would like to know how other do it.
>


Re: Writing small files to one big file in hdfs

2012-02-21 Thread Joey Echeverria
I'd recommend making a SequenceFile[1] to store each XML file as a value.

-Joey

[1]
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/SequenceFile.html
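
For what it's worth, a rough, untested sketch of that approach, assuming Text
keys/values (file name as key, whole document as value), block compression,
and made-up local/HDFS paths:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class XmlToSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path out = new Path("/data/xml-bundle.seq");   // placeholder HDFS path
    FileSystem fs = FileSystem.get(conf);

    Text key = new Text();
    Text value = new Text();
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, out,
        Text.class, Text.class, SequenceFile.CompressionType.BLOCK);
    try {
      // One record per small XML file: key = file name, value = whole document.
      for (File xml : new File("/local/xml/incoming").listFiles()) {  // placeholder local dir
        byte[] buf = new byte[(int) xml.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(xml));
        try {
          in.readFully(buf);
        } finally {
          in.close();
        }
        key.set(xml.getName());
        value.set(new String(buf, "UTF-8"));
        writer.append(key, value);
      }
    } finally {
      writer.close();
    }
  }
}

Using the file name as the key keeps each document's identity around for later
processing; a timestamp or counter works just as well if names can collide.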

On Tue, Feb 21, 2012 at 12:15 PM, Mohit Anchlia wrote:

> We have small xml files. Currently I am planning to append these small
> files to one file in hdfs so that I can take advantage of splits, larger
> blocks and sequential IO. What I am unsure is if it's ok to append one file
> at a time to this hdfs file
>
> Could someone suggest if this is ok? Would like to know how other do it.
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: Number of Under-Replicated Blocks ?

2012-02-21 Thread Chris Curtin
Have you had any Name Node failures lately? I had them every couple of days
and found that there were files being left in hdfs under
/log/hadoop/tmp/mapred/staging/... when communication with the Name Node
was lost. Not sure why they never got replicated correctly (maybe because
they are in /log?).

I went in and removed the old files (say 2 days old or older) and saw the
# of under-replicated blocks drop to 0.

Hope this helps,

Chris

On Mon, Feb 20, 2012 at 1:25 AM, praveenesh kumar wrote:

> I recently added a new DN/TT to my cluster. Could it be the reason for such
> behaviour ?
>
> Thanks,
> Praveenesh
>
> On Mon, Feb 20, 2012 at 11:51 AM, Harsh J  wrote:
>
> > Hi,
> >
> > The tool "hadoop fsck" will tell you which files are under replicated
> > with a count of what was expected instead, just run it over /.
> >
> > While it isn't a 'normal' thing to see it come up suddenly it is still
> > in the safe zone, and is most likely an indicator that either one of
> > your DN or one of its disks has gone bad, or you have a bad
> > mapred.submit.replication value for your cluster size (default is 10
> > replicas for all MR job submit data), or bit rot of existing blocks on
> > HDDs around the cluster, etc. -- You can mostly spot the pattern of
> > files causing it by running the fsck and obtaining the listing.
> >
> > On Mon, Feb 20, 2012 at 11:43 AM, praveenesh kumar  >
> > wrote:
> > > Hi,
> > >
> > > I am suddenly seeing some under-replicated blocks on my cluster.
> Although
> > > its not causing any problems, but It seems like few blocks are not
> > > replicated properly.
> > >
> > > Number of Under-Replicated Blocks : 147
> > >
> > > Is it okay behavior on hadoop. If no, How can I know what are the files
> > > with under-replicated blocks and how can I configure it properly to
> > reduce
> > > the number of under-replicated blocks.
> > >
> > > Thanks,
> > > Praveenesh
> >
> >
> >
> > --
> > Harsh J
> > Customer Ops. Engineer
> > Cloudera | http://tiny.cloudera.com/about
> >
>


Re: access hbase table from hadoop mapreduce

2012-02-21 Thread Clint Heath
It sounds to me like you just need to include your HBase jars into your
compiler's classpath like so:

javac -classpath $HADOOP_HOME Example.java

where $HADOOP_HOME includes all your base hadoop jars as well as your hbase
jars.

then you would want to put the resulting Example.class file into its own
jar with something like this:

jar cvf Example.jar Example.class

then you can execute the program with this:

hadoop jar Example.jar Example

  The manual for running the hadoop CLI is here:
http://hadoop.apache.org/common/docs/current/commands_manual.html
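
Since you also asked for example code: below is a rough, untested sketch of a
map-only job that reads an HBase table through TableMapReduceUtil/TableMapper.
The table name "mytable", the column family/qualifier "cf"/"col" and the
output path are placeholders to replace with your own:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Example {

  // Emits "rowkey <tab> cell value" for one column of every row it scans.
  static class RowMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] cell = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")); // placeholder family/qualifier
      if (cell != null) {
        context.write(new Text(Bytes.toString(row.get())), new Text(Bytes.toString(cell)));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml from the classpath
    Job job = new Job(conf, "hbase-read-example");
    job.setJarByClass(Example.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips per mapper
    scan.setCacheBlocks(false);  // don't churn the region server block cache with a full scan

    TableMapReduceUtil.initTableMapperJob("mytable", scan, RowMapper.class,
        Text.class, Text.class, job);               // "mytable" is a placeholder table name
    job.setNumReduceTasks(0);                       // map-only
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hbase-example-out")); // placeholder
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

As above, compile it against both the hadoop and hbase jars, and make the
hbase (and zookeeper) jars available to the tasks too, for example by bundling
them in a lib/ directory inside the job jar or passing them with -libjars.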

Hope that helps,

Clint

On Tue, Feb 21, 2012 at 1:26 AM, amsal  wrote:

> hi..
> i want to access hbase table from hadoop mapreducei m using windowsXP
> and cygwin
> i m using hadoop-0.20.2 and hbase-0.92.0
> hadoop cluster is working finei am able to run mapreduce wordcount
> successfully on 3 pc's
> hbase is also working .i can cerate table from shell
>
> i have tried many examples but they are not workingwhen i try to
> compile
> it using
> javac Example.java
>
> it gives error.
> org.apache.hadoop.hbase.client does not exist
> org.apache.hadoop.hbase does not exist
> org.apache.hadoop.hbase.io does not exist
>
> please can anyone help me in this..
> -plz give me some example code to access hbase from hadoop map reduce
> -also guide me how should i compile and execute it
>
> thanx in advance
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/access-hbase-table-from-hadoop-mapreduce-tp3762847p3762847.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>


HDFS problem in hadoop 0.20.203

2012-02-21 Thread Shi Yu
Hi Hadoopers,

We are experiencing a strange problem on Hadoop 0.20.203 

Our cluster has 58 nodes; everything is started from a fresh 
HDFS (we deleted all local folders on the datanodes and 
reformatted the namenode).  After running some small jobs, the 
HDFS starts behaving abnormally and the jobs become very 
slow.  The namenode log is flooded with gigabytes of messages 
like this:

2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_4524177823306792294 is added 
to invalidSet of 10.105.19.31:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_4524177823306792294 is added 
to invalidSet of 10.105.19.18:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_4524177823306792294 is added 
to invalidSet of 10.105.19.32:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_2884522252507300332 is added 
to invalidSet of 10.105.19.35:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_2884522252507300332 is added 
to invalidSet of 10.105.19.27:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_2884522252507300332 is added 
to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.21:50010 is added to blk_-
6843171124277753504_2279882 size 124490
2012-02-21 00:00:38,632 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000313_0/result_stem-m-00313. blk_-
6379064588594672168_2279890
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.26:50010 is added to blk_5338983375361999760_2279887 
size 1476
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.29:50010 is added to blk_-977828927900581074_2279887 
size 13818
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000364_0/result_stem-m-00364 is closed by 
DFSClient_attempt_201202202043_0013_m_000364_0
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.23:50010 is added to blk_5338983375361999760_2279887 
size 1476
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.20:50010 is added to blk_5338983375361999760_2279887 
size 1476
2012-02-21 00:00:38,633 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000364_0/result_suffix-m-00364. 
blk_1921685366929756336_2279890
2012-02-21 00:00:38,634 INFO 
org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000279_0/result_suffix-m-00279 is closed by 
DFSClient_attempt_201202202043_0013_m_000279_0
2012-02-21 00:00:38,635 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_495061820035691700 is added 
to invalidSet of 10.105.19.20:50010
2012-02-21 00:00:38,635 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_495061820035691700 is added 
to invalidSet of 10.105.19.25:50010
2012-02-21 00:00:38,635 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_495061820035691700 is added 
to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,635 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000284_0/result_stem-m-00284. 
blk_8796188324642771330_2279891
2012-02-21 00:00:38,638 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.34:50010 is added to blk_-977828927900581074_2279887 
size 13818
2012-02-21 00:00:38,638 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/syu/output/naive/iter5_partout1/_temporary/_attempt_201202202
043_0013_m_000296_0/result_stem-m-00296. blk_-
6800409224007034579_2279891
2012-02-21 00:00:38,638 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.29:50010 is added to blk_1921685366929756336_2279890 
size 1511
2012-02-21 00:00:38,638 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 
10.105.19.25:50010 is added to blk_-
2982099629304436976_2279752 size 569

In Map/Reduce 

Re: Problem in installation

2012-02-21 Thread Harsh J
Dheeraj,

In most homogeneous cluster environments, people do keep the configs
synced. However, that isn't necessary.

It is alright to have different *-site.xml contents on each slave,
tailored to the resources it provides. For instance, if you have 3 slaves
with 3 disks and 1 slave with 2, you can have a different
"dfs.data.dir" configuration on the 2-disk one.

Managing configurations this way can get a bit painful, though,
unless you use a configuration manager that eases maintaining the config
entries for you.
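
As an illustration only (the mount points below are made-up examples), the
2-disk slave's hdfs-site.xml could simply list fewer directories than the
others, while the rest of the file stays identical across nodes:

<property>
  <name>dfs.data.dir</name>
  <!-- comma-separated list of local dirs; this slave has only two data disks -->
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>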

On Tue, Feb 21, 2012 at 4:06 PM, Dheeraj Kv  wrote:
> Hi
>
>
>           I am installing hadoop cluster of 5 nodes.
> I decided to make 1 node as master (namenode and jobtracker) and rest of the 
> 4 nodes as slaves( datanode and task tracker).
> I m skeptical about the configuration file location. Does the same site.xml 
> files reside in all the cluster nodes?
> If so I will have different hdfs mount points on different nodes, and when 
> the same site.xml files are available on all nodes
> will it cause any problem if it don't find the mount point on one node (which 
> is available on other node) .
>
>
> Regards
>
> Dheeraj KV
>
>



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Problem in installation

2012-02-21 Thread Dheeraj Kv
Hi


   I am installing a hadoop cluster of 5 nodes.
I decided to make 1 node the master (namenode and jobtracker) and the remaining 
4 nodes slaves (datanode and tasktracker).
I am unsure about the configuration file location. Do the same site.xml 
files reside on all the cluster nodes? 
If so, I will have different hdfs mount points on different nodes; when the 
same site.xml files are present on all nodes, 
will it cause any problem if a node doesn't find a mount point that is 
available on another node?


Regards

Dheeraj KV




Application Submission using ClientRMProtocol in Hadoop 0.23

2012-02-21 Thread abhishek1015
Hi,

I followed the steps given in the link below to submit an application on hadoop 0.23:
 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

It didn't work for me. It may be because
  1) ClientRMProtocol is not a VersionedProtocol
  2) GetNewApplicationRequest does not implement the Writable interface

Have the steps given in the above link worked for anyone?

Regards,
Abhishek

--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Application-Submission-using-ClientRMProtocol-in-Hadoop-0-23-tp3763037p3763037.html
Sent from the Users mailing list archive at Nabble.com.


access hbase table from hadoop mapreduce

2012-02-21 Thread amsal
hi..
I want to access an hbase table from hadoop mapreduce. I am using windowsXP
and cygwin.
I am using hadoop-0.20.2 and hbase-0.92.0.
The hadoop cluster is working fine; I am able to run the mapreduce wordcount
successfully on 3 pc's.
hbase is also working; I can create a table from the shell.

I have tried many examples but they are not working. When I try to compile
one using
javac Example.java

it gives errors:
org.apache.hadoop.hbase.client does not exist
org.apache.hadoop.hbase does not exist
org.apache.hadoop.hbase.io does not exist

Please, can anyone help me with this?
- Please give me some example code to access hbase from hadoop map reduce.
- Also guide me on how I should compile and execute it.

Thanks in advance


--
View this message in context: 
http://lucene.472066.n3.nabble.com/access-hbase-table-from-hadoop-mapreduce-tp3762847p3762847.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: Pydoop 0.5 released

2012-02-21 Thread Alexander Lorenz
awesome, guys!

-Alex

sent via my mobile device

On Feb 20, 2012, at 11:59 PM, Luca Pireddu  wrote:

> Hello everyone,
> 
> we're happy to announce that we have just released Pydoop 0.5.0
> (http://pydoop.sourceforge.net).
> 
> The main changes with respect to the previous version are:
> * Pydoop now works with Hadoop 1.0.0.
> * Support for multiple Hadoop versions with the same Pydoop installation
> * Easy Pydoop scripting with pydoop_script
> * Python version requirement bumped to 2.7
> * Dropped support for Hadoop 0.21
> 
> 
> Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++
> Pipes and the C libhdfs APIs, that allows to write full-fledged MapReduce 
> applications with HDFS access. Pydoop has been maturing nicely and is 
> currently in production use at CRS4 as we have a few scientific projects that 
> are based on it, including Seal
> (https://sourceforge.net/projects/biodoop-seal/), Biodoop and Biodoop-BLAST 
> (https://sourceforge.net/projects/biodoop/), and a new project for 
> high-throughput genotyping that is about to be released by CRS4.
> 
> 
> Links:
> 
> * download page: http://sourceforge.net/projects/pydoop/files
> * full release notes:
> http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Release_Notes
> 
> 
> Happy pydooping!
> 
> 
> The Pydoop Team


Re: Did DFSClient cache the file data into a temporary local file

2012-02-21 Thread Harsh J
Seven,

Yes, that strategy changed long ago, but the doc on it was
only recently updated: https://issues.apache.org/jira/browse/HDFS-1454
(and some more improvements followed later, IIRC).

2012/2/21 seven garfee :
> hi,all
> As this Page(
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Staging)
>  said,"In fact, initially the HDFS client caches the file data into a
> temporary local file".
> But I read the DFSClient.java in 0.20.2,and found nothing about storing
> data in tmp local file.
> Did I miss something or That strategy has been removed?



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Tasktracker fails

2012-02-21 Thread Adarsh Sharma

Dear all,

Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster.
When I start my cluster, all daemons get started except the tasktracker; 
I don't know why the task tracker fails with the following error logs.

The cluster is in a private network. My /etc/hosts file contains all the IP 
to hostname resolution entries on all nodes.


2012-02-21 17:48:33,056 INFO 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source 
TaskTrackerMetrics registered.
2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can 
not start task tracker because java.net.SocketException: Invalid argument

   at sun.nio.ch.Net.bind(Native Method)
   at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)

   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
   at org.apache.hadoop.ipc.Server.bind(Server.java:225)
   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
   at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
   at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
   at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)

   at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

Any comments on the issue.


Thanks


Consistent "register getProtocolVersion" error due to "Duplicate metricsName:getProtocolVersion" during cluster startup -- then various other errors during job execution

2012-02-21 Thread Ali S Kureishy
Hi,

I've got a pseudo-distributed Hadoop (v0.20.02) setup with 1 machine (with
Ubuntu 10.04 LTS) running all the hadoop processes (NN + SNN + JT + TT +
DN). I've also configured the files under conf/ so that the master is
referred to by its actual machine name (in this case, *bali*), instead of
localhost (however, the issue below is seen regardless). I was able to
successfully format the HDFS (by running hadoop namenode -format). However,
right after I deploy the cluster using bin/start-all.sh, I see the
following error in the NameNode's log file. It is logged at INFO level, but I
believe it is the root cause behind various other errors I am encountering
when executing actual Hadoop jobs. (For instance, at one point I see errors
that the datanode and namenode were communicating using different protocol
versions ... 3 vs 6 etc.). Anyway, here is the initial error:

2012-02-21 09:01:42,015 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
I’ve scoured the web searching for other instances of this error, but none
of the hits were helpful, nor relevant to my setup. My hunch is that this
is preventing the cluster from correctly initializing. I would have
switched to a later version of Hadoop, but the Nutch v1.4 distribution I’m
trying to run on top of Hadoop is, AFAIK, only compatible with Hadoop
v0.20. I have included with this email all my hadoop config files
(config.rar), in case you need to take a quick look. Below is my /etc/hosts
configuration, in case the issue is with that. I believe this is a
hadoop-specific issue, and not related to Nutch, hence am posting to the
hadoop mailing list.

ETC/HOSTS:
127.0.0.1   localhost
#127.0.1.1  bali

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.1.21 bali

FILE-SYSTEM layout:
Here's my filesystem layout. I've got all my hadoop configs pointing to
folders under a root folder called /private/user/hadoop, with the
following permissions.

ls -l /private/user/
total 4
drwxrwxrwx 7 user alt 4096 Feb 21 09:06 hadoop

ls -l /private/user/hadoop/
total 20
drwxr-xr-x 5 user alt 4096 Feb 21 09:01 data
drwxr-xr-x 3 user alt 4096 Feb 21 09:07 mapred
drwxr-xr-x 4 user alt 4096 Feb 21 08:59 name
drwxr-xr-x 2 user alt 4096 Feb 21 08:59 pids
drwxr-xr-x 3 user alt 4096 Feb 21 09:01 tmp
Shortly after the getProtocolVersion error above, I start seeing these
errors in the namenode log:
2012-02-21 09:06:47,895 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.io.IOException: Server returned
HTTP response code: 503 for URL:
http://192.168.1.21:50090/getimage?getimage=1
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)

at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseN

Pydoop 0.5 released

2012-02-21 Thread Luca Pireddu

Hello everyone,

we're happy to announce that we have just released Pydoop 0.5.0
(http://pydoop.sourceforge.net).

The main changes with respect to the previous version are:
* Pydoop now works with Hadoop 1.0.0.
* Support for multiple Hadoop versions with the same Pydoop installation
* Easy Pydoop scripting with pydoop_script
* Python version requirement bumped to 2.7
* Dropped support for Hadoop 0.21


Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++
Pipes and the C libhdfs APIs, that allows you to write full-fledged 
MapReduce applications with HDFS access. Pydoop has been maturing nicely 
and is currently in production use at CRS4 as we have a few scientific 
projects that are based on it, including Seal
(https://sourceforge.net/projects/biodoop-seal/), Biodoop and 
Biodoop-BLAST (https://sourceforge.net/projects/biodoop/), and a new 
project for high-throughput genotyping that is about to be released by CRS4.



Links:

* download page: http://sourceforge.net/projects/pydoop/files
* full release notes:
http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Release_Notes


Happy pydooping!


The Pydoop Team