Re: hi all:

2006-12-09 Thread Sami Siren
吴志敏 wrote:
  I want to dump the stored segments to an XML file, but after reading
 SegmentReader.java I find that it's not a simple thing.
 
 It runs a Hadoop job to dump a text file. I just want to dump the
 parts of the segments' content that I'm interested in to XML.
 
 Can someone tell me how to do this? Any reply will be appreciated!

Segment data is basically just a bunch of files containing
key-value pairs, so there's always the possibility of reading the data
directly with the help of:

http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Reader.html

To see what kind of objects to expect, you can examine the beginning
of the file, where some metadata is stored, such as the class used for
the key and the class used for the value (that metadata is also
available from methods of the SequenceFile.Reader class).
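As a rough illustration of examining that metadata by hand, the sketch below parses the first bytes of a SequenceFile header using only the Java standard library. It assumes the layout described in the Hadoop SequenceFile documentation: the bytes SEQ, one version byte, then the key and value class names as length-prefixed UTF-8 strings. It only handles class names shorter than 128 bytes, and files written by older Hadoop versions may encode the names differently. SeqHeaderPeek and its methods are hypothetical names, not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SeqHeaderPeek {
    // Reads one length-prefixed UTF-8 string.
    // Assumption: the length fits in a single byte (0..127).
    static String readShortString(ByteBuffer buf) {
        int len = buf.get();
        if (len < 0) {
            throw new IllegalArgumentException("multi-byte length not handled in this sketch");
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Returns {keyClassName, valueClassName} from the first bytes of the file.
    static String[] keyAndValueClass(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        if (buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
            throw new IllegalArgumentException("not a SequenceFile: missing SEQ magic");
        }
        buf.get(); // version byte, ignored in this sketch
        String keyClass = readShortString(buf);
        String valueClass = readShortString(buf);
        return new String[] { keyClass, valueClass };
    }

    // Builds a synthetic header for demonstration; a real one would be
    // the first few hundred bytes read from the segment file.
    static byte[] fakeHeader(String keyClass, String valueClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6); // illustrative version number
        for (String name : new String[] { keyClass, valueClass }) {
            byte[] b = name.getBytes(StandardCharsets.UTF_8);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] header = fakeHeader("org.apache.hadoop.io.Text",
                                   "org.apache.nutch.protocol.Content");
        String[] classes = keyAndValueClass(header);
        System.out.println("key class:   " + classes[0]);
        System.out.println("value class: " + classes[1]);
    }
}
```

In practice, calling getKeyClass() and getValueClass() on an open SequenceFile.Reader is simpler and does not depend on the on-disk layout.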

For example, to read the Content data from a segment, one could use
something like:

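import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.protocol.Content;

// fs: FileSystem, path: Path of a SequenceFile inside the segment, conf: Configuration
SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);

Text url = new Text();  // key
Content content = new Content();  // value
while (reader.next(url, content)) {
  // now just use url and content the way you like
}
reader.close();  // release the underlying file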
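Since the original question was about getting the data into XML, here is a minimal sketch of that last step using the JDK's XMLStreamWriter, which takes care of escaping. It does not touch Hadoop or Nutch: a plain Map of URL to text stands in for the pairs read in the loop above, and the class name SegmentXmlDump and the element names segment and record are made up for illustration. With real segment data, the raw bytes of each Content value would first have to be decoded into text.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class SegmentXmlDump {
    // Writes each (url, text) pair as <record url="...">text</record>
    // inside a single <segment> root element.
    static String toXml(Map<String, String> records) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("segment");
            for (Map.Entry<String, String> e : records.entrySet()) {
                xml.writeStartElement("record");
                xml.writeAttribute("url", e.getKey());  // attribute value is escaped
                xml.writeCharacters(e.getValue());      // text content is escaped
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("could not write XML", ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; in the real loop these would come from url and content.
        Map<String, String> records = new LinkedHashMap<>();
        records.put("http://example.com/?a=1&b=2", "Hello <world>");
        System.out.println(toXml(records));
    }
}
```

Writing to a StringWriter keeps the sketch short; for a large segment it would make more sense to hand the factory a file-backed Writer and emit each record as it is read.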

--
 Sami Siren



Re: hi all:

2006-12-09 Thread kauu

Thanks very much, I'll try it.

--
www.babatu.com