Re: help with readseg

Amna Waqar Wed, 02 Feb 2011 04:27:44 -0800

I have found that directory. It is in HDFS, under the path
/user/root/crawl/
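
A minimal sketch of how the dump could be inspected once the output directory is known to live on HDFS rather than on the local disk. It assumes the Hadoop `hadoop fs` shell is on the PATH and that the output landed at /user/root/crawl/amna_out; that exact location is an assumption based on the path mentioned above, so adjust OUT_DIR to wherever the directory actually is.

```shell
#!/bin/sh
# Assumed HDFS location of the readseg output; override OUT_DIR if it differs.
OUT_DIR="${OUT_DIR:-/user/root/crawl/amna_out}"
# readseg -dump writes a single text file named 'dump' inside the output directory.
DUMP_FILE="$OUT_DIR/dump"

if command -v hadoop >/dev/null 2>&1; then
    # List the output directory on HDFS to confirm the dump file exists.
    hadoop fs -ls "$OUT_DIR"
    # Print the first lines of the dump without copying it locally.
    hadoop fs -cat "$DUMP_FILE" | head -n 50
    # Copy the dump to the local working directory for easier browsing.
    hadoop fs -get "$DUMP_FILE" ./dump.txt
else
    echo "hadoop not found on PATH; cannot inspect $DUMP_FILE" >&2
fi
```

When Nutch runs on top of Hadoop, a relative output path such as amna_out is resolved against HDFS, not against the local working directory, which would explain why the directory does not show up next to the command.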

On Wed, Feb 2, 2011 at 7:19 AM, Amna Waqar <[email protected]> wrote:


> I do not see any directory named amna_out in the current working directory
> from where the command is launched. Why is this so? I am using Nutch 1.2.
>
>
>
> On Wed, Feb 2, 2011 at 7:01 AM, Arjun Kumar Reddy <
> [email protected]> wrote:
>
>> Hi Amna,
>>
>> The output folder 'amna_out' will be created in the directory from where you
>> are running the readseg command. In that directory, you'll find a file named
>> 'dump'. You can get the contents of crawled pages from it.
>>
>> Thanks and regards,
>> Ch. Arjun Kumar Reddy
>>
>>
>> On Wed, Feb 2, 2011 at 5:16 PM, Amna Waqar <[email protected]>
>> wrote:
>>
>> > I am using the following command:
>> > [root@Amna search]# bin/nutch readseg -dump /user/root/crawl/segments/20110124205537/ amna_out
>> >
>> > but the output is:
>> > SegmentReader: dump segment: /user/root/crawl/segments/20110124205537
>> > SegmentReader: done
>> >
>> > How can I view the readseg output from the amna_out temporary directory?
>> > I did not see this directory anywhere.
>> > After reading the code of SegmentReader, I found that the dump method
>> > does not call println or flush, so how can we see its output?
>> >
>> > I can see the output of the readseg -get command, which is for one URL.
>> >
>>
>
>
