Re: [Nutch-dev] retrieving original html from database

2007-04-28 Thread songjue
That's just what I need! Thanks, Briggs. songjue 2007-04-29 From: Briggs Sent: 2007-04-28 00:12:36 To: [EMAIL PROTECTED] Cc: Subject: Re: retrieving original html from database If you need an api for getting the content, can't you just look into the cachedContent.jsp of the demo search applicatio

Re: [Nutch-dev] retrieving original html from database

2007-04-27 Thread Briggs
If you need an api for getting the content, can't you just look into the cachedContent.jsp of the demo search application? That shows how to retrieve the original text/html that is stored within the segments. Perhaps I am missing something. On 4/27/07, songjue <[EMAIL PROTECTED]> wrote: Yo

Re: [Nutch-dev] retrieving original html from database

2007-04-26 Thread songjue
You can try this command: bin/nutch readseg (-dump ... | -get ...). If you need an API instead of the command line, you may have to hack segment/SegmentReader.java? I'm wondering about this too. BTW, make sure you set the 'http.content.limit' property to -1 to avoid content truncation. son
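For reference, the 'http.content.limit' setting mentioned in this message is a normal Nutch configuration property. A minimal sketch of how it might be set, assuming a stock layout where conf/nutch-site.xml overrides nutch-default.xml (the file location is an assumption, not something stated in the thread):

```xml
<!-- conf/nutch-site.xml (assumed location; overrides nutch-default.xml) -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- A negative value disables truncation of fetched HTTP content. -->
    <value>-1</value>
  </property>
</configuration>
```

The limit is applied at fetch time, so it only affects pages fetched after the change; content already stored in a segment stays truncated.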

Re: [Nutch-dev] retrieving original html from database

2007-04-26 Thread Charlie Williams
Thank you, I will give it a try :) On 4/25/07, Doğacan Güney <[EMAIL PROTECTED]> wrote: On 4/25/07, Charlie Williams <[EMAIL PROTECTED]> wrote: > I have an index of pages from the web, a bit over 1 million. The fetch took several weeks to complete, since it was mainly over a small set of doma

Re: [Nutch-dev] retrieving original html from database

2007-04-25 Thread Doğacan Güney
On 4/25/07, Charlie Williams <[EMAIL PROTECTED]> wrote: > I have an index of pages from the web, a bit over 1 million. The fetch took several weeks to complete, since it was mainly over a small set of domains. Once we had a completed fetch and index, we began trying to work with the retrieved