crawl db distribution on different data nodes
What will be the maximum size of crawlDb on a single node?
--
View this message in context: http://www.nabble.com/crawl-db-disrtibution-on-different-data-nodes-tf2410095.html#a6717799
Sent from the Nutch - User mailing list archive at Nabble.com.
can nutch 0.7.2 set the max pages when doing a crawl job?
Hi, can Nutch 0.7.2 set the maximum number of pages to be crawled when doing a crawl job? If so, how?
Regards!
--
kevin
Re: Lucene query support in Nutch
2006/10/8, Stefan Neufeind [EMAIL PROTECTED]:
> if it's not the full feature-set, maybe most people could live with it. But basic boolean queries I think were the root for this topic. Is there an easier way to allow this in Nutch as well instead of throwing quite a bit away and using the Lucene-syntax? As has just been pointed out: It

This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises.

t.n.a.
Re: Lucene query support in Nutch
On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote:
> This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises.
>
> t.n.a.

Tomi, why would you extend Nutch's analyzer when Lucene's analyzer, which does exactly what you want, is already there?

Regards,
Cristina
Problem with readseg
Hi,
I am running 'readseg' to get data for a particular URL and getting the following exception. (see attached trace)
Everything else like search etc. is working fine. However, I have been unable to understand this error.
Any help is highly appreciated
thanks
-Sameer
--
This is the trace:
$ bin/nutch readseg -get segments/20061008154327 http://www.bartleby.com/100/
SegmentReader: get 'http://www.bartleby.com/100/'
06/10/09 19:31:53 INFO segment.SegmentReader: SegmentReader: get 'http://www.bartleby.com/100/'
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
    at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$1.run(SegmentReader.java:265)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$1.run(SegmentReader.java:265)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
    at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$2.run(SegmentReader.java:275)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$2.run(SegmentReader.java:275)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
    at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$3.run(SegmentReader.java:285)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$3.run(SegmentReader.java:285)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
    at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$4.run(SegmentReader.java:295)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$4.run(SegmentReader.java:295)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
    at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$5.run(SegmentReader.java:305)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$5.run(SegmentReader.java:305)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
    at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
    at org.apache.nutch.segment.SegmentReader$6.run(SegmentReader.java:315)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.Segmen