crawl db distribution on different data nodes

2006-10-09 Thread jaison Qburst


What will be the maximum size of crawlDb on a single node?



can nutch 0.7.2 set the max pages when doing a crawl job?

2006-10-09 Thread kevin

Hi,


Can Nutch 0.7.2 set the maximum number of pages to be crawled when doing a
crawl job? If so, how?


Regards!


--
kevin


Re: Lucene query support in Nutch

2006-10-09 Thread Tomi NA

2006/10/8, Stefan Neufeind [EMAIL PROTECTED]:


if it's not the full feature-set, maybe most people could live with it.
But basic boolean queries I think were the root for this topic. Is there
an easier way to allow this in Nutch as well instead of throwing quite
a bit away and using the Lucene-syntax? As has just been pointed out: It


This is *exactly* what I was thinking. Like Stefan, I believe the
Nutch analyzer is a good foundation and should therefore be extended
to support the OR operator, and possibly additional capabilities
when the need arises.

t.n.a.


Re: Lucene query support in Nutch

2006-10-09 Thread Cristina Belderrain

On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote:


This is *exactly* what I was thinking. Like Stefan, I believe the
Nutch analyzer is a good foundation and should therefore be extended
to support the OR operator, and possibly additional capabilities
when the need arises.

t.n.a.


Tomi, why would you extend Nutch's analyzer when Lucene's analyzer,
which does exactly what you want, is already there?

Regards,

Cristina


Problem with readseg

2006-10-09 Thread Pankaj Mathur

Hi,

I am running 'readseg' to get the data for a particular URL and am getting
the following exception (see the trace below).
Everything else, such as search, is working fine. However, I have been
unable to understand this error.


Any help is highly appreciated

thanks
-Sameer
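
A note on the message in the trace: in Java, "ArrayIndexOutOfBoundsException: 0" 
means that element 0 of a zero-length array was read. The sketch below is only 
an illustration of that mechanism and is not taken from the Nutch source; the 
class and method names are hypothetical, and whether SegmentReader hits this 
because a list it expects to be non-empty (for example, the parts of the 
segment) is empty is an assumption, not something the trace confirms.

```java
// Hypothetical illustration; not Nutch code.
class EmptyArrayDemo {

    // Unsafe: assumes the array has at least one element, much like code
    // that expects a directory to contain at least one data file.
    static String firstUnchecked(String[] parts) {
        return parts[0]; // throws ArrayIndexOutOfBoundsException if parts is empty
    }

    // Safer: handle the empty (or missing) case explicitly.
    static String firstOrNull(String[] parts) {
        return (parts == null || parts.length == 0) ? null : parts[0];
    }

    public static void main(String[] args) {
        String[] noParts = new String[0];
        try {
            firstUnchecked(noParts);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the JVM version.
            System.out.println("caught: " + e);
        }
        System.out.println("guarded: " + firstOrNull(noParts));
    }
}
```

If something similar is happening here, checking that the segment directory 
named on the command line actually contains the expected subdirectories would 
be a reasonable first step.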
--
This is the trace:

$ bin/nutch readseg -get segments/20061008154327 http://www.bartleby.com/100/

SegmentReader: get 'http://www.bartleby.com/100/'
06/10/09 19:31:53 INFO segment.SegmentReader: SegmentReader: get 'http://www.bartleby.com/100/'
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$1.run(SegmentReader.java:265)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$1.run(SegmentReader.java:265)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$2.run(SegmentReader.java:275)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$2.run(SegmentReader.java:275)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$3.run(SegmentReader.java:285)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$3.run(SegmentReader.java:285)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getSeqRecords(SegmentReader.java:369)
at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$100(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$4.run(SegmentReader.java:295)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$4.run(SegmentReader.java:295)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$5.run(SegmentReader.java:305)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader$5.run(SegmentReader.java:305)
java.lang.ArrayIndexOutOfBoundsException: 0
06/10/09 19:31:53 WARN segment.SegmentReader: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.getMapRecords(SegmentReader.java:352)
at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
06/10/09 19:31:53 WARN segment.SegmentReader: at org.apache.nutch.segment.SegmentReader.access$000(SegmentReader.java:40)
at org.apache.nutch.segment.SegmentReader$6.run(SegmentReader.java:315)
06/10/09 19:31:53 WARN segment.SegmentReader: at 
org.apache.nutch.segment.Segmen