Is nutch still under developing?

2008-12-15 Thread yunyi.x
Is Nutch 0.9 the last version? If not, when will the next version be published?

(un)sorted index speed performance

2008-12-15 Thread Marko Bauhardt
Hi, we use an older version of Nutch, version 0.8-dev. I have a question about indexing/searching. As far as I know, the documents inside a Nutch index are not sorted by boost (the popularity of a website, which is computed via OPIC/linkdb). The default value of searcher.max.hits is

File system

2008-12-15 Thread oSilvio
Does anybody know how the file structure works, briefly? It seems that the data is compressed or something; it's not possible to understand what's recorded in the data or index files. Thanks, Silvio

[jira] Created: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2008-12-15 Thread Sean Dean (JIRA)
Upgrade the Carrot2 plug-in to release 3.0 -- Key: NUTCH-673 URL: https://issues.apache.org/jira/browse/NUTCH-673 Project: Nutch Issue Type: Improvement Components: web gui Affects

Re: File system

2008-12-15 Thread Dennis Kubes
The Nutch databases are in either SequenceFile or MapFile format, both of which store key/value pairs. Their keys and values are Writable implementations, which translate an object into its byte equivalent and vice versa. Data and index files are MapFile format. Data is a SequenceFile, index is an
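
To make the Writable idea above concrete, here is a minimal sketch of an object that turns itself into bytes and back. It uses only java.io and does not depend on Hadoop. PageRecord, its fields, and the toBytes/fromBytes helpers are hypothetical names made up for illustration, not Nutch or Hadoop classes. The write/readFields pair mirrors the shape of Hadoop's Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type for illustration only; not a Nutch or Hadoop class.
class PageRecord {
    String url;
    float score;

    PageRecord(String url, float score) {
        this.url = url;
        this.score = score;
    }

    // Same shape as Writable.write(DataOutput): fields are written in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeFloat(score);
    }

    // Same shape as Writable.readFields(DataInput): fields are read back in the same order.
    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        score = in.readFloat();
    }

    // Helper (assumption): serialize a record into a byte array.
    static byte[] toBytes(PageRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            record.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper (assumption): rebuild a record from bytes produced by toBytes.
    static PageRecord fromBytes(byte[] bytes) {
        try {
            PageRecord record = new PageRecord("", 0.0f);
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = PageRecord.toBytes(new PageRecord("http://example.org/", 1.5f));
        PageRecord copy = PageRecord.fromBytes(bytes);
        System.out.println(copy.url + " " + copy.score);
    }
}
```

The raw bytes are not human-readable, which is consistent with the observation in the original question that the data and index files cannot be understood by opening them directly.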

Build failed in Hudson: Nutch-trunk #662

2008-12-15 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/662/changes -- [...truncated 2223 lines...] A src/plugin/protocol-http/src/test/org/apache/nutch A src/plugin/protocol-http/src/test/org/apache/nutch/protocol A

Last-Modified metatag in nutch

2008-12-15 Thread susmita ganguli
Hi, I am trying to read the Last-Modified date of a web page using the following code snippet: String moddate = parse.getData().getContentMeta().get(Metadata.LAST_MODIFIED); in org.apache.nutch.indexer.basic.BasicIndexingFilter.java. But it is returning today's date instead of the last