Re: [Nutch-dev] [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

Thomas Müller Tue, 02 Jan 2007 22:36:09 -0800

Alan, would it be possible to use this plugin to create columns in the Nutch
database that correspond to those of the YaCy search engine (www.yacy.net, also
written in Java), so that Nutch can run as a hybrid with the YaCy p2p system?
That would mean the database of each Nutch installation can be distributed over
this p2p system to other YaCy AND Nutch nodes.
We would then only need a YaCy plugin as well. Each website would be crawled
twice by each central Nutch search engine, once for Nutch and once for YaCy,
but both would rely on the same database.


Thanks


-------- Original Message --------
Date: Tue, 2 Jan 2007 14:57:27 -0800 (PST)
From: "Alan Tanaman (JIRA)" <[EMAIL PROTECTED]>
To: [email protected]
Subject: [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

> 
>     [ http://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461863 ]
> 
> Alan Tanaman commented on NUTCH-422:
> ------------------------------------
> 
> Many thanks for your feedback.
> 
> Do you have any specifics in mind regarding examples?  I will try and
> include any additional ones that we implement.  I know there are a lot of
> options, but it is a little hard to see what is unclear from my end -- as I am
> so involved in the development, another point-of-view on this is welcome.
> ;)
> 
> Regarding query-extra, we are not currently using the Nutch bean, so the
> need has not arisen for us at this point in time, but I can see how that
> would be useful.  I guess you could adapt one of the existing query-xxxx
> plugins fairly easily by having them read the xml configuration file to see
> what fields are potentially available in the index.
> 
> As for the boost, I included that as it seems like a useful thing to be
> able to control the boost of a single field, although we don't need that at
> this very moment.  The line of code in the
> org.apache.nutch.indexer.Indexer's
> reduce method could be overridden, but I'm not yet sure how that would
> affect the overall scoring (scoring is one of my really weak points).
> Perhaps one of the scoring experts could give some guidance on this?
> 
> > index-extra plugin creates additional fields in the index, based on configurable logic
> > --------------------------------------------------------------------------------------
> >
> >                 Key: NUTCH-422
> >                 URL: http://issues.apache.org/jira/browse/NUTCH-422
> >             Project: Nutch
> >          Issue Type: New Feature
> >          Components: indexer
> >    Affects Versions: 0.8.1
> >         Environment: All environments
> >            Reporter: Alan Tanaman
> >         Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip
> >
> >
> > Extract from the Readme file:
> > A.  Introduction
> >     The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:
> >       - The parsed text
> >       - Meta data fields
> >       - Previously created document-to-be-indexed fields
> >       - Plain constant string
> >       - Java expression combining one or more of the above, and resolving to a string
> >     A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.
> > B.  Installation
> >     1)  Binaries only:  Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
> >                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
> >                         Enable the plugin by updating the nutch-site.xml file
> >     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
> >                         Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
> >                         Update the build.xml in NUTCHDIR/src/plugin to include plugin
> >                         Update the NUTCHDIR/default.properties file to include plugin
> >                         Run ant to build
> >                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
> >                         Enable the plugin by updating the nutch-site.xml file
> > C.  Known Issues
> >     1)  For this plugin to work correctly on any document field, it is necessary to run the other index filters
> >     first, so that all basic document fields are generated first.  To do this, configure the indexingfilter.order
> >     property.  (Please see patch NUTCH-421 to enable indexingfilter.order property. If this patch is not applied,
> >     the plugin will still work, but will not be able to use document fields created by other index filter plugins.)
> >     2)  At this stage, field boost can not be used as Nutch scoring overrides the field boost with its own
> >     document-level boost calculation.  This occurs at the end of org.apache.nutch.indexer.Indexer's reduce method.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
> 
>         
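
For readers of the quoted Readme, here is a rough sketch of two points it makes: a regex can derive a new field from an existing source, and filter order matters (Known Issue 1). This is plain Java and does not use the Nutch or index-extra APIs. Every name in it (ExtraFieldSketch, basicFilter, extraField, index, the "content" and "orderRef" fields) is made up for illustration, and the real plugin is driven by index-extra-conf.xml rather than by code like this.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; none of these names come from Nutch or index-extra. */
public class ExtraFieldSketch {

    /** Stand-in for a basic index filter: copies the parsed text into a "content" field. */
    public static Consumer<Map<String, String>> basicFilter(String parsedText) {
        return doc -> doc.put("content", parsedText);
    }

    /** Stand-in for an extra-field rule: fills target from the regex's first group in source. */
    public static Consumer<Map<String, String>> extraField(String source, String target, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return doc -> {
            String value = doc.get(source);
            if (value == null) {
                return; // the source field has not been created (yet), so nothing to derive
            }
            Matcher m = pattern.matcher(value);
            if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
                doc.put(target, m.group(1));
            }
        };
    }

    /** Applies the filters in the given order and returns the resulting document fields. */
    public static Map<String, String> index(List<Consumer<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        filters.forEach(f -> f.accept(doc));
        return doc;
    }

    public static void main(String[] args) {
        Consumer<Map<String, String>> basic = basicFilter("Order ref: ABC-1234 shipped");
        Consumer<Map<String, String>> extra = extraField("content", "orderRef", "ref: ([A-Z]+-\\d+)");

        // Basic filter first: the derived field can be created.
        System.out.println(index(List.of(basic, extra)).get("orderRef")); // prints ABC-1234
        // Extra-field rule first: "content" does not exist yet, so no field is derived.
        System.out.println(index(List.of(extra, basic)).get("orderRef")); // prints null
    }
}
```

The second call shows the failure mode the Readme describes: when the deriving rule runs before the filter that creates its source field, the rule still runs without error but has nothing to work with.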

