Enis - thanks for the pointer.

Enis Soztutar wrote:
> You can write index plugins. Please first read the (slightly outdated)
> tutorial and then check http://wiki.apache.org/nutch/PluginCentral.
> Optionally you may want to write html parse plugins depending on the 
> source of the data.
> 
> Chris Hane wrote:
>> I am looking to use Nutch to crawl/index a website.  A lot of the
>> pages have videos on them.  We have transcripts for the videos that we
>> would like to be included for indexing, but we do not want to put the
>> transcripts on the web pages.
>>
>> Is there a way to "add" this information to a given web page for 
>> purposes of indexing as part of the crawl process?  Maybe another 
>> point in the process before the index is generated?  I am hoping there 
>> is a point in the crawl process where I can add augmented content to a
>> page in the Nutch segment (rough thought based on very limited time
>> spent looking at Nutch).
>>
>> We are comfortable using Java and can write custom code as needed.  I
>> would appreciate any pointers on where to look in the Nutch code.
>>
>> Thanks in advance,
>> Chris.....
>>
> 

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
