Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Doğacan Güney
Hi, Doug Cutting wrote: Doğacan Güney wrote: I think it would make much more sense to change parse plugins to take content and return Parse[] instead of Parse. You're right. That does make more sense. OK, then should I go forward with this and implement something? This should be pretty

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Gal Nitzan
Hi, IMO it should stay the same. URL as the key and in the filter each item link element becomes the key. I will be happy to convert the current parse-rss filter to the suggested implementation. Gal. -- Original Message -- Received: Tue, 06 Feb 2007 10:36:03 AM IST From: Doğacan Güney
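
The keying idea suggested here (each item's link element becomes the key of that item's parse) can be sketched in plain Java. This is only an illustration under stated assumptions: `FeedItemKeys`, `ItemParse`, and `parseFeed` are hypothetical names, the JDK's XML parser stands in for the parse-rss plugin, and nothing below is Nutch's actual `Parse` API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedItemKeys {
    // Hypothetical stand-in for a per-item parse result; not Nutch's Parse interface.
    static class ItemParse {
        final String title;
        final String text;

        ItemParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    // Returns one entry per <item>, keyed by the item's <link> URL.
    static Map<String, ItemParse> parseFeed(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            Map<String, ItemParse> byLink = new LinkedHashMap<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String link = childText(item, "link");
                if (link.isEmpty()) {
                    continue; // an item without a link has no usable key
                }
                byLink.put(link, new ItemParse(childText(item, "title"),
                                               childText(item, "description")));
            }
            return byLink;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<rss><channel>"
                + "<item><title>A</title><link>http://example.com/a</link>"
                + "<description>first</description></item>"
                + "</channel></rss>";
        parseFeed(xml).forEach((link, parse) -> System.out.println(link + " -> " + parse.title));
    }
}
```

Keying by link keeps the feed URL itself free for the feed-level record, at the cost of colliding with the separately fetched target page, which is the concern raised later in the thread.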

Nutch error messages

2007-02-06 Thread Armel T. Nene
Hi guys, I wrote a parser for parsing proprietary file formats. The plugin used to work until recently. Now when I try to parse simple CAD files I get the following error messages: INFO fetcher.Fetcher - fetching

[jira] Created: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-02-06 Thread Enis Soztutar (JIRA)
Top Level Domains Indexing / Scoring Key: NUTCH-439 URL: https://issues.apache.org/jira/browse/NUTCH-439 Project: Nutch Issue Type: New Feature Components: indexer Affects Versions: 0.9.0

JobConf Questions

2007-02-06 Thread Charlie Williams
I am very new to the Nutch source code, and have been reading over the Injector class code. From what I understood of the MapReduce system there had to be both a map and reduce step in order for the algorithm to function properly. However, in CrawlDb.createJob( Configuration, Path ) a new job is

[jira] Updated: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-02-06 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated NUTCH-439: Attachment: tld_plugin_v1.0.patch This is a plugin implementation for indexing and scoring top

Re: JobConf Questions

2007-02-06 Thread Dennis Kubes
If no mapper or reducer class is set in the jobConf then the code defaults to IdentityMapper and IdentityReducer respectively which essentially are pass throughs of key/value pairs. Dennis Kubes Charlie Williams wrote: I am very new to the Nutch source code, and have been reading over the
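
The defaulting behaviour described above can be illustrated without Hadoop on the classpath. The sketch below is a plain-Java simulation, not Hadoop code: `MiniJob` and its fields are hypothetical stand-ins for `JobConf`, and the lambda and `identityReduce` play the roles of `IdentityMapper` and `IdentityReducer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

public class IdentityDefaultsSketch {
    // Hypothetical stand-in for a job configuration; not Hadoop's JobConf.
    static class MiniJob {
        Function<Map.Entry<String, String>, Map.Entry<String, String>> mapper;      // null = not set
        BiFunction<String, List<String>, List<Map.Entry<String, String>>> reducer;  // null = not set

        List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> input) {
            // Fall back to pass-through behaviour when nothing was configured.
            Function<Map.Entry<String, String>, Map.Entry<String, String>> m =
                    mapper != null ? mapper : pair -> pair;
            BiFunction<String, List<String>, List<Map.Entry<String, String>>> r =
                    reducer != null ? reducer : MiniJob::identityReduce;

            // Map phase, then group values by key in sorted key order.
            Map<String, List<String>> grouped = new TreeMap<>();
            for (Map.Entry<String, String> pair : input) {
                Map.Entry<String, String> out = m.apply(pair);
                grouped.computeIfAbsent(out.getKey(), k -> new ArrayList<>()).add(out.getValue());
            }

            // Reduce phase: one call per key.
            List<Map.Entry<String, String>> result = new ArrayList<>();
            for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
                result.addAll(r.apply(group.getKey(), group.getValue()));
            }
            return result;
        }

        // Emits every value unchanged under its key.
        static List<Map.Entry<String, String>> identityReduce(String key, List<String> values) {
            List<Map.Entry<String, String>> out = new ArrayList<>();
            for (String value : values) {
                out.add(Map.entry(key, value));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob(); // neither mapper nor reducer is set
        List<Map.Entry<String, String>> input =
                List.of(Map.entry("b", "2"), Map.entry("a", "1"));
        System.out.println(job.run(input));
    }
}
```

With both stages left unset, the job still sorts and groups by key, so a job that configures only one stage remains a complete map/reduce job.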

Getting a semantic version of an HTML page

2007-02-06 Thread Michael Wechner
Hi, is there any standardized way for Nutch to get a semantic version of a web page? E.g. the HTML page is as follows: <html> <head> <link rel="semantic-content" href="index-semantic.xml"/> </head> <body> blablabal .. </body> </html> and the semantic XML (index-semantic.xml) would be something more useful

Re: JobConf Questions

2007-02-06 Thread Charlie Williams
thanks for the clarification! -Charlie Williams On 2/6/07, Dennis Kubes [EMAIL PROTECTED] wrote: If no mapper or reducer class is set in the jobConf then the code defaults to IdentityMapper and IdentityReducer respectively which essentially are pass throughs of key/value pairs. Dennis Kubes

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Doug Cutting
Doğacan Güney wrote: OK, then should I go forward with this and implement something? This should be pretty easy, though I am not sure what to give as keys to a Parse[]. I mean, when getParse returned a single Parse, ParseSegment output them as url, Parse. But, if getParse returns an array,

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Chris Mattmann
Hi Doug, Since the target of the link must still be indexed separately from the item itself, how much use is all this? If the RSS document is considered a single page that changes frequently, and item's links are considered ordinary outlinks, isn't much the same effect achieved? IMHO, yes.

api.RegexURLFilterBase - Configuration Resources

2007-02-06 Thread Tobias Zahn
Hello! I have written a new plugin extending the IndexingFilter and using the RegexURLFilterBase class. In the log there is this message: FATAL api.RegexURLFilterBase - Can't find resource: null I don't know how to handle those Configuration objects (setConf() etc.). What should I do to avoid that

[jira] Created: (NUTCH-440) Command line utilities should exit with an error message when given wrong arguments

2007-02-06 Thread JIRA
Command line utilities should exit with an error message when given wrong arguments Key: NUTCH-440 URL: https://issues.apache.org/jira/browse/NUTCH-440 Project:

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Renaud Richardet
Hi Chris, Doug, Chris Mattmann wrote: Hi Doug, Since the target of the link must still be indexed separately from the item itself, how much use is all this? If the RSS document is considered a single page that changes frequently, and item's links are considered ordinary outlinks, isn't

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Doug Cutting
Renaud Richardet wrote: The use case is that you index RSS feeds, but your users can search each feed entry as a single document. Does it make sense? But each feed item also contains a link whose content will be indexed and that's generally a superset of the item. So should there be two

Re: api.RegexURLFilterBase - Configuration Resources

2007-02-06 Thread Renaud Richardet
Tobias Zahn wrote: Hello! I have written a new plugin extending the IndexingFilter and using the RegexURLFilterBase class. In the log there is this message: FATAL api.RegexURLFilterBase - Can't find resource: null in your new class CustomIndexingFilter, create a field Configuration conf,
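
The general setConf()/getConf() pattern the reply points toward can be sketched as below. This is an assumption-laden illustration, not the remainder of the truncated reply: `java.util.Properties` stands in for Hadoop's `Configuration`, the `Configurable` interface here is a local stand-in, and the property name `custom.filter.rules.file` is hypothetical.

```java
import java.util.Properties;

public class ConfigurableSketch {
    // Local stand-in for a Configurable-style interface (Properties replaces Configuration).
    interface Configurable {
        void setConf(Properties conf);
        Properties getConf();
    }

    static class CustomIndexingFilter implements Configurable {
        private Properties conf; // stays null until the framework calls setConf()

        @Override
        public void setConf(Properties conf) {
            this.conf = conf;
        }

        @Override
        public Properties getConf() {
            return conf;
        }

        // Looks up the rules resource name; the property name is hypothetical.
        String rulesResource() {
            if (conf == null) {
                throw new IllegalStateException("setConf() was never called");
            }
            return conf.getProperty("custom.filter.rules.file");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("custom.filter.rules.file", "custom-rules.txt");

        CustomIndexingFilter filter = new CustomIndexingFilter();
        filter.setConf(conf);
        System.out.println(filter.rulesResource());
    }
}
```

If the property is missing, `rulesResource()` returns null, which is one plausible way a message such as "Can't find resource: null" could arise; the actual cause in this thread is not confirmed by the snippet.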

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Renaud Richardet
Doug Cutting wrote: Renaud Richardet wrote: The use case is that you index RSS feeds, but your users can search each feed entry as a single document. Does it make sense? But each feed item also contains a link whose content will be indexed and that's generally a superset of the item.

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Doğacan Güney
Renaud Richardet wrote: Doug Cutting wrote: Renaud Richardet wrote: The use case is that you index RSS feeds, but your users can search each feed entry as a single document. Does it make sense? But each feed item also contains a link whose content will be indexed and that's generally a