Hi,
Doug Cutting wrote:
Doğacan Güney wrote:
I think it would make much more sense to change parse plugins to take
content and return Parse[] instead of Parse.
You're right. That does make more sense.
OK, then should I go forward with this and implement something? This
should be pretty
Hi,
IMO it should stay the same.
URL as the key and in the filter each item link element becomes the key.
I will be happy to convert the current parse-rss filter to the suggested
implementation.
Gal.
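To make the keying idea concrete, here is a minimal sketch, not the parse-rss code. It assumes a feed has already been split into items; FeedParseSketch, Item and parseByItemLink are hypothetical names, and plain strings stand in for Nutch's Parse objects. Each item's link becomes the key of its own entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FeedParseSketch {
    /** Hypothetical feed item: the item's link plus the item's text. */
    static final class Item {
        final String link;
        final String text;

        Item(String link, String text) {
            this.link = link;
            this.text = text;
        }
    }

    /**
     * Returns one entry per feed item, keyed by the item's link.
     * The String value is a stand-in for a real Parse object.
     */
    static Map<String, String> parseByItemLink(List<Item> items) {
        Map<String, String> parses = new LinkedHashMap<>();
        for (Item item : items) {
            parses.put(item.link, item.text);
        }
        return parses;
    }
}
```

If two items share a link, the later one overwrites the earlier one in this sketch; a real implementation would have to decide how to handle that case.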
-- Original Message --
Received: Tue, 06 Feb 2007 10:36:03 AM IST
From: Doğacan Güney
Hi guys,
I wrote a parser for parsing proprietary file formats. The plugin used to work
until recently. Now when I try to parse simple CAD files I get the following
error messages:
INFO fetcher.Fetcher - fetching
Top Level Domains Indexing / Scoring
Key: NUTCH-439
URL: https://issues.apache.org/jira/browse/NUTCH-439
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 0.9.0
I am very new to the Nutch source code, and have been reading over the
Injector class code. From what I understood of the MapReduce system there
had to be both a map and reduce step in order for the algorithm to function
properly. However, in CrawlDb.createJob( Configuration, Path ) a new job is
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.0.patch
This is a plugin implementation for indexing and scoring top
If no mapper or reducer class is set in the jobConf, the code
defaults to IdentityMapper and IdentityReducer respectively, which
are essentially pass-throughs of key/value pairs.
Dennis Kubes
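As a rough illustration of what "pass-through" means here, the sketch below uses plain Java only. It does not use Hadoop's classes; IdentityJobSketch, Step and run are hypothetical names invented for this example. A step that is left unset (null) falls back to an identity step that emits each key/value pair unchanged.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class IdentityJobSketch {
    /** Hypothetical stand-in for a map or reduce step: may emit pairs for each input pair. */
    interface Step<K, V> {
        void apply(K key, V value, BiConsumer<K, V> output);
    }

    /** Pass-through step: emits exactly the pair it was given. */
    static <K, V> Step<K, V> identity() {
        return (key, value, output) -> output.accept(key, value);
    }

    /** Runs one step over the input; a null step falls back to the identity step. */
    static <K, V> List<Map.Entry<K, V>> run(List<Map.Entry<K, V>> input, Step<K, V> step) {
        Step<K, V> effective = (step != null) ? step : IdentityJobSketch.<K, V>identity();
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : input) {
            effective.apply(e.getKey(), e.getValue(), (k, v) -> out.add(new SimpleEntry<>(k, v)));
        }
        return out;
    }
}
```

Calling run(input, null) returns the same pairs in the same order, which is the behaviour an unset mapper or reducer is described as having above.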
Charlie Williams wrote:
I am very new to the Nutch source code, and have been reading over the
Hi
Is there any standardized way for Nutch to get a semantic version
of a web page? E.g., the HTML page is as follows:
<html>
<head>
<link rel="semantic-content" href="index-semantic.xml"/>
</head>
<body>
blablabal ..
</body>
</html>
and the semantic XML (index-semantic.xml) would be something more useful
thanks for the clarification!
-Charlie Williams
On 2/6/07, Dennis Kubes [EMAIL PROTECTED] wrote:
If no mapper or reducer class is set in the jobConf, the code
defaults to IdentityMapper and IdentityReducer respectively, which
are essentially pass-throughs of key/value pairs.
Dennis Kubes
Doğacan Güney wrote:
OK, then should I go forward with this and implement something? This
should be pretty easy,
though I am not sure what to give as keys to a Parse[].
I mean, when getParse returned a single Parse, ParseSegment output them
as url, Parse. But, if getParse
returns an array,
Hi Doug,
Since the target of the link must still be indexed separately from the
item itself, how much use is all this? If the RSS document is
considered a single page that changes frequently, and items' links are
considered ordinary outlinks, isn't much the same effect achieved?
IMHO, yes.
Hello!
I have written a new plugin extending the IndexingFilter and using the
RegexURLFilterBase class.
In the log there is this message:
FATAL api.RegexURLFilterBase - Can't find resource: null
I don't know how to handle those Configuration objects (setConf() etc.).
What should I do to avoid that
Command line utilities should exit with an error message when given wrong
arguments
---
Key: NUTCH-440
URL: https://issues.apache.org/jira/browse/NUTCH-440
Project:
Hi Chris, Doug,
Chris Mattmann wrote:
Hi Doug,
Since the target of the link must still be indexed separately from the
item itself, how much use is all this? If the RSS document is
considered a single page that changes frequently, and items' links are
considered ordinary outlinks, isn't
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search each
feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a superset of the item. So should there be two
Tobias Zahn wrote:
Hello!
I have written a new plugin extending the IndexingFilter and using the
RegexURLFilterBase class.
In the log there is this message:
FATAL api.RegexURLFilterBase - Can't find resource: null
in your new class CustomIndexingFilter, create a field Configuration
conf,
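The reply is cut off in this listing, so the following is only a guess at the usual pattern, not the poster's actual advice: keep the Configuration handed to setConf() in a field and read settings from it there. To stay self-contained, the sketch uses a hypothetical SketchConfiguration class in place of Hadoop's Configuration, and the property name custom.filter.rules.file is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Configuration, so the sketch compiles on its own. */
final class SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }
}

/** Hypothetical filter showing the setConf()/getConf() pattern. */
final class CustomIndexingFilter {
    private SketchConfiguration conf;
    private String rulesFile;

    /** Stores the configuration and reads the settings this filter needs. */
    void setConf(SketchConfiguration conf) {
        this.conf = conf;
        // Invented property name; a real plugin would use its own key.
        this.rulesFile = conf.get("custom.filter.rules.file");
    }

    SketchConfiguration getConf() {
        return conf;
    }

    String getRulesFile() {
        return rulesFile;
    }
}
```

With this shape, anything the filter needs from the configuration is looked up once in setConf(), and a missing property shows up as a null value in one known place.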
Doug Cutting wrote:
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search
each feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a superset of the item.
Renaud Richardet wrote:
Doug Cutting wrote:
Renaud Richardet wrote:
The use case is that you index RSS feeds, but your users can search
each feed entry as a single document. Does that make sense?
But each feed item also contains a link whose content will be indexed
and that's generally a