Christoph,
I'm not sure about configuring a second indexer/crawler in
cocoon.xconf, but I do know how to restrict the path. When you configure
the crawler, add an <exclude> element containing a regular expression
for the paths to exclude. For example, this would exclude all files
within folders named 'search':
<cocoon-crawler>
  <exclude>.*/search/.*</exclude>
</cocoon-crawler>
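If you need to skip more than one path, I believe you can simply list
several <exclude> elements in the same block -- though I have only
confirmed the single-pattern case above, so treat the multi-pattern
form (and the second pattern itself) as an assumption to verify:

<!-- Sketch only: <exclude> with one regex is confirmed; multiple
     <exclude> elements per crawler is an assumption. -->
<cocoon-crawler>
  <!-- skip anything under a 'search' folder -->
  <exclude>.*/search/.*</exclude>
  <!-- hypothetical second pattern: skip 'private' folders too -->
  <exclude>.*/private/.*</exclude>
</cocoon-crawler>

The patterns are ordinary regular expressions matched against the URL,
so `.*/search/.*` matches a 'search' path segment at any depth.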
If you figure out how to configure the second indexer/crawler, I would be
very interested in finding out. There are ways to restrict access to parts
of the index, but I am not familiar enough with them to help you. There
is an excellent tool, Luke, for inspecting exactly what is in your
index; I downloaded it from here:
http://www.getopt.org/luke/
Hopefully, this was helpful to you.
Regards,
Joshua
From: Christoph Hermann <[EMAIL PROTECTED]uschtel.de>
Date: 08/30/2005 07:15 AM
Please respond to: users
To: [email protected]
cc:
Subject: Several Crawlers with different configurations / Lucene Index
Hello,
I wanted to know if there is a way to configure different
indexers/crawlers (in cocoon.xconf?) so that, e.g., crawler one only
crawls URLs under a certain directory, e.g. http://www.example.com/foo/bar
(the crawler would NOT visit example.com/baz/boo), and crawler two
crawls the entire site (example.com).
In cocoon.xconf there seems to be only one place to specify
configuration options. I already modified the LuceneUtil Java class to
let me create different indexes in whatever directory I want, but I
also need a way to restrict the crawling process a little more.
I thought about specifying different views for different crawlers, but
it seems I cannot specify two crawlers.
Is there a way to do this?
With kind regards,
Christoph
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------