Hi,
- Is there any way to perform form-based authentication? I know that
this is a common request, but I haven’t found a “good-enough” answer
to it. The only references I’ve found are about basic auth, which I’d
prefer to avoid. I ask this because I’ve noticed that SearchBlox,
which uses Nutch internally, has an option to support form-based
auth. Was this something they developed on their own?
I'm no expert in these things, but I would say that today this is not
possible without hacking some code.
In general, there is an HTTP client plugin that uses Commons
HttpClient. If it is possible with HttpClient somehow, then it is
possible with Nutch somehow. :-o
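
To make "possible with HttpClient" a bit more concrete: a form login is just a POST of the form fields, followed by keeping the session cookie for later requests. Below is a minimal sketch using only JDK classes as a stand-in for Commons HttpClient. It is not Nutch code; the class name FormLogin and the field names used in the usage note are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Commons HttpClient.
public class FormLogin {

    // URL-encodes one name or value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Encodes form fields as application/x-www-form-urlencoded. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    /** POSTs the login form and returns the HTTP status code. */
    static int login(String loginUrl, Map<String, String> fields) throws IOException {
        // A JVM-wide cookie store, so the session cookie set by the login
        // response is sent with later HttpURLConnection requests.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager());
        }
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

The form's action URL and field names (for example j_username and j_password on a servlet container) depend entirely on the site. To be useful for crawling, the login would have to run inside the fetcher's HTTP plugin so that the same cookie store is used for the fetches that follow; that is the "hacking some code" part.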
- Another issue I have is authorization support. The intranet I’m
working on has different security profiles, with sensitive stuff that
must be hidden from some users but has to be searchable by others.
What is the best way to do this? To have an index per profile?
In case you can extract this information from the page or derive it
from a URL pattern, I suggest implementing an indexing filter plugin
that 'tags' each document with a profile, something like:
doc.add(Field.Keyword("profile", theProfile));
You also need a query filter, and then you can extend the user query
with:
QueryString = QueryString + " profile:managers";
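
The query-extension step can be sketched as a small helper. This is only an illustration in plain Java, not a Nutch query filter; the class and method names are hypothetical, and it assumes the profile value comes from the authenticated session rather than from anything the user typed.

```java
// Hypothetical helper illustrating the query-extension idea.
public class ProfileQuery {

    /** Appends a profile clause to the raw user query. */
    static String restrictToProfile(String userQuery, String profile) {
        // Accept only simple profile names, so that the appended clause
        // cannot carry extra query syntax.
        if (!profile.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("invalid profile: " + profile);
        }
        // Note the space before the clause; without it the clause would
        // be glued onto the user's last search term.
        return userQuery.trim() + " profile:" + profile;
    }
}
```

For example, restrictToProfile("salary report", "managers") yields "salary report profile:managers".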
- What is the best reference for implementing incremental indexing? I
wouldn’t like to rebuild my index in every crawl session. I would
rather have it updated incrementally. Is this possible?
I'm not sure what you mean. Use the step-by-step crawl commands
instead of the crawl command and merge your indexes together;
deduplicating is also a good idea.
See the tutorial and wiki for more details.
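
Why deduplication matters after a merge: a page fetched in two crawl sessions ends up in two indexes, so the merged result contains it twice. Here is a conceptual sketch of one plausible rule, keeping the most recently fetched entry per URL. It is plain Java for illustration only, not Nutch's implementation; all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual illustration only; not how Nutch stores or merges indexes.
public class MergeDedup {

    /** One indexed page: its URL and when it was fetched. */
    static final class Entry {
        final String url;
        final long fetchTime;

        Entry(String url, long fetchTime) {
            this.url = url;
            this.fetchTime = fetchTime;
        }
    }

    /** Merges per-session indexes, keeping the newest entry for each URL. */
    static Map<String, Entry> mergeKeepingNewest(List<List<Entry>> indexes) {
        Map<String, Entry> merged = new LinkedHashMap<>();
        for (List<Entry> index : indexes) {
            for (Entry e : index) {
                Entry seen = merged.get(e.url);
                // Replace an older copy of the same URL, or add a new URL.
                if (seen == null || e.fetchTime > seen.fetchTime) {
                    merged.put(e.url, e);
                }
            }
        }
        return merged;
    }
}
```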
- Can the companion web app (the search web app included in the Nutch
distribution) perform the crawling process too?
No, only command-line support for now.
I ask this because I’ve noticed that it includes a nutch-default.xml
file. Maybe it uses Quartz or something to perform asynchronous
processing?
:-) Not yet.
- Can Nutch perform stemming?
Not by default; if you know Lucene, it would be easy to add.
HTH
Stefan
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general