Hello,

I have been reading this list for quite a while. It was frustrating at times, 
because I often thought, "If only I could release Arch now, I could help with 
this..., and this..., and this..." But it was not ready. Now it is, and I am 
more than happy to release it.

I hope it will be useful in more than one way. A few examples:

- People often asked how to avoid a complete re-crawl when a crawl fails. With 
  Arch, you can. You can split your web site into areas and crawl them 
  separately as needed. The results are then combined into a single index. If 
  a crawl fails and you restart Arch, it will start with the area that failed, 
  skipping the areas that are already indexed.
- People asked how to use Nutch classes from Java. Arch does that; see the 
  sources.
- People had issues with updating pages in the index. Arch does not have this 
  problem.
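
To illustrate the first point, here is a minimal sketch of the "resume at the 
failed area" idea in Python. It is not Arch's code (Arch is written in Java on 
top of Nutch); the function name, the area names and the in-memory `done` set 
are all made up for illustration, and a real tool would keep `done` on disk so 
that it survives a restart.

```python
def crawl_areas(areas, crawl_one, done):
    """Crawl every area that is not yet in `done`; stop at the first failure.

    areas     -- ordered list of area names (hypothetical)
    crawl_one -- callable that crawls and indexes one area, raising on failure
    done      -- set of area names already indexed; updated in place
    Returns the name of the area that failed, or None if all are indexed.
    """
    for area in areas:
        if area in done:
            continue  # indexed in an earlier run, so skip it
        try:
            crawl_one(area)
        except Exception:
            return area  # `done` is unchanged, so a restart resumes here
        done.add(area)
    return None


# Usage: a fake crawler whose "docs" area fails on the first attempt only.
attempts = []

def flaky_crawl(area):
    attempts.append(area)
    if area == "docs" and attempts.count("docs") == 1:
        raise IOError("fetch failed")

done = set()
site = ["news", "docs", "staff"]
failed = crawl_areas(site, flaky_crawl, done)      # stops at the failing area
failed_again = crawl_areas(site, flaky_crawl, done)  # restart: skips "news"
```

On the second call only the failed area and the ones after it are crawled; 
the areas recorded in `done` are left alone.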

Arch has a lot more than the above. For me, as a webmaster, it has everything 
that I can ask for: document-level security, easy support for multiple web 
sites, modular pluggable authentication, an automatic dynamic site directory, 
and cheap scheduled index updates.

A very important feature is an improved document weighting scheme. It works 
fantastically well on intranets. No more user complaints about finding junk 
instead of what they expected to find.

Arch has a dual (PHP and JSP) interface. For those of you who prefer PHP to 
Java, the PHP interface will be easier to customise.

More information, sources, screenshots and binaries are available here:

http://www.atnf.csiro.au/computing/software/arch/index.html

Sorry, no demo is available, as Arch runs behind the firewall at ATNF. I hope 
to get it out in the open in a few days.


Regards,


Arkadi Kosmynin
CSIRO Astronomy and Space Science
