> > I think it would be better to have the junit tests
> > start jetty then
> > crawl localhost. I'd love to see some end-to-end
> > unit tests like that.

+1

> I think this would also make it nice to test things
> like recursive linking, parsing pdfs or other file
> formats, observing robots.txt or any crawling bugs
> that are encountered and then fixed.

I think end-to-end testing should focus on end-to-end problems (i.e. PDF
parsing is already covered by unit tests, and that really is the right
place for it).
It would be better to run some end-to-end (functional) tests that check,
for example (this list is not exhaustive):
* that, across many configurations, the right documents are fetched and
correctly parsed (as you suggested).
* some edge cases: protocol errors, corrupted content, ...
* fetching/crawling/indexing performance with many configurations
* search performance under various query loads, database sizes, ...
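
As a rough sketch of the first two points (a good page is fetched, a
protocol error is detected), something like the following could be a
starting point. It uses the JDK's built-in com.sun.net.httpserver.HttpServer
instead of Jetty only to stay self-contained, and the class and method names
(LocalCrawlSketch, startServer, fetchStatus, fetchBody) are made up for
illustration. A real test would start Jetty and run the actual crawl
against it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a throwaway local server plus a trivial fetcher.
public class LocalCrawlSketch {

    // Starts a server on a free local port with one good page and one failing page.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/index.html", exchange -> {
            byte[] body = "<a href=\"/broken\">next</a>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.createContext("/broken", exchange -> {
            // Simulated protocol error: HTTP 500 with no response body (-1).
            exchange.sendResponseHeaders(500, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Base URL of the server, using whichever port the OS assigned.
    static String baseUrl(HttpServer server) {
        return "http://127.0.0.1:" + server.getAddress().getPort();
    }

    // Returns the HTTP status code for a URL without reading the body.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Returns the response body as a UTF-8 string (throws on HTTP error codes).
    static String fetchBody(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

A junit test would call startServer() in its setup, point the crawler at
baseUrl(server) + "/index.html", assert on what was fetched and parsed, and
call server.stop(0) in its teardown.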

These are just some ideas...
But it would be very cool if you could work on this.

> Suggestions for where to put such test content in the
> tree?

What about creating a trunk/qa "module"?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
