Mikkel Kamstrup Erlandsen wrote:
>     Unfortunately this is rather hard for me to do, because the data set
>     contained some documents that might be for internal use only, so it
>     would be very time-consuming to select the "proper" ones :-(
> 
>     But maybe it is time to create one good set of documents that people can
>     freely use for testing the indexers.
> 
> 
> Maybe some Wikipedia dumps? Do they have an OAI target? Maybe we could
> even take dumps of localized Wikipedias?
> 
> Cheers,
> Mikkel
> 
> PS: I of course mean stripping all formatting from the harvested files.

Hello,
I've created a small Java application that grabs some text from Wikipedia
(MediaWiki), as you suggested, and posted it on my rarely updated blog.
Please test it.

http://blogs.sun.com/migi/entry/wikipedia_for_indexers_testing
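Since Mikkel mentioned stripping all formatting from the harvested files, here is a minimal sketch of what that step could look like in Java. It is not the code from the blog post: the class name WikiMarkupStripper and the regex rules are hypothetical, and they only handle simple, non-nested markup (templates, internal links, headings, bold/italic quotes and HTML-like tags).

```java
import java.util.regex.Pattern;

// Hypothetical helper: a naive, regex-based wikitext-to-plain-text stripper.
public class WikiMarkupStripper {
    // {{template}} without nested braces
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{[^{}]*\\}\\}");
    // [[Target|label]] -> label
    private static final Pattern PIPED_LINK = Pattern.compile("\\[\\[[^\\[\\]|]*\\|([^\\[\\]]*)\\]\\]");
    // [[Target]] -> Target
    private static final Pattern PLAIN_LINK = Pattern.compile("\\[\\[([^\\[\\]|]*)\\]\\]");
    // == Heading == -> Heading (one heading per line)
    private static final Pattern HEADING = Pattern.compile("(?m)^=+[ \\t]*(.*?)[ \\t]*=+[ \\t]*$");
    // ''italic'' and '''bold''' quote runs
    private static final Pattern EMPHASIS = Pattern.compile("'{2,}");
    // <ref>, <br/> and similar tags (the text between tags is kept)
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static String strip(String wikitext) {
        String text = wikitext;
        String previous;
        // Repeat so that templates exposed by removing an inner one are also removed.
        do {
            previous = text;
            text = TEMPLATE.matcher(text).replaceAll("");
        } while (!text.equals(previous));
        text = PIPED_LINK.matcher(text).replaceAll("$1");
        text = PLAIN_LINK.matcher(text).replaceAll("$1");
        text = HEADING.matcher(text).replaceAll("$1");
        text = EMPHASIS.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        return text;
    }

    public static void main(String[] args) {
        String sample = "== History ==\n"
                + "'''Tracker''' is a [[desktop search|search tool]] for [[GNOME]].{{citation needed}}";
        // Prints "History" and "Tracker is a search tool for GNOME." on two lines.
        System.out.println(strip(sample));
    }
}
```

Regexes are only good enough for quick test data; tables, image links with several parameters and reference bodies would need a real wikitext parser.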

Hope it helps with testing.

-- 
best
Michal Pryc

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
