Great. Do you think it would be possible to have a default configuration for a small index of the top 10000 entities as measured by popularity?
I am also thinking of building Maven artifacts to embed the OpenNLP models in version 1.5 without checking them into the Stanbol SVN repo. I could help you bundle a set of small entity indexes in the same way. Also, could you write a howto for building indexes? I think such a howto would be better written as a text file in the Stanbol source tree, or better still as a new documentation page for the Stanbol website (using the Markdown syntax), rather than as a new wiki page on the IKS wiki. As soon as you have such a howto ready, I would be glad to write a bunch of Pig scripts to build indexes for topics (rather than entities), so as to be able to perform document-level topic assignment rather than occurrence-based entity lookups. -- Olivier
