It sounds like you are actually looking for the project next door: Apache Mahout.
UIMA doesn't really have much to do with clustering, although you could do a few things with it. We do use UIMA for information extraction *before* clustering and sending the results to Solr, though, as a preprocessing step to get relevant features out of unstructured text. But it doesn't sound like that's what you're trying to do.
HTH,
Jens

On 06/08/2012 05:44 PM, Deejay wrote:
Hi all,

I recently discovered Apache UIMA, and it looks like a very large project! I was hoping that someone more experienced with it than I could comment on whether there are parts of the project that could help with my problem.

I need to go over many millions of objects (Protocol Buffers in HBase, as it happens), and cluster them according to their similarity. Once each cluster is formed, I need to 'collapse' each property of the objects to find the most prevalent value. After this, the collapsed object will be added to a Solr index.

Would any part of Apache UIMA be useful for the clustering or collapsing, or have I misunderstood the nature of the project?
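
For what it's worth, the 'collapse' step described above can be sketched as a per-property majority vote. This is only an illustration in Python, not something UIMA or Mahout provides: `collapse_cluster` and the sample records are hypothetical, and it assumes each object in a cluster has already been decoded into a plain dict with hashable values.

```python
from collections import Counter


def collapse_cluster(records):
    """Collapse a cluster into one record holding the most prevalent value of each property.

    `records` is assumed to be an iterable of dicts (hypothetical decoded objects).
    Properties missing from a record are simply not counted for that record.
    """
    counts = {}
    for record in records:
        for key, value in record.items():
            # One Counter per property, tallying how often each value occurs.
            counts.setdefault(key, Counter())[value] += 1
    # most_common(1) returns the top (value, count) pair; on a tie the value
    # that was seen first wins.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Hypothetical cluster of three similar objects.
cluster = [
    {"name": "ACME Corp", "city": "Berlin"},
    {"name": "ACME Corp", "city": "Munich"},
    {"name": "Acme", "city": "Berlin"},
]
collapsed = collapse_cluster(cluster)
print(collapsed)
```

At the scale of many millions of objects the same tally would more likely run as a distributed job per cluster, but the per-property counting logic stays the same.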
