Re: Clustering, Collapsing

Tommaso Teofili Mon, 11 Jun 2012 00:00:58 -0700

Hi Deejay,

2012/6/8 Deejay <[email protected]>


> Hi all,
>
> I recently discovered Apache UIMA, and it looks like a very large project! I
> was hoping that someone more experienced with it than I could comment on
> whether there are parts of the project that could help with my problem.
>
> I need to go over many millions of objects (Protocol Buffers in HBase, as it
> happens), and cluster them according to their similarity. Once each cluster
> is formed, I need to 'collapse' each property of the objects to find the most
> prevalent value. After this, the collapsed object will be added to a Solr
> index.
>

I think you could take advantage of the UIMA Collection Processing Engine
(CPE) [1], in particular with a UIMA-AS based architecture, since it looks
like you are handling huge collections [2].
The specific algorithms you choose for clustering / collapsing would
determine how the UIMA pipeline components are implemented and configured;
independently of that, you could use SolrCas [3] as the final step to write
the data to the index.
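
To make the 'collapse' step more concrete, here is a minimal sketch in plain
Python. It is not UIMA API: the function name collapse_cluster and the sample
records are hypothetical, and it assumes each object in a cluster has already
been decoded into a dict whose property values are hashable. For every
property it keeps the most frequent value; on a tie, the value seen first
wins.

```python
from collections import Counter


def collapse_cluster(records):
    """Return one record holding the most prevalent value of each property.

    `records` is an iterable of dicts (one per clustered object).
    Missing or None values are ignored when counting.
    """
    counts = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                continue
            # One Counter per property name, created on first use.
            counts.setdefault(key, Counter())[value] += 1
    # most_common(1) yields [(value, count)] for the top value.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Hypothetical cluster of three near-duplicate objects.
cluster = [
    {"name": "ACME Corp", "city": "Rome"},
    {"name": "ACME Corp", "city": "Roma"},
    {"name": "Acme", "city": "Rome"},
]
collapsed = collapse_cluster(cluster)
```

In a pipeline like the one described above, logic of this kind would sit
after the clustering step and before the component that writes to Solr.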


> Would any part of Apache UIMA be useful for the clustering or collapsing, or
> have I misunderstood the nature of the project?
>
HTH
Tommaso

[1] : http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.cpe
[2] : http://uima.apache.org/doc-uimaas-what.html
[3] : http://uima.apache.org/sandbox.html#solrcas.consumer
