There seems to be an implication that Compass won't scale as well as Solr, and
I'm not sure that's true at all. They will both scale as well as the underlying
Lucene.
Lucene doesn't handle distributed search or replication out of the box;
you have to implement it yourself using some of its features (deletion
policy, etc.). Compass provides distributed index support, but mainly
through a grid solution (GigaSpaces, Oracle Coherence, Terracotta), many
of which are commercial products, or through the JDBC Directory, which
doesn't perform very well. Even with Terracotta, I don't know of an
actual deployment that handles hundreds of millions of documents (do
you?), so it's hard to say how well it scales. Solr, on the other hand,
already provides distributed search and replication mechanisms that are
proven to work well on very large collections. But I do agree that if you
don't need to handle such large-scale deployments, Compass may still fit your
needs. If I had to choose between Compass and Hibernate Search, I
would definitely go for Compass (much more robust architecture, not
bound to an ORM, much more customizable). Moreover, transaction
support and very frequent updates (as is the case with most Compass
deployments I've seen) are not always that scalable; it very much
depends on your collection (perhaps the near real-time search
support now in Lucene makes this much better supported).
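
To make "you have to implement it" concrete, here is a minimal sketch of the merge step a hand-rolled distributed search would need: each shard returns its own top hits, and the caller combines them into a global top-k. The `Hit` and `ShardMerger` types are hypothetical names made up for this illustration; they are not Lucene, Solr, or Compass APIs, and the sketch assumes scores from different shards are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration only; not part of Lucene, Solr, or Compass.
class Hit {
    final String docId;
    final float score;

    Hit(String docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

class ShardMerger {
    // Combine the per-shard result lists and keep the k highest-scoring hits overall.
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        // Highest score first; ties are broken by docId so the order is deterministic.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing(h -> h.docId));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

In practice the comparable-scores assumption is the hard part: when each shard computes its own term statistics, raw scores are not guaranteed to line up across shards, which is one more thing you end up handling yourself.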
SolrJ does have the ability to write POJOs and annotate them for mapping
to/from Solr.
This support is extremely limited compared to Compass. Compass can
really be seen as an ORM-like framework on top of Lucene, supporting
different types of relationships and aggregation in the domain model.
This is actually one of the big differentiators between Compass and
Solr: while in Solr the schema dictates the structure of the index, in
Compass it's the domain model that defines the structure.
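
To illustrate what flat, schema-driven POJO mapping looks like, here is a small self-contained sketch. It does not use SolrJ or Compass: `IndexField`, `Book`, and `FlatBinder` are hypothetical stand-ins written for this example (SolrJ's own annotation is, as far as I recall, `org.apache.solr.client.solrj.beans.Field`). Each annotated field simply becomes one named entry in a flat document, which is roughly the level of mapping being discussed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a field-mapping annotation; not the SolrJ or Compass one.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexField {
    String value(); // name of the target index field
}

class Book {
    @IndexField("id") String id;
    @IndexField("title") String title;
    String internalNote; // not annotated, so it is never copied into the document

    Book(String id, String title, String internalNote) {
        this.id = id;
        this.title = title;
        this.internalNote = internalNote;
    }
}

class FlatBinder {
    // Copy every annotated field of the bean into a flat name -> value map.
    static Map<String, Object> toDocument(Object bean) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            IndexField mapping = f.getAnnotation(IndexField.class);
            if (mapping == null) {
                continue; // skip fields that are not mapped to the index
            }
            f.setAccessible(true);
            try {
                doc.put(mapping.value(), f.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + f.getName(), e);
            }
        }
        return doc;
    }
}
```

Note what the sketch cannot express: a `Book` that references an `Author` object, or a collection of related entities, has no natural place in a flat map like this. That gap is where a domain-model-driven mapping of the kind Compass offers does more work.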