Endeca vs Solr?

2010-05-20 Thread kkieser

First of all, I'd like to apologize in advance for being a pretty raw newbie
when it comes to search technologies, so please bear with me!

The situation:
My company has a system that moderates 15-character free-form text fields.
We have a dictionary of words in our database that are banned for various
legal reasons (profanity, copyright issues, etc.). Our system does an initial
check when the user is entering their choices for these fields and
auto-rejects anything that matches or comes close to matching (based on
phonetics, deliberate misspellings, etc.) anything on our banned list. Once
the order is placed, the system checks again to see if the fields are exact
matches to anything on our auto-approve list (held in the same database as
the banned list) and passes those on through. Items that do not match either
list are moved to a review queue where a customer service rep (CSR) manually
reviews them. During the review the CSR can add a word to either list,
which will prevent future orders using the newly added value from needing to
be reviewed.
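
To make the flow above concrete, here is a rough sketch in Python. It is only
an illustration, not our actual code: the names Moderator, Verdict, normalize
and LEET are made up, and normalize is a crude stand-in for the phonetic and
misspelling matching (it just lowercases, undoes a few character swaps, and
collapses repeated letters). It also folds the entry-time check and the
post-order check into a single call for brevity.

```python
import re
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"    # matched (or nearly matched) the banned list
    APPROVE = "approve"  # exact match on the auto-approve list
    REVIEW = "review"    # neither list matched; goes to the CSR queue


# Hypothetical substitutions for common deliberate misspellings.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})


def normalize(text):
    """Simplified stand-in for phonetic / misspelling matching."""
    text = text.lower().translate(LEET)
    text = re.sub(r"[^a-z]", "", text)      # drop spaces, digits, punctuation
    return re.sub(r"(.)\1+", r"\1", text)   # collapse runs of the same letter


class Moderator:
    MAX_LEN = 15  # the fields are 15-character free-form text

    def __init__(self, banned, approved):
        # Banned words are stored in normalized form so near-misses collide.
        self.banned = {normalize(w) for w in banned}
        # Approved values are stored verbatim: approval needs an exact match.
        self.approved = set(approved)

    def moderate(self, text):
        if len(text) > self.MAX_LEN:
            raise ValueError("field is limited to 15 characters")
        if normalize(text) in self.banned:
            return Verdict.REJECT
        if text in self.approved:
            return Verdict.APPROVE
        return Verdict.REVIEW

    # What a CSR does during review: add the value to one of the lists.
    def ban(self, word):
        self.banned.add(normalize(word))

    def approve(self, word):
        self.approved.add(word)
```

Both lookups here are plain in-memory set lookups, which is essentially the
hash-map approach described below; the fuzzy part lives entirely in how the
key is normalized before the lookup.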

My question:
Currently our system simply holds all the words in a hash map in memory, but
we're worried about scalability. I've been asked to try and find out more
about Solr and how it compares to Endeca, which another of our departments
uses but which I'm not very familiar with. I've been reading the wiki and
other articles I've found online, and it seems like there's a lot of overlap
in features between Solr and Endeca; the main difference just seems to be
cost. Endeca also seems to have better support for real-time searches, and
has a stricter sorting algorithm. On the other hand, it sounds like Solr
re-indexes quickly enough for my purposes, and its sorting algorithm can be
tweaked to match what I need. Are there any other technical differences
between the two if used in the scenario I described above? Also, are there
any important hardware footprint differences? I'm no admin, but I believe our
system runs on JBoss on a Solaris box, last I checked.

Any help or insight you guys can provide would help greatly. Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-vs-Solr-tp832826p832826.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Endeca vs Solr?

2010-05-20 Thread kkieser

Thanks for your response, David! At the moment we have over 40,000 words on
our banned list, and we only recently added the white list, so we expect
this number to jump quite quickly. I've heard Solr can handle up to around 2
million records before slowing down, so I'm not too worried about hitting
that limit. Our database implementation has already started slowing down and
is causing complaints from the CSRs. This system is used on a public-facing
website that gets quite a lot of traffic, which is why we're looking into
swapping from having the full database hash map in memory to something with
an efficient index that can handle both the high traffic of users creating
designs and the CSRs reviewing the ones that aren't auto-approved or
auto-rejected.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-vs-Solr-tp832826p833016.html