Rob Nagler wrote:
I agree with you about the overhead imposed by the Oracle solution. In particular, using intermedia with the 'Internet File System' option of 8i/9i, things get extremely complex in terms of manageability. On the other hand, the user-friendly interface that lets you drop a file into the DB and have it indexed on the fly has a high cost in terms of system
It isn't indexed on the fly in our version (8i). Has this changed? You have to run the indexer regularly, so in this it is no better than external indexing solutions. Indeed, one of the big problems is that you can't qualify the query *prior* to index search afaik. It seems to search the entire index always. In our case, this is extremely costly, because our space naturally divides, and isolated indexes would solve the problem much more efficiently.
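A minimal sketch of the "isolated indexes" idea above, in plain Python rather than anything Oracle-specific. The class name, the partition keys and the whitespace tokenizer are assumptions made for illustration only; the point is that the query is qualified by partition first, so only that partition's small index is consulted.

```python
from collections import defaultdict


class PartitionedIndex:
    """Hypothetical example: one small inverted index per partition."""

    def __init__(self):
        # partition key -> word -> set of document ids
        self._indexes = defaultdict(lambda: defaultdict(set))

    def add(self, partition, doc_id, text):
        """Index a document's words inside its own partition only."""
        index = self._indexes[partition]
        for word in text.lower().split():
            index[word].add(doc_id)

    def search(self, partition, word):
        """Qualify by partition first, then look up the word."""
        index = self._indexes.get(partition)
        if index is None:
            return set()
        # Return a copy so callers cannot modify the index by accident.
        return set(index.get(word.lower(), ()))


idx = PartitionedIndex()
idx.add("club-a", 1, "Taxes for investment clubs")
idx.add("club-b", 2, "IRS publication on taxes")
print(idx.search("club-a", "taxes"))
```

A search in "club-a" never touches the entries of "club-b", which is the cost a single global index cannot avoid.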
Oracle claims that file search inside the DB takes a fraction of the time
compared with MS flat files in the IFS solution from 8i onwards (Enterprise
Edition); obviously that doesn't mean indexing performs well compared
to an analogous solution. I didn't pay attention to the reindexing needed
after dropping a doc into IFS, since I have definitely abandoned the idea
due to performance issues. Looking back at the history, from Context
to Intermedia, the solution has now become 'UltraSearch', whose improvements
I personally still have to get acquainted with.
resources. From my personal point of view, this particular workflow did not scale well on existing systems that had only the RDBMS installed and no spare capacity, especially in terms of CPU/memory resources.
It scales enough, if you aren't trying to solve the google problem. :-) For our users, it's ok performance, even for the heavy internal users. Just being able to search message boards and file areas (including word docs) is a huge plus for us.
I have tried IFS on a system that was doing the RDBMS job well, with a
low-to-mid sized/tuned configuration using Solaris 2.6 on Sun SPARC. IFS gave
us the horrible first impression of bringing the system to its knees. Frankly,
I didn't have the time/patience to find out whether there was a chance to tune
it a little more and get it to scale; in my opinion it would have been a waste
of time in that particular situation without really scaling up the machine.
Following your suggestion, I could put the textual contents of the PDF, extracted with pdftotext, into a 'TEXT' column in PostgreSQL, then use a search engine to look inside it, to get functionality similar to intermedia.
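A rough sketch of that extract-then-store pipeline. To keep it runnable without a database server, it uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table name, column names and helper functions are all hypothetical. With PostgreSQL the same statements would go through a driver such as psycopg, whose placeholders are `%s` instead of `?`. The substring scan is only a placeholder for a real search engine.

```python
import sqlite3
import subprocess


def pdf_to_text(pdf_path):
    """Run the pdftotext CLI; '-' as output file sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def open_store(path=":memory:"):
    """Create the (hypothetical) documents table with a TEXT body column."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def store_document(conn, name, body):
    """Insert the extracted text and return the new row id."""
    cur = conn.execute(
        "INSERT INTO documents (name, body) VALUES (?, ?)", (name, body)
    )
    conn.commit()
    return cur.lastrowid


def find_documents(conn, keyword):
    """Naive case-insensitive substring scan over every stored body.

    '%' or '_' inside keyword act as LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT name FROM documents WHERE lower(body) LIKE ? ORDER BY id",
        ("%" + keyword.lower() + "%",),
    )
    return [name for (name,) in rows]
```

Typical use would be `store_document(conn, "report.pdf", pdf_to_text("report.pdf"))`, where `report.pdf` is a made-up file name. The scan reads every row on each query, so it only illustrates the data flow, not a scalable search.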
Regarding the search engine, I guess it would need at least an unstructured text search algorithm, along with something like SOUNDEX in Oracle.
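For reference, a small sketch of what a SOUNDEX-style match could look like outside the database. This implements the classic American Soundex rules (first letter kept, three digits, H and W ignored, vowels separating repeated codes); it is an illustration written for this thread, not Oracle's implementation, and the function names are made up.

```python
# Letter groups that share a Soundex digit.
_GROUPS = (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
)
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character American Soundex code, or '' if no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(out) + "000")[:4]


def sounds_like(query, words):
    """Return the words whose Soundex code equals that of the query."""
    target = soundex(query)
    return [w for w in words if soundex(w) == target]


print(soundex("Robert"), soundex("Rupert"))
print(sounds_like("publicashun", ["publication", "taxes", "clubs"]))
```

Soundex was designed for English surnames, so on general text it both over-matches and under-matches; it would complement keyword matching rather than replace it.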
I don't think intermedia uses SOUNDEX. It does pure keyword matching. It's particularly bad in my opinion. It also doesn't learn what people really want to know. For example, if you search:
http://www.bivio.com/pub/search?s=taxes
You always get the IRS Pubs, but this is rarely what people are looking for on our site (although they should read the publications, they are more interested in what bivio can do for them in terms of taxes). Note the performance on the search. The data set you are searching in the public case is very small in comparison to the whole document database which is multi-GB.
Hope this helps.
Rob
I'm not sure what intermedia uses to search text; it certainly doesn't learn
anything from searches (I don't know what 'UltraSearch' is capable of, despite
all the hype Oracle puts into this technology, as usual). Regarding
the search on bivio.com, the response time is quite okay from a human point
of view, but it probably should do better considering a 12-page indexed
data set.
Thanks a lot for your valuable suggestions. I will let you know if there is
any further development on what we've been talking about.
All the best.
Fabian.