Hey there, I'd like to ask for your input on the following scenario:
I've got a web crawler that retrieves XML and HTML files. The files can be stored either in the database or on the filesystem, and they are identified by a unique id in the db. Now I want to index these files to enable better search. Ideally this would include some feature extraction from the files.

I'm not sure whether Sphinx or Thinking Sphinx is suitable for this, or how I have to prepare the data for it. I could, for example, strip the HTML/XML tags, put the result into the db as text, and index that. Does Sphinx then do some kind of stemming, stop word removal, etc.?

It's kind of hard to find out where to get started, as I'm not very experienced with full-text search in Rails/Ruby. I'd be very happy if someone could help me a bit.

Thank you,
Christoph
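
P.S. To make the "strip the tags" step concrete, here is a rough sketch of what I have in mind. The helper name strip_markup is made up for illustration, and the regex-based approach is only an approximation that uses nothing beyond Ruby's standard library; a real parser would probably cope better with malformed markup.

```ruby
require "cgi"

# Hypothetical helper (not part of Sphinx or Thinking Sphinx):
# reduce an HTML/XML document to plain text suitable for indexing.
def strip_markup(source)
  text = source.dup
  # Drop script and style blocks entirely, including their contents.
  text.gsub!(%r{<(script|style)\b[^>]*>.*?</\1>}mi, " ")
  # Drop comments.
  text.gsub!(/<!--.*?-->/m, " ")
  # Replace remaining tags with a space so adjacent words do not merge.
  text.gsub!(/<[^>]+>/, " ")
  # Decode basic entities such as &amp; and collapse runs of whitespace.
  CGI.unescapeHTML(text).gsub(/\s+/, " ").strip
end

html = "<html><body><h1>Title</h1><p>Hello &amp; welcome</p>" \
       "<script>var x = 1;</script></body></html>"
puts strip_markup(html)
```

The resulting string is what I would store in a text column next to the file's id, so that the index only ever sees plain text.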
