Hi Nick,

There's nothing in Thinking Sphinx that does this... however, I think the --buildstops flag for indexer may get you that data - it works out the most commonly indexed words.
indexer --config path/to/config.conf --buildstops words.txt 1000

That'll grab the 1000 most common indexed words. Have a read of the docs for more info (the flag name itself is slightly misleading) - search for buildstops:

http://sphinxsearch.com/docs/manual-1.10.html

Hope this helps

--
Pat

On 01/03/2011, at 5:44 PM, Nicholas Faiz wrote:

> Hi,
>
> I'd like to create some metadata from a large source of documents (which are
> stored in a db). This is to build a strong set of facets for search.
>
> The first example I'm trying to solve is reducing a text column in the
> database, called abstract, to a set of commonly occurring keywords - e.g.
> abstract_keywords. The abstract is a lengthy document, and I'd like to find
> a library which can scan all values of abstract for the most common keywords,
> then store those results in the metadata column - abstract_keywords (in
> another table, most likely). We then hope to use abstract_keywords as a facet
> attribute in Thinking Sphinx.
>
> Can anyone point me to a good starting place in Thinking Sphinx where I can
> find this sort of scanning and aggregation of keywords functionality? Has
> anyone done something similar?
>
> Cheers,
> Nicholas
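For what it's worth, here's a rough sketch of how the buildstops output could be fed back into an abstract_keywords column. This isn't part of Thinking Sphinx - it assumes an Article model with abstract and abstract_keywords columns (the names from Nicholas's question) and that words.txt from the indexer run contains one word per line; adjust to suit.

    require 'set'

    # Words produced by `indexer --buildstops words.txt 1000`, one per line.
    common_words = File.readlines('words.txt').
      map { |line| line.strip.downcase }.reject(&:empty?).to_set

    Article.find_each do |article|
      next if article.abstract.blank?

      # Keep only the common indexed words that actually appear in this abstract.
      keywords = article.abstract.downcase.scan(/\w+/).uniq.
        select { |word| common_words.include?(word) }

      # Store them (comma-separated here) in the metadata column.
      article.update_attribute(:abstract_keywords, keywords.join(','))
    end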

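And, purely as an illustration of the last part of the question, a facet over that column might be declared something like the following (Thinking Sphinx 1.x/2.x define_index syntax; the model and column names are just the ones assumed above, and note a comma-separated string would be treated as a single facet value unless it's split out some other way):

    class Article < ActiveRecord::Base
      define_index do
        indexes abstract
        indexes abstract_keywords, :facet => true
      end
    end

    # Then, for example:
    Article.facets[:abstract_keywords]  # counts per abstract_keywords value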