Hi,

I'm using Sphinx with a crawler. As the crawler does its business, it replaces every article with an updated version. Consequently, every articles.id changes, and while the crawler is working, the number of results a search returns slowly dwindles until a rebuild can be done at the end. I could use delta indexing, but didn't want to, as every article would end up in the delta index, and the TS docs seem to advise against that.
I found http://stackoverflow.com/questions/965656/specifying-different-column-as-doc-id-using-thinking-sphinx and trawled through the commits until I found set_sphinx_primary_key. I already have a unique string column, egms_id, so I tried it, but it looks like TS wants an integer field, since sphinx_document_id performs a multiplication on it. OK, I thought, I'll just call .to_crc32 on the egms_id, but that looks like it's going to produce integers which are far too large; in any case, it hangs the indexer :)

Is there any way for me to just use the egms_id string here? Or could I use a CRC32 by perhaps patching Sphinx to accept set_primary_key :egms_id_crc32, :behaves_like_hash => true, and not perform the multiplication when it sees this option? Or have I missed the purpose of the multiplication? It seems to be there to guarantee uniqueness, and I can already do that with a CRC of a unique string.

Apologies if this comes across as confused and tired; that's because I am ;)

Cheers,
Russ
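A rough sketch of why a CRC32 key may be too large, under two assumptions that are not confirmed above: that sphinx_document_id multiplies the key by the number of indexed models and adds a per-model offset (so documents from different models never share an ID), and that Sphinx was built with 32-bit document IDs (no --enable-id64). hypothetical_document_id, fits_in_32_bits? and largest_safe_key are illustrative helpers, not Thinking Sphinx API; Zlib.crc32 from Ruby's standard library stands in for .to_crc32.

```ruby
require "zlib"

# Assumption: Sphinx built with 32-bit document IDs (no --enable-id64).
MAX_DOC_ID = 2**32 - 1

# Hypothetical illustration of the interleaving scheme: the record's key is
# multiplied by the number of indexed models and a per-model offset is added,
# so two models can never produce the same Sphinx document ID.
def hypothetical_document_id(key, model_count, model_offset)
  key * model_count + model_offset
end

# True when the resulting document ID still fits in 32 bits.
def fits_in_32_bits?(key, model_count, model_offset)
  hypothetical_document_id(key, model_count, model_offset) <= MAX_DOC_ID
end

# Largest key that can be used before the document ID overflows.
def largest_safe_key(model_count, model_offset)
  (MAX_DOC_ID - model_offset) / model_count
end

# Zlib.crc32 returns an unsigned value in 0..2**32 - 1, so with two indexed
# models roughly half of all possible CRCs are already too large.
crc = Zlib.crc32("example-egms-id") # hypothetical egms_id value
puts "CRC32 of egms_id: #{crc}"
puts "largest safe key: #{largest_safe_key(2, 1)}"
puts "document ID fits? #{fits_in_32_bits?(crc, 2, 1)}"
```

If the scheme really works this way, the multiplication is what keeps IDs unique across models, which a CRC of one model's column cannot do on its own. Note also that CRC32 only has 2**32 possible values, so two distinct egms_id strings can still produce the same checksum.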
