Hi,

I'm using Sphinx with a crawler.  As the crawler runs, it replaces all 
articles with updated versions.  Consequently, every articles.id changes, 
and while the crawler is working the number of results a search returns 
slowly dwindles until a rebuild can be done at the end.  I could use a delta 
index, but didn't want to, as everything would end up in the delta index and 
the TS docs seem to advise against this.

I found 
http://stackoverflow.com/questions/965656/specifying-different-column-as-doc-id-using-thinking-sphinx
and trawled through the commits until I found 
set_sphinx_primary_key.  I already have a unique string column egms_id, so 
tried it, but it looks like TS wants an integer field as 
sphinx_document_id performs a multiplication on it.  Ok, I thought, I'll just 
.to_crc32 the egms_id, but this looks like it's going to produce integers 
which are way too large - in any case, it hangs the indexer :)
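
For reference, here is a rough Ruby sketch of what I think is going wrong. 
It assumes the document id is computed as key * model count + model offset, 
and that my Sphinx build uses 32-bit document ids.  The method names, the 
sample egms_id and the model count of 3 are made up for illustration, not 
taken from TS.

```ruby
require "zlib"

MAX_UINT32 = 2**32 - 1

# Hypothetical stand-in for the multiplication described above:
# key * number of indexed models + per-model offset.
def sketch_document_id(key, model_count, model_offset)
  key * model_count + model_offset
end

# True if the id still fits in an unsigned 32-bit integer.
def fits_in_uint32?(id)
  id.between?(0, MAX_UINT32)
end

key = Zlib.crc32("EGMS-0001")        # made-up egms_id; CRC32 is in 0..2**32-1
id  = sketch_document_id(key, 3, 1)  # assumed: 3 indexed models, offset 1
puts "crc32=#{key} document_id=#{id} fits=#{fits_in_uint32?(id)}"
```

Under those assumptions, with three indexed models any CRC32 above about 
1.43 billion no longer fits in 32 bits once multiplied.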

Is there any way for me to just use the egms_id string here?  Or can I use a 
CRC32, perhaps by patching Thinking Sphinx to accept set_primary_key 
:egms_id_crc32, :behaves_like_hash => true and not perform the multiplication 
when it sees this option?  Or have I missed the purpose of the multiplication 
(it seems like it's to guarantee uniqueness, and I can already do that with a 
CRC of a unique string)?
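
My reading of the multiplication, sketched in Ruby: I'm assuming each 
indexed model gets its own offset that is smaller than the model count, so 
that records with the same primary key in different models end up with 
different document ids.  The method name and the numbers are hypothetical.

```ruby
# Hypothetical stand-in: key * number of indexed models + per-model offset.
def combined_document_id(key, model_count, model_offset)
  key * model_count + model_offset
end

# Two models (offsets 0 and 1); the first two rows share primary key 7.
rows = [[7, 0], [7, 1], [8, 0]]
ids  = rows.map { |key, offset| combined_document_id(key, 2, offset) }
puts ids.inspect  # => [14, 15, 16]
```

If that is right, the multiplication only matters when more than one model 
is indexed; with a single model and offset 0 the id is just the key.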

Apologies if this comes across confused and tired - that's because I am ;)

Cheers,

Russ
