Hi Russ

Requiring document ids to be integers is actually a limitation of Sphinx 
itself, not Thinking Sphinx. Using CRC'd versions should be possible, but you 
could get collisions - CRC32 values aren't guaranteed to be unique.
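
For a rough sense of how likely a collision is, the birthday approximation 
gives the chance of at least one clash among n values. This is only a sketch: 
it assumes the CRC32s of your egms_id strings behave like uniformly random 
32-bit numbers, and collision_probability and crc_collisions are just 
illustrative helper names, not part of Thinking Sphinx.

```ruby
require 'zlib'

# Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)),
# assuming values are spread uniformly over `space` possibilities.
def collision_probability(n, space = 2**32)
  1.0 - Math.exp(-n * (n - 1) / (2.0 * space))
end

# Group ids by their CRC32 and keep only the CRCs shared by more than one id.
# Pass in unique ids, otherwise duplicates show up as "collisions".
def crc_collisions(ids)
  ids.group_by { |id| Zlib.crc32(id) }.select { |_crc, group| group.size > 1 }
end

[10_000, 100_000, 1_000_000].each do |n|
  printf("%9d documents: %.4f\n", n, collision_probability(n))
end
```

Running crc_collisions over your existing egms_id values would tell you 
whether you've already hit a clash, but it can't rule out one turning up on 
the next crawl.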

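The slowness is probably down to the way Sphinx pages through the indexing 
queries in ranges of ids - with ids spread as widely as CRC32 values, most of 
those ranges will be empty. You can change this:
http://freelancing-god.github.com/ts/en/common_issues.html#slow_indexing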

With the multiplication, you may need to compile Sphinx with 64bit document id 
support, but I'd certainly be hesitant to go down this path - as soon as you 
hit a collision in the CRC'd values, it becomes a waste of time.
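
Roughly speaking, the multiplication is there to keep ids from different 
models apart, not to make ids unique within one model. The sketch below 
assumes the id has the shape key * model_count + model_offset; that formula, 
the document_id helper and the example values are illustrative assumptions, 
not the exact Thinking Sphinx code. It shows why a full-range CRC32 can push 
the result past what a 32-bit document id can hold.

```ruby
require 'zlib'

UINT32_MAX = 2**32 - 1

# Assumed shape of the calculation: each model gets its own offset, so the
# same key in two different models maps to two different document ids.
def document_id(key, model_count, model_offset)
  key * model_count + model_offset
end

key = Zlib.crc32('example-egms-id') # hypothetical egms_id value
puts document_id(key, 3, 1)

# Worst case: the largest possible CRC32 with three indexed models.
worst = document_id(UINT32_MAX, 3, 1)
puts worst > UINT32_MAX # no longer fits in an unsigned 32-bit id
puts worst < 2**64      # still fits in a 64-bit id
```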

Cheers

-- 
Pat

On 20/02/2011, at 1:53 AM, Russell Garner wrote:

> Hi,
> 
> I'm using Sphinx with a crawler.  As the crawler does its business, it 
> replaces all articles with updated versions.  Consequently, all articles.id 
> change and as the crawler's working, the number of results search returns 
> slowly dwindles until a rebuild can be done at the end.  I could delta, but 
> didn't want to as everything would end up in the delta index and it seems the 
> TS docs advise against this.  
> 
> I found 
> http://stackoverflow.com/questions/965656/specifying-different-column-as-doc-id-using-thinking-sphinx
>  and trawled through the commits until I found set_sphinx_primary_key.  I 
> already have a unique string column egms_id, so tried it, but it looks like 
> TS wants an integer field as sphinx_document_id performs a multiplication on 
> it.  Ok, I thought, I'll just .to_crc32 the egms_id, but this looks like it's 
> going to produce integers which are way too large - in any case, it hangs the 
> indexer :)
> 
> Is there any way for me to just use the egms_id string here?  Or can I use a 
> CRC32 by perhaps patching Sphinx to accept set_primary_key :egms_id_crc32, 
> :behaves_like_hash => true and not perform the multiplication when it sees 
> this option? Or have I missed the purpose of the multiplication (it seems 
> like it's to guarantee uniqueness, and I can already do that with a CRC of a 
> unique string)?
> 
> Apologies if this comes across confused and tired - that's because I am ;)
> 
> Cheers,
> 
> Russ
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/thinking-sphinx?hl=en.
