I've been thinking about string attributes again lately, especially in terms of facets, since my previous approach (see http://groups.google.com/group/thinking-sphinx/browse_thread/thread/c8cc4fb1e38f7679/76f353007ff827ef?lnk=gst&q=string+attribute#76f353007ff827ef) had speed issues. I came up with a much faster solution: https://gist.github.com/924493. Now I'm wondering whether it would make sense to port it back to Thinking Sphinx.
I'm thinking the following: when a string attribute is defined, Thinking Sphinx could internally keep two attributes (similar to what it does for facets now): the original string value and its str2ordinal counterpart. Sorting, grouping and filtering would use the str2ordinal value, while things like facet labels would use the actual string value. This would largely remove the need for the translation step that does all the reverse lookups, which is the main reason for the slowdown.

The main issue I see is having to support two different flavors: pre-1.10-beta Sphinx installations would need the current implementation, whereas 1.10-beta and later could use the new one. In the near future there might even be a third version that drops the extra sorting/grouping/filtering column, once Sphinx is able to do that natively on string attributes.

The question is: would it make sense to implement that in Thinking Sphinx, bearing in mind the additional complexity it brings (all grouping, sorting and filtering would have to be mapped to a custom column internally, although that already happens for facets anyway)? If so, I'm happy to try coming up with a clean implementation. Otherwise, I'll just adapt a blog post I have in my pipeline where I explain the whole issue and my solution.

WDYT?
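
To make the two-attribute idea a bit more concrete, here is a rough sketch in plain Ruby. It is not Thinking Sphinx code and does not talk to Sphinx at all: `Doc`, `build_ordinals`, `index_rows` and `facet_counts` are made-up names, and the ordinal numbering (distinct strings sorted, then numbered from 1) is only an approximation of what str2ordinal produces at index time. The point is just that grouping runs on the integer column and the label is read from the string stored in the same row, so no reverse lookup is needed.

```ruby
# Hypothetical illustration only; none of these names exist in Thinking Sphinx.
Doc = Struct.new(:id, :category)

DOCS = [
  Doc.new(1, "pear"),
  Doc.new(2, "apple"),
  Doc.new(3, "pear"),
  Doc.new(4, "cherry")
].freeze

# Approximates str2ordinal: sort the distinct strings and number them from 1.
def build_ordinals(values)
  values.uniq.sort.each_with_index.map { |value, index| [value, index + 1] }.to_h
end

# Each indexed row keeps BOTH attributes: the original string and its ordinal.
def index_rows(docs)
  ordinals = build_ordinals(docs.map(&:category))
  docs.map do |doc|
    {
      id: doc.id,
      category: doc.category,                          # string attribute (for labels)
      category_ordinal: ordinals.fetch(doc.category)   # integer attribute (for sort/group/filter)
    }
  end
end

# Group on the integer column, then take the label from the string value
# stored in the same row instead of translating the ordinal back.
def facet_counts(rows)
  rows.group_by { |row| row[:category_ordinal] }
      .sort_by { |ordinal, _group| ordinal }
      .map { |_ordinal, group| [group.first[:category], group.size] }
      .to_h
end

ROWS = index_rows(DOCS)
FACETS = facet_counts(ROWS)
# FACETS is {"apple"=>1, "cherry"=>1, "pear"=>2}
```

Filtering would work the same way: the caller passes a string, it is mapped to its ordinal once, and the comparison happens on the integer column.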
