Hi Pedro
That's a very interesting approach, and while I'd thought of it before, I'd
never got around to coding it. I think it's the best solution available when
dealing with both core and delta indexes, and am very interested in hearing how
it goes in production.
One thing to make sure you're aware of - Sphinx doesn't have the concept of
signed integers, only unsigned, so perhaps there's another bit there for you to
play with.
Cheers
--
Pat
On 05/04/2011, at 2:52 AM, Pedro Cunha wrote:
> This previous solution would never work because core + delta
> str2ordinal issues.. so I've come up with a solution and I want to
> hear all your thoughts:
>
>
>
>
> So basically the idea is to convert a String to int, without looking
> on a whole, but making a universal number.
> Downcase chars should be treated equally (optional)
>
> HOW-TO:
> Look at strings as numbers on base 36 and convert them to base 10
> 10 base representation will require more bits to encode than base 36
> so there is a limitation on space occupied
>
> Sphinx max integer value is using sql_attr_bigint which is a 64 bit
> signed integer - This is our gate
>
> For the tests I've made, encoding a 12-char string gives a number
> (base 10) nearly on 2^63-1 which is ~9.223372036854776e+18
>
> Why 63?
> Well it's a signed field, and I'm assuming the lowest string,
> containing 12 0's, is equal to 0.
> I'm working only with the positive range and first bit at 1 should
> give only negative values.
> So I'm working with the 64 - 1 bits.
>
> So,
> I made sure I created a column on my bd which could hold that number
> (I suggest making a decimal column with 19 as precision and scale 0)
> Then created the following code on core extensions:
>
> class String
>
> # This is meant to be hold on a signed 8 bytes field
> # Max number is 2^63 - 1 = 9.223372036854776e+18 ( base 2 => 10)
> def int8_sort_key
> max_value = 9.223372036854776e+18 - 1
>
> # Max number of chars that can be safely count for sorting
> # Decreasing might speed up perfomance
> # Increasing might overflow the max value
> max_len_sz = 12
>
> # Converting special chars
> # -- accented chars are converted
> # -- special chars are converted to "-"
> # Note: Assumming i'm parsing regular strings with only chars and
> spaces (no pontuaction neither symbols)
> str = self.parameterize.gsub("-", "").downcase
>
> # String must have always the same nr of chars
> # This allows to order things like "A B BA" properly
> # So we add "0"s at end if chars missing
> stuffer = "000000000000"
> stuffer = stuffer[0..max_len_sz-1]
>
> # Filling string if it does not have the required length
> if str.length < max_len_sz
> str = str + stuffer[str.length..stuffer.length]
> end
>
> # Now finally converting string (number on base 36)
> value = str[0..max_len_sz-1].to_i 36
>
> # Preventing value overflow
> return value > max_value ? max_value : value
> end
> end
>
>
> Model:
> before_save :set_int8_attributes
> def set_int8_attributes
> self.int8_name = self.name.int8_sort_key
> ...
> end
>
> define_index do
> ...
> # Special string sorting
> has int8_name, :as => :name, :type => :bigint
> end
>
> This is an attempt to solve this issue, this might have bugs and I'm
> mistaken.
> But this is working great on my BD values and sort is accomplished on
> core+delta cases.
>
> The only limitation at the moment is only being able to sort 12 chars
> max.
> Which I presume in a lot of cases should work.
>
>
> I want to hear your thoughts about this, planning to push this to a
> production environment soon.
>
>
> Best regards,
> Pedro
>
>
> On Mar 24, 1:25 pm, Pedro Cunha <[email protected]> wrote:
>> Just to clarify my typo and this seems to be working
>>
>> Model.search "", :order => fieldname_sort ASC
>>
>> On Mar 24, 12:23 pm, Pedro Cunha <[email protected]> wrote:
>>
>>
>>
>>
>>
>>
>>
>>> Back to the topic.
>>
>>> Using:
>>
>>> => indexes column_which_has_content_with_accents, :as
>>> => :fieldname, :sorting=> true
>>
>>> +
>>
>>> => charset_table
>>
>>> Model.search "", :sort => fieldname_sort ASC
>>
>>> Properly sortsaccentedwords!
>>
>>> Best regards :)
>>
>>> On Mar 23, 3:56 pm, Pedro Cunha <[email protected]> wrote:
>>
>>>> Thanks for this information.
>>
>>>> Best regards,
>>>> Pedro
>>
>>>> On Mar 22, 1:32 pm, Pat Allan <[email protected]> wrote:
>>
>>>>> Hi Pedro
>>
>>>>> Unfortunately, Sphinx only supportssortingon attributes - and so, the
>>>>> charset table settings don't get applied (they're only for fields).
>>>>> You're going to need a denormalised column (or a function that ends up
>>>>> with the same value) to make this reliable.
>>
>>>>> Sorry - perhaps Sphinx will be a bit more flexible with this in the
>>>>> future.
>>
>>>>> --
>>>>> Pat
>>
>>>>> On 21/03/2011, at 9:26 AM, Pedro Cunha wrote:
>>
>>>>>> Hello,
>>
>>>>>> I browsed over web to accomplishsortingaccentedattributes and
>>>>>> solution is pointing to:
>>
>>>>>> => has "LOWER(field_name)", :as => :field_name, :type => :string
>>
>>>>>> This don't seem to be working.
>>
>>>>>> Seems like having a particular charset table only works for index
>>>>>> fields, meaning, I can search over words withaccentedchars like they
>>>>>> are just normal, but same doesn't happen on attributes.
>>
>>>>>> Is there something wrong on configuration or do I really need to have
>>>>>> a normalized column forsorting?
>>
>>>>>> Best regards,
>>>>>> Pedro
>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Thinking Sphinx" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected].
>>>>>> For more options, visit this group
>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
>
--
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/thinking-sphinx?hl=en.