This previous solution would never work because core + delta
str2ordinal issues.. so I've come up with a solution and I want to
hear all your thoughts:




So basically the idea is to convert a String to int, without looking
on a whole, but making a universal number.
Downcase chars should be treated equally (optional)

HOW-TO:
Look at strings as numbers on base 36 and convert them to base 10
10 base representation will require more bits to encode than base 36
so there is a limitation on space occupied

Sphinx max integer value is using sql_attr_bigint which is a 64 bit
signed integer - This is our gate

For the tests I've made, encoding a  12-char string  gives a number
(base 10) nearly on 2^63-1 which is ~9.223372036854776e+18

Why 63?
Well it's a signed field, and I'm assuming the lowest string,
containing 12 0's, is equal to 0.
I'm working only with the positive range and first bit at 1 should
give only negative values.
So I'm working with the 64 - 1 bits.

So,
I made sure I created a column on my bd which could hold that number
(I suggest making a decimal column with 19 as precision and scale 0)
Then created the following code on core extensions:

class String

  # This is meant to be hold on a signed 8 bytes field
  # Max number is 2^63 - 1 = 9.223372036854776e+18 ( base 2 => 10)
  def int8_sort_key
    max_value = 9.223372036854776e+18 - 1

    # Max number of chars that can be safely count for sorting
    # Decreasing might speed up perfomance
    # Increasing might overflow the max value
    max_len_sz = 12

    # Converting special chars
    # -- accented chars are converted
    # -- special chars are converted to "-"
    # Note: Assumming i'm parsing regular strings with only chars and
spaces (no pontuaction neither symbols)
    str = self.parameterize.gsub("-", "").downcase

    # String must have always the same nr of chars
    # This allows to order things like "A B BA" properly
    # So we add "0"s at end if chars missing
    stuffer = "000000000000"
    stuffer = stuffer[0..max_len_sz-1]

    # Filling string if it does not have the required length
    if str.length < max_len_sz
      str = str + stuffer[str.length..stuffer.length]
    end

    # Now finally converting string (number on base 36)
    value = str[0..max_len_sz-1].to_i 36

    # Preventing value overflow
    return value > max_value ? max_value : value
  end
end


Model:
before_save :set_int8_attributes
def set_int8_attributes
  self.int8_name = self.name.int8_sort_key
  ...
end

define_index do
  ...
  # Special string sorting
  has int8_name, :as => :name, :type => :bigint
end

This is an attempt to solve this issue, this might have bugs and I'm
mistaken.
But this is working great on my BD values and sort is accomplished on
core+delta cases.

The only limitation at the moment is only being able to sort 12 chars
max.
Which I presume in a lot of cases should work.


I want to hear your thoughts about this, planning to push this to a
production environment soon.


Best regards,
Pedro


On Mar 24, 1:25 pm, Pedro Cunha <[email protected]> wrote:
> Just to clarify my typo and this seems to be working
>
> Model.search "", :order => fieldname_sort ASC
>
> On Mar 24, 12:23 pm, Pedro Cunha <[email protected]> wrote:
>
>
>
>
>
>
>
> > Back to the topic.
>
> > Using:
>
> > => indexes column_which_has_content_with_accents, :as
> > => :fieldname, :sorting=> true
>
> > +
>
> > => charset_table
>
> > Model.search "", :sort => fieldname_sort ASC
>
> > Properly sortsaccentedwords!
>
> > Best regards :)
>
> > On Mar 23, 3:56 pm, Pedro Cunha <[email protected]> wrote:
>
> > > Thanks for this information.
>
> > > Best regards,
> > > Pedro
>
> > > On Mar 22, 1:32 pm, Pat Allan <[email protected]> wrote:
>
> > > > Hi Pedro
>
> > > > Unfortunately, Sphinx only supportssortingon attributes - and so, the 
> > > > charset table settings don't get applied (they're only for fields). 
> > > > You're going to need a denormalised column (or a function that ends up 
> > > > with the same value) to make this reliable.
>
> > > > Sorry - perhaps Sphinx will be a bit more flexible with this in the 
> > > > future.
>
> > > > --
> > > > Pat
>
> > > > On 21/03/2011, at 9:26 AM, Pedro Cunha wrote:
>
> > > > > Hello,
>
> > > > > I browsed over web to accomplishsortingaccentedattributes and
> > > > > solution is pointing to:
>
> > > > > => has "LOWER(field_name)", :as => :field_name,  :type => :string
>
> > > > > This don't seem to be working.
>
> > > > > Seems like having a particular charset table only works for index
> > > > > fields, meaning, I can search over words withaccentedchars like they
> > > > > are just normal, but same doesn't happen on attributes.
>
> > > > > Is there something wrong on configuration or do I really need to have
> > > > > a normalized column forsorting?
>
> > > > > Best regards,
> > > > > Pedro
>
> > > > > --
> > > > > You received this message because you are subscribed to the Google 
> > > > > Groups "Thinking Sphinx" group.
> > > > > To post to this group, send email to [email protected].
> > > > > To unsubscribe from this group, send email to 
> > > > > [email protected].
> > > > > For more options, visit this group 
> > > > > athttp://groups.google.com/group/thinking-sphinx?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

Reply via email to