Re: +Idx problems maybe?

Henrik Sarvell Tue, 03 Nov 2009 18:17:06 -0800

I tested the

 (rel key (+Fold +Ref +String))
     ...
     (fold @Str @Cls key)


version and rebuilt the index, but I still can't get the search to work
in a case-insensitive fashion. Did I miss something?
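
As a quick sanity check, the folding can be tried directly at the REPL,
independent of the database. This is only a sketch: it assumes 'fold'
behaves as described in the PicoLisp reference (only letters, converted
to lower case, and digits are kept), and the "Hello, World" string is a
made-up example:

```picolisp
# 'fold' keeps only the letters (converted to lower case) and the digits
(fold " 1A 2-b/3")    # -> "1a2b3"

# Made-up example: a key and a search pattern that differ only in case,
# blanks or punctuation should fold to the same form
(fold "Hello, World")
(fold "helloworld")
```

If the last two calls print the same string, folding itself works, and
the problem is more likely in the index or in the query.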


On Tue, Nov 3, 2009 at 11:02 AM, Henrik Sarvell <[email protected]> wrote:
> I'll try with the one you suggested, thanks for the clarifications!
>
> /Henrik
>
> On Tue, Nov 3, 2009 at 8:38 AM, Alexander Burger <[email protected]> wrote:
>> Hi Henrik,
>>
>>> I took a look at the pilog file, I already get what same and range are
>>> doing but what are part, head and fold doing?
>>
>> You are on the right track. You used 'tolr', but this actually makes
>> sense only in combination with the '+Sn' (Soundex) prefix. The whole
>> matter is rather complicated, because there are so many combinations of
>> index types and Pilog comparison functions possible.
>>
>>
>> I would say that we have the following typical use cases for string
>> searches (I'll leave out numerical searches, which usually combine with
>> 'same' or 'range').
>>
>> 1. "Exact" searches. You have either a unique index
>>
>>      (rel key (+Key +String))
>>
>>   or a non-unique index
>>
>>      (rel key (+Ref +String))
>>
>>   and you can compare results in Pilog with
>>
>>      (same @Str @Cls key)
>>
>>   for exact matches, or with
>>
>>      (head @Str @Cls key)
>>
>>   for "dictionary" searches (searching only for the beginning of
>>   strings). These are case-sensitive searches.
>>
>>
>> 2. "Folded" searches. They make use of the 'fold' function which keeps
>>   only letters, converted to lower case, and digits.
>>
>>      (rel key (+Fold +Ref +String))
>>      ...
>>      (fold @Str @Cls key)
>>
>>   This searches only for the beginning of strings. We use it typically
>>   for telephone numbers.
>>
>>
>>   If a search for individual words in a key is desired, we can use
>>
>>      (rel key (+List +Fold +Ref +String))
>>      ...
>>      (fold @Str @Cls key)
>>
>>   This stores only the strings in the list (not the substrings) in
>>   'fold'ed representation. So each word can be found by "dictionary"
>>   search. This requires changes to the GUI and import functions,
>>   though, as 'key' is not a string but a list of strings.
>>
>>
>>   Finally, we can also index folded substrings:
>>
>>      (rel key (+Fold +Idx +String))
>>      ...
>>      (part @Str @Cls key)
>>
>>   This is perhaps what you need. If you go for it, I'd recommend you
>>   download once more the latest testing release, as the 'part' function
>>   was changed recently.
>>
>>
>> 3. "Tolerant" searches. They first return all exact (case-sensitive)
>>   matches of partial strings, and then the matches according to the
>>   soundex algorithm (the first letter is compared exactly
>>   (case-sensitive), the rest is checked for similarity). This makes
>>   sense mainly for personal names.
>>
>>      (rel key (+Sn +Idx +String))
>>      ...
>>      (tolr @Str @Cls key)
>>
>>
>> Concerning space consumption, the '+Key' and '+Ref' indexes are the most
>> economical ones. They create only a single entry in the index tree per
>> key.
>>
>> Then follow the '+List +Ref +String' indexes, which create an entry per
>> word.
>>
>> Most space-hungry are the '+Idx' indexes, as they create an entry for
>> each substring down to a length of three, and '+Sn' adds one more for
>> the soundex key.
>>
>> Cheers,
>> - Alex
>> --
>> UNSUBSCRIBE: mailto:[email protected]?subject=unsubscribe
>>
>
-- 
UNSUBSCRIBE: mailto:[email protected]?subject=unsubscribe
