I'll try with the one you suggested, thanks for the clarifications!
/Henrik
On Tue, Nov 3, 2009 at 8:38 AM, Alexander Burger a...@software-lab.de wrote:
Hi Henrik,
> I took a look at the pilog file, I already get what same and range are
> doing but what are part, head and fold doing?
You are on the right track. You used 'tolr', but this actually makes
sense only in combination with the '+Sn' (Soundex) prefix. The whole
matter is rather complicated, because there are so many combinations of
index types and Pilog comparison functions possible.
I would say that we have the following typical use cases for string
searches (I'll leave out numerical searches, which usually combine with
'same' or 'range').
1. Exact searches. You have either a unique index
      (rel key (+Key +String))
   or a non-unique index
      (rel key (+Ref +String))
   and you can compare results in Pilog with
      (same @Str @Cls key)
   for exact matches, or with
      (head @Str @Cls key)
   for dictionary searches (searching only for the beginning of
   strings). These are case-sensitive searches.
2. Folded searches. They make use of the 'fold' function, which keeps
   only letters (converted to lower case) and digits.
      (rel key (+Fold +Ref +String))
      ...
      (fold @Str @Cls key)
   This searches only for the beginning of strings. We typically use it
   for telephone numbers.
   If a search for individual words in a key is desired, we can use
      (rel key (+List +Fold +Ref +String))
      ...
      (fold @Str @Cls key)
   This stores only the strings in the list (not the substrings) in
   'fold'ed representation, so each word can be found by dictionary
   search. This requires changes to the GUI and import functions,
   though, as 'key' is not a string but a list of strings.
   Finally, we can also index folded substrings:
      (rel key (+Fold +Idx +String))
      ...
      (part @Str @Cls key)
   This is perhaps what you need. If you go for it, I'd recommend
   downloading the latest testing release again, as the 'part' function
   was changed recently.
3. Tolerant searches. They first return all exact (case-sensitive)
   matches of partial strings, and then the matches according to the
   soundex algorithm: the first letter is compared exactly
   (case-sensitively), and the rest is checked for similarity. This
   mainly makes sense for personal names.
      (rel key (+Sn +Idx +String))
      ...
      (tolr @Str @Cls key)
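To make the difference between the comparison functions more concrete, here is a minimal sketch in Python (not PicoLisp, and not how Pilog is actually implemented). The helper names `fold`, `same`, `head`, `fold_head` and `part` are stand-ins modelled only on the descriptions above. Treating 'part' as a folded substring test is an assumption, and the soundex step of 'tolr' is left out entirely.

```python
def fold(s: str) -> str:
    """Rough model of 'fold': keep letters (lower-cased) and digits only."""
    return "".join(ch.lower() for ch in s if ch.isalnum())


def same(query: str, key: str) -> bool:
    """Exact, case-sensitive match (case 1, 'same')."""
    return query == key


def head(query: str, key: str) -> bool:
    """Case-sensitive match on the beginning of the key (case 1, 'head')."""
    return key.startswith(query)


def fold_head(query: str, key: str) -> bool:
    """Folded match on the beginning of the key (case 2, 'fold')."""
    return fold(key).startswith(fold(query))


def part(query: str, key: str) -> bool:
    """Assumed model of 'part': folded query occurs anywhere in folded key."""
    return fold(query) in fold(key)


if __name__ == "__main__":
    print(fold(" 1A 2-b/3"))                      # letters and digits only
    print(head("pico", "PicoLisp"))               # case-sensitive prefix
    print(fold_head("+49 (0)", "+49 (0)821 123")) # punctuation ignored
    print(part("lisp", "PicoLisp"))               # substring, case-insensitive
```

The telephone-number case shows why folding is useful there: spaces, brackets and the plus sign drop out on both sides before the comparison.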
Concerning space consumption, the '+Key' and '+Ref' indexes are the most
economical ones. They create only a single entry in the index tree per
key.
Then follow the '+List +Ref +String' indexes, which create an entry per
word.
Most space-hungry are the '+Idx' indexes, as they create an entry for
each substring down to a length of three, and '+Sn' adds one more for
the soundex key.
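The relative space cost can be estimated with a small Python sketch. This is only a counting model under stated assumptions, not the PicoLisp implementation: "each substring down to a length of three" is taken to mean the key itself plus every shorter tail of at least three characters, and the function names `ref_entries`, `list_ref_entries`, `idx_entries` and `sn_idx_count` are hypothetical.

```python
def ref_entries(key: str) -> list[str]:
    """'+Key' / '+Ref': a single index entry per key."""
    return [key]


def list_ref_entries(words: list[str]) -> list[str]:
    """'+List +Ref +String': one index entry per word in the list."""
    return list(words)


def idx_entries(key: str, min_len: int = 3) -> list[str]:
    """Assumed model of '+Idx': the key plus each tail of >= min_len chars."""
    tails = [key[i:] for i in range(len(key) - min_len + 1)]
    return tails or [key]  # keys shorter than min_len still get one entry


def sn_idx_count(key: str) -> int:
    """'+Sn +Idx': the '+Idx' entries plus one more for the soundex key."""
    return len(idx_entries(key)) + 1


if __name__ == "__main__":
    key = "Miller"
    print(len(ref_entries(key)), len(idx_entries(key)), sn_idx_count(key))
```

Under this model the number of '+Idx' entries grows with the length of the key, while '+Key' and '+Ref' stay at one entry regardless of length.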
Cheers,
- Alex
--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe