I'll try with the one you suggested, thanks for the clarifications! /Henrik
On Tue, Nov 3, 2009 at 8:38 AM, Alexander Burger <[email protected]> wrote:
> Hi Henrik,
>
>> I took a look at the pilog file, I already get what same and range are
>> doing but what are part, head and fold doing?
>
> You are on the right track. You used 'tolr', but this actually makes
> sense only in combination with the '+Sn' (Soundex) prefix. The whole
> matter is rather complicated, because there are so many combinations of
> index types and Pilog comparison functions possible.
>
>
> I would say that we have the following typical use cases for string
> searches (I'll leave out numerical searches, which usually combine with
> 'same' or 'range').
>
> 1. "Exact" searches. You have either a unique index
>
>       (rel key (+Key +String))
>
>    or a non-unique index
>
>       (rel key (+Ref +String))
>
>    and you can compare results in Pilog with
>
>       (same @Str @Cls key)
>
>    for exact matches, or with
>
>       (head @Str @Cls key)
>
>    for "dictionary" searches (searching only for the beginning of
>    strings). These are case-sensitive searches.
>
>
> 2. "Folded" searches. They make use of the 'fold' function which keeps
>    only letters, converted to lower case, and digits.
>
>       (rel key (+Fold +Ref +String))
>       ...
>       (fold @Str @Cls key)
>
>    This searches only for the beginning of strings. We use it typically
>    for telephone numbers.
>
>
>    If a search for individual words in a key is desired, we can use
>
>       (rel key (+List +Fold +Ref +String))
>       ...
>       (fold @Str @Cls key)
>
>    This stores only the strings in the list (not the substrings) in
>    'fold'ed representation. So each word can be found by "dictionary"
>    search. This requires changes to the GUI and import functions,
>    though, as 'key' is not a string but a list of strings.
>
>
>    Finally, we can also index folded substrings:
>
>       (rel key (+Fold +Idx +String))
>       ...
>       (part @Str @Cls key)
>
>    This is perhaps what you need. If you go for it, I'd recommend you
>    download once more the latest testing release, as the 'part' function
>    was changed recently.
>
>
> 3. "Tolerant" searches. They return first all exact (case-sensitive)
>    matches of partial strings, and then the matches according to the
>    soundex algorithm (the first letter is compared exactly
>    (case-sensitive), the rest checks for similarity). This makes mainly
>    sense for personal names.
>
>       (rel key (+Sn +Idx +String))
>       ...
>       (tolr @Str @Cls key)
>
>
> Concerning space consumption, the '+Key' and '+Ref' indexes are the most
> economical ones. They create only a single entry in the index tree per
> key.
>
> Then follow the '+List +Ref +String' indexes, which create an entry per
> word.
>
> Most space-hungry are the '+Idx' indexes, as they create an entry for
> each substring down to a length of three, and '+Sn' adds one more for
> the soundex key.
>
> Cheers,
> - Alex
> --
> UNSUBSCRIBE: mailto:[email protected]?subject=unsubscribe
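
For anyone reading this thread later: the matching rules Alex describes for 'same', 'head', 'fold' and 'part' can be modelled on plain strings. The sketch below is in Python rather than PicoLisp and is only a model of the semantics as worded in the mail, not the Pilog implementation. All function names are made up for illustration, it does no index lookup at all, and 'tolr' (Soundex) is left out.

```python
def fold(s: str) -> str:
    """Keep only letters (converted to lower case) and digits."""
    return "".join(c.lower() for c in s if c.isalpha() or c.isdigit())


def same_match(query: str, key: str) -> bool:
    """'same': exact, case-sensitive match."""
    return key == query


def head_match(query: str, key: str) -> bool:
    """'head': case-sensitive "dictionary" match on the beginning of the key."""
    return key.startswith(query)


def fold_match(query: str, key: str) -> bool:
    """'fold': match on the beginning of the key after folding both sides."""
    return fold(key).startswith(fold(query))


def fold_words_match(query: str, words: list[str]) -> bool:
    """'fold' on a list-valued key: any single word may match at its beginning."""
    return any(fold_match(query, w) for w in words)


def part_match(query: str, key: str) -> bool:
    """'part': folded substring match anywhere in the key."""
    return fold(query) in fold(key)
```

With a telephone number stored as "0821/1234-5", fold_match("0821 12", ...) succeeds because both sides fold to digit strings sharing a prefix, while a search for "1234" only succeeds with part_match, which is why the '+Idx' variant is the one suggested for substring searches.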
