Re: +Idx problems maybe?

2009-11-03 Thread Henrik Sarvell
I'll try with the one you suggested, thanks for the clarifications!

/Henrik


-- 
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe


Re: +Idx problems maybe?

2009-11-02 Thread Henrik Sarvell
I did the rebuild, but I'm still not getting any results when running
the above query, and I know for a fact that I should be getting at
least two.

Doing the scan gives me a long list; this is the end of it:

(zerosum dirt(nap) - Home . {8o}) {8o}
(zhou {BH}) {BH}
(ziDesigns - Zaigham's Corner // Updates . {9i}) {9i}
(zie {5S}) {5S}
(zin {CL}) {CL}
(zine {1N}) {1N}
(zine {1v}) {1v}
(zine {3Q}) {3Q}
(zine {4z}) {4z}
(zine {;m}) {;m}
(zine {Ai}) {Ai}
(zine {i}) {i}
(zing {73}) {73}
(ziuba {6X}) {6X}
(zny {3m}) {3m}
(zons {D}) {D}
(zor {CB}) {CB}
(zorberry's {:b}) {:b}
(zorblades {4m}) {4m}
(zweilAI.net {1j}) {1j}
(zysz {4D}) {4D}
(|architect's {b}) {b}
(älp {DK}) {DK}
(ông {8T}) {8T}
(ökmotorkonsult {CV}) {CV}
(ören {6Y}) {6Y}
(üro {Av}) {Av}
(λ Tony's Blog λ . {A6}) {A6}
(《财经网》-English . {:q}) {:q}
(》-English {:q}) {:q}
(经网》-English {:q}) {:q}
(网》-English {:q}) {:q}
(财经网》-English {:q}) {:q}

Does that output tell you anything?

On Mon, Nov 2, 2009 at 12:25 PM, Henrik Sarvell hsarv...@gmail.com wrote:
 Thanks, I'll try it out.

 /Henrik


 On Mon, Nov 2, 2009 at 8:13 AM, Alexander Burger a...@software-lab.de wrote:
 Hi Henrik,

 (rel title (+Idx +String)) #
 ...
 The +Idx wasn't there when the feeds in question were first created, I
 added it because I thought it would be necessary or speed things up.

 OK.

 (mapc '((F)(put! F 'title (; F title))) (collect 'fid '+Feed))

 This has no effect, because 'put!' will not modify the object if the
 new value is the current value, and therefore will also not modify or
 create the index as a side effect.
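
 The point about 'put!' can be pictured with a toy model. The sketch
 below is Python, not PicoLisp, and it is not how the database is
 implemented; ToyFeed, put and rebuild are made-up names that only
 mimic the behaviour described above: a store that skips unchanged
 values never reaches the index, while a rebuild walks all objects
 regardless.

```python
class ToyFeed:
    """Toy stand-in for a +Feed object with one indexed property."""

    def __init__(self, title):
        self.title = title


def put(feed, index, value):
    """Store value; touch the index only when the value really changes."""
    if feed.title == value:
        return False              # unchanged: nothing written, index untouched
    index.discard(feed.title)     # drop the old key, if it was indexed
    feed.title = value
    index.add(value)
    return True


def rebuild(feeds, index):
    """Regenerate the index from the stored values, changed or not."""
    index.clear()
    for feed in feeds:
        index.add(feed.title)


feed = ToyFeed("Big Data Weekly")   # created before the index existed
index = set()                       # index added after the fact, still empty

print(put(feed, index, feed.title), index)   # re-putting the same value
rebuild([feed], index)
print(index)
```

 In this model the first print shows that the index is still empty
 after the re-put, which is the situation the 'mapc' loop above ran
 into.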


 The easiest is to use 'rebuild' to (re)build an index:

   (load "lib/too.l")
   (rebuild (collect 'fid '+Feed) 'title '+Feed)


 (mapc show
    (solve
       (quote @Str "big"
          (select (@Feeds)
             ((title +Feed @Str))
             (tolr @Str @Feeds title)))
       @Feeds ))

 This query looks correct.

 You could also try to dump the index directly

   (scan (tree 'title '+Feed))

 Cheers,
 - Alex


Re: +Idx problems maybe?

2009-11-02 Thread Henrik Sarvell
Oops, it seems the substring matching wasn't case-insensitive...

Do I have to take care of that explicitly, or is there some sibling of
'tolr' that will do case-insensitive searches too?

I took a look at the pilog file. I already understand what 'same' and
'range' do, but what do 'part', 'head' and 'fold' do?

/Henrik



Re: +Idx problems maybe?

2009-11-02 Thread Alexander Burger
Hi Henrik,

 I took a look at the pilog file, I already get what same and range are
 doing but what are part, head and fold doing?

You are on the right track. You used 'tolr', but this actually makes
sense only in combination with the '+Sn' (Soundex) prefix. The whole
matter is rather complicated, because there are so many combinations of
index types and Pilog comparison functions possible.


I would say that we have the following typical use cases for string
searches (I'll leave out numerical searches, which usually combine with
'same' or 'range').

1. Exact searches. You have either a unique index

  (rel key (+Key +String))

   or a non-unique index

  (rel key (+Ref +String))

   and you can compare results in Pilog with

  (same @Str @Cls key)

   for exact matches, or with

  (head @Str @Cls key)

   for dictionary searches (searching only for the beginning of
   strings). These are case-sensitive searches.


2. Folded searches. They make use of the 'fold' function, which keeps
   only letters (converted to lower case) and digits.

  (rel key (+Fold +Ref +String))
  ...
  (fold @Str @Cls key)

   This searches only for the beginning of strings. We typically use it
   for telephone numbers.


   If a search for individual words in a key is desired, we can use

  (rel key (+List +Fold +Ref +String))
  ...
  (fold @Str @Cls key)

   This stores only the strings in the list (not the substrings) in
   'fold'ed representation. So each word can be found by dictionary
   search. This requires changes to the GUI and import functions,
   though, as 'key' is not a string but a list of strings.


   Finally, we can also index folded substrings:

  (rel key (+Fold +Idx +String))
  ...
  (part @Str @Cls key)

   This is perhaps what you need. If you go for it, I'd recommend
   downloading the latest testing release again, as the 'part' function
   was changed recently.


3. Tolerant searches. They first return all exact (case-sensitive)
   matches of partial strings, and then the matches according to the
   soundex algorithm (the first letter is compared exactly, i.e.
   case-sensitively; the rest is checked for similarity). This mainly
   makes sense for personal names.

  (rel key (+Sn +Idx +String))
  ...
  (tolr @Str @Cls key)
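
To make the difference between the comparison functions concrete,
here is a rough model in Python (not PicoLisp). It only mirrors the
matching rules listed above for 'same', 'head', 'fold' and 'part';
the function names and the sample title are made up, the index itself
is left out, and the soundex part of 'tolr' is not modelled at all.

```python
def fold(s):
    """Model of 'fold': keep letters and digits only, letters lower-cased."""
    return "".join(ch.lower() for ch in s if ch.isalnum())


def same(query, key):
    """Exact, case-sensitive match."""
    return key == query


def head(query, key):
    """Case-sensitive match on the beginning of the string."""
    return key.startswith(query)


def fold_head(query, key):
    """Folded match on the beginning of the string."""
    return fold(key).startswith(fold(query))


def part(query, key):
    """Folded match anywhere inside the string."""
    return fold(query) in fold(key)


title = "Big Data Weekly"
for query in ("Big", "big", "data"):
    print(query, same(query, title), head(query, title),
          fold_head(query, title), part(query, title))
```

In this model only 'part' finds "data" inside the title, and only the
folded variants find the lower-case "big", which matches the use
cases above.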


Concerning space consumption, the '+Key' and '+Ref' indexes are the most
economical ones. They create only a single entry in the index tree per
key.

Then follow the '+List +Ref +String' indexes, which create an entry per
word.

Most space-hungry are the '+Idx' indexes, as they create an entry for
each substring down to a length of three, and '+Sn' adds one more for
the soundex key.
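
As a back-of-the-envelope illustration of these space costs, the
following Python sketch counts the index entries one title would
produce under each scheme. It is an approximation, not the database
code: it assumes that '+Idx' stores the whole value plus every tail
of each space-separated word down to three characters (which is what
the 'scan' listing earlier in this thread suggests), that a '+List'
key holds the title's words, and that '+Sn' adds exactly one entry.
All function names are hypothetical.

```python
def ref_keys(value):
    """+Key / +Ref: a single entry per key."""
    return [value]


def list_ref_keys(value):
    """+List +Ref: one entry per word (assuming the list holds the words)."""
    return value.split()


def idx_keys(value, min_len=3):
    """+Idx (assumed): whole value plus each word tail of min_len or more."""
    keys = [value]
    for word in value.split():
        for start in range(len(word) - min_len + 1):
            keys.append(word[start:])
    return keys


def entry_counts(value):
    """Number of index entries per scheme for one stored value."""
    idx = len(idx_keys(value))
    return {
        "+Ref": len(ref_keys(value)),
        "+List +Ref": len(list_ref_keys(value)),
        "+Idx": idx,
        "+Sn +Idx": idx + 1,      # one more entry for the soundex key
    }


print(idx_keys("Big Data Weekly"))
print(entry_counts("Big Data Weekly"))
```

The ordering of the counts, rather than the exact numbers, is the
point: one entry, then one per word, then several per word.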

Cheers,
- Alex


+Idx problems maybe?

2009-11-01 Thread Henrik Sarvell
I want to be able to search for feeds by substring in their titles.
The E/R looks like this at the moment:

(class +Feed +Entity) #
(rel fid       (+Key +Number)) #
(rel title     (+Idx +String)) #
(rel xmlUrl    (+Key +String)) #
(rel htmlUrl   (+Key +String)) #
(rel lastFetch (+Number))

The +Idx wasn't there when the feeds in question were first created; I
added it later because I thought it would be necessary or would speed
things up.

Anyway, I ran the following in the hope of generating the index
after the fact:

(mapc '((F)(put! F 'title (; F title))) (collect 'fid '+Feed))

And then my test query looks like this:

(mapc show
   (solve
      (quote @Str "big"
         (select (@Feeds)
            ((title +Feed @Str))
            (tolr @Str @Feeds title)))
      @Feeds ))

That didn't return anything, so I tested this instead:

(mapc show
   (solve
      (quote @Str "big"
         (select (@Feeds)
            ((fid +Feed T))
            (tolr @Str @Feeds title)))
      @Feeds ))

That returns all feeds whose title starts with a "b". What did I do wrong?

/Henrik