
Re: [racket-dev] Potential search improvement

2012-06-14 Thread Eli Barzilay

About two weeks ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 11:53 AM, Eli Barzilay e...@barzilay.org wrote:
  Three hours ago, Sam Tobin-Hochstadt wrote:
  Getting away from the discussion on sorting speed, I don't think
  my suggestion even requires sorting: just add a 1.5 for
  match-all-subword-parts-to-whole-id.
 
  That won't work, since current-line-sep will have the
  all-subword match for both entries.  The first one is whatever
  comes first in the alphabetically sorted index.  You can see the
  same problem with a search for current sep line.
 
 No, what I mean is that you should separate based on whether the
 subwords in the search string cover the entire identifier.  So
 current sep line would rank current-line-sep ahead of
 current-alist-line-sep because alist isn't matched.

This is finally done.  It wasn't nearly as straightforward as you
think: the problem is that you need to compare two bags of strings,
which I did by a double loop over the input patterns sorted by length.
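
A minimal sketch of the whole-identifier check described above, not the committed code. The helper names `subwords` and `coversWholeId` are made up for illustration, and matching a pattern against the start of a part is an assumption. The patterns are tried longest first, and each one consumes one part of the identifier:

```javascript
// Hypothetical helper: split "current-line-sep" into ["current", "line", "sep"].
function subwords(id) {
  return id.toLowerCase().split(/[^a-z0-9]+/).filter(function (s) {
    return s.length > 0;
  });
}

// True when every pattern matches a distinct part and no part is left over.
// Greedy double loop: outer loop over patterns (longest first), inner loop
// over the parts that have not been consumed yet.
function coversWholeId(patterns, id) {
  var parts = subwords(id);
  var sorted = patterns.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  for (var i = 0; i < sorted.length; i++) {
    var found = -1;
    for (var j = 0; j < parts.length; j++) {
      if (parts[j].indexOf(sorted[i].toLowerCase()) === 0) { found = j; break; }
    }
    if (found < 0) return false;
    parts.splice(found, 1); // this part is now used up
  }
  return parts.length === 0;
}
```

With the patterns current, sep and line, current-line-sep is fully covered while current-alist-line-sep is not, because alist is left over.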

I think that this concludes the current work on the search, and will
commit soon.  More testing of the current version is welcome -- you
can use it at http://pre.racket-lang.org/docs/html/search/

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

_
  Racket Developers list:
  http://lists.racket-lang.org/dev

Re: [racket-dev] Potential search improvement

2012-06-14 Thread Eli Barzilay

About two weeks ago, Jens Axel Søgaard wrote:
 
 A quick note: I searched for Scribble and didn't get the main
 manual. The reason is that it is named:
 
  Scribble: The Racket Documentation Tool  in scribble
 
 This case could be solved by renaming it:
  Scribble - The Racket Documentation Tool  in scribble
 
 Or by stripping punctuation in manual titles.

I think that it's best to do the stripping for the index when
generating it (right now, the index string for that is scribble: the
racket documentation tool).  But I'm not sure about it, since you
might also look for scribble: and expect it to appear.  So perhaps
it's better to just have some flag saying which index entries are
identifiers.
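
A rough sketch of the flag idea, assuming a hypothetical entry shape `{ text, isIdentifier }` (the real index format is not shown in this thread). Identifiers keep their punctuation; other entries, such as manual titles, get it stripped. The regexp only handles ASCII text:

```javascript
// Hypothetical: build the search key for one index entry.
function indexKey(entry) {
  var text = entry.text.toLowerCase();
  if (entry.isIdentifier) return text; // keep things like "set!" intact
  return text
    .replace(/[^a-z0-9\s]+/g, " ") // drop punctuation in titles
    .replace(/\s+/g, " ")
    .trim();
}
```

Under these assumptions the Scribble manual title would be indexed as scribble the racket documentation tool, so a search for scribble matches its first word exactly.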


 I am undecided on the following:
 Try a search for list. The results are a bunch of places where list
 is exported from.
 
 It would be nice to see these more prominently displayed:
 
List Filtering  in reference
List Iteration  in reference
List Iteration from Scratch  in guide
List Operations  in reference
 
 It won't take long, before 20 modules export list, and the
 reference and guide results disappear from the front page.

I don't see a good way around this one.  It's the same problem of not
having an ordering for the displayed results, because even if there's
a different placement for occurrences in titles there would still be a
problem with less important titles that have list in them (eg, List
of incompatibilities), and common title words would interfere with
bindings that use these words.

(Another option is some additional search operator, like something
that requires that the result is in a section title, but I think that
almost nobody is using these operators anyway.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] Potential search improvement

2012-06-14 Thread Rodolfo Carvalho

On Tue, May 29, 2012 at 11:20 AM, Eli Barzilay e...@barzilay.org wrote:

 Just now, Justin Zamora wrote:
  The search still doesn't find words in function descriptions.

 [It's not a full-text search, and as long as it's required to run on
 client machines (needed to run on your local copy), it's unlikely to
 become a full-text search.]



It could be an SQLite-backed Full Text Search, couldn't it?
(just it would require possibly unwanted changes to the whole
architecture...)

Re: [racket-dev] Potential search improvement

2012-06-14 Thread Rodolfo Carvalho

On Thu, Jun 14, 2012 at 10:56 PM, Rodolfo Carvalho rhcarva...@gmail.com wrote:

 On Tue, May 29, 2012 at 11:20 AM, Eli Barzilay e...@barzilay.org wrote:

 Just now, Justin Zamora wrote:
  The search still doesn't find words in function descriptions.

 [It's not a full-text search, and as long as it's required to run on
 client machines (needed to run on your local copy), it's unlikely to
 become a full-text search.]



 It could be an SQLite-backed Full Text Search, couldn't it?
 (just it would require possibly unwanted changes to the whole
 architecture...)



I had in mind that it is possible to use WebSQL or IndexedDB (on browsers
that support them), or even sql.js:

https://github.com/kripken/sql.js
Demo: http://syntensity.com/static/sql.html

The demo consumes 23 MB on Chrome while the new search page consumes around
40 MB. Of course these numbers are not to be compared directly and I don't
mean to make any comparison.
I just looked at it and cite it as a clue that it may be viable
performance-wise (i.e. sql.js itself apparently doesn't take hundreds of
megabytes of RAM).


[]'s

Rodolfo Carvalho

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:

 ** More about the change (especially if you want to try to improve
   things):

 This is not real ranking, but it should give better results overall.
 The thing is that the search assigns a small integer score for each
 term, where the scores are (roughly)

  0 no match,
  1 match-all-subword-parts,
  2 contains a match,
  3 matches a prefix,
  4 exact match.

I think you probably want to rank/divide '1' here based on how much of
the identifier is matched by the search.  For example, if you search
for 'current-sep-line', you probably want 'current-line-sep' first,
but currently you get 'current-alist-line-sep' first.
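
To make the five levels concrete, here is a sketch of a per-term score. The function name `scoreTerm` and the exact meaning of each level are assumptions; in particular, level 1 is read here as "every hyphen-separated part of the term occurs somewhere in the key":

```javascript
// Hypothetical scoring of one search term against one index key.
function scoreTerm(term, key) {
  if (key === term) return 4;              // exact match
  if (key.indexOf(term) === 0) return 3;   // matches a prefix
  if (key.indexOf(term) >= 0) return 2;    // contains a match
  var parts = term.split(/[^a-z0-9]+/i).filter(function (p) {
    return p.length > 0;
  });
  var allPartsFound = parts.every(function (p) {
    return key.indexOf(p) >= 0;
  });
  if (parts.length > 1 && allPartsFound) return 1; // all subword parts match
  return 0;                                // no match
}
```

Under this reading, current-sep-line scores 1 against both current-line-sep and current-alist-line-sep, which is why the two entries cannot be told apart by the score alone.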
-- 
sam th
sa...@ccs.neu.edu


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

Just now, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
 
  ** More about the change (especially if you want to try to improve
    things):
 
  This is not real ranking, but it should give better results overall.
  The thing is that the search assigns a small integer score for each
  term, where the scores are (roughly)
 
   0 no match,
   1 match-all-subword-parts,
   2 contains a match,
   3 matches a prefix,
   4 exact match.
 
 I think you probably want to rank/divide '1' here based on how much of
 the identifier is matched by the search.  For example, if you search
 for 'current-sep-line', you probably want 'current-line-sep' first,
 but currently you get 'current-alist-line-sep' first.

Like I said: [...] but that would require to actually sort the
results.

(The thing is that now it does something like

  matches[score].push(entry)

and then it concatenates all of the matches arrays.  To allow arbitrary
(non-integer) scores, it would need to put everything in one array and then sort
it.  That can currently get to ~20k things to sort and adjust for
additional entries that get added on each release, planet packages,
etc.)
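
The bucket approach can be sketched like this. Only `matches[score].push(entry)` comes from the message above; `collectByBuckets`, `scoreOf` and `maxScore` are names invented for the sketch:

```javascript
// One array per integer score; entries keep their index order in each bucket.
function collectByBuckets(entries, scoreOf, maxScore) {
  var matches = [];
  for (var s = 0; s <= maxScore; s++) matches.push([]);
  for (var i = 0; i < entries.length; i++) {
    var score = scoreOf(entries[i]);
    if (score > 0) matches[score].push(entries[i]); // score 0 = no match
  }
  // Concatenate from the best bucket down; no sort call is needed.
  var result = [];
  for (var k = maxScore; k >= 1; k--) result = result.concat(matches[k]);
  return result;
}
```

This is linear in the number of entries, but it only works while the scores are small integers that can serve as array indices.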

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Neil Van Dyke


Eli Barzilay wrote at 05/29/2012 07:17 AM:

I have made a possibly useful improvement to the JS search code.
It's not pushed, yet, but I dropped the revised JS code on the
pre-built pages so you can try it out here:

   http://pre.racket-lang.org/docs/html/search/
   

[...]

Eli, looks like a noticeable improvement over 5.2.1 search to me so 
far.  Thank you for working on this.


Here's a small quirk on pre: searching for scribble doesn't get the 
Scribble manual as the first hit, but the incremental search as you're 
typing gets you the Scribble manual as the first hit for scri through 
scribbl.


Neil V.


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

A few minutes ago, Neil Van Dyke wrote:
 Eli Barzilay wrote at 05/29/2012 07:17 AM:
  I have made a possibly useful improvement to the JS search code.
  It's not pushed, yet, but I dropped the revised JS code on the
  pre-built pages so you can try it out here:
 
 http://pre.racket-lang.org/docs/html/search/
 
 [...]
 
 Eli, looks like a noticeable improvement over 5.2.1 search to me so 
 far.  Thank you for working on this.
 
 Here's a small quirk on pre: searching for scribble doesn't get
 the Scribble manual as the first hit, but the incremental search as
 you're typing gets you the Scribble manual as the first hit for
 scri through scribbl.

That looks like a bug...

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
 That can currently get to ~20k things to sort and adjust for
 additional entries that get added on each release, planet packages,
 etc.

Have you measured how long this takes? On my machine, the `sort()`
method on an array of 25000 strings takes 11ms in Firefox.
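
A measurement of this kind can be reproduced with something like the sketch below. The function name `timeSort` and the generated strings are made up, and the timings will differ by machine and browser:

```javascript
// Build n pseudo-shuffled strings, sort them, and report the elapsed time.
function timeSort(n) {
  var items = [];
  for (var i = 0; i < n; i++) items.push("entry-" + ((i * 7919) % n));
  var start = Date.now();
  items.sort(); // default string comparison
  return { ms: Date.now() - start, items: items };
}
```

Calling `timeSort(25000).ms` in a browser console gives a rough number for that browser.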
-- 
sam th
sa...@ccs.neu.edu

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

20 minutes ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
  That can currently get to ~20k things to sort and adjust for
  additional entries that get added on each release, planet
  packages, etc.
 
 Have you measured how long this takes? On my machine, the `sort()`
 method on an array of 25000 strings takes 11ms in Firefox.

I didn't, but my worry is about older machines (and things like IE).
This wouldn't be an issue if I could abort the sort when there's new
user input -- but JS being what it is, once it starts sorting I can't
stop it until it's done, which means that new input characters need to
wait for the sort.

[Another option that would help is if there's a reliable (and
user-invisible) way to find out how fast things run and adjust the
delay before firing a new sort on slower machines.]

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 8:16 AM, Eli Barzilay e...@barzilay.org wrote:
 20 minutes ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
  That can currently get to ~20k things to sort and adjust for
  additional entries that get added on each release, planet
  packages, etc.

 Have you measured how long this takes? On my machine, the `sort()`
 method on an array of 25000 strings takes 11ms in Firefox.

 I didn't, but my worry is about older machines (and things like IE).

I think that (a) this isn't going to be a big deal for any systems,
especially if you filter out the 0 scores first, and (b) that we
should be optimizing for people who are or might become Racket
developers, who will overwhelmingly have modern systems and browsers
(including IE 9, which I bet is very fast on this).

 This wouldn't be an issue if I could abort the sort when there's new
 user input -- but JS being what it is, once it starts sorting I can't
 stop it until it's done, which means that new input characters need to
 wait for the sort.

To stop the sort in the middle, use a custom comparison function, a
bit of state, and an exception.

 [Another option that would help is if there's a reliable (and
 user-invisible) way to find out how fast things run and adjust the
 delay before firing a new sort on slower machines.]

There are a couple options here -- check for particular browsers
(using `navigator.userAgent`), or run some test like sorting a bunch
of numbers.
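
A sketch of the comparison-function idea, with invented names (`SortAborted`, `abortableSort`, `shouldAbort`). Since the sort runs to completion within one turn of the event loop, the abort condition has to be something the comparison itself can observe, for instance a time budget checked against a clock:

```javascript
// Marker exception used only to unwind out of Array.prototype.sort.
function SortAborted() {}

// Returns a sorted copy, or null if shouldAbort() became true mid-sort.
function abortableSort(items, compare, shouldAbort) {
  try {
    return items.slice().sort(function (a, b) {
      if (shouldAbort()) throw new SortAborted();
      return compare(a, b);
    });
  } catch (e) {
    if (e instanceof SortAborted) return null;
    throw e; // unrelated errors still propagate
  }
}
```

A caller could pass `function () { return Date.now() > deadline; }` as `shouldAbort`, where `deadline` is a hypothetical time limit chosen by the caller.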
-- 
sam th
sa...@ccs.neu.edu

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Jens Axel Søgaard

2012/5/29 Eli Barzilay e...@barzilay.org:
 I have made a possibly useful improvement to the JS search code.
 It's not pushed, yet, but I dropped the revised JS code on the
 pre-built pages so you can try it out here:

  http://pre.racket-lang.org/docs/html/search/

 and compare searches with the usual page:

  http://docs.racket-lang.org/search/

 I'd appreciate people playing with it to find about potential problems
 with the ordering and possibly with different browsers.

I like it.

A quick note: I searched for Scribble and didn't get
the main manual. The reason is that it is named:

 Scribble: The Racket Documentation Tool  in scribble

This case could be solved by renaming it:
 Scribble - The Racket Documentation Tool  in scribble

Or by stripping punctuation in manual titles. Stripping other
things than manual titles would be a bad idea (in case
actual identifiers were involved), but it seems that
the manual titles don't contain any:

http://pre.racket-lang.org/docs/html/index.html

 The thing is that they used to be lumped to 2 groups with exact
 matches first.  Now I made each of these be in its own group, so
 there's a little more order.  To see an example that works nicely now
 try splay.

Sweet!

I am undecided on the following:
Try a search for list. The results are a bunch of places where list
is exported from.

It would be nice to see these more prominently displayed:

   List Filtering  in reference
   List Iteration  in reference
   List Iteration from Scratch  in guide
   List Operations  in reference

It won't take long, before 20 modules export list, and the
reference and guide results disappear from the front page.

Hmm. How about displaying a yellow box at the top
of the results saying x hits from guide
and y hits from reference, click here to see them.
This way the guide and reference hits are in your face
for beginners.

The actual results for list:

list  provided from racket/base, racket
list  provided from r5rs
list  provided from rnrs/base-6
list  provided from lang/htdp-advanced
list  provided from lang/htdp-beginner
list  provided from lang/htdp-beginner-abbr
list  provided from lang/htdp-intermediate
list  provided from lang/htdp-intermediate-lambda
list  provided from deinprogramm/DMdA-advanced
list  provided from deinprogramm/DMdA-assignments
list  provided from deinprogramm/DMdA-vanilla
list  provided from lazy
list  provided from srfi/1
List  provided from typed/racket/base, typed/racket
list box  in gui
List Filtering  in reference
List Iteration  in reference
List Iteration from Scratch  in guide
List Operations  in reference
list patterns  in syntax

/Jens Axel


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
 Just now, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
 
  ** More about the change (especially if you want to try to improve
    things):
 
  This is not real ranking, but it should give better results overall.
  The thing is that the search assigns a small integer score for each
  term, where the scores are (roughly)
 
   0 no match,
   1 match-all-subword-parts,
   2 contains a match,
   3 matches a prefix,
   4 exact match.

 I think you probably want to rank/divide '1' here based on how much of
 the identifier is matched by the search.  For example, if you search
 for 'current-sep-line', you probably want 'current-line-sep' first,
 but currently you get 'current-alist-line-sep' first.

 Like I said: [...] but that would require to actually sort the
 results.

 (The thing is that now it does something like

  matches[score].push(entry)

 and then it concatenates all of the matches arrays.  To have random
 numbers, it would need to put everything in one array and then sort
 it.  That can currently get to ~20k things to sort and adjust for
 additional entries that get added on each release, planet packages,
 etc.)

Getting away from the discussion on sorting speed, I don't think my
suggestion even requires sorting: just add a 1.5 for
match-all-subword-parts-to-whole-id.

-- 
sam th
sa...@ccs.neu.edu


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

An hour and a half ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 8:16 AM, Eli Barzilay e...@barzilay.org wrote:
  20 minutes ago, Sam Tobin-Hochstadt wrote:
  On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
   That can currently get to ~20k things to sort and adjust for
   additional entries that get added on each release, planet
   packages, etc.
 
  Have you measured how long this takes? On my machine, the
  `sort()` method on an array of 25000 strings takes 11ms in
  Firefox.
 
  I didn't, but my worry is about older machines (and things like
  IE).
 
 I think that (a) this isn't going to be a big deal for any systems,
 especially if you filter out the 0 scores first, and (b) that we
 should be optimizing for people who are or might become Racket
 developers, who will overwhelmingly have modern systems and browsers
 (including IE 9, which I bet is very fast on this).

The sorting happens on each and every update, which can happen after
every key -- and there are still schools that have old browsers with
slow machines.  (We've been through this discussion before, BTW.)


  This wouldn't be an issue if I could abort the sort when there's new
  user input -- but JS being what it is, once it starts sorting I can't
  stop it until it's done, which means that new input characters need to
  wait for the sort.
 
 To stop the sort in the middle, use a custom comparison function, a
 bit of state, and an exception.

This might work.


  [Another option that would help is if there's a reliable (and
  user-invisible) way to find out how fast things run and adjust the
  delay before firing a new sort on slower machines.]
 
 There are a couple options here -- check for particular browsers
 (using `navigator.userAgent`), or run some test like sorting a bunch
 of numbers.

(The user agent is useless; a test requires running for some
measurable time which might interfere with typing and there's also the
non-trivial job of finding some good way to infer a good delay for the
resulting timer.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Justin Zamora

The search still doesn't find words in function descriptions. For
example, http://pre.racket-lang.org/docs/html/search/index.html?q=sine
returns no results. This is especially frustrating since the very
first exercise in HTDP 1e is to use the search to find out whether
DrRacket has a sine function.

Justin

On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
I have made a possibly useful improvement to the JS search code.
It's not pushed, yet, but I dropped the revised JS code on the
pre-built pages so you can try it out here:

http://pre.racket-lang.org/docs/html/search/

and compare searches with the usual page:

http://docs.racket-lang.org/search/

I'd appreciate people playing with it to find about potential problems
with the ordering and possibly with different browsers.

** More about the change (especially if you want to try to improve
things):

This is not real ranking, but it should give better results overall.
The thing is that the search assigns a small integer score for each
term, where the scores are (roughly)

0 no match,
1 match-all-subword-parts,
2 contains a match,
3 matches a prefix,
4 exact match.

The thing is that they used to be lumped to 2 groups with exact
matches first. Now I made each of these be in its own group, so
there's a little more order. To see an example that works nicely now
try splay.

This doesn't solve all problems... To see problematic things (that
Neil has complained about in the past) try:

* port (gives precedence for exact matches, but the reference
entries are better; better now with the chapters appearing right
after the exact binding matches).

* fold (same problem, where it could be argued that for most
people foldl from `racket/base' is better than fold from the
DMdA languages and `srfi/1').

Some of the problem comes from having no preferences for the results.
Such preferences are not hard to implement, but they connect two
unrelated pieces of code (the score assignments in the JS search, and
the bonus for each manual) and it can quickly get into sticky
questions.

Another aspect of the problem is that there's N search terms, not just
one. Currently, the score for each is combined with a `min'; a `max'
tends to be worse. Ideally, it would use an average, but that would
require to actually sort the results.
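
A small sketch of the difference between the two ways of combining per-term scores (the function names are invented). With `min` the combined score is still one of the integers 0 to 4, so it can index a bucket; an average generally is not an integer, which is where the need for sorting comes from:

```javascript
// Combine per-term scores with min: one unmatched term (0) rejects the entry.
function combinedScore(termScores) {
  if (termScores.length === 0) return 0;
  return Math.min.apply(null, termScores);
}

// The average is usually fractional, so it cannot be used as a bucket index.
function averageScore(termScores) {
  if (termScores.length === 0) return 0;
  var sum = 0;
  for (var i = 0; i < termScores.length; i++) sum += termScores[i];
  return sum / termScores.length;
}
```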

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

Just now, Justin Zamora wrote:
 The search still doesn't find words in function descriptions.

[It's not a full-text search, and as long as it's required to run on
client machines (needed to run on your local copy), it's unlikely to
become a full-text search.]

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Matthias Felleisen


On May 29, 2012, at 10:18 AM, Justin Zamora wrote:

 This is  especially frustrating since the very first exercise in HTDP 1e is 
 to use the search to find out whether DrRacket has a sine function.

(It is okay for students to guess that sometimes they may have to search for a 
slightly different word, sin here.) 

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

20 minutes ago, Eli Barzilay wrote:
 An hour and a half ago, Sam Tobin-Hochstadt wrote:
  
  To stop the sort in the middle, use a custom comparison function,
  a bit of state, and an exception.
 
 This might work.

I was confused.  It does work, but it's not enough to be able to throw
an exception -- I also need some form of a yield() call to check if it
should be interrupted...  Is there something like that?

(The search code started as a simple thing that I CPSed so it can be
killed when there's new user input -- if there's a way to do the above
then that code can be simplified too.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Matthias Felleisen


On May 29, 2012, at 10:25 AM, Eli Barzilay wrote:

 20 minutes ago, Eli Barzilay wrote:
 An hour and a half ago, Sam Tobin-Hochstadt wrote:
 
 To stop the sort in the middle, use a custom comparison function,
 a bit of state, and an exception.
 
 This might work.
 
 I was confused.  It does work, but it's not enough to be able to throw
 an exception -- I also need some form of a yield() call to check if it
 should be interrupted...  Is there something like that?
 
 (The search code started as a simple thing that I CPSed so it can be
 killed when there's new user input -- if there's a way to do the above
 then that code can be simplified too.)

cps? Oh what a case study in expressiveness :-)

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 10:25 AM, Eli Barzilay e...@barzilay.org wrote:
 20 minutes ago, Eli Barzilay wrote:
 An hour and a half ago, Sam Tobin-Hochstadt wrote:
 
  To stop the sort in the middle, use a custom comparison function,
  a bit of state, and an exception.

 This might work.

 I was confused.  It does work, but it's not enough to be able to throw
 an exception -- I also need some form of a yield() call to check if it
 should be interrupted...  Is there something like that?

No, not cross-browser.  You'll need to manually do the CPS and defer
the continuation to the next turn in the event loop.  Writing a merge
sort in this style might work.

`yield` is coming in the next version of JS, and is already in
Firefox, but that doesn't help now.
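
A sketch of what such a merge sort could look like, with invented names (`sortInTurns`, `isCancelled`, `done`, `defer`). It is a bottom-up merge sort where each pass over the runs is one step, and the rest of the work is handed to `setTimeout` so that input events can be processed in between. One whole pass per turn is a coarse granularity chosen to keep the sketch short:

```javascript
// Merge two sorted arrays into a new sorted array.
function merge(left, right, compare) {
  var out = [], i = 0, j = 0;
  while (i < left.length && j < right.length) {
    out.push(compare(left[i], right[j]) <= 0 ? left[i++] : right[j++]);
  }
  while (i < left.length) out.push(left[i++]);
  while (j < right.length) out.push(right[j++]);
  return out;
}

// Sort in several event-loop turns; calls done(sorted) unless cancelled.
function sortInTurns(items, compare, isCancelled, done, defer) {
  defer = defer || function (k) { setTimeout(k, 0); };
  var runs = items.map(function (x) { return [x]; });
  function pass() {
    if (isCancelled()) return; // new input arrived: drop the continuation
    if (runs.length <= 1) { done(runs[0] || []); return; }
    var next = [];
    for (var i = 0; i < runs.length; i += 2) {
      next.push(i + 1 < runs.length
        ? merge(runs[i], runs[i + 1], compare)
        : runs[i]);
    }
    runs = next;
    defer(pass); // continue on a later turn of the event loop
  }
  pass();
}
```

The `defer` parameter exists only so the scheduling can be swapped out, for example when trying the code outside a browser.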
-- 
sam th
sa...@ccs.neu.edu


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

Four hours ago, Eli Barzilay wrote:
 A few minutes ago, Neil Van Dyke wrote:
  Eli Barzilay wrote at 05/29/2012 07:17 AM:
   I have made a possibly useful improvement to the JS search code.
   It's not pushed, yet, but I dropped the revised JS code on the
   pre-built pages so you can try it out here:
  
  http://pre.racket-lang.org/docs/html/search/
  
  [...]
  
  Eli, looks like a noticeable improvement over 5.2.1 search to me so 
  far.  Thank you for working on this.
  
  Here's a small quirk on pre: searching for scribble doesn't get
  the Scribble manual as the first hit, but the incremental search as
  you're typing gets you the Scribble manual as the first hit for
  scri through scribbl.
 
 That looks like a bug...

...which was apparently there before too.  Fixed now.

(But that entry is not really helpful.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

Re: [racket-dev] Potential search improvement

2012-05-29 Thread Eli Barzilay

Three hours ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
  Just now, Sam Tobin-Hochstadt wrote:
  I think you probably want to rank/divide '1' here based on how
  much of the identifier is matched by the search.  For example, if
  you search for 'current-sep-line', you probably want
  'current-line-sep' first, but currently you get
  'current-alist-line-sep' first.
 [...]
 
 Getting away from the discussion on sorting speed, I don't think my
 suggestion even requires sorting: just add a 1.5 for
 match-all-subword-parts-to-whole-id.

That won't work, since current-line-sep will have the all-subword
match for both entries.  The first one is whatever comes first in the
alphabetically sorted index.  You can see the same problem with a
search for current sep line.

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Sam Tobin-Hochstadt

On Tue, May 29, 2012 at 11:53 AM, Eli Barzilay e...@barzilay.org wrote:
 Three hours ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
  Just now, Sam Tobin-Hochstadt wrote:
  I think you probably want to rank/divide '1' here based on how
  much of the identifier is matched by the search.  For example, if
  you search for 'current-sep-line', you probably want
  'current-line-sep' first, but currently you get
  'current-alist-line-sep' first.
 [...]

 Getting away from the discussion on sorting speed, I don't think my
 suggestion even requires sorting: just add a 1.5 for
 match-all-subword-parts-to-whole-id.

 That won't work, since current-line-sep will have the all-subword
 match for both entries.  The first one is whatever comes first in the
 alphabetically sorted index.  You can see the same problem with a
 search for current sep line.

No, what I mean is that you should separate based on whether the
subwords in the search string cover the entire identifier.  So
current sep line would rank current-line-sep ahead of
current-alist-line-sep because alist isn't matched.
-- 
sam th
sa...@ccs.neu.edu


Re: [racket-dev] Potential search improvement

2012-05-29 Thread Stephen Bloch


On May 29, 2012, at 11:53 AM, Eli Barzilay wrote:

 Three hours ago, Sam Tobin-Hochstadt wrote:
 On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
 Just now, Sam Tobin-Hochstadt wrote:
 I think you probably want to rank/divide '1' here based on how
 much of the identifier is matched by the search.  For example, if
 you search for 'current-sep-line', you probably want
 'current-line-sep' first, but currently you get
 'current-alist-line-sep' first.
 [...]
 
 Getting away from the discussion on sorting speed, I don't think my
 suggestion even requires sorting: just add a 1.5 for
 match-all-subword-parts-to-whole-id.
 
 That won't work, since current-line-sep will have the all-subword
 match for both entries.  The first one is whatever comes first in the
 alphabetically sorted index.  You can see the same problem with a
 search for current sep line.

I thought Sam's original suggestion was, when you get an all-subword match, you 
weight by the ratio of the matched length to the whole-entry length?  Thus in 
the example in question, current-line-sep would get a weight of 1.0 but 
current-alist-line-sep only 14/19=0.74.  (Or something like that, depending 
on how you count the hyphens.)  Still doesn't require any sorting, and the 
precise numbers don't matter, only their ordering.
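
The ratio can be sketched as follows, counting only the letters of the hyphen-separated parts and leaving the hyphens out (one of several possible ways to count). The function name `coverageRatio` is invented:

```javascript
// Fraction of the identifier's characters (hyphens excluded) that belong
// to parts named exactly by one of the search patterns.
function coverageRatio(patterns, id) {
  var parts = id.split("-");
  var total = 0, matched = 0;
  for (var i = 0; i < parts.length; i++) {
    total += parts[i].length;
    if (patterns.indexOf(parts[i]) >= 0) matched += parts[i].length;
  }
  return total === 0 ? 0 : matched / total;
}
```

For the patterns current, sep and line this gives 14/14 for current-line-sep and 14/19 for current-alist-line-sep, matching the numbers above.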


Stephen Bloch
sbl...@adelphi.edu



Re: [racket-dev] Potential search improvement

2012-05-29 Thread Michael Wilber

I just noticed something about the way I use search.

Just now, I wanted to find the reference docs that describe threads. So
I typed thread into the search box and got documentation about the
(thread ...) form, in both the old and new search pages.

But the reference describes Threads, not thread, so as soon as I
thought to add the final s to my query, the new search immediately
pulled the correct document up. (This is a big improvement on the old
search, which presented me with call-with-killing-threads and
callbacks for blocked threads and so forth.)

Should search take such common suffixes into account?

On Tue, 29 May 2012 07:17:16 -0400, Eli Barzilay e...@barzilay.org wrote:
 I have made a possibly useful improvement to the JS search code.
 It's not pushed, yet, but I dropped the revised JS code on the
 pre-built pages so you can try it out here:

   http://pre.racket-lang.org/docs/html/search/

 and compare searches with the usual page:

   http://docs.racket-lang.org/search/

 I'd appreciate people playing with it to find about potential problems
 with the ordering and possibly with different browsers.


 ** More about the change (especially if you want to try to improve
things):

 This is not real ranking, but it should give better results overall.
 The thing is that the search assigns a small integer score for each
 term, where the scores are (roughly)

   0 no match,
   1 match-all-subword-parts,
   2 contains a match,
   3 matches a prefix,
   4 exact match.

 The thing is that they used to be lumped to 2 groups with exact
 matches first.  Now I made each of these be in its own group, so
 there's a little more order.  To see an example that works nicely now
 try splay.

 This doesn't solve all problems...  To see problematic things (that
 Neil has complained about in the past) try:

   * port (gives precedence for exact matches, but the reference
 entries are better; better now with the chapters appearing right
 after the exact binding matches).

   * fold (same problem, where it could be argued that for most
 people foldl from `racket/base' is better than fold from the
 DMdA languages and `srfi/1').

 Some of the problem comes from having no preferences for the results.
 Such preferences are not hard to implement, but they connect two
 unrelated pieces of code (the score assignments in the JS search, and
 the bonus for each manual) and it can quickly get into sticky
 questions.

 Another aspect of the problem is that there's N search terms, not just
 one.  Currently, the score for each is combined with a `min'; a `max'
 tends to be worse.  Ideally, it would use an average, but that would
 require to actually sort the results.

 --
   ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
 http://barzilay.org/   Maze is Life!
