Re: [racket-dev] Potential search improvement
About two weeks ago, Sam Tobin-Hochstadt wrote:
> On Tue, May 29, 2012 at 11:53 AM, Eli Barzilay e...@barzilay.org wrote:
> > Three hours ago, Sam Tobin-Hochstadt wrote:
> > > Getting away from the discussion on sorting speed, I don't think my suggestion even requires sorting: just add a 1.5 for match-all-subword-parts-to-whole-id.
> >
> > That won't work, since current-line-sep will have the all-subword match for both entries. The first one is whatever comes first in the alphabetically sorted index. You can see the same problem with a search for "current sep line".
>
> No, what I mean is that you should separate based on whether the subwords in the search string cover the entire identifier. So "current sep line" would rank current-line-sep ahead of current-alist-line-sep because alist isn't matched.

This is finally done. It wasn't nearly as straightforward as you think: the problem is that you need to compare two bags of strings, which I did by a double loop over the input patterns sorted by length.

I think that this concludes the current work on the search, and I will commit it soon. More testing of the current version is welcome -- you can use it at http://pre.racket-lang.org/docs/html/search/

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
_________________________________________________
Racket Developers list:
http://lists.racket-lang.org/dev
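For illustration only, here is a small JavaScript sketch of the coverage test described above: does the bag of search terms cover every part of the identifier? It assumes identifiers are split on "-" and that terms match parts by prefix. `coversIdentifier` is a hypothetical name. The greedy longest-first pairing is an approximation in the spirit of the "double loop over the input patterns sorted by length", not the committed search code.

```javascript
// Hypothetical sketch, not the code in the Racket search page.
// Returns true when the bag of search terms covers every "-"-separated part
// of `id`: each term must claim a distinct part (by prefix match), and no
// part may be left unclaimed.
function coversIdentifier(terms, id) {
  const parts = id.split("-");
  const claimed = new Array(parts.length).fill(false);
  // Longest terms first, so a short term like "c" cannot steal the part a
  // longer term like "current" needs (a greedy approximation).
  const sorted = [...terms].sort((a, b) => b.length - a.length);
  for (const term of sorted) {
    let found = false;
    for (let i = 0; i < parts.length; i++) {
      if (!claimed[i] && parts[i].startsWith(term)) {
        claimed[i] = true;
        found = true;
        break;
      }
    }
    if (!found) return false; // this term matched no free part
  }
  return claimed.every(Boolean); // false if, e.g., "alist" was never matched
}

console.log(coversIdentifier(["current", "sep", "line"], "current-line-sep"));       // true
console.log(coversIdentifier(["current", "sep", "line"], "current-alist-line-sep")); // false
```

The longest-first order matters for inputs like "c current" against current-cycle. If "c" were tried first, it would claim "current" and leave "current" with nothing to match.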
Re: [racket-dev] Potential search improvement
About two weeks ago, Jens Axel Søgaard wrote:
> A quick note: I searched for Scribble and didn't get the main manual. The reason is that it is named:
>
>   Scribble: The Racket Documentation Tool in scribble
>
> This case could be solved by renaming it:
>
>   Scribble - The Racket Documentation Tool in scribble
>
> Or by stripping punctuation in manual titles.

I think that it's best to do the stripping for the index when generating it (right now, the index string for that is "scribble: the racket documentation tool"). But I'm not sure about it, since you might also look for "scribble:" and expect it to appear. So perhaps it's better to just have some flag saying which index entries are identifiers.

> I am undecided on the following: Try a search for "list". The results are a bunch of places where list is exported from. It would be nice to see these more prominently displayed:
>
>   List Filtering in reference
>   List Iteration in reference
>   List Iteration from Scratch in guide
>   List Operations in reference
>
> It won't be long before 20 modules export list, and the reference and guide results disappear from the front page.

I don't see a good way around this one. It's the same problem of not having an ordering for the displayed results: even if there's a different placement for occurrences in titles, there would still be a problem with less important titles that have "list" in them (e.g., "List of incompatibilities"), and common title words would interfere with bindings that use these words. (Another option is some additional search operator, like something that requires that the result is in a section title, but I think that almost nobody is using these operators anyway.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
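Here is a minimal sketch of the indexing idea above. It assumes each index entry carries a hypothetical `isIdentifier` flag and strips punctuation only from non-identifier entries (such as manual titles) when building the lowercase index key. The entry shape and the function name `indexKey` are assumptions, not the real index format.

```javascript
// Hypothetical sketch: build the lowercase index key for an entry, stripping
// punctuation only from non-identifier entries (manual and section titles).
// `text` and `isIdentifier` are assumed field names, not the real index format.
function indexKey(entry) {
  const s = entry.text.toLowerCase();
  if (entry.isIdentifier) return s; // punctuation is significant in identifiers
  return s
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation such as ":" and "-"
    .replace(/\s+/g, " ")             // collapse the whitespace left behind
    .trim();
}

console.log(indexKey({ text: "Scribble: The Racket Documentation Tool", isIdentifier: false }));
// scribble the racket documentation tool
console.log(indexKey({ text: "string->symbol", isIdentifier: true }));
// string->symbol
```

Under this scheme a search for "scribble:" would no longer match the manual title. That is the tradeoff raised above.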
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 11:20 AM, Eli Barzilay e...@barzilay.org wrote:
> Just now, Justin Zamora wrote:
> > The search still doesn't find words in function descriptions.
>
> [It's not a full-text search, and as long as it's required to run on client machines (needed to run on your local copy), it's unlikely to become a full-text search.]

It could be an SQLite-backed Full Text Search, couldn't it? (Though it would possibly require unwanted changes to the whole architecture...)
Re: [racket-dev] Potential search improvement
On Thu, Jun 14, 2012 at 10:56 PM, Rodolfo Carvalho rhcarva...@gmail.com wrote:
> On Tue, May 29, 2012 at 11:20 AM, Eli Barzilay e...@barzilay.org wrote:
> > Just now, Justin Zamora wrote:
> > > The search still doesn't find words in function descriptions.
> >
> > [It's not a full-text search, and as long as it's required to run on client machines (needed to run on your local copy), it's unlikely to become a full-text search.]
>
> It could be an SQLite-backed Full Text Search, couldn't it? (Though it would possibly require unwanted changes to the whole architecture...)

I had in mind that it is possible to use WebSQL or IndexedDB (on browsers that support them), or even sql.js: https://github.com/kripken/sql.js
Demo: http://syntensity.com/static/sql.html

The demo consumes 23 MB on Chrome, while the new search page consumes around 40 MB. Of course these numbers are not directly comparable, and I don't mean to make any comparison. I just looked at it and cite it as a hint that it may be viable performance-wise (i.e., sql.js itself apparently doesn't take hundreds of megabytes of RAM).

[]'s
Rodolfo Carvalho
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
> ** More about the change (especially if you want to try to improve things):
>
> This is not real ranking, but it should give better results overall. The thing is that the search assigns a small integer score for each term, where the scores are (roughly): 0 = no match, 1 = match-all-subword-parts, 2 = contains a match, 3 = matches a prefix, 4 = exact match.

I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.

--
sam th
sa...@ccs.neu.edu
Re: [racket-dev] Potential search improvement
Just now, Sam Tobin-Hochstadt wrote:
> On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
> > ** More about the change (especially if you want to try to improve things):
> >
> > This is not real ranking, but it should give better results overall. The thing is that the search assigns a small integer score for each term, where the scores are (roughly): 0 = no match, 1 = match-all-subword-parts, 2 = contains a match, 3 = matches a prefix, 4 = exact match.
>
> I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.

Like I said: [...] but that would require actually sorting the results.

(The thing is that now it does something like matches[score].push(entry) and then it concatenates all of the matches arrays. To have arbitrary (non-integer) scores, it would need to put everything in one array and then sort it. That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
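The bucketing scheme described above can be sketched as follows. This is an illustrative reconstruction, not the actual search code. `rankByBuckets` and the single-term scorer are hypothetical, and the scorer omits the subword rule (score 1) for brevity.

```javascript
// Sketch of the bucketing scheme: score each entry with a small integer
// (0..4), push it into matches[score], then concatenate the buckets from best
// to worst. No comparison sort is involved, so within a bucket entries keep
// their original (alphabetical index) order.
function rankByBuckets(entries, scoreOf, maxScore = 4) {
  const matches = Array.from({ length: maxScore + 1 }, () => []);
  for (const entry of entries) {
    matches[scoreOf(entry)].push(entry);
  }
  const result = [];
  for (let s = maxScore; s >= 1; s--) { // bucket 0 (no match) is dropped
    for (const entry of matches[s]) result.push(entry);
  }
  return result;
}

// Hypothetical single-term scorer on the thread's scale, minus the
// subword rule (score 1), for brevity.
const scoreFor = (term) => (id) =>
  id === term ? 4 : id.startsWith(term) ? 3 : id.includes(term) ? 2 : 0;

console.log(rankByBuckets(["append", "list", "list*", "list-ref", "make-list"], scoreFor("list")));
// [ 'list', 'list*', 'list-ref', 'make-list' ]
```

Each entry is placed in constant time, so the whole pass is linear in the number of index entries.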
Re: [racket-dev] Potential search improvement
Eli Barzilay wrote at 05/29/2012 07:17 AM:
> I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here: http://pre.racket-lang.org/docs/html/search/ [...]

Eli, looks like a noticeable improvement over 5.2.1 search to me so far. Thank you for working on this.

Here's a small quirk on pre: searching for "scribble" doesn't get the Scribble manual as the first hit, but the incremental search as you're typing gets you the Scribble manual as the first hit for "scri" through "scribbl".

Neil V.
Re: [racket-dev] Potential search improvement
A few minutes ago, Neil Van Dyke wrote:
> Eli Barzilay wrote at 05/29/2012 07:17 AM:
> > I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here: http://pre.racket-lang.org/docs/html/search/ [...]
>
> Eli, looks like a noticeable improvement over 5.2.1 search to me so far. Thank you for working on this.
>
> Here's a small quirk on pre: searching for "scribble" doesn't get the Scribble manual as the first hit, but the incremental search as you're typing gets you the Scribble manual as the first hit for "scri" through "scribbl".

That looks like a bug...

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.

Have you measured how long this takes? On my machine, the `sort()` method on an array of 25000 strings takes 11ms in Firefox.

--
sam th
sa...@ccs.neu.edu
Re: [racket-dev] Potential search improvement
20 minutes ago, Sam Tobin-Hochstadt wrote:
> On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.
>
> Have you measured how long this takes? On my machine, the `sort()` method on an array of 25000 strings takes 11ms in Firefox.

I didn't, but my worry is about older machines (and things like IE). This wouldn't be an issue if I could abort the sort when there's new user input -- but JS being what it is, once it starts sorting I can't stop it until it's done, which means that new input characters need to wait for the sort.

[Another option that would help is if there's a reliable (and user-invisible) way to find out how fast things run and adjust the delay before firing a new sort on slower machines.]

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
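One way to read the idea of adjusting the delay, sketched under assumptions: instead of running a separate benchmark, time each search run and use that time to scale the debounce delay before the next one. `makeAdaptiveDebouncer`, `minDelay`, and `factor` are hypothetical names with made-up defaults.

```javascript
// Hypothetical sketch: debounce the search so a run starts only after typing
// pauses, and scale the pause by how long the previous run took. Slow
// machines thus wait longer before starting a new search, and no separate
// benchmark is needed. minDelay and factor are made-up defaults.
function makeAdaptiveDebouncer(run, { minDelay = 50, factor = 2 } = {}) {
  let timer = null;
  let lastCost = 0; // milliseconds spent in the previous run
  return function schedule(query) {
    if (timer !== null) clearTimeout(timer); // a newer keystroke supersedes
    const delay = Math.max(minDelay, factor * lastCost);
    timer = setTimeout(() => {
      timer = null;
      const start = Date.now();
      run(query);
      lastCost = Date.now() - start;
    }, delay);
    return delay;
  };
}
```

This only postpones the start of a search. A synchronous sort that has already started still cannot be interrupted.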
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 8:16 AM, Eli Barzilay e...@barzilay.org wrote:
> 20 minutes ago, Sam Tobin-Hochstadt wrote:
> > On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > > That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.
> >
> > Have you measured how long this takes? On my machine, the `sort()` method on an array of 25000 strings takes 11ms in Firefox.
>
> I didn't, but my worry is about older machines (and things like IE).

I think that (a) this isn't going to be a big deal for any systems, especially if you filter out the 0 scores first, and (b) we should be optimizing for people who are or might become Racket developers, who will overwhelmingly have modern systems and browsers (including IE 9, which I bet is very fast on this).

> This wouldn't be an issue if I could abort the sort when there's new user input -- but JS being what it is, once it starts sorting I can't stop it until it's done, which means that new input characters need to wait for the sort.

To stop the sort in the middle, use a custom comparison function, a bit of state, and an exception.

> [Another option that would help is if there's a reliable (and user-invisible) way to find out how fast things run and adjust the delay before firing a new sort on slower machines.]

There are a couple of options here -- check for particular browsers (using `navigator.userAgent`), or run some test like sorting a bunch of numbers.

--
sam th
sa...@ccs.neu.edu
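Here is a sketch of the comparator-plus-exception trick, with one assumption worth stating. The "bit of state" here is a time budget checked inside the comparator, because a single-threaded page cannot process a keystroke while `sort()` is running. `sortWithBudget` and `SortAborted` are hypothetical names.

```javascript
// Sketch of the suggestion: a comparison function that consults a bit of
// state and throws to abandon Array.prototype.sort midway. Here the state is
// a hypothetical time budget, since (being single-threaded) JS cannot deliver
// a keystroke event while the sort is running.
class SortAborted extends Error {}

function sortWithBudget(items, compare, budgetMs) {
  const deadline = Date.now() + budgetMs;
  const copy = items.slice(); // sort a copy: an aborted sort leaves it half-done
  try {
    copy.sort((a, b) => {
      if (Date.now() > deadline) throw new SortAborted();
      return compare(a, b);
    });
    return copy;
  } catch (e) {
    if (e instanceof SortAborted) return null; // caller can reschedule later
    throw e;
  }
}
```

The copy matters: `sort()` works in place, so an aborted sort would otherwise leave the caller's array partially reordered. Calling `Date.now()` on every comparison adds overhead, so checking only every few hundred comparisons is a reasonable variation.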
Re: [racket-dev] Potential search improvement
2012/5/29 Eli Barzilay e...@barzilay.org:
> I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here: http://pre.racket-lang.org/docs/html/search/ and compare searches with the usual page: http://docs.racket-lang.org/search/
> I'd appreciate people playing with it to find out about potential problems with the ordering and possibly with different browsers.

I like it.

A quick note: I searched for Scribble and didn't get the main manual. The reason is that it is named:

  Scribble: The Racket Documentation Tool in scribble

This case could be solved by renaming it:

  Scribble - The Racket Documentation Tool in scribble

Or by stripping punctuation in manual titles. Stripping other things than manual titles would be a bad idea (in case actual identifiers were involved), but it seems that the manual titles don't contain any: http://pre.racket-lang.org/docs/html/index.html

> The thing is that they used to be lumped into 2 groups with exact matches first. Now I made each of these be in its own group, so there's a little more order. To see an example that works nicely now, try "splay".

Sweet!

I am undecided on the following: Try a search for "list". The results are a bunch of places where list is exported from. It would be nice to see these more prominently displayed:

  List Filtering in reference
  List Iteration in reference
  List Iteration from Scratch in guide
  List Operations in reference

It won't be long before 20 modules export list, and the reference and guide results disappear from the front page.

Hmm. How about displaying a yellow box at the top of the results saying "x hits from guide and y hits from reference, click here to see them"? This way the guide and reference hits are in your face for beginners.
The actual results for "list":

  list provided from racket/base, racket
  list provided from r5rs
  list provided from rnrs/base-6
  list provided from lang/htdp-advanced
  list provided from lang/htdp-beginner
  list provided from lang/htdp-beginner-abbr
  list provided from lang/htdp-intermediate
  list provided from lang/htdp-intermediate-lambda
  list provided from deinprogramm/DMdA-advanced
  list provided from deinprogramm/DMdA-assignments
  list provided from deinprogramm/DMdA-vanilla
  list provided from lazy
  list provided from srfi/1
  List provided from typed/racket/base, typed/racket
  list box in gui
  List Filtering in reference
  List Iteration in reference
  List Iteration from Scratch in guide
  List Operations in reference
  list patterns in syntax

/Jens Axel
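Here is a minimal sketch of the counting behind the suggested "x hits from guide and y hits from reference" box. The result objects and their `manual` field are assumptions about how results might be represented, and `manualSummary` is a hypothetical name.

```javascript
// Hypothetical sketch: count hits per manual so the UI can show a summary
// line such as "1 hit from guide and 3 hits from reference".
// Results are assumed to look like { title, manual }.
function manualSummary(results, manuals = ["guide", "reference"]) {
  const counts = new Map(manuals.map((m) => [m, 0]));
  for (const r of results) {
    if (counts.has(r.manual)) counts.set(r.manual, counts.get(r.manual) + 1);
  }
  return manuals
    .filter((m) => counts.get(m) > 0)
    .map((m) => {
      const n = counts.get(m);
      return `${n} hit${n === 1 ? "" : "s"} from ${m}`;
    })
    .join(" and ");
}
```

A `Map` is used instead of a plain object so that unusual manual names cannot collide with inherited object properties.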
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> Just now, Sam Tobin-Hochstadt wrote:
> > On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
> > > ** More about the change (especially if you want to try to improve things):
> > >
> > > This is not real ranking, but it should give better results overall. The thing is that the search assigns a small integer score for each term, where the scores are (roughly): 0 = no match, 1 = match-all-subword-parts, 2 = contains a match, 3 = matches a prefix, 4 = exact match.
> >
> > I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.
>
> Like I said: [...] but that would require actually sorting the results.
>
> (The thing is that now it does something like matches[score].push(entry) and then it concatenates all of the matches arrays. To have arbitrary (non-integer) scores, it would need to put everything in one array and then sort it. That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.)

Getting away from the discussion on sorting speed, I don't think my suggestion even requires sorting: just add a 1.5 for match-all-subword-parts-to-whole-id.

--
sam th
sa...@ccs.neu.edu
Re: [racket-dev] Potential search improvement
An hour and a half ago, Sam Tobin-Hochstadt wrote:
> On Tue, May 29, 2012 at 8:16 AM, Eli Barzilay e...@barzilay.org wrote:
> > 20 minutes ago, Sam Tobin-Hochstadt wrote:
> > > On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > > > That can currently get to ~20k things to sort, plus additional entries that get added on each release, planet packages, etc.
> > >
> > > Have you measured how long this takes? On my machine, the `sort()` method on an array of 25000 strings takes 11ms in Firefox.
> >
> > I didn't, but my worry is about older machines (and things like IE).
>
> I think that (a) this isn't going to be a big deal for any systems, especially if you filter out the 0 scores first, and (b) we should be optimizing for people who are or might become Racket developers, who will overwhelmingly have modern systems and browsers (including IE 9, which I bet is very fast on this).

The sorting happens on each and every update, which can happen after every key -- and there are still schools that have old browsers with slow machines. (We've been through this discussion before, BTW.)

> > This wouldn't be an issue if I could abort the sort when there's new user input -- but JS being what it is, once it starts sorting I can't stop it until it's done, which means that new input characters need to wait for the sort.
>
> To stop the sort in the middle, use a custom comparison function, a bit of state, and an exception.

This might work.

> > [Another option that would help is if there's a reliable (and user-invisible) way to find out how fast things run and adjust the delay before firing a new sort on slower machines.]
>
> There are a couple of options here -- check for particular browsers (using `navigator.userAgent`), or run some test like sorting a bunch of numbers.

(The user agent is useless; a test requires running for some measurable time, which might interfere with typing, and there's also the non-trivial job of finding some good way to infer a good delay for the resulting timer.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
The search still doesn't find words in function descriptions. For example, http://pre.racket-lang.org/docs/html/search/index.html?q=sine returns no results. This is especially frustrating since the very first exercise in HTDP 1e is to use the search to find out whether DrRacket has a sine function.

Justin

On Tue, May 29, 2012 at 7:17 AM, Eli Barzilay e...@barzilay.org wrote:
> I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here:
>
>   http://pre.racket-lang.org/docs/html/search/
>
> and compare searches with the usual page:
>
>   http://docs.racket-lang.org/search/
>
> I'd appreciate people playing with it to find out about potential problems with the ordering and possibly with different browsers.
>
> ** More about the change (especially if you want to try to improve things):
>
> This is not real ranking, but it should give better results overall. The thing is that the search assigns a small integer score for each term, where the scores are (roughly): 0 = no match, 1 = match-all-subword-parts, 2 = contains a match, 3 = matches a prefix, 4 = exact match. The thing is that they used to be lumped into 2 groups with exact matches first. Now I made each of these be in its own group, so there's a little more order. To see an example that works nicely now, try "splay".
>
> This doesn't solve all problems... To see problematic things (that Neil has complained about in the past) try:
>
> * "port" (gives precedence to exact matches, but the reference entries are better; better now with the chapters appearing right after the exact binding matches).
>
> * "fold" (same problem, where it could be argued that for most people foldl from `racket/base' is better than fold from the DMdA languages and `srfi/1').
>
> Some of the problem comes from having no preferences for the results. Such preferences are not hard to implement, but they connect two unrelated pieces of code (the score assignments in the JS search, and the bonus for each manual) and it can quickly get into sticky questions.
>
> Another aspect of the problem is that there are N search terms, not just one. Currently, the score for each is combined with a `min'; a `max' tends to be worse. Ideally, it would use an average, but that would require actually sorting the results.
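Here is a small sketch of the combination step described in the quoted message. It uses a hypothetical single-term scorer on the same 0..4 scale (without the subword rule) and shows why `min` fits the integer-bucket scheme while an average does not.

```javascript
// Sketch of combining per-term scores (0..4) for a multi-term query: the
// entry is only as good as its worst-matching term.
// termScore is a hypothetical single-term scorer (subword rule omitted).
const termScore = (term, id) =>
  id === term ? 4 : id.startsWith(term) ? 3 : id.includes(term) ? 2 : 0;

function minScore(terms, id) {
  return terms.length === 0 ? 0 : Math.min(...terms.map((t) => termScore(t, id)));
}

// An average can fall between buckets (2.5 here), so it can no longer be
// used directly as a matches[score] index.
function averageScore(terms, id) {
  if (terms.length === 0) return 0;
  return terms.reduce((sum, t) => sum + termScore(t, id), 0) / terms.length;
}

console.log(minScore(["fold", "l"], "foldl"));     // 2
console.log(averageScore(["fold", "l"], "foldl")); // 2.5
```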
Re: [racket-dev] Potential search improvement
Just now, Justin Zamora wrote:
> The search still doesn't find words in function descriptions.

[It's not a full-text search, and as long as it's required to run on client machines (needed to run on your local copy), it's unlikely to become a full-text search.]

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
On May 29, 2012, at 10:18 AM, Justin Zamora wrote:
> This is especially frustrating since the very first exercise in HTDP 1e is to use the search to find out whether DrRacket has a sine function.

(It is okay for students to guess that sometimes they may have to search for a slightly different word, "sin" here.)
Re: [racket-dev] Potential search improvement
20 minutes ago, Eli Barzilay wrote:
> An hour and a half ago, Sam Tobin-Hochstadt wrote:
> > To stop the sort in the middle, use a custom comparison function, a bit of state, and an exception.
>
> This might work.

I was confused. It does work, but it's not enough to be able to throw an exception -- I also need some form of a yield() call to check if it should be interrupted... Is there something like that? (The search code started as a simple thing that I CPSed so it can be killed when there's new user input -- if there's a way to do the above then that code can be simplified too.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
On May 29, 2012, at 10:25 AM, Eli Barzilay wrote:
> 20 minutes ago, Eli Barzilay wrote:
> > An hour and a half ago, Sam Tobin-Hochstadt wrote:
> > > To stop the sort in the middle, use a custom comparison function, a bit of state, and an exception.
> >
> > This might work.
>
> I was confused. It does work, but it's not enough to be able to throw an exception -- I also need some form of a yield() call to check if it should be interrupted... Is there something like that? (The search code started as a simple thing that I CPSed so it can be killed when there's new user input -- if there's a way to do the above then that code can be simplified too.)

cps? Oh, what a case study in expressiveness :-)
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 10:25 AM, Eli Barzilay e...@barzilay.org wrote:
> 20 minutes ago, Eli Barzilay wrote:
> > An hour and a half ago, Sam Tobin-Hochstadt wrote:
> > > To stop the sort in the middle, use a custom comparison function, a bit of state, and an exception.
> >
> > This might work.
>
> I was confused. It does work, but it's not enough to be able to throw an exception -- I also need some form of a yield() call to check if it should be interrupted... Is there something like that?

No, not cross-browser. You'll need to manually do the CPS and defer the continuation to the next turn in the event loop. Writing a merge sort in this style might work.

`yield` is coming in the next version of JS, and is already in Firefox, but that doesn't help now.

--
sam th
sa...@ccs.neu.edu
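Here is a hedged sketch of the suggested approach: a bottom-up merge sort written so that, after a fixed amount of work, its remaining state (the continuation) resumes on a later turn of the event loop via `setTimeout(step, 0)`. Input handlers get to run between slices and can call `cancel()`. The function name, the `budget` parameter, and its default are all assumptions.

```javascript
// Hypothetical sketch: bottom-up merge sort done in slices. After roughly
// `budget` element moves it defers the rest to a later turn of the event
// loop with setTimeout, so input handlers can run and call cancel().
function incrementalMergeSort(items, compare, budget = 2000) {
  let src = items.slice();
  let dst = new Array(src.length);
  let width = 1; // current run length
  let lo = 0;    // start of the next pair of runs to merge
  let cancelled = false;

  const promise = new Promise((resolve) => {
    function step() {
      if (cancelled) return resolve(null); // newer input: drop this sort
      let work = 0;
      while (width < src.length) {
        while (lo < src.length) {
          const mid = Math.min(lo + width, src.length);
          const hi = Math.min(lo + 2 * width, src.length);
          let i = lo, j = mid, k = lo;
          // Stable merge of src[lo, mid) and src[mid, hi) into dst[lo, hi).
          while (i < mid && j < hi) {
            dst[k++] = compare(src[j], src[i]) < 0 ? src[j++] : src[i++];
          }
          while (i < mid) dst[k++] = src[i++];
          while (j < hi) dst[k++] = src[j++];
          work += hi - lo;
          lo = hi;
          if (work >= budget) {
            setTimeout(step, 0); // the "continuation" runs on a later turn
            return;
          }
        }
        [src, dst] = [dst, src]; // pass done; runs are now twice as long
        width *= 2;
        lo = 0;
      }
      resolve(src);
    }
    step();
  });

  return { promise, cancel: () => { cancelled = true; } };
}
```

Returning a `{ promise, cancel }` pair is one design choice among several. The property that matters is that no single slice does more than about `budget` element moves before yielding.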
Re: [racket-dev] Potential search improvement
Four hours ago, Eli Barzilay wrote:
> A few minutes ago, Neil Van Dyke wrote:
> > Eli Barzilay wrote at 05/29/2012 07:17 AM:
> > > I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here: http://pre.racket-lang.org/docs/html/search/ [...]
> >
> > Eli, looks like a noticeable improvement over 5.2.1 search to me so far. Thank you for working on this.
> >
> > Here's a small quirk on pre: searching for "scribble" doesn't get the Scribble manual as the first hit, but the incremental search as you're typing gets you the Scribble manual as the first hit for "scri" through "scribbl".
>
> That looks like a bug...

...which was apparently there before too. Fixed now. (But that entry is not really helpful.)

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
Three hours ago, Sam Tobin-Hochstadt wrote:
> On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > Just now, Sam Tobin-Hochstadt wrote:
> > > I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.
> > [...]
>
> Getting away from the discussion on sorting speed, I don't think my suggestion even requires sorting: just add a 1.5 for match-all-subword-parts-to-whole-id.

That won't work, since current-line-sep will have the all-subword match for both entries. The first one is whatever comes first in the alphabetically sorted index. You can see the same problem with a search for "current sep line".

--
((lambda (x) (x x)) (lambda (x) (x x)))
Eli Barzilay: http://barzilay.org/ Maze is Life!
Re: [racket-dev] Potential search improvement
On Tue, May 29, 2012 at 11:53 AM, Eli Barzilay e...@barzilay.org wrote:
> Three hours ago, Sam Tobin-Hochstadt wrote:
> > On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > > Just now, Sam Tobin-Hochstadt wrote:
> > > > I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.
> > > [...]
> >
> > Getting away from the discussion on sorting speed, I don't think my suggestion even requires sorting: just add a 1.5 for match-all-subword-parts-to-whole-id.
>
> That won't work, since current-line-sep will have the all-subword match for both entries. The first one is whatever comes first in the alphabetically sorted index. You can see the same problem with a search for "current sep line".

No, what I mean is that you should separate based on whether the subwords in the search string cover the entire identifier. So "current sep line" would rank current-line-sep ahead of current-alist-line-sep because alist isn't matched.

--
sam th
sa...@ccs.neu.edu
Re: [racket-dev] Potential search improvement
On May 29, 2012, at 11:53 AM, Eli Barzilay wrote:
> Three hours ago, Sam Tobin-Hochstadt wrote:
> > On Tue, May 29, 2012 at 7:33 AM, Eli Barzilay e...@barzilay.org wrote:
> > > Just now, Sam Tobin-Hochstadt wrote:
> > > > I think you probably want to rank/divide '1' here based on how much of the identifier is matched by the search. For example, if you search for 'current-sep-line', you probably want 'current-line-sep' first, but currently you get 'current-alist-line-sep' first.
> > > [...]
> >
> > Getting away from the discussion on sorting speed, I don't think my suggestion even requires sorting: just add a 1.5 for match-all-subword-parts-to-whole-id.
>
> That won't work, since current-line-sep will have the all-subword match for both entries. The first one is whatever comes first in the alphabetically sorted index. You can see the same problem with a search for "current sep line".

I thought Sam's original suggestion was, when you get an all-subword match, you weight by the ratio of the matched length to the whole-entry length? Thus in the example in question, current-line-sep would get a weight of 1.0 but current-alist-line-sep only 14/19 = 0.74. (Or something like that, depending on how you count the hyphens.) Still doesn't require any sorting, and the precise numbers don't matter, only their ordering.

Stephen Bloch
sbl...@adelphi.edu
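The ratio described here can be sketched as below. It counts letters only (hyphens ignored), which reproduces the 14/19 figure. `coverageRatio` is a hypothetical helper, and prefix matching of terms against identifier parts is an assumption.

```javascript
// Hypothetical sketch of the weighting: the fraction of an identifier's
// letters (hyphens ignored) that lie in parts matched, by prefix, by some
// search term.
function coverageRatio(terms, id) {
  const parts = id.split("-");
  const total = parts.reduce((n, p) => n + p.length, 0);
  let matched = 0;
  for (const part of parts) {
    if (terms.some((t) => part.startsWith(t))) matched += part.length;
  }
  return total === 0 ? 0 : matched / total;
}

const terms = ["current", "sep", "line"];
console.log(coverageRatio(terms, "current-line-sep"));                  // 1
console.log(coverageRatio(terms, "current-alist-line-sep").toFixed(2)); // 0.74
```

How this ratio would feed into the existing integer buckets is left open here, as in the message above.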
Re: [racket-dev] Potential search improvement
I just noticed something about the way I use search. Just now, I wanted to find the reference docs that describe threads. So I typed "thread" into the search box and got documentation about the (thread ...) form, in both the old and new search pages. But the reference describes "Threads", not "thread", so as soon as I thought to add the final "s" to my query, the new search immediately pulled the correct document up. (This is a big improvement on the old search, which presented me with call-with-killing-threads and callbacks for blocked threads and so forth.)

Should search take such common suffixes into account?

On Tue, 29 May 2012 07:17:16 -0400, Eli Barzilay e...@barzilay.org wrote:
> I have made a possibly useful improvement to the JS search code. It's not pushed yet, but I dropped the revised JS code on the pre-built pages so you can try it out here: http://pre.racket-lang.org/docs/html/search/ [...]
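Here is a deliberately naive sketch of suffix handling for the question above: drop a trailing plural "s" before comparing words. This is an illustration built on assumptions (`stem` and `sameStem` are hypothetical names), and it ignores most English suffixes.

```javascript
// Hypothetical sketch: a deliberately naive stemmer that only drops a plural
// "s" (but not "ss"), so "thread" and "Threads" compare equal. A real
// stemmer (e.g., Porter's) handles far more suffixes.
function stem(word) {
  const w = word.toLowerCase();
  return w.length > 3 && w.endsWith("s") && !w.endsWith("ss") ? w.slice(0, -1) : w;
}

function sameStem(a, b) {
  return stem(a) === stem(b);
}

console.log(sameStem("thread", "Threads")); // true
console.log(stem("class"));                 // class
```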