Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Thomas Passin

On 3/8/2023 3:27 PM, Peter J. Holzer wrote:

On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:

On 3/7/2023 7:33 AM, Dino wrote:

in fact it's a dilemma I am facing now. My back-end returns 10
entries (I am limiting to max 10 matches server side for reasons you
can imagine). As the user keeps typing, should I restrict the
existing result set based on the new information or re-issue an API
call to the server? Things get confusing pretty fast for the user.
You don't want too many cooks in kitchen, I guess.
Played a little bit with both approaches in my little application.
Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI
finesse with stuff like this.


Subject of course to trying this out, I would be inclined to send a much
larger list of responses to the client, and let the client reduce the number
to be displayed.  The latency for sending a longer list will be smaller than
establishing a new connection or even reusing an old one to send a new,
short list of responses.


That depends very much on how long that list can become. If it's 200
matches - sure, send them all, even if the client will display only 10
of them. Probably even for 2000. But if you might get 20 million matches
you surely don't want to send them all to the client.


Yes, of course.  OTOH, if you have 2000+ possibilities it's basically 
pointless to send them to the client.  You can send the first 10, and 
hope that will be worth something (it probably won't).  You can send all 
2000 and let the client show the first say 10, but that probably won't 
be worth much either.  If you have some way to prioritize them, you can 
include the scores, send the top (say) 100 to the client, 
and let the client figure out what to do.
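
A minimal sketch of the "score and send the top 100" idea, assuming a made-up scoring rule (a match at the start of the entry beats a match at the start of a word, which beats a match in the middle). The names score and top_matches are hypothetical, not anything from the OP's code.

```python
import heapq

def score(entry, query):
    """Hypothetical relevance score: higher is better, 0 means no match."""
    e, q = entry.casefold(), query.casefold()
    if q not in e:
        return 0
    if e.startswith(q):
        return 3          # match at the very start of the entry
    if any(word.startswith(q) for word in e.split()):
        return 2          # match at the start of a word
    return 1              # match somewhere in the middle

def top_matches(entries, query, limit=100):
    """Return up to `limit` (score, entry) pairs, best score first."""
    scored = ((score(e, query), e) for e in entries)
    matching = (pair for pair in scored if pair[0] > 0)
    return heapq.nlargest(limit, matching)
```

The client receives the scores along with the entries, so it can decide for itself how many to show and in what order.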


If you are going to have that many responses you will need some more 
complex and sophisticated approach anyway, so the whole discussion would 
 not be applicable.  And this would be getting miles (kms) away from 
the OP's situation.



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Peter J. Holzer
On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:
> On 3/7/2023 7:33 AM, Dino wrote:
> > in fact it's a dilemma I am facing now. My back-end returns 10
> > entries (I am limiting to max 10 matches server side for reasons you
> > can imagine). As the user keeps typing, should I restrict the
> > existing result set based on the new information or re-issue an API
> > call to the server? Things get confusing pretty fast for the user.
> > You don't want too many cooks in kitchen, I guess.
> > Played a little bit with both approaches in my little application.
> > Re-requesting from the server seems to win hands down in my case.
> > I am sure that them google engineers reached spectacular levels of UI
> > finesse with stuff like this.
> 
> Subject of course to trying this out, I would be inclined to send a much
> larger list of responses to the client, and let the client reduce the number
> to be displayed.  The latency for sending a longer list will be smaller than
> establishing a new connection or even reusing an old one to send a new,
> short list of responses.

That depends very much on how long that list can become. If it's 200
matches - sure, send them all, even if the client will display only 10
of them. Probably even for 2000. But if you might get 20 million matches
you surely don't want to send them all to the client.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 2:02 PM, avi.e.gr...@gmail.com wrote:

Some of the discussions here leave me confused as the info we think we got
early does not last long intact and often morphs into something else and we
find much of the discussion is misdirected or wasted.



Apologies. I'm the OP and also the OS (original sinner). My "mistake" 
was to go for a "stream of consciousness" kind of question, rather than 
a well researched and thought out one.


You are correct, Avi. I have a simple web UI, I came across the Whoosh 
video and got infatuated with the idea that Whoosh could be used to 
create an autofill function, as my backend is already Python/Flask. As 
many have observed and as I have also quickly realized, Whoosh was 
overkill for my use case. In the meantime people started asking 
questions, I responded and, before you know it, we are all discussing 
the intricacies of JavaScript web development in a Python forum. Should 
I have stopped them? How?


One thing is for sure: I am really grateful that so many used so much of 
their time to help.


A big thank you to each of you, friends.

Dino




Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 1:28 PM, David Lowry-Duda wrote:

But I'll note that I use whoosh from time to time and I find it stable 
and pleasant to work with. It's true that development stopped, but it 
stopped in a very stable place. I don't recommend using whoosh here, but 
I would recommend experimenting with it more generally.


Thank you, David. Noted.




Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread Thomas Passin

On 3/7/2023 7:33 AM, Dino wrote:

It must be nice to have a server or two...


No kidding

About everything else you wrote, it makes a ton of sense, in fact it's a 
dilemma I am facing now. My back-end returns 10 entries (I am limiting 
to max 10 matches server side for reasons you can imagine).
As the user keeps typing, should I restrict the existing result set 
based on the new information or re-issue an API call to the server?
Things get confusing pretty fast for the user. You don't want too many 
cooks in kitchen, I guess.
Played a little bit with both approaches in my little application. 
Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI 
finesse with stuff like this.


Subject of course to trying this out, I would be inclined to send a much 
larger list of responses to the client, and let the client reduce the 
number to be displayed.  The latency for sending a longer list will be 
smaller than establishing a new connection or even reusing an old one to 
send a new, short list of responses.  When the client types more, it can 
only reduce the number of possibilities - among the (possibly imaginary) 
larger original number of them. After the next round of user typing, the 
client can check and see if there are enough surviving responses to 
list.  If not, it can then request a new list from the server.


Using this in reverse, if the user deletes some characters from the end, 
there should be no need to go back to the server.  The possible 
responses would already have been sent to the client.  They could be 
interned in an associative array keyed by the string the client had 
typed to get those responses.
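
A rough sketch of that client-side logic, written in Python for this list although in practice it would live in the browser. Everything here is an assumption for illustration: the class name SuggestionCache, the fetch callable standing in for the server call, and the two limits.

```python
class SuggestionCache:
    """Client-side cache of server responses, keyed by the typed string."""

    def __init__(self, fetch, server_limit=100, display_limit=10):
        self.fetch = fetch                  # hypothetical callable: query -> list of matches
        self.server_limit = server_limit    # most matches the server sends per request
        self.display_limit = display_limit  # most matches shown to the user
        self.responses = {}                 # typed string -> (matches, complete?)

    def _narrow(self, typed):
        """Try to answer from the response cached for the longest shorter prefix."""
        for cut in range(len(typed) - 1, 0, -1):
            cached = self.responses.get(typed[:cut])
            if cached is None:
                continue
            matches, complete = cached
            survivors = [m for m in matches if typed in m.casefold()]
            if complete or len(survivors) >= self.display_limit:
                return survivors, complete
            return None      # too few survivors of a truncated list: ask the server
        return None

    def suggest(self, typed):
        typed = typed.casefold()
        if typed not in self.responses:     # deleting characters hits the cache directly
            answer = self._narrow(typed)
            if answer is None:
                matches = self.fetch(typed)
                answer = (matches, len(matches) < self.server_limit)
            self.responses[typed] = answer
        return self.responses[typed][0][: self.display_limit]
```

The "complete" flag records whether the server sent everything it had. Only then is an empty or short narrowed list trustworthy; otherwise the client goes back to the server.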




Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread rbowman
On Tue, 7 Mar 2023 07:33:01 -0500, Dino wrote:

> Played a little bit with both approaches in my little application.
> Re-requesting from the server seems to win hands down in my case.

That's necessary for a non-trivial data set. Assume you get 10 suggestions 
after the user types 'to'. 

today
tomorrow
tomato
tonsil
torque
totem
toad
toque
toward
touch

If the user types 'l' next and is trying for 'tolerance' you'll need a new 
set. You'll need a little refinement. If the user is a proficient typist 
and wants to type 'tolerance' they may get ahead of you. 

Another consideration is a less proficient typist or someone who can't 
spell. Again, play with maps.google.com. They're good at it. Put '123 
thomd' in the search bar. YMMV but I get 5 variations on 123 Thomas. When 
they were working down 'thompd' had zero matches so they backed up to 
'thom'.

If you play with their search they're using some more magic too.  Try '123 
ellekt'.  They may be using a variation on soundex or something more 
sophisticated. 
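
The "backed up" behaviour can be imitated with a few lines. This is only a guess at the observable effect, not a claim about what maps.google.com does, and it does not attempt soundex. The function name suggest_with_backoff and the sample street names are made up.

```python
def suggest_with_backoff(entries, typed, limit=5):
    """Drop trailing characters until something matches (prefix match, ignoring case)."""
    query = typed.casefold()
    while query:
        hits = [e for e in entries if e.casefold().startswith(query)]
        if hits:
            return query, hits[:limit]
        query = query[:-1]          # nothing matched: back up one character
    return "", []
```

It returns the prefix that finally matched together with the suggestions, so the caller can tell that the input was shortened.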


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread Dino

On 3/6/2023 11:05 PM, rbowman wrote:


It must be nice to have a server or two...


No kidding

About everything else you wrote, it makes a ton of sense, in fact it's a 
dilemma I am facing now. My back-end returns 10 entries (I am limiting 
to max 10 matches server side for reasons you can imagine).
As the user keeps typing, should I restrict the existing result set 
based on the new information or re-issue an API call to the server?
Things get confusing pretty fast for the user. You don't want too many 
cooks in kitchen, I guess.
Played a little bit with both approaches in my little application. 
Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI 
finesse with stuff like this.



On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript


That could be annoying. My use case is address entry. When the user types

102 ma

the suggestions might be

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm
only dealing with a city or county so the data to be searched isn't huge.
The maps.google.com address search covers the world and they're also
throwing in a geographical constraint so the suggestions are applicable to
the area you're viewing.  





RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread avi.e.gross
Some of the discussions here leave me confused as the info we think we got
early does not last long intact and often morphs into something else and we
find much of the discussion is misdirected or wasted.

Wouldn't it have been nice if this discussion had not started with a mention
of a package/module few have heard of along with a vague request on how best
to search for lines that match something in a file?

I still do not know enough to feel comfortable even after all this time. It
now seems to be a web-based application in which a web page wants to use
autocompletion as the user types.

So was the web page a static file that the user runs, or is it dynamically
created by something like a python program? How is the fact that a user has
typed a letter in a textbox or drop down of sorts reflected in a request
being sent to a python program to return possible choices? Is the same
process called anew each time or is it, or perhaps a group of similar
processes or threads going to stick around and be called repeatedly?

Lots of details are missing and in particular, much of what is being
described sounds like it is happening in the browser, presumably in
JavaScript. Also noted is that the first keystroke or two may return too
much data.

So does the OP still think this is a python question? So much of the
discussion sounds like it is in the browser deciding whether to wait for the
user to type more before making a request, or throwing away results of an
older request.

So my guess is that a possible design for this amount of data may simply be
to read the file into the browser at startup, or when the first letter is
typed, and do all the searches internally, perhaps cascaded as long as
backspace or editing is not used.

If the data gets much larger, of course, then using a server makes sense
albeit it need not use python unless lots more in the project is also ...

-Original Message-
From: Python-list  On
Behalf Of David Lowry-Duda
Sent: Tuesday, March 7, 2023 1:29 PM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 22:43 Sat 04 Mar 2023, Dino wrote:
>How can I implement this? A library called Whoosh seems very promising 
>(albeit it's so feature-rich that it's almost like shooting a fly with 
>a bazooka in my case), but I see two problems:
>
> 1) Whoosh is either abandoned or the project is a mess in terms of 
>community and support 
>(https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 ) and
>
> 2) Whoosh seems to be a Python only thing, which is great for now, 
>but I wouldn't want this to become an obstacle should I need port it to 
>a different language at some point.

As others have noted, it sounds like relatively straightforward 
implementations will be sufficient.

But I'll note that I use whoosh from time to time and I find it stable 
and pleasant to work with. It's true that development stopped, but it 
stopped in a very stable place. I don't recommend using whoosh here, but 
I would recommend experimenting with it more generally.

- DLD


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread David Lowry-Duda

On 22:43 Sat 04 Mar 2023, Dino wrote:
How can I implement this? A library called Whoosh seems very promising 
(albeit it's so feature-rich that it's almost like shooting a fly with 
a bazooka in my case), but I see two problems:


1) Whoosh is either abandoned or the project is a mess in terms of 
community and support 
(https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 ) and


2) Whoosh seems to be a Python only thing, which is great for now, 
but I wouldn't want this to become an obstacle should I need port it to 
a different language at some point.


As others have noted, it sounds like relatively straightforward 
implementations will be sufficient.


But I'll note that I use whoosh from time to time and I find it stable 
and pleasant to work with. It's true that development stopped, but it 
stopped in a very stable place. I don't recommend using whoosh here, but 
I would recommend experimenting with it more generally.


- DLD


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread Peter J. Holzer
On 2023-03-07 04:05:19 +, rbowman wrote:
> On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:
> > One issue that was also correctly foreseen by some is that there's going
> > to be a new request at every user key stroke. Known problem. JavaScript
> > programmers use a trick called "debouncing" to be reasonably sure that
> > the user is done typing before a request is issued:
> > 
> > https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript
> 
> That could be annoying. My use case is address entry. When the user types

It can be. The delay is short but noticeable.

A somewhat smarter strategy is to send each query as soon as the user
hits a key but keep track of what you sent and received and discard
responses for obsolete requests (This is necessary because if you first
send "ma" and then "mas", the response to the first query might arrive
after the response to the second query and you don't want to display
"mansion" if the user already typed "mas".)
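
The bookkeeping for this can be very small. The sketch below is one possible way to do it, assuming each request carries an increasing id that comes back with its response. The class name RequestTracker is made up, and in a real page this would be JavaScript rather than Python.

```python
class RequestTracker:
    """Display a response only if nothing newer has been displayed already."""

    def __init__(self):
        self.next_id = 0      # id handed to the next outgoing request
        self.shown_id = 0     # id of the request whose results are on screen
        self.shown = None     # the results currently displayed

    def send(self):
        """Register a new outgoing request and return the id to attach to it."""
        self.next_id += 1
        return self.next_id

    def receive(self, request_id, results):
        """Return True if the results were displayed, False if they were obsolete."""
        if request_id <= self.shown_id:
            return False      # a newer response is already on screen: drop this one
        self.shown_id = request_id
        self.shown = results
        return True
```

Responses that arrive in order are all displayed; a late response to an older query is silently dropped.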

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:

> One issue that was also correctly foreseen by some is that there's going
> to be a new request at every user key stroke. Known problem. JavaScript
> programmers use a trick called "debouncing" to be reasonably sure that
> the user is done typing before a request is issued:
> 
> https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

That could be annoying. My use case is address entry. When the user types

102 ma

the suggestions might be 

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm 
only dealing with a city or county so the data to be searched isn't huge. 
The maps.google.com address search covers the world and they're also 
throwing in a geographical constraint so the suggestions are applicable to 
the area you're viewing.  It must be nice to have a server or two...


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


Gentlemen, thanks a ton to everyone who offered to help (and did help!). 
I loved the part where some tried to divine the true meaning of my words :)


What you guys wrote is correct: the grep-esque search is guaranteed to 
turn up a ton of false positives, but for the autofill use-case, that's 
actually OK. Users will quickly figure out what is not relevant and skip 
those entries, just to zero in on the suggestion that they find relevant.


One issue that was also correctly foreseen by some is that there's going 
to be a new request at every user key stroke. Known problem. JavaScript 
programmers use a trick called "debouncing" to be reasonably sure that 
the user is done typing before a request is issued:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

I was able to apply that successfully and I am now very pleased with the 
final result.
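
For readers who have not met the trick: the linked article does it in JavaScript, but the same idea can be sketched in Python with threading.Timer. The Debouncer class and the 0.3 second default are assumptions for illustration, not the OP's code.

```python
import threading

class Debouncer:
    """Call `action` only after `delay` seconds have passed without a new trigger."""

    def __init__(self, action, delay=0.3):
        self.action = action
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()      # a newer keystroke supersedes the pending call
            self._timer = threading.Timer(self.delay, self.action, args=args)
            self._timer.daemon = True
            self._timer.start()
```

Each keystroke calls trigger() with the current text; only the last call in a burst survives long enough to fire.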


Apologies if I posted 1400 lines of data file. Seeing that certain 
newsgroups carry gigabytes of copyright infringing material must have 
conveyed the wrong impression to me.


Thank you.

Dino



Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Greg Ewing via Python-list

On 7/03/23 6:49 am, avi.e.gr...@gmail.com wrote:

But the example given wanted to match something like "V6" in middle of the text 
and I do not see how that would work as you would now need to search 26 dictionaries 
completely.


It might even make things worse, as there is likely to be a lot of
overlap between entries containing "V" and entries containing "6",
so you end up searching the same data multiple times.

--
Greg



Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Greg Ewing via Python-list

On 7/03/23 4:35 am, Weatherby,Gerard wrote:

If mailing space is a consideration, we could all help by keeping our replies 
short and to the point.


Indeed. A thread or two of untrimmed quoted messages is probably
more data than Dino posted!

--
Greg



Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 12:49 PM, avi.e.gr...@gmail.com wrote:

Thomas,

I may have missed any discussion where the OP explained more about proposed 
usage. If the program is designed to load the full data once, never get updates 
except by re-reading some file, and then handles multiple requests, then some 
things may be worth doing.

It looked to me, and I may well be wrong, like he wanted to search for a string anywhere 
in the text so a grep-like solution is a reasonable start with the actual data being 
stored as something like a list of character strings you can search "one line" 
at a time. I suspect a numpy variant may work faster.

And of course any search function he builds can be made to remember some or all 
previous searches using a cache decorator. That generally uses a dictionary for 
the search keys internally.

But using lots of dictionaries strikes me as only helping if you are searching for text anchored to the start of a line 
so if you ask for "Honda" you instead ask the dictionary called "h" and search perhaps just for 
"onda" then recombine the prefix in any results. But the example given wanted to match something like 
"V6" in middle of the text and I do not see how that would work as you would now need to search 26 
dictionaries completely.


Well, that's the question, isn't it?  Just how is this expected to be 
used?  I didn't read the initial posting that carefully, and I may have 
missed something that makes a difference.


The OP gives as an example a user entering a string ("v60").  The 
example is for a model designation.  If we know that this entry box will 
only receive model, then I would populate a dictionary using the model 
numbers as keys.  The number of distinct keys will probably not be that 
large.


For example, highly simplified of course:

>>> models = {'v60': 'Volvo', 'GV60': 'Genesis', 'cl': 'Acura'}
>>> entry = '60'
>>> candidates = (m for m in models.keys() if entry in m)
>>> list(candidates)
['v60', 'GV60']

The keys would be lower-cased.  A separate dictionary would give the 
complete string with the desired casing.  The values could be object 
references to the complete information.  If there might be several 
different models with the same key, then the values could be 
lists or dictionaries and one would need to do some disambiguation, but 
that should be simple or quick.
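
Spelled out a little further, the arrangement described above might look like the following. The sample rows are made up, and the names by_model, display and candidates are only for illustration.

```python
from collections import defaultdict

# Made-up sample rows: (make, model)
ROWS = [("Volvo", "V60"), ("Genesis", "GV60"), ("Acura", "CL"), ("Mercedes-Benz", "CL")]

by_model = defaultdict(list)     # lower-cased model -> list of makes using that name
display = {}                     # lower-cased model -> model with its original casing
for make, model in ROWS:
    key = model.casefold()
    by_model[key].append(make)
    display[key] = model

def candidates(entry):
    """Models whose key contains the typed text, with every make sharing that name."""
    e = entry.casefold()
    return [(display[k], by_model[k]) for k in by_model if e in k]
```

With the rows above, candidates("cl") yields one model name with two makes, which is where the disambiguation step would come in.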


It all depends on the planned access patterns.  If the OP really wants 
full-text search in the complete unstructured data file, then yes, a 
full text indexer of some kind will be useful.  Whoosh certainly looks 
good though I have not used it.  But for populating dropdown lists in 
web forms, most likely the design of the form will provide a structure 
for the various searches.



-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:

Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”. And so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.


Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.
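
A variation on the "aa", "ab" ... "zz" dictionaries that also copes with matches in the middle of a line is an index from every two-character substring to the records containing it. This is a swapped-in technique (a bigram index), not exactly what was proposed above, and the function names are hypothetical.

```python
from collections import defaultdict

def build_bigram_index(records):
    """Map each two-character substring to the set of record numbers containing it."""
    index = defaultdict(set)
    for number, record in enumerate(records):
        text = record.casefold()
        for i in range(len(text) - 1):
            index[text[i:i + 2]].add(number)
    return index

def search(records, index, query):
    """Case-insensitive substring search, narrowed by the bigram index first."""
    q = query.casefold()
    if len(q) < 2:                       # too short for the index: scan everything
        candidates = range(len(records))
    else:
        bigrams = [q[i:i + 2] for i in range(len(q) - 1)]
        sets = [index.get(b, set()) for b in bigrams]
        candidates = sorted(set.intersection(*sets))
    # The index only narrows the field; confirm each candidate with a real substring test.
    return [records[n] for n in candidates if q in records[n].casefold()]
```

The index is built once at startup. Whether it beats a plain scan for 30k short records would have to be measured; for data this small the scan may well win.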



From: Python-list  on behalf of 
Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.
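
For the SQLite route, a minimal sketch using the standard sqlite3 module with an in-memory database. The table name, column names and the helper functions load and search are assumptions for illustration.

```python
import sqlite3

def load(rows):
    """Build an in-memory SQLite table once, at startup."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cars (make TEXT, model TEXT)")
    db.executemany("INSERT INTO cars VALUES (?, ?)", rows)
    db.commit()
    return db

def search(db, text, limit=10):
    """Substring match on make or model; LIKE ignores case for ASCII letters."""
    pattern = f"%{text}%"
    cur = db.execute(
        "SELECT make, model FROM cars "
        "WHERE make LIKE ? OR model LIKE ? "
        "ORDER BY make, model LIMIT ?",
        (pattern, pattern, limit),
    )
    return cur.fetchall()
```

Note that % and _ typed by the user act as wildcards in LIKE; a real form handler would probably want to escape them.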






RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Ah, thanks Dino. Autocomplete within a web page can be an interesting
scenario but also a daunting one.

Now, do you mean you have a web page with a text field, initially I suppose
empty, and the user types a single character and rapidly a drop-down list or
something is created and shown? And as they type, it may shrink? And as soon
as they select one, it is replaced in the text field and done?

If your form has an attached function written in JavaScript, some might load
your data into the browser and do all that work from within. No python
needed.

Now if your scenario is similar to the above, or perhaps the user needs to
ask for autocompletion by using tab or something, and you want to keep
sending requests to a server, you can of course use any language on the
server. BUT I would be cautious in such a design.

My guess is you autocomplete on every keystroke and the user may well type
multiple characters resulting in multiple requests for your program. Is a
new one called every time or is it a running service? If the latter, it pays
to read in the data once and then carefully serve it. But when you get just
the letter "h" you may not want to send and process a thousand results but
limit it to, say, the first N. If they then add an "o" to make "ho", you may not
need to do much if it is anchored to the start except to search in the
results of the previous search rather than the whole data.

But have you done some searching on how autocomplete from a fixed corpus is
normally done? It is a quite common thing.


-Original Message-
From: Python-list  On
Behalf Of Dino
Sent: Monday, March 6, 2023 7:40 AM
To: python-list@python.org
Subject: Re: RE: Fast full-text searching in Python (job for Whoosh?)

Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.

What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.

Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:

https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use-case, 
as a simple (case insensitive) query substring would get me 90% of what 
I want. Speed is in the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.
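
The "simple (case insensitive) query substring" can be as small as the sketch below. The function name autofill and the cap of 10 are assumptions based on what is described in this thread, not the OP's actual code.

```python
def autofill(lines, typed, limit=10):
    """Return up to `limit` lines containing `typed`, ignoring case (like grep -i)."""
    needle = typed.casefold()
    hits = []
    for line in lines:
        if needle in line.casefold():
            hits.append(line)
            if len(hits) == limit:
                break            # stop scanning once the cap is reached
    return hits
```

If the scan ever became a bottleneck, the casefolded copy of each line could be computed once at startup instead of on every request.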

Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:
> Dino, Sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below so as not to replicate it.
> 
> Your question does not look difficult unless your real question is about
> speed. Realistically, much of the time spent generally is in reading in a
> file and the actual search can be quite rapid with a wide range of methods.
> 
> The data looks boring enough and seems to not have much structure other than
> one comma possibly separating two fields. Do you want the data as one wide
> field or perhaps in two parts, which a CSV file is normally used to
> represent? Do you ever have questions like tell me all cars whose name
> begins with the letter D and has a V6 engine? If so, you may want more than
> a vanilla search.
> 
> What exactly do you want to search for? Is it a set of built-in searches or
> something the user types in?
> 
> The data seems to be sorted by the first field and then by the second and I
> did not check if some searches might be ambiguous. Can there be many entries
> containing III? Yep. Can the same words like Cruiser or Hybrid appear?
> 
> So is this a one-time search or multiple searches once loaded as in a
> service that stays resident and fields requests? The latter may be worth
> speeding up.
> 
> I don't NEED to know any of this but want you to know that the answer may
> depend on this and similar factors. We had a long discussion lately on
> whether to search using regular expressions or string methods. If your data
> is meant to be used once, you may not even need to read the file into
> memory, but read something like a line at a time and test it. Or, if you end
> up with more data like how many cylinders a car has, it may be time to read
> it in not just to a list of lines or such data structures, but get
> numpy/pandas involved and use their many search methods in something like a
> data.frame.
> 
> Of course if you are worried about portability, keep using Get Regular
> Expression Print.
> 
> Your example was:
> 
>   $ grep -i v60 all_cars_unique.csv
>  

Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 7:28 AM, Dino wrote:

On 3/5/2023 9:05 PM, Thomas Passin wrote:


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.


Thank you. SQLite would be overkill here, plus all the machinery that I 
would need to set up to make sure that the DB is rebuilt/updated regularly.

Do you happen to know something about Whoosh? have you ever used it?


I know nothing about it, sorry.  But anything beyond python dictionaries 
and possibly some lists strikes me as overkill for what you have described.



IOW, do the bulk of the work once at startup.


Sound advice

Thank you




RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Thomas,

I may have missed any discussion where the OP explained more about proposed 
usage. If the program is designed to load the full data once, never get updates 
except by re-reading some file, and then handles multiple requests, then some 
things may be worth doing.

It looked to me, and I may well be wrong, like he wanted to search for a string 
anywhere in the text so a grep-like solution is a reasonable start with the 
actual data being stored as something like a list of character strings you can 
search "one line" at a time. I suspect a numpy variant may work faster.

And of course any search function he builds can be made to remember some or all 
previous searches using a cache decorator. That generally uses a dictionary for 
the search keys internally.

But using lots of dictionaries strikes me as only helping if you are searching 
for text anchored to the start of a line so if you ask for "Honda" you instead 
ask the dictionary called "h" and search perhaps just for "onda" then recombine 
the prefix in any results. But the example given wanted to match something like 
"V6" in middle of the text and I do not see how that would work as you would 
now need to search 26 dictionaries completely.
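
The cache decorator mentioned above is available in the standard library as functools.lru_cache. A minimal sketch, with a made-up four-line sample standing in for the real data:

```python
from functools import lru_cache

CARS = ("Volvo,V60", "Genesis,GV60", "Acura,CL", "Honda,Civic")   # made-up sample

@lru_cache(maxsize=1024)
def search(query):
    """Grep-like scan: every line containing `query`, ignoring case."""
    q = query.casefold()
    return tuple(line for line in CARS if q in line.casefold())
```

The cache is keyed on the exact argument, so "V6" and "v6" are stored separately unless the caller normalizes the query first. Returning a tuple keeps the cached value from being modified by accident.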



-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
> Not sure if this is what Thomas meant, but I was also thinking dictionaries.
> 
> Dino could build a set of dictionaries with keys “a” through “z” that contain 
> data with those letters in them. (I’m assuming case insensitive search) and 
> then just search “v” if that’s what the user starts with.
> 
> Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
> “zz”. And so on.
> 
> Of course, it’s trading CPU for memory usage, and there’s likely a point at 
> which the cost of building dictionaries exceeds the savings in searching.

Chances are it would only be seconds at most to build the data cache, 
and then subsequent queries would respond very quickly.

> 
> From: Python-list  on 
> behalf of Thomas Passin 
> Date: Sunday, March 5, 2023 at 9:07 PM
> To: python-list@python.org 
> Subject: Re: Fast full-text searching in Python (job for Whoosh?)
> 
> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to do a fast lookup in one or more dictionaries.
> 
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.
> 
> IOW, do the bulk of the work once at startup.

-- 
https://mail.python.org/mailman/listinfo/python-list



RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Gerard,

I was politely pointing out how it was more than the minimum necessary and
might get repeated multiple times as people replied. The storage space is a
resource someone else provides and I prefer not abusing it.

However, since the OP seems to be asking a question focused on how long it
takes to search using possible techniques, indeed some people would want the
entire data to test with.

In my personal view, a snippet of the data is what I need to see how it
is organized and then what I need way more is some idea for what kind of
searching is needed.

If I was told there would be a web page allowing users to search a web
service hosting the data on a server with one process called as much as
needed that spawned threads to handle the task, I might see it as very
worthwhile to read in the data once into some data structure that allows
rapid searches over and over.  If it is an app called ONCE as a whole for
each result, as in the grep example, why bother? Just read a line at a
time and be done with it.
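
For the called-once case, a sketch of the line-at-a-time approach could look like this. The name `grep_i` is made up, and `io.StringIO` only stands in for the opened data file so the example runs without it.

```python
import io

def grep_i(lines, term):
    """Yield the lines that contain term, ignoring case, one line at a time."""
    needle = term.casefold()
    for line in lines:
        if needle in line.casefold():
            yield line.rstrip("\n")

# Stand-in for open("all_cars_unique.csv"); nothing is held in memory
# beyond the current line and the matches that are kept.
sample = io.StringIO("Genesis,GV60\nVolvo,V60\nAcura,Integra\n")
print(list(grep_i(sample, "v60")))
```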

My suggestion remains my preference. The discussion is archived. Messages
can optimally be trimmed as needed and not allowed to contain the full
contents of the last twenty replies back and forth unless that is needed.
Larger amounts of data can be offered to share and if wanted, can be posted
or sent to someone asking for it or placed in some publicly accessible place.

But my preference may not be relevant as the forum has hosts or owners and
it is what they want that counts.

The data this time was not really gigantic. But I often work with data from
a CSV that has hundreds of columns and hundreds of thousands or more rows,
with some of the columns containing large amounts of text. But I may be
interested in how to work with say just half a dozen columns and for the
purposes of my question here, perhaps a hundred representative rows. Should
I share everything, or maybe save the subset and only share that?

This is not about python as a language but about expressing ideas and
opinions on a public forum with limited resources. Yes, over the years, my
combined posts probably use far more archival space. We are not asked to be
sparse, just not be wasteful. 

The OP may consider what he is working with as a LOT of data but it really
isn't by modern standards. 

-----Original Message-----
From: Python-list  On
Behalf Of Weatherby,Gerard
Sent: Monday, March 6, 2023 10:35 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

"Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it."

Surely in 2023, storage is affordable enough there's no need to criticize
Dino for posting complete information. If mailing space is a consideration,
we could all help by keeping our replies short and to the point.

-- 
https://mail.python.org/mailman/listinfo/python-list



Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 9:05 PM, Thomas Passin wrote:


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.


Thank you. SQLite would be overkill here, plus all the machinery that I 
would need to set up to make sure that the DB is rebuilt/updated regularly.

Do you happen to know something about Whoosh? Have you ever used it?


IOW, do the bulk of the work once at startup.


Sound advice

Thank you
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 07:40:29 -0500, Dino wrote:

> The idea that someone types into an input field and matches start
> dancing in the browser made me think that this was exactly what I
> needed, and hence I figured that asking here about Whoosh would be a
> good idea. I now realize that Whoosh would be overkill for my use-case,
> as a simple (case insensitive) query substring would get me 90% of what
> I want. Speed is in the order of a few milliseconds out of the box,
> which is chump change in the context of a web UI.

For a web application the round trips to the server for the next set of 
suggestions swamp out the actual lookups. Use the developer console in 
your browser to look at the network traffic and you'll see it's busy.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 15:32:09 +, Weatherby,Gerard wrote:


> Increased performance may be achieved by building dictionaries “aa”, “ab”
> ... “zz”, and so on.

Or a trie. There have been several implementations but I believe this is 
the most active:

https://pypi.org/project/PyTrie/
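
For anyone who has not met the data structure, here is a bare-bones trie in plain Python. This is not PyTrie's API, just a sketch with made-up class and method names. Like the per-letter dictionaries, it answers prefix queries, not matches in the middle of a string.

```python
class Trie:
    """Minimal prefix tree; each node maps one character to a child node."""

    def __init__(self):
        self.children = {}
        self.values = []   # rows whose key ends exactly at this node

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.values.append(value)

    def with_prefix(self, prefix):
        """Return all stored values whose key starts with prefix."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Walk the whole subtree below the prefix node.
        found, stack = [], [node]
        while stack:
            cur = stack.pop()
            found.extend(cur.values)
            stack.extend(cur.children.values())
        return sorted(found)

trie = Trie()
# Hypothetical sample rows, keyed by their case-folded text.
for row in ["Volvo,V60", "Volvo,V90", "Volkswagen,Golf", "Acura,Integra"]:
    trie.insert(row.casefold(), row)

print(trie.with_prefix("volvo"))
```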
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino
Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.


What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.


Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:


https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use-case, 
as a simple (case insensitive) query substring would get me 90% of what 
I want. Speed is in the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.
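
A sketch of what such a simple case-insensitive substring query might look like on the server side. The rows, the name `suggest` and the `limit` parameter are assumptions for illustration, not the actual application code.

```python
from itertools import islice

# Hypothetical sample rows; the real list would be loaded once at startup.
ROWS = ["Genesis,GV60", "Volvo,V60", "Acura,Integra", "Audi,A4"]

# Fold every row once so each query only has to fold the typed text.
FOLDED = [(row.casefold(), row) for row in ROWS]

def suggest(typed, limit=10):
    """Return up to limit rows containing the typed text, ignoring case."""
    needle = typed.casefold()
    if not needle:
        return []
    matches = (row for folded, row in FOLDED if needle in folded)
    # islice stops scanning as soon as enough matches have been found.
    return list(islice(matches, limit))

print(suggest("V60"))
```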


Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:

Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field or perhaps in two parts, which a CSV file is normally used to
represent? Do you ever have questions like tell me all cars whose name
begins with the letter D and has a V6 engine? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear?

So is this a one-time search or multiple searches once loaded as in a
service that stays resident and fields requests? The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Global Regular
Expression Print.

Your example was:

  $ grep -i v60 all_cars_unique.csv
  Genesis,GV60
  Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 1:19 AM, Greg Ewing wrote:

I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.


thank you, Greg. That's what I am going to do in fact.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:

Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”, and so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.


Chances are it would only be seconds at most to build the data cache, 
and then subsequent queries would respond very quickly.




From: Python-list  on behalf of 
Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Weatherby,Gerard
“Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.”

Surely in 2023, storage is affordable enough there’s no need to criticize Dino 
for posting complete information. If mailing space is a consideration, we could 
all help by keeping our replies short and to the point.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Weatherby,Gerard
Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”, and so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.
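
One way to read this idea is as an index from every two-letter sequence to the rows that contain it. The sketch below uses made-up rows and function names and is only one possible interpretation. It also shows the memory-for-CPU trade: every row is referenced from many index entries.

```python
from collections import defaultdict

# Hypothetical sample rows.
ROWS = ["Genesis,GV60", "Volvo,V60", "Acura,Integra", "Audi,A4"]

def bigrams(text):
    """All two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

# Built once at startup: bigram -> positions of the rows containing it.
index = defaultdict(set)
for pos, row in enumerate(ROWS):
    for gram in bigrams(row.casefold()):
        index[gram].add(pos)

def search(term):
    needle = term.casefold()
    grams = bigrams(needle)
    if not grams:
        # Fewer than two characters: no bigram to look up, scan everything.
        candidates = range(len(ROWS))
    else:
        # Only rows containing every bigram of the term can possibly match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Candidates are verified because sharing bigrams does not prove a match.
    return [ROWS[i] for i in sorted(candidates) if needle in ROWS[i].casefold()]

print(search("v60"))
```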


From: Python-list  on 
behalf of Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread avi.e.gross
Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field or perhaps in two parts, which a CSV file is normally used to
represent? Do you ever have questions like tell me all cars whose name
begins with the letter D and has a V6 engine? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear? 

So is this a one-time search or multiple searches once loaded as in a
service that stays resident and fields requests? The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Global Regular
Expression Print.

Your example was:

 $ grep -i v60 all_cars_unique.csv
 Genesis,GV60
 Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.
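
The same distinction can be sketched inside Python with the `re` module. The sample lines and the name `field_match` are made up. The pattern only matches when the term is the entire text after the comma, so the substring hit in GV60 is excluded.

```python
import re

# Hypothetical sample lines in the same "make,model" layout.
LINES = ["Genesis,GV60", "Volvo,V60", "Volvo,V60 Cross Country"]

def field_match(lines, term):
    """Return lines whose second field is exactly term, ignoring case."""
    # re.escape keeps characters such as "." or "(" in the term literal.
    pattern = re.compile(rf",{re.escape(term)}$", re.IGNORECASE)
    return [ln for ln in lines if pattern.search(ln)]

print(field_match(LINES, "v60"))
```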

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...




-----Original Message-----
From: Python-list  On
Behalf Of Dino
Sent: Saturday, March 4, 2023 10:47 PM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)


Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend

smart,fortwo electric drive
smart,fortwo electric drive cabrio

-- 
https://mail.python.org/mailman/listinfo/python-list



Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Thomas Passin

On 3/4/2023 11:12 PM, Dino wrote:

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

 >>> import time
 >>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s);
 753800
 >>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() -s);
 1068300
 >>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() -s);
 862000
 >>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() -s);
 1447300
 >>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() -s);
 1511100
 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); print(len(res), res[:10])
 926900
 2 ['134676021', '313467021']
 >>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.


IOW, do the bulk of the work once at startup.
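
A rough sketch of the SQLite variant, using only the standard-library `sqlite3` module and an in-memory database rebuilt at every startup. The table name, column names and sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample rows; the real ones would come from the CSV file.
ROWS = [("Genesis", "GV60"), ("Volvo", "V60"), ("Acura", "Integra")]

# ":memory:" keeps the database in RAM, so there is no file to maintain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, model TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", ROWS)

def search(term):
    """Case-insensitive (ASCII) substring match on either column."""
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT make, model FROM cars "
        "WHERE make LIKE ? OR model LIKE ? "
        "ORDER BY make, model",
        (pattern, pattern),
    )
    return cur.fetchall()

print(search("v60"))
```

One caveat: "%" and "_" typed by the user act as LIKE wildcards unless they are escaped.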
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
Acura,RLX Sport Hybrid
Acura,RSX
Acura,SLX
Acura,TL
Acura,TLX
Acura,TSX
Acura,Vigor
Acura,ZDX
Alfa Romeo,164
Alfa Romeo,4C
Alfa Romeo,4C Spider
Alfa Romeo,Giulia
Alfa Romeo,Spider
Alfa Romeo,Stelvio
Alfa Romeo,Tonale
Aston Martin,DB11
Aston Martin,DB9
Aston Martin,DB9 GT
Aston Martin,DBS
Aston Martin,DBS Superleggera
Aston Martin,DBX
Aston Martin,Rapide
Aston Martin,Rapide S
Aston Martin,Vanquish
Aston Martin,Vanquish S
Aston Martin,Vantage
Aston Martin,Virage
Audi,100
Audi,80
Audi,90
Audi,A3
Audi,A3 Sportback e-tron
Audi,A4
Audi,A4 (2005.5)
Audi,A4 allroad
Audi,A5
Audi,A5 Sport
Audi,A6
Audi,A6 allroad
Audi,A7
Audi,A8
Audi,Cabriolet
Audi,Q3
Audi,Q4 Sportback e-tron
Audi,Q4 e-tron
Audi,Q5
Audi,Q5 Sportback
Audi,Q7
Audi,Q8
Audi,Quattro
Audi,R8
Audi,RS 3
Audi,RS 4
Audi,RS 5
Audi,RS 6
Audi,RS 7
Audi,RS Q8
Audi,RS e-tron GT
Audi,S3
Audi,S4
Audi,S4 (2005.5)
Audi,S5
Audi,S6
Audi,S7
Audi,S8
Audi,SQ5
Audi,SQ5 Sportback
Audi,SQ7
Audi,SQ8
Audi,TT
Audi,allroad
Audi,e-tron
Audi,e-tron GT
Audi,e-tron S
Audi,e-tron S Sportback
Audi,e-tron Sportback
BMW,1 Series
BMW,2 Series
BMW,3 Series
BMW,4 Series
BMW,5 Series
BMW,6 Series
BMW,7 Series
BMW,8 Series
BMW,Alpina B7
BMW,M
BMW,M2
BMW,M3
BMW,M4
BMW,M5
BMW,M6
BMW,M8
BMW,X1
BMW,X2
BMW,X3
BMW,X3 M
BMW,X4
BMW,X4 M
BMW,X5
BMW,X5 M
BMW,X6
BMW,X6 M
BMW,X7
BMW,Z3
BMW,Z4
BMW,Z4 M
BMW,Z8
BMW,i3
BMW,i4
BMW,i7
BMW,i8
BMW,iX
Bentley,Arnage
Bentley,Azure
Bentley,Azure T
Bentley,Bentayga
Bentley,Brooklands
Bentley,Continental
Bentley,Continental GT
Bentley,Flying Spur
Bentley,Mulsanne
Buick,Cascada
Buick,Century
Buick,Enclave
Buick,Encore
Buick,Encore GX
Buick,Envision
Buick,LaCrosse
Buick,LeSabre
Buick,Lucerne
Buick,Park Avenue
Buick,Rainier
Buick,Regal
Buick,Regal Sportback
Buick,Regal TourX
Buick,Rendezvous
Buick,Riviera
Buick,Roadmaster
Buick,Skylark
Buick,Terraza
Buick,Verano
Cadillac,ATS
Cadillac,ATS-V
Cadillac,Allante
Cadillac,Brougham
Cadillac,CT4
Cadillac,CT5
Cadillac,CT6
Cadillac,CT6-V
Cadillac,CTS
Cadillac,CTS-V
Cadillac,Catera
Cadillac,DTS
Cadillac,DeVille
Cadillac,ELR
Cadillac,Eldorado
Cadillac,Escalade
Cadillac,Escalade ESV
Cadillac,Escalade EXT
Cadillac,Fleetwood
Cadillac,LYRIQ
Cadillac,SRX
Cadillac,STS
Cadillac,Seville
Cadillac,Sixty Special
Cadillac,XLR
Cadillac,XT4
Cadillac,XT5
Cadillac,XT6
Cadillac,XTS
Chevrolet,1500 Extended Cab
Chevrolet,1500 Regular Cab
Chevrolet,2500 Crew Cab
Chevrolet,2500 Extended Cab
Chevrolet,2500 HD Extended Cab
Chevrolet,2500 HD Regular Cab
Chevrolet,2500 Regular Cab
Chevrolet,3500 Crew Cab
Chevrolet,3500 Extended Cab
Chevrolet,3500 HD Extended Cab
Chevrolet,3500 HD Regular Cab
Chevrolet,3500 Regular Cab
Chevrolet,APV Cargo
Chevrolet,Astro Cargo
Chevrolet,Astro Passenger
Chevrolet,Avalanche
Chevrolet,Avalanche 1500
Chevrolet,Avalanche 2500
Chevrolet,Aveo
Chevrolet,Beretta
Chevrolet,Blazer
Chevrolet,Blazer EV
Chevrolet,Bolt EUV
Chevrolet,Bolt EV
Chevrolet,Camaro
Chevrolet,Caprice
Chevrolet,Caprice Classic
Chevrolet,Captiva Sport
Chevrolet,Cavalier
Chevrolet,City Express
Chevrolet,Classic
Chevrolet,Cobalt
Chevrolet,Colorado Crew Cab
Chevrolet,Colorado Extended Cab
Chevrolet,Colorado Regular Cab
Chevrolet,Corsica
Chevrolet,Corvette
Chevrolet,Cruze
Chevrolet,Cruze Limited
Chevrolet,Equinox
Chevrolet,Equinox EV
Chevrolet,Express 1500 Cargo
Chevrolet,Express 1500 Passenger
Chevrolet,Express 2500 Cargo
Chevrolet,Express 2500 Passenger
Chevrolet,Express 3500 Cargo
Chevrolet,Express 3500 Passenger
Chevrolet,G-Series 1500
Chevrolet,G-Series 2500
Chevrolet,G-Series 3500
Chevrolet,G-Series G10
Chevrolet,G-Series G20
Chevrolet,G-Series G30
Chevrolet,HHR
Chevrolet,Impala
Chevrolet,Impala Limited
Chevrolet,Lumina
Chevrolet,Lumina APV
Chevrolet,Lumina Cargo
Chevrolet,Lumina Passenger
Chevrolet,Malibu
Chevrolet,Malibu (Classic)
Chevrolet,Malibu Limited
Chevrolet,Metro
Chevrolet,Monte Carlo
Chevrolet,Prizm
Chevrolet,S10 Blazer
Chevrolet,S10 Crew Cab
Chevrolet,S10 Extended Cab
Chevrolet,S10 Regular Cab
Chevrolet,SS
Chevrolet,SSR
Chevrolet,Silverado (Classic) 1500 Crew Cab
Chevrolet,Silverado (Classic) 1500 Extended Cab
Chevrolet,Silverado (Classic) 1500 HD Crew Cab
Chevrolet,Silverado (Classic) 1500 Regular Cab
Chevrolet,Silverado (Classic) 2500 HD Crew Cab
Chevrolet,Silverado (Classic) 2500 HD Extended Cab
Chevrolet,Silverado (Classic) 2500 HD Regular Cab
Chevrolet,Silverado (Classic) 3500 Crew Cab
Chevrolet,Silverado (Classic) 3500 Extended Cab
Chevrolet,Silverado (Classic) 3500 Regular Cab
Chevrolet,Silverado 1500 Crew Cab
Chevrolet,Silverado 1500 Double Cab
Chevrolet,Silverado 1500 Extended Cab
Chevrolet,Silverado 1500 HD Crew Cab
Chevrolet,Silverado 1500 LD Double Cab
Chevrolet,Silverado 1500 Limited Crew Cab
Chevrolet,Silverado 1500 Limited Double Cab
Chevrolet,Silverado 1500 Limited Regular Cab
Chevrolet,Silverado 1500 Regular Cab
Chevrolet,Silverado 2500 Crew Cab
Chevrolet,Silverado 

Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

>>> import time
>>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s);
753800
>>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() -s);
1068300
>>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() -s);
862000
>>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() -s);
1447300
>>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() -s);
1511100
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); print(len(res), res[:10])
926900
2 ['134676021', '313467021']
>>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?
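
A single timed run can be noisy, so a hedged way to double-check the figure is to average over many runs with the standard-library `timeit` module. The sketch assumes the list holds 30,000 elements, as stated above. The name `scan` and the run count are arbitrary.

```python
import timeit

# Same synthetic data as in the session, assuming 30,000 elements.
lis = [str(a**2 + a*3 + a) for a in range(30000)]

def scan(term):
    """Plain substring scan over the whole list."""
    return [el for el in lis if term in el]

runs = 100
# timeit calls the function `runs` times and returns the total in seconds.
total = timeit.timeit(lambda: scan("13467"), number=runs)
print(f"{total / runs * 1e3:.3f} ms per search")
```

The absolute number depends on the machine, so it is the order of magnitude that matters here.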


Dino


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-04 Thread Greg Ewing via Python-list

On 5/03/23 5:12 pm, Dino wrote:
I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?


I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list