Re: Fast full-text searching in Python (job for Whoosh?)
On 3/8/2023 3:27 PM, Peter J. Holzer wrote:
> On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:
>> On 3/7/2023 7:33 AM, Dino wrote:
>>> in fact it's a dilemma I am facing now. My back-end returns 10
>>> entries (I am limiting to max 10 matches server side for reasons you
>>> can imagine). As the user keeps typing, should I restrict the
>>> existing result set based on the new information or re-issue an API
>>> call to the server? Things get confusing pretty fast for the user.
>>> You don't want too many cooks in the kitchen, I guess.
>>> Played a little bit with both approaches in my little application.
>>> Re-requesting from the server seems to win hands down in my case.
>>> I am sure that them google engineers reached spectacular levels of
>>> UI finesse with stuff like this.
>>
>> Subject of course to trying this out, I would be inclined to send a
>> much larger list of responses to the client, and let the client reduce
>> the number to be displayed. The latency for sending a longer list will
>> be smaller than establishing a new connection or even reusing an old
>> one to send a new, short list of responses.
>
> That depends very much on how long that list can become. If it's 200
> matches - sure, send them all, even if the client will display only 10
> of them. Probably even for 2000. But if you might get 20 million
> matches you surely don't want to send them all to the client.

Yes, of course. OTOH, if you have 2000+ possibilities it's basically
pointless to send them all to the client. You can send the first 10 and
hope that will be worth something (it probably won't). You can send all
2000 and let the client show, say, the first 10, but that probably won't
be worth much either. If you have some way to prioritize them, you can
include the scores, send only the top, say, 100 to the client, and let
the client figure out what to do. If you are going to have that many
responses you will need a more complex and sophisticated approach
anyway, so the whole discussion would not be applicable.
And this would be getting miles (kms) away from the OP's situation. -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:
> On 3/7/2023 7:33 AM, Dino wrote:
> > in fact it's a dilemma I am facing now. My back-end returns 10
> > entries (I am limiting to max 10 matches server side for reasons you
> > can imagine). As the user keeps typing, should I restrict the
> > existing result set based on the new information or re-issue an API
> > call to the server? Things get confusing pretty fast for the user.
> > You don't want too many cooks in the kitchen, I guess.
> > Played a little bit with both approaches in my little application.
> > Re-requesting from the server seems to win hands down in my case.
> > I am sure that them google engineers reached spectacular levels of
> > UI finesse with stuff like this.
>
> Subject of course to trying this out, I would be inclined to send a
> much larger list of responses to the client, and let the client reduce
> the number to be displayed. The latency for sending a longer list will
> be smaller than establishing a new connection or even reusing an old
> one to send a new, short list of responses.

That depends very much on how long that list can become. If it's 200
matches - sure, send them all, even if the client will display only 10
of them. Probably even for 2000. But if you might get 20 million matches
you surely don't want to send them all to the client.

hp
Re: RE: Fast full-text searching in Python (job for Whoosh?)
On 3/7/2023 2:02 PM, avi.e.gr...@gmail.com wrote:
> Some of the discussions here leave me confused as the info we think we
> got early does not last long intact and often morphs into something
> else and we find much of the discussion is misdirected or wasted.

Apologies. I'm the OP and also the OS (original sinner). My "mistake"
was to go for a "stream of consciousness" kind of question, rather than
a well researched and thought out one.

You are correct, Avi. I have a simple web UI, I came across the Whoosh
video and got infatuated with the idea that Whoosh could be used to
create an autofill function, as my backend is already Python/Flask. As
many have observed and as I have also quickly realized, Whoosh was
overkill for my use case.

In the meantime people started asking questions, I responded and, before
you know it, we are all discussing the intricacies of JavaScript web
development in a Python forum. Should I have stopped them? How?

One thing is for sure: I am really grateful that so many used so much of
their time to help. A big thank you to each of you, friends.

Dino
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/7/2023 1:28 PM, David Lowry-Duda wrote:
> But I'll note that I use whoosh from time to time and I find it stable
> and pleasant to work with. It's true that development stopped, but it
> stopped in a very stable place. I don't recommend using whoosh here,
> but I would recommend experimenting with it more generally.

Thank you, David. Noted.
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/7/2023 7:33 AM, Dino wrote:
>> It must be nice to have a server or two...
>
> No kidding
>
> About everything else you wrote, it makes a ton of sense; in fact it's
> a dilemma I am facing now. My back-end returns 10 entries (I am
> limiting to max 10 matches server side for reasons you can imagine).
> As the user keeps typing, should I restrict the existing result set
> based on the new information or re-issue an API call to the server?
> Things get confusing pretty fast for the user. You don't want too many
> cooks in the kitchen, I guess.
>
> Played a little bit with both approaches in my little application.
> Re-requesting from the server seems to win hands down in my case. I am
> sure that them google engineers reached spectacular levels of UI
> finesse with stuff like this.

Subject of course to trying this out, I would be inclined to send a much
larger list of responses to the client, and let the client reduce the
number to be displayed. The latency for sending a longer list will be
smaller than establishing a new connection or even reusing an old one to
send a new, short list of responses.

When the user types more, it can only reduce the number of possibilities
- among the (possibly imaginary) larger original number of them. After
the next round of user typing, the client can check and see if there are
enough surviving responses to list. If not, it can then request a new
list from the server.

Using this in reverse, if the user deletes some characters from the end,
there should be no need to go back to the server. The possible responses
would already have been sent to the client. They could be interned in an
associative array keyed by the string the client had typed to get those
responses.
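The idea of interning earlier responses keyed by the typed string could
be sketched like this (Python for illustration; a browser client would
do the same in JavaScript, and fetch_from_server() plus the sample data
and MIN_SURVIVORS threshold are made up for the sketch):

```python
# Sketch of a prefix-keyed response cache: narrow cached results on new
# keystrokes, and only re-query the server when too few survivors remain.
cache = {}  # typed string -> list of candidate completions

def fetch_from_server(prefix):
    # Placeholder for the real API call; returns matching lines.
    data = ["toyota corolla", "toyota camry", "tesla model 3"]
    return [s for s in data if prefix in s]

MIN_SURVIVORS = 5  # below this, narrowing is pointless: ask the server again

def suggestions(typed):
    typed = typed.lower()
    # If results for a shorter prefix are cached, try narrowing them first.
    for n in range(len(typed) - 1, 0, -1):
        prev = cache.get(typed[:n])
        if prev is not None:
            survivors = [s for s in prev if typed in s]
            if len(survivors) >= MIN_SURVIVORS:
                cache[typed] = survivors
                return survivors
            break  # too few survivors: fall through to a fresh request
    cache[typed] = fetch_from_server(typed)
    return cache[typed]
```

Deleting characters at the end then simply re-reads `cache[shorter_prefix]`
with no server round trip.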
Re: Fast full-text searching in Python (job for Whoosh?)
On Tue, 7 Mar 2023 07:33:01 -0500, Dino wrote:
> Played a little bit with both approaches in my little application.
> Re-requesting from the server seems to win hands down in my case.

That's necessary for a non-trivial data set. Assume you get 10
suggestions after the user types 'to':

today
tomorrow
tomato
tonsil
torque
totem
toad
toque
toward
touch

If the user types 'l' next and is trying for 'tolerance' you'll need a
new set. You'll need a little refinement. If the user is a proficient
typist and wants to type 'tolerance' they may get ahead of you.

Another consideration is a less proficient typist or someone who can't
spell. Again, play with maps.google.com. They're good at it. Put '123
thomd' in the search bar. YMMV but I get 5 variations on 123 Thomas.
Working down, 'thompd' had zero matches so they backed up to 'thom'. If
you play with their search they're using some more magic too. Try '123
ellekt'. They may be using a variation on soundex or something more
sophisticated.
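A cheap approximation of that spell-tolerant behavior exists in the
Python stdlib via difflib (this is not what Google does - just a sketch
of the fallback idea, with a made-up street list and a 0.5 cutoff chosen
arbitrarily):

```python
import difflib

streets = ["thomas", "thompson", "thomond", "elliott", "electric"]

def suggest(fragment, n=5):
    # Exact prefix matches first, in data order.
    hits = [s for s in streets if s.startswith(fragment)]
    if hits:
        return hits
    # No prefix match: fall back to fuzzy matching for typos,
    # e.g. 'ellekt' should still surface 'elliott'.
    return difflib.get_close_matches(fragment, streets, n=n, cutoff=0.5)
```

Soundex or a trigram index would scale better, but for a city-sized
street list this is often good enough.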
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 11:05 PM, rbowman wrote:
> It must be nice to have a server or two...

No kidding

About everything else you wrote, it makes a ton of sense; in fact it's a
dilemma I am facing now. My back-end returns 10 entries (I am limiting
to max 10 matches server side for reasons you can imagine). As the user
keeps typing, should I restrict the existing result set based on the new
information or re-issue an API call to the server? Things get confusing
pretty fast for the user. You don't want too many cooks in the kitchen,
I guess.

Played a little bit with both approaches in my little application.
Re-requesting from the server seems to win hands down in my case. I am
sure that them google engineers reached spectacular levels of UI finesse
with stuff like this.

> On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:
>> https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript
>
> That could be annoying. My use case is address entry. When the user
> types
>
> 102 ma
>
> the suggestions might be
>
> main
> manson
> maple
> massachusetts
> masten
>
> in a simple case. When they enter 's' it's narrowed down. Typically
> I'm only dealing with a city or county so the data to be searched
> isn't huge. The maps.google.com address search covers the world and
> they're also throwing in a geographical constraint so the suggestions
> are applicable to the area you're viewing.
RE: Fast full-text searching in Python (job for Whoosh?)
Some of the discussions here leave me confused as the info we think we
got early does not last long intact and often morphs into something else
and we find much of the discussion is misdirected or wasted.

Wouldn't it have been nice if this discussion had not started with a
mention of a package/module few have heard of along with a vague request
on how best to search for lines that match something in a file?

I still do not know enough to feel comfortable even after all this time.
It now seems to be a web-based application in which a web page wants to
use autocompletion as the user types. So was the web page a static file
that the user runs, or is it dynamically created by something like a
python program? How is the fact that a user has typed a letter in a
textbox or drop-down of sorts reflected in a request being sent to a
python program to return possible choices? Is the same process called
anew each time, or is it - or perhaps a group of similar processes or
threads - going to stick around and be called repeatedly?

Lots of details are missing and in particular, much of what is being
described sounds like it is happening in the browser, presumably in
JavaScript. Also noted is that the first keystroke or two may return too
much data. So does the OP still think this is a python question? So much
of the discussion sounds like it is in the browser deciding whether to
wait for the user to type more before making a request, or throwing away
results of an older request.

So my guess is that a possible design for this amount of data may simply
be to read the file into the browser at startup, or when the first
letter is typed, and do all the searches internally, perhaps cascaded as
long as backspace or editing is not used. If the data gets much larger,
of course, then using a server makes sense, albeit it need not use
python unless lots more in the project is also ...
-Original Message-
From: Python-list On Behalf Of David Lowry-Duda
Sent: Tuesday, March 7, 2023 1:29 PM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 22:43 Sat 04 Mar 2023, Dino wrote:
> How can I implement this? A library called Whoosh seems very promising
> (albeit it's so feature-rich that it's almost like shooting a fly with
> a bazooka in my case), but I see two problems:
>
> 1) Whoosh is either abandoned or the project is a mess in terms of
> community and support
> (https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 ) and
>
> 2) Whoosh seems to be a Python only thing, which is great for now,
> but I wouldn't want this to become an obstacle should I need to port
> it to a different language at some point.

As others have noted, it sounds like relatively straightforward
implementations will be sufficient. But I'll note that I use whoosh from
time to time and I find it stable and pleasant to work with. It's true
that development stopped, but it stopped in a very stable place. I don't
recommend using whoosh here, but I would recommend experimenting with it
more generally.

- DLD
Re: Fast full-text searching in Python (job for Whoosh?)
On 22:43 Sat 04 Mar 2023, Dino wrote:
> How can I implement this? A library called Whoosh seems very promising
> (albeit it's so feature-rich that it's almost like shooting a fly with
> a bazooka in my case), but I see two problems:
>
> 1) Whoosh is either abandoned or the project is a mess in terms of
> community and support
> (https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 ) and
>
> 2) Whoosh seems to be a Python only thing, which is great for now,
> but I wouldn't want this to become an obstacle should I need to port
> it to a different language at some point.

As others have noted, it sounds like relatively straightforward
implementations will be sufficient. But I'll note that I use whoosh from
time to time and I find it stable and pleasant to work with. It's true
that development stopped, but it stopped in a very stable place. I don't
recommend using whoosh here, but I would recommend experimenting with it
more generally.

- DLD
Re: Fast full-text searching in Python (job for Whoosh?)
On 2023-03-07 04:05:19 +0000, rbowman wrote:
> On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:
> > One issue that was also correctly foreseen by some is that there's
> > going to be a new request at every user key stroke. Known problem.
> > JavaScript programmers use a trick called "debouncing" to be
> > reasonably sure that the user is done typing before a request is
> > issued:
> >
> > https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript
>
> That could be annoying. My use case is address entry. When the user
> types

It can be. The delay is short but noticeable. A somewhat smarter
strategy is to send each query as soon as the user hits the key, but
keep track of what you sent and received and discard responses to
obsolete requests. (This is necessary because if you first send "ma" and
then "mas", the response to the first query might arrive after the
response to the second query, and you don't want to display "mansion" if
the user already typed "mas".)

hp
Re: Fast full-text searching in Python (job for Whoosh?)
On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:
> One issue that was also correctly foreseen by some is that there's
> going to be a new request at every user key stroke. Known problem.
> JavaScript programmers use a trick called "debouncing" to be reasonably
> sure that the user is done typing before a request is issued:
>
> https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

That could be annoying. My use case is address entry. When the user
types

102 ma

the suggestions might be

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm
only dealing with a city or county so the data to be searched isn't
huge. The maps.google.com address search covers the world and they're
also throwing in a geographical constraint so the suggestions are
applicable to the area you're viewing.

It must be nice to have a server or two...
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/4/2023 10:43 PM, Dino wrote:
> I need fast text-search on a large (not huge, let's say 30k records
> totally) list of items. Here's a sample of my raw data (a list of US
> cars: model and make)

Gentlemen, thanks a ton to everyone who offered to help (and did help!).
I loved the part where some tried to divine the true meaning of my words
:)

What you guys wrote is correct: the grep-esque search is guaranteed to
turn up a ton of false positives, but for the autofill use-case, that's
actually OK. Users will quickly figure out what is not relevant and skip
those entries, to zero in on the suggestion that they find relevant.

One issue that was also correctly foreseen by some is that there's going
to be a new request at every user key stroke. Known problem. JavaScript
programmers use a trick called "debouncing" to be reasonably sure that
the user is done typing before a request is issued:

https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

I was able to apply that successfully and I am now very pleased with the
final result.

Apologies for posting 1400 lines of data file. Seeing that certain
newsgroups carry gigabytes of copyright infringing material must have
conveyed the wrong impression to me.

Thank you.

Dino
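Debouncing is normally done in browser JavaScript, as in the linked
post, but the mechanism translates to a few lines of Python with a
resettable timer (the class name and the 0.3 s delay here are made up
for illustration):

```python
import threading

class Debouncer:
    """Call `fn(text)` only after `delay` seconds with no further keystrokes."""

    def __init__(self, fn, delay=0.3):
        self.fn = fn
        self.delay = delay
        self._timer = None

    def keystroke(self, text):
        if self._timer is not None:
            self._timer.cancel()  # user is still typing: reset the clock
        self._timer = threading.Timer(self.delay, self.fn, args=(text,))
        self._timer.start()
```

Typing "v", "v6", "v60" in quick succession cancels the first two timers,
so only one request ("v60") actually goes out.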
Re: Fast full-text searching in Python (job for Whoosh?)
On 7/03/23 6:49 am, avi.e.gr...@gmail.com wrote:
> But the example given wanted to match something like "V6" in middle of
> the text and I do not see how that would work as you would now need to
> search 26 dictionaries completely.

It might even make things worse, as there is likely to be a lot of
overlap between entries containing "V" and entries containing "6", so
you end up searching the same data multiple times.

-- Greg
Re: Fast full-text searching in Python (job for Whoosh?)
On 7/03/23 4:35 am, Weatherby,Gerard wrote:
> If mailing space is a consideration, we could all help by keeping our
> replies short and to the point.

Indeed. A thread or two of untrimmed quoted messages is probably more
data than Dino posted!

-- Greg
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 12:49 PM, avi.e.gr...@gmail.com wrote:
> Thomas, I may have missed any discussion where the OP explained more
> about proposed usage. If the program is designed to load the full data
> once, never get updates except by re-reading some file, and then
> handles multiple requests, then some things may be worth doing.
>
> It looked to me, and I may well be wrong, like he wanted to search for
> a string anywhere in the text so a grep-like solution is a reasonable
> start with the actual data being stored as something like a list of
> character strings you can search "one line" at a time. I suspect a
> numpy variant may work faster. And of course any search function he
> builds can be made to remember some or all previous searches using a
> cache decorator. That generally uses a dictionary for the search keys
> internally.
>
> But using lots of dictionaries strikes me as only helping if you are
> searching for text anchored to the start of a line so if you ask for
> "Honda" you instead ask the dictionary called "h" and search perhaps
> just for "onda" then recombine the prefix in any results. But the
> example given wanted to match something like "V6" in middle of the
> text and I do not see how that would work as you would now need to
> search 26 dictionaries completely.

Well, that's the question, isn't it? Just how is this expected to be
used? I didn't read the initial posting that carefully, and I may have
missed something that makes a difference.

The OP gives as an example a user entering a string ("v60"). The example
is for a model designation. If we know that this entry box will only
receive models, then I would populate a dictionary using the model
numbers as keys. The number of distinct keys will probably not be that
large. For example, highly simplified of course:

>>> models = {'v60': 'Volvo', 'GV60': 'Genesis', 'cl': 'Acura'}
>>> entry = '60'
>>> candidates = (m for m in models.keys() if entry in m)
>>> list(candidates)
['v60', 'GV60']

The keys would be lower-cased. A separate dictionary would give the
complete string with the desired casing. The values could be object
references to the complete information. If there might be several
different models with the same key, then the values could be lists or
dictionaries and one would need to do some disambiguation, but that
should be simple and quick. It all depends on the planned access
patterns.

If the OP really wants full-text search in the complete unstructured
data file, then yes, a full-text indexer of some kind will be useful.
Whoosh certainly looks good though I have not used it. But for
populating dropdown lists in web forms, most likely the design of the
form will provide a structure for the various searches.

-Original Message-
From: Python-list On Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
> Not sure if this is what Thomas meant, but I was also thinking
> dictionaries.
>
> Dino could build a set of dictionaries with keys “a” through “z” that
> contain data with those letters in them. (I’m assuming case
> insensitive search) and then just search “v” if that’s what the user
> starts with.
>
> Increased performance may be achieved by building dictionaries “aa”,
> “ab” ... “zz”. And so on.
>
> Of course, it’s trading CPU for memory usage, and there’s likely a
> point at which the cost of building dictionaries exceeds the savings
> in searching.

Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.

> From: Python-list on behalf of Thomas Passin
> Date: Sunday, March 5, 2023 at 9:07 PM
> To: python-list@python.org
> Subject: Re: Fast full-text searching in Python (job for Whoosh?)
>
> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to do a fast lookup in one or more dictionaries.
>
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.
>
> IOW, do the bulk of the work once at startup.
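The load-into-SQLite-at-startup option is only a few lines with the
stdlib sqlite3 module. A sketch with a made-up table name and a tiny
in-memory data set standing in for the 30k-record file:

```python
import sqlite3

# Stand-in for the rows parsed from the CSV at startup.
rows = [("Volvo", "V60"), ("Genesis", "GV60"), ("Acura", "CL")]

conn = sqlite3.connect(":memory:")  # in-memory: rebuilt fresh at startup
conn.execute("CREATE TABLE cars (make TEXT, model TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", rows)

def search(fragment):
    # LIKE is case-insensitive for ASCII characters by default in SQLite,
    # so this matches the `grep -i` behavior for substrings of the model.
    cur = conn.execute(
        "SELECT make, model FROM cars WHERE model LIKE ?",
        (f"%{fragment}%",))
    return cur.fetchall()
```

Whether this beats a plain dictionary depends on the queries; for
multi-column filters ("name begins with D and has a V6 engine") SQL
starts to pay off.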
RE: Fast full-text searching in Python (job for Whoosh?)
Ah, thanks Dino.

Autocomplete within a web page can be an interesting scenario but also a
daunting one. Now, do you mean you have a web page with a text field,
initially I suppose empty, and the user types a single character and
rapidly a drop-down list or something is created and shown? And as they
type, it may shrink? And as soon as they select one, it is replaced in
the text field and done?

If your form has an attached function written in JavaScript, some might
load your data into the browser and do all that work from within. No
python needed.

Now if your scenario is similar to the above, or perhaps the user needs
to ask for autocompletion by using tab or something, and you want to
keep sending requests to a server, you can of course use any language on
the server. BUT I would be cautious in such a design. My guess is you
autocomplete on every keystroke and the user may well type multiple
characters resulting in multiple requests for your program. Is a new one
called every time, or is it a running service? If the latter, it pays to
read in the data once and then carefully serve it. But when you get just
the letter "h" you may not want to send and process a thousand results
but limit it to, say, the first N. If they then add an "o" to make "ho",
you may not need to do much if the search is anchored to the start,
except to search within the results of the previous search rather than
the whole data.

But have you done some searching on how autocomplete from a fixed corpus
is normally done? It is a quite common thing.

-Original Message-
From: Python-list On Behalf Of Dino
Sent: Monday, March 6, 2023 7:40 AM
To: python-list@python.org
Subject: Re: RE: Fast full-text searching in Python (job for Whoosh?)

Thank you for taking the time to write such a detailed answer, Avi. And
apologies for not providing more info from the get go. What I am trying
to achieve here is supporting autocomplete (no pun intended) in a web
form field, hence the -i case insensitive example in my initial
question.
Your points are all good, and my original question was a bit rushed. I
guess that the problem was that I saw this video:
https://www.youtube.com/watch?v=gRvZbYtwTeo_channel=NextDayVideo

The idea that someone types into an input field and matches start
dancing in the browser made me think that this was exactly what I
needed, and hence I figured that asking here about Whoosh would be a
good idea. I now realize that Whoosh would be overkill for my use-case,
as a simple (case insensitive) substring query would get me 90% of what
I want. Speed is in the order of a few milliseconds out of the box,
which is chump change in the context of a web UI.

Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:
> Dino, sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below so as not to replicate it.
>
> Your question does not look difficult unless your real question is
> about speed. Realistically, much of the time spent generally is in
> reading in a file, and the actual search can be quite rapid with a
> wide range of methods.
>
> The data looks boring enough and seems to not have much structure
> other than one comma possibly separating two fields. Do you want the
> data as one wide field, or perhaps in two parts, which a CSV file is
> normally used to represent? Do you ever have questions like "tell me
> all cars whose name begins with the letter D and has a V6 engine"? If
> so, you may want more than a vanilla search.
>
> What exactly do you want to search for? Is it a set of built-in
> searches or something the user types in?
>
> The data seems to be sorted by the first field and then by the second,
> and I did not check if some searches might be ambiguous. Can there be
> many entries containing III? Yep. Can the same words like Cruiser or
> Hybrid appear?
>
> So is this a one-time search, or multiple searches once loaded, as in
> a service that stays resident and fields requests? The latter may be
> worth speeding up.
>
> I don't NEED to know any of this but want you to know that the answer
> may depend on this and similar factors. We had a long discussion
> lately on whether to search using regular expressions or string
> methods. If your data is meant to be used once, you may not even need
> to read the file into memory, but read something like a line at a time
> and test it. Or, if you end up with more data like how many cylinders
> a car has, it may be time to read it in not just to a list of lines or
> such data structures, but get numpy/pandas involved and use their many
> search methods in something like a data.frame.
>
> Of course if you are worried about portability, keep using Get Regular
> Expression Print.
>
> Your example was:
>
> $ grep -i v60 all_cars_unique.csv
>
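For reference, the `grep -i v60` example above maps to a couple of lines
of Python. This sketch uses a small in-memory list of lines so it is
self-contained; in practice the lines would come from reading
all_cars_unique.csv once at startup:

```python
# Rough Python equivalent of `grep -i v60 all_cars_unique.csv`:
# a case-insensitive substring scan over the loaded lines.
lines = [
    "Volvo,V60",
    "Volvo,V60 Cross Country",
    "Volvo,V90",
    "Genesis,GV60",
]

def igrep(pattern, lines):
    p = pattern.lower()
    return [line for line in lines if p in line.lower()]
```

On ~30k short records a scan like this takes single-digit milliseconds,
which matches the "chump change" observation above.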
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 7:28 AM, Dino wrote:
> On 3/5/2023 9:05 PM, Thomas Passin wrote:
>> I would probably ingest the data at startup into a dictionary - or
>> perhaps several depending on your access patterns - and then you will
>> only need to do a fast lookup in one or more dictionaries. If your
>> access pattern would be easier with SQL queries, load the data into
>> an SQLite database on startup.
>
> Thank you. SQLite would be overkill here, plus all the machinery that
> I would need to set up to make sure that the DB is rebuilt/updated
> regularly. Do you happen to know something about Whoosh? Have you ever
> used it?

I know nothing about it, sorry. But anything beyond Python dictionaries
and possibly some lists strikes me as overkill for what you have
described.

>> IOW, do the bulk of the work once at startup.
>
> Sound advice
>
> Thank you
RE: Fast full-text searching in Python (job for Whoosh?)
Thomas, I may have missed any discussion where the OP explained more
about proposed usage. If the program is designed to load the full data
once, never get updates except by re-reading some file, and then handles
multiple requests, then some things may be worth doing.

It looked to me, and I may well be wrong, like he wanted to search for a
string anywhere in the text so a grep-like solution is a reasonable
start with the actual data being stored as something like a list of
character strings you can search "one line" at a time. I suspect a numpy
variant may work faster. And of course any search function he builds can
be made to remember some or all previous searches using a cache
decorator. That generally uses a dictionary for the search keys
internally.

But using lots of dictionaries strikes me as only helping if you are
searching for text anchored to the start of a line so if you ask for
"Honda" you instead ask the dictionary called "h" and search perhaps
just for "onda" then recombine the prefix in any results. But the
example given wanted to match something like "V6" in the middle of the
text and I do not see how that would work as you would now need to
search 26 dictionaries completely.

-Original Message-
From: Python-list On Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
> Not sure if this is what Thomas meant, but I was also thinking
> dictionaries.
>
> Dino could build a set of dictionaries with keys “a” through “z” that
> contain data with those letters in them. (I’m assuming case
> insensitive search) and then just search “v” if that’s what the user
> starts with.
>
> Increased performance may be achieved by building dictionaries “aa”,
> “ab” ... “zz”. And so on.
>
> Of course, it’s trading CPU for memory usage, and there’s likely a
> point at which the cost of building dictionaries exceeds the savings
> in searching.

Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.

> From: Python-list on behalf of Thomas Passin
> Date: Sunday, March 5, 2023 at 9:07 PM
> To: python-list@python.org
> Subject: Re: Fast full-text searching in Python (job for Whoosh?)
>
> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to do a fast lookup in one or more dictionaries.
>
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.
>
> IOW, do the bulk of the work once at startup.
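The cache-decorator idea is one line with functools.lru_cache, which
indeed keeps a dictionary of argument tuples internally. A sketch with a
made-up data tuple standing in for the loaded file:

```python
from functools import lru_cache

# Tuple rather than list: the data is fixed at startup, and the search
# results are safe to memoize only while the data never changes.
data = ("Volvo,V60", "Volvo,V90", "Genesis,GV60")

@lru_cache(maxsize=1024)
def search(fragment):
    # Case-insensitive substring scan; repeated queries (the common case
    # while a user types, deletes, and retypes) hit the cache instead.
    f = fragment.lower()
    return [line for line in data if f in line.lower()]
```

One caveat worth a comment in real code: lru_cache returns the same list
object to every caller, so callers must not mutate it.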
RE: Fast full-text searching in Python (job for Whoosh?)
Gerard, I was politely pointing out how it was more than the minimum necessary and might get repeated multiple times as people replied. The storage space is a resource someone else provides and I prefer not abusing it. However, since the OP seems to be asking a question focused on how long it takes to search using possible techniques, indeed some people would want the entire data to test with. In my personal view, a snippet of the data is what I need to see how it is organized, and then what I need far more is some idea of what kind of searching is needed. If I was told there would be a web page allowing users to search a web service hosting the data on a server, with one process called as much as needed that spawned threads to handle the task, I might see it as very worthwhile to read in the data once into some data structure that allows rapid searches over and over. If it is an app called ONCE as a whole for each result, as in the grep example, why bother? Just read a line at a time and be done with it. My suggestion remains my preference. The discussion is archived. Messages can optimally be trimmed as needed and not allowed to contain the full contents of the last twenty replies back and forth unless that is needed. Larger amounts of data can be offered to share and, if wanted, can be posted or sent to someone asking for it, or placed in some publicly accessible place. But my preference may not be relevant, as the forum has hosts or owners and it is what they want that counts. The data this time was not really gigantic. But I often work with data from a CSV that has hundreds of columns and hundreds of thousands or more rows, with some of the columns containing large amounts of text. But I may be interested in how to work with say just half a dozen columns and, for the purposes of my question here, perhaps a hundred representative rows. Should I share everything, or maybe save the subset and only share that? 
This is not about Python as a language but about expressing ideas and opinions on a public forum with limited resources. Yes, over the years, my combined posts probably use far more archival space. We are not asked to be sparse, just not to be wasteful. The OP may consider what he is working with as a LOT of data, but it really isn't by modern standards. -Original Message- From: Python-list On Behalf Of Weatherby,Gerard Sent: Monday, March 6, 2023 10:35 AM To: python-list@python.org Subject: Re: Fast full-text searching in Python (job for Whoosh?) "Dino, Sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it." Surely in 2023, storage is affordable enough that there's no need to criticize Dino for posting complete information. If mailing space is a consideration, we could all help by keeping our replies short and to the point. -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/5/2023 9:05 PM, Thomas Passin wrote: I would probably ingest the data at startup into a dictionary - or perhaps several depending on your access patterns - and then you will only need to do a fast lookup in one or more dictionaries. If your access pattern would be easier with SQL queries, load the data into an SQLite database on startup. Thank you. SQLite would be overkill here, plus all the machinery that I would need to set up to make sure that the DB is rebuilt/updated regularly. Do you happen to know something about Whoosh? Have you ever used it? IOW, do the bulk of the work once at startup. Sound advice. Thank you -- https://mail.python.org/mailman/listinfo/python-list
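For what it's worth, the machinery for the SQLite approach suggested here can be quite small if the database lives in memory and is rebuilt at every startup. A rough sketch (table and column names invented, sample rows invented) of loading the rows once and serving each keystroke with a LIKE query:

```python
import sqlite3

# Invented sample rows; in practice these would come from the CSV file.
rows = [("Acura", "ILX"), ("Genesis", "GV60"), ("Volvo", "V60")]

# In-memory DB: nothing to rebuild or update on disk; it is recreated
# from the source file every time the process starts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (make TEXT, model TEXT)")
con.executemany("INSERT INTO cars VALUES (?, ?)", rows)

def suggest(q, limit=10):
    # LIKE is case-insensitive for ASCII letters by default in SQLite.
    pat = f"%{q}%"
    cur = con.execute(
        "SELECT make, model FROM cars WHERE make LIKE ? OR model LIKE ? LIMIT ?",
        (pat, pat, limit))
    return cur.fetchall()
```

Whether this beats a plain list scan at 30k rows is an empirical question; the point is only that the setup cost is a handful of lines, not heavy machinery.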
Re: Fast full-text searching in Python (job for Whoosh?)
On Mon, 6 Mar 2023 07:40:29 -0500, Dino wrote: > The idea that someone types into an input field and matches start > dancing in the browser made me think that this was exactly what I > needed, and hence I figured that asking here about Whoosh would be a > good idea. I now realize that Whoosh would be overkill for my use-case, > as a simple (case-insensitive) substring query would get me 90% of what > I want. Speed is in the order of a few milliseconds out of the box, > which is chump change in the context of a web UI. For a web application the round trips to the server for the next set of suggestions swamp out the actual lookups. Use the developer console in your browser to look at the network traffic and you'll see it's busy. -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On Mon, 6 Mar 2023 15:32:09 +, Weatherby,Gerard wrote: > Increased performance may be achieved by building dictionaries “aa”,”ab” > ... “zz. And so on. Or a trie. There have been several implementations but I believe this is the most active: https://pypi.org/project/PyTrie/ -- https://mail.python.org/mailman/listinfo/python-list
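For readers who have not met the structure: a trie keys each node on one character, so a prefix lookup costs time proportional to the prefix length rather than to the number of entries. Below is a minimal hand-rolled sketch of the idea, not PyTrie's actual API; the sample words are invented, and "$" is used as an internal end-of-word marker on the assumption that it never appears in the data:

```python
class Trie:
    """Minimal prefix trie for prefix-anchored autocomplete (a sketch)."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node["$"] = word  # end-of-word marker; keeps the original casing

    def starting_with(self, prefix):
        # Walk down the prefix, then collect every stored word below it.
        node = self.root
        for ch in prefix.lower():
            if ch not in node:
                return []
            node = node[ch]
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            for key, child in n.items():
                if key == "$":
                    out.append(child)
                else:
                    stack.append(child)
        return out

t = Trie()
for word in ("Volvo", "V60", "Vigor", "MDX"):
    t.insert(word)
```

Note this shares the limitation discussed elsewhere in the thread: it only accelerates prefix-anchored queries, not mid-string matches like "V6" inside "GV60".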
Re: RE: Fast full-text searching in Python (job for Whoosh?)
Thank you for taking the time to write such a detailed answer, Avi. And apologies for not providing more info from the get-go. What I am trying to achieve here is supporting autocomplete (no pun intended) in a web form field, hence the -i case-insensitive example in my initial question. Your points are all good, and my original question was a bit rushed. I guess that the problem was that I saw this video: https://www.youtube.com/watch?v=gRvZbYtwTeo_channel=NextDayVideo The idea that someone types into an input field and matches start dancing in the browser made me think that this was exactly what I needed, and hence I figured that asking here about Whoosh would be a good idea. I now realize that Whoosh would be overkill for my use-case, as a simple (case-insensitive) substring query would get me 90% of what I want. Speed is in the order of a few milliseconds out of the box, which is chump change in the context of a web UI. Thank you again for taking the time to look at my question Dino On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote: Dino, Sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it. Your question does not look difficult unless your real question is about speed. Realistically, much of the time spent generally is in reading in a file, and the actual search can be quite rapid with a wide range of methods. The data looks boring enough and seems to not have much structure other than one comma possibly separating two fields. Do you want the data as one wide field or perhaps in two parts, which a CSV file is normally used to represent? Do you ever have questions like "tell me all cars whose name begins with the letter D and has a V6 engine"? If so, you may want more than a vanilla search. What exactly do you want to search for? Is it a set of built-in searches or something the user types in? 
The data seems to be sorted by the first field and then by the second and I did not check if some searches might be ambiguous. Can there be many entries containing III? Yep. Can the same words like Cruiser or Hybrid appear? So is this a one-time search or multiple searches once loaded as in a service that stays resident and fields requests. The latter may be worth speeding up. I don't NEED to know any of this but want you to know that the answer may depend on this and similar factors. We had a long discussion lately on whether to search using regular expressions or string methods. If your data is meant to be used once, you may not even need to read the file into memory, but read something like a line at a time and test it. Or, if you end up with more data like how many cylinders a car has, it may be time to read it in not just to a list of lines or such data structures, but get numpy/pandas involved and use their many search methods in something like a data.frame. Of course if you are worried about portability, keep using Get Regular Expression Print. Your example was: $ grep -i v60 all_cars_unique.csv Genesis,GV60 Volvo,V60 You seem to have wanted case folding and that is NOT a normal search. And your search is matching anything on any line. If you wanted only a complete field, such as all text after a comma to the end of the line, you could use grep specifications to say that. But once inside python, you would need to make choices depending on what kind of searches you want to allow but also things like do you want all matching lines shown if you search for say "a" ... -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/5/2023 1:19 AM, Greg Ewing wrote: I just did a similar test with your actual data and got about the same result. If that's fast enough for you, then you don't need to do anything fancy. thank you, Greg. That's what I am going to do in fact. -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 10:32 AM, Weatherby,Gerard wrote: Not sure if this is what Thomas meant, but I was also thinking dictionaries. Dino could build a set of dictionaries with keys “a” through “z” that contain data with those letters in them. (I’m assuming case-insensitive search) and then just search “v” if that’s what the user starts with. Increased performance may be achieved by building dictionaries “aa”, “ab” ... “zz”, and so on. Of course, it’s trading CPU for memory usage, and there’s likely a point at which the cost of building dictionaries exceeds the savings in searching. Chances are it would only be seconds at most to build the data cache, and then subsequent queries would respond very quickly. From: Python-list on behalf of Thomas Passin Date: Sunday, March 5, 2023 at 9:07 PM To: python-list@python.org Subject: Re: Fast full-text searching in Python (job for Whoosh?) I would probably ingest the data at startup into a dictionary - or perhaps several depending on your access patterns - and then you will only need to do a fast lookup in one or more dictionaries. If your access pattern would be easier with SQL queries, load the data into an SQLite database on startup. IOW, do the bulk of the work once at startup. -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
“Dino, Sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it.” Surely in 2023, storage is affordable enough that there’s no need to criticize Dino for posting complete information. If mailing space is a consideration, we could all help by keeping our replies short and to the point. -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
Not sure if this is what Thomas meant, but I was also thinking dictionaries. Dino could build a set of dictionaries with keys “a” through “z” that contain data with those letters in them. (I’m assuming case-insensitive search) and then just search “v” if that’s what the user starts with. Increased performance may be achieved by building dictionaries “aa”, “ab” ... “zz”, and so on. Of course, it’s trading CPU for memory usage, and there’s likely a point at which the cost of building dictionaries exceeds the savings in searching. From: Python-list on behalf of Thomas Passin Date: Sunday, March 5, 2023 at 9:07 PM To: python-list@python.org Subject: Re: Fast full-text searching in Python (job for Whoosh?) I would probably ingest the data at startup into a dictionary - or perhaps several depending on your access patterns - and then you will only need to do a fast lookup in one or more dictionaries. If your access pattern would be easier with SQL queries, load the data into an SQLite database on startup. IOW, do the bulk of the work once at startup. -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
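The bucketing idea above might look something like the following sketch (data and names invented). One way to keep mid-string matches such as "V6" workable, rather than only prefix-anchored ones, is to file each line under every letter it contains, so a query needs to scan only the bucket for its first character:

```python
from collections import defaultdict

# Invented sample data.
lines = ["Acura,ILX", "Genesis,GV60", "Volvo,V60", "BMW,M3"]

# Build once at startup: each line is filed under every distinct
# character it contains (case-folded), trading memory for scan time.
buckets = defaultdict(list)
for line in lines:
    for ch in set(line.lower()):
        buckets[ch].append(line)

def search(q):
    q = q.lower()
    # Only lines containing q's first character can possibly match,
    # so the substring test runs over one bucket, not the whole list.
    return [line for line in buckets.get(q[0], []) if q in line.lower()]
```

The win depends on how selective the first character is; a query starting with a common letter still scans a large bucket.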
RE: Fast full-text searching in Python (job for Whoosh?)
Dino, Sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it. Your question does not look difficult unless your real question is about speed. Realistically, much of the time spent generally is in reading in a file, and the actual search can be quite rapid with a wide range of methods. The data looks boring enough and seems to not have much structure other than one comma possibly separating two fields. Do you want the data as one wide field or perhaps in two parts, which a CSV file is normally used to represent? Do you ever have questions like "tell me all cars whose name begins with the letter D and has a V6 engine"? If so, you may want more than a vanilla search. What exactly do you want to search for? Is it a set of built-in searches or something the user types in? The data seems to be sorted by the first field and then by the second, and I did not check if some searches might be ambiguous. Can there be many entries containing III? Yep. Can the same words like Cruiser or Hybrid appear? So is this a one-time search, or multiple searches once loaded, as in a service that stays resident and fields requests? The latter may be worth speeding up. I don't NEED to know any of this but want you to know that the answer may depend on this and similar factors. We had a long discussion lately on whether to search using regular expressions or string methods. If your data is meant to be used once, you may not even need to read the file into memory, but read something like a line at a time and test it. Or, if you end up with more data, like how many cylinders a car has, it may be time to read it in not just to a list of lines or such data structures, but get numpy/pandas involved and use their many search methods in something like a data.frame. Of course, if you are worried about portability, keep using Get Regular Expression Print. 
Your example was: $ grep -i v60 all_cars_unique.csv Genesis,GV60 Volvo,V60 You seem to have wanted case folding and that is NOT a normal search. And your search is matching anything on any line. If you wanted only a complete field, such as all text after a comma to the end of the line, you could use grep specifications to say that. But once inside python, you would need to make choices depending on what kind of searches you want to allow but also things like do you want all matching lines shown if you search for say "a" ... -Original Message- From: Python-list On Behalf Of Dino Sent: Saturday, March 4, 2023 10:47 PM To: python-list@python.org Subject: Re: Fast full-text searching in Python (job for Whoosh?) Here's the complete data file should anyone care. Acura,CL Acura,ILX Acura,Integra Acura,Legend smart,fortwo electric drive smart,fortwo electric drive cabrio -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
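To illustrate the point above about matching a complete field rather than anywhere on the line: `grep -i v60` happily matches the "v60" inside a make name too, whereas the sketch below (sample lines invented) restricts the case-insensitive match to the model field after the comma:

```python
import re

# Invented sample lines in the make,model format from the thread.
lines = ["Genesis,GV60", "Volvo,V60", "Acura,ILX"]

def match_model(term):
    # Compile once per query; search only the text after the first comma.
    pat = re.compile(re.escape(term), re.IGNORECASE)
    return [ln for ln in lines
            if pat.search(ln.split(",", 1)[1])]
```

So match_model("volvo") finds nothing, because "Volvo" lives in the make field, while a line-wide `grep -i volvo` would match.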
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/4/2023 11:12 PM, Dino wrote: On 3/4/2023 10:43 PM, Dino wrote: I need fast text-search on a large (not huge, let's say 30k records totally) list of items. Here's a sample of my raw data (a list of US cars: model and make) I suspect I am really close to answering my own question... >>> import time >>> lis = [str(a**2+a*3+a) for a in range(0,30000)] >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); 753800 >>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() -s); 1068300 >>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() -s); 862000 >>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() -s); 1447300 >>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() -s); 1511100 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); print(len(res), res[:10]) 926900 2 ['134676021', '313467021'] >>> I can do a substring search in a list of 30k elements in less than 2ms with Python. Is my reasoning sound? I would probably ingest the data at startup into a dictionary - or perhaps several depending on your access patterns - and then you will only need to do a fast lookup in one or more dictionaries. If your access pattern would be easier with SQL queries, load the data into an SQLite database on startup. IOW, do the bulk of the work once at startup. -- https://mail.python.org/mailman/listinfo/python-list
Re: Fast full-text searching in Python (job for Whoosh?)
Here's the complete data file should anyone care. Acura,CL Acura,ILX Acura,Integra Acura,Legend Acura,MDX Acura,MDX Sport Hybrid Acura,NSX Acura,RDX Acura,RL Acura,RLX Acura,RLX Sport Hybrid Acura,RSX Acura,SLX Acura,TL Acura,TLX Acura,TSX Acura,Vigor Acura,ZDX Alfa Romeo,164 Alfa Romeo,4C Alfa Romeo,4C Spider Alfa Romeo,Giulia Alfa Romeo,Spider Alfa Romeo,Stelvio Alfa Romeo,Tonale Aston Martin,DB11 Aston Martin,DB9 Aston Martin,DB9 GT Aston Martin,DBS Aston Martin,DBS Superleggera Aston Martin,DBX Aston Martin,Rapide Aston Martin,Rapide S Aston Martin,Vanquish Aston Martin,Vanquish S Aston Martin,Vantage Aston Martin,Virage Audi,100 Audi,80 Audi,90 Audi,A3 Audi,A3 Sportback e-tron Audi,A4 Audi,A4 (2005.5) Audi,A4 allroad Audi,A5 Audi,A5 Sport Audi,A6 Audi,A6 allroad Audi,A7 Audi,A8 Audi,Cabriolet Audi,Q3 Audi,Q4 Sportback e-tron Audi,Q4 e-tron Audi,Q5 Audi,Q5 Sportback Audi,Q7 Audi,Q8 Audi,Quattro Audi,R8 Audi,RS 3 Audi,RS 4 Audi,RS 5 Audi,RS 6 Audi,RS 7 Audi,RS Q8 Audi,RS e-tron GT Audi,S3 Audi,S4 Audi,S4 (2005.5) Audi,S5 Audi,S6 Audi,S7 Audi,S8 Audi,SQ5 Audi,SQ5 Sportback Audi,SQ7 Audi,SQ8 Audi,TT Audi,allroad Audi,e-tron Audi,e-tron GT Audi,e-tron S Audi,e-tron S Sportback Audi,e-tron Sportback BMW,1 Series BMW,2 Series BMW,3 Series BMW,4 Series BMW,5 Series BMW,6 Series BMW,7 Series BMW,8 Series BMW,Alpina B7 BMW,M BMW,M2 BMW,M3 BMW,M4 BMW,M5 BMW,M6 BMW,M8 BMW,X1 BMW,X2 BMW,X3 BMW,X3 M BMW,X4 BMW,X4 M BMW,X5 BMW,X5 M BMW,X6 BMW,X6 M BMW,X7 BMW,Z3 BMW,Z4 BMW,Z4 M BMW,Z8 BMW,i3 BMW,i4 BMW,i7 BMW,i8 BMW,iX Bentley,Arnage Bentley,Azure Bentley,Azure T Bentley,Bentayga Bentley,Brooklands Bentley,Continental Bentley,Continental GT Bentley,Flying Spur Bentley,Mulsanne Buick,Cascada Buick,Century Buick,Enclave Buick,Encore Buick,Encore GX Buick,Envision Buick,LaCrosse Buick,LeSabre Buick,Lucerne Buick,Park Avenue Buick,Rainier Buick,Regal Buick,Regal Sportback Buick,Regal TourX Buick,Rendezvous Buick,Riviera Buick,Roadmaster Buick,Skylark Buick,Terraza Buick,Verano 
Cadillac,ATS Cadillac,ATS-V Cadillac,Allante Cadillac,Brougham Cadillac,CT4 Cadillac,CT5 Cadillac,CT6 Cadillac,CT6-V Cadillac,CTS Cadillac,CTS-V Cadillac,Catera Cadillac,DTS Cadillac,DeVille Cadillac,ELR Cadillac,Eldorado Cadillac,Escalade Cadillac,Escalade ESV Cadillac,Escalade EXT Cadillac,Fleetwood Cadillac,LYRIQ Cadillac,SRX Cadillac,STS Cadillac,Seville Cadillac,Sixty Special Cadillac,XLR Cadillac,XT4 Cadillac,XT5 Cadillac,XT6 Cadillac,XTS Chevrolet,1500 Extended Cab Chevrolet,1500 Regular Cab Chevrolet,2500 Crew Cab Chevrolet,2500 Extended Cab Chevrolet,2500 HD Extended Cab Chevrolet,2500 HD Regular Cab Chevrolet,2500 Regular Cab Chevrolet,3500 Crew Cab Chevrolet,3500 Extended Cab Chevrolet,3500 HD Extended Cab Chevrolet,3500 HD Regular Cab Chevrolet,3500 Regular Cab Chevrolet,APV Cargo Chevrolet,Astro Cargo Chevrolet,Astro Passenger Chevrolet,Avalanche Chevrolet,Avalanche 1500 Chevrolet,Avalanche 2500 Chevrolet,Aveo Chevrolet,Beretta Chevrolet,Blazer Chevrolet,Blazer EV Chevrolet,Bolt EUV Chevrolet,Bolt EV Chevrolet,Camaro Chevrolet,Caprice Chevrolet,Caprice Classic Chevrolet,Captiva Sport Chevrolet,Cavalier Chevrolet,City Express Chevrolet,Classic Chevrolet,Cobalt Chevrolet,Colorado Crew Cab Chevrolet,Colorado Extended Cab Chevrolet,Colorado Regular Cab Chevrolet,Corsica Chevrolet,Corvette Chevrolet,Cruze Chevrolet,Cruze Limited Chevrolet,Equinox Chevrolet,Equinox EV Chevrolet,Express 1500 Cargo Chevrolet,Express 1500 Passenger Chevrolet,Express 2500 Cargo Chevrolet,Express 2500 Passenger Chevrolet,Express 3500 Cargo Chevrolet,Express 3500 Passenger Chevrolet,G-Series 1500 Chevrolet,G-Series 2500 Chevrolet,G-Series 3500 Chevrolet,G-Series G10 Chevrolet,G-Series G20 Chevrolet,G-Series G30 Chevrolet,HHR Chevrolet,Impala Chevrolet,Impala Limited Chevrolet,Lumina Chevrolet,Lumina APV Chevrolet,Lumina Cargo Chevrolet,Lumina Passenger Chevrolet,Malibu Chevrolet,Malibu (Classic) Chevrolet,Malibu Limited Chevrolet,Metro Chevrolet,Monte Carlo Chevrolet,Prizm 
Chevrolet,S10 Blazer Chevrolet,S10 Crew Cab Chevrolet,S10 Extended Cab Chevrolet,S10 Regular Cab Chevrolet,SS Chevrolet,SSR Chevrolet,Silverado (Classic) 1500 Crew Cab Chevrolet,Silverado (Classic) 1500 Extended Cab Chevrolet,Silverado (Classic) 1500 HD Crew Cab Chevrolet,Silverado (Classic) 1500 Regular Cab Chevrolet,Silverado (Classic) 2500 HD Crew Cab Chevrolet,Silverado (Classic) 2500 HD Extended Cab Chevrolet,Silverado (Classic) 2500 HD Regular Cab Chevrolet,Silverado (Classic) 3500 Crew Cab Chevrolet,Silverado (Classic) 3500 Extended Cab Chevrolet,Silverado (Classic) 3500 Regular Cab Chevrolet,Silverado 1500 Crew Cab Chevrolet,Silverado 1500 Double Cab Chevrolet,Silverado 1500 Extended Cab Chevrolet,Silverado 1500 HD Crew Cab Chevrolet,Silverado 1500 LD Double Cab Chevrolet,Silverado 1500 Limited Crew Cab Chevrolet,Silverado 1500 Limited Double Cab Chevrolet,Silverado 1500 Limited Regular Cab Chevrolet,Silverado 1500 Regular Cab Chevrolet,Silverado 2500 Crew Cab Chevrolet,Silverado
Re: Fast full-text searching in Python (job for Whoosh?)
On 3/4/2023 10:43 PM, Dino wrote: I need fast text-search on a large (not huge, let's say 30k records totally) list of items. Here's a sample of my raw data (a list of US cars: model and make) I suspect I am really close to answering my own question... >>> import time >>> lis = [str(a**2+a*3+a) for a in range(0,30000)] >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); 753800 >>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() -s); 1068300 >>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() -s); 862000 >>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() -s); 1447300 >>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() -s); 1511100 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() -s); print(len(res), res[:10]) 926900 2 ['134676021', '313467021'] >>> I can do a substring search in a list of 30k elements in less than 2ms with Python. Is my reasoning sound? Dino -- https://mail.python.org/mailman/listinfo/python-list
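The same measurement can be repeated with the stdlib timeit module, which picks a suitable clock and averages over many runs instead of timing a single pass. A sketch, rebuilding the synthetic list at the stated size of 30,000 entries:

```python
import timeit

# Same synthetic data as the interactive session: ~30k stringified numbers.
lis = [str(a**2 + a*3 + a) for a in range(30000)]

def scan(needle):
    # Linear substring scan over the whole list.
    return [el for el in lis if needle in el]

# Best of 3 repeats, 100 scans each; report milliseconds per scan.
per_call = min(timeit.repeat(lambda: scan("13467"), number=100, repeat=3)) / 100
print(f"{per_call * 1000:.2f} ms per scan")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; a one-shot process_time_ns delta can easily vary by a factor of two, as the numbers in the session above show.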
Re: Fast full-text searching in Python (job for Whoosh?)
On 5/03/23 5:12 pm, Dino wrote: I can do a substring search in a list of 30k elements in less than 2ms with Python. Is my reasoning sound? I just did a similar test with your actual data and got about the same result. If that's fast enough for you, then you don't need to do anything fancy. -- Greg -- https://mail.python.org/mailman/listinfo/python-list