Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 2:02 PM, avi.e.gr...@gmail.com wrote:

Some of the discussions here leave me confused as the info we think we got
early does not last long intact and often morphs into something else and we
find much of the discussion is misdirected or wasted.



Apologies. I'm the OP and also the OS (original sinner). My "mistake" 
was to go for a "stream of consciousness" kind of question rather than 
a well-researched and thought-out one.


You are correct, Avi. I have a simple web UI. I came across the Whoosh 
video and got infatuated with the idea that Whoosh could be used to 
create an autofill function, as my backend is already Python/Flask. As 
many have observed, and as I also quickly realized, Whoosh was 
overkill for my use case. In the meantime people started asking 
questions, I responded and, before you know it, we were all discussing 
the intricacies of JavaScript web development in a Python forum. Should 
I have stopped them? How?


One thing is for sure: I am really grateful that so many used so much of 
their time to help.


A big thank you to each of you, friends.

Dino


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 1:28 PM, David Lowry-Duda wrote:

But I'll note that I use whoosh from time to time and I find it stable 
and pleasant to work with. It's true that development stopped, but it 
stopped in a very stable place. I don't recommend using whoosh here, but 
I would recommend experimenting with it more generally.


Thank you, David. Noted.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread Dino

On 3/6/2023 11:05 PM, rbowman wrote:


It must be nice to have a server or two...


No kidding

About everything else you wrote, it makes a ton of sense; in fact it's a 
dilemma I am facing now. My back-end returns 10 entries (I am limiting 
to max 10 matches server side for reasons you can imagine).
As the user keeps typing, should I restrict the existing result set 
based on the new information, or re-issue an API call to the server?
Things get confusing pretty fast for the user. You don't want too many 
cooks in the kitchen, I guess.
I played a little bit with both approaches in my little application. 
Re-requesting from the server seems to win hands down in my case.
I am sure that them Google engineers reached spectacular levels of UI 
finesse with stuff like this.



On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript


That could be annoying. My use case is address entry. When the user types

102 ma

the suggestions might be

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm
only dealing with a city or county so the data to be searched isn't huge.
The maps.google.com address search covers the world and they're also
throwing in a geographical constraint so the suggestions are applicable to
the area you're viewing.  



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


Gentlemen, thanks a ton to everyone who offered to help (and did help!). 
I loved the part where some tried to divine the true meaning of my words :)


What you guys wrote is correct: the grep-esque search is guaranteed to 
turn up a ton of false positives, but for the autofill use-case that's 
actually OK. Users will quickly figure out what is not relevant, skip 
those entries, and zero in on the suggestion they find relevant.


One issue that was also correctly foreseen by some is that there's going 
to be a new request at every user keystroke. Known problem. JavaScript 
programmers use a trick called "debouncing" to be reasonably sure that 
the user is done typing before a request is issued:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

I was able to apply that successfully and I am now very pleased with the 
final result.
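
In case it helps anyone who lands on this thread later, the server side 
boils down to something like the sketch below (a minimal sketch, not my 
actual code; the route and file name are made up): load the CSV once at 
startup, do a case-insensitive substring match, and cap the response at 
10 rows as discussed elsewhere in the thread.

import csv
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load once at startup: ~1400 rows is tiny.
with open("all_cars_unique.csv", newline="") as f:
    CARS = [{"manufacturer": make, "model": model}
            for make, model in csv.reader(f)]

@app.route("/autofill")
def autofill():
    q = request.args.get("q", "").lower()
    hits = [row for row in CARS
            if q in row["model"].lower()
            or q in row["manufacturer"].lower()]
    return jsonify(hits[:10])  # cap server side, as discussed above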


Apologies if I posted 1400 lines of data file. Seeing that certain 
newsgroups carry gigabytes of copyright-infringing material must have 
given me the wrong impression.


Thank you.

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 9:05 PM, Thomas Passin wrote:


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.


Thank you. SQLite would be overkill here, plus all the machinery that I 
would need to set up to make sure that the DB is rebuilt/updated regularly.

Do you happen to know something about Whoosh? have you ever used it?


IOW, do the bulk of the work once at startup.


Sound advice
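
To make sure I understood, this is roughly the kind of startup 
ingestion I have in mind (a toy sketch based on the two-column 
make,model CSV from my original post; names are made up):

import csv
from collections import defaultdict

rows = []                           # full records, in file order
models_by_make = defaultdict(list)  # e.g. "Acura" -> list of models

# Done once at startup; after that every request is a cheap lookup.
with open("all_cars_unique.csv", newline="") as f:
    for make, model in csv.reader(f):
        rows.append((make, model))
        models_by_make[make].append(model)

# e.g. models_by_make.get("Volvo", [])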

Thank you
--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino
Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.


What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.


Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:


https://www.youtube.com/watch?v=gRvZbYtwTeo_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use-case, 
as a simple (case-insensitive) substring query gets me 90% of what 
I want. Speed is in the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.


Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:

Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field, or perhaps in two parts, which is what a CSV file is normally used to
represent? Do you ever have questions like "tell me all cars whose name
begins with the letter D and has a V6 engine"? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear?

So is this a one-time search or multiple searches once loaded as in a
service that stays resident and fields requests. The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Get Regular
Expression Print.

Your example was:

  $ grep -i v60 all_cars_unique.csv
  Genesis,GV60
  Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 1:19 AM, Greg Ewing wrote:

I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.


thank you, Greg. That's what I am going to do in fact.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
Acura,RLX Sport Hybrid
Acura,RSX
Acura,SLX
Acura,TL
Acura,TLX
Acura,TSX
Acura,Vigor
Acura,ZDX
Alfa Romeo,164
Alfa Romeo,4C
Alfa Romeo,4C Spider
Alfa Romeo,Giulia
Alfa Romeo,Spider
Alfa Romeo,Stelvio
Alfa Romeo,Tonale
Aston Martin,DB11
Aston Martin,DB9
Aston Martin,DB9 GT
Aston Martin,DBS
Aston Martin,DBS Superleggera
Aston Martin,DBX
Aston Martin,Rapide
Aston Martin,Rapide S
Aston Martin,Vanquish
Aston Martin,Vanquish S
Aston Martin,Vantage
Aston Martin,Virage
Audi,100
Audi,80
Audi,90
Audi,A3
Audi,A3 Sportback e-tron
Audi,A4
Audi,A4 (2005.5)
Audi,A4 allroad
Audi,A5
Audi,A5 Sport
Audi,A6
Audi,A6 allroad
Audi,A7
Audi,A8
Audi,Cabriolet
Audi,Q3
Audi,Q4 Sportback e-tron
Audi,Q4 e-tron
Audi,Q5
Audi,Q5 Sportback
Audi,Q7
Audi,Q8
Audi,Quattro
Audi,R8
Audi,RS 3
Audi,RS 4
Audi,RS 5
Audi,RS 6
Audi,RS 7
Audi,RS Q8
Audi,RS e-tron GT
Audi,S3
Audi,S4
Audi,S4 (2005.5)
Audi,S5
Audi,S6
Audi,S7
Audi,S8
Audi,SQ5
Audi,SQ5 Sportback
Audi,SQ7
Audi,SQ8
Audi,TT
Audi,allroad
Audi,e-tron
Audi,e-tron GT
Audi,e-tron S
Audi,e-tron S Sportback
Audi,e-tron Sportback
BMW,1 Series
BMW,2 Series
BMW,3 Series
BMW,4 Series
BMW,5 Series
BMW,6 Series
BMW,7 Series
BMW,8 Series
BMW,Alpina B7
BMW,M
BMW,M2
BMW,M3
BMW,M4
BMW,M5
BMW,M6
BMW,M8
BMW,X1
BMW,X2
BMW,X3
BMW,X3 M
BMW,X4
BMW,X4 M
BMW,X5
BMW,X5 M
BMW,X6
BMW,X6 M
BMW,X7
BMW,Z3
BMW,Z4
BMW,Z4 M
BMW,Z8
BMW,i3
BMW,i4
BMW,i7
BMW,i8
BMW,iX
Bentley,Arnage
Bentley,Azure
Bentley,Azure T
Bentley,Bentayga
Bentley,Brooklands
Bentley,Continental
Bentley,Continental GT
Bentley,Flying Spur
Bentley,Mulsanne
Buick,Cascada
Buick,Century
Buick,Enclave
Buick,Encore
Buick,Encore GX
Buick,Envision
Buick,LaCrosse
Buick,LeSabre
Buick,Lucerne
Buick,Park Avenue
Buick,Rainier
Buick,Regal
Buick,Regal Sportback
Buick,Regal TourX
Buick,Rendezvous
Buick,Riviera
Buick,Roadmaster
Buick,Skylark
Buick,Terraza
Buick,Verano
Cadillac,ATS
Cadillac,ATS-V
Cadillac,Allante
Cadillac,Brougham
Cadillac,CT4
Cadillac,CT5
Cadillac,CT6
Cadillac,CT6-V
Cadillac,CTS
Cadillac,CTS-V
Cadillac,Catera
Cadillac,DTS
Cadillac,DeVille
Cadillac,ELR
Cadillac,Eldorado
Cadillac,Escalade
Cadillac,Escalade ESV
Cadillac,Escalade EXT
Cadillac,Fleetwood
Cadillac,LYRIQ
Cadillac,SRX
Cadillac,STS
Cadillac,Seville
Cadillac,Sixty Special
Cadillac,XLR
Cadillac,XT4
Cadillac,XT5
Cadillac,XT6
Cadillac,XTS
Chevrolet,1500 Extended Cab
Chevrolet,1500 Regular Cab
Chevrolet,2500 Crew Cab
Chevrolet,2500 Extended Cab
Chevrolet,2500 HD Extended Cab
Chevrolet,2500 HD Regular Cab
Chevrolet,2500 Regular Cab
Chevrolet,3500 Crew Cab
Chevrolet,3500 Extended Cab
Chevrolet,3500 HD Extended Cab
Chevrolet,3500 HD Regular Cab
Chevrolet,3500 Regular Cab
Chevrolet,APV Cargo
Chevrolet,Astro Cargo
Chevrolet,Astro Passenger
Chevrolet,Avalanche
Chevrolet,Avalanche 1500
Chevrolet,Avalanche 2500
Chevrolet,Aveo
Chevrolet,Beretta
Chevrolet,Blazer
Chevrolet,Blazer EV
Chevrolet,Bolt EUV
Chevrolet,Bolt EV
Chevrolet,Camaro
Chevrolet,Caprice
Chevrolet,Caprice Classic
Chevrolet,Captiva Sport
Chevrolet,Cavalier
Chevrolet,City Express
Chevrolet,Classic
Chevrolet,Cobalt
Chevrolet,Colorado Crew Cab
Chevrolet,Colorado Extended Cab
Chevrolet,Colorado Regular Cab
Chevrolet,Corsica
Chevrolet,Corvette
Chevrolet,Cruze
Chevrolet,Cruze Limited
Chevrolet,Equinox
Chevrolet,Equinox EV
Chevrolet,Express 1500 Cargo
Chevrolet,Express 1500 Passenger
Chevrolet,Express 2500 Cargo
Chevrolet,Express 2500 Passenger
Chevrolet,Express 3500 Cargo
Chevrolet,Express 3500 Passenger
Chevrolet,G-Series 1500
Chevrolet,G-Series 2500
Chevrolet,G-Series 3500
Chevrolet,G-Series G10
Chevrolet,G-Series G20
Chevrolet,G-Series G30
Chevrolet,HHR
Chevrolet,Impala
Chevrolet,Impala Limited
Chevrolet,Lumina
Chevrolet,Lumina APV
Chevrolet,Lumina Cargo
Chevrolet,Lumina Passenger
Chevrolet,Malibu
Chevrolet,Malibu (Classic)
Chevrolet,Malibu Limited
Chevrolet,Metro
Chevrolet,Monte Carlo
Chevrolet,Prizm
Chevrolet,S10 Blazer
Chevrolet,S10 Crew Cab
Chevrolet,S10 Extended Cab
Chevrolet,S10 Regular Cab
Chevrolet,SS
Chevrolet,SSR
Chevrolet,Silverado (Classic) 1500 Crew Cab
Chevrolet,Silverado (Classic) 1500 Extended Cab
Chevrolet,Silverado (Classic) 1500 HD Crew Cab
Chevrolet,Silverado (Classic) 1500 Regular Cab
Chevrolet,Silverado (Classic) 2500 HD Crew Cab
Chevrolet,Silverado (Classic) 2500 HD Extended Cab
Chevrolet,Silverado (Classic) 2500 HD Regular Cab
Chevrolet,Silverado (Classic) 3500 Crew Cab
Chevrolet,Silverado (Classic) 3500 Extended Cab
Chevrolet,Silverado (Classic) 3500 Regular Cab
Chevrolet,Silverado 1500 Crew Cab
Chevrolet,Silverado 1500 Double Cab
Chevrolet,Silverado 1500 Extended Cab
Chevrolet,Silverado 1500 HD Crew Cab
Chevrolet,Silverado 1500 LD Double Cab
Chevrolet,Silverado 1500 Limited Crew Cab
Chevrolet,Silverado 1500 Limited Double Cab
Chevrolet,Silverado 1500 Limited Regular Cab
Chevrolet,Silverado 1500 Regular Cab
Chevrolet,Silverado 2500 Crew Cab
Chevrolet,Silverado 

Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


$ head all_cars_unique.csv
Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
$ wc -l all_cars_unique.csv
1415 all_cars_unique.csv
$ grep -i v60 all_cars_unique.csv
Genesis,GV60
Volvo,V60
$

Essentially, I want my input field to suggest autofill options with data 
from this file/list. The user types "v60" and a REST endpoint will offer:


[
 {"model":"GV60", "manufacturer":"Genesis"},
 {"model":"V60", "manufacturer":"Volvo"}
]

i.e. a JSON response that I can use to generate the autofill with 
JavaScript. My Back-End is Python (Flask).


How can I implement this? A library called Whoosh seems very promising 
(although it's so feature-rich that it's almost like shooting a fly with 
a bazooka in my case), but I see two problems:


 1) Whoosh is either abandoned or the project is a mess in terms of 
community and support (https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 
) and


 2) Whoosh seems to be a Python-only thing, which is great for now, but 
I wouldn't want this to become an obstacle should I need to port it to a 
different language at some point.


Are there other options out there that are fast? Can I "grep" through a 
data structure in Python... but faster?


Thanks

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

>>> import time
>>> lis = [str(a**2+a*3+a) for a in range(0, 30_000)]
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s);
753800
>>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() - s);
1068300
>>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() - s);
862000
>>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() - s);
1447300
>>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() - s);
1511100
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s); print(len(res), res[:10])
926900
2 ['134676021', '313467021']
>>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?


Dino


--
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-17 Thread Dino



Thank you, Gerard. I really appreciate your help

Dino

On 2/16/2023 9:40 PM, Weatherby,Gerard wrote:

I think this does the trick:

https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


#!/usr/bin/env python3
import collections
import random
from typing import Hashable, Any, Optional, Dict, Tuple


class LruCache:
    """Dictionary like storage of most recently inserted values"""

    def __init__(self, size: int = 1000):
        """:param size number of cached entries"""
        assert isinstance(size, int)
        self.size = size
        self.insert_counter = 0
        self.oldest = 0
        self._data: Dict[Hashable, Tuple[Any, int]] = {}  # store values and age index
        self._lru: Dict[int, Hashable] = {}  # age counter dictionary

    def insert(self, key: Hashable, value: Any) -> None:
        """Insert into dictionary"""
        existing = self._data.get(key, None)
        self._data[key] = (value, self.insert_counter)
        self._lru[self.insert_counter] = key
        if existing is not None:
            self._lru.pop(existing[1], None)  # remove old counter value, if it exists
        self.insert_counter += 1
        if (sz := len(self._data)) > self.size:  # is cache full?
            assert sz == self.size + 1
            while (key := self._lru.get(self.oldest, None)) is None:
                # index may not be present, if value was reinserted
                self.oldest += 1
            del self._data[key]  # remove oldest key / value from dictionary
            del self._lru[self.oldest]
            self.oldest += 1  # next oldest index
        assert len(self._lru) == len(self._data)

    def get(self, key: Hashable) -> Optional[Any]:
        """Get value or return None if not in cache"""
        if (tpl := self._data.get(key, None)) is not None:
            return tpl[0]
        return None


if __name__ == "__main__":
    CACHE_SIZE = 1000
    TEST_SIZE = 1_000_000
    cache = LruCache(size=CACHE_SIZE)

    all = []
    for i in range(TEST_SIZE):
        all.append(random.randint(-5000, 5000))

    summary = collections.defaultdict(int)
    for value in all:
        cache.insert(value, value * value)
        summary[value] += 1
    smallest = TEST_SIZE
    largest = -TEST_SIZE
    total = 0
    for value, count in summary.items():
        smallest = min(smallest, count)
        largest = max(largest, count)
        total += count
    avg = total / len(summary)
    print(f"{len(summary)} values occurrences range from {smallest} to {largest}, average {avg:.1f}")

    recent = set()  # recent most recent entries
    for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most recent entries
        value = all[i]
        if len(recent) < CACHE_SIZE:
            recent.add(value)
        if value in recent:
            if (r := cache.get(value)) != value * value:
                raise ValueError(f"Cache missing recent {value} {r}")
        else:
            if cache.get(value) != None:
                raise ValueError(f"Cache includes old {value}")

From: Python-list  on behalf of 
Dino 
Date: Wednesday, February 15, 2023 at 3:07 PM
To: python-list@python.org 
Subject: Re: LRU cache

Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but
maybe I am missing something. I'll look closer.

On 2/14/2023 7:36 PM, Mats Wichmann wrote:

On 2/14/23 15:07, Dino wrote:






--
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-15 Thread Dino



Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but 
maybe I am missing something. I'll look closer.


On 2/14/2023 7:36 PM, Mats Wichmann wrote:

On 2/14/23 15:07, Dino wrote:




--
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-14 Thread Dino

On 2/10/2023 7:39 PM, Dino wrote:


- How would you structure the caching so that different caching 
strategies are "pluggable"? change one line of code (or even a config 
file) and a different caching strategy is used in the next run. Is this 
the job for a design pattern such as factory or facade?



turns out that the strategy pattern was the right one for me.
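
For the archive, the shape is roughly this (a bare-bones sketch, not 
our actual code; all names are made up): swap the concrete strategy on 
one line (or drive it from a config file) and nothing else changes.

from abc import ABC, abstractmethod

class CacheStrategy(ABC):
    """The interface every pluggable caching strategy implements."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

class NoCache(CacheStrategy):
    def get(self, key): return None
    def put(self, key, value): pass

class DictCache(CacheStrategy):
    def __init__(self): self._data = {}
    def get(self, key): return self._data.get(key)
    def put(self, key, value): self._data[key] = value

# the one line to change per run:
cache: CacheStrategy = DictCache()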



--
https://mail.python.org/mailman/listinfo/python-list


LRU cache

2023-02-14 Thread Dino



Here's my problem today. I am using a dict() to implement a quick and 
dirty in-memory cache.


I am stopping adding elements when I am reaching 1000 elements (totally 
arbitrary number), but I would like to have something slightly more 
sophisticated to free up space for newer and potentially more relevant 
entries.


I am thinking of the Least Recently Used principle, but how to implement 
that is not immediate. Before I embark on reinventing the wheel, is 
there a tool, library or smart trick that will allow me to remove 
elements with LRU logic?
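
For what it's worth, if I did roll my own, the rough shape I'd expect 
is something like the untested sketch below, built around 
collections.OrderedDict (the capacity of 1000 matches the arbitrary 
limit above). But if there's a ready-made tool, even better.

from collections import OrderedDict

class SimpleLRU:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)    # mark as most recently used
            return self._data[key]
        return default

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used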


thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Comparing caching strategies

2023-02-13 Thread Dino
First off, a big shout out to Peter J. Holzer, who mentioned roaring 
bitmaps a few days ago and led me to quite a discovery.


Now I am stuck with an internal dispute with another software architect 
(well, with a software architect, I should say, as I probably shouldn't 
define myself a software architect when confronted with people with more 
experience than me in building more complex systems).
Anyway, now that I know what roaring bitmaps are (and what they can 
do!), my point is that we should abandon other attempts to build a 
caching layer for our project and just veer decidedly towards relying on 
those magic bitmaps and screw anything else. Sure, there is some 
overhead  marshaling our entries into integers and back, but the sheer 
speed and compactness of RBMs trump any other consideration (according 
to me, not according to the other guy, obviously).


Long story short: I want to prototype a couple of caching strategies in 
Python using bitmaps, and measure both speed and memory footprint.


So, here are a few questions from an inexperienced programmer for you, 
friends. Apologies if they are a bit "open ended".


- How would you structure the caching so that different caching 
strategies are "pluggable"? change one line of code (or even a config 
file) and a different caching strategy is used in the next run. Is this 
the job for a design pattern such as factory or facade?


- what tool should I use to measure/log performance and memory 
occupation of my script? Google is coming up with quite a few options, 
but I value the opinion of people here a lot.
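
(For a rough baseline I was planning to lean on the standard library, 
something like the sketch below, with tracemalloc covering the memory 
side; run_workload is just a placeholder for the real code under test. 
Still, I'd value pointers to proper tools.)

import time
import tracemalloc

def measure(run_workload):
    tracemalloc.start()
    t0 = time.perf_counter()
    result = run_workload()              # the code under test
    elapsed = time.perf_counter() - t0
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"elapsed: {elapsed:.4f}s  "
          f"current: {current/1e6:.1f}MB  peak: {peak/1e6:.1f}MB")
    return result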


Thank you for any feedback you may be able to provide.

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: RE: bool and int

2023-01-28 Thread Dino



you have your reasons, and I was tempted to stop there, but... I have to 
pick this...


On 1/26/2023 10:09 PM, avi.e.gr...@gmail.com wrote:

  You can often borrow
ideas and code from an online search and hopefully cobble "a" solution
together that works well enough. Of course it may suddenly fall apart.


also carefully designed systems that are the work of experts may 
suddenly fall apart.


Thank you for all the time you have used to address the points I raised. 
It was interesting reading.


Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: bool and int

2023-01-26 Thread Dino

On 1/25/2023 5:42 PM, Chris Angelico wrote:



Try this (or its equivalent) in as many languages as possible:

x = (1 > 2)
x == 0

You'll find that x (which has effectively been set to False, or its
equivalent in any language) will be equal to zero in a very large
number of languages. Thus, to an experienced programmer, it would
actually be quite the opposite: having it NOT be a number would be the
surprising thing!


I thought I had already responded to this, but I can't see it. Weird.

Anyway, straight out of the Chrome DevTools console:

x = (1>2)
false

x == 0
true

typeof(x)
'boolean'

typeof(0)
'number'

typeof(x) == 'number'
false

So, you are technically correct, but you can see that JavaScript - which 
comes with many gotchas - does not offer this particular one.



--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: bool and int

2023-01-26 Thread Dino



Wow. That was quite a message and an interesting read. Tempted to go 
deep and say what I agree and what I disagree with, but there are two 
issues: 1) time 2) I will soon be at a disadvantage discussing with 
people (you or others) who know more than me (which doesn't make them 
right necessarily, but certainly they'll have the upper-hand in a 
discussion).


Personally, in the first part of my career I got into the habit of 
learning things fast, sometimes superficially I confess, and then get 
stuff done hopefully within time and budget. Not the recommended 
approach if you need to build software for a nuclear plant. An OK 
approach (within reason) if you build websites or custom solutions for 
this or that organization and the budget is what it is. After all, 
technology moves sooo fast, and what we learn in detail today is bound 
to be old and possibly useless 5 years down the road.


Also, I argue that there is value in having familiarity with lots of 
different technologies (front-end and back-end) and knowing (or at 
least, having a sense of) how they can all be made to play together, 
with an appreciation of the different challenges and benefits that each 
domain offers.


Anyway, everything is equivalent to a Turing machine, and AI will screw 
everyone, including programmers, eventually.


Thanks again and have a great day

Dino

On 1/25/2023 9:14 PM, avi.e.gr...@gmail.com wrote:

Dino,

There is no such things as a "principle of least surprise" or if you insist
there is, I can nominate many more such "rules" such as "the principle of
get out of my way and let me do what I want!"

Computer languages with too many rules are sometimes next to unusable in
practical situations.

I am neither defending or attacking choices Python or other languages have
made. I merely observe and agree to use languages carefully and as
documented.


--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-26 Thread Dino

On 1/25/2023 4:30 PM, Thomas Passin wrote:

On 1/25/2023 3:29 PM, Dino wrote:
Great!  Don't forget what I said about potential overheating if you hit 
the server with as many requests as it can handle.


Noted. Thank you.




--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 3:27 PM, Dino wrote:

On 1/25/2023 1:33 PM, orzodk wrote:


I have used locust with success in the past.

https://locust.io


First impression, exactly what I need. Thank you Orzo!


the more I learn about Locust and I tinker with it, the more I love it. 
Thanks again.
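
For anyone else evaluating it: my "pool of requests" requirement maps 
naturally onto Locust tasks. A minimal locustfile sketch (the routes 
and search terms below are made up, not my real ones):

# locustfile.py -- run with: locust -f locustfile.py --host http://localhost:5000
import random
from locust import HttpUser, task, between

SEARCH_TERMS = ["v60", "silverado", "acura", "a4"]  # the "pool"

class ApiUser(HttpUser):
    wait_time = between(0.5, 2)   # pause between tasks per simulated user

    @task(3)
    def autofill(self):
        self.client.get("/autofill",
                        params={"q": random.choice(SEARCH_TERMS)})

    @task(1)
    def health(self):
        self.client.get("/health")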

--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 1:21 PM, Thomas Passin wrote:



I actually have a Python program that does exactly this.  


Thank you, Thomas. I'll check out Locust, mentioned by Orzodk, as it 
looks like a mature library that appears to do exactly what I was hoping.




--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 1:33 PM, orzodk wrote:



I have used locust with success in the past.

https://locust.io


First impression, exactly what I need. Thank you Orzo!
--
https://mail.python.org/mailman/listinfo/python-list


Re: bool and int

2023-01-25 Thread Dino

On 1/23/2023 11:22 PM, Dino wrote:

 >>> b = True
 >>> isinstance(b,bool)
True
 >>> isinstance(b,int)
True
 >>>


ok, I read everything you guys wrote. Everyone's got their reasons 
obviously, but allow me to observe that there's also something called 
"principle of least surprise".


In my case, it took me some time to figure out where a nasty bug was 
hidden. Letting a bool be an int is quite a gotcha, no matter how hard 
the benevolent dictator tries to convince me otherwise!




--
https://mail.python.org/mailman/listinfo/python-list


HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino



Hello, I could use something like Apache ab in Python ( 
https://httpd.apache.org/docs/2.4/programs/ab.html ).


The reason why ab doesn't quite cut it for me is that I need to define a 
pool of HTTP requests and I want the tool to run those (as opposed to 
running the same request over and over again).


Does such a marvel exist?

Thinking about it, it doesn't necessarily need to be Python, but I guess 
I would have a chance to tweak things if it was.


Thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


bool and int

2023-01-24 Thread Dino



$ python
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b = True
>>> isinstance(b,bool)
True
>>> isinstance(b,int)
True
>>>

WTF!

--
https://mail.python.org/mailman/listinfo/python-list


Re: tree representation of Python data

2023-01-21 Thread Dino


you rock. Thank you, Stefan.

Dino

On 1/21/2023 2:41 PM, Stefan Ram wrote:

r...@zedat.fu-berlin.de (Stefan Ram) writes:

def display_( object, last ):
    directory = object; result = ''; count = len( directory )
    for entry in directory:
        count -= 1; name = entry; indent = ''
        for c in last[ 1: ]: indent += '│   ' if c else ''
        indent += '├──' if count else '└──' if last else ''
        result += '\n' + indent +( ' ' if indent else '' )+ name
        if directory[ entry ]:
            result += display_( directory[ entry ], last +[ count ])
    return result


   This ultimate version has some variable names made more speaking:

def display_( directory, container_counts ):
    result = ''; count = len( directory )
    for name in directory:
        count -= 1; indent = ''
        for container_count in container_counts[ 1: ]:
            indent += '│   ' if container_count else ''
        indent += '├──' if count else '└──' if container_counts else ''
        result += '\n' + indent +( ' ' if indent else '' )+ name
        if directory[ name ]:
            result += display_\
                ( directory[ name ], container_counts +[ count ])
    return result




--
https://mail.python.org/mailman/listinfo/python-list


Re: ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-21 Thread Dino



I learned new things today and I thank you all for your responses.

Please consider yourself thanked individually.

Dino

On 1/20/2023 10:29 AM, Dino wrote:


let's say I have this list of nested dicts:


--
https://mail.python.org/mailman/listinfo/python-list


tree representation of Python data

2023-01-21 Thread Dino


I have a question that is a bit of a shot in the dark. I have this nice 
bash utility installed:


$ tree -d unit/
unit/
├── mocks
├── plugins
│   ├── ast
│   ├── editor
│   ├── editor-autosuggest
│   ├── editor-metadata
│   ├── json-schema-validator
│   │   └── test-documents
│   └── validate-semantic
│       ├── 2and3
│       ├── bugs
│       └── oas3
└── standalone
    └── topbar-insert

I just thought that it would be great if there was a Python utility that 
visualized a similar graph for nested data structures.
Of course I am aware of indent (json.dumps()) and pprint, and they are 
OK options for my need. It's just that the compact, improved 
visualization would be nice to have. Not so nice that I would go out of 
my way to build it, but nice enough to use an existing package.


Thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-20 Thread Dino

On 1/20/2023 11:06 AM, Tobiah wrote:

On 1/20/23 07:29, Dino wrote:




This doesn't look like the program output you're getting.


you are right that I tweaked the name of fields and variables manually 
(forgot a couple of places, my bad) to illustrate the problem more 
generally, but hopefully you get the spirit.


"value": cn,
"a": cd[cn]["a"],
"b": cd[cn]["b"]

Anyway, the key point (ooops, a pun) is if there's a more elegant way to 
do this (i.e. get a reference to the unique key in a dict() when the key 
is unknown):


cn = list(cd.keys())[0] # There must be a better way than this!

Thanks

--
https://mail.python.org/mailman/listinfo/python-list


ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-20 Thread Dino



let's say I have this list of nested dicts:

[
  { "some_key": {'a':1, 'b':2}},
  { "some_other_key": {'a':3, 'b':4}}
]

I need to turn this into:

[
  { "value": "some_key", 'a':1, 'b':2},
  { "value": "some_other_key", 'a':3, 'b':4}
]

I actually did it with:

listOfDescriptors = list()
for cd in origListOfDescriptors:
    cn = list(cd.keys())[0] # There must be a better way than this!
    listOfDescriptors.append({
        "value": cn,
        "type": cd[cn]["a"],
        "description": cd[cn]["b"]
    })

and it works, but I look at this and think that there must be a better 
way. Am I missing something obvious?
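
For context, these are the kinds of alternatives I have been eyeballing 
(quick sketches on the toy data above; I am not sure which, if any, is 
considered idiomatic):

cd = {"some_key": {"a": 1, "b": 2}}

cn = next(iter(cd))             # 1) no intermediate list

(cn, details), = cd.items()     # 2) unpack; raises unless exactly one item

for cn, details in cd.items():  # 3) or just unpack inside the loop
    ...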


PS: Screw OpenAPI!

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-17 Thread Dino


Thanks a lot, Edmondo. Or better... Grazie mille.

On 1/17/2023 5:42 AM, Edmondo Giovannozzi wrote:


Sorry,
I was just creating an array of 400x100_000 elements that I fill with random 
numbers:

   a = np.random.randn(400,100_000)

Then I pick one element randomly. It is just a stupid sort on a row and then I 
take an element in another row, but it doesn't matter, I'm just taking a random 
element. I may have used other ways to get that, but it was the first that came 
to my mind.

  ia = np.argsort(a[0,:])
  a_elem = a[56, ia[0]]

Then I'm finding that element in the whole matrix a (of course I know where it 
is, but I want to test the speed of a linear search done at the C level):

%timeit isel = a == a_elem

Actually isel is a logic array that is True where a[i,j] == a_elem and False 
where a[i,j] != a_elem. It may find more than one element but, of course, in 
our case it will find only the element that we selected at the beginning. 
So it will give the speed of a linear search plus the time needed to allocate 
the logic array. The search is over the whole matrix of 40 million elements, 
not just over one of its rows of 100k elements.

On a single row (which, I should say, I have chosen to be contiguous) it is much 
faster.

%timeit isel = a[56,:] == a_elem
26 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)

The matrix holds double precision numbers, that is 8 bytes each; I haven't 
tested it on strings of characters.

This is meant to be an estimate of the speed that one can get by going to the C 
level.
You lose of course the possibility to have a relational database, you need to 
have everything in memory, etc...

A package that implements tables based on numpy is pandas: 
https://pandas.pydata.org/

I hope that it can be useful.




--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino

On 1/16/2023 1:18 PM, Edmondo Giovannozzi wrote:


As a comparison with numpy. Given the following lines:

import numpy as np
a = np.random.randn(400,100_000)
ia = np.argsort(a[0,:])
a_elem = a[56, ia[0]]

I have just taken an element randomly in a numeric table of 400x100_000 elements.
To find it with numpy:

%timeit isel = a == a_elem
35.5 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

And
%timeit a[isel]
9.18 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As the data is not ordered, it is searching one element at a time, but at the C level.
Of course it depends on a lot of things...


thank you for this. It's probably my lack of experience with Numpy, 
but... can you explain what is going on here in more detail?


Thank you

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino

On 1/16/2023 2:53 AM, David wrote:

See here:
   https://docs.python.org/3/reference/expressions.html#assignment-expressions
   https://realpython.com/python-walrus-operator/


Thank you, brother.
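
In short, for whoever finds this later: := (the "walrus operator", 
Python 3.8+) assigns and returns a value inside an expression, so 
Gerard's loop condition both stores add_some()'s return value and tests 
it. A tiny runnable sketch of the same pattern:

import random

def roll():
    return random.randint(1, 6)

# without the walrus operator
value = roll()
while value != 6:
    print("rolled", value)
    value = roll()

# with it, assignment and test happen in one expression
while (value := roll()) != 6:
    print("rolled", value)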



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino



Just wanted to take a moment to express my gratitude to everyone who 
responded here. You have all been so incredibly helpful. Thank you


Dino

On 1/14/2023 11:26 PM, Dino wrote:


Hello, I have built a PoC service in Python Flask for my work, and - now 
that the point is made - I need to make it a little more performant (to 
be honest, chances are that someone else will pick up from where I left 
off, and implement the same service from scratch in a different language 
(GoLang? .Net? Java?) but I am digressing).

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino

On 1/15/2023 2:23 PM, Weatherby,Gerard wrote:

That’s about what I got using a Python dictionary on random data on a high 
memory machine.

https://github.com/Gerardwx/database_testing.git

It’s not obvious to me how to get it much faster than that.


Gerard, you are a rockstar. This is going to be really useful if I do 
decide to adopt sqlite3 for my PoC, as I understand what's going on 
conceptually, but never really used sqlite (nor SQL in a long long 
time), so this may save me a bunch of time.


I created a 300 Mb DB using your script. Then:

$ ./readone.py
testing 2654792 of 4655974
Found somedata0002654713 for 1ed9f9cd-0a9e-47e3-b0a7-3e1fcdabe166 in 
0.23933520219 seconds


$ ./prefetch.py
Index build 4.42093784897 seconds
testing 3058568 of 4655974
Found somedata202200 for 5dca1455-9cd6-4e4d-8e5a-7e6400de7ca7 in 
4.443999403715e-06 seconds


So, if I understand right:

1) once I built a dict out of the DB (in about 4 seconds), I was able to 
look up an entry/record in 4 microseconds(!)


2) looking up a record/entry using a Sqlite query took 0.2 seconds (i.e. 
500x slower)


Interesting. Thank you for this. Very informative. I really appreciate 
that you took the time to write this.


The conclusion seems to me that I probably don't want to go the Sqlite 
route, as I would be placing my data into a database just to extract it 
back into a dict when I need it fast.


Ps: a few minor fixes to the README as this may be helpful to others.

./venv/... => ./env/..

i.e.
 ./env/bin/pip install -U pip
 ./env/bin/pip install -e .

Also add part in []

Run create.py [size of DB in bytes] prior to running readone.py and/or 
prefetch.py


BTW, can you tell me what is going on here? what's := ?

   while (increase := add_some(conn,adding)) == 0:

https://github.com/Gerardwx/database_testing/blob/main/src/database_testing/create.py#L40

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino



Thank you, Peter. Yes, setting up my own indexes is more or less the 
idea of the modular cache that I was considering. Seeing others think in 
the same direction makes it look more viable.
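
Roughly what I have in mind, for the record (a toy sketch, not the PoC 
code; the column names are invented): build, once at startup, a map 
from column value to the set of matching row ids, so an equality filter 
becomes a set lookup and an AND becomes a set intersection instead of a 
100k-row scan.

from collections import defaultdict

rows = [
    {"state": "NY", "status": "open", "amount": 250},
    {"state": "CA", "status": "open", "amount": 900},
    {"state": "NY", "status": "closed", "amount": 120},
]

# index[column][value] -> set of row indices
index = defaultdict(lambda: defaultdict(set))
for i, row in enumerate(rows):
    for col in ("state", "status"):
        index[col][row[col]].add(i)

# "state == NY AND status == open" -> intersect two small sets
hits = index["state"]["NY"] & index["status"]["open"]
result = [rows[i] for i in hits]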


About Scalene, thank you for the pointer. I'll do some research.

Do you have any idea about the speed of a SELECT query against a 100k 
rows / 300 Mb Sqlite db?


Dino

On 1/15/2023 6:14 AM, Peter J. Holzer wrote:

On 2023-01-14 23:26:27 -0500, Dino wrote:

Hello, I have built a PoC service in Python Flask for my work, and - now
that the point is made - I need to make it a little more performant (to be
honest, chances are that someone else will pick up from where I left off,
and implement the same service from scratch in a different language (GoLang?
.Net? Java?) but I am digressing).

Anyway, my Flask service initializes by loading a big "table" of 100k rows
and 40 columns or so (memory footprint: order of 300 Mb)


300 MB is large enough that you should at least consider putting that
into a database (Sqlite is probably simplest. Personally I would go with
PostgreSQL because I'm most familiar with it and Sqlite is a bit of an
outlier).

The main reason for putting it into a database is the ability to use
indexes, so you don't have to scan all 100 k rows for each query.

You may be able to do that for your Python data structures, too: Can you
set up dicts which map to subsets you need often?

There are some specialized in-memory bitmap implementations which can be
used for filtering. I've used
[Judy bitmaps](https://judy.sourceforge.net/doc/Judy1_3x.htm) in the
past (mostly in Perl).
These days [Roaring Bitmaps](https://www.roaringbitmap.org/) is probably
the most popular. I see several packages on PyPI - but I haven't used
any of them yet, so no recommendation from me.

Numpy might also help. You will still have linear scans, but it is more
compact and many of the searches can probably be done in C and not in
Python.


As you can imagine, this is not very performant in its current form, but
performance was not the point of the PoC - at least initially.


For performanc optimization it is very important to actually measure
performance, and a good profiler helps very much in identifying hot
spots. Unfortunately until recently Python was a bit deficient in this
area, but [Scalene](https://pypi.org/project/scalene/) looks promising.

 hp



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino


Thank you for your answer, Lars. Just a clarification: I am already 
doing a rough measuring of my queries.


A fresh query without any caching: < 4s.

Cached full query: < 5 micro-s (i.e. 6 orders of magnitude faster)

Desired speed for my POC: 10

Also, I didn't want to ask a question with way too many "moving parts", 
but when I talked about the "table", it's actually a 100k long list of 
IDs. I can then use each ID to invoke an API that will return those 40 
attributes. The API is fast, but still, I am bound to loop through the 
whole thing to respond to the query, that's unless I pre-load the data 
into something that allows faster access.


Also, as you correctly observed, "looking good with my colleagues" is a 
nice-to-have feature at this point, not really an absolute requirement :)


Dino

On 1/15/2023 3:17 AM, Lars Liedtke wrote:

Hey,

before you start optimizing. I would suggest, that you measure response 
times and query times, data search times and so on. In order to save 
time, you have to know where you "loose" time.


Does your service really have to load the whole table at once? Yes that 
might lead to quicker response times on requests, but databases are 
often very good with caching themselves, so that the first request might 
be slower than following requests, with similar parameters. Do you use a 
database, or are you reading from a file? Are you maybe looping through 
your whole dataset on every request? Instead of asking for the specific 
data?


Before you start introducing a cache and its added complexity, do you 
really need that cache?


You are talking about saving microseconds, that sounds a bit as if you 
might be “overdoing” it. How many requests will you have in the future? 
At least in which magnitude and how quick do they have to be? You write 
about 1-4 seconds on your laptop. But that does not really tell you that 
much, because most probably the service will run on a server. I am not 
saying that you should get a server or a cloud-instance to test against, 
but to talk with your architect about that.


I totally understand your impulse to appear as good as can be, but you 
have to know where you really need to debug and optimize. It will not be 
advantageous for you, if you start to optimize for optimizing's sake. 
Additionally if you service is a PoC, optimizing now might be not the 
first thing you have to worry about, but about that you made everything 
as simple and readable as possible and that you do not spend too much 
time for just showing how it could work.


But of course, I do not know the tasks given to you and the expectations 
you have to fulfil. All I am trying to say is to reconsider where you 
really could improve and how far you have to improve.




--
https://mail.python.org/mailman/listinfo/python-list


Fast lookup of bulky "table"

2023-01-14 Thread Dino



Hello, I have built a PoC service in Python Flask for my work, and - now 
that the point is made - I need to make it a little more performant (to 
be honest, chances are that someone else will pick up from where I left 
off, and implement the same service from scratch in a different language 
(GoLang? .Net? Java?) but I am digressing).


Anyway, my Flask service initializes by loading a big "table" of 100k 
rows and 40 columns or so (memory footprint: order of 300 Mb) and then 
accepts queries through a REST endpoint. Columns are strings, enums, and 
numbers. Once initialized, the table is read only. The endpoint will 
parse the query and match it against column values (equality, 
inequality, greater than, etc.) Finally, it will return a (JSON) list of 
all rows that satisfy all conditions in the query.
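
To make the current, naive behaviour concrete, each request boils down 
to something like the simplified sketch below (not the actual code; the 
real comparison logic is messier, and the condition format is invented 
for illustration):

import operator

OPS = {"eq": operator.eq, "ne": operator.ne,
       "gt": operator.gt, "lt": operator.lt}

def run_query(rows, conditions):
    """conditions: list of (column, op_name, value) tuples."""
    out = []
    for row in rows:            # full scan of ~100k rows per query
        if all(OPS[op](row[col], val) for col, op, val in conditions):
            out.append(row)
    return out

# e.g. run_query(table, [("state", "eq", "NY"), ("amount", "gt", 100)])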


As you can imagine, this is not very performant in its current form, but 
performance was not the point of the PoC - at least initially.


Before I deliver the PoC to a more experienced software architect who 
will look at my code, though, I wouldn't mind looking a bit less lame 
and doing something about performance in my own code first, possibly by 
bringing the average time for queries down from where it is now (order 
of 1 to 4 seconds per query on my laptop) to 1 or 2 milliseconds on 
average.


To be honest, I was already able to bring the time down to a handful of 
microseconds thanks to a rudimentary cache that will associate the 
"signature" of a query to its result, and serve it the next time the 
same query is received, but this may not be good enough: 1) queries 
might be many and very different from one another each time, AND 2) I am 
not sure the server will have a ton of RAM if/when this thing - or 
whatever is derived from it - is placed into production.


How can I make my queries generally more performant, ideally also in 
case of a new query?


Here's what I have been considering:

1. making my cache more "modular", i.e. cache the result of certain 
(wide) queries. When a complex query comes in, I may be able to restrict 
my search to a subset of the rows (as determined by a previously cached 
partial query). This should keep the memory footprint under control.


2. Load my data into a numpy.array and use numpy.array operations to 
slice and dice my data.


3. load my data into sqlite3 and use SELECT statements to query my table. 
I have never used sqlite, plus there's some extra complexity as 
comparing certain columns requires custom logic, but I wonder if this 
architecture would work well also when dealing with a 300Mb database.


4. Other ideas?

Hopefully I made sense. Thank you for your attention

Dino
--
https://mail.python.org/mailman/listinfo/python-list


[issue46965] Enable informing callee it's awaited via vector call flag

2022-03-17 Thread Dino Viehland


Dino Viehland  added the comment:

Doh, sorry about that link, this one goes to a specific commit: 
https://github.com/facebookincubator/cinder/blob/6863212ada4b569c15cd95c4e7a838f254c8ccfb/Python/ceval.c#L6642

I do think a new opcode is a good way to go, and that could just be emitted by 
the compiler when it recognizes the pattern.  I think we mainly avoided that 
because we had some issues around performance testing when we updated the byte 
code version and the peek was negligible, but with improved call performance in 
3.11 that may not be the case anymore.

It's probably possible to keep most of gather in Python if necessary, there'd 
still need to be a C wrapper which could flow the wrapper in and the wait 
handle creation would need to be possible from Python (which slightly scares 
me).  There's probably also a perf win from the C implementation - I'll see if 
@v2m has any data on that.

--

___
Python tracker 
<https://bugs.python.org/issue46965>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46965] Enable informing callee it's awaited via vector call flag

2022-03-08 Thread Dino Viehland


New submission from Dino Viehland :

The idea here is to add a new flag to the vectorcall nargs that indicates the 
call is being awaited: _Py_AWAITED_CALL_MARKER.  This flag will allow the 
callee to know that it's being eagerly evaluated.  When the call is eagerly 
evaluated the callee can potentially avoid various amounts of overhead.  For a 
coroutine the function can avoid creating the coroutine object and instead 
returns a singleton instance of a wait handle indicating eager execution has 
occurred:
https://github.com/facebookincubator/cinder/blob/cinder/3.8/Python/ceval.c#L6617

This gives a small win by reducing the overhead of allocating the co-routine 
object.

For something like gather much more significant wins can be achieved.  If all 
of the inputs have already been computed the creation of tasks and scheduling 
of them to the event loop can be elided.  An example implementation of this is 
available in Cinder: 
https://github.com/facebookincubator/cinder/blob/cinder/3.8/Modules/_asynciomodule.c#L7103

Again the gather implementation uses the singleton wait handle object to return 
the value indicating the computation completed synchronously.

We've used this elsewhere in Cinder as well - for example if we have an 
"AsyncLazyValue" which lazily performs a one-time computation of a value   and 
caches it.  Therefore the common case becomes that the value is already 
available, and the await can be performed without allocating any intermediate 
values.

--
assignee: dino.viehland
messages: 414782
nosy: Mark.Shannon, carljm, dino.viehland, gvanrossum, itamaro
priority: normal
severity: normal
stage: needs patch
status: open
title: Enable informing callee it's awaited via vector call flag
type: performance
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46965>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30533] missing feature in inspect module: getmembers_static

2021-12-01 Thread Dino Viehland


Change by Dino Viehland :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue30533>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30533] missing feature in inspect module: getmembers_static

2021-12-01 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset c2bb29ce9ae4adb6a8123285ad3585907cd4cc73 by Weipeng Hong in 
branch 'main':
bpo-30533: Add docs for `inspect.getmembers_static` (#29874)
https://github.com/python/cpython/commit/c2bb29ce9ae4adb6a8123285ad3585907cd4cc73


--

___
Python tracker 
<https://bugs.python.org/issue30533>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30533] missing feature in inspect module: getmembers_static

2021-11-30 Thread Dino Viehland


Change by Dino Viehland :


--
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.11 -Python 3.9

___
Python tracker 
<https://bugs.python.org/issue30533>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30533] missing feature in inspect module: getmembers_static

2021-11-30 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset af8c8caaf5e07c02202d736a31f6a2f7e27819b8 by Weipeng Hong in 
branch 'main':
bpo-30533:Add function inspect.getmembers_static that does not call properties 
or dynamic properties. (#20911)
https://github.com/python/cpython/commit/af8c8caaf5e07c02202d736a31f6a2f7e27819b8


--
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue30533>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43636] test_descr fails randomly when executed with -R :

2021-03-29 Thread Dino Viehland


Dino Viehland  added the comment:

@vstinner - The fix doesn't change the behavior of _PyType_Lookup and instead 
just fixes a previous unrelated bug.  The condition will typically only ever be 
hit once (at startup), as it's very unlikely that versions will wrap, so there 
should be no performance difference after the fix.

--
resolution: fixed -> 
stage: resolved -> 
status: closed -> open

___
Python tracker 
<https://bugs.python.org/issue43636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43636] test_descr fails randomly when executed with -R :

2021-03-26 Thread Dino Viehland


Dino Viehland  added the comment:

I think the issue here is that in assign_version_tag there's this code:

    if (type->tp_version_tag == 0) {
        // Wrap-around or just starting Python - clear the whole cache
        type_cache_clear(cache, 1);
        return 1;
    }

the return 1 is incorrect, it should be return 0 as a valid version tag hasn't 
been assigned.

--

___
Python tracker 
<https://bugs.python.org/issue43636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43636] test_descr fails randomly when executed with -R :

2021-03-26 Thread Dino Viehland


Dino Viehland  added the comment:

And it looks like we have an entry with a 0 version, but with a valid name:

(type_cache_entry) $3 = {
  version = 0
  name = 0x000100ec44f0
  value = NULL
}

--

___
Python tracker 
<https://bugs.python.org/issue43636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43636] test_descr fails randomly when executed with -R :

2021-03-26 Thread Dino Viehland


Dino Viehland  added the comment:

It's probably worth having an assert that the version tag is valid; that'll 
make it easier to see what's going wrong in the cache hit case. We should have 
the version tag being 0 now when it's not valid. I may not be able to debug it 
anymore tonight, but if not I will look tomorrow.

--

___
Python tracker 
<https://bugs.python.org/issue43636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43636] test_descr fails randomly when executed with -R :

2021-03-26 Thread Dino Viehland


Dino Viehland  added the comment:

Looking!

--

___
Python tracker 
<https://bugs.python.org/issue43636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43452] Microoptimize PyType_Lookup for cache hits

2021-03-19 Thread Dino Viehland


Dino Viehland  added the comment:

Set up a micro-benchmark, foo.c:

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }
    Py_SetProgramName(program);  /* optional but recommended */
    Py_Initialize();
    PyObject *pName = PyUnicode_DecodeFSDefault("foo");
    if (pName == NULL) { printf("no foo\n"); PyErr_Print(); }
    PyObject *pModule = PyImport_Import(pName);
    if (pModule == NULL) { printf("no mod\n"); PyErr_Print(); return 0; }
    PyObject *cls = PyObject_GetAttrString(pModule, "C");
    if (cls == NULL) { printf("no cls\n"); }
    /* Build the 20 method name objects f0..f19 once, up front. */
    PyObject *fs[20];
    for (int i = 0; i < 20; i++) {
        char buf[4];
        sprintf(buf, "f%d", i);
        fs[i] = PyUnicode_DecodeFSDefault(buf);
    }
    /* Exercise _PyType_Lookup on each of the 20 names. */
    for (int i = 0; i < 1; i++) {
        for (int j = 0; j < 20; j++) {
            if (_PyType_Lookup((PyTypeObject *)cls, fs[j]) == NULL) {
                printf("Uh oh\n");
            }
        }
    }

    if (Py_FinalizeEx() < 0) {
        exit(120);
    }
    PyMem_RawFree(program);
    return 0;
}


Lib/foo.py:
import time


class C:
    pass


for i in range(20):
    setattr(C, f"f{i}", lambda self: None)


obj hash: 0m6.222s
str hash: 0m6.327s
baseline: 0m6.784s

--

___
Python tracker 
<https://bugs.python.org/issue43452>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43452] Microoptimize PyType_Lookup for cache hits

2021-03-09 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +23572
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24804

___
Python tracker 
<https://bugs.python.org/issue43452>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43452] Microoptimize PyType_Lookup for cache hits

2021-03-09 Thread Dino Viehland


New submission from Dino Viehland :

The common case going through _PyType_Lookup is to have a cache hit.  There's 
some small tweaks which can make this a little cheaper:

1) the name field identity is used for a cache hit, and is kept alive by the 
cache.  So there's no need to read the hash code of the name - instead the 
address can be used as the hash.

2) There's no need to check if the name is cachable on the lookup either, it 
probably is, and if it is, it'll be in the cache.

3) If we clear the version tag when invalidating a type then we don't actually 
need to check for a valid version tag bit.
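
As a rough illustration only (a toy pure-Python model, not the actual C 
implementation), the cache-hit idea behind (1) and (3) looks something like this:

class TypeCache:
    def __init__(self, size=4096):
        # (version, name, value); version 0 marks an empty/invalid slot
        self.entries = [(0, None, None)] * size
        self.size = size

    def _slot(self, version, name):
        # id(name) stands in for "use the name's address as its hash"
        return (id(name) ^ version) % self.size

    def lookup(self, type_version, name):
        version, cached_name, value = self.entries[self._slot(type_version, name)]
        # No separate "is this tag valid?" check: entries are only stored with
        # a non-zero version, and invalidating a type resets its tag to 0,
        # which can never equal a stored version.
        if version == type_version and cached_name is name:
            return value
        return None  # miss: caller walks the MRO and then calls store()

    def store(self, type_version, name, value):
        if type_version == 0:  # tag not assigned yet (startup / wrap-around)
            return
        self.entries[self._slot(type_version, name)] = (type_version, name, value)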

--
components: Interpreter Core
messages: 388377
nosy: dino.viehland
priority: normal
severity: normal
status: open
title: Microoptimize PyType_Lookup for cache hits
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue43452>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42199] bytecode_helper assertNotInBytecode fails too eagerly

2020-12-17 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset 6e799be0a18d0bb5bbbdc77cd3c30a229d31dfb4 by Max Bernstein in 
branch 'master':
bpo-42199: Fix bytecode_helper assertNotInBytecode (#23031)
https://github.com/python/cpython/commit/6e799be0a18d0bb5bbbdc77cd3c30a229d31dfb4


--
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue42199>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40255] Fixing Copy on Writes from reference counting

2020-04-14 Thread Dino Viehland


Dino Viehland  added the comment:

I think there's other cases of performance related features being hidden under 
an ifdef.  Computed gotos show up that way, although probably more because it's 
a compiler extension that's not supported everywhere.  Pymalloc is also very 
similar in that it implies an ABI change as well.

I wonder if it would be worth it to introduce an ABI flag for this as well?  On 
the one hand it is a slightly different contract; on the other hand, using 
extensions that don't support the immortalization actually work just fine.

--
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue40255>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26067] test_shutil fails when gid name is missing

2020-03-17 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset 52268941f37e3e27bd01792b081877ec3bc9ce12 by Matthias Braun in 
branch 'master':
bpo-26067: Do not fail test_shutil / chown when gid/uid cannot be resolved 
(#19032)
https://github.com/python/cpython/commit/52268941f37e3e27bd01792b081877ec3bc9ce12


--
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue26067>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-10 Thread Dino Viehland


Dino Viehland  added the comment:

I like that idea, let me see if I can find a way to do that.  This is a little 
bit different in that it's implicitly trying to find a module, and supports 
dotting through non-packages as well, but maybe there's a way to leverage 
importlib and still keep the existing behavior.
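
Purely as a sketch of what that could look like (resolve_target is a 
hypothetical helper, not mock's real _get_target/_importer/_dot_lookup code):

import importlib
import sys

def resolve_target(target):
    # Resolve 'pkg.mod.attr' roughly the way "from ... import ..." does:
    # import the longest importable prefix, then dot through attributes,
    # falling back to sys.modules for submodules that were never set on
    # their parent.
    parts = target.split('.')
    module = None
    for i in range(len(parts), 0, -1):
        try:
            module = importlib.import_module('.'.join(parts[:i]))
            break
        except ImportError:
            continue
    if module is None:
        raise ImportError(f"cannot import any prefix of {target!r}")
    obj = module
    for name in parts[i:]:
        try:
            obj = getattr(obj, name)
        except AttributeError:
            parent_name = getattr(obj, '__name__', None)
            if parent_name and f"{parent_name}.{name}" in sys.modules:
                obj = sys.modules[f"{parent_name}.{name}"]
            else:
                raise
    return obj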

--

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-06 Thread Dino Viehland


Dino Viehland  added the comment:

Sorry, publish may not necessarily be the best term.  When you do "from foo 
import bar" and foo is a package Python will set the bar module onto the foo 
module.  That leads to the common mistake of doing "import foo" and then 
"foo.bar.baz" and later removing the thing that do "from foo import bar" and 
having things blow up later.

Without additional support there's no way to patch the immutable module.  We 
can provide a mode where we enable the patching for testing and disable it in a 
production environment though.  That basically just involves passing a proxy 
object down to Mock.  And monkey patching the mutable module is perfectly fine.

The thing that's doing the ignoring of the assignment is the import system.  So 
it's now okay if the package raises an AttributeError.

There's not really a great way to work around this other than just bypassing 
mock's resolution of the object here - i.e. replacing mock.patch along with 
_get_target, _importer, and _dot_lookup and calling mock._patch directly, which 
isn't very nice (but is do-able).

And while this is a strange way to arrive at a module existing in sys.modules 
but not being on the package it is something that can happen in the normal 
course of imports, hence the reason why the import system handles this by 
checking sys.modules today.  It's also just turning what is currently an error 
while mocking into a success case with a simple few line change.

--
stage: needs patch -> patch review

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-04 Thread Dino Viehland


Dino Viehland  added the comment:

My actual scenario involves a custom module loader where the modules that are 
published are completely immutable (it ends up publishing an object which isn't 
a subtype of module).  It can still have normal Python modules as a child which 
aren't immutable, so they could still be patched by Mock (or it could have 
immutable sub-packages which Mock wouldn't be able to patch).  

So imagine something like this:

immutable_package\__init__.py
__immutable__ = True
x = 2

immutable_package\x.py
y = 2


Doing a "from immutable_package import x" would normally publish "x" as a child 
onto the package.  But because the package is immutable, this is impossible, 
and the assignment is ignored with a warning.  

When Mock gets a call to patch on something like "immutable_package.x.y", it's 
not going to find x, even though if I were to write "from immutable_package.x 
import y" or "from immutable_package import x" it would succeed.

Cases can be contrived without all of this though where the child isn't 
published on its parent, but it requires:

x/__init__.py
from x.pkg import child

x/pkg/__init__.py:
x = 1


x/pkg/child.py:
from unittest.mock import patch
y = 42

@patch('x.pkg.child.y', 100)
def f():
    print(y)

f()

"python -m x" will fail without the patch but succeed with it.

--

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-04 Thread Dino Viehland


Dino Viehland  added the comment:

It's related to bpo-39336.  If you have an immutable package which doesn't 
allow its children to be published on it, then following the chain of packages 
ends up not arriving at the final module.

But you can also hit it if you end up doing the patch during a relative import, 
although that seems much less likely.  But it generally seems matching the 
behavior of imports would be good to me.

--
stage: patch review -> needs patch

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17636] Modify IMPORT_FROM to fallback on sys.modules

2020-02-04 Thread Dino Viehland


Change by Dino Viehland :


--
pull_requests: +17722
pull_request: https://github.com/python/cpython/pull/18347

___
Python tracker 
<https://bugs.python.org/issue17636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-04 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +17721
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/18347

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39551] mock patch should match behavior of import from when module isn't present in sys.modules

2020-02-04 Thread Dino Viehland


New submission from Dino Viehland :

The fix for bpo-17636 added support for falling back to sys.modules when a 
module isn't directly present on the module.  But mock doesn't have the same 
behavior - it'll try the import, and then try to get the value off the object.  
If it's not there it just errors out.

Instead it should also consult sys.modules to be consistent with import 
semantics.
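
Something along these lines (a minimal sketch, not the actual patch, with a 
made-up helper name) is what I have in mind for the attribute lookup:

import sys

def getattr_like_import_from(module, name):
    # Mirror the bpo-17636 IMPORT_FROM fallback: if the attribute is missing,
    # check sys.modules for the fully qualified submodule before giving up.
    try:
        return getattr(module, name)
    except AttributeError:
        qualified = f"{module.__name__}.{name}"
        if qualified in sys.modules:
            return sys.modules[qualified]
        raise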

--
assignee: dino.viehland
components: Tests
messages: 361366
nosy: dino.viehland
priority: normal
severity: normal
stage: needs patch
status: open
title: mock patch should match behavior of import from when module isn't 
present in sys.modules
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39551>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-28 Thread Dino Viehland


Dino Viehland  added the comment:

I guess the update to lib.pyproj probably just makes the files show up when 
opening the solution in Visual Studio then.

--

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-28 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset 0cd5bff6b7da3118d0c5a88fc2b80f80eb7c3059 by Dino Viehland in 
branch 'master':
bpo-39459: include missing test files in windows installer 
https://github.com/python/cpython/commit/0cd5bff6b7da3118d0c5a88fc2b80f80eb7c3059


--

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-28 Thread Dino Viehland


Dino Viehland  added the comment:

Nope, thank you for pointing that out.  I've updated them now with PR 18241

--

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-28 Thread Dino Viehland


Change by Dino Viehland :


--
pull_requests: +17621
pull_request: https://github.com/python/cpython/pull/18241

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-27 Thread Dino Viehland


Dino Viehland  added the comment:

I've added the files to the makefile and AMD64 Fedora Stable Clang Installed 
3.x was passing.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-27 Thread Dino Viehland


Change by Dino Viehland :


--
pull_requests: +17590
pull_request: https://github.com/python/cpython/pull/18211

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-27 Thread Dino Viehland


Dino Viehland  added the comment:

Ahh, that's probably it Brett, I didn't know that was there, thanks!

--

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39459] test_import: test_unwritable_module() fails on AMD64 Fedora Stable Clang Installed 3.x

2020-01-27 Thread Dino Viehland


Dino Viehland  added the comment:

The curious thing about this is other tests in CircularImportTests are 
importing packages from test.test_import.data in the exact same way.

--

___
Python tracker 
<https://bugs.python.org/issue39459>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-24 Thread Dino Viehland


Dino Viehland  added the comment:

One more data point: Backporting this change to Python 3.6 (I just happened to 
have it applied there already, so I haven't tried it on 3.7 or 3.8) has no 
crash and no hangs in multiprocessing on Linux.  So something definitely 
changed in multiprocessing which is causing the hang on shutdown, and forces us 
into this code path where we crash as well.

--

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-22 Thread Dino Viehland


Change by Dino Viehland :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-22 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset 9b6fec46513006d7b06fcb645cca6e4f5bf7c7b8 by Dino Viehland in 
branch 'master':
bpo-39336: Allow packages to not let their child modules be set on them (#18006)
https://github.com/python/cpython/commit/9b6fec46513006d7b06fcb645cca6e4f5bf7c7b8


--

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-22 Thread Dino Viehland


Dino Viehland  added the comment:

With either fix, or with both, on Linux I still see this hang at shutdown.  
Victor mentioned the fact that he had to hit Ctrl-C on Linux to see this, and I 
have to do the same thing. Then with the fixes in place the original test case 
still hangs on shutdown.  

On Python 3.7 (I don't readily have 3.8 available) at least this just runs and 
completes with no ctrl-C and no crashes.  So while either of the fixes may be 
good to prevent the crashes, there's still probably some underlying issue in 
multiprocessing.  I haven't tested on Mac OS/X.

It looks like the clearing was originally introduced here: 
https://bugs.python.org/issue10241  Interestingly there was a similar issue w/ 
_tkinter, which also used PyType_FromSpec (although it sounds like it was just 
a ref count issue on the type).  Unfortunately there's no associated test cases 
added to verify the behavior.  Antoine and Neil are both now on the PR which 
removes the collection behavior so hopefully they can chime in on the safety of 
that fix.

--

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-17 Thread Dino Viehland


Dino Viehland  added the comment:

https://github.com/python/cpython/pull/18038 is a partial fix for this.  I 
think it fixes the crash at shutdown, although I'm still seeing a hang on 
master on Linux which is different then earlier versions of Python.  I seem to 
have a really bogus stack trace when I attach to it so I'm not quite certain 
what's going on there.

--

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-17 Thread Dino Viehland


Change by Dino Viehland :


--
pull_requests: +17437
stage: resolved -> patch review
pull_request: https://github.com/python/cpython/pull/18038

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-16 Thread Dino Viehland


Dino Viehland  added the comment:

And here's a variation which doesn't involve any instances from the module:

import _struct

class C:
    def __init__(self):
        self.pack = _struct.pack
    def __del__(self):
        self.pack('I', -42)

_struct.x = C()

--

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-16 Thread Dino Viehland


Dino Viehland  added the comment:

This is a relatively simple repro of the underlying problem:

import _struct

s = _struct.Struct('i')

class C:
    def __del__(self):
        s.pack(42, 100)

_struct.x = C()

It's a little bit different in that it is actually causing the module to 
attempt to throw an exception instead of doing a type check.

--

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-16 Thread Dino Viehland


Dino Viehland  added the comment:

It seems problematic that _PyInterpreterState_ClearModules runs before all 
instances from a module have been cleared.  If PyState_FindModule is no longer 
able to return module state then there's no way for a module to reliably work 
at shutdown other than having the module (or module state) held directly by 
all of its instances.  Certainly that would mimic more closely what happens w/ 
pure Python instances and modules - the type will hold onto the functions 
which will hold onto the module global state.
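
A pure-Python analogy of that last point (just a sketch, with a throwaway 
module named "demo"): instances keep the class alive, the class keeps its 
functions alive, and the functions keep the defining module's globals alive 
even after the module object itself is gone.

import types

mod = types.ModuleType("demo")
exec(
    "STATE = {'fmt': 'i'}\n"
    "class S:\n"
    "    def which(self):\n"
    "        return STATE['fmt']\n",
    mod.__dict__,
)

obj = mod.S()
del mod             # drop the only direct reference to the module object
print(obj.which())  # still prints 'i': S.which.__globals__ keeps the dict alive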

--
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38076] Make struct module PEP-384 compatible

2020-01-16 Thread Dino Viehland


Change by Dino Viehland :


--
nosy: +eelizondo -dino.viehland

___
Python tracker 
<https://bugs.python.org/issue38076>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-14 Thread Dino Viehland


Dino Viehland  added the comment:

I think the warning shouldn't be too bad.  It looks like ImportWarnings are 
filtered by default already, and the extra overhead of raising a warning in 
this case probably is nothing compared to the actual work in loading the module.
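
For anyone who wants to actually see the warning while testing (this is just 
the standard warnings machinery, nothing specific to this patch):

import warnings

# ImportWarning is in the default ignore list, so opt in explicitly:
warnings.simplefilter("always", ImportWarning)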

--

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-14 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +17406
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/18006

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-14 Thread Dino Viehland


Change by Dino Viehland :


--
nosy: +brett.cannon

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39336] Immutable module type can't be used as package in custom loader

2020-01-14 Thread Dino Viehland


New submission from Dino Viehland :

I'm trying to create a custom module type for a custom loader where the 
returned modules are immutable.  But I'm running into an issue where the 
immutable module type can't be used as a module for a package.  That's because 
the import machinery calls setattr to set the module as an attribute on it's 
parent in _boostrap.py

# Set the module as an attribute on its parent.
parent_module = sys.modules[parent]
setattr(parent_module, name.rpartition('.')[2], module)

I'd be okay if these immutable module types simply didn't have their child 
packages published on them.

A simple simulation of this is a package which replaces itself with an object 
which doesn't support adding arbitrary attributes:

x/__init__.py:
import sys

class MyMod(object):
    __slots__ = ['__builtins__', '__cached__', '__doc__', '__file__',
                 '__loader__', '__name__', '__package__', '__path__', '__spec__']

    def __init__(self):
        for attr in self.__slots__:
            setattr(self, attr, globals()[attr])


sys.modules['x'] = MyMod()

x/y.py:
# Empty file

>>> from x import y
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 983, in _find_and_load
  File "", line 971, in _find_and_load_unlocked
AttributeError: 'MyMod' object has no attribute 'y'

There are a few different options I could see on how this could be supported:
1) Simply handle the attribute error and allow things to continue
2) Add the ability for the module's loader to perform the set, and fall back to 
setattr if one isn't available.  Such as:
 getattr(parent_module, 'add_child_module', setattr)(parent_module, 
name.rpartition('.')[2], module)
3) Add the ability for the module type to handle the setattr:
 getattr(type(parent_module), 'add_child_module', fallback)(parent_module, 
name.rpartition('.')[2], module)
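
For what it's worth, option (1) could look roughly like this (a sketch only, 
with set_child_on_parent as a made-up helper name standing in for the 
_bootstrap code above):

import sys
import warnings

def set_child_on_parent(parent, name, module):
    parent_module = sys.modules[parent]
    try:
        setattr(parent_module, name.rpartition('.')[2], module)
    except AttributeError:
        # Tolerate parents that reject attribute assignment instead of
        # failing the whole import.
        warnings.warn(
            f"cannot set child module {name!r} on parent {parent!r}",
            ImportWarning,
            stacklevel=2,
        )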

--
assignee: dino.viehland
components: Interpreter Core
messages: 36
nosy: dino.viehland
priority: normal
severity: normal
stage: needs patch
status: open
title: Immutable module type can't be used as package in custom loader
type: behavior

___
Python tracker 
<https://bugs.python.org/issue39336>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38500] Provide a way to get/set PyInterpreterState.frame_eval without needing to access interpreter internals

2019-11-01 Thread Dino Viehland


Dino Viehland  added the comment:

Adding the getter/setters seems perfectly reasonable to me, and I agree they 
should be underscore prefixed as well.

--

___
Python tracker 
<https://bugs.python.org/issue38500>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36971] Add subsections in C API "Common Object Structures" page

2019-10-29 Thread Dino Viehland


Dino Viehland  added the comment:

@BTaskaya Seems done, I'll go ahead and close it

--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue36971>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38140] Py_tp_dictoffset / Py_tp_finalize are unsettable in stable API

2019-09-19 Thread Dino Viehland


Change by Dino Viehland :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue38140>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38140] Py_tp_dictoffset / Py_tp_finalize are unsettable in stable API

2019-09-19 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset 3368f3c6ae4140a0883e19350e672fd09c9db616 by Dino Viehland (Eddie 
Elizondo) in branch 'master':
bpo-38140: Make dict and weakref offsets opaque for C heap types (#16076)
https://github.com/python/cpython/commit/3368f3c6ae4140a0883e19350e672fd09c9db616


--

___
Python tracker 
<https://bugs.python.org/issue38140>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38176] test_threading leaked [1, 1, 1] references: test_threads_join

2019-09-15 Thread Dino Viehland


Change by Dino Viehland :


--
stage: patch review -> resolved

___
Python tracker 
<https://bugs.python.org/issue38176>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38176] test_threading leaked [1, 1, 1] references: test_threads_join

2019-09-15 Thread Dino Viehland


Change by Dino Viehland :


--
resolution:  -> fixed
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue38176>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38176] test_threading leaked [1, 1, 1] references: test_threads_join

2019-09-15 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +15768
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16158

___
Python tracker 
<https://bugs.python.org/issue38176>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38176] test_threading leaked [1, 1, 1] references: test_threads_join

2019-09-15 Thread Dino Viehland


Change by Dino Viehland :


--
assignee:  -> dino.viehland
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue38176>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38116] Make select module PEP-384 compatible

2019-09-14 Thread Dino Viehland


Dino Viehland  added the comment:


New changeset f919054e539a5c1afde1b31c9fd7a8f5b2313311 by Dino Viehland in 
branch 'master':
bpo-38116: Convert select module to PEP-384 (#15971)
https://github.com/python/cpython/commit/f919054e539a5c1afde1b31c9fd7a8f5b2313311


--

___
Python tracker 
<https://bugs.python.org/issue38116>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38140] Py_tp_dictoffset / Py_tp_finalize are unsettable in stable API

2019-09-14 Thread Dino Viehland


Change by Dino Viehland :


--
nosy: +petr.viktorin

___
Python tracker 
<https://bugs.python.org/issue38140>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38152] AST change introduced tons of reference leaks

2019-09-13 Thread Dino Viehland


Change by Dino Viehland :


--
assignee:  -> dino.viehland
nosy: +dino.viehland

___
Python tracker 
<https://bugs.python.org/issue38152>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38138] test_importleak is leaking references

2019-09-12 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +15676
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16053

___
Python tracker 
<https://bugs.python.org/issue38138>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38138] test_importleak is leaking references

2019-09-12 Thread Dino Viehland


Change by Dino Viehland :


--
assignee:  -> dino.viehland

___
Python tracker 
<https://bugs.python.org/issue38138>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38140] Py_tp_dictoffset / Py_tp_finalize are unsettable in stable API

2019-09-12 Thread Dino Viehland


New submission from Dino Viehland :

This makes it impossible to port certain types to the stable ABI and remove 
statics from the interpreter.

--
assignee: dino.viehland
components: Interpreter Core
messages: 352167
nosy: dino.viehland, eric.snow
priority: normal
severity: normal
status: open
title: Py_tp_dictoffset / Py_tp_finalize are unsettable in stable API
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38140>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38113] Remove statics from ast.c

2019-09-11 Thread Dino Viehland


Change by Dino Viehland :


--
pull_requests: +15608
pull_request: https://github.com/python/cpython/pull/15975

___
Python tracker 
<https://bugs.python.org/issue38113>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38113] Remove statics from ast.c

2019-09-11 Thread Dino Viehland


Dino Viehland  added the comment:

Remove statics to make it more compatible with subinterpreters.

--
components: +Interpreter Core -Extension Modules
title: Make ast module PEP-384 compatible -> Remove statics from ast.c

___
Python tracker 
<https://bugs.python.org/issue38113>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


