Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Steven D'Aprano
On Tue, Mar 07, 2017 at 09:10:00PM +, Elliot Gorokhovsky wrote:

> I think the bigger problem, though, is that most list use does *not*
> involve sorting, 

There are other uses for a type hint apart from sorting. There may be 
optimized versions of other functions (sum, math.fsum, ...) and list 
methods (e.g. count, remove), anything that has to walk the list 
comparing items).

All very hypothetical at the moment, I admit it...


> so it would be a shame to impose the non-trivial overhead
> of type-checking on *all* list use.

You might be right, but my guess is that the overhead isn't quite as big 
as you may think, at least for some operations. The type-check itself is 
just a pointer compare, not an issubclass() operation. The types are 
either identical, or the list is hetrogeneous.

True, `mylist[i] = x` should be a quick assignment into an array of 
pointers, but there's a bounds check to ensure i is within the correct 
range. Other methods are even more expensive: `mylist[i:j] = [a...z]` 
has to move list items around, possibly allocate more memory, and update 
the list length, compared to that the scan of a...z is probably minimal. 
`mylist.append(x)` is *usually* a simple assignment into the array (plus 
updating the list length) but every now and again it triggers a resize, 
which is costly.

I don't think we can predict whether this is a nett gain or loss just 
from first principles.


> Anyway, my patch could always be a precursor to a more general optimization
> along these lines.

Indeed! Even if nothing else comes of this than your patch, thank you!


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Steven D'Aprano
On Tue, Mar 07, 2017 at 08:46:45PM +, Erik wrote:

> I don't think anyone has mentioned this yet, but FWIW I think the 'type 
> hint' may need to be tri-state: heterogeneous (NULL), homogeneous (the 
> pointer to the type structure) and also "unknown" (a sentinel value - 
> the address of a static char or something).

I thought about that and rejected it as an unnecessary complication. 
Hetrogeneous and unknown might as well be the same state: either way, 
you cannot use the homogeneous-type optimization.

Part of the complexity here is that I'd like this flag to be available 
to Python code, not just a hidden internal state of the list.

But my instinct for this could be wrong -- if anyone is interested to do 
some experiments on this, it might be that a three-state flag works out 
better in practice than two-states.


> Otherwise, a delete operation on the list would need to scan the list to 
> work out if it had changed from heterogeneous to homogeneous (unless it 
> was acceptable that once heterogeneous, a list is always considered 
> heterogeneous - i.e., delete always sets the hint to NULL).

In a later email, you corrected this: a delete operation need not touch 
the type-hint (except when the last item is deleted, at which point it 
resets to None/unknown.

With a three-state flag, you can make a three-way decision:

If the flag is Unknown:
- do a O(N) scan to determine whether the list is homogeneous 
  or hetrogeneous, then choose the optimized or unoptimized
  routine;

if the flag is Hetrogeneous:
- there is no point doing a scan, so always choose the 
  unoptimized routine;

if the flag is a type:
- the list is definitely homogeneous, so depending on the
  type, you may be able to choose an optimized rountine.


Compared to that, a two-state flag misses some opportunities to run 
the optimized routine:

- list contains [1, 2, 3, 4, "foo"] so the hint is set to None;
- delete the last item;
- list is now homogeneous but the hint is still None.

But also avoids bothering with an O(N) scan in some situations where 
the list really is hetrogeneous. So there's both an opportunity cost and 
a benefit.

There may be other opportunities to do the scan, such as when the 
underlying pre-allocated array resizes, so even with the two-state flag, 
"unknown" need not stay unknown forever.


> I'd prefer the sort optimization to be based on what my list contains 
> NOW, not on what it may have contained some time in the past, 

Remember, we're talking about opportunities for applying an optimization 
here, nothing more. You're not giving up anything: at worst, the 
ordinary, unoptimized routine will run and you're no worse off than you 
are today.


> so I'm not 
> a fan of the "once heterogeneous, always considered heterogeneous" 
> behaviour if it's cheap enough to avoid it.

It is not just a matter of the cost of tracking three states versus two. 
It is a matter of the complexity of the interface.

I suppose this could be reported to Python code as None, False or the 
type. Although, I don't know about you, but I know I'll never be able to 
remember whether None means Unknown and False means Hetrogeneous, or the 
other way around.

Ultimately, this is all very pie-in-the-sky unless somebody tests just 
how expensive this is and whether the benefit is worthwhile. That's not 
going to be me: I'm not able to hack on the list C code.


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread David Mertz
On Tue, Mar 7, 2017 at 4:36 PM, Erik  wrote:

> * listextend() - this should do the right thing with the type hint when
>> extending one list with another.
>>
>
> * Several other methods ('contains', 'remove', 'count', 'index') also use
> PyObject_RichCompareBool(). They could also presumably benefit from the
> same optimisation (perhaps it's not all about sort() - perhaps this gives a
> little more weight to the idea).


Good point about list.extend().  I don't think __type_hint__ could help
with .__contains__() or .count() or .remove().  E.g.:

In [7]: lst = [1.0, 2.0, 1+0j, F(1,1)]
In [8]: from fractions import Fraction as F
In [9]: lst = [1.0, 2.0, 1+0j, F(1,1)]
In [10]: 1 in lst
Out[10]: True
In [11]: lst.count(1)
Out[11]: 3
In [12]: l.index(1)
Out[12]: 0


The list has absolutely nothing of the right type.  Yet it contains an
item, counts things that are equal, finds a position for an equal item.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Elliot Gorokhovsky
On Tue, Mar 7, 2017 at 1:47 PM Erik  wrote:

>
> I'd prefer the sort optimization to be based on what my list contains
> NOW, not on what it may have contained some time in the past, so I'm not
> a fan of the "once heterogeneous, always considered heterogeneous"
> behaviour if it's cheap enough to avoid it.
>

Sure. Dictionaries actually don't implement this, though: as soon as they
see a non-string key, they permanently switch to a heterogeneous state
(IIRC).

I think the bigger problem, though, is that most list use does *not*
involve sorting, so it would be a shame to impose the non-trivial overhead
of type-checking on *all* list use. With dictionaries, you have to
type-check inserts anyway (for equality testing), so it's less of a
problem.  But the fundamental list operations *don't* require type-checking
currently, so why add it in? In practice, the pre-sort check is *very*
cheap, and its cost is only imposed on sort usage.

Anyway, my patch could always be a precursor to a more general optimization
along these lines. I'm almost finished fixing the problem Tim identified
earlier in this thread; after that, it'll be ready for review!
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Erik

On 07/03/17 20:46, Erik wrote:

(unless it
was acceptable that once heterogeneous, a list is always considered
heterogeneous - i.e., delete always sets the hint to NULL).


Rubbish. I meant that delete would not touch the hint at all.

E.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Wrapper for ctypes

2017-03-07 Thread Ryan Gonzalez
Ever looked up cffi? You won't be disappointed.

--
Ryan (ライアン)
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
http://refi64.com

On Mar 7, 2017 3:43 AM, "George Fischhof"  wrote:

> Hi Guys,
>
> right now I had to call functions from a dll, and I started using ctypes.
>
>
> I found this library too
>
> https://pypi.python.org/pypi/pywrap/0.1.0
>
> which says (qutation):
>
> Replace this:
>
> prototype = ctypes.WINFUNCTYPE(wintypes.HANDLE, wintypes.UINT, 
> wintypes.HANDLE)paramflags = (1, "uFormat"), (1, "hMem")SetClipboardData = 
> prototype(("SetClipboardData", user32), paramflags)SetClipboardData.errcheck 
> = null_errcheck
>
> With this:
>
> SetClipboardData = pywrap.wrap_winapi(name="SetClipboardData",
> library=user32,
> restype=wintypes.BOOL,
> params=[
> Parameter("uFormat", 
> wintypes.UINT),
> Parameter("hMem", wintypes.HANDLE)
> ],
> errcheck=null_errcheck)
>
>
>
>  (end qutation)
>
> My idea: something like this library should be implemented in Python to make 
> it more simple to use ctypes.
>
> BR,
>
> George
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Wrapper for ctypes

2017-03-07 Thread Stephan Houben
I have in the past written a small utility to use compact signatures like:

i i -> i

to indicate a function taking two ints and returning an int.

See for example:
https://github.com/stephanh42/armasm


Op 7 mrt. 2017 10:43 a.m. schreef "George Fischhof" :

> Hi Guys,
>
> right now I had to call functions from a dll, and I started using ctypes.
>
>
> I found this library too
>
> https://pypi.python.org/pypi/pywrap/0.1.0
>
> which says (qutation):
>
> Replace this:
>
> prototype = ctypes.WINFUNCTYPE(wintypes.HANDLE, wintypes.UINT, 
> wintypes.HANDLE)paramflags = (1, "uFormat"), (1, "hMem")SetClipboardData = 
> prototype(("SetClipboardData", user32), paramflags)SetClipboardData.errcheck 
> = null_errcheck
>
> With this:
>
> SetClipboardData = pywrap.wrap_winapi(name="SetClipboardData",
> library=user32,
> restype=wintypes.BOOL,
> params=[
> Parameter("uFormat", 
> wintypes.UINT),
> Parameter("hMem", wintypes.HANDLE)
> ],
> errcheck=null_errcheck)
>
>
>
>  (end qutation)
>
> My idea: something like this library should be implemented in Python to make 
> it more simple to use ctypes.
>
> BR,
>
> George
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/