Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Chris Barker
On Wed, Mar 8, 2017 at 2:58 PM, Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

>
> The whole point of my patch is that we do O(nlogn) compares, but only have
> O(n) elements, so it's much cheaper to do all the type checks in advance,
>


>  I mean, practically speaking, the advance check is basically free. The
> compare-time checks, in sum, are not, both asymptotically and practically.
>

hmm -- I know folks like to say that "the constants don't matter", but it
seems they do in this case:

without pre-checking:

O(nlogn)

with pre-checking If homogenous:

O(n) + O(nlogn)

so the pre-checking only adds if you ignore the constants..

But I'm assuming (particularly with locality and branch prediction and all
that included) the constant to type-check is much smaller than the constant
to compare two unknown types, so:

TC*n + KC*n*logn

vs

UC*n*logn

where:
TC -- Constant to type check
KC -- Constant known compare
UC -- Constant unknown type check

So if UC > TC/logn + KC

Then this optimization make sense.

If UC >KC and UC >>> TC, then this all works out.

But if n is HUGE, it may end up being slower (maybe more so with cache
locality???)

Which is why you need to profile all this carefully.

So far Elliott's experiments seem to show it works out well.

Which doesn't surprise me for built-ins like float and int that have a
native compare, but apparently also for more complex types? How about
custom classes?

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-08 Thread Dan Sommers
On Thu, 09 Mar 2017 09:43:48 +1100, Chris Angelico wrote:

> On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT wrote:

>> But a possible workaround, is if we used the first positional
>> argument of dict() as the default value [...]

> ... Granted, there aren't going to be very many objects that are both
> callable and iterable ...

Many stream-like objects are both callable (get the next element of the
stream) and iterable (iterate through the remainder of the stream).  Or
at least that's the way I make mine.  Sometimes.

> Safer to keep this out of the signature of dict itself.

Agreed.

Dan

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Elliot Gorokhovsky
On Wed, Mar 8, 2017 at 2:14 PM Barry  wrote:

> Can you assume that list of of type(list[0]) and use that type's optimised
> sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to
> the unoptimised sort.
>

Well, how would you tell if you've hit an element that is not of the
required type? You'd have to check during every compare, right? And that's
precisely what we're trying to avoid!

The whole point of my patch is that we do O(nlogn) compares, but only have
O(n) elements, so it's much cheaper to do all the type checks in advance,
and in the very likely case that our list is homogeneous, switch to an
optimized special-case compare function.

Even when we only do O(n) compares, my patch is still much faster (see
benchmarks higher up in this thread). Why? Well, if you're doing the type
checks during the compares, you're doing them across different function
calls, with other stuff interspersed in between. So pipeline/branch
prediction/caching is less able to optimize away the overhead of the safety
checks (I don't know how CPUs work, but one of those is relevant here).
With the pre-sort check, it's all in a single iteration through the list,
and we're taking the same exact branch every time; it's much faster. Think
iterating over a matrix row-wise versus iterating column-wise (not exactly
relevant here since that's about cache, but that's the idea. Again, I don't
really know how CPUs work).

So in short: no, we can't just check as we go, because that's what we're
already doing. But we can optimize significantly by checking in advance. I
mean, practically speaking, the advance check is basically free. The
compare-time checks, in sum, are not, both asymptotically and practically.

Best
Elliot
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] dict(default=int)

2017-03-08 Thread Eric V. Smith

On 3/8/2017 5:43 PM, Chris Angelico wrote:

On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT  wrote:

But a possible workaround, is if we used the first positional argument of
dict() as the default value. As right now it doesn't accept positional
arguments (or at least if they are not iterable, which complicates a bit the
thing), we could allow a syntax like :
d = dict([default, ][*args, ]**kwargs)
where default is a callable, *args made of iterables, and kwargs any kwargs.


There'd still be a pile of special cases. Granted, there aren't going
to be very many objects that are both callable and iterable, but there
certainly _can be_, and if one were passed as the first argument, it
would be ambiguous.

Safer to keep this out of the signature of dict itself.


If we really want to make defaultdict feel more "builtin" (and I don't 
see any reason to do so), I'd suggest adding a factory function:


dict.defaultdict(int)

Similar in spirit to dict.fromkeys(), except of course returning a 
defauldict, not a dict.


Eric.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-08 Thread Chris Angelico
On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT  wrote:
> But a possible workaround, is if we used the first positional argument of
> dict() as the default value. As right now it doesn't accept positional
> arguments (or at least if they are not iterable, which complicates a bit the
> thing), we could allow a syntax like :
> d = dict([default, ][*args, ]**kwargs)
> where default is a callable, *args made of iterables, and kwargs any kwargs.

There'd still be a pile of special cases. Granted, there aren't going
to be very many objects that are both callable and iterable, but there
certainly _can be_, and if one were passed as the first argument, it
would be ambiguous.

Safer to keep this out of the signature of dict itself.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-08 Thread Chris Angelico
On Thu, Mar 9, 2017 at 9:23 AM, Brice PARENT  wrote:
> Would those 2 dicts be equal ?
> d1 = dict(default=5)
> d2 = {'default': 5}

Easy to find out:

>>> d1 = dict(default=5)
>>> d2 = {'default': 5}
>>> d1 == d2
True

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict(default=int)

2017-03-08 Thread Mahmoud Hashemi
That's already valid dict syntax.

>>> dict(default=int)
{'default': }

Generally that in itself makes this a no go.

Mahmoud

On Wed, Mar 8, 2017 at 2:09 PM, Steven Piantadosi 
wrote:

> Hi All,
>
> I find importing defaultdict from collections to be clunky and it seems
> like having a default should just be an optional keyword to dict. Thus,
> something like,
>
> d = dict(default=int)
>
> would be the same as
>
> from collections import defaultdict
> d = defaultdict(int)
>
> Any thoughts?
>
> Thanks,
>
> ++Steve
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Erik

On 08/03/17 11:07, Steven D'Aprano wrote:

I mentioned earlier that I have code which has to track the type of list
items, and swaps to a different algorithm when the types are not all the
same.


Hmmm. Yes, I guess if the expensive version requires a lot of 
isinstance() messing or similar for each element then it could be better 
to have optimized versions for homogeneous lists of ints or strings etc.



A list.is_heterogeneous() method
could be implemented if it was necessary,


I would prefer to get the list item's type:

if mylist.__type_hint__ is float:


If you know the list is homogeneous then the item's type is 
"type(mylist[0])".


Also, having it be a function call gives an obvious place to put the 
transition from "unknown" to known state if the tri-state hint approach 
was taken. Otherwise, that would have to be hooked into the attribute 
access somehow.


That's for someone who wants to try implementing it to decide and 
propose though :)


E.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal: making __str__ count in time's class

2017-03-08 Thread Ethan Furman

On 03/08/2017 08:01 AM, Francesco Franchina wrote:


Before expressing my thought and proposal, I want to make sure we all agree on 
a simple and clear fact:
the __str__ magic method is used to give a literal and human-readable 
representation to the object (unlike __repr__).


If __str__ has not been defined, then __repr__ is used instead.


time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=6, 
tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0)


Which is what that looks like.

--
~Ethan~

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal: making __str__ count in time's class

2017-03-08 Thread Barry Scott

> On 8 Mar 2017, at 16:01, Francesco Franchina  wrote:
> 
> Hello everyone,
> 
> I'm shortly writing to you about a reflection I lately made upon the current 
> functioning of __str__ for the time's class.
> 
> Before expressing my thought and proposal, I want to make sure we all agree 
> on a simple and clear fact:
> the __str__ magic method is used to give a literal and human-readable 
> representation to the object (unlike __repr__).
> 
> Generally this is true across the python panorama. It's not true for the time 
> class, for example.
> 
> >>> import time
> >>> a = time.localtime()
> >>> a.__str__()
> 'time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=6, 
> tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0)'
> 
> Well, don't get me wrong: the main aim of the __str__ method has been 
> accomplished but, imho, not in the most pythonic way.
> 
> I just wanted to ask you: what do you think about re-writing the __str__ of 
> the time class so it would return something like
> ISO 8601 [https://en.wikipedia.org/wiki/ISO_8601 
> ] format? Wouldn't it be more 
> meaningful? Especially in the JS-everywhere-era
> it could be more more productive.
> 
> 
> TL;DR
> __str__ for dates should return a human-readable date format (eg: 
> https://en.wikipedia.org/wiki/ISO_8601 
> )

Just use datetime module instead of time?

>>>  datetime.datetime.now().isoformat()
'2017-03-08T16:14:58.448801'

Barry

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Proposal: making __str__ count in time's class

2017-03-08 Thread Francesco Franchina
Hello everyone,

I'm shortly writing to you about a reflection I lately made upon the
current functioning of __str__ for the time's class.

Before expressing my thought and proposal, I want to make sure we all agree
on a simple and clear fact:
the __str__ magic method is used to give a literal and human-readable
representation to the object (unlike __repr__).

Generally this is true across the python panorama. It's not true for the
time class, for example.




*>>> import time>>> a = time.localtime()>>>
a.__str__()'time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16,
tm_min=6, tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0)'*

Well, don't get me wrong: the main aim of the __str__ method has been
accomplished but, imho, not in the most pythonic way.

I just wanted to ask you: what do you think about re-writing the __str__ of
the time class so it would return something like
ISO 8601 [https://en.wikipedia.org/wiki/ISO_8601] format? Wouldn't it be
more meaningful? Especially in the JS-everywhere-era
it could be more more productive.


*TL;DR*
__str__ for dates should return a human-readable date format (eg:
https://en.wikipedia.org/wiki/ISO_8601)


I'm waiting for your opinions.
Thank you for your time and ideas!

Francesco Franchina
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Steven D'Aprano
On Wed, Mar 08, 2017 at 01:20:19AM +, Erik wrote:

> >Part of the complexity here is that I'd like this flag to be available
> >to Python code, not just a hidden internal state of the list.
> 
> Out of interest, for what purpose? Generally, I thought Python code 
> should not need to worry about low-level optimisations such as this 
> (which are C-Python specific AIUI). 

I mentioned earlier that I have code which has to track the type of list 
items, and swaps to a different algorithm when the types are not all the 
same. In practice, I just run the "mixed types" algorithm regardless, 
because the cost of doing a scan of the items in pure Python is too 
expensive relative to the time saved. Moving some of the work into the C 
infrastructure might change that.

I'm not completely sure that this would, in fact, be useful to me, but 
I'd like the opportunity to experiment. I could try using a list 
subclass, but again, the cost of doing the type-checking in Python 
instead of C is (I think) prohibitive. Nevertheless, I understand that 
the burden of proving the usefulness of this is on me. (Or anyone else 
that wants to argue the case.)


> A list.is_heterogeneous() method 
> could be implemented if it was necessary, but how would that be used?

I would prefer to get the list item's type:

if mylist.__type_hint__ is float:
# run optimized float version
...
elif mylist.__type_hint__ is int:
...
else:
# unoptimized version




> >But also avoids bothering with an O(N) scan in some situations where
> >the list really is hetrogeneous. So there's both an opportunity cost and
> >a benefit.
> 
> O(N) is worst case.

It is also the best and average case.

Given a homogenous list of N items, for some arbitrary N between 0 and 
∞, you have to look at all N items before knowing that they're all the 
same type. So for the homogenous case, the best, worst and average are 
identically O(N).

Given a hetrogeneous list of N items where the first difference is 
found at index K, K can range from 1 through N-1. (By definition, it 
cannot be at index 0: you can only detect a difference in types after 
checking *two* items.) The worst case is that you don't find the 
difference until the last item, which is O(N). The best case is that you 
find the difference in position 1 and bail out early. On average, you 
will find the first difference halfway through the list, which makes it 
O(N/2) but that's just O(N). (Constant factors don't matter.)

If you want to call that O(1) for the best hetrogeneous case, I won't 
argue except to say that's rather against the spirit of Big Oh analysis 
in my opinion. I think it's more realistic to say its O(N) across all 
combinations of best/worst/average and homogeneous/hetrogeneous. But of 
course if you bail out early, the constant multiplier may differ.



> Most of the anecdotal evidence in this thread so far seems to suggest 
> that heterogeneous lists are not common. May or may not be true. 
> Empirically, for me, it is true. Who knows? (and there is the question).

That will strongly depend on where the data is coming from, but when it 
comes to sorting, randomly mixed types will usually fail since 
comparisons between different types are generally illegal:

py> [1, "a"].sort()
Traceback (most recent call last):
  File "", line 1, in 
TypeError: unorderable types: str() < int()



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/