[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-06-07 Thread Tim Peters
To be clearer, while knowing the size of allocated objects may be of
some use to some other allocators, "not really" for obmalloc.  That
one does small objects by itself in a uniform way, and punts
everything else to the system malloc family.  The _only_ thing it
wants to know on a free/realloc is which of those two ways delivered
the address to begin with.

Knowing the original size would be an indirect way to accomplish that,
but it really only needs 1 bit of info.  If you're using a radix tree
to implement - in effect - a dict mapping addresses to "stuff", 1 flag
bit in the "stuff" would be ideal for that.  If you haven't already, I
assume you'll soon come around to believe you really should be
tracking the addresses of all gc'ed objects (not just the "small"
ones).
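A rough sketch of the "radix tree as a dict mapping addresses to stuff, with 1 flag bit" idea, written in Python for readability. The bit widths, the `AddressMap` class, and its method names are all assumptions made for illustration; nothing here is obmalloc's actual layout.

```python
# Hypothetical sketch: a fixed-depth radix tree keyed on address bits.
# Assumes addresses fit in 48 bits; the low 12 bits are ignored so that
# every address inside one 4K block shares a single entry.
IGNORE_BITS = 12
CHUNK_BITS = 12
LEVELS = 3


def _path(addr):
    """Split the significant address bits into LEVELS keys, top first."""
    block = addr >> IGNORE_BITS
    mask = (1 << CHUNK_BITS) - 1
    return [(block >> (CHUNK_BITS * i)) & mask for i in reversed(range(LEVELS))]


class AddressMap:
    """Maps an address to one bit: did obmalloc hand it out?"""

    def __init__(self):
        self._root = {}

    def mark(self, addr, from_obmalloc):
        *inner, leaf = _path(addr)
        node = self._root
        for key in inner:
            node = node.setdefault(key, {})
        node[leaf] = bool(from_obmalloc)

    def from_obmalloc(self, addr):
        node = self._root
        for key in _path(addr):
            node = node.get(key)
            if node is None:
                return False  # unknown address: treat as system malloc
        return node
```

With something like this, a free/realloc entry point would only need `from_obmalloc(addr)` to decide which of the two paths to take.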

> If the above idea works, we know the object size at free() and realloc(), we
> don't need address_in_range() for those code paths.

Two things:  first, a single bit would not only be sufficient, it
would be _better_ than knowing the object size.  If the bit says the
object came from the system, the size is of no use at all.  If the bit
says it came from obmalloc, what's _really_ wanted is the index of the
object's "size class" in the vector of free lists, and that's already
directly available in the object's pool header (various parts of which
have to be read up &/or updated regardless).  Second, if that bit were
available, `address_in_range()` could be thrown away - obmalloc's free
and realloc entries are the only places it's used (or is of any
conceivable use to obmalloc).

For the current obmalloc, I have in mind a different way (briefly: let
a pool span a number of 4K pages, but teach pools about the page
structure so that address_in_range() continues to work after trivial
changes (read the arena index from the containing page's start rather
than from the containing pool's)).  Not ideal, but looks remarkably
easy to do, and captures the important part (more objects in a pool ->
more times obmalloc can remain in its fastest "all within the pool"
paths).
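To make the "read the arena index from the containing page's start" idea concrete, here is a small Python simulation. It is only a sketch: `PAGES_PER_POOL`, the `page_headers` dict standing in for memory, and `add_arena()` are hypothetical, and the arena size is an assumed value.

```python
PAGE_SIZE = 4096                  # 4K pages, as in the message
PAGES_PER_POOL = 4                # hypothetical: a pool spanning 4 pages
POOL_SIZE = PAGE_SIZE * PAGES_PER_POOL
ARENA_SIZE = 256 * 1024           # assumed arena size


def page_base(addr):
    """Start of the 4K page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


def pool_base(addr):
    """Start of the (larger, pool-aligned) pool containing addr."""
    return addr & ~(POOL_SIZE - 1)


page_headers = {}   # simulated memory: page start -> stored arena index
arena_bases = []    # arena index -> arena base address


def add_arena(base):
    """Register an arena and stamp its index at the start of every page."""
    index = len(arena_bases)
    arena_bases.append(base)
    for page in range(base, base + ARENA_SIZE, PAGE_SIZE):
        page_headers[page] = index
    return index


def address_in_range(addr):
    """True if addr lies inside an arena this allocator controls."""
    index = page_headers.get(page_base(addr))
    if index is None or index >= len(arena_bases):
        return False
    return 0 <= addr - arena_bases[index] < ARENA_SIZE
```

The only change relative to a pool-based lookup is that the index is fetched via `page_base()` rather than `pool_base()`, which is why the check keeps working when a pool covers several pages.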

> My gut feeling is that the prev/next pointer updates done by
> move_unreachable() and similar functions must be quite expensive.
> Doing the traversal with an explicit stack is a lot less elegant but
> I think it should be faster.  At least, when you are dealing with a
> big set of GC objects that don't fit in the CPU cache.

At the start, it was the _reachable_ objects that were moved out of
the collected generation list rather than the unreachable objects.
Since it's (in all my experience) almost all objects that survive
almost all collections in almost all programs, that was an enormous
amount of moving overall.  So long ago I rewrote that code to move
the _un_reachable objects instead.

Debug output showed countless billions of pointer updates saved across
the programs I tried at the time, but the timing difference was in the
noise.

Probably a big part of "the problem":  when collecting the oldest
generation in a program with "a lot" of objects, no approach at all
will "fit in cache".  We have to crawl the entire object graph then.
Note that the test case in the parent thread had over a billion class
instances, and it was way whittled down(!) from the OP's real program.

But I don't _think_ that's typical quite yet ;-) , and:

> You are likely correct. I'm hoping to benchmark the radix tree idea.
> I'm not too far from having it working such that it can replace
> address_in_range().  Maybe allocating gc_refs as a block would
> offset the radix tree cost vs address_in_range().

I certainly agree gc_refs is the big-bang-for-the-buck thing to
target.  There's no way to avoid mutating that bunches and bunches of
times, even in small programs, and reducing the number of dirtied
cache lines due to that "should" pay off.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3DPAR4PPPM3Z675ELBGYIJ74INI6PL3X/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Ned Deily
On Jun 7, 2019, at 18:03, Victor Stinner wrote:
> I am not sure that we are good at archiving.

I'm not sure what this has to do with mailing list URLs but ...

> Example with Subversion links in the bug tracker:
> 
> https://bugs.python.org/issue2001#msg123254
> 
> "Added the missing CSS file in r86971."
> 
> The revision link is:
> https://hg.python.org/lookup/r86971
> 
> Which redirects to the following HTTP 404 (not found) error:
> http://svn.python.org/view?view=revision=86971

This worked for many years.  I believe the issue is that the process relied on 
the old Subversion source web viewer which has more recently been retired.

> We have a mapping of Subversion commits to... Mercurial commits, but the 
> Mercurial server doesn't use it.
> 
> Then bugs will migrate to GitHub. I already expect that we will lose some 
> data in the long term...
> 
> I commonly have to dig into Python history to understand the rationale of a 
> change made 10 years ago, if not 15 or 20 years ago. It is becoming more and 
> more painful at each migration...


There is an open issue about this that appears to have stalled:

https://github.com/python/core-workflow/issues/12

Perhaps we can interest someone in picking it up.  That would be a big help for 
everyone.

--
  Ned Deily
  n...@python.org -- []
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/O4OMR5U4TKGEFERMFIVZGTDLU4EW6MOI/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Victor Stinner
I am not sure that we are good at archiving. Example with Subversion links
in the bug tracker:

https://bugs.python.org/issue2001#msg123254

"Added the missing CSS file in r86971."

The revision link is:
https://hg.python.org/lookup/r86971

Which redirects to the following HTTP 404 (not found) error:
http://svn.python.org/view?view=revision=86971

We have a mapping of Subversion commits to... Mercurial commits, but the
Mercurial server doesn't use it.

Then bugs will migrate to GitHub. I already expect that we will lose some
data in the long term...

I commonly have to dig into Python history to understand the rationale of a
change made 10 years ago, if not 15 or 20 years ago. It is becoming more
and more painful at each migration...

Victor

-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MEAXMKDSMCIOK36BB2VC6NM2NSBAHMPZ/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Gregory P. Smith
On Thu, Jun 6, 2019 at 6:19 AM Victor Stinner wrote:

> On Thu, Jun 6, 2019 at 14:18, Steven D'Aprano wrote:
> > i.e. 25-40% longer. Is there a shorter permalink form available, like
> > goo.gl, bitly, youtu.be etc use? That would be awesome if we could use
> > them instead.
>
> I really dislike URL shorteners.
>
> From my point of view, URL shorteners are the opposite of permanent
> links. It adds a new single point of failure.
>

Unless you run your own shortener, operated as part of the infrastructure
the shortened links will be going to.  In that case the shortened URL
service removes the need to modify each and every application to handle
shorter things on its own.  That way short URLs can be made meaningful on
their own, indicating where they link to while still being short enough to
avoid gross line wrapping.

We could implement our own automatic short namespaces such as
https://url.python.org/mail/ideas/48+bit-random-id
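
As a minimal sketch of what generating such links might look like (the `url.python.org` host and the `mail/ideas` namespace come from the example above; the function names and the choice of URL-safe base64 are assumptions):

```python
import base64
import secrets

BASE = "https://url.python.org"   # hypothetical host from the example above


def random_short_id(bits=48):
    """Return a URL-safe text form of a random id of the given size."""
    raw = secrets.token_bytes(bits // 8)
    # 6 bytes encode to exactly 8 base64 characters, so no padding remains.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def short_url(namespace):
    """Build a short link inside a namespace such as 'mail/ideas'."""
    return f"{BASE}/{namespace}/{random_short_id()}"
```

A real service would also need to store the id-to-target mapping and check for collisions before handing an id out; that part is left out here.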

goo.gl and youtu.be are examples of site specific url shorteners already
used this way (now that goo.gl has stopped / is stopping offering itself as
a public bit.ly clone service)

This also avoids one (of many) ways of giving traffic log stats away to a
third party hosting the service.

-gps


> If a full URL becomes broken (HTTP error 404), I can still try to
> find the information in a different way using the full URL. For
> example, there are some services like archive.org who archive public
> websites. Or look in other emails "around" this email, maybe the
> missing email is quoted from another email.
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BBEKSKJCNTUK4LMH435UAZOHZ4KXTP3D/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Abhilash Raj
Wes Turner wrote:
> Thanks for getting these upgraded. IMHO, being able to copy URLs from list
> message footers as references in e.g. issues will be a great boost in
> productivity.

This is possible to do using "$hyperkitty_url" in the message footer. You can
request the list owners to add that.

> 
> On Friday, June 7, 2019, Stephen J. Turnbull <
> turnbull.stephen.fw(a)u.tsukuba.ac.jp wrote:
> 
> >   Barry Warsaw writes:
> >   On Jun 6, 2019, at 09:15, David Mertz
> >  >  >
> >  > The old URL is definitely a lot friendlier, even apart from the  length.
> >  
> >  Unfortunately, the old URLs aren’t really permanent. 
> >  True.  That could be addressed in theory, but it would be fragile (ie,
> >  vulnerable to loss or corruption of the external database mapping
> >  messages to URLs).  Calculating from the message itself means that if
> >  you have the message you can always get where you want to go.
> > 
> >   The new URLs are guaranteed to be reproducible
> > from the original
> >  message source.  The downside is that they are less friendly. 
> >  They could, however be made more friendly than they currently are.
> >  There's no reason (in principle, of course it requires changing code
> >  and the DNS) why your message, currently given the Archived-At URL
> > 
> >  https://mail.python.org/archives/list/python-dev@python.org/message/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
> > 
> >  couldn't be given (A is for Archives)
> > 
> >  https://a.python.org/python-dev@python.org/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
> > 
> >  which gets it down to an RFC-conformant 76 characters. ;-)  Of course
> >  many lists would overflow that, and I agree with David that
> > 
> >  https://a.python.org/python-dev@python.org/2019/06/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
> > 
> >  would be better still.  Although the risk of collision would be orders
> >  of magnitude higher (the date buys us some leeway but not much, we
> >  could make the ID-Hash be 2019/06/B5VZSRS3DDXZN6ET (arbitrarily chose
> >  middle 16), giving 
> 
> >  
> > https://a.python.org/python-dev@python.org/2019/06/B5VZSRS3DDXZN6ET
> > 
> >  (67 characters, allowing a few more characters for domain names and/or
> >  list names -- note with the current scheme, a domain name which is 1
> >  character longer probably uses up two more characters of space). 
> 
> Are these message IDs or hashes?
> Do they have to be (is this) base-36?
> Could they instead be base-62? (26+10+26)
> 
> 
> >  
> >  None of this is very attractive to me, for reasons I will go into on
> >  Mailman-Developers or gitlab.com/mailman/mailman/issues if you want to
> >  file one.  Briefly, people who want bit.ly-length short URLs won't be
> >  satisfied, and the proposed URLs are more useful but still ugly. 
> 
> We shouldn't just drop extra date information from the URL and only lookup
> by the messageid unless we add a redirect to the correct dated URL; because
> caching and trickery.
> 
> 
> >  
> >  Personally I think we should all just switch to RestructuredText- and
> >  Markdown-capable MUAs, and kill off both ugly visible URLs and HTML
> >  email with one big ol' rock. 
> 
> While I personally prefer .rst and .md, hovering over URL anchor text takes
> unnecessary time (and I'll remember whether I've been to the actual
> http://URL, but not 'here' and 'there').
> So I'm fine with ridiculous, preposterous long links (even in the middle of
> the email; without footnotes to scroll back and forth to)
> 
> 
> >  
> >  Steve
> >  ___
> >  Message archived at
> >  https://mail.python.org/archives/list/python-dev@python.org/message/O3T27UUHKKXATOPJT4KEQHREUGYVMELV/
> >
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BIGAWAPAQZXQ4B6NGDA34VYKUS3ULC3N/


[Python-Dev] Re: Using vectorcall for tp_new and tp_init

2019-06-07 Thread Terry Reedy

On 6/7/2019 6:41 AM, Jeroen Demeyer wrote:

Hello,

I'm starting this thread to brainstorm for using vectorcall to speed up 
creating instances of Python classes.


Currently the following happens when creating an instance of a Python 
class X using X(...) and assuming that __new__ and __init__ are Python 
functions and that the metaclass of X is simply "type":


1. type_call (the tp_call wrapper for type) is invoked with arguments 
(X, args, kwargs).


2. type_call calls slot_tp_new with arguments (X, args, kwargs).

3. slot_tp_new calls X.__new__, prepending X to the args tuple. A new 
object obj is returned.


4. type_call calls slot_tp_init with arguments (obj, args, kwargs).

5. slot_tp_init calls type(obj).__init__ method, prepending obj to the 
args tuple. A new object obj is returned.


My understanding is that the argument obj is just mutated, which is one 
reason why a separate __new__ is needed.



In the worst case, no less than 6 temporary objects are needed just to 
pass arguments around:


1. An args tuple and kwargs dict for tp_call

3. An args array with X prepended and a kwnames tuple for __new__

5. An args array with obj prepended and a kwnames tuple for __init__

This is clearly not as efficient as it could be.

An obvious solution would be to introduce variants of tp_new and tp_init 
using the vectorcall protocol. Assuming PY_VECTORCALL_ARGUMENTS_OFFSET 
is used, all 6 temporary allocations could be dropped. The 
implementation could be in the form of two new slots tp_vector_new and 
tp_vector_init. Since we're just dealing with type slots here (as 
opposed to offsets in an object structure), this should be easier to 
implement than PEP 590 itself.



--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZU5FWY7G25IS725WDOEWT5KRAS5T5YQS/


[Python-Dev] Summary of Python tracker Issues

2019-06-07 Thread Python tracker

ACTIVITY SUMMARY (2019-05-31 - 2019-06-07)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7007 (-21)
  closed 41943 (+104)
  total  48950 (+83)

Open issues with patches: 2817 


Issues opened (51)
==================

#21492: email.header.decode_header sometimes returns bytes, sometimes 
https://bugs.python.org/issue21492  reopened by ezio.melotti

#32912: Raise non-silent warning for invalid escape sequences
https://bugs.python.org/issue32912  reopened by rhettinger

#35621: asyncio.create_subprocess_exec() only works with main event lo
https://bugs.python.org/issue35621  reopened by koobs

#36818: Add PyInterpreterState.runtime.
https://bugs.python.org/issue36818  reopened by vstinner

#36993: zipfile: tuple IndexError on extract
https://bugs.python.org/issue36993  reopened by berker.peksag

#37111: Logging - Inconsistent behaviour when handling unicode
https://bugs.python.org/issue37111  reopened by jonathan-lp

#37120: Provide knobs to disable session ticket generation on TLS 1.3
https://bugs.python.org/issue37120  opened by njs

#37123: test_multiprocessing fails randomly on Windows
https://bugs.python.org/issue37123  opened by pablogsal

#37124: test_msilib is potentially leaking references and memory block
https://bugs.python.org/issue37124  opened by pablogsal

#37127: Handling pending calls during runtime finalization may cause p
https://bugs.python.org/issue37127  opened by eric.snow

#37129: Add RWF_APPEND flag
https://bugs.python.org/issue37129  opened by bezoka

#37130: pathlib.with_name() doesn't like unnamed files.
https://bugs.python.org/issue37130  opened by Nophke

#37133: Erro "ffi.h: No such file" when build python 3.8 (branch maste
https://bugs.python.org/issue37133  opened by heckad

#37138: PEP 590 method_vectorcall calls memcpy with NULL src
https://bugs.python.org/issue37138  opened by gregory.p.smith

#37140: ctypes change made clang fail to build
https://bugs.python.org/issue37140  opened by petr.viktorin

#37141: Allow multiple separators in Stream.readuntil
https://bugs.python.org/issue37141  opened by bmerry

#37144: tarfile.open: improper handling of path-like object
https://bugs.python.org/issue37144  opened by dm

#37146: opcode cache for LOAD_GLOBAL emits false alarm in memory leak 
https://bugs.python.org/issue37146  opened by vstinner

#37149: link to official documentation tkinter failed  !!!
https://bugs.python.org/issue37149  opened by xameridu

#37150: Do not allow to pass FileType class object instead of instance
https://bugs.python.org/issue37150  opened by zygocephalus

#37151: Calling code cleanup after PEP 590
https://bugs.python.org/issue37151  opened by jdemeyer

#37154: test_utf8_mode: test_env_var() fails on AMD64 Fedora Rawhide C
https://bugs.python.org/issue37154  opened by vstinner

#37155: test_asyncio: test_stdin_broken_pipe() failed on AMD64 FreeBSD
https://bugs.python.org/issue37155  opened by vstinner

#37157: shutil: add reflink=False to file copy functions to control cl
https://bugs.python.org/issue37157  opened by vstinner

#37159: Use copy_file_range() in shutil.copyfile() (server-side copy)
https://bugs.python.org/issue37159  opened by giampaolo.rodola

#37160: thread native id netbsd support
https://bugs.python.org/issue37160  opened by David Carlier

#37161: Pre-populate user editable text in input()
https://bugs.python.org/issue37161  opened by steven.daprano

#37163: dataclasses.replace() fails with the field named "obj"
https://bugs.python.org/issue37163  opened by serhiy.storchaka

#37166: inspect.findsource doesn't handle shortened files gracefully
https://bugs.python.org/issue37166  opened by thatch

#37168: Decimal divisions sometimes 10x or 100x too large
https://bugs.python.org/issue37168  opened by Phil Frost

#37172: Odd error awating a Future
https://bugs.python.org/issue37172  opened by Dima.Tisnek

#37173: inspect.getfile error names module instead of passed class
https://bugs.python.org/issue37173  opened by flying sheep

#37174: sched.py: run() is caught in delayfunc even if all events are 
https://bugs.python.org/issue37174  opened by ernestum

#37175: make install: make compileall optional
https://bugs.python.org/issue37175  opened by blueyed

#37176: super() docs don't say what super() does
https://bugs.python.org/issue37176  opened by jdemeyer

#37178: One argument form of math.perm()
https://bugs.python.org/issue37178  opened by rhettinger

#37179: asyncio loop.start_tls() provide support for TLS in TLS
https://bugs.python.org/issue37179  opened by cooperlees

#37181: fix test_regrtest failures on Windows arm64
https://bugs.python.org/issue37181  opened by Paul Monson

#37184: suggesting option to raise exception if process exits nonzero 
https://bugs.python.org/issue37184  opened by nlevitt

#37185: use  os.memfd_create in multiprocessing.shared_memory?

[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Barry Warsaw
On Jun 6, 2019, at 23:50, Wes Turner wrote:
> 
> Thanks for getting these upgraded. IMHO, being able to copy URLs from list 
> message footers as references in e.g. issues will be a great boost in 
> productivity.

Just FYI, these URLs are a “standard” we proposed many years ago, with 
discussions among list owners, third party archive maintainers, and developers.

This page gives additional details and background:

https://wiki.list.org/DEV/Stable%20URLs

(standard-in-quotes because no one’s ever proposed an official RFC.)

-Barry



___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7HRYW7JNOYCEPHCNXLQSVVWG4MRENA6/


[Python-Dev] Using vectorcall for tp_new and tp_init

2019-06-07 Thread Jeroen Demeyer

Hello,

I'm starting this thread to brainstorm for using vectorcall to speed up 
creating instances of Python classes.


Currently the following happens when creating an instance of a Python 
class X using X(...) and assuming that __new__ and __init__ are Python 
functions and that the metaclass of X is simply "type":


1. type_call (the tp_call wrapper for type) is invoked with arguments 
(X, args, kwargs).


2. type_call calls slot_tp_new with arguments (X, args, kwargs).

3. slot_tp_new calls X.__new__, prepending X to the args tuple. A new 
object obj is returned.


4. type_call calls slot_tp_init with arguments (obj, args, kwargs).

5. slot_tp_init calls type(obj).__init__ method, prepending obj to the 
args tuple. A new object obj is returned.


In the worst case, no less than 6 temporary objects are needed just to 
pass arguments around:


1. An args tuple and kwargs dict for tp_call

3. An args array with X prepended and a kwnames tuple for __new__

5. An args array with obj prepended and a kwnames tuple for __init__

This is clearly not as efficient as it could be.

An obvious solution would be to introduce variants of tp_new and tp_init 
using the vectorcall protocol. Assuming PY_VECTORCALL_ARGUMENTS_OFFSET 
is used, all 6 temporary allocations could be dropped. The 
implementation could be in the form of two new slots tp_vector_new and 
tp_vector_init. Since we're just dealing with type slots here (as 
opposed to offsets in an object structure), this should be easier to 
implement than PEP 590 itself.
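
For readers less familiar with the C side, the five steps listed earlier in this message behave roughly like the following pure-Python sketch. The name `type_call_sketch` is made up for illustration; the real logic lives in C, and the point of the proposal is the temporary tuples and dicts that the C calls build, which this sketch does not model.

```python
def type_call_sketch(X, *args, **kwargs):
    """Approximate what calling X(*args, **kwargs) does when type(X) is type."""
    # Steps 2-3: tp_new ends up calling X.__new__ with X prepended.
    obj = X.__new__(X, *args, **kwargs)
    # Steps 4-5: __init__ runs only if __new__ returned an instance of X.
    # Note that __init__ initializes obj in place and must return None.
    if isinstance(obj, X):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


class Point:
    def __new__(cls, x, y):
        return super().__new__(cls)

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = type_call_sketch(Point, 1, y=2)   # behaves like Point(1, y=2)
```

Each of the two inner calls receives the same arguments with one extra object prepended, which is where the argument repacking described above comes from.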



Jeroen.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CF5JGQ5DQBEWP4XLF4FAH66MNY2VRREG/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Wes Turner
Thanks for getting these upgraded. IMHO, being able to copy URLs from list
message footers as references in e.g. issues will be a great boost in
productivity.

On Friday, June 7, 2019, Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Barry Warsaw writes:
>  > On Jun 6, 2019, at 09:15, David Mertz wrote:
>  > >
>  > > The old URL is definitely a lot friendlier, even apart from the length.
>  >
>  > Unfortunately, the old URLs aren’t really permanent.
>
> True.  That could be addressed in theory, but it would be fragile (ie,
> vulnerable to loss or corruption of the external database mapping
> messages to URLs).  Calculating from the message itself means that if
> you have the message you can always get where you want to go.
>
>  > The new URLs are guaranteed to be reproducible from the original
>  > message source.  The downside is that they are less friendly.
>
> They could, however be made more friendly than they currently are.
> There's no reason (in principle, of course it requires changing code
> and the DNS) why your message, currently given the Archived-At URL
>
> https://mail.python.org/archives/list/python-dev@python.org/message/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
>
> couldn't be given (A is for Archives)
>
> https://a.python.org/python-dev@python.org/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
>
> which gets it down to an RFC-conformant 76 characters. ;-)  Of course
> many lists would overflow that, and I agree with David that
>
> https://a.python.org/python-dev@python.org/2019/06/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
>
> would be better still.  Although the risk of collision would be orders
> of magnitude higher (the date buys us some leeway but not much, we
> could make the ID-Hash be 2019/06/B5VZSRS3DDXZN6ET (arbitrarily chose
> middle 16), giving


> https://a.python.org/python-dev@python.org/2019/06/B5VZSRS3DDXZN6ET
>
> (67 characters, allowing a few more characters for domain names and/or
> list names -- note with the current scheme, a domain name which is 1
> character longer probably uses up two more characters of space).


Are these message IDs or hashes?
Do they have to be (is this) base-36?
Could they instead be base-62? (26+10+26)


>
> None of this is very attractive to me, for reasons I will go into on
> Mailman-Developers or gitlab.com/mailman/mailman/issues if you want to
> file one.  Briefly, people who want bit.ly-length short URLs won't be
> satisfied, and the proposed URLs are more useful but still ugly.


We shouldn't just drop extra date information from the URL and only lookup
by the messageid unless we add a redirect to the correct dated URL; because
caching and trickery.


>
> Personally I think we should all just switch to RestructuredText- and
> Markdown-capable MUAs, and kill off both ugly visible URLs and HTML
> email with one big ol' rock.


While I personally prefer .rst and .md, hovering over URL anchor text takes
unnecessary time (and I'll remember whether I've been to the actual
http://URL, but not 'here' and 'there').
So I'm fine with ridiculous, preposterous long links (even in the middle of
the email; without footnotes to scroll back and forth to)


>
> Steve
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/O3T27UUHKKXATOPJT4KEQHREUGYVMELV/
>
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/N62MYXZCUHPDTEQWWTH33BU2FIZ3ROPF/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Chris Angelico
On Fri, Jun 7, 2019 at 4:30 PM Stephen J. Turnbull wrote:
> They could, however be made more friendly than they currently are.
> There's no reason (in principle, of course it requires changing code
> and the DNS) why your message, currently given the Archived-At URL
>
> https://mail.python.org/archives/list/python-dev@python.org/message/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
>
> couldn't be given (A is for Archives)
>
> https://a.python.org/python-dev@python.org/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/
>
> which gets it down to an RFC-conformant 76 characters. ;-)

Can the list name be abbreviated to just "python-dev"? That'd give
some extra room to play with. It'd require that lists hosted on that
server be unique without their domain names; are there any known
collisions?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ATA5JGGJM6NDMGPP2PJ2QVMZXH6FWETY/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-06-07 Thread Neil Schemenauer
On 2019-06-06, Tim Peters wrote:
> The doubly linked lists in gc primarily support efficient
> _partitioning_ of objects for gc's purposes (a union of disjoint sets,
> with constant-time moving of an object from one set to another, and
> constant-time union of disjoint sets).  "All objects" is almost never
> interesting to it (it is only when the oldest non-frozen generation is
> being collected).

My current idea is to put partitioning flags on the interior radix
tree nodes.  If you mark an object as "finalizer reachable", for
example, it would mark all the nodes on the path from the root with
that flag.  Then, when you want to iterate over all the GC objects
with a flag, you can avoid uninteresting branches of the tree.

For generations, maybe tracking them at the pool level is good
enough.  Interior nodes can track generations too (i.e. the youngest
generation contained under them).
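
A toy Python sketch of the flag-propagation idea, to show how flagged iteration can skip whole subtrees. The `Node` class, the 8-bit chunking, the 24-bit "addresses", and the `FINALIZER_REACHABLE` constant are assumptions for illustration only; clearing flags again (which would need the ancestors to be recomputed) is not handled.

```python
FINALIZER_REACHABLE = 0x1   # hypothetical flag value
CHUNK_BITS = 8
LEVELS = 3                  # toy 24-bit "addresses"


class Node:
    __slots__ = ("flags", "children")

    def __init__(self):
        self.flags = 0       # union of the flags of everything below
        self.children = {}


def _path(addr):
    mask = (1 << CHUNK_BITS) - 1
    return [(addr >> (CHUNK_BITS * i)) & mask for i in reversed(range(LEVELS))]


def mark(root, addr, flag):
    """Insert addr and OR flag into every node from the root down to it."""
    node = root
    node.flags |= flag
    for key in _path(addr):
        node = node.children.setdefault(key, Node())
        node.flags |= flag


def iter_flagged(node, flag, prefix=0, depth=0):
    """Yield addresses carrying flag, skipping branches that lack it."""
    if not node.flags & flag:
        return                # nothing interesting below this node
    if depth == LEVELS:
        yield prefix
        return
    for key in sorted(node.children):
        child = node.children[key]
        yield from iter_flagged(child, flag, (prefix << CHUNK_BITS) | key, depth + 1)
```

Calling `mark(root, addr, 0)` simply registers an object without any flag, so unflagged regions of the tree cost one check at their topmost node during iteration.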

My gut feeling is that the prev/next pointer updates done by
move_unreachable() and similar functions must be quite expensive.
Doing the traversal with an explicit stack is a lot less elegant but
I think it should be faster.  At least, when you are dealing with a
big set of GC objects that don't fit in the CPU cache.

Regards,

  Neil
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/J422RWENKJAYHMXSZVRV5KGWSHNMAMJF/


[Python-Dev] Re: python-ideas and python-dev migrated to Mailman 3/HyperKitty

2019-06-07 Thread Stephen J. Turnbull
Barry Warsaw writes:
 > On Jun 6, 2019, at 09:15, David Mertz wrote:
 > > 
 > > The old URL is definitely a lot friendlier, even apart from the length.
 > 
 > Unfortunately, the old URLs aren’t really permanent.

True.  That could be addressed in theory, but it would be fragile (ie,
vulnerable to loss or corruption of the external database mapping
messages to URLs).  Calculating from the message itself means that if
you have the message you can always get where you want to go.

 > The new URLs are guaranteed to be reproducible from the original
 > message source.  The downside is that they are less friendly.

They could, however, be made more friendly than they currently are.
There's no reason (in principle, of course it requires changing code
and the DNS) why your message, currently given the Archived-At URL

https://mail.python.org/archives/list/python-dev@python.org/message/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/

couldn't be given (A is for Archives)

https://a.python.org/python-dev@python.org/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/

which gets it down to an RFC-conformant 76 characters. ;-)  Of course
many lists would overflow that, and I agree with David that

https://a.python.org/python-dev@python.org/2019/06/EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS/

would be better still.  Although the risk of collision would be orders
of magnitude higher (the date buys us some leeway, but not much), we
could make the ID-Hash be 2019/06/B5VZSRS3DDXZN6ET (arbitrarily chose
middle 16), giving

https://a.python.org/python-dev@python.org/2019/06/B5VZSRS3DDXZN6ET

(67 characters, allowing a few more characters for domain names and/or
list names -- note with the current scheme, a domain name which is 1
character longer probably uses up two more characters of space).
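
The character counts above can be checked with a few lines of Python.
This is only a sketch: a.python.org is the hypothetical host proposed in
this message, and `middle()` is a helper name invented here.

```python
FULL_HASH = "EFHTPGCSB5VZSRS3DDXZN6ETYP5H6NDS"
LIST = "python-dev@python.org"


def middle(text, n=16):
    """Return the middle n characters of text (hypothetical helper)."""
    start = (len(text) - n) // 2
    return text[start:start + n]


current = f"https://mail.python.org/archives/list/{LIST}/message/{FULL_HASH}/"
short = f"https://a.python.org/{LIST}/{FULL_HASH}/"
dated = f"https://a.python.org/{LIST}/2019/06/{FULL_HASH}/"
dated_short = f"https://a.python.org/{LIST}/2019/06/{middle(FULL_HASH)}"

for url in (current, short, dated, dated_short):
    print(len(url), url)

# Each base32 character carries 5 bits, so truncating from 32 to 16
# characters cuts the ID from 160 bits to 80 bits.
print(len(middle(FULL_HASH)) * 5, "bits in the shortened ID")
```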

None of this is very attractive to me, for reasons I will go into on
Mailman-Developers or gitlab.com/mailman/mailman/issues if you want to
file one.  Briefly, people who want bit.ly-length short URLs won't be
satisfied, and the proposed URLs are more useful but still ugly.

Personally I think we should all just switch to RestructuredText- and
Markdown-capable MUAs, and kill off both ugly visible URLs and HTML
email with one big ol' rock.

Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/O3T27UUHKKXATOPJT4KEQHREUGYVMELV/


[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

2019-06-07 Thread Neil Schemenauer
On 2019-06-06, Tim Peters wrote:
> Like now:  if the size were passed in, obmalloc could test the size
> instead of doing the `address_in_range()` dance(*).  But if it's ever
> possible that the size won't be passed in, all the machinery
> supporting `address_in_range()` still needs to be there, and every
> obmalloc spelling of malloc/realloc needs to ensure that machinery
> will work if the returned address is passed back to an obmalloc
> free/realloc spelling without the size.


We can almost make it work for GC objects, the use of obmalloc is
quite well encapsulated.  I think I intentionally designed the
PyObject_GC_New/PyObject_GC_Del/etc APIs that way.

Quick and dirty experiment is here:

https://github.com/nascheme/cpython/tree/gc_malloc_free_size

The major hitch seems to be my new gc_obj_size() function.  We can't be
sure the 'nbytes' passed to _PyObject_GC_Malloc() is the same as
what is computed by gc_obj_size().  It usually works but there are
exceptions (freelists for frame objects and tuple objects, for one).

A nasty problem is the weirdness with PyType_GenericAlloc() and the
sentinel item.  _PyObject_GC_NewVar() doesn't include space for the
sentinel but PyType_GenericAlloc() does.  When you get to
gc_obj_size(), you don't know if you should use "nitems" or "nitems+1".
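
A small Python model of the ambiguity.  It is loosely based on the
rounding done by the _PyObject_VAR_SIZE macro; the 8-byte pointer size
and the basicsize/itemsize numbers are assumptions picked for
illustration, and the two function names are invented here.

```python
POINTER_SIZE = 8  # assumption: 64-bit build


def var_size(basicsize, itemsize, nitems, align=POINTER_SIZE):
    """Size of a variable-size object, rounded up to pointer alignment."""
    raw = basicsize + nitems * itemsize
    return (raw + align - 1) & ~(align - 1)


def size_via_gc_newvar(basicsize, itemsize, nitems):
    # Models _PyObject_GC_NewVar(): no room for a sentinel item.
    return var_size(basicsize, itemsize, nitems)


def size_via_generic_alloc(basicsize, itemsize, nitems):
    # Models PyType_GenericAlloc(): one extra item for the sentinel.
    return var_size(basicsize, itemsize, nitems + 1)


# Hypothetical type: 24-byte header, 8-byte items, 3 items.
print(size_via_gc_newvar(24, 8, 3))      # size without the sentinel
print(size_via_generic_alloc(24, 8, 3))  # size with the sentinel
```

Given only the object (type and item count), a size function cannot tell
which of the two numbers was actually requested at allocation time.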

I'm not sure how to fix the sentinel issue.  Maybe a new type slot
or a type flag?  In any case, making a change like my git branch
above would almost certainly break extensions that don't play
nicely.  It won't be hard to make it a build option, like the
original gcmodule was.  Then, assuming there is a performance boost,
people can enable it if their extensions are friendly.


> The "only" problem with address_in_range is that it limits us to a
> maximum pool size of 4K.  Just for fun, I boosted that to 8K to see
> how likely segfaults really are, and a Python built that way couldn't
> even get to its first prompt before dying with an access violation
> (Windows-speak for segfault).

If we can make the above idea work, you could set the pool size to
8K without issue.  A possible problem is that the obmalloc and
gcmalloc arenas are separate.  I suppose that affects performance
testing.
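
Why the pool size stops mattering once the size is known can be sketched
as below.  This is a simplified model in Python, not obmalloc's code.  It
assumes a 512-byte small-object cutoff (the SMALL_REQUEST_THRESHOLD value
in obmalloc.c as far as I know) and ignores the case where obmalloc falls
back to the system allocator because it could not get an arena.

```python
SMALL_REQUEST_THRESHOLD = 512  # bytes; assumed small-object cutoff


def allocator_for(nbytes):
    """Decide which allocator owns a block, given only its size.

    No address is inspected, so nothing here depends on pool size or
    on reading a pool header near the address.
    """
    if 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return "obmalloc"
    return "system"


for size in (0, 1, 512, 513, 4096):
    print(size, allocator_for(size))
```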

> We could eliminate the pool size restriction in many ways.  For
> example, we could store the addresses obtained from the system
> malloc/realloc - but not yet freed - in a set, perhaps implemented as
> a radix tree to cut the memory burden.  But digging through 3 or 4
> levels of a radix tree to determine membership is probably
> significantly slower than address_in_range.

You are likely correct. I'm hoping to benchmark the radix tree idea.
I'm not too far from having it working such that it can replace
address_in_range().  Maybe allocating gc_refs as a block would
offset the radix tree cost vs address_in_range().  If the above idea
works, we know the object size at free() and realloc(), we don't
need address_in_range() for those code paths.
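
For reference, a toy version of the radix tree lookup being discussed,
in Python.  The layout is an assumption made for illustration: 48-bit
addresses, 4 KiB pages, and three levels of 12 bits each.  Which
addresses get marked (obmalloc's pages, or blocks from the system malloc)
is a design choice the thread leaves open; this sketch just stores one
flag per page.

```python
PAGE_BITS = 12             # assumption: 4 KiB pages
LEVEL_BITS = 12            # assumption: 12 bits consumed per level
LEVELS = 3                 # 3 * 12 + 12 = 48 address bits covered
FANOUT = 1 << LEVEL_BITS
MASK = FANOUT - 1


class PageRadixTree:
    """Maps a page (by address) to a single flag."""

    def __init__(self):
        self.root = [None] * FANOUT

    def _indexes(self, addr):
        # Bits above 48 are ignored, so such addresses would alias.
        page = addr >> PAGE_BITS
        return [(page >> (LEVEL_BITS * (LEVELS - 1 - i))) & MASK
                for i in range(LEVELS)]

    def mark(self, addr, flag=True):
        node = self.root
        *inner, last = self._indexes(addr)
        for i in inner:
            if node[i] is None:
                node[i] = [None] * FANOUT  # interior nodes made lazily
            node = node[i]
        node[last] = flag

    def is_marked(self, addr):
        node = self.root
        *inner, last = self._indexes(addr)
        for i in inner:      # one dependent memory read per level
            node = node[i]
            if node is None:
                return False
        return bool(node[last])


tree = PageRadixTree()
tree.mark(0x7F1234567000)
print(tree.is_marked(0x7F1234567ABC))  # same page as the marked address
print(tree.is_marked(0x7F1234568000))  # next page, never marked
```

Each lookup walks all three levels, which is the cost being weighed
against address_in_range() above.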

Regards,

  Neil
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ILFK2MTCVA7GB7JGBVSUWASKJ7T4LLJE/