Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Paul Moore
On 12 December 2012 00:58, Nick Coghlan ncogh...@gmail.com wrote:
 I'd prefer a more aggressive name for this like tzdata_override. My
 rationale is that *nix users need to be thoroughly aware that if they install
 this package, they will stop benefiting from the automatic tz database
 updates provided by their OS (especially if they install it into the system
 site packages on a distro that has migrated to Python 3 for system tools).

 Such a name would also make it possible to provide *two* packaged databases,
 one checked before the OS data (tzdata_override), and one shipped with
 Python itself that is used only if the OS doesn't provide the timezone
 database (tzdata_fallback). tzdata_fallback would then be updated to the
 latest Olson database for each maintenance release. Cross-platform
 applications that wanted more reliably up to date timezone data could then
 conditionally depend on tzdata_override for Windows deployments (using the
 environment marker support in metadata 1.2+).

That sounds sensible, EIBTI and all that. It is a lot simpler than
shipping the package and some sort of auto-updater, too.

Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Guido, Dropbox, and Python

2012-12-12 Thread Chris Jerdonek
On Dec 10, 2012, at 1:52 PM, Terry Reedy tjre...@udel.edu wrote:

 My question, Guido, is how this will affect Python development, and in 
 particular, your work on async. If not proprietary info, does or will Dropbox 
 use Python3?

I talked to some Dropbox people tonight, and they said they use 2.7
for the client and 2.5 for the server.  It is a project for them to
switch the server to using 2.7.

--Chris

Sent from my iPhone


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Christian Heimes
Am 12.12.2012 01:58, schrieb Nick Coghlan:
 Ick, why a new module? Why not just add this directly to datetime? (It
 doesn't need to be provided by the C accelerator, it can go straight in
 the pure Python part).

+1 for something like datetime.timezone

How well does hg handle file renames? The datetime module could be
converted to a package.

 I'd prefer a more aggressive name for this like tzdata_override. My
 rationale is that *nix users need to be thoroughly aware that if they
 install this package, they will stop benefiting from the automatic tz
 database updates provided by their OS (especially if they install it
 into the system site packages on a distro that has migrated to Python 3
 for system tools).

+1, too.




Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-12 Thread Petri Lehtinen
Brandon W Maister wrote:
 (defconst git-tools-grep-command
   "git ls-files -z | xargs -0 grep -In %s"
   "The command used for grepping files using git. See `git-tools-grep'.")

What's wrong with git grep?


Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-12 Thread Ross Lagerwall
On Wed, Dec 12, 2012 at 01:27:21PM +0200, Petri Lehtinen wrote:
 Brandon W Maister wrote:
  (defconst git-tools-grep-command
    "git ls-files -z | xargs -0 grep -In %s"
    "The command used for grepping files using git. See `git-tools-grep'.")
 
 What's wrong with git grep?

Or hg grep, for that matter?

-- 
Ross Lagerwall


Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-12 Thread Xavier Morel
On 2012-12-12, at 15:12 , Ross Lagerwall wrote:

 On Wed, Dec 12, 2012 at 01:27:21PM +0200, Petri Lehtinen wrote:
 Brandon W Maister wrote:
 (defconst git-tools-grep-command
   "git ls-files -z | xargs -0 grep -In %s"
   "The command used for grepping files using git. See `git-tools-grep'.")
 
 What's wrong with git grep?
 
 Or hg grep, for that matter?

hg grep searches the history, not the working copy. *-tools-grep only
searches the working copy but automatically filters files to only search
in files under version control.

Which as far as I know is indeed what git-grep does already.


Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-12 Thread Petri Lehtinen
Ross Lagerwall wrote:
 On Wed, Dec 12, 2012 at 01:27:21PM +0200, Petri Lehtinen wrote:
  Brandon W Maister wrote:
   (defconst git-tools-grep-command
     "git ls-files -z | xargs -0 grep -In %s"
     "The command used for grepping files using git. See `git-tools-grep'.")
  
  What's wrong with git grep?
 
 Or hg grep, for that matter?

hg grep searches in the repository history, so it's not good for this.


Re: [Python-Dev] Emacs users: hg-tools-grep

2012-12-12 Thread Brandon W Maister
Yes indeed-- in my eagerness to make my first post to python-dev be
well-received I completely forgot about git grep.

brandon


On Wed, Dec 12, 2012 at 9:20 AM, Xavier Morel python-...@masklinn.net wrote:

 On 2012-12-12, at 15:12 , Ross Lagerwall wrote:

  On Wed, Dec 12, 2012 at 01:27:21PM +0200, Petri Lehtinen wrote:
  Brandon W Maister wrote:
  (defconst git-tools-grep-command
    "git ls-files -z | xargs -0 grep -In %s"
    "The command used for grepping files using git. See `git-tools-grep'.")
 
  What's wrong with git grep?
 
  Or hg grep, for that matter?

 hg grep searches the history, not the working copy. *-tools-grep only
 searches the working copy but automatically filters files to only search
 in files under version control.

 Which as far as I know is indeed what git-grep does already.


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Lennart Regebro
General comments:


It seems like the consensus is moving towards making sure there always is a
database available. If this means including it in the standard Python
distribution as well, or only on Windows, I don't know, opinions on that are
welcome.

The steps to look for a database would then change to:

  1. The path specified, if not None.

  2. The module for timezone overrides.

  3. The OS database.

  4. The database included in Python.

We need to determine if a warning should be raised in case of 4 or not, as
well as the name for the override module. I think the word override here is
possibly unclear, I'd prefer something like timezone-update or similar.
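
The four-step search order above can be sketched roughly as follows. This is only an illustration: the function name `find_tz_database`, its parameters, the `/usr/share/zoneinfo` default and the use of `LookupError` are assumptions, not part of the draft PEP. Whether step 4 should warn is still an open question in this thread, so it is a flag here.

```python
import os
import warnings


def find_tz_database(db_path=None, override_dir=None,
                     os_dirs=("/usr/share/zoneinfo",), bundled_dir=None,
                     warn_on_bundled=True):
    """Return (source, path) for the first database found, in the order
    discussed in the thread. All names here are hypothetical."""
    # 1. The path specified, if not None.
    if db_path is not None:
        return "explicit", db_path
    # 2. The override module's data, then 3. the OS database.
    candidates = [("override", override_dir)]
    candidates += [("os", d) for d in os_dirs]
    for source, path in candidates:
        if path is not None and os.path.isdir(path):
            return source, path
    # 4. The database included with Python.
    if bundled_dir is not None and os.path.isdir(bundled_dir):
        if warn_on_bundled:
            warnings.warn("using the bundled time zone database, "
                          "which may be out of date")
        return "bundled", bundled_dir
    raise LookupError("no time zone database found")
```

Returning the source label alongside the path would also give callers a way to ask which database is in use.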

I'm personally a bit sceptical about writing a special updater/installer just
for this. I don't want to have a special unique way to install this package.

As for OS packages, Christian Heimes pointed out that most Windows
installations today have Java installed, and kept updated, and it has a
zoneinfo database. We could consider using that on Windows as well, although
it admittedly feels quite icky.

I haven't been able to find any other common locations for the
zoneinfo database on Windows.



Specific answers:

On Tue, Dec 11, 2012 at 4:39 PM, Dirkjan Ochtman dirk...@ochtman.nl wrote:

 I wonder if there needs to be something here about how to port from
 pytz to the new timezone library.

It would be nice to have, but I don't think it's necessary to have in the PEP.

 It seems like calling get_timezone() with an unknown timezone should
 just throw ValueError, not necessarily some custom Exception?

That could very well be. What are others' opinions on this?

 Why not keep a bit more of the pytz API to make porting easy?

The renaming of the timezone() function to get_timezone() is indeed small,
but changing pytz.timezone(foo) to timezone.timezone(foo) is really
significantly easier than renaming it to timezone.get_timezone(foo).

If we keep all of the API intact you could do

    try:
        import pytz as timezone
    except ImportError:
        import timezone

Which would make porting quicker, that's true, but do we really want to keep
unnecessary APIs around forever? Isn't it better to minimize the noise from
the start?

 It also seems relatively painless to keep localize() and normalize()
 functions around for easy porting.

Sure, but you then have two ways of doing the same thing, which I think we
should avoid.


On Tue, Dec 11, 2012 at 5:07 PM, Antoine Pitrou solip...@pitrou.net wrote:

 The ``is_dst`` parameter can be ``True`` (default), ``False``, or
 ``None``.

 Why is it True by default? Do we have statistics showing that Python
 gets more use in summer?

Because for some reason both Stuart Bishop and I thought it should be, but
at least in my case I don't have any actual good reason why. Checking
how pytz does this shows that pytz in fact defaults to False, so I think
the default should be False.
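
To make the three `is_dst` values concrete, here is a minimal sketch for a wall-clock time that occurs twice at the end of DST. The helper `pick_utc_offset` and the exception `AmbiguousTimeError` are hypothetical names. Treating `None` as "refuse to guess" is an assumption modelled on how pytz's `localize()` behaves; the thread itself only lists the allowed values.

```python
from datetime import timedelta


class AmbiguousTimeError(ValueError):
    """Hypothetical error for a wall-clock time that occurs twice."""


def pick_utc_offset(std_offset, dst_offset, is_dst=False):
    """Pick the UTC offset for a local time valid in both DST and standard time.

    is_dst=True  -> take the DST reading
    is_dst=False -> take the standard-time reading (the default pytz uses)
    is_dst=None  -> refuse to guess (assumed behaviour, modelled on pytz)
    """
    if is_dst is None:
        raise AmbiguousTimeError("local time is ambiguous; pass is_dst=True or False")
    return dst_offset if is_dst else std_offset


# Europe/Stockholm, 2012-10-28 02:30 happens twice: once in CEST, once in CET.
CET, CEST = timedelta(hours=1), timedelta(hours=2)
```

With `False` as the default, `pick_utc_offset(CET, CEST)` resolves the ambiguity to standard time without the caller having to think about it.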


On Wed, Dec 12, 2012 at 3:50 AM, Barry Warsaw ba...@python.org wrote:

 This is likely the hardest part of this PEP as this involves updating the

 Oops, something got cut off there.

Ah, yes, I was going to write that the difficult bit was updating the
_datetime.c module.

 Why add a new module instead of putting all this into the existing datetime
 module, either directly or as a submodule? Seems like the obvious place to
 put it instead of claiming another top-level module name.

pytz as it is consists of several modules, and a significant amount of code,
it didn't feel right to move all that into the datetime.py module. It also
didn't feel right to then not implement it in _datetime.c, but perhaps that's
just me being silly.

But a submodule could work.

 I'm bikeshedding, but can we find a better name than `db` for the second
 argument?  Something that makes it obvious we're looking for file system path?

Absolutely. db_path?

 I'd really like to see a TimeZoneError base class from which all these new
 exceptions inherit.

That makes sense.
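
One way to reconcile this with the earlier ValueError suggestion is a small hierarchy in which the unknown-zone error inherits from both. This is only a sketch: `TimeZoneError` and `get_timezone()` are names from the thread, while `UnknownTimeZoneError`, the stand-in `_KNOWN_ZONES` set and the return value are assumptions made for illustration.

```python
class TimeZoneError(Exception):
    """Base class for all time zone errors (name taken from the thread)."""


class UnknownTimeZoneError(TimeZoneError, ValueError):
    """Hypothetical: raised when the zone name is not in the database."""


# Stand-in for a real database lookup.
_KNOWN_ZONES = {"UTC", "Europe/Stockholm"}


def get_timezone(name):
    """Return the zone for `name` (here just the name itself)."""
    if name not in _KNOWN_ZONES:
        raise UnknownTimeZoneError(name)
    return name
```

Callers can then catch either `ValueError` or `TimeZoneError`, depending on how specific they want to be.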

The ``timezonedata``-package
----------------------------

 Just to be clear, this doesn't expose any new modules, right?

That's the intention, yes, although I haven't investigated ways of knowing if
it's installed or not yet, and exposing a module is the obvious way of doing
that. But I'm hoping there will be better ways, right?

 One other thing that the PEP should describe is what happens on a distro that
 has timezone data, but which you also pip install the PyPI tzdata package.
 Which one wins?  Is there a way to control it, other than providing an
 explicit path?  Is there a way to uninstall the PyPI package?  Does the API
 need to provide a method which tells you where the database it is using by
 default lives?

The PyPI package wins, I'll clarify that bit. I think the data should end
up in site-packages somewhere, and that it should be installable and
uninstallable with pip/easy_install and by simply deleting it.


On Wed, Dec 12, 2012 at 4:14 AM, Nick Coghlan 

Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Brian Curtin
On Wed, Dec 12, 2012 at 9:56 AM, Lennart Regebro rege...@gmail.com wrote:
 General comments:


 It seems like the consensus is moving towards making sure there always is a
 database available. If this means including it in the standard Python
 distribution as well, or only on Windows, I don't know, opinions on that are
 welcome.

 The steps to look for a database would then change to:

   1. The path specified, if not None.

   2. The module for timezone overrides.

   3. The OS database.

   4. The database included in Python.

 We need to determine if a warning should be raised in case of 4 or not, as
 well as the name for the override module. I think the word override here is
 possibly unclear, I'd prefer something like timezone-update or similar.

 I'm personally a bit sceptical about writing a special updater/installer just
 for this. I don't want to have a special unique way to install this package.

 As for OS packages, Christian Heimes pointed out that most Windows
 installations today have Java installed, and kept updated, and it has a
 zoneinfo database. We could consider using that on Windows as well, although
 it admittedly feels quite icky.

Depending on Java being installed or even installing it alongside
Python would be a funny April Fools prank. This can't happen.

I don't think it's all that bad to include a small script on Windows
which runs every few days to check PyPI, then present an option to
update the info. This is what Java itself is doing anyway.


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Dirkjan Ochtman
On Wed, Dec 12, 2012 at 4:56 PM, Lennart Regebro rege...@gmail.com wrote:
 Why not keep a bit more of the pytz API to make porting easy?

 The renaming of the timezone() function to get_timezone() is indeed small,
 but changing pytz.timezone(foo) to timezone.timezone(foo) is really
 significantly easier than renaming it to timezone.get_timezone(foo).

 If we keep all of the API intact you could do

     try:
         import pytz as timezone
     except ImportError:
         import timezone

 Which would make porting quicker, that's true, but do we really want to keep
 unnecessary APIs around forever? Isn't it better to minimize the noise from
 the start?

That entirely depends on when you define "the start" to be. It seems
to me the consensus on python-dev has been that packages primarily
evolve outside the stdlib; it seems a bit weird to then, at the time
of stdlib inclusion, start changing the API.

 Why is it True by default? Do we have statistics showing that Python
 gets more use in summer?

 Because for some reason both Stuart Bishop and I thought it should be, but
 at least in my case I don't have any actual good reason why. Checking
 how pytz does this shows that pytz in fact defaults to False, so I think
 the default should be False.

Here, too, I think that sticking with pytz's default would be a good idea.

Cheers,

Dirkjan


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Paul Moore
On 12 December 2012 16:11, Brian Curtin br...@python.org wrote:
 I don't think it's all that bad to include a small script on Windows
 which runs every few days to check PyPI, then present an option to
 update the info. This is what Java itself is doing anyway.

What would that do in an environment without internet access? Or with
a firewall blocking Python's requests and returning an error page
without warning (so the updater just sees incorrect data)? What about
corporate environments that want to control the rollout of updates? (I
can't imagine that in practice, but certainly companies do it for
Java). Most Windows updaters use the official Windows APIs so that
they work properly with odd cases like ISA proxies taking credentials
from the Windows user login. Python's stdlib doesn't support that type
of thing.

I'm -1 on auto-updating because it's too easy to produce a nearly
right solution that doesn't work in highly-controlled (e.g.,
corporate) environments. And a correct solution would be hard to
support with python-dev's level of Windows expertise.

Paul.


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Antoine Pitrou
Le Wed, 12 Dec 2012 10:11:15 -0600,
Brian Curtin br...@python.org a écrit :
 
 I don't think it's all that bad to include a small script on Windows
 which runs every few days to check PyPI, then present an option to
 update the info. This is what Java itself is doing anyway.

I don't get why people are so obsessed about updating the timezone
database. Really, this is not worse than having a vulnerable OpenSSL
linked with your Python executable. Purity does not bring any
advantage here.

Regards

Antoine.




Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Guido van Rossum
On Wed, Dec 12, 2012 at 8:44 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Le Wed, 12 Dec 2012 10:11:15 -0600,
 Brian Curtin br...@python.org a écrit :

 I don't think it's all that bad to include a small script on Windows
 which runs every few days to check PyPI, then present an option to
 update the info. This is what Java itself is doing anyway.

 I don't get why people are so obsessed about updating the timezone
 database. Really, this is not worse than having a vulnerable OpenSSL
 linked with your Python executable. Purity does not bring any
 advantage here.

Bingo. As long as the recipe to update is clear, most users can ignore
this, because the countries about which they care don't change DST
rules often enough for it to matter. When it does matter, they'll know
(changing the DST rules is something that local news sources tend to
track :-) and they can update their software when stuff they use
starts getting the time wrong. Obviously sysadmins responsible for
large numbers of users can make this into a routine, and ditto people
who run services. But these folks are professionals and are good at
automating tasks like this.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Steve Dower
Paul Moore wrote:
 On 12 December 2012 16:11, Brian Curtin br...@python.org wrote:
  I don't think it's all that bad to include a small script on Windows
  which runs every few days to check PyPI, then present an option to
  update the info. This is what Java itself is doing anyway.
 
 What would that do in an environment without internet access? Or with a
 firewall blocking Python's requests and returning an error page without
 warning (so the updater just sees incorrect data)? What about corporate
 environments that want to control the rollout of updates? (I can't imagine
 that in practice, but certainly companies do it for Java). Most Windows
 updaters use the official Windows APIs so that they work properly with
 odd cases like ISA proxies taking credentials from the Windows user login.
 Python's stdlib doesn't support that type of thing.
 
 I'm -1 on auto-updating because it's too easy to produce a nearly right
 solution that doesn't work in highly-controlled (e.g.,
 corporate) environments. And a correct solution would be hard to support
 with python-dev's level of Windows expertise.

And what about embedded installations of Python, such as in TortoiseHg? And all 
the people (such as myself) who disable updaters that they don't like or didn't 
expect?

The correct solution on Windows may be to use a static database for 
historical dates and the information in the registry for current and future 
dates. The registry is updated through Windows Update, which is at least as 
reliable as anything Python could do. (I'm not sure exactly what the state of 
updates to older versions is like, but I'd assume WinXP still gets timezone 
updates and Win2K doesn't.)

Details of the registry entries are at 
http://msdn.microsoft.com/en-us/library/ms725481.aspx. It looks like the data 
is focused on modern timezones rather than localities, which would mean a 
many-to-one mapping from zoneinfo. Unfortunately it doesn't look like there's 
enough overlap to allow an automated mapping.

That said, it is incredibly easy to convert between UTC and local 
(http://msdn.microsoft.com/en-us/library/ms724949.aspx), even for dates in the 
past or future when the information is available. It's just that timezones 
other than the user's preference are difficult.

Cheers,
Steve


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Éric Araujo
Hi,

Le 12/12/2012 04:53, Christian Heimes a écrit :
 Am 12.12.2012 01:58, schrieb Nick Coghlan:
 Ick, why a new module? Why not just add this directly to datetime? (It
 doesn't need to be provided by the C accelerator, it can go straight in
 the pure Python part).
 
 +1 for something like datetime.timezone
 
 How well does hg handle file renames? The datetime module could be
 converted to a package.

Quite well.  It’s easy to rename datetime.py to datetime/__init__.py, and
subsequent fixes in 3.3’s datetime.py will be merged into
datetime/__init__.py by Mercurial’s merge subsystem.

Cheers


Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread Dag Sverre Seljebotn

On 12/12/2012 01:15 AM, Nick Coghlan wrote:

 On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland di...@microsoft.com
 mailto:di...@microsoft.com wrote:

  OTOH changing certain dictionaries in IronPython (such as keyword
  args) to be ordered would certainly be possible.  Personally I just
  wouldn't want to see it be the default as that seems like unnecessary
  overhead when the specialized class exists.

 Which reminds me, I was going to note that one of the main gains with
 ordered keyword arguments, is their use in the construction of
 string-keyed objects where you want to be able to control the order of
 iteration (e.g. for serialisation or display purposes). Currently you
 have to go the path of something like namedtuple where you define the
 order of iteration in one operation, and set the values in another.


So here's a brand new argument against ordered dicts: The existence of 
perfect hashing schemes. They fundamentally conflict with ordered dicts.


I played with using them for vtable dispatches in Cython this summer, 
and they can perform really, really well for branch-predicted lookups in 
hot loops, because you always/nearly always eliminate linear probing and 
so there's no branch misses or extra comparisons. (The overhead of a 
perfect hash table lookup over a traditional vtable lookup was only a 
couple of cycles in my highly artificial fully branch-predicted 
micro-benchmark.)


There's some overhead in setup; IIRC, ~20 microseconds for 64 elements, 
2 GHz CPU, though that was a first prototype implementation and both 
algorithmic improvements and tuning should be possible.


So it's not useful for everything, but perhaps for things like module 
dictionaries and classes an optionally perfect dict can make sense.


Note: I'm NOT suggesting the use of perfect hashing, just making sure 
its existence is mentioned and that one is aware that if always-ordered
dicts become the language standard it precludes this option far off in 
the future.


(Something like a sort() method could still work and make the dict 
unperfect; one could also have a pack() method that made the dict 
perfect again.).


That concludes the on-topic parts of my post.

--
Dag Sverre Seljebotn

APPENDIX

Going off-topic for those who are interested, here are the long-winded and
ugly details. My code [1] is based on the paper [2] (pseudo-code in
Appendix A), but I adapted it a bit to be useful for tens/hundreds of
elements rather than billions.


The ingredients:

 1) You need the hash to be 32 bits (or 64) of good entropy (md5 or 
murmurhash or similar). (Yes, that's a tall order for CPython, I'm just 
describing the scheme.) (If the hash collides on all bits you *will* 
collide, so some fallback is still necessary, just unlikely.)


 2) To look up, the idea is (pseudo-code!)

    typedef struct {
        int m_f, m_g, r, k;
        int16_t d[k]; /* small int, like current proposal */
    } table_header_t;

And then one computes the index of an element with hash h using the function

    ((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]

rather than the usual h % n. While more arithmetic, arithmetic is 
cheap and branch misses are not.
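
For readers who prefer something executable, the index computation can be mirrored in Python. This is a sketch under assumptions: `TableHeader` and `slot_index` are illustrative names, and `m_f` and `m_g` are taken to be bit masks (table size minus one and bin count minus one, both sizes powers of two), which is inferred from how they are used rather than stated in the post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableHeader:
    m_f: int      # mask for the final slot (number of slots - 1), assumed
    m_g: int      # mask selecting the bin (number of bins - 1), assumed
    r: int        # right shift applied to the hash
    d: List[int]  # per-bin displacement values, one per bin


def slot_index(tab: TableHeader, h: int) -> int:
    """Branch-free slot computation: shift, mask, then XOR with the
    displacement stored for the hash's bin."""
    return ((h >> tab.r) & tab.m_f) ^ tab.d[h & tab.m_g]


tab = TableHeader(m_f=7, m_g=3, r=2, d=[0, 5, 1, 0])
slot = slot_index(tab, 45)   # (45 >> 2) & 7 == 3, bin 45 & 3 == 1, 3 ^ 5 == 6
```

There is no loop and no comparison in `slot_index`, which is the property the post attributes to the scheme: a lookup never has to probe.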


 3) To set up/repack a table one needs to find the parameters. The 
general idea is:


 a) Partition the hashes into k bins by using h & m_g. There will be
collisions, but the number of bins with many collisions will be very
small; most bins will have 2 or 1 or 0 elements.


 b) Starting with the largest bin, distribute the elements according to
the hash function. If a bin collides with the existing contents, try 
another value for d[binindex] until it doesn't.


The r parameter lets you try again 32 (or 64) times to find a solution.
In my test cases there was a ~0.1% chance of not finding a solution (that
is, exhausting the possible choices of r) with 64-bit hashes with 4 or 8
elements and no empty table elements. For any other number of elements,
or with some empty elements, the chance of failure was much lower.
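
Steps a) and b), together with the retry over r, can be sketched as below. This is a simplified illustration, not the code from [1]: `build_table` is a hypothetical name, the hashes are assumed distinct, both table sizes are assumed to be powers of two, and displacements are tried in plain increasing order.

```python
def build_table(hashes, n_slots, n_bins, hash_bits=32):
    """Search for (r, d) that map every hash to its own slot.

    Returns None if every choice of r is exhausted."""
    m_f, m_g = n_slots - 1, n_bins - 1
    # a) Partition the hashes into bins using h & m_g.
    bins = [[] for _ in range(n_bins)]
    for h in hashes:
        bins[h & m_g].append(h)
    # b) Place the largest bins first.
    order = sorted(range(n_bins), key=lambda b: len(bins[b]), reverse=True)
    for r in range(hash_bits):
        d = [0] * n_bins
        used = [False] * n_slots
        ok = True
        for b in order:
            if not bins[b]:
                break                      # only empty bins remain
            for disp in range(n_slots):
                slots = [((h >> r) & m_f) ^ disp for h in bins[b]]
                if len(set(slots)) == len(slots) and not any(used[s] for s in slots):
                    for s in slots:
                        used[s] = True
                    d[b] = disp
                    break
            else:
                ok = False                 # no displacement fits: try next r
                break
        if ok:
            return r, d
    return None
```

For example, the hashes 0x10, 0x20, 0x30 and 0x40 all land in bin 0 and agree on their low bits, so shifts 0 to 3 fail and the search only succeeds at r = 4.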



[1] It's not exactly a great demo, but it contains the algorithm. If 
there's much interest I should clean it up and make a proper benchmark 
demo out of it:


https://github.com/dagss/pyextensibletype/blob/perfecthash/include/perfecthash.h


[2] Pagh (1999)

http://www.brics.dk/RS/99/13/BRICS-RS-99-13.ps.gz






Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread PJ Eby
On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 12/12/2012 01:15 AM, Nick Coghlan wrote:

 On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland di...@microsoft.com
 mailto:di...@microsoft.com wrote:

 OTOH changing certain dictionaries in IronPython (such as keyword
 args) to be
 ordered would certainly be possible.  Personally I just wouldn't
 want to see it
 be the default as that seems like unnecessary overhead when the
 specialized
 class exists.


 Which reminds me, I was going to note that one of the main gains with
 ordered keyword arguments, is their use in the construction of
 string-keyed objects where you want to be able to control the order of
 iteration (e.g. for serialisation or display purposes). Currently you
 have to go the path of something like namedtuple where you define the
 order of iteration in one operation, and set the values in another.


 So here's a brand new argument against ordered dicts: The existence of
 perfect hashing schemes. They fundamentally conflict with ordered dicts.

If I understand your explanation, then they don't conflict with the
type of ordering described in this thread.  Raymond's optimization
separates the hash table part from the contents part of a
dictionary, and there is no requirement that these two parts be the
same size or in the same order.

Indeed, Raymond's split design lets you re-parameterize the hashing
all you want, without perturbing the iteration order at all.  You
would in fact be able to take a dictionary at any moment, and say,
optimize the 'hash table' part to a non-colliding state based on the
current contents, without touching the 'contents' part at all.

(One could do this at class creation time on a class dictionary, and
just after importing on a module dictionary, for example.)
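
A toy model of the split layout may make this concrete. It is a sketch only, not Raymond's actual proposal or CPython's implementation: the class name `SplitDict`, the linear probing and the resize policy are all assumptions, and deletion is left out. The point it shows is that `rebuild_index()` can re-hash as often as it likes while the entries list, and therefore the iteration order, stays untouched.

```python
class SplitDict:
    """Toy dict: a sparse index table pointing into a dense entries list."""

    def __init__(self):
        self.entries = []          # (hash, key, value), in insertion order
        self.index = [None] * 8    # slot -> position in self.entries

    def _find_slot(self, key, h):
        """Return the slot holding key, or the first free slot on its path."""
        mask = len(self.index) - 1
        slot = h & mask
        while True:
            pos = self.index[slot]
            if pos is None or self.entries[pos][1] == key:
                return slot
            slot = (slot + 1) & mask   # linear probing

    def __setitem__(self, key, value):
        h = hash(key)
        slot = self._find_slot(key, h)
        pos = self.index[slot]
        if pos is not None:
            self.entries[pos] = (h, key, value)   # overwrite keeps position
            return
        self.index[slot] = len(self.entries)
        self.entries.append((h, key, value))
        if len(self.entries) * 3 >= len(self.index) * 2:
            self.rebuild_index(len(self.index) * 2)

    def __getitem__(self, key):
        pos = self.index[self._find_slot(key, hash(key))]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        return [key for _h, key, _value in self.entries]

    def rebuild_index(self, size):
        """Re-hash into a fresh index of `size` slots (a power of two larger
        than the number of entries). self.entries is not touched."""
        self.index = [None] * size
        mask = size - 1
        for pos, (h, _key, _value) in enumerate(self.entries):
            slot = h & mask
            while self.index[slot] is not None:
                slot = (slot + 1) & mask
            self.index[slot] = pos
```

In this model a perfect-hash construction would simply be another implementation of `rebuild_index()`.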


Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread Dag Sverre Seljebotn

On 12/12/2012 10:31 PM, PJ Eby wrote:

 On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no wrote:

  On 12/12/2012 01:15 AM, Nick Coghlan wrote:

   On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland di...@microsoft.com
   mailto:di...@microsoft.com wrote:

    OTOH changing certain dictionaries in IronPython (such as keyword
    args) to be ordered would certainly be possible.  Personally I just
    wouldn't want to see it be the default as that seems like unnecessary
    overhead when the specialized class exists.

   Which reminds me, I was going to note that one of the main gains with
   ordered keyword arguments, is their use in the construction of
   string-keyed objects where you want to be able to control the order of
   iteration (e.g. for serialisation or display purposes). Currently you
   have to go the path of something like namedtuple where you define the
   order of iteration in one operation, and set the values in another.

  So here's a brand new argument against ordered dicts: The existence of
  perfect hashing schemes. They fundamentally conflict with ordered dicts.

 If I understand your explanation, then they don't conflict with the
 type of ordering described in this thread.  Raymond's optimization
 separates the hash table part from the contents part of a
 dictionary, and there is no requirement that these two parts be the
 same size or in the same order.


I don't fully agree.

Perfect hashing already separates hash table from contents (sort 
of), and saves the memory in much the same way.


Yes, you can repeat the trick and have 2 levels of indirection, but that 
then requires an additional table of small ints which is pure overhead 
present for the sorting; in short, it's no longer an optimization but 
just overhead for the sortability.


Dag Sverre



 Indeed, Raymond's split design lets you re-parameterize the hashing
 all you want, without perturbing the iteration order at all.  You
 would in fact be able to take a dictionary at any moment, and say,
 optimize the 'hash table' part to a non-colliding state based on the
 current contents, without touching the 'contents' part at all.

 (One could do this at class creation time on a class dictionary, and
 just after importing on a module dictionary, for example.)





Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread Dag Sverre Seljebotn

On 12/12/2012 11:06 PM, Dag Sverre Seljebotn wrote:

 On 12/12/2012 10:31 PM, PJ Eby wrote:

  On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
  d.s.seljeb...@astro.uio.no wrote:

   On 12/12/2012 01:15 AM, Nick Coghlan wrote:

    On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland di...@microsoft.com
    mailto:di...@microsoft.com wrote:

     OTOH changing certain dictionaries in IronPython (such as keyword
     args) to be ordered would certainly be possible.  Personally I just
     wouldn't want to see it be the default as that seems like unnecessary
     overhead when the specialized class exists.

    Which reminds me, I was going to note that one of the main gains with
    ordered keyword arguments, is their use in the construction of
    string-keyed objects where you want to be able to control the order of
    iteration (e.g. for serialisation or display purposes). Currently you
    have to go the path of something like namedtuple where you define the
    order of iteration in one operation, and set the values in another.

   So here's a brand new argument against ordered dicts: The existence of
   perfect hashing schemes. They fundamentally conflict with ordered dicts.

  If I understand your explanation, then they don't conflict with the
  type of ordering described in this thread.  Raymond's optimization
  separates the hash table part from the contents part of a
  dictionary, and there is no requirement that these two parts be the
  same size or in the same order.

 I don't fully agree.

 Perfect hashing already separates hash table from contents (sort
 of), and saves the memory in much the same way.


This was a bit inaccurate, but the point is: The perfect hash function 
doesn't need any fill-in to avoid collisions, you can (except in 
exceptional circumstances) fill the table 100%, so the memory is already 
saved.


Dag Sverre




Yes, you can repeat the trick and have 2 levels of indirection, but that
then requires an additional table of small ints which is pure overhead
present for the sorting; in short, it's no longer an optimization but
just overhead for the sortability.

Dag Sverre



Indeed, Raymond's split design lets you re-parameterize the hashing
all you want, without perturbing the iteration order at all.  You
would in fact be able to take a dictionary at any moment, and say,
optimize the 'hash table' part to a non-colliding state based on the
current contents, without touching the 'contents' part at all.

(One could do this at class creation time on a class dictionary, and
just after importing on a module dictionary, for example.)







Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Greg Ewing

On Tue, Dec 11, 2012 at 8:07 AM, Antoine Pitrou solip...@pitrou.net wrote:


Do we have statistics showing that Python
gets more use in summer?


Well, pythons are cold-blooded, so they're probably more
active during the warmer seasons...

--
Greg


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Lennart Regebro
On Wed, Dec 12, 2012 at 5:21 PM, Dirkjan Ochtman dirk...@ochtman.nl wrote:
 That entirely depends on when you define to be the start. It seems
 to me the consensus on python-dev has been that packages primarily
 evolve outside the stdlib; it seems a bit weird to then, at the time
 of stdlib inclusion, start changing the API.

But this bit of the API is there only as a hack, because stdlib does
not support is_dst. We are changing that. Hence those extra functions
are no longer needed.

//Lennart


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Lennart Regebro
On Wed, Dec 12, 2012 at 5:54 PM, Steve Dower steve.do...@microsoft.com wrote:
 Details of the registry entries are at 
 http://msdn.microsoft.com/en-us/library/ms725481.aspx. It looks like the data 
 is focused on modern timezones rather than localities, which would mean a 
 many-to-one mapping from zoneinfo. Unfortunately it doesn't look like there's 
 enough overlap to allow an automated mapping.

No, but the Unicode consortium (I think) is keeping a mapping updated
manually. I'm using that in tzlocal, to figure out the local timezone
of the computer on Windows.
However, I think that mixing and matching timezone data from the two
systems in this way is likely to be full of pitfalls, edge cases and
complexities I do not dare even think seriously about. There will
probably be *fewer* errors from just keeping an old timezone database
around. Besides, what if they don't run Windows Update? Then the data
is still outdated.

//Lennart


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Christian Tismer

On 12.12.12 02:43, Guido van Rossum wrote:

On Tue, Dec 11, 2012 at 5:11 PM, Robert Brewer fuman...@aminus.org wrote:

Guido van Rossum wrote:

Sent: Tuesday, December 11, 2012 4:11 PM
To: Antoine Pitrou
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Draft PEP for time zone support.

On Tue, Dec 11, 2012 at 8:07 AM, Antoine Pitrou solip...@pitrou.net
wrote:

Le Tue, 11 Dec 2012 16:23:37 +0100,
Lennart Regebro rege...@gmail.com a écrit :

Changes in the ``datetime``-module
----------------------------------

A new ``is_dst`` parameter is added to several of the `tzinfo`
methods to handle time ambiguity during DST changeovers.

* ``tzinfo.utcoffset(self, dt, is_dst=True)``

* ``tzinfo.dst(self, dt, is_dst=True)``

* ``tzinfo.tzname(self, dt, is_dst=True)``

The ``is_dst`` parameter can be ``True`` (default), ``False``, or
``None``.

``True`` will specify that the given datetime should be interpreted
as happening during daylight savings time, i.e. that the time specified
is before the change from DST.

Why is it True by default? Do we have statistics showing that Python
gets more use in summer?

My question exactly.

Summer in the USA, at least, is 238 days in 2012, while Winter into 2013 is 
only 126 days:


>>> import datetime
>>> datetime.date(2012, 11, 4) - datetime.date(2012, 3, 11)
datetime.timedelta(238)
>>> datetime.date(2013, 3, 10) - datetime.date(2012, 11, 4)
datetime.timedelta(126)

Very funny, but that can't be the real reason. *Most* datetime values
aren't ambiguous, so in those cases the parameter should be ignored,
right? There's only one hour per year where you need to specify it
(two, if we want to artificially assign a meaning to values falling in
the impossible hour). And during those times it's equally likely that
you meant either of the possibilities. I think the meaning of the
parameter must be clarified, perhaps as follows:

- ignored except during the ambiguous hour and during the impossible hour
- during the ambiguous or impossible hour:
   - if True, prefer/pretend DST
   - if False, prefer/pretend non-DST
   - if None, raise an error
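The rules in the list above can be sketched as follows. This is a toy, not the PEP's API: it hard-codes a single hypothetical zone with the 2012 US transition dates mentioned earlier in the thread, and the names utcoffset, AmbiguousOrImpossibleTime, GAP_* and FOLD_* are invented for illustration. The default of None is also an assumption; the draft PEP proposes True.

```python
from datetime import datetime, timedelta

STD = timedelta(hours=-5)   # hypothetical standard-time offset
DST = timedelta(hours=-4)   # hypothetical daylight-time offset

# Wall-clock boundaries for 2012 only (assumed for this sketch).
GAP_START = datetime(2012, 3, 11, 2, 0)    # clocks jump 02:00 -> 03:00
GAP_END = datetime(2012, 3, 11, 3, 0)      # so 02:00-03:00 is "impossible"
FOLD_START = datetime(2012, 11, 4, 1, 0)   # 01:00-02:00 happens twice,
FOLD_END = datetime(2012, 11, 4, 2, 0)     # so it is "ambiguous"


class AmbiguousOrImpossibleTime(ValueError):
    """Raised when is_dst=None and the wall time needs disambiguation."""


def utcoffset(dt, is_dst=None):
    """Return the UTC offset for a naive 2012 wall-clock datetime."""
    in_gap = GAP_START <= dt < GAP_END
    in_fold = FOLD_START <= dt < FOLD_END
    if in_gap or in_fold:
        if is_dst is None:
            raise AmbiguousOrImpossibleTime(dt)
        # True: prefer/pretend DST; False: prefer/pretend non-DST.
        return DST if is_dst else STD
    # Everywhere else the flag is ignored.
    return DST if GAP_END <= dt < FOLD_START else STD
```

With this shape, is_dst only matters for two hours of the year, and passing None turns those two hours into an explicit error rather than a silent guess.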

Here I'd prefer the default to be None if I had to do it over again,
but given that the current behavior is one of the first two (which
one?) we probably can't do that. Still, it's slightly confusing that
passing None is not the same as omitting the parameter altogether --
there aren't many APIs that explicitly support passing None but don't
use it as the default (though there probably are some precedents).
Maybe requesting an error should be done through some other special
value, and None should be the same as omitted and the same as the old
behavior? But where would the special value come from? It should be
made as easy as possible to do the right thing (i.e. raise an
error). Or maybe have a separate Boolean flag to request an error?



I see an issue here that makes me a little uncomfortable:
Having a default that makes code work all year but raises an error during
the impossible hour could be problematic in critical code.
Can we make this more explicit by forcing the users to decide?

I like the idea of the extra boolean flag here, because it will be
explicitly visible that this code intentionally creates an exception.
Or even not a flag, but the exception to be raised, or a callable to
handle this case?

Sloppy coding can be dangerous. So maybe the warnings module could be
helpful as well: If None is passed and no explicit flag/exception/callable
is given, bother the user with a warning message ;-)

cheers - chris

--
Christian Tismer :^)mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship*http://starship.python.net/
14482 Potsdam: PGP key -http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?http://www.stackless.com/



Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Terry Reedy

On 12/12/2012 11:53 AM, Guido van Rossum wrote:


Bingo. As long as the recipe to update is clear, most users can ignore
this, because the countries about which they care don't change DST
rules often enough for it to matter. When it does matter, they'll know
(changing the DST rules is something that local news sources tend to
track :-) and they can update their software when stuff they use
starts getting the time wrong. Obviously sysadmins responsible for
large numbers of users can make this into a routine, and ditto people
who run services. But these folks are professionals and are good at
automating tasks like this.


As a Windows user, I would like there to be one tz data file used by all 
Python versions on my machine, including ones included with other apps. 
I would like every installer, including for bug fix releases, to update 
it. This should be sufficient for 99% of Windows users. As Guido says 
above, the docs should tell the other 1% how to update it explicitly.



--
Terry Jan Reedy



Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Terry Reedy

On 12/12/2012 10:56 AM, Lennart Regebro wrote:


It seems like calling get_timezone() with an unknown timezone should
just throw ValueError, not necessarily some custom Exception?


That could very well be. What are others' opinions on this?


ValueError. That is what it is. Nothing special here.
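For concreteness, a sketch of what that could look like. The lookup table here is a hypothetical stand-in for a real zoneinfo database and only holds two fixed-offset zones; get_timezone is the name from the draft, but this body is an assumption, not the proposed implementation.

```python
from datetime import timedelta, timezone

# Hypothetical stand-in for a real database: name -> fixed offset in seconds.
_KNOWN_ZONES = {"UTC": 0, "Etc/GMT-1": 3600}


def get_timezone(name):
    """Return a tzinfo for name, or raise plain ValueError if unknown."""
    try:
        offset = _KNOWN_ZONES[name]
    except KeyError:
        # No custom exception class: an unknown name is just a bad value.
        raise ValueError("unknown time zone: %r" % (name,)) from None
    return timezone(timedelta(seconds=offset), name)
```

Callers that already catch ValueError for bad input would then need no extra except clause.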


Why not keep a bit more of the pytz API to make porting easy?


The renaming of the timezone() function to get_timezone() is indeed small,


And gratuitous, to me. I don't generally like 'get' prefixes anyway.


but changing pytz.timezone(foo) to timezone.timezone(foo) is really
significantly easier than renaming it to timezone.get_timezone(foo).

If we keep all of the API intact you could do

 try:
     import pytz as timezone
 except ImportError:
     import timezone

Which would make porting quicker, that's true, but do we really want to keep
unnecessary APIs around forever? Isn't it better to minimize the noise from
the start?


While the module that was the basis for the ipaddress module was 
released on PyPI and its api developed however it did, the API was 
worked over quite a bit before the addition of ipaddress. So I agree 
that the current api can be revised before being more-or-less frozen in 
the stdlib.



It also seems relatively painless to keep localize() and normalize()
functions around for easy porting.


Sure, but you then have two ways of doing the same thing, which I think we
should avoid.


I agree that this is precisely the time to remove cruft (if indeed it is 
such).


--
Terry Jan Reedy



Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Lennart Regebro
On Thu, Dec 13, 2012 at 12:23 AM, Terry Reedy tjre...@udel.edu wrote:
 As a Windows user, I would like there to be one tz data file used by all
 Python versions on my machine, including ones included with other apps.

That would be nice, but where would that be installed? There is no
standard location for zoneinfo files. And do we really want to install
python-specific files outside the Python tree?


//Lennart


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread MRAB

On 2012-12-12 23:33, Lennart Regebro wrote:

On Thu, Dec 13, 2012 at 12:23 AM, Terry Reedy tjre...@udel.edu wrote:

As a Windows user, I would like there to be one tz data file used by all
Python versions on my machine, including ones included with other apps.


That would be nice, but where would that be installed? There is no
standard location for zoneinfo files. And do we really want to install
python-specific files outside the Python tree?


Python version x.y is installed into, say, C:\Pythonxy, so perhaps a
good place would be, say, C:\Python.


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Brian Curtin
On Wed, Dec 12, 2012 at 6:10 PM, MRAB pyt...@mrabarnett.plus.com wrote:
 On 2012-12-12 23:33, Lennart Regebro wrote:

 On Thu, Dec 13, 2012 at 12:23 AM, Terry Reedy tjre...@udel.edu wrote:

 As a Windows user, I would like there to be one tz data file used by all
 Python versions on my machine, including ones included with other apps.


 That would be nice, but where would that be installed? There is no
 standard location for zoneinfo files. And do we really want to install
 python-specific files outside the Python tree?

 Python version x.y is installed into, say, C:\Pythonxy, so perhaps a
 good place would be, say, C:\Python.

C:\ProgramData\Python


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Terry Reedy

On 12/12/2012 7:27 PM, Brian Curtin wrote:

On Wed, Dec 12, 2012 at 6:10 PM, MRAB pyt...@mrabarnett.plus.com wrote:

On 2012-12-12 23:33, Lennart Regebro wrote:


On Thu, Dec 13, 2012 at 12:23 AM, Terry Reedy tjre...@udel.edu wrote:


As a Windows user, I would like there to be one tz data file used by all
Python versions on my machine, including ones included with other apps.



That would be nice, but where would that be installed? There is no
standard location for zoneinfo files. And do we really want to install
python-specific files outside the Python tree?


There is no 'Python tree' on windows. Rather, there is a separate tree 
for each version, located where the user directs.


Windows used to have a %APPDATA% directory variable. Not sure about Win 
7, let alone 8. Martin and others should know better.


Or ask the user where to put it. I know where I would choose, and it 
would not be on my C drive. Un-installers would not delete (unless a 
reference count were kept and were decremented to 0).



Python version x.y is installed into, say, C:\Pythonxy, so perhaps a
good place would be, say, C:\Python.


C:\ProgramData\Python


Making a new top-level directory without asking is obnoxious.


--
Terry Jan Reedy



Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Brian Curtin
On Dec 12, 2012 7:24 PM, Terry Reedy tjre...@udel.edu wrote:

 On 12/12/2012 7:27 PM, Brian Curtin wrote:

 On Wed, Dec 12, 2012 at 6:10 PM, MRAB pyt...@mrabarnett.plus.com wrote:

 On 2012-12-12 23:33, Lennart Regebro wrote:


 On Thu, Dec 13, 2012 at 12:23 AM, Terry Reedy tjre...@udel.edu wrote:


 As a Windows user, I would like there to be one tz data file used by all
 Python versions on my machine, including ones included with other apps.



 That would be nice, but where would that be installed? There is no
 standard location for zoneinfo files. And do we really want to install
 python-specific files outside the Python tree?


 There is no 'Python tree' on windows. Rather, there is a separate tree
 for each version, located where the user directs.

 Windows used to have a %APPDATA% directory variable. Not sure about Win
 7, let alone 8. Martin and others should know better.

 Or ask the user where to put it. I know where I would choose, and it
 would not be on my C drive. Un-installers would not delete (unless a
 reference count were kept and were decremented to 0).


 Python version x.y is installed into, say, C:\Pythonxy, so perhaps a
 good place would be, say, C:\Python.


 C:\ProgramData\Python


 Making a new top-level directory without asking is obnoxious.

See
http://stackoverflow.com/questions/9518890/what-is-the-significance-programdata-in-windows


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Glenn Linderman

On 12/12/2012 5:36 PM, Brian Curtin wrote:


 C:\ProgramData\Python



  ^ That.  Is not the path that the link below is talking 
about, though.





 Making a new top-level directory without asking is obnoxious.

See 
http://stackoverflow.com/questions/9518890/what-is-the-significance-programdata-in-windows






Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Janzert

On 12/12/2012 8:43 PM, Glenn Linderman wrote:

On 12/12/2012 5:36 PM, Brian Curtin wrote:


 C:\ProgramData\Python



   ^ That.  Is not the path that the link below is talking
about, though.



It actually does; it is rather confusing though. :/ It's referring to 
KNOWNFOLDERID constant FOLDERID_ProgramData. The actual on disk location 
for this has changed over windows versions. As noted below in the SO 
link given:


Note that this documentation refers to the typical path as per older 
versions of Windows. In modern versions of Windows it is located in 
%SystemDrive%\ProgramData.





 Making a new top-level directory without asking is obnoxious.

See
http://stackoverflow.com/questions/9518890/what-is-the-significance-programdata-in-windows










Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Brian Curtin
On Wed, Dec 12, 2012 at 8:10 PM, Janzert janz...@janzert.com wrote:
 On 12/12/2012 8:43 PM, Glenn Linderman wrote:

 On 12/12/2012 5:36 PM, Brian Curtin wrote:


  C:\ProgramData\Python


^ That.  Is not the path that the link below is talking
 about, though.


 It actually does; it is rather confusing though. :/ It's referring to
 KNOWNFOLDERID constant FOLDERID_ProgramData. The actual on disk location for
 this has changed over windows versions. As noted below in the SO link given:

 Note that this documentation refers to the typical path as per older
 versions of Windows. In modern versions of Windows it is located in
 %SystemDrive%\ProgramData.

Correct.

Anyway, on with the actual timezone stuff...


Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread PJ Eby
On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 Perfect hashing already separates hash table from contents (sort of),
 and saves the memory in much the same way.

 Yes, you can repeat the trick and have 2 levels of indirection, but that
 then requires an additional table of small ints which is pure overhead
 present for the sorting; in short, it's no longer an optimization but just
 overhead for the sortability.

I'm confused.  I understood your algorithm to require repacking,
rather than it being a suitable algorithm for incremental change to an
existing dictionary.  ISTM that that would mean you still pay some
sort of overhead (either in time or space) while the dictionary is
still being mutated.

Also, I'm not sure how 2 levels of indirection come into it.   The
algorithm you describe has, as I understand it, only up to 12
perturbation values (bins), so it's a constant space overhead, not a
variable one.  What's more, you can possibly avoid the extra memory
access by using a different perfect hashing algorithm, at the cost of
a slower optimization step or using a little more memory.

 Note: I'm NOT suggesting the use of perfect hashing, just making
 sure its existence is mentioned and that one is aware that if
 always-ordered dicts become the language standard it precludes
 this option far off in the future.

Not really.  It means that some forms of perfect hashing might require
adding a few more ints worth of overhead for the dictionaries that use
it.  If it's really a performance benefit for very-frequently-used
dictionaries, that might still be worthwhile.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Lennart Regebro
On Thu, Dec 13, 2012 at 2:24 AM, Terry Reedy tjre...@udel.edu wrote:
 Or ask the user where to put it.

If we ask where it should be installed, then we need a registry
setting for that, or we won't know where it's located when it is to be
used. And if we ask, then people will install it in non-standard
locations. Meanwhile, installers for software that bundles Python don't
want their users to be asked, so they'll install it in the standard
location, overriding the manager's preferred, updated custom location
with a standard-location database that is probably not updated.

So I think that asking is not an option at all. It either goes in
%PROGRAMDATA%\Python\zoneinfo or it's not shared at all.
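To make the two options concrete, here is a rough sketch of how a lookup could order them. The function name zoneinfo_candidates and the per-Python "Lib\zoneinfo" location are assumptions for illustration only; nothing in the draft specifies them. It uses ntpath so the Windows-style joins behave the same on any platform.

```python
import ntpath
import os
import sys


def zoneinfo_candidates(environ=None, prefix=None):
    """Return candidate zoneinfo directories, shared location first."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    candidates = []
    program_data = environ.get("PROGRAMDATA")
    if program_data:
        # Shared, machine-wide location as suggested in this message.
        candidates.append(ntpath.join(program_data, "Python", "zoneinfo"))
    # Hypothetical per-installation location inside this Python's tree.
    candidates.append(ntpath.join(prefix, "Lib", "zoneinfo"))
    return candidates
```

No registry setting is needed with this shape, since both locations are derived from values every process can already see.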

 I know where I would choose, and it would
 not be on my C drive. Un-installers would not delete (unless a reference
 count were kept and were decremented to 0).

True, and that's annoying when those counters go wrong.

All in all I would say I would prefer to install this per Python.

//Lennart


Re: [Python-Dev] [Distutils] Is is worth disentangling distutils?

2012-12-12 Thread Lennart Regebro
On Mon, Dec 10, 2012 at 8:22 AM, Antonio Cavallo
a.cava...@cavallinux.eu wrote:
 Hi,
 I wonder if it is worth it/if there is any interest in trying to clean up
 distutils: nothing in terms to add new features, just a *major* cleanup
 retaining the exact same interface.


 I'm not planning anything like *adding features* or rewriting rpm/rpmbuild
 here, simply cleaning up that un-holy code mess. Yes it served well, don't
 get me wrong, and I think it did work much better than anything it was meant
 to replace.

 I'm not into the py3 at all so I wonder how possibly it could fit/collide
 into the big plan.

 Or I'll be wasting my time?

The effort of making something that replaces distutils is, as far as I
can understand, currently on the level of taking the best bits out of
distutils2 and putting it into Python 3.4 under the name packaging.
I'm sure that effort could use more help.

//Lennart


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Glenn Linderman

On 12/12/2012 6:10 PM, Janzert wrote:

On 12/12/2012 8:43 PM, Glenn Linderman wrote:

On 12/12/2012 5:36 PM, Brian Curtin wrote:


 C:\ProgramData\Python



   ^ That.  Is not the path that the link below is talking
about, though.



It actually does; it is rather confusing though. :/ 


I agree with the below. But I have never seen a version of Windows on 
which c:\ProgramData was the actual path for FOLDERID_ProgramData. Can 
you reference documentation that states that it was there, for some 
version?  This documentation speaks of:


c:\Documents and Settings\AllUsers\Application Data (which I knew from 
XP, and I think 2000, not sure I remember NT)


In Vista.0, Vista.1, and Vista.2, I guess it is moved to 
C:\users\AllUsers\AppData\Roaming (typically).


Neither of those would result in C:\ProgramData\Python.

It's referring to KNOWNFOLDERID constant FOLDERID_ProgramData. The 
actual on disk location for this has changed over windows versions. As 
noted below in the SO link given:


Note that this documentation refers to the typical path as per older 
versions of Windows. In modern versions of Windows it is located in 
%SystemDrive%\ProgramData.





 Making a new top-level directory without asking is obnoxious.

See
http://stackoverflow.com/questions/9518890/what-is-the-significance-programdata-in-windows 





Re: [Python-Dev] cpython: expose TCP_FASTOPEN and MSG_FASTOPEN

2012-12-12 Thread Antoine Pitrou
On Thu, 13 Dec 2012 04:24:54 +0100 (CET)
benjamin.peterson python-check...@python.org wrote:
 http://hg.python.org/cpython/rev/5435a9278028
 changeset:   80834:5435a9278028
 user:        Benjamin Peterson benja...@python.org
 date:        Wed Dec 12 22:24:47 2012 -0500
 summary:
   expose TCP_FASTOPEN and MSG_FASTOPEN
 
 files:
   Misc/NEWS  |  3 +++
   Modules/socketmodule.c |  7 ++-
   2 files changed, 9 insertions(+), 1 deletions(-)
 
 
 diff --git a/Misc/NEWS b/Misc/NEWS
 --- a/Misc/NEWS
 +++ b/Misc/NEWS
 @@ -163,6 +163,9 @@
  Library
  ---
  
 +- Expose the TCP_FASTOPEN and MSG_FASTOPEN flags in socket when they're
 +  available.

This should be documented, no?

Regards

Antoine.
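Since the constants are only exposed "when they're available", calling code would presumably have to probe for them. A small sketch of such a check, using only attribute lookups on the socket module (the helper name fastopen_flags is made up here):

```python
import socket


def fastopen_flags():
    """Return whichever of the two FASTOPEN constants this build exposes."""
    names = ("TCP_FASTOPEN", "MSG_FASTOPEN")
    # The constants exist only where the platform headers define them.
    return {name: getattr(socket, name) for name in names if hasattr(socket, name)}
```

An empty dict means the platform or the Python build does not provide the flags, so a caller can fall back to an ordinary connect.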




Re: [Python-Dev] More compact dictionaries with faster iteration

2012-12-12 Thread Dag Sverre Seljebotn

On 12/13/2012 06:11 AM, PJ Eby wrote:

On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:

Perfect hashing already separates hash table from contents (sort of),
and saves the memory in much the same way.

Yes, you can repeat the trick and have 2 levels of indirection, but that
then requires an additional table of small ints which is pure overhead
present for the sorting; in short, it's no longer an optimization but just
overhead for the sortability.


I'm confused.  I understood your algorithm to require repacking,
rather than it being a suitable algorithm for incremental change to an
existing dictionary.  ISTM that that would mean you still pay some
sort of overhead (either in time or space) while the dictionary is
still being mutated.


As-is the algorithm just assumes all key-value-pairs are available at 
creation time.


So yes, if you don't reallocate when making the dict perfect then it 
could make sense to combine it with the scheme discussed in this thread.


If one does leave some free slots open there's some probability of an 
insertion working without complete repacking, but the probability is 
smaller than with a normal dict. Hybrid schemes and trade-offs in this 
direction could be possible.




Also, I'm not sure how 2 levels of indirection come into it.   The
algorithm you describe has, as I understand it, only up to 12
perturbation values (bins), so it's a constant space overhead, not a
variable one.  What's more, you can possibly avoid the extra memory
access by using a different perfect hashing algorithm, at the cost of
a slower optimization step or using a little more memory.


I said there's k perturbation values; you need an additional array

some_int_t d[k]

where some_int_t is large enough to hold n (the number of entries). Just 
like what's proposed in this thread.


The paper recommends k  2*n, but in my experiments I could get away 
with k = n in 99.9% of the cases (given perfect entropy in the 
hashes...). So the overhead is roughly the same as what's proposed here.
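To make the shape of that extra array concrete, here is a rough sketch of a hash-and-displace style construction with k = n buckets and a 100% full table. It is not the algorithm from the paper: it stores one seed per bucket (playing the role of d[k]) and retries seeds until every key in the bucket lands in a free slot. The helpers _h, build and lookup are made up for illustration, and blake2b merely stands in for a hash with good entropy.

```python
import hashlib


def _h(key, seed):
    """Seeded 64-bit hash of a string key."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def build(keys, max_seed=10000):
    """Return (d, slots): per-bucket seeds and a completely full slot table."""
    keys = list(keys)              # keys are assumed to be distinct strings
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    k = n                          # number of buckets, i.e. len(d)
    buckets = [[] for _ in range(k)]
    for key in keys:
        buckets[_h(key, 0) % k].append(key)
    d = [0] * k
    slots = [None] * n
    # Place the largest buckets first, while the table is still mostly empty.
    for b in sorted(range(k), key=lambda i: len(buckets[i]), reverse=True):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            placed = {}
            for key in buckets[b]:
                s = _h(key, seed) % n
                if slots[s] is not None or s in placed:
                    break          # collision: try the next seed
                placed[s] = key
            else:
                for s, key in placed.items():
                    slots[s] = key
                d[b] = seed
                break
        else:
            raise RuntimeError("no seed found; try more buckets")
    return d, slots


def lookup(key, d, slots):
    """Return the slot index of key, or None if key is not in the table."""
    seed = d[_h(key, 0) % len(d)]
    pos = _h(key, seed) % len(slots)
    return pos if slots[pos] == key else None
```

A lookup costs two hash evaluations and one read of d, and slots has no empty entries, which is the memory saving being discussed. Adding a key generally means rebuilding.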


I think the most promising thing would be to always have a single 
integer table and either use it for indirection (usual case) or perfect 
hash function parameters (say, after a pack() method has been called and 
before new insertions).



Note: I'm NOT suggesting the use of perfect hashing, just making
sure its existence is mentioned and that one is aware that if
always-ordered dicts become the language standard it precludes
this option far off in the future.


Not really.  It means that some forms of perfect hashing might require
adding a few more ints worth of overhead for the dictionaries that use
it.  If it's really a performance benefit for very-frequently-used
dictionaries, that might still be worthwhile.



As mentioned above the overhead is larger.

I think the main challenge is to switch to a hashing scheme with larger 
entropy for strings, like murmurhash3. Having lots of zero bits in the 
string for short strings will kill the scheme, since it needs several 
attempts to succeed (the r parameter). So string hashing is slowed 
down a bit (given the caching I don't know how important this is).


Ideally one should make sure the hashes are 64-bit on 64-bit platforms too 
(IIUC long is 32-bit on Windows but I don't know Windows well).


But the main reason I say I'm not proposing it is I don't have time to 
code it up for demonstration and people like to have something to look 
at when they get proposals :-)


Dag Sverre


Re: [Python-Dev] Draft PEP for time zone support.

2012-12-12 Thread Janzert

On 12/13/2012 1:39 AM, Glenn Linderman wrote:

On 12/12/2012 6:10 PM, Janzert wrote:

On 12/12/2012 8:43 PM, Glenn Linderman wrote:

On 12/12/2012 5:36 PM, Brian Curtin wrote:


 C:\ProgramData\Python



   ^ That.  Is not the path that the link below is talking
about, though.



It actually does; it is rather confusing though. :/


I agree with the below. But I have never seen a version of Windows on
which c:\ProgramData was the actual path for FOLDERID_ProgramData. Can
you reference documentation that states that it was there, for some
version?  This documentation speaks of:

c:\Documents and Settings\AllUsers\Application Data (which I knew from
XP, and I think 2000, not sure I remember NT)

In Vista.0, Vista.1, and Vista.2, I guess it is moved to
C:\users\AllUsers\AppData\Roaming (typically).

Neither of those would result in C:\ProgramData\Python.



The SO answer links to the KNOWNFOLDERID docs; the relevant entry 
specifically is at


http://msdn.microsoft.com/en-us/library/windows/desktop/dd378457.aspx#FOLDERID_ProgramData

which gives the default path as,

%ALLUSERSPROFILE% (%ProgramData%, %SystemDrive%\ProgramData)

checking on my local windows 7 install gives:

C:\>echo %ALLUSERSPROFILE%
C:\ProgramData

C:\>echo %ProgramData%
C:\ProgramData



It's referring to KNOWNFOLDERID constant FOLDERID_ProgramData. The
actual on disk location for this has changed over windows versions. As
noted below in the SO link given:

Note that this documentation refers to the typical path as per older
versions of Windows. In modern versions of Windows it is located in
%SystemDrive%\ProgramData.




 Making a new top-level directory without asking is obnoxious.

See
http://stackoverflow.com/questions/9518890/what-is-the-significance-programdata-in-windows







