Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-19 Thread Adam Tauno Williams
On Thu, 2012-05-17 at 11:13 +1000, Chris Angelico wrote: 
 On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
  A record is an interesting critter -- it is given life either from the user
  or from the disk-bound data;  its fields can then change, but those changes
  are not reflected on disk until .write_record() is called;  I do this
  because I am frequently moving data from one table to another, making
  changes to the old record contents before creating the new record with the
  changes -- since I do not call .write_record() on the old record those
  changes do not get backed up to disk.
 I strongly recommend being more explicit about usage and when it gets
 written and re-read, 

You need to define a 'session' that tracks records and manages flushing.
Potentially it can hold a pool of weak references to record objects that
have been read from disk.  Track which records are 'dirty' and flush
those to disk explicitly, or drop all records (essentially a rollback).
That is the only sane way to manage this.
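
A minimal sketch of such a session, under stated assumptions: the names Record, Session, mark_dirty, flush and rollback are hypothetical and are not part of the dbf code discussed in this thread, and a plain dict stands in for the table on disk. The cache holds only weak references, while the dirty set holds strong ones, so unsaved records cannot vanish before they are flushed.

```python
import weakref


class Record(object):
    """Hypothetical record: a key plus a dict of field values."""

    def __init__(self, key, fields):
        self.key = key
        self.fields = dict(fields)


class Session(object):
    """Tracks records read from a store and decides when they are written."""

    def __init__(self, store):
        self._store = store                          # stand-in for the dbf file
        self._cache = weakref.WeakValueDictionary()  # key -> live Record
        self._dirty = {}                             # key -> Record (strong refs)

    def get(self, key):
        """Return the cached record, or read a fresh copy from the store."""
        record = self._cache.get(key)
        if record is None:
            record = Record(key, self._store[key])
            self._cache[key] = record
        return record

    def mark_dirty(self, record):
        """Remember that this record has changes that are not yet stored."""
        self._dirty[record.key] = record

    def flush(self):
        """Write every dirty record to the store, then forget the dirty set."""
        for key, record in self._dirty.items():
            self._store[key] = dict(record.fields)
        self._dirty.clear()

    def rollback(self):
        """Drop all tracked records; the next get() re-reads from the store."""
        self._dirty.clear()
        self._cache.clear()
```

With this shape, whether a change reaches the store depends only on flush() or rollback() being called, not on when the interpreter happens to collect a record.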

 rather than relying on garbage collection.

+1 +1 Do *not* rely on implementation details as features.  Sooner or
later doing so will blow up.

 Databasing should not be tied to a language's garbage collection.
 Imagine you were to reimplement the equivalent logic in some other
 language - could you describe it clearly? If so, then that's your
 algorithm. If not, you have a problem.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Ian Kelly
On Wed, May 16, 2012 at 3:33 PM, Ethan Furman et...@stoneleaf.us wrote:
 Just hit a snag:

 In cPython the deterministic garbage collection allows me a particular
 optimization when retrieving records from a dbf file -- namely, by using
 weakrefs I can tell if the record is still in memory and active, and if so
 not hit the disk to get the data;  with PyPy (and probably the others) this
 doesn't work because the record may still be around even when it is no
 longer active because it hasn't been garbage collected yet.

 For PyPy I can use `'PyPy' in sys.version` to set a constant
 (REFRESH_FROM_DISK in this case) to disable the cPython optimization; does
 anyone know what strings to look for for the other implementations?

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.subversion
('CPython', 'tags/r271', '86832')

Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_31
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.subversion
('Jython', 'tags/Release_2_5_2', '7206')

I don't know what IronPython or PyPy return, but it should be
something other than 'CPython'.
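
A possible way to wrap this check in one place, as a sketch rather than a definitive answer: platform.python_implementation() exists in the standard library from Python 2.6 onwards and reports names such as 'CPython', 'PyPy', 'Jython' or 'IronPython'. The helper name implementation_name is hypothetical, and treating an interpreter that offers neither API as CPython is an assumption.

```python
import platform
import sys


def implementation_name():
    """Return a name such as 'CPython', 'PyPy', 'Jython' or 'IronPython'."""
    try:
        # Available since Python 2.6.
        return platform.python_implementation()
    except AttributeError:
        pass
    # Older interpreters: fall back to sys.subversion where it exists.
    subversion = getattr(sys, 'subversion', None)
    if subversion is not None:
        return subversion[0]
    # Assumption: an interpreter with neither API is an old CPython.
    return 'CPython'


# Disable the weakref shortcut everywhere except CPython.
REFRESH_FROM_DISK = implementation_name() != 'CPython'
```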


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Tim Delaney
On 17 May 2012 07:33, Ethan Furman et...@stoneleaf.us wrote:

 Just hit a snag:

 In cPython the deterministic garbage collection allows me a particular
 optimization when retrieving records from a dbf file -- namely, by using
 weakrefs I can tell if the record is still in memory and active, and if so
 not hit the disk to get the data;  with PyPy (and probably the others) this
 doesn't work because the record may still be around even when it is no
 longer active because it hasn't been garbage collected yet.


What is the distinguishing feature of an active record? What is the
problem if you get back a reference to an inactive record? And if there is
indeed a problem, don't you already have a race condition on CPython?

1. Record is active;
2. Get reference to record through weak ref;
3. Record becomes inactive;
4. Start trying to use the (now inactive) record.

Tim Delaney


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Ethan Furman

Ian Kelly wrote:

On Wed, May 16, 2012 at 3:33 PM, Ethan Furman et...@stoneleaf.us wrote:

Just hit a snag:

In cPython the deterministic garbage collection allows me a particular
optimization when retrieving records from a dbf file -- namely, by using
weakrefs I can tell if the record is still in memory and active, and if so
not hit the disk to get the data;  with PyPy (and probably the others) this
doesn't work because the record may still be around even when it is no
longer active because it hasn't been garbage collected yet.

For PyPy I can use `'PyPy' in sys.version` to set a constant
(REFRESH_FROM_DISK in this case) to disable the cPython optimization; does
anyone know what strings to look for for the other implementations?


Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.subversion
('CPython', 'tags/r271', '86832')

Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_31
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.subversion
('Jython', 'tags/Release_2_5_2', '7206')

I don't know what IronPython or PyPy return, but it should be
something other than 'CPython'.


Thanks!  That will do the trick.  On CPython 2.4 .subversion does not 
exist, so I'll use:


import sys

subversion = getattr(sys, 'subversion', None)
if subversion is not None and subversion[0] != 'CPython':
    ...

Hopefully all the others do have it defined (PyPy does, at least as of 1.8).

~Ethan~


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Ethan Furman

Tim Delaney wrote:

On 17 May 2012 07:33, Ethan Furman wrote:

Just hit a snag:

In cPython the deterministic garbage collection allows me a
particular optimization when retrieving records from a dbf file --
namely, by using weakrefs I can tell if the record is still in
memory and active, and if so not hit the disk to get the data;  with
PyPy (and probably the others) this doesn't work because the record
may still be around even when it is no longer active because it
hasn't been garbage collected yet.



What is the distinguishing feature of an active record? What is the 
problem if you get back a reference to an inactive record? And if there 
is indeed a problem, don't you already have a race condition on CPython?


1. Record is active;
2. Get reference to record through weak ref;
3. Record becomes inactive;
4. Start trying to use the (now inactive) record.


A record is an interesting critter -- it is given life either from the 
user or from the disk-bound data;  its fields can then change, but those 
changes are not reflected on disk until .write_record() is called;  I do 
this because I am frequently moving data from one table to another, 
making changes to the old record contents before creating the new record 
with the changes -- since I do not call .write_record() on the old 
record those changes do not get backed up to disk.


With CPython as soon as a record goes out of scope it dies, and the next 
time I try to access that record I will get the disk version, without 
the temporary changes I had made earlier (this is good).  However, with 
PyPy (and others) not all records are destroyed before I try to access 
them again, and I end up seeing the temp data instead of the disk data.
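
The difference described above can be sketched in a few lines. This is an illustration only: Record here is a hypothetical stand-in, not the real dbf record class, and what happens before the explicit collection depends on the implementation.

```python
import gc
import weakref


class Record(object):
    """Hypothetical stand-in for a dbf record."""

    def __init__(self, fields):
        self.fields = dict(fields)


record = Record({'name': 'temporary edit'})
ref = weakref.ref(record)

del record  # drop the only strong reference

# On CPython, reference counting normally clears the weak reference here.
# On PyPy, Jython or IronPython the object may survive until a later
# collection, so ref() can still return the record with its unsaved edits.
seen_before_collect = ref() is not None  # implementation-dependent

gc.collect()  # request a collection explicitly
still_alive = ref() is not None
```

Any logic that treats "the weak reference is dead" as "the temporary changes are gone" therefore behaves differently from one interpreter to the next.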


~Ethan~


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Chris Angelico
On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
 A record is an interesting critter -- it is given life either from the user
 or from the disk-bound data;  its fields can then change, but those changes
 are not reflected on disk until .write_record() is called;  I do this
 because I am frequently moving data from one table to another, making
 changes to the old record contents before creating the new record with the
 changes -- since I do not call .write_record() on the old record those
 changes do not get backed up to disk.

I strongly recommend being more explicit about usage and when it gets
written and re-read, rather than relying on garbage collection.
Databasing should not be tied to a language's garbage collection.
Imagine you were to reimplement the equivalent logic in some other
language - could you describe it clearly? If so, then that's your
algorithm. If not, you have a problem.
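
One way to make the usage explicit, as a sketch under assumptions: the Record class and the scratch_copy method are hypothetical and do not exist in the dbf code discussed here. The idea is to apply temporary changes to a detached copy of the fields, so the old record never holds data that differs from what is on disk.

```python
class Record(object):
    """Hypothetical record whose fields live in a plain dict."""

    def __init__(self, fields):
        self.fields = dict(fields)

    def scratch_copy(self, **changes):
        """Return a detached dict of the fields with `changes` applied."""
        data = dict(self.fields)
        data.update(changes)
        return data


old_table = [Record({'name': 'example', 'status': 'active'})]
new_table = []

for record in old_table:
    # The old record is never modified, so there is nothing to undo later.
    new_table.append(Record(record.scratch_copy(status='migrated')))
```

Described this way, the logic no longer mentions object lifetime at all, so it would translate directly to a language without garbage collection.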

ChrisA


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Tim Delaney
On 17 May 2012 11:13, Chris Angelico ros...@gmail.com wrote:

 On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
  A record is an interesting critter -- it is given life either from the user
  or from the disk-bound data;  its fields can then change, but those changes
  are not reflected on disk until .write_record() is called;  I do this
  because I am frequently moving data from one table to another, making
  changes to the old record contents before creating the new record with the
  changes -- since I do not call .write_record() on the old record those
  changes do not get backed up to disk.

 I strongly recommend being more explicit about usage and when it gets
 written and re-read, rather than relying on garbage collection.
 Databasing should not be tied to a language's garbage collection.
 Imagine you were to reimplement the equivalent logic in some other
 language - could you describe it clearly? If so, then that's your
 algorithm. If not, you have a problem.


Agreed. To me, this sounds like a perfect case for with: blocks and
explicit reference counting.  Something like (pseudo-python - not runnable):

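import threading

class Record(object):
    def __init__(self):
        self.refs = 0
        self.lock = threading.Lock()

    def __enter__(self):
        with self.lock:
            self.refs += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        with self.lock:
            self.refs -= 1
            # Decide under the lock so the count cannot change in between.
            last_user = (self.refs == 0)

        # Write back once the last user has left its with: block.
        if last_user:
            self.write_record()

    # rest of Record class

# record_weakrefs is assumed to be a mapping of weak references,
# e.g. a weakref.WeakValueDictionary.
rec = record_weakrefs.get('record_name')

if rec is None:
    rec = load_record()
    record_weakrefs['record_name'] = rec

with rec:
    do_stuff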

Tim Delaney


Re: cPython, IronPython, Jython, and PyPy (Oh my!)

2012-05-16 Thread Ethan Furman

Chris Angelico wrote:

On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:

A record is an interesting critter -- it is given life either from the user
or from the disk-bound data;  its fields can then change, but those changes
are not reflected on disk until .write_record() is called;  I do this
because I am frequently moving data from one table to another, making
changes to the old record contents before creating the new record with the
changes -- since I do not call .write_record() on the old record those
changes do not get backed up to disk.


I strongly recommend being more explicit about usage and when it gets
written and re-read, rather than relying on garbage collection.
Databasing should not be tied to a language's garbage collection.
Imagine you were to reimplement the equivalent logic in some other
language - could you describe it clearly? If so, then that's your
algorithm. If not, you have a problem.


Yeah, I've been thinking about this for a couple hours now;  initially 
(way back when) I didn't want to keep hitting the disk unnecessarily 
-- but all my other supporting data structures go to great lengths to 
not keep records in memory unless the user has them explicitly named or 
contained... I think I've been fighting against myself!  Good news is 
I'm winning.  ;)


~Ethan~