Re: cPython, IronPython, Jython, and PyPy (Oh my!)
On Thu, 2012-05-17 at 11:13 +1000, Chris Angelico wrote:
> On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
>> A record is an interesting critter -- it is given life either from the user or from the disk-bound data; its fields can then change, but those changes are not reflected on disk until .write_record() is called; I do this because I am frequently moving data from one table to another, making changes to the old record contents before creating the new record with the changes -- since I do not call .write_record() on the old record those changes do not get backed up to disk.
>
> I strongly recommend being more explicit about usage and when it gets written and re-read,

You need to define a 'session' that tracks records and manages flushing. Potentially it can hold a pool of weak references to record objects that have been read from disk. Record which records are 'dirty' and flush those to disk explicitly, or drop all records (essentially a rollback). That is the only sane way to manage this.

> rather than relying on garbage collection.

+1 +1

Do *not* rely on implementation details as features. Sooner or later doing so will always blow up.

> Databasing should not be tied to a language's garbage collection. Imagine you were to reimplement the equivalent logic in some other language - could you describe it clearly? If so, then that's your algorithm. If not, you have a problem.

-- 
http://mail.python.org/mailman/listinfo/python-list
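The 'session' idea above could look roughly like the following. This is only a sketch under stated assumptions: `Record`, `Session`, `mark_dirty`, `flush`, and `rollback` are hypothetical names made up for illustration, and a plain dict stands in for the dbf table on disk. None of this is the dbf module's actual API.

```python
import weakref


class Record(object):
    """Hypothetical stand-in for a dbf record (not the real dbf API)."""

    def __init__(self, key, fields):
        self.key = key
        self.fields = dict(fields)  # private copy; edits stay in memory


class Session(object):
    """Tracks loaded records and writes back only those marked dirty."""

    def __init__(self, disk):
        self._disk = disk  # assumption: a dict standing in for the table on disk
        self._loaded = weakref.WeakValueDictionary()  # cache only, never a source of truth
        self._dirty = {}  # strong references keep dirty records alive until flush

    def get(self, key):
        """Return the cached record if present, otherwise read it from 'disk'."""
        record = self._loaded.get(key)
        if record is None:
            record = Record(key, self._disk[key])
            self._loaded[key] = record
        return record

    def mark_dirty(self, record):
        """Declare that this record's changes should be written on flush()."""
        self._dirty[record.key] = record

    def flush(self):
        """Write every dirty record to 'disk', then forget the dirty set."""
        for key, record in self._dirty.items():
            self._disk[key] = dict(record.fields)
        self._dirty.clear()

    def rollback(self):
        """Drop all tracked records so the next get() re-reads from 'disk'."""
        self._dirty.clear()
        self._loaded.clear()
```

With this shape, whether a later lookup sees temporary changes depends on an explicit rollback() call, not on when the garbage collector happens to run.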
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
On Wed, May 16, 2012 at 3:33 PM, Ethan Furman et...@stoneleaf.us wrote:
> Just hit a snag: In CPython the deterministic garbage collection allows me a particular optimization when retrieving records from a dbf file -- namely, by using weakrefs I can tell if the record is still in memory and active, and if so not hit the disk to get the data; with PyPy (and probably the others) this doesn't work because the record may still be around even when it is no longer active because it hasn't been garbage collected yet.
>
> For PyPy I can use `'PyPy' in sys.version` to set a constant (REFRESH_FROM_DISK in this case) to disable the CPython optimization; does anyone know what strings to look for for the other implementations?

    Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.subversion
    ('CPython', 'tags/r271', '86832')

    Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
    [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_31
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.subversion
    ('Jython', 'tags/Release_2_5_2', '7206')

I don't know what IronPython or PyPy return, but it should be something other than 'CPython'.
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
On 17 May 2012 07:33, Ethan Furman et...@stoneleaf.us wrote:
> Just hit a snag: In CPython the deterministic garbage collection allows me a particular optimization when retrieving records from a dbf file -- namely, by using weakrefs I can tell if the record is still in memory and active, and if so not hit the disk to get the data; with PyPy (and probably the others) this doesn't work because the record may still be around even when it is no longer active because it hasn't been garbage collected yet.

What is the distinguishing feature of an active record? What is the problem if you get back a reference to an inactive record?

And if there is indeed a problem, don't you already have a race condition on CPython?

1. Record is active;
2. Get reference to record through weak ref;
3. Record becomes inactive;
4. Start trying to use the (now inactive) record.

Tim Delaney
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
Ian Kelly wrote:
> On Wed, May 16, 2012 at 3:33 PM, Ethan Furman et...@stoneleaf.us wrote:
>> Just hit a snag: In CPython the deterministic garbage collection allows me a particular optimization when retrieving records from a dbf file -- namely, by using weakrefs I can tell if the record is still in memory and active, and if so not hit the disk to get the data; with PyPy (and probably the others) this doesn't work because the record may still be around even when it is no longer active because it hasn't been garbage collected yet.
>>
>> For PyPy I can use `'PyPy' in sys.version` to set a constant (REFRESH_FROM_DISK in this case) to disable the CPython optimization; does anyone know what strings to look for for the other implementations?
>
>     Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import sys
>     >>> sys.subversion
>     ('CPython', 'tags/r271', '86832')
>
>     Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
>     [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_31
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import sys
>     >>> sys.subversion
>     ('Jython', 'tags/Release_2_5_2', '7206')
>
> I don't know what IronPython or PyPy return, but it should be something other than 'CPython'.

Thanks! That will do the trick. On CPython 2.4 .subversion does not exist, so I'll use:

    subversion = getattr(sys, 'subversion', None)
    if subversion is not None and subversion[0] != 'CPython':
        ...

Hopefully all the others do have it defined (PyPy does, at least as of 1.8).

~Ethan~
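For interpreters new enough to have it, `platform.python_implementation()` (added in Python 2.6) reports the same kind of name without depending on `sys.subversion`, which is absent from some versions. A hedged sketch that tries both follows; the helper name `implementation_name` is made up here, and treating "neither attribute exists" as CPython is an assumption carried over from the reasoning in the message above.

```python
import platform
import sys


def implementation_name():
    """Return the interpreter name, e.g. 'CPython', 'PyPy', 'Jython' or 'IronPython'."""
    getter = getattr(platform, 'python_implementation', None)
    if getter is not None:
        return getter()
    # Older interpreters: fall back to sys.subversion where it exists.
    subversion = getattr(sys, 'subversion', None)
    if subversion is not None:
        return subversion[0]
    # Assumption: an interpreter offering neither is an old CPython (such as 2.4).
    return 'CPython'


# Same constant name as in the message; True means "do not trust the weakref cache".
REFRESH_FROM_DISK = implementation_name() != 'CPython'
```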
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
Tim Delaney wrote:
> On 17 May 2012 07:33, Ethan Furman wrote:
>> Just hit a snag: In CPython the deterministic garbage collection allows me a particular optimization when retrieving records from a dbf file -- namely, by using weakrefs I can tell if the record is still in memory and active, and if so not hit the disk to get the data; with PyPy (and probably the others) this doesn't work because the record may still be around even when it is no longer active because it hasn't been garbage collected yet.
>
> What is the distinguishing feature of an active record? What is the problem if you get back a reference to an inactive record?
>
> And if there is indeed a problem, don't you already have a race condition on CPython?
>
> 1. Record is active;
> 2. Get reference to record through weak ref;
> 3. Record becomes inactive;
> 4. Start trying to use the (now inactive) record.

A record is an interesting critter -- it is given life either from the user or from the disk-bound data; its fields can then change, but those changes are not reflected on disk until .write_record() is called; I do this because I am frequently moving data from one table to another, making changes to the old record contents before creating the new record with the changes -- since I do not call .write_record() on the old record those changes do not get backed up to disk.

With CPython, as soon as a record goes out of scope it dies, and the next time I try to access that record I will get the disk version, without the temporary changes I had made earlier (this is good). However, with PyPy (and others) not all records are destroyed before I try to access them again, and I end up seeing the temp data instead of the disk data.

~Ethan~
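The difference described above can be observed with a small experiment. This is a sketch only: `Record` is an empty hypothetical stand-in, not the dbf record class, and the function name is made up for illustration.

```python
import gc
import weakref


class Record(object):
    """Hypothetical stand-in for a dbf record."""


def lookup_after_release():
    """Drop the only strong reference, then check the weakref twice."""
    record = Record()
    ref = weakref.ref(record)
    del record  # the last strong reference is gone

    # Under CPython's reference counting the object is already dead here.
    # On interpreters with a tracing collector it may still be alive.
    alive_immediately = ref() is not None

    gc.collect()  # explicitly request a collection
    alive_after_collect = ref() is not None
    return alive_immediately, alive_after_collect
```

On CPython the first value is False. Elsewhere it may be True until a collection runs, which is the window in which a weakref-based cache hands back a record that still carries the temporary changes.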
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
> A record is an interesting critter -- it is given life either from the user or from the disk-bound data; its fields can then change, but those changes are not reflected on disk until .write_record() is called; I do this because I am frequently moving data from one table to another, making changes to the old record contents before creating the new record with the changes -- since I do not call .write_record() on the old record those changes do not get backed up to disk.

I strongly recommend being more explicit about usage and when it gets written and re-read, rather than relying on garbage collection.

Databasing should not be tied to a language's garbage collection. Imagine you were to reimplement the equivalent logic in some other language - could you describe it clearly? If so, then that's your algorithm. If not, you have a problem.

ChrisA
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
On 17 May 2012 11:13, Chris Angelico ros...@gmail.com wrote:
> On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
>> A record is an interesting critter -- it is given life either from the user or from the disk-bound data; its fields can then change, but those changes are not reflected on disk until .write_record() is called; I do this because I am frequently moving data from one table to another, making changes to the old record contents before creating the new record with the changes -- since I do not call .write_record() on the old record those changes do not get backed up to disk.
>
> I strongly recommend being more explicit about usage and when it gets written and re-read, rather than relying on garbage collection.
>
> Databasing should not be tied to a language's garbage collection. Imagine you were to reimplement the equivalent logic in some other language - could you describe it clearly? If so, then that's your algorithm. If not, you have a problem.

Agreed. To me, this sounds like a perfect case for with: blocks and explicit reference counting. Something like (pseudo-python - not runnable):

    import threading

    class Record:
        def __init__(self):
            self.refs = 0                  # number of active with-blocks
            self.lock = threading.Lock()   # guards self.refs

        def __enter__(self):
            with self.lock:
                self.refs += 1
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            with self.lock:
                self.refs -= 1
                if self.refs == 0:
                    # last user is done: write the record out
                    self.write_record()

        # rest of Record class

    rec = record_weakrefs.get('record_name')
    if rec is None:
        rec = load_record()
        record_weakrefs.put('record_name', rec)

    with rec:
        do_stuff

Tim Delaney
Re: cPython, IronPython, Jython, and PyPy (Oh my!)
Chris Angelico wrote:
> On Thu, May 17, 2012 at 9:01 AM, Ethan Furman et...@stoneleaf.us wrote:
>> A record is an interesting critter -- it is given life either from the user or from the disk-bound data; its fields can then change, but those changes are not reflected on disk until .write_record() is called; I do this because I am frequently moving data from one table to another, making changes to the old record contents before creating the new record with the changes -- since I do not call .write_record() on the old record those changes do not get backed up to disk.
>
> I strongly recommend being more explicit about usage and when it gets written and re-read, rather than relying on garbage collection.
>
> Databasing should not be tied to a language's garbage collection. Imagine you were to reimplement the equivalent logic in some other language - could you describe it clearly? If so, then that's your algorithm. If not, you have a problem.

Yeah, I've been thinking about this for a couple hours now; initially (way back when) I didn't want to keep hitting the disk unnecessarily -- but all my other supporting data structures go to great lengths to not keep records in memory unless the user has them explicitly named or contained...

I think I've been fighting against myself!

Good news is I'm winning. ;)

~Ethan~