Re: [ZODB-Dev] what's the latest on zodb/zeo+memcached?

2013-01-18 Thread Laurence Rowe
On 18 January 2013 10:21, Claudiu Saftoiu csaft...@gmail.com wrote:

  Er, to be clearer: my goal is for the preload to load everything into
  the
  cache that the query mechanism might use.
 
  It seems the bucket approach only takes ~10 seconds on the 350k-sized
  index
  trees vs. ~60-90 seconds. This seems to indicate that fewer things end up
  being pre-loaded...

 I guess I was too subtle before.

 Preloading is a waste of time.  Just use a persistent ZEO cache
 of adequate size and be done with it.


 Okay. I did that, and I only tried the preloading because it didn't seem I
 was getting what I wanted.

 To wit: I ran a simple query and it took a good few minutes. It's true,
 after it took a few minutes, it ran instantly, and even after a server
 restart it only took a few seconds, but I don't understand why it took a few
 minutes in the first place. There are only 750k objects in that database,
 and I gave it a cache object size of 5 million; the packed database .fs is
 only 400 megabytes, and I gave it a cache byte size of 3000 megabytes.

 Then when I change one parameter of the query (to ask for objects with a
 month of november instead of october), it takes another few minutes...

 Speaking to your point, preloading didn't seem to help either (I had
 'preloaded' dozens of times over the past few days and the queries still
 took forever), but the fact remains: it does not seem unreasonable to want
 these queries to run instantly from the get-go, given that is the point of
 indexing in the first place. As it stands now, for certain queries I could
 probably do better loading each object and filtering it via python because I
 wouldn't have to deal with loading the indices in order to run the 'fast'
 query, but this seems to defeat the point of indices entirely, and I'd like
 to not have to create custom search routines for every separate query.
 Again, maybe I'm doing something wrong, but I haven't been able to figure it
 out yet.

 I made a view to display the output of cacheDetailSize like Jeff suggested
 and I got something like this:

 db = ...
 for conn_d in db.cacheDetailSize():
     writer.write("%(connection)s, size=%(size)s, "
                  "non-ghost-size=%(ngsize)s\n" % conn_d)

 output:

 Connection at 0684fe90, size=635683, non-ghost-size=209039
 Connection at 146c5ad0, size=3490, non-ghost-size=113

 That is after having run the 'preloading'. It seems that when the query
 takes forever, the non-ghost-size is slowly increasing (~100 objects/second)
 while the 'size' stays the same. Once the query is done after having taken a
 few minutes, each subsequent run is instant and the ngsize doesn't grow. My
 naive question is: it has plenty of RAM, why does it not just load
 everything into the RAM?

 Any suggestions? There must be a way to effectively use indexing with zodb
 and what I'm doing isn't working.

Have you confirmed that the ZEO client cache file is being used?
Configure logging to display the ZEO messages to make sure.
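
For example, a minimal logging setup along these lines (a sketch; the
ZEO client logs under the 'ZEO' logger namespace) will surface the
client cache messages:

    import logging

    logging.basicConfig()
    logging.getLogger('ZEO').setLevel(logging.DEBUG)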

The client cache is transient by default, so you will need to enable
persistent client caching to see an effect past restarts:

<zeoclient>
  client zeo1
  ...
</zeoclient>

https://github.com/zopefoundation/ZODB/blob/master/doc/zeo-client-cache.txt

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RFC: ZODB 4.0 (without persistent)

2012-10-14 Thread Laurence Rowe
On 14 October 2012 22:49, Jim Fulton j...@zope.com wrote:
 On Sun, Oct 14, 2012 at 5:28 PM, Tres Seaver tsea...@palladion.com wrote:
 ...
 Well, I don't have time to chase BTrees.  This could always be done in
 ZODB 5. :)

 I could help chop BTrees out, if that would be useful:  most of the
 effort will be purely subtractive in the ZODB package (I don't think
 anything depends on BTrees).

 FileStorage uses BTrees for its in-memory index.

 MappingStorage uses BTrees.

 There are ZODB tests that use BTrees,
 but I suppose they could be fixed.

 I just don't think the win is that great
 in separating BTrees at this time.

I don't think Hanno is suggesting removing BTrees as a dependency from
ZODB but rather breaking out the BTrees package into a separate PyPI
distribution to make it more visible to potential users outside of the
ZODB community, e.g.
http://www.reddit.com/r/Python/comments/exj74/btree_c_extension_module_for_python_alpha/

To do that, refactoring tests shouldn't be required. I guess it could
be argued that the fsBTree should be part of the ZODB rather than
BTrees distribution, but leaving it where it is would be much easier.
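
Once broken out, using it on its own would presumably look just as it
does today with the ZODB-bundled package (a small sketch; nothing
beyond the persistent dependency is needed):

    from BTrees.OOBTree import OOBTree

    index = OOBTree()
    index['alpha'] = 1
    index['beta'] = 2
    # range search over the keys, one of the main attractions over dict
    print(list(index.keys(min='a', max='b')))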

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RFC: ZODB 4.0 (without persistent)

2012-10-14 Thread Laurence Rowe
On 14 October 2012 23:33, Jim Fulton j...@zope.com wrote:
 On Sun, Oct 14, 2012 at 6:07 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 On 14 October 2012 22:49, Jim Fulton j...@zope.com wrote:
 On Sun, Oct 14, 2012 at 5:28 PM, Tres Seaver tsea...@palladion.com wrote:
 ...
 Well, I don't have time to chase BTrees.  This could always be done in
 ZODB 5. :)

 I could help chop BTrees out, if that would be useful:  most of the
 effort will be purely subtractive in the ZODB package (I don't think
 anything depends on BTrees).

 FileStorage uses BTrees for its in-memory index.

 MappingStorage uses BTrees.

 There are ZODB tests that use BTrees,
 but I suppose they could be fixed.

 I just don't think the win is that great
 in separating BTrees at this time.

 I don't think Hanno is suggesting removing BTrees as a dependency from
 ZODB but rather breaking out the BTrees package into a separate PyPI
 distribution to make it more visible to potential users outside of the
 ZODB community, e.g.
 http://www.reddit.com/r/Python/comments/exj74/btree_c_extension_module_for_python_alpha/

 I think if we released a package named BTrees and people looked at it and
 saw that it was dependent on persistent and ZODB, they'd get pissed.

 Let's leave BTrees alone for now.

Presumably the dependency tree would look something like:

  persistent <- BTrees <- ZODB <- ZEO

The persistent dependency is definitely less to swallow than the whole
ZODB for a potential user of the BTrees package, but it's still a
complication and there's no urgent reason to make the change now.
Smaller, iterative changes usually win.

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)

2012-08-30 Thread Laurence Rowe
On 30 August 2012 19:19, Shane Hathaway sh...@hathawaymix.org wrote:
 On 08/30/2012 10:14 AM, Marius Gedminas wrote:

 On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:

 On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas mar...@gedmin.as
 wrote:

 On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:

 On Tue, 28 Aug 2012 16:31:20 +0200,
 Martijn Pieters m...@zopatista.com wrote :

 Anything else different? Did you make any performance comparisons
 between RelStorage and NEO?


 I believe the main difference compared to all other ZODB Storage
  implementations is the finer-grained locking scheme: in all storage
 implementations I know, there is a database-level lock during the
 entire second phase of 2PC, whereas in NEO transactions are serialised
 only when they alter a common set of objects.


 This could be a compelling point.  I've seen deadlocks in an app that
 tried to use both ZEO and PostgreSQL via the Storm ORM.  (The thread
 holding the ZEO commit lock was blocked waiting for the PostgreSQL
 commit to finish, while the PostgreSQL server was waiting for some other
 transaction to either commit or abort -- and that other transaction
 couldn't proceed because it was waiting for the ZEO lock.)


 This sounds like an application/transaction configuration problem.


 *shrug*

 Here's the code to reproduce it: http://pastie.org/4617132

  To avoid this sort of deadlock, you need to always commit in a
  consistent order.  You also need to configure ZEO (or NEO)
 to time-out transactions that take too long to finish the second phase.


 The deadlock happens in tpc_begin() in both threads, which is the first
 phase, AFAIU.

 AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
 the ZEO commit lock.  Then it enters tpc_begin() for Storm's
 StoreDataManager and blocks waiting for a response from PostgreSQL --
 which is delayed because the PostgreSQL server is waiting to see if
 the other thread, Thread #1, will commit or abort _its_ transaction, which
 is conflicting with the one from Thread #2.

 Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire
 the
 ZEO commit lock held by Thread #2.


 So thread 1 acquires in this order:

 1. PostgreSQL
 2. ZEO

 Thread 2 acquires in this order:

 1. ZEO
 2. PostgreSQL

 SQL databases handle deadlocks by detecting and automatically rolling back
 transactions, while the transaction package expects all data managers to
 completely avoid deadlocks using the sortKey method.

 I haven't looked at the code, but I imagine Storm's StoreDataManager
 implements IDataManager.  I wonder if StoreDataManager provides a consistent
 sortKey.  The sortKey method must return a string (not an integer or other
 object) that is consistent yet different from all other participating data
 managers.

Storm's DataManager defines sortKey as:

def sortKey(self):
    # Stores in TPC mode should be the last to be committed, this makes
    # it possible to have TPC behavior when there's only a single store
    # not in TPC mode.
    if self._store._tpc:
        prefix = "zz"
    else:
        prefix = "aa"
    return "%s_store_%d" % (prefix, id(self))

http://bazaar.launchpad.net/~storm/storm/trunk/view/head:/storm/zope/zstorm.py#L320

(By default self._store._tpc is set to False.)


This is essentially similar to zope.sqlalchemy's, the single phase
variant being:

105     def sortKey(self):
106         # Try to sort last, so that we vote last - we may commit in tpc_vote(),
107         # which allows Zope to roll back its transaction if the RDBMS
108         # threw a conflict error.
109         return "~sqlalchemy:%d" % id(self.tx)

http://zope3.pov.lt/trac/browser/zope.sqlalchemy/trunk/src/zope/sqlalchemy/datamanager.py#L105

(The TPC variant simply omits the leading tilde as it is not required
to sort last - zope.sqlalchemy commits in tpc_vote() rather than
tpc_finish() when using single phase commit.)


ZEO's sortKey is:

698     def sortKey(self):
699         # If the client isn't connected to anything, it can't have a
700         # valid sortKey().  Raise an error to stop the transaction early.
701         if self._server_addr is None:
702             raise ClientDisconnected
703         else:
704             return '%s:%s' % (self._storage, self._server_addr)

http://zope3.pov.lt/trac/browser/ZODB/trunk/src/ZEO/ClientStorage.py#L698

(self._storage defaults to the string '1'.)


This should mean that ZEO always gets a sortKey like '1:./zeosock' in
the example given whereas Storm gets a sortKey like 'aa_storm_12345'
(though the final number will vary per transaction), which should mean
a consistent sort order with ZEO always committing first.

It seems StormDataManager only commits in tpc_finish, doing nothing in
either the commit() or tpc_vote() stages when in 1PC mode. As ZEO sorts
first, a failure to commit by Storm could never abort the ZEO server's

Re: [ZODB-Dev] ZODB via Pipe/Socket

2012-03-20 Thread Laurence Rowe
On 20 March 2012 16:52, Adam Tauno Williams awill...@whitemice.org wrote:
 Is it possible to open a ZODB in a thread and share it with other threads
 via a filesystem socket or pipe [rather than a TCP connection]?  I've
 searched around and haven't found any reference to such a configuration.

This resolved bug report suggests you can, using ZEO:
https://bugs.launchpad.net/zodb/+bug/663259
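
If I remember correctly ZEO accepts a filesystem path in place of a
host:port pair, so a configuration roughly like this (paths only
illustrative) should give you a ZEO server reachable over a unix
socket:

    <zeo>
      address /var/run/zeo.sock
    </zeo>

and on the client side:

    <zeoclient>
      server /var/run/zeo.sock
    </zeoclient>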

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Build compression into ZODB 3.11?

2012-03-14 Thread Laurence Rowe
On 14 March 2012 17:47, Jim Fulton j...@zope.com wrote:
 I'm pretty happy with how zc.zlibstorage has worked out.

 Should I build this into ZODB 3.11?

+1

 BTW, lz4 compression looks interesting.

 The Python binding (at least from PyPI) is broken.
 I submitted an issue. Hopefully it will be fixed.

FWIW, I experimented with c_zlib from https://gist.github.com/242459
in order to use a zlib default dictionary - a 32KB string used to
pre-fill the compression buffer.
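
For illustration, on Python 3.3+ the stdlib exposes the same idea
through zlib's zdict parameter (roughly what the c_zlib ctypes wrapper
gives you on Python 2); a sketch:

    import zlib

    # use the last 32KB of an existing Data.fs as the preset dictionary
    with open('Data.fs', 'rb') as f:
        f.seek(0, 2)
        f.seek(max(0, f.tell() - 32768))
        zdict = f.read()

    def compress(data):
        c = zlib.compressobj(zdict=zdict)
        return c.compress(data) + c.flush()

    def decompress(data):
        d = zlib.decompressobj(zdict=zdict)
        return d.decompress(data) + d.flush()

The same dictionary must of course be available at decompression time,
which is part of the brittleness mentioned below.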

Using a ~75MB Data.fs from a Plone site that compressed down to ~30MB
with zc.zlibstorage normally, the most successful dictionary I tried
was the end of the Data.fs itself which saved only an additional 6%
over an empty dictionary. That feels like an unfair test to me,
probably deduplicating serialized catalog bucket values. The next best
was the last 32KB from another Plone Data.fs which only managed to
save an additional 2.5% and a fairly short dictionary with common
pickled classes saved an additional 2%.

None of those savings seem worthwhile pursuing further given the extra
brittleness involved.

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Server-side caching

2012-02-13 Thread Laurence Rowe
On 13 February 2012 10:06, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
 The OS' file-system cache acts as a storage server cache.  The storage
 server does (essentially) no processing to data read from disk, so an
 application-level cache would add nothing over the disk cache provided by
 the storage server.


 I see, then I guess it would be good to have at least the same amount of RAM
 as the total size of the DB, no? From what I see in our server, the linux
 buffer cache takes around 13GB of the 16G available, while the rest is
 mostly taken by the ZEO process (1.7G). The database is 17GB on disk.

Adding enough memory so the database fits in RAM is always a good idea.

Since the introduction of blobs, this should be possible (and
relatively cheap) for most ZODB deployments. For Plone sites, a 30GB
pre-blobs Data.fs typically falls to 2-3GB with blobs.

There's also the wrapper storage zc.zlibstorage which compresses ZODB
records allowing more of the database to fit in RAM (RelStorage has an
option to compress records.)
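
If memory serves, wrapping a FileStorage with zc.zlibstorage is just a
matter of nesting the storage sections (a sketch, path illustrative):

    %import zc.zlibstorage

    <zlibstorage>
      <filestorage>
        path /var/zodb/Data.fs
      </filestorage>
    </zlibstorage>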

 Also note that, for better or worse, FileStorage uses an in-memory index
 of current record positions, so no disk access is needed to find current
 data.


 Yes, but pickles still have to be retrieved, right? I guess this would mean
 random access (for a database like ours, in which we have many small
 objects), which doesn't favor cache performance.

 I'm asking this because in the tests we've made with SSDs we have seen a 20%
 decrease in reading time for non-client-cached objects. So, there seems to
 be some disk i/o going on.

The mean performance improvement doesn't tell the whole story here.
With most of your database in the file-system cache, median read times
will be identical, but your 95th percentile read times will show a
huge decrease as the seek time on an SSD is orders of magnitude lower
than the seek time of a spinning disk.

Even when you have enough RAM so the OS can cache the database in
memory, I still think SSDs are worthwhile. Packing the database,
backing up or any operation that churns through the disk can all cause
the database to drop out of the file-system cache. Be sure to choose
an SSD with capacitor backup so it won't lose your data, see:
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html.

 In general, I'd say no.  It can depend on lots of details, including:

 - database size
 - active set size
 - network speed
 - memory and disk speeds on clients and servers
 - ...


 In any case, from what I see, these client caches cannot be shared between
 processes, which doesn't make them very useful in our case, in which we have many
 parallel processes asking for the same objects over and over again.

You could try a ZEO fanout setup too, where you have a  ZEO server
running on each client machine. The intermediary ZEO's client cache
(you could put it on tmpfs if you have enough RAM) is then shared
between all the clients running on that machine.
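
A sketch of what the intermediary's zeo.conf might look like (host
names, sizes and paths are made up):

    <zeo>
      address 127.0.0.1:9001
    </zeo>

    <zeoclient>
      server zeo-master.example.com:8886
      client fanout
      var /mnt/tmpfs
      cache-size 4GB
    </zeoclient>

The local clients then connect to 127.0.0.1:9001 as if it were an
ordinary ZEO server.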

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] zeopack error

2012-02-09 Thread Laurence Rowe
On 9 February 2012 11:24, Jim Fulton j...@zope.com wrote:
 I'm sorry I haven't had time to look at this. Still don't really.

 Thanks Marius!!!

 On Wed, Feb 8, 2012 at 6:48 PM, Marius Gedminas mar...@gedmin.as wrote:
 On Thu, Feb 09, 2012 at 01:25:48AM +0200, Marius Gedminas wrote:
 On Wed, Feb 08, 2012 at 01:24:55PM +0100, Kaweh Kazemi wrote:
  Recap: last week I examined problems I had packing our 4GB users
  storage.
 ...
      unp = pickle.Unpickler(f)
      unp.persistent_load = lambda oid: 'persistent reference %r' % oid
      pprint.pprint(unp.load())
     {'data': {persistent reference ['m', ('game', 
  '\\x00\\x00\\x00\\x00\\x00\\x00\\tT', <class '__main__.Tool'>)]: 1,
               persistent reference ['m', ('game', 
  '\\x00\\x00\\x00\\x00\\x00\\x00\\x12\\x03', <class 
  '__main__.EnergyPack'>)]: 1}}

 Note the reference to __main__. This is almost certainly the root problem.
 Classes shouldn't be defined in __main__ (except when experimenting).

 At one time, I thought pickle disallowed pickling classes from __main__.
 ZODB probably should. It's a bug magnet.



 Those look like cross-database references to me.

 The original error (aaaugh Mutt makes it hard for me to look upthread
 while I'm writing a response) was something about non-hashable lists?
 Looks like a piece of code is trying to put persistent references into a
 dict, which can't possibly work in all cases.
 ...
  During my checks I realized that running the pack in a Python 2.7
  environment (using the same ZODB version - 3.10.3) works fine, the
  pack reduces our 4GB storage to 1GB. But our production server uses
  Python 2.6 (same ZODB3.10.3) which yields the problem (though the test
  had been done on OS X 10.7.3 - 64bit, and the production server is
  Debian Squeeze 32bit).

 I've no idea why running the same ZODB version on Python 2.7 instead of
 2.6 would make this error go away.

 Duh!  The code that fails is in the standard library -- in the cPickle
 module:

  Traceback (most recent call last):
 ...
    File 
  /usr/local/lib/python2.6/dist-packages/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/FileStorage/fspack.py,
   line 328, in findrefs
      return self.referencesf(self._file.read(dh.plen))
    File 
  /usr/local/lib/python2.6/dist-packages/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/serialize.py,
   line 630, in referencesf
      u.noload()
  TypeError: unhashable type: 'list'

 Since the bug is in the stdlib, it's not surprising that the newer
 stdlib cPickle from Python 2.7 fixes it.

 I suspect a bug in the application (defining persistent classes in __main__)
 is the root problem that's aggravated by the cPickle problem.

The pickle's classes were defined in a normal module, I think Marius
just aliased those modules to __main__ and defined the classes
there in order to load the pickle without the original code:

sys.modules['game.objects.item'] = sys.modules['__main__'] # hack
sys.modules['game.objects'] = sys.modules['__main__'] # hack
sys.modules['game'] = sys.modules['__main__'] # hack

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] [X-Post] Figure out bottle neck in a repoze.bfg based web app

2012-01-24 Thread Laurence Rowe
On 24 January 2012 13:50, steve st...@lonetwin.net wrote:
 Hi All,

 I apologize for the cross-post but by this mail I simply hope to get a few
 pointers on how to narrow down to the problem I am seeing. I shall post to the
 relevant list if I have further questions.

 So here is the issue:

 Short description:
 I've got a repoze.bfg application running on top of zeo/zodb across multiple
 servers, served using mod_wsgi and it's showing bad resource usage (both high
 memory consumption as well as CPU usage). Are there any steps i can do to
 localise whether this is an issue with zeo/zodb/mod_wsgi configuration, and/or
 usage ?

 Long description:

 * I have a repoze.bfg (version 1.3) based app, which uses zodb (over zeo,
 version 3.10.2) as the backend and is served up using apache+mod_wsgi. All
 running on a minimal debian 6.0 based amazon instances.

 * The architecture is 1 zodb server and 4 app instances running on individual
 EC2 instances (all in the same availability zone). All of the instances are
 behind an amazon Elastic Load Balancer

 * At the web-server, we don't customize apache much (ie: we pretty much use 
 the
 stock debian apache config). We use mod_wsgi (version 3.3-2) to serve the
 application in daemon mode, with the following parameters:

 WSGIDaemonProcess webapp user=appname threads=7 processes=4
 maximum-requests=1 python-path=/path/to/virtualenv/eggs

 * The web app is the only thing that is served from these instances and we 
 serve
 the static content for the using apache rather than the web app.

 * The zodb config on the db server looks like:
 <zeo>
  address 8886
  read-only false
  invalidation-queue-size 1000
  pid-filename $INSTANCE/var/ZEO.pid
  # monitor-address 8887
  # transaction-timeout SECONDS
 </zeo>

 <blobstorage 1>
  <filestorage>
    path $INSTANCE/var/webapp.db
  </filestorage>
  blob-dir $INSTANCE/var/blobs
 </blobstorage>

 * The zeo connection string (for repoze.zodbconn-0.11) is:

 zodb_uri = zeo://<zodb server ip>:8886/?blob_dir=/path/to/var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

 (Note: the drop_cache_rather_verify=true is for faster startups)

 Now with this, on live we have typical load such as:
 top - 13:34:54 up 1 day,  8:22,  2 users,  load average: 11.87, 8.75, 6.37
 Tasks:  85 total,   2 running,  82 sleeping,   0 stopped,   1 zombie
 Cpu(s): 81.1%us,  6.7%sy,  0.0%ni, 11.8%id,  0.0%wa,  0.0%hi,  0.1%si,  0.2%st
 Mem:  15736220k total,  7867340k used,  7868880k free,   283332k buffers
 Swap:        0k total,        0k used,        0k free,  1840876k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5079 appname   21   0 1587m 1.2g 6264 S   77  8.1   9:23.86 apache2
  5065 appname   20   0 1545m 1.2g 6272 S   95  7.9   9:31.24 apache2
  5144 appname   20   0 1480m 1.1g 6260 S   86  7.4   5:49.92 apache2
  5127 appname   20   0 1443m 1.1g 6264 S   94  7.2   7:13.10 apache2
 
 
 

 As you can see, the load avg. is very high, and the apache processes spawned for
 mod_wsgi (identifiable because of the user whose context they run under) 
 consume
 about 1.2Gs resident memory each.

 With a constant load like this, the app. response progressively degrades. 
 We've
 tried to tweak the number of processes, the cache_size in the zeo connection
 string but all to no avail. So, now rather than shoot in the dark, I would
 appreciate suggestions on how I might be able to isolate the bottle-neck in 
 the
 stack.

 One thing to note is that this high load and memory usage is only seen on the
 production instances. When we test the app. using ab or funkload on a similar
 setup (2 app instances instead of 4), we do not see this problem.

 Any pointers/comments would be appreciated.

(Following up only on zodb-dev as I'm not subscribed to the other lists.)

I'm guessing, but I suspect your load tests may only be reading from
the ZODB so you rarely see any cache misses.

The most important tuning parameters for ZODB with respect to memory
usage are the number of threads and the connection_cache_size. The
connection_cache_size controls the number of persistent objects kept
live in the interpreter at a time. It's a per-connection setting, and
as each thread needs its own connection, memory usage increases
proportionally to connection_cache_size * number of threads. Most
people use either one or two threads per process with the ZODB. I know
plone.recipe.zope2instance defaults to two threads per process, though
I think this is only to avoid locking up in the case of Plone being
configured to load an RSS feed from itself.
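
For example (illustrative numbers only, and assuming repoze.zodbconn
accepts connection_cache_size in the query string alongside the other
connection_* options), that would mean something like:

    WSGIDaemonProcess webapp user=appname threads=2 processes=4 \
        python-path=/path/to/virtualenv/eggs

    zodb_uri = zeo://<zodb server ip>:8886/?connection_cache_size=200000&connection_pool_size=4&cache_size=1024MB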

The Python Global Interpreter Lock prevents threads from running
concurrently, so with ZEO running so many threads per process is
likely to be counter-productive. Try with one or two threads and
perhaps up the connection_cache_size (though loading from the ZEO
cache is very quick, you must ensure your working set fits in the
connection cache or else you'll be loading the 

Re: [ZODB-Dev] zeo.memcache

2011-10-13 Thread Laurence Rowe
On 12 October 2011 23:53, Shane Hathaway sh...@hathawaymix.org wrote:
 As I see it, a cache of this type can take 2 basic approaches: it can
 either store {oid: (state, tid)}, or it can store {(oid, tid): (state,
 last_tid)}. The former approach is much simpler, but since memcache has
 no transaction guarantees whatsoever, it would lead to consistency
 errors. The latter approach makes it possible to avoid all consistency
 errors even with memcache, but it requires interesting algorithms to
 make efficient use of the cache. I chose the latter.

On first reading I had thought that the {oid: (state, tid)} approach
would not necessarily lead to consistency errors as a connection could
simply discard cached values where the cached state tid is later than
the current transaction's last tid. But I guess that it must be
impossible for a committing connection to guarantee that all cached
oids remain invalidated during a commit and are not refilled with a
previous state by another connection performing a read. This would
necessitate the same checkpointing algorithm to avoid consistency
errors.

I sometimes wonder if it would be better to separate the maintenance
of the oid_tid mapping from the storage of object states. A database
storing only the oid_tid mapping and enough previous tids to support
current transactions -- essentially the Data.fs.index -- would always
fit easily in RAM and could conceivably be replicated to every machine
in a cluster to ensure fast lookups. The storage / caching of object
states could then be very simple.

Laurence
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Corrupted OOTreeSet - strange behavior

2011-07-18 Thread Laurence Rowe
On 18 July 2011 11:07, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
 Hello,

 I have an OOTreeSet in my DB that is behaving a bit funny (seems to be
 corrupted). I thought I could get some more information by performing a
 sanity check, but that doesn't seem to help a lot:

 
 >>> c in s
 False
 >>> c in list(s)
 True
 >>> s._check()

 

 Shouldn't there be an error in this case?

TreeSets are essentially BTrees with only keys. This means that the
members of a TreeSet must have a stable ordering. I suspect that
c's class does not define the comparison methods (such as __lt__),
which means under Python 2 it falls back to the default ordering based
on object id (Python 3 will raise a TypeError instead, avoiding this
problem.) With ZODB an object's Python id (the memory address of the
object) will change whenever it is reloaded, i.e. across restarts,
after invalidation or removal from the cache.

A TreeSet is ordered, so the containment check only needs to perform
a lookup to see whether an object is a member of the TreeSet; as the
id of the object has changed, its expected position has changed and it
is not found. A list is not ordered, so it has to check against every
object in the list to test for containment.

The _check() method only confirms that the BTree/TreeSets's internal
data structure is consistent. It does not check every item. So it does
not show an error in this case.

You will need to add comparison methods for the class of the objects
you are storing in the TreeSet and then rebuild the TreeSets.
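
Something along these lines, where the class and the attribute used
for ordering are of course only placeholders:

    from persistent import Persistent
    from BTrees.OOBTree import OOTreeSet

    class Contribution(Persistent):      # hypothetical content class
        def __init__(self, uid):
            self.uid = uid               # stable value used for ordering

        def __eq__(self, other):
            return self.uid == other.uid

        def __lt__(self, other):
            return self.uid < other.uid

    # once the comparison methods are in place, rebuild the damaged set
    # so that members sit at the positions the new ordering expects
    s = OOTreeSet(list(s))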

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Corrupted OOTreeSet - strange behavior

2011-07-18 Thread Laurence Rowe
On 18 July 2011 13:08, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:
 TreeSets are essentially BTrees with only keys. This means that the
  members of a TreeSet must have a stable ordering. I suspect that
 c's class does not define the comparison methods (such as __lt__)
 which means under Python 2 it falls back to the default ordering based
 on object id (Python 3 will raise a TypeError instead, avoiding this
 problem.) With ZODB an object's Python id (the memory address of the
 object) will change whenever it is reloaded, i.e. across restarts,
 after invalidation or removal from the cache.

 Yes, I know that. But I have a __cmp__ function defined, based on an
 object property that never changes. That should be enough, no?

I think it should, but are you absolutely certain it never changes?
Does list(s) == sorted(list(s)) and does list(s) ==
list(OOTreeSet(s))?

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RFC: Blobs in S3

2011-07-07 Thread Laurence Rowe
On 6 July 2011 19:44, Jim Fulton j...@zope.com wrote:
 We're evaluating AWS for some of our applications and I'm thinking of adding
 some options to support using S3 to store Blobs:

 1. Allow a storage in a ZEO storage server to store Blobs in S3.
    This would probably be through some sort of abstraction to make
    this not actually depend on S3.  It would likely leverage the fact that
    a storage server's interaction with blobs is more limited than application
    code.

 2. Extend blob objects to provide an optional URL to fetch data
    from. This would allow applications to provide S3 (or similar service)
    URLs for blobs, rather than serving blob data themselves.


    2.1 If I did this I think I'd also add a blob size property, so you could
          get a blob's size without opening the blob file or downloading
          it from a database server.

 Option 3.  Handle blob URLs at the application level.

   To make this work for the S3 case, I think we'd have to use  a
   ZEO server connection to be called by application code.  Something like:

       self.blob = ZODB.blob.Blob()
       f = self.blob.open('w')
       f.write(some_data)


 Option 1 is fairly straightforward, and low risk.

 Option 2 is much trickier:

 - It's an API change
 - There are bits of implementation that depend on the
  current blob record format.  I'm not sure if these
  bits extend beyond the ZODB code base.
 - The handling of blob object state would be a little
   delicate, since some of the state would be set on the storage
   server.
 -  The win depends on being able to load a blob
    file independently of loading blob objects, although
    the ZEO blob cache implementation already depends
    on this.

Adding the ability to store blobs in S3 would be an excellent feature
for AWS based deployments. I'm not convinced that presenting S3 urls
to the end users is terribly useful as there is no ability to set a
Content-Disposition header and the url will not end with the correct
file extension, which will cause problems for users downloading files.

I would imagine a more common setup would be to serve the S3 stored
blobs through a proxy server running in EC2, using something similar
to Nginx's X-Accel-Redirect. Lovely Systems has some information on
generating an S3 Authorization header in Nginx here:
http://www.lovelysystems.com/nginx-as-an-amazon-s3-authentication-proxy-2/
- though generating an authenticated S3 URL in Python to set in the
X-Accel-Redirect header would lead to much simpler proxy
configuration.
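
For instance, with boto that could be as little as (a sketch; bucket
and key names are invented):

    from boto.s3.connection import S3Connection

    conn = S3Connection('AKIA...', 'secret')          # credentials elided
    # signed URL valid for 5 minutes, to hand back via X-Accel-Redirect
    url = conn.generate_url(300, 'GET',
                            bucket='my-blobs', key='0x1234/0x5678.blob')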

In either case though, I don't see why doing so would necessitate
changing the blob record format - presumably a blob's url can be
simply mapped from the S3 blobstorage configuration and a blob's oid
and tid?

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RFC: Blobs in S3

2011-07-07 Thread Laurence Rowe
On 7 July 2011 16:55, Jim Fulton j...@zope.com wrote:
 On Thu, Jul 7, 2011 at 10:49 AM, Laurence Rowe l...@lrowe.co.uk wrote:
 ...
 One thing I found with my (rather naive) experiments building
 s3storage a few years ago is that you need to ensure requests to S3
 are made in parallel to get reasonable performance. This would be a
 lesser problem with blobs, but even then you might have multiple file
 uploads in the same request. The boto library is really useful, but
 doesn't support async requests.

 Right, it occurred to me that commit performance with s3 might be an issue.

 I guess the simplest implementation would only upload a blob to S3 in
 tpc_begin as that is where the tid is set (and presumably the tid will
 form part of the blob's S3 url.) With large files that might make
 tpc_begin take a long time to complete as it waits for the blob data
 to be loaded into S3. It might be better to upload large blobs to a
 temporary s3 url first and then only make an S3 copy in tpc_begin,
 you'd need to do some benchmarks to see if this was worthwhile for all
 files or only files over a certain size.

 I think I get where you're going, although I'd quibble with the details.
 There is certainly some opportunity for doing things in parallel
 up until you get to tpc_vote. I wonder if renames in S3 take much
 time. I can image that they do.

Thinking about this again, perhaps it would be better to store a url
or uuid in the blob's record. This would allow a blob's S3 url to be
assigned much earlier as it need not contain the tid. The commit would
not then need to involve any requests to S3 at all. While I don't
suppose an S3 copy request should be any slower than a zero byte PUT
(S3 only promises eventual consistency), you still need to pay the
latency.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Immutable blobs?

2011-05-09 Thread Laurence Rowe
On 9 May 2011 13:32, Hanno Schlichting ha...@hannosch.eu wrote:
 On Mon, May 9, 2011 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 While looking at the Plone versioning code the other day, it struck me
 that it would be much more efficient to implement file versioning if
 we could rely on blobs never changing after their first commit, as a
 copy of the file data would not need to be made proactively in the
 versioning repository incase the blob was changed in a future
 transaction.

 Subclassing of blobs is not supported, but looking at the code I
 didn't see anything that actively prevented this other than the
 Blob.__init__ itself. Is there something I've missed here? I had
 thought that an ImmutableBlob could be implemented by overriding the
 open and consumeFile methods of Blob to prevent modification after
 first commit.

 I thought blobs are always immutable by design?

Blobs can be opened writable in subsequent transactions with
blob.open('w'). This leads to the blob storage creating a new file
when the transaction is committed - the naming scheme is basically
oid/tid.blob.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] How to check for setting the same values on persistent objects?

2011-05-04 Thread Laurence Rowe
On 4 May 2011 10:53, Hanno Schlichting ha...@hannosch.eu wrote:
 Hi.

 I tried to analyze the overhead of changing content in Plone a bit. It
 turns out we write back a lot of persistent objects to the database,
 even tough the actual values of these objects haven't changed.

 Digging deeper I tried to understand what happens here:

 1. persistent.__setattr__ will always set _p_changed to True and thus
 cause the object to be written back
 2. Some BTree buckets define the VALUE_SAME macro. If the macro is
 available and the new value is the same as the old, the change is
 ignored
 3. The VALUE_SAME macro is only defined for the int, long and float
 value variants but not the object based ones
 4. All code in Products.ZCatalog does explicit comparisons of the old
 and new value and ignores non-value-changes. I haven't seen any other
 code doing this.

 I'm assuming doing a general check for old == new is not safe, as it
 might not be implemented correctly for all objects and doing the
 comparison might be expensive.

 But I'm still curious if we could do something about this. Some ideas:

 1. Encourage everyone to do the old == new check in all application
 code before setting attributes on persistent objects.

 Pros: This works today, you know what type of values you are dealing
 with and can be certain when to apply this, you might be able to avoid
 some computation if you store multiple values based on the same input
 data
 Cons: It clutters all code

 2. Create new persistent base classes which do the checking in their
 __setattr__ methods

 Pros: A lot less cluttering in the application code
 Cons: All applications would need to use the new base classes.
 Developers might not understand the difference between the variants
 and use the checking versions, even though they store data which
 isn't cheap to compare

 2.a. Create new base classes and do type checking for built-in types

 Pros: Safer to use than always doing value comparisons
 Cons: Still separate base classes and overhead of doing type checks

 3. Compare object state at the level of the pickled binary data

 This would need to work at the level of the ZODB connection. When
 doing savepoints or commits, the registered objects flagged as
 _p_changed would be checked before being added to the modified list.
 In order to do this, we need to get the old value of the object,
 either by loading it again from the database or by keeping a cache of
 the non-modified state of all objects. The latter could be done in
 persistent.__setattr__, where we add the pristine state of an object
 into a separate cache before doing any changes to it. This probably
 should be a cache with an upper limit, so we avoid running out of
 memory for connections that change a lot of objects. The cache would
 only need to hold the binary data and not unpickle it.

 Pros: On the level of the binary data, the comparisons is rather cheap
 and safe to do
 Cons: We either add more database reads or complex change tracking,
 the change tracking would require more memory for keeping a copy of
 the pristine object. Interactions with ghosted objects and the new
 cache could be fragile.

 4. Compare the binary data on the server side

 Pros: We can get to the old state rather quickly and only need to deal
 with binary string data
 Cons: We make all write operations slower, by adding additional read
 overhead. Especially those which really do change data. This won't
  work on RelStorage. We only save disk space and cache invalidations,
  but still do the bulk of the work and send data over the network.


 I probably missed some approaches here. None of the approaches feels
 like a good solution to me. Doing it server side (4) is a bad idea in
 my book. Option 3 seems to be the most transparent and safe version,
 but is also the most complicated to write with all interactions to
 other caches. It's also not clear what additional responsibilities
 this would introduce for subclasses of persistent which overwrite
 various hooks.

 Maybe option one is the easiest here, but it would need some
 documentation about this being a best practice. Until now I didn't
 realize the implications of setting attributes to unchanged values.

Persistent objects are also used as a cache and in that case code
relies on an object being invalidated to ensure its _v_ attributes are
cleared. Comparing at the pickle level would break these caches.

I suspect that this is only really a problem for the catalogue.
Content objects will always change on the pickle level when they are
invalidated as they will have their modification date updated. I
imagine you also see archetypes doing bad things as it tends to store
one persistent object per field, but that is just bad practice.

It would be interesting to see the performance impact of adding
newvalue != oldvalue checks on the catalogue data structures. This
would also prevent the unindex logic being called unnecessarily.
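
A minimal sketch of such a guard (names are hypothetical):

    _marker = object()

    def set_if_changed(obj, name, value):
        # only touch the persistent object when the value really differs,
        # so _p_changed is not set and no new record gets written
        if getattr(obj, name, _marker) != value:
            setattr(obj, name, value)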

I don't think that the dobbin requirement 

Re: [ZODB-Dev] transaction as context manager, exception during commit

2011-02-24 Thread Laurence Rowe
On 24 February 2011 10:17, Chris Withers ch...@simplistix.co.uk wrote:
 Hi Jim,

 The current __exit__ for transaction managers looks like this:

     def __exit__(self, t, v, tb):
         if v is None:
             self.commit()
         else:
             self.abort()

 ..which means that if you're using the transaction package as a context
 manager and, say, a relational database integrity constraint is
 violated, then you're left with a hosed transaction that still needs
 aborting.

 How would you feel about the above changing to:

     def __exit__(self, t, v, tb):
         if v is None:
             try:
                 self.commit()
             except:
                 self.abort()
                 raise
         else:
             self.abort()

 If this is okay, I'll be happy to write the tests and make the changes
 provided someone does a release when I have...

Looking at the way ZPublisher handles this, I think you're right. I
think you might also need to modify the __exit__ in Attempt, which
additionally handles retrying transactions that fail.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage pack with history-free storage results in POSKeyErrors

2011-01-26 Thread Laurence Rowe
On 26 January 2011 21:57, Jürgen Herrmann juergen.herrm...@xlhost.de wrote:
  is there a script or some example code to search for cross db
  references?
  i'm also eager to find out... for now i disabled my packing cronjobs.

Packing with garbage collection disabled (pack-gc = false) should
definitely be safe.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage pack with history-free storage results in POSKeyErrors

2011-01-26 Thread Laurence Rowe
On 26 January 2011 23:11, Chris Withers ch...@simplistix.co.uk wrote:
 On 26/01/2011 22:49, Laurence Rowe wrote:

 On 26 January 2011 21:57, Jürgen Herrmann juergen.herrm...@xlhost.de
  wrote:

  is there a script or some example code to search for cross db
  references?
  i'm also eager to find out... for now i disabled my packing cronjobs.

 Packing with garbage collection disabled (pack-gc = false) should
 definitely be safe.

 Am I right in thinking this is pointless if you're using a history-free
 storage?

Yes.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage and PosKey errors - is this a risky hotfix?

2011-01-24 Thread Laurence Rowe
On 24 January 2011 21:28, Shane Hathaway sh...@hathawaymix.org wrote:
 On 01/24/2011 02:02 PM, Anton Stonor wrote:
 Hi there,

 We have recently experienced a couple of PosKey errors with a Plone 4
 site running RelStorage 1.4.1 and Mysql 5.1.

 After digging down we found that the objects that were throwing
 PosKeyErrors  actually existed in the object_state table with pickles
 etc, however not in the current_object table.

 After inserting the missing pointers into the current_object  table,
 everything worked fine:

    mysql SELECT zoid, tid FROM object_state WHERE zoid=561701;

    +++
    | zoid   | tid                |
    +++
    | 561701 | 255267099158685832 |
    +++

    mysql INSERT INTO current_object(zoid, tid) VALUES('561701',
 '255267099158685832');

 Looks like it works -- but is this a safe way to fix PosKeyErrors?

 Now, I wonder why these pointers were deleted from the current_object
 table in the first place. My money is on packing -- and it might fit
 with the fact that we recently ran a pack that removed an unusual large
 amount of transactions in a single pack (100.000+ transactions).

 But I don't know how to investigate the root cause further. Ideas?

 This suggests MySQL not only lost some data (due to a MySQL bug or a
 filesystem-level error), but it failed to enforce a foreign key that is
 supposed to ensure this never happens.  I think you need to check the
 integrity of your filesystem (e2fsck -f) and database (mysqlcheck -c).
 You might also reconsider the choice to use MySQL.

Must this imply a failure to maintain a foreign key constraint? While
there are FK constraints from current_object (zoid, tid) -> object_state
(zoid, tid), there is no foreign key that might prevent a
current_object row from being incorrectly deleted.

I think that means the possibilities are:

1. The current_object table was not updated properly during a commit
or corrupted so that some rows were lost.

2. Something goes wrong during pack gc (either in the pack logic or on
the database).

3. Database corruption.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage recommended maintenance

2011-01-21 Thread Laurence Rowe
On 21 January 2011 20:57, Shane Hathaway sh...@hathawaymix.org wrote:
 On 01/21/2011 10:46 AM, Chris Withers wrote:
 I'm wondering what the recommended maintenance for these two types of
 storage are that I use:

 - keep-history=true, never want to lose any revisions

 My guess is zodbpack with pack-gc as true, but what do I specify for the
 number of days in order to keep all history?

 Is 100 years enough?  365.24 * 100 = 36524 ;-)

Why would you pack a database from which you don't want to lose any revisions?

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Plone in P2P using Zope over DHT

2011-01-04 Thread Laurence Rowe
I'm not very optimistic about this I'm afraid. First the problems with
using Plone:

 * Plone relies heavily on its in-ZODB indexes of all content
(portal_catalog). This means that every edit will change lots of
objects (without versioning ~15-20, most of which are in the
catalogue).

 * At least with archetypes a content object's data is spread over
multiple objects. (This should be better with Dexterity, though you
will still have multiple objects for locking and workflow)

 * If you use versioning you'll see ~ 100 objects changed in an edit.

 * Even loading the front-page will take a long time - In my
experiments writing an amazon s3 backend for ZODB the extra latency of
fetching each object was really noticeable.

But I'm not sure even a simpler ZODB CMS would be a good fit for a p2p DHT:

 * ZODB is transactional using two phase commit. With p2p latencies,
these commits will be horribly slow - all clients storing changed
objects would need to participate in the transaction.

 * Each client's object cache will need to know about invalidations, I
don't see any way of supplying these from a DHT.

I expect you'd have more success storing content items as single
content objects / pages in the DHT and then generating indexes based
on that. You'll need some way of storing parent-child relationships
between the content objects too, as updating a single list of child
objects will be incredibly difficult to get right in a distributed
system.

Laurence


On 4 January 2011 11:40, Aran Dunkley a...@organicdesign.co.nz wrote:
 Thanks for the feedback Vincent :-) it sounds like NEO is pretty close
 to being SQL-free. As one of the NEO team, what are your thoughts on the
 practicality of running Plone in a P2P environment with the latencies
 experienced in standard DHT (such as for example those based on
 Kademlia) implemtations?

 On 04/01/11 22:27, Vincent Pelletier wrote:
 Hi.

 Le mardi 4 janvier 2011 07:18:34, Aran Dunkley a écrit :
 The problem is that it uses SQL for its indexing queries (they quote
 NoSQL as meaning Not only SQL). SQL cannot work in P2P space, but
 can be made to work on server-clusters.

 Yes, we use MySQL, and it bites us on both worlds actually:
  - in relational world, we irritate developers as we ask questions like "why
    does InnoDB load a whole row when we just select primary key columns", which
    ends up with "don't store blobs in mysql"
 - in key-value world, because NoSQL using MySQL doesn't look consistent

 So, why do we use MySQL in NEO ?
 We use InnoDB as an efficient BTree implementation, which handles 
 persistence.
 We use MySQL as a handy data definition language (NEO is still evolving, we
 need an easy way to tweak table structure when a new feature requires it), 
 but
 we don't need any transactional isolation (each MySQL process used for NEO is
 accessed by only one process through one connection).
  We want to stop using MySQL & InnoDB in favour of leaner-and-meaner 
 back-ends.
 I would especially like to try kyoto cabinet[1] in on-disk BTree mode, but it
 requires more work than the existing MySQL adaptor and there are more urgent
 tasks in NEO.

 Just as a proof-of-concept, NEO can use a Python BTree implementation as an
 alternative (RAM-only) storage back-end. We use ZODB's BTree implementation,
 which might look surprising as it's designed to be stored in a ZODB... But
 they work just as well in-RAM, and that's all I needed for such proof-of-
 concept.

 [1] http://fallabs.com/kyotocabinet/

 Regards,

 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage support to Microsoft SQLServer

2010-11-17 Thread Laurence Rowe
On 17 November 2010 16:34, Alan Runyan runy...@gmail.com wrote:
 I have read that there is a problem to implement MS-SQL adapter for
 Relstorage because the “Two phase commit” feature is not exposed by
 MS-SQL server .

 unsure about that. probably depends on the client access library.

At least when I looked at pyodbc/FreeTDS in 2008 FreeTDS did not have
support for the tds packets necessary for joining an XA transaction.
(FreeTDS is the ODBC to SQL Server driver used on unix.) See:
http://article.gmane.org/gmane.comp.db.tds.freetds/9598. I did have
some more information on a SQLAlchemy wiki page but that seems to have
gone now.

However, two phase commit may not be necessary with the current
version of RelStorage - it's not used with PostgreSQL anymore.

 Is there solution to overcome this problem, Without introducing too many
 layers?
 Can we use PyMSSQL and ADODB Python extension to implement the
 relstorage Adapter for MS-SQL.

 i recently had a discussion with some guys about this. i am unsure what
 their analysis was.  but my opinion:
  - adodbapi is not good.
  - pymssql i've not used
  - pyodbc we used but it doesnt support storedprocs. works ok.
  - mxodbc we use and highly recommend.

 yes mxodbc costs money but you have support.  i spoke with shane
 about this in the past about which library would he probably use if
 he were to support mssqlserver and his unresearched/not definitive
 answer was mxodbc.  mainly because its supported and has been
 in production usage for almost a decade.

I've used stored procedures with pyodbc:
http://code.google.com/p/pyodbc/wiki/StoredProcedures
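
The ODBC call escape syntax goes through a normal execute, something
like this (procedure name and parameter are invented):

    import pyodbc

    conn = pyodbc.connect('DSN=mssql;UID=user;PWD=secret')
    cursor = conn.cursor()
    cursor.execute("{call usp_load_object (?)}", 561701)
    row = cursor.fetchone()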

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] RelStorage support to Microsoft SQLServer

2010-11-17 Thread Laurence Rowe
On 17 November 2010 17:05, Laurence Rowe l...@lrowe.co.uk wrote:
 On 17 November 2010 16:34, Alan Runyan runy...@gmail.com wrote:
 I have read that there is a problem to implement MS-SQL adapter for
 Relstorage because the “Two phase commit” feature is not exposed by
 MS-SQL server .

 unsure about that. probably depends on the client access library.

 At least when I looked at pyodbc/FreeTDS in 2008 FreeTDS did not have
 support for the tds packets necessary for joining an XA transaction.
 (FreeTDS is the ODBC to SQL Server driver used on unix.) See:
 http://article.gmane.org/gmane.comp.db.tds.freetds/9598. I did have
 some more information on a SQLAlchemy wiki page but that seems to have
 gone now.

Found that here: http://www.sqlalchemy.org/trac/wiki/MSSQLTwoPhaseCommit

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] zodb monitor port / tailing a .fs

2010-10-13 Thread Laurence Rowe
On 14 October 2010 01:28, Darryl Dixon - Winterhouse Consulting
darryl.di...@winterhouseconsulting.com wrote:
 On 13/10/2010 15:23, Jim Fulton wrote:
 You can connect to the monitor port in 3.9 and earlier,
 if the monitor port is configured.  In 3.10, the monitor server is
 replaced by a ZEO client method, server_status. This tells you
 much the same information that's in the log messages.

 Okay, monitor port up and running now.
 I see commits listed when I'm not expecting any.

 Do we have any kind of tail -f /some/filestorage.fs yet? (or have we
 always had such a tool) to see what the last few transactions in the
 underlying file storage look like in a human-readable form?


 fsdump.py gets you pretty close (ZODB/scripts/fsdump.py). Between that and
 the Undo log for the DB inside Zope, you might be able to figure it out...

There's also fstail:

$ bin/zopepy -m ZODB.scripts.fstail var/filestorage/Data.fs
2010-09-03 22:15:17.658204: hash=2e11770947c4c9af50cfec0183c38b460507cad6
user=' admin' description='/Plone/login_failed' length=1126 offset=8229031

2010-08-21 23:28:12.580279: hash=c1e7af2df41b6506db65681bc7f2f58587cb8b8b
user=' admin' description='/Plone/front-page/plone_lock_operations/safe_unlock'
length=279 offset=8228776

2010-08-21 23:28:03.903884: hash=9ca763b978c804c87d920945d4c0b5470bb3aad4
user=' admin' description='/Plone/atct_edit' length=919 offset=8227814

2010-08-21 23:28:01.835501: hash=dec67acaa2685822d68d586ef83eff13e12d3e78
user=' admin' description='/Plone/front-page/plone_lock_operations/safe_unlock'
length=279 offset=8227562
...

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] read-only database

2010-09-27 Thread Laurence Rowe
On 27 September 2010 18:26, Nathan Van Gheem vangh...@gmail.com wrote:
 BTW, I thought I could just use the ZPublisherEventsBackup to abort
 every transaction when zope is in read-only... Kind of hacky, but not
 too bad :)

That sounds really evil, but I guess it should work...

plone.app.imaging / plone.scale create scales on demand, caching as an
annotation. You could define a different storage method by overriding
the plone.app.imaging.scaling.ImageScaling view.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] db.undoLog() does not work as documented ..

2010-08-23 Thread Laurence Rowe
On 23 August 2010 17:51, Jim Fulton j...@zope.com wrote:
 It's worth noting that these are not the docs.  I didn't write or
 review them.  I don't have any control over zodb.org. I have no idea
 how to comment on the docs. (I could possibly find out, but I don't have time
 to work that hard.)
...
 This is problematic.  I didn't write the docs and the docs
 are not part of the software.  I can't do anything about this
 I don't know if anyone who can deal with bugs has any control
 over zodb.org.

That website is created from svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk

Those docs used to live in ZODB. I converted them to rst from latex to
make them easier to edit. They've mostly not been updated since 2002
though.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] db.undoLog() does not work as documented ..

2010-08-23 Thread Laurence Rowe
On 23 August 2010 19:08, Jim Fulton j...@zope.com wrote:
 On Mon, Aug 23, 2010 at 1:08 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 On 23 August 2010 17:51, Jim Fulton j...@zope.com wrote:
 It's worth noting that these are not the docs.  I didn't write or
 review them.  I don't have any control over zodb.org. I have no idea
 how to comment on the docs. (I could possibly find out, but I don't have
 time to work that hard.)
 ...
 This is problematic.  I didn't write the docs and the docs
 are not part of the software.  I can't do anything about this
 I don't know if anyone who can deal with bugs has any control
 over zodb.org.

 That website is created from svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk

 Cool.

 Those docs used to live in ZODB. I converted them to rst from latex to
 make them easier to edit.

 I appreciate the good intention. :) Honestly.

 They've mostly not been updated since 2002
 though.

 :/ Sigh.

 I'll have to think about what the next step is then.  This will probably
 involve deleting lots of wrong content.

 Do you maintain zodb.org then?  What's the process for updating it?

I created the zodbdocs following the examples of zope2docs and
zope3docs for docs.zope.org. Jens Vagelpohl added them to the cron job
that updates docs.zope.org and is the person responsible for that site
as far as I know. At some point they were moved to zodb.org, which
runs on the same box.

As for updating it, just check in changes to svn. The cron job picks
up changes and rebuilds the sphinx docs every hour (or perhaps every
day).

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Weird KeyError with OOBTree

2010-08-16 Thread Laurence Rowe
On 16 August 2010 13:13, Tres Seaver tsea...@palladion.com wrote:
 Hanno Schlichting wrote:
 On Mon, Aug 16, 2010 at 12:14 PM, Pedro Ferreira
 jose.pedro.ferre...@cern.ch wrote:
 Could this be some problem with using persistent objects as keys in a BTree?
 Some comparison problem?

 I'm not entirely sure about this, but I think using persistent objects
 as keys isn't supported. Looking at the code, I doubt using anything
 except simple types like unicode strings or tuples of simple types
 will work without further work.

From what I can see in the code, BTrees use functions like
 PyObject_Compare to compare different keys. Persistent doesn't
 implement any special compare function and falls back to the standard
 hash algorithm for an object. This happens to be its memory address.
 The memory address obviously changes over time and the same address
 gets reused for different objects.

 I think implementing a stable hash function for your type could make
 this work though.

 The ZODB gods correct me please :)

 Btrees require comparability, rather than hashability:  your
 persistent type needs to define a total ordering[1], which typically
 means defining '__cmp__' for your class.  You could also define just
 '__eq__' and '__lt__', but '__cmp__' is slightly more efficient.


 [1]http://www.zodb.org/documentation/guide/modules.html#total-ordering-and-persistence

While ZODB 3.8 makes it possible to use Persistent objects as keys in
a BTree, it's almost certainly a bad idea as a lookup will incur many
more object loads while traversing the BTree as the Persistent keys
will have to be loaded before they can be compared. Consider using one
of these alternatives instead:

* Set the IOTreeSet as an attribute directly on the persistent object.
* Use http://pypi.python.org/pypi/zope.intid and use the intid for the
key. (This uses http://pypi.python.org/pypi/zope.keyreference which
uses the referenced object's oid and database name to perform the
comparison, avoiding the need to load the persistent object.)

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Weird KeyError with OOBTree

2010-08-16 Thread Laurence Rowe
On 16 August 2010 17:29, Pedro Ferreira jose.pedro.ferre...@cern.ch wrote:


 Consider using one
 of these alternatives instead:

 * Set the IOTreeSet as an attribute directly on the persistent object.


 You mean on the persistent object I am using as key?

Yes.

 * Use http://pypi.python.org/pypi/zope.intid and use the intid for the
 key. (This uses http://pypi.python.org/pypi/zope.keyreference which
 uses the referenced object's oid and database name to perform the
 comparison, avoiding the need to load the persistent object.)


 This looks really nice. However it seems to depend on a lot of zope
 libraries that I'm currently including: location, component, security...
 well, I guess they're not that large. I will give it a look, maybe I'll use
 it.

I guess you could avoid the dependencies by using
(obj._p_jar.db().database_name, obj._p_oid) as the key.
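
For example, something along these lines (assuming the object has
already been added to a connection, so _p_jar and _p_oid are set; the
helper name is only illustrative):

from BTrees.OOBTree import OOBTree

def stable_key(obj):
    # (database name, oid) is stable and comparable without loading
    # the referenced object itself
    return (obj._p_jar.db().database_name, obj._p_oid)

index = OOBTree()
# usage (illustrative): index[stable_key(keyobj)] = treeset_of_values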

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] SpatialIndex

2010-06-28 Thread Laurence Rowe
On 28 June 2010 15:23, Nitro ni...@dr-code.org wrote:
 Am 28.06.2010, 14:10 Uhr, schrieb Dylan Jay d...@pretaweb.com:

 I don't use a lot of other indexes other than what comes with plone but
 I can see the value of what your suggesting in having an installable
 tested collection of indexes. I can also see that this is a really big
 itch for you and you've already identified a bunch of candidates to
 include. So why not go the next step and create a package that's nothing
 more than requires specifications and release it with versions
 corresponding to zodb releases. Then others may find it useful and help
 you support it. And if it's really popular it may even get taken into
 consideration with zodb packaging. Who knows?

 Thanks for your feedback, Dylan.

 My main problem is not the lack of an index collection. It's one of the
 problems I faced, but not the main one. Indexing is just a small part of
 working with a database.

 The main problem (imo) is that there are already 50 zodb related packages
 on pypi and none of them gathered a lot of people working on them. I don't
 see why this should be any different if I publish yet another package.
 Especially if most people use plone and the built-in indices. Just look at
 what happened to ZCatalog Standalone.

 Here's a little metaphor for what I'm trying to say:

 Once upon a time there was a man who wanted to go to the bakery to buy a
 bread. He thought I'll be done with this quickly, after all many people
 want to buy a bread. So he went off to visit the ZBakery.
 When asking for a bread, the people in the ZBakery told him there's no
 need to sell whole breads. They said: See, we have all the ingredients
 here so you can make a bread suiting your own taste. Look, there's ZFlour,
 ZMilk and ZSalt. And if you rummage the corners of this bakery, you'll
 also might find ZFlour2, CustomFlour and MyOwnCoolFlour. We don't know if
 they are any good, because each flour is used by just one or two people..
 The man thought about it for a while and went off to try the different
 flours. When he wanted to try the CustomFlour it did not work. It turned
 out this was because CustomFlour relied on 3rdPartyMill and 3rdPartyMill
 had a problem, so CustomFlour was broken. The man shook his head after he
 realized a few dozen people already tried to get CustomFlour and nobody
 pointed out the problem to its producer. Finding out about all of this
 took the whole morning and so he finally made lunch break.
 After his lunch break was over he finally found a flour suiting his bread
 he went off to look at the different milks and salts. He experienced
 similar problems there. One of the milks had just a label milk on it,
 the other areas of the packaging were blank. The man had no idea if the
 milk in question would work for his bread or not. So he had to analyze the
 contents of the milk to see if it might be useful. It turned out the milk
 was mislabeled and not a milk.
 As the sun was already touching the horizon and the air was getting cold
 the man ignored the milk for the time being and went looking for salt. He
 did not have to search for long and was delighted to find a single salt
 which would just work.
 When the man looked out of a window of the ZBakery he saw it was already
 dark and went home. When lying in bed he thought to himself: All I wanted
 to have this morning was a bread. Now I'm about to fall asleep and still
 don't have one. The bakery even had all the ingredients! But why did they
 make me try and analyze each ingredient? I even would've taken a bread
 which tasted a bit worse than the bread which I now have to bake on my
 own. The other customers of ZBakery surely also want breads, rolls and
 cake. Aren't they interested in creating a standard package of breads,
 rolls and cake? If they'd work together on a single bread, they'd all
 benefit from the improved recipes. New customers would immediately notice
 that there's a good default bread which many people like. These customers
 might point their colleagues at ZBakery, because it sells tasty,
 ready-to-use breads. If there was a special customer he could still bake
 his own bread using the individual ingredients.
 Pondering all these things he slid into a deep sleep. When he woke up
 the next morning he found a handful of committed people who had gathered
 in the ZBakery to bake and sell their first bread together...

There are some valid criticisms in here. One problem with PyPI is that
there is no way to clearly mark a package as having been superseded,
as zc.relationship was by zc.relation.

So why don't we all work on the same packages? The main reason is one
of legacy. Plone is built on Zope2 and ZCatalog. It works, but it is
not without its issues - we can't have queries that join from that
catalog to a zc.relation catalog. Standalone ZCatalog failed because
it came too early - Zope2 was only recently eggified, so to be
successful the standalone ZCatalog would need to be used in Zope2.
Nobody has bothered with this because non-legacy code shouldn't be
using ZCatalog anyway - there are newer and better ways of doing it.

Re: [ZODB-Dev] SpatialIndex

2010-06-28 Thread Laurence Rowe
On 28 June 2010 19:31, Nitro ni...@dr-code.org wrote:
 Am 28.06.2010, 16:52 Uhr, schrieb Laurence Rowe l...@lrowe.co.uk:

 So why don't we all work on the same packages? The main reason is one
 of legacy. Plone is built on Zope2 and ZCatalog. It works, but it is
 not without it's issues - we can't have queries that join from that
 catalog to a zc.relation catalog. Standalone ZCatalog failed because
 it came to early - Zope2 was only recently eggified, so to be
 successful the standalone ZCatalog would need to be used in Zope2.
 Nobody has bothered with this because non-legacy code shouldn't be
 using ZCatalog anyway - there are newer and better ways of doing it.

 Oh, nice to know. I was already writing test cases for standalone ZCatalog
 integration in my project as all other indices seemed tied to plone :)

In general, if it's not on PyPI it doesn't exist as far as the Zope
world is concerned. (I can't find any references to standalone
ZCatalog after 2005.)

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] SpatialIndex

2010-06-28 Thread Laurence Rowe
On 28 June 2010 21:27, Nitro ni...@dr-code.org wrote:
 ZODB is a general python object database with a much wider audience than
 just plone. It suits desktop applications just as well as applications
 you'd normally use twisted and pickle for. Forcing all those zope
 dependencies like buildout on people does not add to the attractiveness of
 ZODB for users outside zope. Having indices only in plone does also not
 make sense. Many applications would benefit from keyword, field,
 full-text, spatial, younameit indices. Yet extracting individual packages
  from zope/plone is impossible due to the slew of dependencies. While I can
 accept a dependency like zope.interface I don't accept a lot of the
 others.  It really prevents ZODB from living up to its full potential in
 non-plone applications.

Remember that Plone is an eight year old application that is built on
top of a 12 year old Application server. There has been much progress
since then (and plenty of people who build non-Plone ZODB based
applications), but the size of the codebase means it is not possible
to always be using the current best practice.
http://zope2.zope.org/about-zope-2/the-history-of-zope

Nobody would recommend that you try to extract stuff from Plone or
Zope2. In my opinion there are two main sources of packages for
non-Zope2 dependent applications.

* The ZTK extracted the core of Zope 3 and is used in application
servers such as Grok and BlueBream. It contains zope.catalog and its
related packages. There are several extensions on top of this such as
zc.catalog and hurry.query. The 1.0 release is not out yet,
but the underlying packages are stable. Installing zope.catalog requires a
total of 34 packages (ZODB3 requires 10)
http://docs.zope.org/zopetoolkit/releases/packages-trunk.html

* The Repoze project has focussed on making zope technologies more
easily accessible to applications outside of Zope. Whilst the ZTK
project has improved things a lot, it is still a relatively large
chunk to swallow whole. repoze.catalog is extracted from zope.catalog
and requires only zope.index in addition to ZODB3.
http://docs.repoze.org/catalog/
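
As a rough sketch of what repoze.catalog usage looks like (based on my
reading of its documentation; the index and attribute names here are
made up):

from repoze.catalog.catalog import Catalog
from repoze.catalog.indexes.field import CatalogFieldIndex

def get_flavor(obj, default):
    return getattr(obj, 'flavor', default)

class Fruit(object):
    def __init__(self, flavor):
        self.flavor = flavor

catalog = Catalog()
catalog['flavor'] = CatalogFieldIndex(get_flavor)
catalog.index_doc(1, Fruit('sweet'))

# field index queries take a (min, max) range
numdocs, docids = catalog.search(flavor=('sweet', 'sweet'))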

At the very lowest level are the indexes themselves such as zope.index
and zc.relation, a spatial index would fit in here too.

(Health warning: I'm mostly a Plone developer, so do not yet have
experience using these packages)

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Spatial indices

2010-06-16 Thread Laurence Rowe
It really depends on what you are trying to achieve.

The simplest solution would probably be to use a geohash string within
an OOBTree.

If you need a full geospatial solution, postgis is featureful and easy
to use, and simple to integrate transactionally with ZODB.

Reinventing the wheel is rarely the right option, though it might be more fun ;)
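
To make the geohash suggestion concrete, a rough sketch (it assumes an
encode(lat, lon, precision) helper such as the one in the python-geohash
package; that particular library is an assumption, any geohash encoder
would do):

import geohash  # assumed: provides geohash.encode(lat, lon, precision)
from BTrees.OOBTree import OOBTree, OOTreeSet

index = OOBTree()

def index_point(lat, lon, docid, precision=8):
    key = geohash.encode(lat, lon, precision)
    docids = index.get(key)
    if docids is None:
        docids = index[key] = OOTreeSet()
    docids.insert(docid)

def query_cell(prefix):
    # keys sharing a geohash prefix sort together, so a prefix scan is
    # just a range search over the BTree
    results = OOTreeSet()
    for key in index.keys(min=prefix, max=prefix + '\xff'):
        results.update(index[key])
    return results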

Laurence

On 16 June 2010 16:45, Nitro ni...@dr-code.org wrote:
 Hello,

 I tried to find a spatial index which integrates seamlessly into the ZODB.
 Unfortunately I did not find a satisfying solution anywhere. So I came up
 with three solutions how this could be implemented:

 1) Write a native r-tree package, just like the current BTrees. Would
 likely have to be written in C for performance.

 2) Make use of the existing B+ Trees by using a space filling curve such
 as the Z-curve or Hilbert curve to transform higher-dimensional data into
 1D data which can then be stored in a BTree. Since B+ trees also provide
 range querying capabilities this should give good query performance.
 Unsure how much speed-up a C implementation of the insert/query functions
 would give. More info:
 http://www.scholarpedia.org/article/B-tree_and_UB-tree and
 http://www.drdobbs.com/184410998 .

 3) Use the already existing Rtree package from
 http://pypi.python.org/pypi/Rtree . It's a thin wrapper of a C library, so
 it should be very fast. I can see two methods to make this work:

 3a)
 - Create an rtree.RTree (stored in a separate file) and an OOTreeSet.
 - inserting: insert item into BTree. Then insert item's oid into Rtree.
 - querying: user supplies bounding box, rtree is queried, oids are
 returned. look up objects by oid in BTree.
 - zeo: does not work out-of-the-box with zeo since the Rtrees on different
 machines are not synchronized.

 3b)
 - Create an rtree.RTree, a OOTreeSet and an IOTree. Difference to 3a):
 Create RTree with a custom storage manager (example:
 http://svn.gispython.org/svn/spatialindex/spatialindex/trunk/src/storagemanager/MemoryStorageManager.h
 and
 http://trac.gispython.org/spatialindex/browser/spatialindex/trunk/README#storage-manager
 ). This storage manager stores each page into the IOTree (key: pageId,
 value: pageData).
 - inserting: insert item into BTree. Then insert item's oid into Rtree.
 Causes storage manager to write out changed rtree pages to IOTree.
 - querying: user supplies bounding box, rtree is queried, pages for rtree
 returned from IOTree, oids finally returned from query. look up objects by
 oid in BTree.
 - zeo: works out-of-the-box with zeo since the rtree pulls its data from a
 btree (which is hooked up with zeo).

 Conclusion:

 1) Native r-tree package: It is a lot of work which has already been done
 before. Bug-prone. Ruled out.
 2) Spatial index on top of current BTrees: Looks interesting, could be
 done in python. Disadvantages: unclear UB tree patent situation, unclear
 how much work this really is.
 3a) Does not work with zeo out-of-the-box. Ruled out.
 3b) Requires writing a custom storage manager for the rtree package
 (likely in C). Provides different trees. Basic technology (rtrees +
 btrees) is tested.

 Would it make sense to add a default spatial index to ZODB? Does anybody
 of you have any experience with one of the mentioned solutions? Is anybody
 else interested in having a zodb spatial index?

 -Matthias
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Merge request - transaction savepoint release support

2010-06-07 Thread Laurence Rowe
Hi Jim,

I've created a new branch for my savepoint release changes following
the 1.1 release here:

  
svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release-1.1

This does seem to be a real requirement, as I've had another request
to provide this functionality for zope.sqlalchemy - when a large
number of savepoints are used, the eventual commit can lead to a
`RuntimeError: maximum recursion depth exceeded` in SQLAlchemy as it
attempts to unroll its nested subtransactions.
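
For reference, a small sketch of how this looks from application code;
savepoint() and rollback() are the existing transaction API, while
release() is the method added on the branch:

import transaction

transaction.begin()
# ... make some changes ...
sp = transaction.savepoint()
# ... changes we may want to undo ...
sp.rollback()    # existing behaviour: roll back to the savepoint

sp = transaction.savepoint()
# ... once we know we will never roll back to it ...
sp.release()     # new: let data managers free the savepoint's resources
transaction.commit()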

Laurence

On 17 January 2010 15:45, Laurence Rowe l...@lrowe.co.uk wrote:
 2010/1/17 Jim Fulton j...@zope.com:
 On Sat, Jan 16, 2010 at 1:03 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 I've had a request to add savepoint release support to zope.sqlalchemy
 as some databases seem to limit the number of savepoints in a
 transaction.

 I've added this in a branch of transaction here:
 svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release

 From the changelog:

 * Add support for savepoint.release(). Some databases only support a limited
  number of savepoints or subtransactions, this provides an opportunity for a
  data manager to free those resources.

 * Rename InvalidSavepointRollbackError to InvalidSavepointError (BBB provided.)

 If there are no objections, I shall merge this to trunk.

 I'll review and merge.

 Great, thanks!

 What does it mean to release a savepoint? How is this different from
 aborting a save point? I ask particularly in light of:

 On Sat, Jan 16, 2010 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 2010/1/16 Laurence Rowe l...@lrowe.co.uk:
 I'm still not sure this will allow me to add savepoint release support
 to zope.sqlalchemy, as SQLAlchemy has a concept of nested transactions
 rather than savepoints.
 http://groups.google.com/group/sqlalchemy/browse_thread/thread/7a4632587fd97724

 Michael Bayer noted on the sqlalchemy group that on RELEASE SAVEPOINT
 Postgresql destroys all subsequent savepoints. My branch now
 implements this behaviour.

 For zope.sqlalchemy I commit the sqlalchemy subtransaction on
 savepoint.release(). This translates to a RELEASE SAVEPOINT on
 postgresql, best described by their docs here:

 
 RELEASE SAVEPOINT destroys a savepoint previously defined in the
 current transaction.

 Destroying a savepoint makes it unavailable as a rollback point, but
 it has no other user visible behavior. It does not undo the effects of
 commands executed after the savepoint was established. (To do that,
 see ROLLBACK TO SAVEPOINT.) Destroying a savepoint when it is no
 longer needed allows the system to reclaim some resources earlier than
 transaction end.

 RELEASE SAVEPOINT also destroys all savepoints that were established
 after the named savepoint was established.
 
 http://developer.postgresql.org/pgdocs/postgres/sql-release-savepoint.html

 Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Automating retry management

2010-05-11 Thread Laurence Rowe
On 11 May 2010 15:08, Jim Fulton j...@zope.com wrote:
 On Tue, May 11, 2010 at 8:38 AM, Benji York be...@zope.com wrote:
 On Tue, May 11, 2010 at 7:34 AM, Jim Fulton j...@zope.com wrote:
 [...] The best I've been
 able to come up with is something like:

    t = ZODB.transaction(3)
    while t.trying:
        with t:
            ... transaction body ...

 I think you could get this to work:

 for transaction in ZODB.retries(3):
    with transaction:
        ... transaction body ...

 ZODB.retries would return an iterator that would raise StopIteration on
 the next go-round if the previously yielded context manager exited
 without a ConflictError.

 This is an improvement. It's still unsatisfying, but I don't think I'm
 going to get satisfaction. :)

 BTW, if I do something like this, I think I'll add a retry exception to
 the transaction package and have ZODB.POSException.ConflictError
 extend it so I can add the retry automation to the transaction package.

The repoze.retry package lets you configure a list of exceptions.
http://pypi.python.org/pypi/repoze.retry
http://svn.repoze.org/repoze.retry/trunk/repoze/retry/__init__.py

 Though it seems inspecting the error text is required for most sql
database errors to know if they are retryable, as ZPsycoPGDA does:

    except (psycopg2.ProgrammingError, psycopg2.IntegrityError), e:
        if e.args[0].find('concurrent update') > -1:
            raise ConflictError

(https://dndg.it/cgi-bin/gitweb.cgi?p=public/psycopg2.git;a=blob;f=ZPsycopgDA/db.py)

For PostgreSQL it should be sufficient to catch these errors and raise
Retry during tpc_vote.

For databases which do not provide MVCC in the same way as PostgreSQL,
concurrency errors could be manifested at any point in the
transaction. Even Oracle can raise an error during a long running
transaction when insufficient rollback space is available, resulting
in what is essentially a read conflict error. Such errors could not be
caught by a data manager and reraised as a Retry exception.

I think it might be useful to add an optional method to data managers
that is queried by the retry automation machinery to see if an
exception should potentially be retried. Perhaps this would best be
accomplished in two steps:

1. Add an optional property to data managers called ``retryable``.
This is a list of potentially retryable exceptions. When a data
manager is added to the transaction, the transaction's list of
retryable exceptions is extended by the joining data manager's list of
retryable exceptions.

t = transaction.begin()
try:
    application()
except t.retryable, e:
    t.retry(e)

2. t.retry(e) then checks with each registered data manager whether that
particular exception is retryable, and if so raises Retry.

def retry(self, e):
    for datamanager in self._resources:
        try:
            retry = datamanager.retry
        except AttributeError:
            continue
        if isinstance(e, datamanager.retryable):
            retry(e)  # dm may raise Retry here

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-11 Thread Laurence Rowe
I think this means that you are storing all of your data in a single
persistent object, the database root PersistentMapping. You need to
break up your data into persistent objects (instances of objects that
inherit from persistent.Persistent) for the ZODB to have a chance of
performing memory mapping. You want to do something like:

import transaction
from ZODB import FileStorage, DB
from BTrees.LOBTree import BTree, TreeSet
storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
db = DB(storage)
conn = db.open()
root = conn.root()
transaction.begin()
index = root['index'] = BTree()
values = index[1] = TreeSet()
values.add(42)
transaction.commit()

You should probably read:
http://www.zodb.org/documentation/guide/modules.html#btrees-package.
Since that was written, 'L' variants of the BTree types have been
introduced for storing 64bit integers. I'm using an LOBTree because
that maps 64bit integers to python objects. For values I'm using an
LOTreeSet, though you could also use an LLTreeSet (which has larger
buckets).

Laurence

On 12 May 2010 00:37, Ryan Noon rmn...@gmail.com wrote:
 Hi Jim,
 I'm really sorry for the miscommunication, I thought I made that clear in my
 last email:
 I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary
 methods to the ZODB root and allows easy interchangeability with my old
 sqlite OODB abstraction.
 wordid_to_docset is a ZMap, which just wraps the ZODB
 boilerplate/connection and forwards dictionary methods to the root.  If this
 seems superfluous, it was just to maintain backwards compatibility with all
 of the code I'd already written for the sqlite OODB I was using before I
 switched to ZODB.  Whenever you see something like wordid_to_docset[id] it's
 just doing self.root[id] behind the scenes in a __setitem__ call inside the
 ZMap class, which I've pasted below.
 The db is just storing longs mapped to array('L')'s with a few thousand
 longs in em.  I'm going to try switching to the persistent data structure
 that Laurence suggested (a pointer to relevant documentation would be really
 useful), but I'm still sorta worried because in my experimentation with ZODB
 so far I've never been able to observe it sticking to any cache limits, no
 matter how often I tell it to garbage collect (even when storing very small
 values that should give it adequate granularity...see my experiment at the
 end of my last email).  If the memory reported to the OS by Python 2.6 is
 the problem I'd understand, but memory usage goes up the second I start
 adding new things (which indicates that Python is asking for more and not
 actually freeing internally, no?).
 If you feel there's something pathological about my memory access patterns
 in this operation I can just do the actual inversion step in Hadoop and load
 the output into ZODB for my application later, I was just hoping to keep all
 of my data in OODB's the entire time.
 Thanks again all of you for your collective time.  I really like ZODB so
 far, and it bugs me that I'm likely screwing it up somewhere.
 Cheers,
 Ryan


 class ZMap(object):

     def __init__(self, name=None, dbfile=None, cache_size_mb=512,
 autocommit=True):
         self.name = name
         self.dbfile = dbfile
         self.autocommit = autocommit

         self.__hash__ = None #can't hash this

         #first things first, figure out if we need to make up a name
         if self.name == None:
             self.name = make_up_name()
         if sep in self.name:
             if self.name[-1] == sep:
                 self.name = self.name[:-1]
             self.name = self.name.split(sep)[-1]


         if self.dbfile == None:
             self.dbfile = self.name + '.zdb'

         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
         self.cache_size = cache_size_mb * 1024 * 1024

         self.db = DB(self.storage, pool_size=1,
 cache_size_bytes=self.cache_size,
 historical_cache_size_bytes=self.cache_size, database_name=self.name)
         self.connection = self.db.open()
         self.root = self.connection.root()

         print 'Initializing ZMap %s in file %s with %dmb cache. Current
 %d items' % (self.name, self.dbfile, cache_size_mb, len(self.root))

     # basic operators
     def __eq__(self, y): # x == y
         return self.root.__eq__(y)
     def __ge__(self, y): # x >= y
         return len(self) >= len(y)
     def __gt__(self, y): # x > y
         return len(self) > len(y)
     def __le__(self, y): # x <= y
         return not self.__gt__(y)
     def __lt__(self, y): # x < y
         return not self.__ge__(y)
     def __len__(self): # len(x)
         return len(self.root)


     # dictionary stuff
     def __getitem__(self, key): # x[key]
         return self.root[key]
     def __setitem__(self, key, value): # x[key] = value
         self.root[key] = value
         self.__commit_check() # write back if necessary

     def __delitem__(self, key): # del x[key]
         del self.root[key]

     def get(self, key, 

Re: [ZODB-Dev] Problem with handling of data managers that join transactions after savepoints

2010-05-10 Thread Laurence Rowe
On 10 May 2010 21:41, Jim Fulton j...@zope.com wrote:
 A. Change transaction._transaction.AbortSavepoint to remove the
   datamanager from the transactions resources (joined data managers)
   when the savepoint is rolled back and abort called on the data
   manager. Then, if the data manager rejoins, it will have joined
   only once.

   Update the documentation of the data manager abort method (in
   IDataManager) to say that abort is called either when a transaction
   is aborted or when rolling back to a savepoint created before the
   data manager joined, and that the data manager is no longer joined
   to the transaction after abort is called.

   This is a backward incompatible change to the interface (because it
   weakens a precondition) that is unlikely to cause harm.

 I plan to implement A soon if there are no objections.

 Unless someone somehow convinced me to do D, I'll also add an
 assertion in the Transaction.join method to raise an error if a
 data manager joins more than once.

Option A sounds sensible. It also means I won't have to change
anything in the zope.sqlalchemy data manager.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

2010-05-10 Thread Laurence Rowe
I think that moving to an LLTreeSet for the docset will significantly
reduce your memory usage. Non persistent objects are stored as part of
their parent persistent object's record. Each LOBTree object bucket
contains up to 60 (key, value) pairs. When the values are
non-persistent objects they are stored as part of the bucket object's
record, and so accessing any key of a bucket in a transaction brings
up to 60 docsets into memory. I would not be surprised if your program
forces most of your data into memory each batch - as most words are in
most documents.

At the very least you should move to an LLSet (essentially a single
BTree bucket). An LLTreeSet has the additional advantage of being
scalable to many values, and if under load from multiple clients you
are far less likely to see conflicts.
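
A small sketch of that change, assuming the docids are 64-bit integers
(the container names follow this thread; the ZMap wrapper is left out):

from BTrees.LOBTree import LOBTree
from BTrees.LLBTree import LLTreeSet

wordid_to_docset = LOBTree()

def add_posting(wordid, docid):
    docset = wordid_to_docset.get(wordid)
    if docset is None:
        docset = wordid_to_docset[wordid] = LLTreeSet()
    # only the affected TreeSet buckets are rewritten on commit,
    # not the whole docset
    docset.insert(docid)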

Laurence

On 11 May 2010 01:20, Ryan Noon rmn...@gmail.com wrote:
 P.S. About the data structures:
 wordset is a freshly unpickled python set from my old sqlite oodb thingy.
 The new docsets I'm keeping are 'L' arrays from the stdlib array module.
  I'm up for using ZODB's builtin persistent data structures if it makes a
 lot of sense to do so, but it sorta breaks my abstraction a bit and I feel
 like the memory issues I'm having are somewhat independent of the container
 data structures (as I'm having the same issue just with fixed size strings).
 Thanks!
 -Ryan

 On Mon, May 10, 2010 at 5:16 PM, Ryan Noon rmn...@gmail.com wrote:

 Hi all,
 I've incorporated everybody's advice, but I still can't get memory to obey
 cache-size-bytes.  I'm using the new 3.10 from pypi (but the same behavior
 happens on the server where I was using 3.10 from the new lucid apt repos).
 I'm going through a mapping where we take one long integer docid and map
 it to a collection of long integers (wordset) and trying to invert it into
 a mapping for each 'wordid in those wordsets to a set of the original
 docids (docset).
 I've even tried calling cacheMinimize after every single docset append,
 but reported memory to the OS never goes down and the process continues to
 allocate like crazy.
 I'm wrapping ZODB in a ZMap class that just forwards all the dictionary
 methods to the ZODB root and allows easy interchangeability with my old
 sqlite OODB abstraction.
 Here's the latest version of my code, (minorly instrumented...see below):
         try:
             max_docset_size = 0
             for docid, wordset in docid_to_wordset.iteritems():
                 for wordid in wordset:
                     if wordid_to_docset.has_key(wordid):
                         docset = wordid_to_docset[wordid]
                     else:
                         docset = array('L')
                     docset.append(docid)
                      if len(docset) > max_docset_size:
                         max_docset_size = len(docset)
                         print 'Max docset is now %d (owned by wordid %d)'
 % (max_docset_size, wordid)
                     wordid_to_docset[wordid] = docset
                     wordid_to_docset.garbage_collect()
                     wordid_to_docset.connection.cacheMinimize()

                 n_docs_traversed += 1

                 if n_docs_traversed % 100 == 1:
                     status_tick()
                 if n_docs_traversed % 5 == 1:
                     self.do_commit()

             self.do_commit()
         except KeyboardInterrupt, ex:
             self.log_write('Caught keyboard interrupt, committing...')
             self.do_commit()
 I'm keeping track of the greatest docset (which would be the largest
 possible thing not able to be paged out) and its only 10,152 longs (at 8
 bytes each according to the array module's documentation) at the point 75
 seconds into the operation when the process has allocated 224 MB (on a
 cache_size_bytes of 64*1024*1024).

 On a lark I just made an empty ZMap in the interpreter and filled it with
 1M unique strings.  It took up something like 190mb.  I committed it and mem
 usage went up to 420mb.  I then ran cacheMinimize (memory stayed at 420mb).
  Then I inserted another 1M entries (strings keyed on ints) and mem usage
 went up to 820mb.  Then I committed and memory usage dropped to ~400mb and
 went back up to 833mb.  Then I ran cacheMinimize again and memory usage
 stayed there.  Does this example (totally decoupled from any other
 operations by me) make sense to experienced ZODB people?  I have really no
 functional mental model of ZODB's memory usage patterns.  I love using it,
 but I really want to find some way to get its allocations under control.
  I'm currently running this on a Macbook Pro, but it seems to be behaving
 the same way on Windows and Linux.
 I really appreciate all of the help so far, and if there're any other
 pieces of my code that might help please let me know.
 Cheers,
 Ryan
 On Mon, May 10, 2010 at 3:18 PM, Jim Fulton j...@zope.com wrote:

 On Mon, May 10, 2010 at 5:39 PM, Ryan Noon rmn...@gmail.com wrote:
  First off, thanks everybody.  I'm 

Re: [ZODB-Dev] Changing the pickle protocol?

2010-04-28 Thread Laurence Rowe
I suspect that something like 90% of ZODB pickle data will be string
values, so the scope for reducing the space used by a ZODB through the
newer pickle protocol – and even the class registry – is limited.

What would make a significant impact on data size is compression. With
lots of short strings it's probably best to use a preset dictionary
(which sadly does not seem to be exposed through the python zlib
module). Text is usually very amenable to compression, and now we have
blobs most binary data will no longer be in the Data.fs.

Compression could either be implemented on the database level (which
is probably cleanest) or on the application level (which would also
reduce the size of content objects in memory). This would bring clear
wins where I/O or memory bandwidth are the limiting factors - CPUs
spend most of their time waiting for data to be copied into their
cache from memory.
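
As an illustration only (not an existing ZODB API), record-level
compression can be as simple as wrapping pickles with zlib and a small
marker, assuming real pickles never begin with the chosen marker; this
is roughly the approach later taken by the zc.zlibstorage package at
the storage layer:

import zlib

MAGIC = '.z'  # assumed marker

def compress_record(data):
    compressed = MAGIC + zlib.compress(data)
    # only keep the compressed form when it actually saves space
    return compressed if len(compressed) < len(data) else data

def decompress_record(data):
    if data.startswith(MAGIC):
        return zlib.decompress(data[len(MAGIC):])
    return data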

Laurence

2010/4/28 Hanno Schlichting ha...@hannosch.eu:
 Hi.

 The ZODB currently uses a hardcoded pickle protocol one. There's both
 the more efficient protocol two and in Python 3 protocol 3. Protocol
 two has seen various improvements in recent Python versions, triggered
 by its use in memcached.

 I'd be interested to work on changing the protocol. How should I approach this?

 I can see three general approaches:

 1. Hardcode the version to 2 in all places, instead of one.

 Pros: Easy to do, backwards compatible with all supported Python versions
 Cons: Still inflexible

 2. Make the protocol version configurable

 Pros: Give control to the user, one could change the protocol used for
 storages or persistent caches independently
 Cons: More overhead, different protocol versions could have different bugs

 3. Make the format configurable

 Shane made a proposal in this direction at some point. This would
 abstract the persistent format and allow for different serialization
 formats. As part of this one could also have different Pickle/Protocol
 combinations.

 Pros: Lots of flexibility, it might be possible to access the data
 from different languages
 Cons: Even more overhead


 If I am to look into any of these options, which one should I look
 into? Option 1 is obviously the easiest and I made a branch for this
 at some point already. I'm not particularly interested in option 3
 myself, as I haven't had the use-case.

 Thanks for any advice,
 Hanno
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] plone and postgres connection problems

2010-04-19 Thread Laurence Rowe
I've had this issue reported to me in the context of zope.sqlalchemy,
but have been unable to reproduce it. Others have also seen it, but as
far as I am aware have not been able to reproduce it:
http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg146522.html

There have now been three sightings, in the context of SQLAlchemy,
Django, and plain dbapi2 usage (RelStorage), so I suspect it is a real
issue with Psycopg2; however, until it can be reproduced I'm not
hopeful it can be fixed.

Laurence

On 19 April 2010 19:34, lista administracion reference.l...@gmail.com wrote:
 Hi

 We have two servers with Plone that point to the same database [Postgres 8]
 so that if one fails the other keeps working.

 The problem is that at least once a day Plone stops responding for a few
 minutes (pages come back blank or take a long time to load), which causes
 Apache to send a Proxy Error. It recovers on its own, but it is annoying
 for our users to wait 5 to 10 minutes, not counting the lack of availability
 of the page.

 We find that when it fails, PostgreSQL holds a connection that stays 'idle
 in transaction' for a long time, and after about 10 minutes this connection
 terminates automatically. This causes Plone to fail for at least 10 minutes.

 The workaround is to restart Plone or kill the process as follows:

 postgres 23267  0.0  0.1 2172016 7768 ?    S    10:17   0:00 postgres:
 ploneadmin plonetesting 10.9.33.116(45189) idle in transaction
 kill -15 23267


 We have
 2 Servers  with Apache/Plone
 Ram   4 Gigas
 RedHat    5.4
 Apache    2.2.3
 Plone 3.3.4
 RelStorage-1.4.0b3

 1 Server with Postgres
 Ram   4 Gigas
 RedHat    5.4
 Postgres  8.1.18

 Configuration.

 -- DB connection
    <relstorage>
      blob-dir var/blobs
      <postgresql>
        dsn dbname='plone' user='admin' host='10.9.33.128' password='password123'
      </postgresql>
    </relstorage>

 We recently modified these parameters on both servers and now it takes about
 3 days to fail.

     blob-dir var/blobs
     cache-local-mb 512
     cache-prefix prod
     cache-delta-size-limit 5000
     commit-lock-timeout 10
     poll-interval 60
     pack-dry-run true
     pack-batch-timeout 8
     pack-duty-cycle 0.3
     pack-max-delay 30

 We do not know if the problem is Plone, Postgres or the network.
 I hope someone can advise us.

 thanks in advance.

 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Low-level reference finding, tracking, deleting?

2010-04-17 Thread Laurence Rowe
On 17 April 2010 05:27, Jeff Shell j...@bottlerocket.net wrote:
 We encountered a problem during an export/import in a Zope 3 based 
 application that resulted in something not being importable. This is from our 
 very first Zope 3 based application, and I stumbled across some very old 
 adapter/utility registrations that I thought I had cleared out. There are 
 references to `zope.interface.adapter.Null` which haven't been around for 
 years. This is in an old `LocalAdapterRegistry` which, again, I thought I had 
 removed along time ago. These objects and what they reference are not part of 
 our normal object graph, and I was surprised to see them.

 Given an oid, how can I trace what references that object/oid? There is 
 something in our normal object hierarchy retaining a reference, but I don't 
 know how to find it, and imagine that trying to investigate/load the objects 
 from the ZODB level will help me find the culprit.

I describe how to do this in an article here:
http://plone.org/documentation/kb/debug-zodb-bloat

Since then, Jim has written zc.zodbdgc, whose
multi-zodb-check-refs script will optionally produce a database
of reverse references.
http://www.mail-archive.com/zodb-dev@zope.org/msg04389.html

 Are there low level deletion tools in the ZODB to delete individual objects?

You delete an object by removing all references to it, so it becomes
liable for garbage collection. Persistent component registrations will
be referenced from the registry as well as the _components container.
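
In other words there is no low-level delete call; a sketch of the usual
pattern (the names are illustrative):

import transaction

def delete_entry(container, key, db):
    del container[key]   # drop the (last) reference to the object
    transaction.commit()
    db.pack()            # pack then removes records for unreachable objects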

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Laurence Rowe
Running your test script on my small Amazon EC2 instance on Linux
takes between 0.0 and 0.04 seconds (I had to remove the divide by
total to avoid a zero division error). 0.02 is 5000/s.
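
The test script was an attachment and is not part of this archive; for
context, a minimal sketch of the kind of commit-rate loop being
discussed, run against a local FileStorage (swap in
ZEO.ClientStorage.ClientStorage to measure against a ZEO server):

import time
import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage

db = DB(FileStorage('commit-test.fs'))
conn = db.open()
root = conn.root()

count = 100
start = time.time()
for i in range(count):
    root['payload-%d' % i] = 'x' * 100   # ~100 byte payload per commit
    transaction.commit()
elapsed = time.time() - start
print '%d commits in %.3fs (%.0f tps)' % (count, elapsed, count / elapsed)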

Laurence

On 14 April 2010 00:25, Nitro ni...@dr-code.org wrote:
 40 tps sounds low:  are you pushing blob content over the wire somehow?

 I have seen the ZEO storage committing transactions at least an order of
 magnitude faster than that (e.g., when processing incoming newswire
 feeds).  I would guess that there could have been some other latencies
 involved in your setup (e.g., that 0-100ms lag you mention below).

 See my attached test script. It outputs 45-55 transactions/s for 100 byte
 sized payload. Maybe there's a very fundamental flaw in the way the test is
 setup. Note that I am testing on a regular desktop machine (Windows 7,
 WoW64, 4GB RAM, 1TB hard disk capable of transfer rates 100MB/s).

 The zeo server and clients will be in different physical locations, so
 I'd
 probably have to employ some shared filesystem which can deal with that.
 Speaking of locations of server and clients, is it a problem - as in zeo
 will perform very badly under these circumstances as it was not designed
 for this - if they are not in the same location (typical latency
 0-100ms)?

 That depends on the mix of reads and writes in your application.  I have
 personnally witnessed a case where the clients stayed up and serving
 pages over a whole weekend in a clusterfsck where both the ZEO server
 and the monitoring infrastructure went belly up.  This was for a large
 corporate intranet, in case that helps:  the problem surfaced
 mid-morning on Monday when the employee in charge of updating the lunch
 menu for the week couldn't save the changes.

 Haha, I hope they solved this critical problem in time!

 In my case the clients might be down for a couple of days (typically 1 or
 2 days) and they should not spend 30 mins in cache verification time each
 time they reconnect. So if these 300k objects take up 1k each, then they
 occupy 300 MB of ram which I am fine with.

 If the client is disconnected for any period of time, it is far more
 likely that just dumping the cache and starting over fresh will be a
 win.  The 'invalidation_queue' is primarily to support clients which
 remain up while the storage server is down or unreachable.

 Yes, taking the verification time hit is my plan for now. However, dumping
 the whole client cache is something I'd like to avoid, since the app I am
 working on will not work over a corporate intranet and thus the bandwidth
 for transferring the blobs is limited (and so can take up considerable
 time). Maybe I am overestimating the whole client cache problem though.

 Thanks again for your valuable advice,
 -Matthias
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Checking the length of OOBTree

2010-04-08 Thread Laurence Rowe
A BTree does not keep track of its length. See BTrees.Length.Length:

http://apidoc.zope.org/++apidoc++/Code/BTrees/Length/Length/index.html
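
The usual pattern is to keep a Length object alongside the tree, since
len(tree) has to walk every bucket while the counter does not; a minimal
sketch:

from BTrees.OOBTree import OOBTree
from BTrees.Length import Length

tree = OOBTree()
size = Length()

def insert(key, value):
    if tree.insert(key, value):   # insert() returns 1 only for new keys
        size.change(1)

def remove(key):
    del tree[key]
    size.change(-1)

current = size()   # returns the count without touching the tree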

Laurence

On 8 April 2010 16:36, Leszek Syroka leszek.marek.syr...@cern.ch wrote:
 Hi,

 What is the fastest way of checking the number of elements in an OOBTree?
 The execution time of

 len( OOBtree.keys() ) and len(OOBtree)

 is exactly the same. For big data sets execution time is unacceptable. I
 found out that in the implementation of OOBtree (written in C) there is
 a variable called 'len', which seems to contain the length of the tree.
 Is it possible to access that variable from the python code without
 modifying the source?

 Best regards
 Leszek
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  zodb-...@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Savepoint release support

2010-01-17 Thread Laurence Rowe
2010/1/17 Jim Fulton j...@zope.com:
 On Sat, Jan 16, 2010 at 1:03 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 I've had a request to add savepoint release support to zope.sqlalchemy
 as some databases seem to limit the number of savepoints in a
 transaction.

 I've added this in a branch of transaction here:
 svn+ssh://svn.zope.org/repos/main/transaction/branches/elro-savepoint-release

 From the changelog:

 * Add support for savepoint.release(). Some databases only support a limited
  number of savepoints or subtransactions, this provides an opportunity for a
  data manager to free those resources.

 * Rename InvalidSavepointRollbackError to InvalidSavepointError (BBB provided.)

 If there are no objections, I shall merge this to trunk.

 I'll review and merge.

Great, thanks!

 What does it mean to release a savepoint? How is this different from
 aborting a save point? I ask particularly in light of:

 On Sat, Jan 16, 2010 at 2:26 PM, Laurence Rowe l...@lrowe.co.uk wrote:
 2010/1/16 Laurence Rowe l...@lrowe.co.uk:
 I'm still not sure this will allow me to add savepoint release support
 to zope.sqlalchemy, as SQLAlchemy has a concept of nested transactions
 rather than savepoints.
 http://groups.google.com/group/sqlalchemy/browse_thread/thread/7a4632587fd97724

 Michael Bayer noted on the sqlalchemy group that on RELEASE SAVEPOINT
 Postgresql destroys all subsequent savepoints. My branch now
 implements this behaviour.

For zope.sqlalchemy I commit the sqlalchemy subtransaction on
savepoint.release(). This translates to a RELEASE SAVEPOINT on
postgresql, best described by their docs here:


RELEASE SAVEPOINT destroys a savepoint previously defined in the
current transaction.

Destroying a savepoint makes it unavailable as a rollback point, but
it has no other user visible behavior. It does not undo the effects of
commands executed after the savepoint was established. (To do that,
see ROLLBACK TO SAVEPOINT.) Destroying a savepoint when it is no
longer needed allows the system to reclaim some resources earlier than
transaction end.

RELEASE SAVEPOINT also destroys all savepoints that were established
after the named savepoint was established.

http://developer.postgresql.org/pgdocs/postgres/sql-release-savepoint.html

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB3 installation ambiguous conclusion

2009-12-20 Thread Laurence Rowe
2009/12/20 Ross Boylan rossboy...@stanfordalumni.org:
 easy_install ZODB3 looked fairly good during installation until  the end:
 quote
 Processing transaction-1.0.0.tar.gz
 Running transaction-1.0.0\setup.py -q bdist_egg --dist-dir
 c:\users\ross\appdata\local\temp\easy_install-cw1i4f\transaction-1.0.0\egg-dist-tmp-z7nrfd
 Adding transaction 1.0.0 to easy-install.pth file

 Installed c:\python26\lib\site-packages\transaction-1.0.0-py2.6.egg
 Finished processing dependencies for ZODB3
 
 WARNING:

        An optional code optimization (C extension) could not be compiled.

        Optimizations for this package will not be available!

 Unable to find vcvarsall.bat
 
 /quote
 This seems to say things will work, just not as fast as they could.  But
 I'm a little puzzled why things would work at all, since I don't have a
 build environment on the machine (well, there is a compiler that's part
 of the MS SDK, but I'm not really sure how capable or operational it
 is--it did seem to compile some sample C code in the kit).

 Is there a pure python fallback for the C code?  I thought ZODB had some
 C-level magic.

ZODB requires C-code modules to work, but pre-compiled win32 eggs are
available, and presumably that is what easy_install picked. It's not
clear to me what generated that warning, but then I don't use Windows.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] No module named Persistence

2009-12-20 Thread Laurence Rowe
2009/12/20 Ross Boylan rossboy...@stanfordalumni.org:
 The IPC10 presentation says
 #Works as a side-effect of importing ZODB above
 from Persistence import Persistent

 I tried that (with the indicated other imports first). It led to a No
 module error.

 I tried commenting out the line, since the comment could be interpreted
 to mean that importing ZODB already does what's necessary.  But there
 was no Persistent class defined I could use.

 I tried from Globals import Persistent, as suggested in a 1998 posting.
 This produced No module named  Globals.

 Suggestions?

That is the old Zope2 persistence base class. Try 'from persistent
import Persistent'.

http://docs.zope.org/zodb/zodbguide/prog-zodb.html#writing-a-persistent-class

(note that guide has probably not been updated since ZODB 3.7, so
don't expect any newer features to be documented there).
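
For completeness, a minimal persistent class using the current package
layout (the class and attribute names are just an example):

from persistent import Persistent

class Note(Persistent):
    def __init__(self, text=''):
        self.text = text   # ordinary attribute changes mark the object dirty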

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Python properties on Persistent objects

2009-12-16 Thread Laurence Rowe
2009/12/17 Mikko Ohtamaa mi...@redinnovation.com:
 Hi,

 I need a little clarification on whether properties should work on
 Persistent objects. I am running ZODB 3.8.4 on Plone 3.3.

 I am using plone.behavior and adapters to retrofit objects with a new
 behavior (HeaderBehavior object). This object is also editable through
 z3c.form interface. z3c.form requires a context variable on the object
 e.g. to look up dynamic vocabularies. To avoid having this
 object.context attribute be persistent (as it's known every time by
 the factory method of the adapter which creates/look-ups
 HeaderBehavior) I tried to spoof context variable using properties and
 internal volatile variable. This was a trick I learnt somewhere
 (getpaid.core?)

This sounds like you are passing context somewhere where view is expected.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Laurence Rowe
2009/12/9 Pedro Ferreira jose.pedro.ferre...@cern.ch:
 Hello,
 Just zodbbrowser with no prefix:

   http://pypi.python.org/pypi/zodbbrowser
   https://launchpad.net/zodbbrowser

 It's a web-app: it can connect to your ZEO server so you can inspect the
 DB while it's being used.

 We tried this, but we currently get an error related with the general
 security policy for zope.app. Maybe we need to install Zope?
 This would be a very handy tool.
 I'd suggest dumping the last few transactions with one of the ZODB
 scripts (fsdump.py perhaps) and seeing what objects get modified.

 That's what we've been doing, and we got some clues. We've modified
 Jim's script in order to find out which OIDs are being rewritten, and
 how much space they are taking, and this is a fragment of it:

 OID class_name total_size percent_size n_pickles min_size avg_size max_size
 '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683
 1977885 2004241 2026518
 '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683
 1616904 1635889 1651956
 '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522
 20% 28513 418230 419315 420294
 '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238
 307112 314379 320647
 '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238
 190816 195216 199007
 '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3%
 1953 880615 884903 887285
 [...]

 As you can see, we have an OOBucket occupying more than 2MB (!) per
 write. That's almost 17GB only considering the last 1M transactions of
 the DB (we get ~3M transactions per week). We believe this bucket
 belongs to some OOBTree-based index that we are using, whose values are
 Python lists (maybe that was a bad choice to start with?). In any case,
 how do OOBuckets work? Is it a simple key space segmentation strategy,
 or are the values taken into account as well?
 Our theory is that an OOBTree simply divides the N keys in K buckets,
 and doesn't care about the contents. So, since we are adding very large
 lists as values, the tree remains unbalanced, and since new contents
 will be added to this last bucket, each rewrite will imply the addition
 of ~2MB to the file storage.

BTree buckets have no concept of the size of their contents; they
split when their number of keys reaches a threshold (30 for OOBTrees).

 Will the replacement of these lists with a persistent structure such as
 a PersistentList solve the issue?

The list would then be stored as a separate persistent object, so
changes to the bucket would not rewrite the entire list object. The
downside of this is that your application may become slower as reading
the contents of the index will incur additional object loads. Zope2's
ZCatalog stores index data as tuples in BTrees, but only a small
amount of metadata is stored (so the buckets are maybe 30-60KB). It
sounds like you are storing a large amount of metadata in the index,
or perhaps inadvertently indexing something. I've seen similar
problems caused by binary data ending up in a text index (where a
'word' ended up being several megabytes). Load the object to check the
problem is large values, rather than large keys.
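
For illustration, a minimal sketch of the PersistentList approach (the
in-memory MappingStorage and the key are made up; the point is that the list
becomes its own database record):

  import transaction
  from ZODB import DB
  from ZODB.MappingStorage import MappingStorage
  from BTrees.OOBTree import OOBTree
  from persistent.list import PersistentList

  db = DB(MappingStorage())
  root = db.open().root()
  root['index'] = index = OOBTree()

  # The bucket only stores a reference to the PersistentList record.
  index['some-key'] = PersistentList(['first entry'])
  transaction.commit()

  # Appending later rewrites only the small PersistentList record,
  # not the bucket holding the key.
  index['some-key'].append('second entry')
  transaction.commit()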

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-07 Thread Laurence Rowe
2009/12/7 Jose Benito Gonzalez Lopez jose.benito.gonza...@cern.ch:
 Dear ZODB developers,

 Since some time ago (not sure since when) our database
 has passed from 15GB to 65GB so fast, and it keeps growing
 little by little (2 to 5 GB per day). It is clear that something is not
 correct in it.

 We would like to check which objects are taking most of the space
 or just try to find out what is going on,...

 Any help or suggestions would be much appreciated.

Take a look at my write up here:
http://plone.org/documentation/kb/debug-zodb-bloat

You will want analyze.py from the latest ZODB release (or download it
from http://svn.zope.org/ZODB/trunk/src/ZODB/scripts/) the version
that ships with Zope 2.10.9 is broken.

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] repozo, neither official nor supported, apparently...

2009-11-20 Thread Laurence Rowe
2009/11/20 Chris Withers ch...@simplistix.co.uk:
 Jim Fulton wrote:
 On Thu, Nov 19, 2009 at 7:01 PM, Chris Withers ch...@simplistix.co.uk 
 wrote:
 Jim Fulton wrote:
 There's nothing official or supported about a backup solution without
 automated tests.

 So I guess there isn't one.
 Right, so what does Zope Corp use?

 We use ZRS, of course.

 Well, ZRS solves the HA challenge the same way as zeoraid, if I
 understand correctly, but what about offsite backups and the like?

 The project I'm currently working on uses repozo to create backups that:

 - get hoovered by the hosting provider's backup mechanisms and rotated
 offsite daily

 - get sprayed by rsync over ssh to a DR site on another continent

 How would ZRS solve these problems?

 I'd prefer that there be a file-storage backup solution out of the box.
 repozo is the logical choice.  It sounds like it needs some love though.
 This isn't something I'd likely get to soon.

 I'm not sure how much love repozo needs. It works, and it won't need
 changing until FileStorage's format changes, which I don't see happening
 any time soon.

Maybe this test I added for analyze.py could be a helpful template.
http://zope3.pov.lt/trac/changeset/100422

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] repozo, neither official nor supported, apparently...

2009-11-20 Thread Laurence Rowe
2009/11/20 Jim Fulton j...@zope.com:
 On Fri, Nov 20, 2009 at 9:32 AM, Chris Withers ch...@simplistix.co.uk wrote:
 ...
 I'm not sure how much love repozo needs. It works, and it won't need
 changing until FileStorage's format changes, which I don't see happening any
 time soon.

 It just occurred to me that repozo doesn't support blobs.

This was touched on in a thread Backing up Data.fs and blob
directory: https://mail.zope.org/pipermail/zodb-dev/2008-September/012094.html

While there is no direct support in repozo, the approach of first
taking a repozo backup followed by a blob directory backup works so
long as you do not pack between the repozo and blob backups. (Blobs
newer than the repozo backup are safely ignored.)
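
A minimal sketch of that ordering (the paths and the repozo console script,
e.g. bin/repozo from a buildout, are assumptions for illustration):

  import subprocess

  DATA_FS = "/var/zeo/var/Data.fs"
  BLOB_DIR = "/var/zeo/var/blobs"
  BACKUP_DIR = "/backups/zodb"

  # 1. FileStorage backup first (-B is repozo's backup mode, -r the repository).
  subprocess.check_call(
      ["repozo", "-B", "-f", DATA_FS, "-r", BACKUP_DIR + "/filestorage"])

  # 2. Blob directory afterwards; do not pack between the two steps.
  subprocess.check_call(
      ["rsync", "-a", BLOB_DIR + "/", BACKUP_DIR + "/blobs/"])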

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZEO and blobs: missing tmp file breaks transaction on retry

2009-11-13 Thread Laurence Rowe
2009/11/13 Martin Aspeli optilude+li...@gmail.com:
 Hanno Schlichting wrote:
 On Fri, Nov 13, 2009 at 5:40 PM, Jim Fulton j...@zope.com wrote:
 On Fri, Nov 13, 2009 at 10:18 AM, Mikko Ohtamaa mi...@redinnovation.com 
 wrote:
 Unfortunately the application having the issues is Plone 3.3. ZODB 3.9
 depends on Zope 2.12 so, right?
 ZODB doesn't depend on Zope anything. :)

 Plone 3.3 may use an earlier version of ZODB. but perhaps it is
 possible to get it to work with a later one. I wouldn't know. :)

 Plone 3.x uses Zope 2.10 and ZODB 3.7. Upgrading it to ZODB 3.8.x is trivial.

 But the changes in ZODB 3.9 (essentially the removal of the version
 feature) require a bunch of non-trivial changes to Zope2. So only Zope
 2.12 works with ZODB 3.9.

 Anyone using Plone 3.x who wants to use blobs is therefore stuck with
 ZODB 3.8.x. It's not supported by Plone and considered experimental on
 all layers :)

 Meanwhile, several people have used it in production. I was a little
 taken aback to discover that it is considered somewhat experimental
 (and it seems, a bit broken) in ZODB 3.8 (as distinct from the Plone
 integration package, plone.app.blob, which indeed has been experimental
 up until now). I think a lot of other people would be too.

 A lot of people would be very happy if this bug in ZODB 3.8 could be
 fixed, since the option of upgrading is not there (since ZODB 3.9
 introduces too-incompatible changes to work with Zope 2.10) for anyone
 on a released, stable version of Plone.

Presumably ZODB 3.9 maintains backwards compatibility for ZEO clients,
so a ZODB 3.9 ZEO server could be used with Zope 2.10 + ZODB 3.8
clients?

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] tools for analysing the tail of a filestorage?

2009-09-28 Thread Laurence Rowe

This may help: http://plone.org/documentation/how-to/debug-zodb-bloat/

Laurence


Chris Withers wrote:
 
 Hi All,
 
 I have a filestorage being used by Zope 2 that is mysteriously growing. 
 I don't have confidence in the Undo tab, since this setup has two 
 storages, one mounted into the other.
 
 I tried fstail.py, and while it tells me the same info as the Undo tab 
 (except with more certainty that it's showing the right storage results 
 ;-) it doesn't say much about the objects in question...
 
 Are there any other tools that might tell me more?
 
 cheers,
 
 Chris
 
 -- 
 Simplistix - Content Management, Batch Processing  Python Consulting
 - http://www.simplistix.co.uk
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/
 
 ZODB-Dev mailing list  -  ZODB-Dev@zope.org
 https://mail.zope.org/mailman/listinfo/zodb-dev
 
 

-- 
View this message in context: 
http://www.nabble.com/tools-for-analysing-the-tail-of-a-filestorage--tp25547059p25649804.html
Sent from the Zope - ZODB-Dev mailing list archive at Nabble.com.

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

2009-05-27 Thread Laurence Rowe
2009/5/27 Chris Withers ch...@simplistix.co.uk:
 Laurence Rowe wrote:

 Jim Fulton wrote:

 Well said. A feature I'd like to add is the ability to have persistent
  objects that don't get their own database records, so that you can get  the
 benefit of having them track their changes without incurring the expense of
 a separate database object.

 +lots

 Hanno Schlichting recently posted a nice graph showing the persistent
 structure of a Plone Page object and it's 9 (!) sub-objects.
 http://blog.hannosch.eu/2009/05/visualizing-persistent-structure-of.html

 That graph isn't quite correct ;-)

 workflow_history has DateTime objects in it, and I think they get their own
 pickle.

 I had a major win on one CMFWorkflow project by changing the workflow
 implementation to use a better data structure *and* store ints instead of
 DateTime object. CMF should change this...

Good point, though it is 'correct' for an object that has not
undergone any workflow transitions yet, as is the case here ;)

Laurence
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

2009-05-26 Thread Laurence Rowe
Jim Fulton wrote:
 On May 26, 2009, at 10:16 AM, Pedro Ferreira wrote:

 In any case, it's not such a surprising number, since we have ~73141
 event objects and ~344484 contribution objects, plus ~492016  resource
 objects, and then each one of these may contain authors, and for sure
 some associated objects that store different bits of info... So,  
 even if
 it doesn't include revisions, 19M is not such a surprising number.
 I've also tried to run the analyze.py script, but it returns me a
 stream of 'type' object is unsubscriptable errors, due to:

 classinfo = pickle.loads(record.data)[0]

 any suggestion?
 
 No. Unfortunately, most of the scripts in ZODB aren't tested or  
 documented well and tend to bitrot.
 
 Also, is there any documentation about the basic structures of the
 database available? We found some information spread through different
 sites, but we couldn't find exhaustive documentation for the API
 (information about the different kinds of persistent classes, etc...).
 Is there any documentation on this?
 
 
 No.  Comprehensive ZODB documentation is needed. This is an upcoming  
 project for me.

I have a patch at https://bugs.launchpad.net/zodb/+bug/223331 which 
fixes this.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

2009-05-26 Thread Laurence Rowe
Jim Fulton wrote:
 Well said. A feature I'd like to add is the ability to have persistent  
 objects that don't get their own database records, so that you can get  
 the benefit of having them track their changes without incurring the
 expense of a separate database object.

+lots

Hanno Schlichting recently posted a nice graph showing the persistent 
structure of a Plone Page object and its 9 (!) sub-objects.
http://blog.hannosch.eu/2009/05/visualizing-persistent-structure-of.html

A sub-persistent type would allow us to fix the latency problems we 
experience without needing to re-engineer Archetypes at the same time.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] ZODB Documentation

2009-05-26 Thread Laurence Rowe
A few weeks ago I converted the ZODB/ZEO Programming Guide and a few 
more articles into structured text and added them to the zope2docs 
buildout. I've now moved them to their own buildout in 
svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon 
appear at http://docs.zope.org/zodb (thanks Jens!)

This means we now have two copies of the programming guide, one in latex 
in the ZODB sources and one in stx in zodbdocs. I'd like to propose 
removing the latex version and direct any changes to the stx version.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Documentation

2009-05-26 Thread Laurence Rowe
Andreas Jung wrote:
 On 26.05.09 19:08, Andreas Jung wrote:
 On 26.05.09 18:54, Laurence Rowe wrote:
   
 A few weeks ago I converted the ZODB/ZEO Programming Guide and a few 
 more articles into structured text and added them to the zope2docs 
 buildout. I've now moved them to their own buildout in 
 svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon 
 appear at http://docs.zope.org/zodb (thanks Jens!)
   
 
 There is also (the same?) ZODB documentation available under

 http://docs.zope.org/zope2/articles/

 We should get rid of one copy.
   
 oppss..sorry, for misreading...just seen your checkins for moving the stuff.

They're actually copies at the moment; once Jens performs his magic I'll 
remove them from the Zope 2 buildout.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZODB Documentation

2009-05-26 Thread Laurence Rowe
Jim Fulton wrote:
 On May 26, 2009, at 12:54 PM, Laurence Rowe wrote:
 
 A few weeks ago I converted the ZODB/ZEO Programming Guide and a few
 more articles into structured text and added them to the zope2docs
 buildout. I've now moved them to their own buildout in
 svn+ssh://svn.zope.org/repos/main/zodbdocs/trunk and they will soon
 appear at http://docs.zope.org/zodb (thanks Jens!)


 This means we now have two copies of the programming guide, one in  
 latex
 in the ZODB sources and one in stx in zodbdocs. I'd like to propose
 removing the latex version and direct any changes to the stx version.
 
 
 +1
 
 (I'd repressed knowledge of the latex version.)

Done.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

2009-05-25 Thread Laurence Rowe
Pedro Ferreira wrote:
 Dear all,
 
 Thanks a lot for your help. In fact, it was a matter of increasing the
 maximum recursion limit.
 There's still an unsolved issue, though. Each time we try to recover a
 backup using repozo, we get a CRC error. Is this normal? Has it happened
 to anyone?
 
 I guess we have a very large database, for what is normal in ZODB
 applications. We were wondering if there's any way to optimize the size
 (and performance) of such a large database, through the removal of
 unused objects and useless data. We perform packs on a weekly basis, but
 we're not sure if this is enough, or if there are other ways of
 lightening up the DB. Any recommendations regarding this point?

You might want to try packing without garbage collection, which is a 
much cheaper operation. See 
http://mail.zope.org/pipermail/zodb-dev/2009-January/012365.html

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Proposal (version 2): cross database reference seat belt

2009-04-30 Thread Laurence Rowe
Christian Theune wrote:
 Hi,
 
 On Tue, 2009-04-28 at 13:54 -0400, Jim Fulton wrote:
 Thanks again!

 (Note to everyone else, Shane and I discussed this on IRC, along with  
 another alternative that I'll mention below.)

 I like version 2 better than version 1.  I'd be inclined to simplify  
 and it and skip the configuration flag and simply publish an event any  
 time we see a cross-database reference when saving an object.

 Here's proposed solution 3. :)

 - We add a flag to disable new cross-database references unless they  
 are explicitly registered.
 - We add a connection method to register a reference:

   def registerCrossDatabaseReference(from_, to):
       """Register a new cross-database reference from from_ to to."""

 - We arrange that connections can recognize old cross-database  
 references.

 If someone accidentally creates a new reference and the flag is set,  
 then the transaction will be aborted.

 An interim step, if we're in a hurry to get 3.9 out, is to simply add  
 the flag.  This would disallow cross-database references in new  
 applications.  These applications could still support multiple  
 databases by providing application-level traversal across databases.
 
 I think I'm reading something incorrectly: is there an emphasis on
 *new* applications? The flag would disallow the creation of
 cross-database references for a given DB -- independent of whether the
 app is new or old, right? Only depending on whether the application uses
 a ZODB that has the feature and has it enabled. Right?

I think the emphasis was on new versus existing cross-database references.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] ZEO and time.sleep

2009-03-18 Thread Laurence Rowe
For Plone, the standard remedy to this problem is to separate out 
portal_catalog into its own storage (ZEO has support for serving 
multiple storages). You may then control the object cache size per 
storage, setting the one for the portal_catalog storage large enough to 
keep all its objects in the cache. As navigation is driven from the 
catalog this can significantly help performance and reduce the number of 
ZEO loads to only those objects required to traverse to the published 
object.

Other things that might help:

   * Reduce the number of zserver-threads from the default 4; object 
caches are per thread, so this allows you to have fewer, larger caches.

   * Use FileSystemStorage for Archetypes; this can help if you serve 
many files. Files are stored in 64k Pdata chunks, so serving a large file 
can clear your cache. With newer versions of Plone you can use ZODB 3.8 
and blobs.

   * Put Varnish or some other proxy cache in front and cache aggressively.

   * Buy more memory; memory is cheap.

Hope that helps,

Laurence

Juan Pablo Gimenez wrote:
 Hi all...
 
 
   I'm profiling a big plone 2.5 instance with huge performance problems
 and I was wondering if this bug is still present in zope 2.9.9-final,
 http://mail.zope.org/pipermail/zodb-dev/2007-March/010855.html
 
   We can't increment the zodb-cache-size because we're running out of
 memory... so a lot of times we read objects from zeo/zodb...
 
   Any help will be really appreciated... 
 
 
   Saludos...
 
 
 
 
 
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/
 
 ZODB-Dev mailing list  -  ZODB-Dev@zope.org
 http://mail.zope.org/mailman/listinfo/zodb-dev

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Relstorage pack problems

2009-01-20 Thread Laurence Rowe
Shane Hathaway wrote:
 
 I should note that this KeyError occurs while trying to report on a
 KeyError.  I need to fix that.  Fortunately, the same error pops out anyway.

There's a fix for this in the Jarn branch. Note that to collect more 
interesting data it rolls back the load connection at this point, 
relying on the KeyError to cause the transaction to fail.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] How to turn off 'GC' when packing on ZODB3.6.2

2009-01-20 Thread Laurence Rowe
eastxing wrote:
 Hi,
 
 I am using Plone2.5.5 with Zope2.9.8-final and ZODB3.6.2. Now my Data.fs 
 size is nearly 26G with almost 140k Plone objects and more than 4100k 
 zope objects in the database. Since 2 months ago, I have not been able to 
 pack my database successfully. In recent days I tried to pack it again, 
 but after more than 72 hours running, the pack process hadn't finished.
 
 I read lots of discussions on the forum, some guys said turning off 'GC' 
 when packing will improve the speed tremendously. Then I found an 
 experimental product -- 'zc.FileStorage' written by Jim, but it seems 
 that it can only be used with ZODB3.8 or later. So what should I do on 
 ZODB3.6.2 to turn off 'GC' when packing?
 
 ps:If this is a wrong place to ask the question, please let me know, 
 I'll move it to the right place.

As an alternative to backporting the changes to pack, you could try 
doing a zexp export of the site, and then reimport the zexp into a blank 
Data.fs.

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] problem with broken

2008-11-05 Thread Laurence Rowe
Broken objects occur when the class for a pickled object cannot be 
imported. To change the location of a class, you need to provide an 
alias at the old location so that the object can be unpickled, i.e. 
MyOldClassName = MyNewClassName. You can only remove MyOldClassName 
after you have updated all of the pickles (with your code below).
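
For example, a minimal sketch of such an alias (the module and class names 
are made up for illustration):

  # oldpackage/oldmodule.py -- the module path recorded in the existing pickles
  from newpackage.newmodule import MyNewClassName

  # Old pickles reference oldpackage.oldmodule.MyOldClassName; keeping this
  # name importable lets them unpickle as instances of the new class. Remove
  # the alias only once every object has been re-saved.
  MyOldClassName = MyNewClassName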

Laurence



Adam GROSZER wrote:
 Hello,
 
 I'm having a problem with broken objects here.
 It's coming when I'm trying to evolve generations and the generation
 just touches all objects in the ZODB to store them again with the
 non-deprecated classes.
 
 The code is like this:
 storage = context.connection._storage
 
 next_oid = None
 n = 0
 while True:
     oid, tid, data, next_oid = storage.record_iternext(next_oid)
 
     obj = context.connection.get(oid)
     # Make sure that we tell all objects that they have been changed. Who
     # cares whether it is true! :-)
     obj._p_activate()
     obj._p_changed = True
 
     if next_oid is None:
         break
 
 
 2008-11-04T19:40:16 ERROR SiteError 
 http://localhost:8080/++etc++process/@@generations.html
 Traceback (most recent call last):
   File F:\W\Zope3\src\zope\publisher\publish.py, line 133, in publish
 result = publication.callObject(request, obj)
 ...
   File F:\W\Zope3\src\zope\tal\talinterpreter.py, line 343, in interpret
 handlers[opcode](self, args)
   File F:\W\Zope3\src\zope\tal\talinterpreter.py, line 583, in 
 do_setLocal_tal
 self.engine.setLocal(name, self.engine.evaluateValue(expr))
   File F:\W\Zope3\src\zope\tales\tales.py, line 696, in evaluate
 return expression(self)
   File F:\W\Zope3\src\zope\tales\expressions.py, line 217, in __call__
 return self._eval(econtext)
   File F:\W\Zope3\src\zope\tales\expressions.py, line 211, in _eval
 return ob()
   File F:\W\Zope3\src\zope\app\generations\browser\managers.py, line 182, 
 in evolve
 transaction.commit()
   File F:\W\Zope3\src\transaction\_manager.py, line 93, in commit
 return self.get().commit()
   File F:\W\Zope3\src\transaction\_transaction.py, line 322, in commit
 self._commitResources()
   File F:\W\Zope3\src\transaction\_transaction.py, line 416, in 
 _commitResources
 rm.commit(self)
   File F:\W\Zope3\src\ZODB\Connection.py, line 541, in commit
 self._commit(transaction)
   File F:\W\Zope3\src\ZODB\Connection.py, line 586, in _commit
 self._store_objects(ObjectWriter(obj), transaction)
   File F:\W\Zope3\src\ZODB\Connection.py, line 620, in _store_objects
 p = writer.serialize(obj)  # This calls __getstate__ of obj
   File F:\W\Zope3\src\ZODB\serialize.py, line 405, in serialize
 meta = klass, newargs()
   File F:\W\Zope3\src\ZODB\broken.py, line 325, in __getnewargs__
 return self.__Broken_newargs__
 AttributeError: 'VocabularyManager' object has no attribute 
 '__Broken_newargs__'
 
 

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Amazon SimpleDB Adapter

2008-10-11 Thread Laurence Rowe
Shane Hathaway wrote:
 Benjamin Liles wrote:
 Currently at the Plone conference it seems that a large number of people
 are beginning to host their Plone sites on the Amazon EC2 service.  A
 simpleDB adapter might be a good way to provide persistent storage for
 an EC2 base Zope instance.  Has there been any interest in this?  If I
 was to write one, should I add it to RelStorage or create my own package
 along the lines of relstorage.adapters.simpledb?
 
 This sounds interesting!  We should add an adapter to RelStorage.  We
 might run into some trouble with MVCC, but I think we can solve that.
 We should also use Amazon S3 directly for blob storage.
 
 In general, Amazon's services seem a much better fit for ZODB apps than
 what Google is offering.

I'm not sure RelStorage is the best place for it - SimpleDB is very 
different to relational databases.

A couple of years ago I experimented with s3storage [1]. This turned out 
to be very slow due to the number of writes performed every transaction 
- one per object, though this could be improved if the writes were 
parallelized. It reached the point where zope2 would start up. This took 
about 10 or 15 minutes at the time (I did not have access to EC2 at the 
time and this was over public wifi).

It worked by creating its own indexes in S3. I don't think SimpleDB 
will give any advantage unless it is shown to be faster to query than 
S3. You cannot store pickles directly in SimpleDB because it is limited 
to an attribute size of 1024 bytes.

The challenge in building such a system is that Amazon's eventual 
consistency model means you cannot know how up to date your view of the 
data is. I think it could make a great backend for storing pickles 
(keyed by oid, tid) but it is probably much easier to have a separate 
index to consult during loadSerial.

It may also be worth experimenting with DirectoryStorage over s3fs [2].

Laurence


[1] http://code.google.com/p/s3storage

[2] http://code.google.com/p/s3fs

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Broken instances after refactoring in ZODB

2008-10-04 Thread Laurence Rowe
Leonardo Santagada wrote:
 On Oct 4, 2008, at 12:36 PM, Wichert Akkerman wrote:
 
 Adam wrote:

 Thanks for that, guys, I've not used a mailing list like this  
 before so
 unsure how to respond.

 If ZODB stores the Package.Module.Class name in the pickle would it  
 be
 possible for me to simply rename them in the binary file?
 Possible it is, but probably harder than just doing what they said
 
 My confusion here is that I've globally imported everything from the
 packages into the current namespace of my main module. ZODB  
 shouldn't be
 aware I've moved the modules since for all intents and purposes to
 Python, they are still there.

 It doesn't matter where you import it from or to - python uses the  
 location of the actual implementation and ZODB uses that. If you  
 move your implementation to another place you have to either update  
 all objects in the ZODB or add module aliases.

 Wichert.
 
 
 I would like to know from where does it get that info? I would guess  
 from __module__.

Correct.
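
For illustration, a minimal hypothetical snippet (not from the thread) showing 
where that information comes from:

  import pickle

  class Example(object):
      pass

  # The dotted path stored in a pickle is built from __module__ and __name__,
  # so moving the implementation changes what old pickles try to import.
  print(Example.__module__, Example.__name__)   # e.g. ('__main__', 'Example')
  print(repr(pickle.dumps(Example)))            # the GLOBAL opcode carries that path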

 Why doesn't zodb have a table of some form for this info? I heard that  
 sometimes for very small objects the string containing this  
 information can use up to 30% of the whole space of the file (using  
 FileStorage). How does RelStorage store this?

I believe this was what the python pickle protocol 2 was created for. 
However I think when someone last looked the potential space savings 
with real world data did not justify making the change (Hanno has a 
branch in svn for this).

Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Zope memory usage

2008-09-18 Thread Laurence Rowe



Izak Burger-2 wrote:
 
 Dieter Maurer wrote:
 This is standard behaviour with long running processes on
 a system without memory compaction:
 
 Of course, I remember now, there was something about that in my 
 Operating Systems course ten years ago :-) I suppose the bigger page 
 sizes used on some architectures doesn't help.
 
 The zope instance in question is 2.10.5, which includes ZODB 3.7.1. Can 
 we simply swap that out with 3.8.0?  Or should we rather do a svn diff 
 on the dm-memory_size_limited-cache branch (based on 3.7.0) and see if 
 that applies cleanly to 3.7.1 (I suspect it will)?
 

I'm using the 3.8 branch (that will become 3.8.1) for its blob support
happily with Plone 3.1 and Zope 2.10.

Laurence

-- 
View this message in context: 
http://www.nabble.com/Zope-memory-usage-tp19528989p19558656.html
Sent from the Zope - ZODB-Dev mailing list archive at Nabble.com.

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: ZODB not saving sometimes

2008-06-22 Thread Laurence Rowe

Andreas Jung wrote:



--On 22. Juni 2008 08:49:32 -0700 tsmiller [EMAIL PROTECTED] 
wrote:




Gary,
I have been using the ZODB for about a year and a half with a bookstore
application.  I am just now about ready to put it out on the internet for
people to use.  I have had the same problem with saving data.  I have
tried alot of things.  But I have never gotten the database to save
consistently.  I can create an x number of records one right after the
other that uses the exact same code to save them, and it is likely that
all of them will save perfectly except one - or maybe two.


We have never seen that - except with badly written application code. 
Committing a transaction should always commit the data. That's the sense 
of a transaction system. The only reason I can imagine causing such a 
failure:

a bare try..except in your code suppressing ZODB conflict errors.


The other likely cause of this is modifying non-persistent sub-objects 
and not setting _p_changed = True on the parent persistent object, e.g.:


>>> dbroot['a_list'] = [1, 2, 3]
>>> transaction.commit()
>>> a_list = dbroot['a_list']
>>> a_list.append(4)
>>> transaction.commit()

The second commit actually has no effect as the persistence machinery 
has not been notified that the object has changed. This is not 
immediately apparent though, as the 'live' object shows what you expect:


>>> a_list
[1, 2, 3, 4]

And if a later transaction also modifies the persistent object, then all 
of the data is saved.


To avoid this, avoid using mutable, non-persistent types for storage in 
the ZODB; replace lists and dicts with PersistentList and PersistentMapping.
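
For example (a minimal sketch, reusing the dbroot and transaction from the 
snippet above):

>>> from persistent.list import PersistentList
>>> dbroot['a_list'] = PersistentList([1, 2, 3])
>>> transaction.commit()
>>> dbroot['a_list'].append(4)   # PersistentList registers the change itself
>>> transaction.commit()

#restart your python process
>>> dbroot['a_list']
[1, 2, 3, 4]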


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Re: Advice on ZODB with large datasets

2008-06-19 Thread Laurence Rowe
It's helpful to post your responses to the mailing list; that way, when
someone else has a similar problem in the future they'll be able to
find the information.

Inheriting from Persistent is also necessary to control the
granularity of the database. Persistent objects are saved as separate
`records` by ZODB. Other objects do not have a _p_oid attribute and
have to be saved as part of their parent record.

Laurence

2008/6/19  [EMAIL PROTECTED]:
 Laurence Rowe wrote:

 [EMAIL PROTECTED] wrote:
 Does your record class inherit from persistent.Persistent? 650k integers +
 object pointers should only be of the order 10 Mb or so. It sounds to me
 like the record data is being stored in the btrees bucket directly.

No, it does not.  It's just a simple dictionary for the time being.
  I assumed the BTree bucket would itself know to load the values only when
 they are explicitly requested, and that the Persistence of the objects just
 merely meant that the database didn't keep track of changes of nonpersistent
 objects.  I will try copying my dictionaries to Persistent Mappings for now.

 Something like this should lead to smaller bucket objects where the record
 data is only loaded when you access the values of the btree:

  >>> from BTrees.IOBTree import IOBTree
  >>> bt = IOBTree()
  >>> from persistent import Persistent
  >>> class Record(Persistent):
  ...     def __init__(self, data):
  ...         super(Record, self).__init__()
  ...         self.data = data
  ...
  >>> rec = Record('my really long string data')
  >>> bt[1] = rec


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: zodb does not save transaction

2008-05-29 Thread Laurence Rowe

tsmiller wrote:


I have a bookstore that uses the ZODB as its storage.  It uses qooxdoo as
the client and CherryPy for the server.  The server has a 'saveBookById'
routine that works 'most' of the time.  However, sometimes the
transaction.commit() does NOT commit the changes and when I restart my
server the changes are lost.



This sounds like you are using mutable data types (like lists or dicts) in
the
non-persistence aware variants.



Christian, thanks for the reply.
When I save a book I save a dictionary where all of the keys are strings and
all of the values are strings. But what you say makes sense.  I keep
thinking that it must have something to do with the data itself.  I will
check very carefully to make sure that I am not saving anything but strings
in the book record.  Thanks.  Tom


The problem is not saving things that are not strings, but modifying a 
non-persistent object without notifying the parent persistent object 
that a change has happened and it needs to be saved.

e.g.

you have a persistent object (inherits from persistent.Persistent) pobj

>>> pobj.dict = {}
>>> transaction.commit()
>>> pobj.dict['foo'] = 'bar'
>>> transaction.commit()
>>> print pobj.dict
{'foo': 'bar'}

#restart your python process
>>> print pobj.dict
{}

Instead you must either tell zodb the object has changed:

>>> pobj.dict = {}
>>> transaction.commit()
>>> pobj.dict['foo'] = 'bar'
>>> pobj._p_changed = True # alternatively: pobj.dict = pobj.dict
>>> transaction.commit()
>>> print pobj.dict
{'foo': 'bar'}

#restart your python process
>>> print pobj.dict
{'foo': 'bar'}

Or use a persistence-aware replacement.

>>> from persistent.mapping import PersistentMapping
>>> pobj.dict = PersistentMapping()
>>> transaction.commit()
>>> pobj.dict['foo'] = 'bar'
>>> transaction.commit()
>>> print pobj.dict
{'foo': 'bar'}

#restart your python process
>>> print pobj.dict
{'foo': 'bar'}

The same principles apply to other mutable non-persistent objects, such 
as lists.


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: PGStorage

2008-01-23 Thread Laurence Rowe
PGStorage does require packing currently, but it would be fairly trivial 
to change it to only store single revisions. Postgres would still ensure 
MVCC. Then you just need to make sure the Postgres auto-vacuum daemon is 
running.


Laurence

David Pratt wrote:
Yes, Shane had done some benchmarking about a year or so ago. PGStorage 
was actually faster with small writes but slower for larger ones. As far 
as packing, as a zodb implementation, packing is still required to 
reduce the size of data in Postgres. BTW Stephan, where is Lovely using 
it - a site example? I had read some time ago that they were exploring 
it but not that it was being used.


Regards,
David

Stephan Richter wrote:

On Tuesday 22 January 2008, Dieter Maurer wrote:

OracleStorage was abandoned because it was almost an order
of magnitude slower than FileStorage.


Actually, Lovely Systems uses PGStorage because it is faster for them.

Regards,
Stephan

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: ZODB Benchmarks

2007-11-02 Thread Laurence Rowe

Matt Hamilton wrote:

David Binger dbinger at mems-exchange.org writes:



On Nov 2, 2007, at 6:20 AM, Lennart Regebro wrote:


Lots of people don't do nightly packs, I'm pretty sure such a process
needs to be completely automatic. The question is whether doing it in
a separate process in the background, or every X transactions, or every
X seconds, or something.

Okay, perhaps the trigger should be the depth of the small-bucket tree.


That may just end up causing delays periodically in transactions... ie delays
that the user sees, as opposed to doing it via another thread or something.  But
then as only one thread would be doing this at a time it might not be too bad.

-Matt


ClockServer sections can now be specified in zope.conf. If you specify 
them with a period of say 10 mins (or even 2) then the queue should 
never get too large, and the linear search time is not a problem as n is 
small.


Essentially you end up with a solution very similar to QueueCatalog but 
with the queue being searchable.


The pain is then in modifying all of the indexes to search the queue in 
addition to their standard data structures.


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: ZODB Benchmarks

2007-10-31 Thread Laurence Rowe
It looks like ZODB performance in your test has the same O(log n) 
performance as PostgreSQL checkpoints (the periodic drops in your 
graph). This should come as no surprise. B-Trees have a theoretical 
Search/Insert/Delete time complexity equal to the height of the tree, 
which is (up to) log(n).


So why is PosgreSQL so much faster? It's using a Write-Ahead-Log for 
inserts. Instead of inserting into the (B-Tree based) data files at 
every transaction commit it writes a record to the WAL. This does not 
require traversal of the B-Tree and has O(1) time complexity. The 
penalty for this is that read operations become more complex: they must 
look first in the WAL and overlay those results with the main index. The 
WAL is never allowed to get too large, or its in-memory index would 
become too big.


If you are going to have this number of records -- in a single B-Tree -- 
then use a relational database. It's what they're optimised for.


Laurence

Roché Compaan wrote:

Well I finally realised that ZODB benchmarks are not going to fall from
the sky so compelled by a project that needs to scale to very large
numbers and a general desire to have real numbers I started to write
some benchmarks.

My first goal was to get a baseline and test performance for the most
basic operations like inserts and lookups. The first test tests BTree
performance (OOBTree to be specific) and inserts instances of a persistent
class into a BTree. Each instance has a single attribute that is 1K in
size. The test tries out different commit intervals - the first
iteration commits every 10 inserts, the second iteration commits every
100 inserts and the last one commits every 1000 inserts. I don't have
results for the second and third iterations since the first iteration
takes a couple of hours to complete and I'm still waiting for the
results on the second and third iteration.

The results so far are worrying in that performance deteriorates
logarithmically. The test kicks off with a bang at close to 750 inserts
per second, but after 1 million objects the insert rate drops to 260
inserts per second and at 10 million objects the rate is not even 60
inserts per second. Why?

In an attempt to determine if this drop in performance is normal I
created a test with Postgres purely to observe transaction rate and not
to compare it with the ZODB. In Postgres the transaction rate hovers
around 2700 inserts throughout the test. There are periodic drops but I
guess these are times when Postgres flushes to disc. I was hoping to
have a consistent transaction rate in the ZODB too. See the attached
image for the comparison. I also attach csv files of the data collected
by both tests.

During the last Plone conference I started a project called zodbbench
available here:

https://svn.plone.org/svn/collective/collective.zodbbench

The tests are written as unit tests and are run with a testrunner
script. The project uses buildout to make it easy to get going.
Unfortunately installing it with buildout on some systems seems to lead
to weird import errors that I can't explain so I would appreciate it if
somebody with buildout fu can look at it. 


What I would appreciate more though is an explanation of the drop in
performance or alternatively, why the test is insane ;-)








___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: AW: diploma thesis: ZODB Indexing

2007-09-05 Thread Laurence Rowe

Christian Theune wrote:
<snip />

We imagine we need two kinds of components to make this work:

1. A query processor that could look like:

 class IQueryProcessor(Interface):
 
     def query(...):
         """Returns a list of matching objects. The parameters are
         specific to the query processor in use.
         """


Alternatively, as the signature of the only method isn't specified
anyway, we could make each query processor define its own interface
instead.

2. An object collection that serves two purposes:

a) maintain indexes

b) provide a low-level query API that is rich enough to let different
query processors e.g. for SQL, xpath, ... work against them.

This is the one that needs most work to get the separation of concerns
right. One split we came up with are the responsibilities to define:

- which objects to index
- how to store the indexes
- how to derive the structural relations between objects

Those could be separated into individual components and make the object
collection a component that joins those together.

On the definition of indexes: we're not sure whether a generic set of
indexes will be sufficient (e.g. the three indexes from XISS - class
index, attribute index, structural index) or do those need to be
exchanged? 


For our ad-hoc querying we certainly don't want to have to set up
specialised indexes to make things work, but maybe optional indexes
could be used when possible -- just like RDBMS.



Make sure you take a look at SQLAlchemy's implementation of this, 
sqlalchemy.orm.query.


RDBMS do not get fast querying for free... They just revert to a 
complete record scan when they do not have an index - analogous to the 
find tab in the ZMI. As anyone who has ever queried such a database can 
attest, it ain't quick. (RDBMSs tend to create implicit indexes on 
primary and foreign keys also.)


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: checking what refers to an object in zodb

2007-05-04 Thread Laurence Rowe

Chris,

I think you're looking at forward references when you want to look at 
back references.


This might help: http://plone.org/documentation/how-to/debug-zodb-bloat

(you might have to change the refmap to be in a zodb with that much data 
though)


Laurence

Chris Withers wrote:

Hi All,

We have a big(ish) zodb, which is about 29GB in size.
Thanks to the laughable difficulty of getting larger disks in big 
corporates, we've been looking into what's taking up that 29GB and were 
a bit surprised by the results.


Using space.py from the ZODBTools in Zope 2.9.4, it turns out that we 
have a lot of PersistentMapping's:


 990,359  13,555,382,871  Persistence.mapping.PersistentMapping

So, that's almost half of the 29GB!

AT's default storage is a PersistentMapping called _md so this isn't too 
surprising. However, when looking into it, it turns out that half of the 
PersistentMapping's actually appear to be workflow_history's from 
DCWorkflow.


To try and find out which objects were referencing all these workflow 
histories, we tried the following starting with one of the oid of these 
histories:


from ZODB.FileStorage import FileStorage
from ZODB.serialize import referencesf

fs = FileStorage(path, read_only=1)
data, serialno = fs.load(oid, '')
refs = referencesf(data)

To our surprise, all of the workflow histories returned an empty list 
for refs.


What does this mean? Is there a bug that means these objects are hanging 
around even though there are no references? Are we using the wrong 
method to find references to these objects?


(if it helps, we pack to 1 day and each pack removes between 0.5GB and 
1GB from the overall size)


If there's any more info that would be helpful here, please ask away...

cheers,

Chris



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Supporting a DataManager without Two Phase Commit

2007-05-04 Thread Laurence Rowe

Hi,

Several people have made SQLAlchemy integrations recently. SQLAlchemy 
does not support Two Phase Commit (2PC) so correctly tying it in with 
zope's transactions is tricky. With multiple One Phase Commit (1PC) 
DataManagers the problem is of course intractable, but given the 
popularity of mappers like SQLAlchemy I think Zope should support a 
single 1PC DataManager.


This websphere document describes a method to integrate a single 1PC 
resource with 2PC resources:

http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/lao/tasks/tla_ep.html

Following a discussion with several of the sqlalchemy integration 
authors on #plone today we came up with the following hack to implement 
this:

http://dev.plone.org/collective/browser/collective.lead/trunk/collective/lead/tx.py

The DataManager is given a high sortKey to ensure that it is considered 
last, and commits in tpc_vote, before the other (2PC) DataManagers' 
tpc_finish methods are called.
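
A minimal sketch of that arrangement (not the actual collective.lead code; 
'connection' here stands in for anything with commit()/rollback(), such as an 
SQLAlchemy connection):

  class OnePhaseDataManager(object):

      def __init__(self, connection):
          self.connection = connection

      def sortKey(self):
          # Sorts after well-behaved two-phase resources, so this manager
          # is voted last.
          return '~~one-phase-resource'

      def tpc_begin(self, transaction):
          pass

      def commit(self, transaction):
          pass

      def tpc_vote(self, transaction):
          # The single one-phase resource really commits here. If this
          # raises, the two-phase resources can still abort; if it succeeds,
          # their tpc_finish calls follow.
          self.connection.commit()

      def tpc_finish(self, transaction):
          pass

      def tpc_abort(self, transaction):
          self.connection.rollback()

      def abort(self, transaction):
          self.connection.rollback()

  # Joined into the current transaction with something like:
  #   import transaction
  #   transaction.get().join(OnePhaseDataManager(connection))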


The hack obviously relies on only one DataManager making use of the 
trick. It would be nice to have this supported directly so that an 
error could be thrown when more than one 1PC DataManager joined a 
transaction.


This could be implemented by changing the signature of 
transaction._transaction.Transaction.join to have an optional 
single_phase argument (default would be False). The 1PC resource would 
then be registered separately from the 2PC resources and _commitResources 
would call commit on the 1PC resource between tpc_vote and tpc_finish.


If you think this would be helpful I'll try and supply a patch (need to 
look into the detail of how failed transactions are cleaned up).


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: KeyError / POSKeyError

2007-03-23 Thread Laurence Rowe
You need to provide the full traceback so we can tell where it is coming 
from.


My guess (though I'm surprised by the particular error) is that you have 
perhaps got content owned by users in a user folder outside the site 
that is no longer accessible when you mount the database on its own. If 
that is the case then you need to write a script to fix up the 
__ac_local_roles__ on the affected objects.


Laurence

Tim Tisdall wrote:

  Here's the thing...  I get a KeyError if that ZODB is on its own,
but if I create a fammed-old object that's similar to what it's
looking for, it will then throw a POSKeyError.

  The Plone instance was created fresh and then only the file
contents of the old site were copied over to the new instance.  The
migration of the old Plone site didn't work, but it did manage to make
it so I could access the files contained within and copy them over.  I
didn't copy over any stylings, products, users, widget things...  I'm
pretty sure I just copied over AT types and a few basic zope files
(like DTML files and zope page templates).

  -Tim

On 3/23/07, Christian Theune [EMAIL PROTECTED] wrote:

Hi,

Can you tell whether you get a KeyError or a POSKeyError?

If you get a KeyError, it's likely that the app (Plone) is broken, e.g.
during the migration you mentioned.

A POSKeyError would (very likely) not talk about a a key like
'fammed-old', so I suspect you don't have a corruption in your
storage/database but your application.

Christian

Am Freitag, den 23.03.2007, 12:04 -0400 schrieb Tim Tisdall:
   I've got a 1gb ZODB that contains a single plone site and I'm not
 able to access any part of it via the ZMI.  It keeps saying that it's
 looking for key fammed-old which is another plone site in another
 ZODB file.  Basically I managed to partly migrate a Plone 2.0 to Plone
 2.5 and then copied over the file contents from that instance into a
 new Plone instance.  I have no idea why the new one would be
 referencing the old one, but it seemed to always throw this error if
 the old database was unmounted.
   I've tried several cookbook fixes I've found, but the problem is
 that the plone instance itself is throwing the KeyError.  Deleting the
 whole plone instance is not going to help me much.  Any suggestions?
   I've also tried running the fsrecovery.py, but it simply makes a
 complete duplicate of the file.  fstest.py doesn't seem to find any
 errors.  fsrefs.py finds a series of errors, but I have no idea what
 to do with that information.  It seems that it's finding that it's
 referencing fammed-old and that that doesn't exist.

   -Tim
 ___
 For more information about ZODB, see the ZODB Wiki:
 http://www.zope.org/Wikis/ZODB/

 ZODB-Dev mailing list  -  ZODB-Dev@zope.org
 http://mail.zope.org/mailman/listinfo/zodb-dev
--
gocept gmbh  co. kg - forsterstraße 29 - 06112 halle/saale - germany
www.gocept.com - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: roll back filestorage zodb to a certain date?

2007-03-21 Thread Laurence Rowe

Jim Fulton wrote:
<snip />

I wasn't asking about implementation.

Here are some questions:

- Should this create a new FileStorage? Or should it modify the existing 
FileStorage in place?


Probably create a new one (analogous to a pack). Seems safer than 
truncating to me.




- Should this work while the FileStorage is being used?


I don't think this is important. If a new file is created it can open 
the existing one readonly anyhow.



- Should this behave transactional?


No need if it creates a new file

However it's done, it'll sure beat the iterate through transactions to 
find the offset for a particular time, then dd to create a truncated copy 
method that I use ;-)


Laurence

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Re: History-less FileStorage?

2006-12-29 Thread Laurence Rowe
I'm sure you're probably aware of these, but I thought I'd file this 
summary while they were in my head.


There is no history-less FileStorage. It is essentially a transaction log.

Directory Storage has Minimal.py, which is history-less and very simple, 
though it is not proven in production. It could be a good candidate for 
storing the catalogue, though I imagine you would want to rebuild it after 
an unclean shutdown of Zope in this case.

http://dirstorage.sourceforge.net/FAQ.html

BDBStorage never made it. http://wiki.zope.org/ZODB/BDBStorage.html

PGStorage does store the history. However it would be fairly simple to 
rework it not to (indeed it would simplify the code considerably). 
Performance is similar to or better than ZEO + FileStorage, though 
slower than local FileStorage.

http://sourceforge.net/projects/pgstorage


Laurence

Stefan H. Holek wrote:

Do we have a history-less (i.e. no-grow) FileStorage?

Thanks,
Stefan

--
Anything that, in happening, causes something else to happen,
causes something else to happen.  --Douglas Adams


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev