Re: [ZODB-Dev] Using zodb and blobs

2010-04-16 Thread Christian Theune

On 04/15/2010 06:00 PM, Adam GROSZER wrote:

Hello Christian,


Wednesday, April 14, 2010, 8:30:50 AM, you wrote:

CT  I don't think the transfer rate is actually that interesting. For small
CT  but many transactions the seek time/spinning speed should have the
CT  limiting influence.

CT  I've run the attached script a couple of times on my notebook, here's
CT  the results:

CT  0.11 909.090909091
CT  0.15 666.666666667
CT  0.2 500.0
CT  0.07 1428.57142857
CT  0.07 1428.57142857
CT  0.14 714.285714286

CT  The initial runs are a bit lower as they were interfered with by other
CT  applications writing to the disk.

CT  It's a notebook w/ Intel P9600, Seagate 7.2k SATA drive, 4GB RAM, Ubuntu
CT  10.04, linux 2.6.32, ext4

Something is wrong with the seek time.
Trying it on an Intel G1 80G SSD, it hardly goes over 1000.


Well, sounds like a broken fsync then. :/

--
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development



___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Using zodb and blobs

2010-04-15 Thread Adam GROSZER
Hello Christian,


Wednesday, April 14, 2010, 8:30:50 AM, you wrote:

CT I don't think the transfer rate is actually that interesting. For small
CT but many transactions the seek time/spinning speed should have the 
CT limiting influence.

CT I've run the attached script a couple of times on my notebook, here's 
CT the results:

CT 0.11 909.090909091
CT 0.15 666.666666667
CT 0.2 500.0
CT 0.07 1428.57142857
CT 0.07 1428.57142857
CT 0.14 714.285714286

CT The initial runs are a bit lower as they were interfered with by other
CT applications writing to the disk.

CT It's a notebook w/ Intel P9600, Seagate 7.2k SATA drive, 4GB RAM, Ubuntu
CT 10.04, linux 2.6.32, ext4

Something is wrong with the seek time.
Trying it on an Intel G1 80G SSD, it hardly goes over 1000.
(Notebook, P7400, 3GB RAM, Ubuntu 9.10, ext4)

0.14 714.285714286
0.14 714.285714286
0.1 1000.0
0.14 714.285714286

Just for fun, it's flying on a tmpfs

0.36 2777.77777778
0.37 2702.7027027
0.37 2702.7027027

VMware makes it fly also:
with 1000 runs
on a 4GHz E8400, Ubuntu 9.10, ext4, within VMware

0.24 4166.66666667
0.33 3030.3030303
0.26 3846.15384615
0.31 3225.80645161
0.29 3448.27586207

In the VM, tmpfs does not boost it a lot:

0.2 5000.0
0.2 5000.0
0.2 5000.0
0.19 5263.15789474
0.19 5263.15789474


-- 
Best regards,
 Adam GROSZER  mailto:agros...@gmail.com
--
Quote of the day:
People seem to enjoy things more when they know a lot of other people have been 
left out of the pleasure.  -  Russell Baker



Re: [ZODB-Dev] Using zodb and blobs

2010-04-15 Thread Jim Fulton
On Tue, Apr 13, 2010 at 8:42 PM, Nitro <ni...@dr-code.org> wrote:
 Am 14.04.2010, 04:08 Uhr, schrieb Laurence Rowe <l...@lrowe.co.uk>:

 Running your test script on my small amazon EC2 instance on linux
 takes between 0.0 and 0.04 seconds (I had to remove the divide by
 total to avoid a zero division error). 0.02 is 5000/s.

 I don't know how EC2 works in detail, but 5000 transactions per second
 sounds impossible to write to disk. Even 500 is impossible if your disk
 doesn't have VERY fast access times.

Unlike most other databases, ZODB records written to file storages are always
appended, so there is no seeking involved. The only seeking involved in writes
is that needed to read previous records, but if a test is simply writing the
same object over and over, or updating a small corpus, the previous record is
likely to be in the disk cache. Of course, other things happening on the
system will typically cause the disk heads to seek away from the end of the
database file, but you're unlikely to see that in a simple benchmark.
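
To illustrate the append-only pattern Jim describes, here is a minimal
sketch (my own, not from the thread; storage path is a placeholder): each
commit only ever grows Data.fs, so nothing is rewritten in place.

    import os
    import transaction
    from ZODB import DB
    from ZODB.FileStorage import FileStorage

    db = DB(FileStorage('Data.fs'))
    root = db.open().root()
    for i in range(3):
        root['x'] = i
        transaction.commit()
        # the file size is strictly increasing: records are appended
        print(os.path.getsize('Data.fs'))
    db.close()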

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Using zodb and blobs

2010-04-14 Thread Christian Theune
On 04/14/2010 03:30 AM, Nitro wrote:
 Am 14.04.2010, 04:39 Uhr, schrieb Tim Peters <tim.pet...@gmail.com>:

 [Nitro]
 ...
 I wonder if _commit is really *that* slow

 Six years ago I timed factor-of-100 speed differences due to using MS
 _commit() on WinXP at the time:

  https://mail.zope.org/pipermail/zodb-dev/2004-July/007720.html

 or if there's another (faster) function which can be called...

 No MS function that does the same thing.

 Seems like ZODB has a long history of discussions on this matter:

 http://www.mail-archive.com/zodb-dev@zope.org/msg01874.html

 There's even a proposal for improvement in that thread, also on the wiki:

 http://wiki.zope.org/ZODB/FsyncBehaviourSetting

 What I don't really get is why you should never use None on Windows. As
 far as I can judge from the various transaction rates in the thread Tim
 mentioned, fsync is just a no-op on Linux anyway (depending on the
 specific file system of course).

I'm pretty sure it's not. IIRC fsync is defined by POSIX and absolutely 
requires the implementor to flush data physically to disk, ensuring its 
persistence. If that doesn't hold true then all transactions are borked.

I've seen virtualised environments like VMware ESX lie about fsyncs from 
a virtual hardware perspective.

A similar issue with breaking fsync was the discussion around ext4 on 
notebooks, which would only actually flush disks every 10 seconds.

In the end it really depends on what you need your data for. If I stored 
the information that a customer paid me 50 EUR for something, and presented 
a screen telling him he'll receive some goods for it, then I'd rather stick 
with compliant transactions.

 I am almost tempted to do

 os.fsync = lambda fd: 0

 and rely on yesterday's backup. 0.49 j/k.

j/k?

-- 
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development


Re: [ZODB-Dev] Using zodb and blobs

2010-04-14 Thread Wichert Akkerman
On 4/14/10 08:24, Christian Theune wrote:

 I'm pretty sure it's not. IIRC fsync is defined by POSIX and absolutely
 requires the implementor to flush data physically to disk ensuring its
 persistency. If that doesn't hold true then all transactions are borked.

That was the problem with fsync on Linux: it effectively flushed all 
pending filesystem work, not just that for your current filehandle. That 
was needed to satisfy ordering constraints for the filesystem. And even 
then the result might be a lie, since disks or other bits of hardware 
can lie to you. It is generally better to use fdatasync() instead of 
fsync(), but you could still end up waiting much longer than you would 
expect.
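
For illustration, a minimal sketch of the call Wichert recommends
(POSIX-only; os.fdatasync is unavailable on Windows; the file name is a
placeholder):

    import os

    fd = os.open('Data.fs', os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    os.write(fd, b'transaction record')
    # flushes the file data; non-essential metadata (e.g. mtime) may be
    # skipped, avoiding extra journal work on some filesystems
    os.fdatasync(fd)
    os.close(fd)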

Wichert.


Re: [ZODB-Dev] Using zodb and blobs

2010-04-14 Thread Nitro
Am 14.04.2010, 09:24 Uhr, schrieb Christian Theune <c...@gocept.com>:

 What I don't really get is why you should never use None on Windows. As
 far as I can judge from the various transaction rates in the thread Tim
 mentioned, fsync is just a no-op on Linux anyway (depending on the
 specific file system of course).

 I'm pretty sure it's not. IIRC fsync is defined by POSIX and absolutely
 requires the implementor to flush data physically to disk, ensuring its
 persistence. If that doesn't hold true then all transactions are borked.

They are. See  
https://mail.zope.org/pipermail/zodb-dev/2004-July/007683.html . The  
important quote: "Linux will allow fsync() to return even if the data  
isn't actually on disk."

Also you can do the math yourself. If you have 10 ms average seek time,  
then you can only do 100 fsync/s. If you get more, there's buffering of  
some sort, be it in hardware or in the file system. Since you cannot rely  
on fsync to work here, I see fsync as a hopeful wish, but nothing you can  
depend on. Things you cannot rely on give a false sense of security, even  
if they might actually help in some cases.
That's why I trust Windows' _commit to do its job much more than I trust  
Linux's fsync to do its job. 50 transactions/s = 20 ms per transaction,  
which sounds reasonable.
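
The back-of-the-envelope numbers above, spelled out (a sketch):

    avg_seek = 0.010          # 10 ms average seek time
    print(1.0 / avg_seek)     # 100.0 -> at most ~100 real fsyncs/s
    print(1.0 / 50)           # 0.02  -> 20 ms per transaction at 50 tx/s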

 In the end it really depends on what you need your data for. If I stored
 the information that a customer paid me 50 EUR for something, and presented
 a screen telling him he'll receive some goods for it, then I'd rather stick
 with compliant transactions.

Yes, in my case it's nothing critical or related to money. If there's a  
hardware outage, a day of work is lost at worst. In case of corruption  
(which can happen also without fsync, as data within the file can just be  
garbled) you need a backup anyway.

 j/k?

just kidding.

 I don't think the transfer rate is actually that interesting. For small 
 but many transactions the seek time/spinning speed should have the 
 limiting influence.

Yes, seek time is important here. I don't recall the seek times of my  
hard disk offhand, but wanted to mention it's not a slow hard disk.

-Matthias


Re: [ZODB-Dev] Using zodb and blobs

2010-04-14 Thread Hanno Schlichting
On Wed, Apr 14, 2010 at 11:52 AM, Nitro <ni...@dr-code.org> wrote:
 Yes, in my case it's nothing critical or related to money. If there's a
 hardware outage, a day of work is lost at worst. In case of corruption
 (which can happen also without fsync, as data within the file can just be
 garbled) you need a backup anyway.

Usually you will only lose the last transaction and not a day's work.
The Data.fs is an append-only file, with one transaction
appended after another. If there's a garbled or incomplete write,
you'll typically lose the last transaction. The ZODB is smart enough
to detect broken transactions and skip them on restart.

I have witnessed one ZEO installation myself, where the physical
machine hosting the ZEO server restarted multiple times a day, over a
period of months. Nobody noticed for a long time, as the application
was accessible all the time and no data had been lost. Obviously this
wasn't a very write-intensive application. But it still showed me how
stable the ZODB really is.

Hanno


Re: [ZODB-Dev] Using zodb and blobs

2010-04-14 Thread Nitro
Am 14.04.2010, 14:45 Uhr, schrieb Hanno Schlichting <ha...@hannosch.eu>:

 Usually you will only lose the last transaction and not a day's work.
 The Data.fs is an append-only file, with one transaction
 appended after another. If there's a garbled or incomplete write,
 you'll typically lose the last transaction. The ZODB is smart enough
 to detect broken transactions and skip them on restart.

 I have witnessed one ZEO installation myself, where the physical
 machine hosting the ZEO server restarted multiple times a day, over a
 period of months. Nobody noticed for a long time, as the application
 was accessible all the time and no data had been lost. Obviously this
 wasn't a very write-intensive application. But it still showed me how
 stable the ZODB really is.

Yes, I agree with your opinion in general.

There's still a chance that broken transactions are written, IIUC (see  
https://mail.zope.org/pipermail/zodb-dev/2004-July/007683.html ):

 Doesn't this mean that if the system suddenly crashes in the middle of
 os.fsync, the Data.fs on disk will contain an incomplete transaction,
 but the transaction status byte would claim that the transaction is
 complete. Wouldn't that be bad?

 If that happened, perhaps.

The chance exists because fsync does not work as advertised on many  
systems. On the systems where it seems to work, the slowdown is massive.  
So I doubt the usefulness of using fsync in the current ZODB at all.

As your observations seem to hint, it's probably very unlikely to  
encounter this problem in practice. And I doubt Tim finally got Jim to pay  
him for a 48-hour pull-the-plug session :-) That's why I am not going to  
dig further into this and am satisfied with the current reliability.

-Matthias


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Nitro
Hello Tres,

thanks for your detailed answers!

Am 12.04.2010, 22:42 Uhr, schrieb Tres Seaver <tsea...@palladion.com>:

 Additionally I made some quick performance tests. I committed 1 kB sized
 objects and I can do about 40 transactions/s if one object is changed per
 transaction. For 100 kB objects it's also around 40 transactions/s. Only
 for object sizes bigger than that does the raw I/O throughput seem to
 start to matter.

 40 tps sounds low:  are you pushing blob content over the wire somehow?

No, that test was with a plain file storage. Just a plain Persistent  
object with a differently sized string and an integer attribute. I did  
something like:

1) create an object with attributes x (integer) and y (variably sized string)
2) for i in range(100): obj.x = i; transaction.commit()
3) measure the time taken for step 2
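
A minimal sketch of those steps (my reconstruction, not the original
script; the storage path, class name, and payload size are assumptions):

    import time
    import transaction
    from persistent import Persistent
    from ZODB import DB
    from ZODB.FileStorage import FileStorage

    class Payload(Persistent):      # hypothetical class name
        def __init__(self, size):
            self.x = 0              # integer attribute
            self.y = 'a' * size     # variably sized string attribute

    db = DB(FileStorage('test.fs'))
    conn = db.open()
    root = conn.root()
    root['obj'] = obj = Payload(1024)   # ~1 kB payload
    transaction.commit()

    start = time.time()
    for i in range(100):
        obj.x = i                   # change one object per transaction
        transaction.commit()        # each commit fsyncs the storage file
    elapsed = time.time() - start
    print('%.2f s, %.1f tx/s' % (elapsed, 100 / elapsed))
    db.close()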

 Still don't know the answers to these:

 - Does it make sense to use ZODB in this scenario? My data is not suited
 well for an RDBMS.

 YMMV.  I still default to using ZODB for anything at all, unless the
 problem smells very strongly relational.

Ok, the problem at hand certainly doesn't smell relational. It is more  
about storing lots of different data than querying it extensively. It's a  
mixture of digital asset management (the blobs are useful for this part)  
and projects which reference the assets. The projects are shared between  
the clients and will consist of a big tree with Persistent objects hooked  
up to it.

 - Are there more complications to blobs other than a slightly different
 backup procedure?

 You need to think about how the blob data is shared between ZEO clients
 (your appserver) and the ZEO storage server:  opinions vary here, but I
 would prefer to have the blobs living in a writable shared filesystem,
 in order to avoid the necessity of fetching their data over ZEO on the
 individual clients which were not the one pushing the blob into the
 database.

The ZEO server and clients will be in different physical locations, so I'd  
probably have to employ some shared filesystem which can deal with that.  
Speaking of locations of server and clients, is it a problem (as in, ZEO  
will perform very badly under these circumstances because it was not  
designed for this) if they are not in the same location (typical latency  
0-100 ms)?

 - Are there any performance penalties by using very large invalidation
 queues (i.e. 300,000 objects) to reduce client cache verification time?

 At a minimum, RAM occupied by that queue might be better used elsewhere.
  I just don't use persistent caches, and tend to reboot appservers in
 rotation after the ZEO storage has been down for any significant period
 (almost never happens).

In my case the clients might be down for a couple of days (typically 1 or  
2 days) and they should not spend 30 minutes in cache verification each  
time they reconnect. So if these 300k objects take up 1 kB each, then they  
occupy 300 MB of RAM, which I am fine with.

  From what I've read it only seems to consume memory.

 Note that the ZEO storage server makes copies of that queue to avoid
 race conditions.

Ok, I can see how copying and storing 300k objects is slow and can take up  
excessive amounts of memory.

Thanks,
-Matthias


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Nitro

40 tps sounds low:  are you pushing blob content over the wire somehow?


I have seen the ZEO storage committing transactions at least an order of
magnitude faster than that (e.g., when processing incoming newswire
feeds).  I would guess that there could have been some other latencies
involved in your setup (e.g., that 0-100ms lag you mention below).


See my attached test script. It outputs 45-55 transactions/s for a 100-byte  
payload. Maybe there's a very fundamental flaw in the way the test is set  
up. Note that I am testing on a regular desktop machine (Windows 7, WoW64,  
4GB RAM, 1TB hard disk capable of ~100 MB/s transfer rates).


The ZEO server and clients will be in different physical locations, so I'd
probably have to employ some shared filesystem which can deal with that.
Speaking of locations of server and clients, is it a problem (as in, ZEO
will perform very badly under these circumstances because it was not
designed for this) if they are not in the same location (typical latency
0-100 ms)?


That depends on the mix of reads and writes in your application.  I have
personally witnessed a case where the clients stayed up and serving
pages over a whole weekend in a clusterfsck where both the ZEO server
and the monitoring infrastructure went belly up.  This was for a large
corporate intranet, in case that helps:  the problem surfaced
mid-morning on Monday when the employee in charge of updating the lunch
menu for the week couldn't save the changes.


Haha, I hope they solved this critical problem in time!

In my case the clients might be down for a couple of days (typically 1 or
2 days) and they should not spend 30 minutes in cache verification each
time they reconnect. So if these 300k objects take up 1 kB each, then they
occupy 300 MB of RAM, which I am fine with.


If the client is disconnected for any period of time, it is far more
likely that just dumping the cache and starting over fresh will be a
win.  The 'invalidation_queue' is primarily to support clients which
remain up while the storage server is down or unreachable.


Yes, taking the verification time hit is my plan for now. However, dumping  
the whole client cache is something I'd like to avoid, since the app I am  
working on will not work over a corporate intranet and thus the bandwidth  
for transferring the blobs is limited (and so can take up considerable  
time). Maybe I am overestimating the whole client cache problem though.


Thanks again for your valuable advice,
-Matthias

[Attachment: test.py]


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Laurence Rowe
Running your test script on my small amazon EC2 instance on linux
takes between 0.0 and 0.04 seconds (I had to remove the divide by
total to avoid a zero division error). 0.02 is 5000/s.
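
A guarded variant of that rate computation (a sketch; the variable names
are assumed from the test script's description):

    total = 0.02      # measured elapsed seconds
    runs = 100
    rate = runs / total if total > 0 else float('inf')
    print(rate)       # 5000.0 tx/s, matching the figure above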

Laurence





Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Nitro
Am 14.04.2010, 04:08 Uhr, schrieb Laurence Rowe <l...@lrowe.co.uk>:

 Running your test script on my small amazon EC2 instance on linux
 takes between 0.0 and 0.04 seconds (I had to remove the divide by
 total to avoid a zero division error). 0.02 is 5000/s.

Thanks for running the test.

Intrigued by this extreme difference I've done a little run with cProfile,  
script attached. On my machine the 100 runs take ~2.65 seconds, and at  
least 2.55 seconds are spent in the nt.fsync function. That's an alias for  
os.fsync on Windows. According to the Python docs it calls the _commit C  
function on Windows. See  
http://msdn.microsoft.com/en-us/library/17618685(VS.80).aspx .

I wonder if _commit is really *that* slow or if there's another (faster)  
function which can be called...
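
A minimal sketch of that kind of profiling run (the benchmark function
name run_commits is hypothetical; the attached script is not preserved):

    import cProfile
    import pstats

    cProfile.run('run_commits(100)', 'commit.prof')
    stats = pstats.Stats('commit.prof')
    # on Windows, nt.fsync shows up at the top of this listing
    stats.sort_stats('time').print_stats(10)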

-Matthias


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Tim Peters
[Nitro]
 ...
 I wonder if _commit is really *that* slow

Six years ago I timed factor-of-100 speed differences due to using MS
_commit() on WinXP at the time:

https://mail.zope.org/pipermail/zodb-dev/2004-July/007720.html

 or if there's another (faster) function which can be called...

No MS function that does the same thing.


Re: [ZODB-Dev] Using zodb and blobs

2010-04-13 Thread Nitro
For the record, with

import os
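# caution: stubbing fsync to a no-op discards the flush that makes commits durable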
os.fsync = lambda fd: 0

at the top of the test app I get ~3700 tx/s.

-Matthias


Re: [ZODB-Dev] Using zodb and blobs

2010-04-12 Thread Tres Seaver

Nitro wrote:
 After lots of googling and browsing the source I can answer some of my  
 questions:
 
 - What's the difference between storing bigger objects as blobs and as
 plain large strings?
 
 Plain large strings cannot be streamed, for instance. Products like Zope
 chop up their file uploads into 64 kB chunks which are then stored as
 individual objects in the ZODB.

That was the strategy before blobs.  ZODB versions since 3.8 support
storage of blob data as files on the filesystem.

 - Can I stream in parts of a blob/large string without having to read all
 of it?
 
 I can get a file handle to a blob. Strings are always read as a whole.
 
 - Where can I find example code on zodb blobs? E.g. how do I save a blob,
 how do I read it back in?
 
 The ZODB/tests directory features a few blob doctests which provide all  
 the necessary code to get started. Having this on zodb.org would be nice  
 (especially since the doctests are already ReST-formatted).
 
 Additionally I made some quick performance tests. I committed 1 kB sized
 objects and I can do about 40 transactions/s if one object is changed per
 transaction. For 100 kB objects it's also around 40 transactions/s. Only
 for object sizes bigger than that does the raw I/O throughput seem to
 start to matter.

40 tps sounds low:  are you pushing blob content over the wire somehow?

 Still don't know the answers to these:
 
 - Does it make sense to use ZODB in this scenario? My data is not suited  
 well for an RDBMS.

YMMV.  I still default to using ZODB for anything at all, unless the
problem smells very strongly relational.

 - Are there more complications to blobs other than a slightly different  
 backup procedure?

You need to think about how the blob data is shared between ZEO clients
(your appserver) and the ZEO storage server:  opinions vary here, but I
would prefer to have the blobs living in a writable shared filesystem,
in order to avoid the necessity of fetching their data over ZEO on the
individual clients which were not the one pushing the blob into the
database.

 - Is it ok to use cross-database references? Or is this better avoided at  
 all cost?

I would normally avoid them out of habit.  They seem to work, though.

 And new questions:
 
 - Does the _p_invalidate hooking as outlined at  
 http://www.mail-archive.com/zodb-dev@zope.org/msg00637.html work reliably?

Never tried it, nor felt the need.

 - Are there any performance penalties by using very large invalidation  
 queues (i.e. 300,000 objects) to reduce client cache verification time?  

At a minimum, RAM occupied by that queue might be better used elsewhere.
 I just don't use persistent caches, and tend to reboot appservers in
rotation after the ZEO storage has been down for any significant period
(almost never happens).

  From what I've read it only seems to consume memory.

Note that the ZEO storage server makes copies of that queue to avoid
race conditions.
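
For reference, the queue size in question is configured on the ZEO server;
a zeo.conf sketch (the address and path are placeholders):

    <zeo>
      address 8100
      invalidation-queue-size 300000
    </zeo>

    <filestorage 1>
      path Data.fs
    </filestorage>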



Tres.
--
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com



Re: [ZODB-Dev] Using zodb and blobs

2010-04-11 Thread Nitro
After lots of googling and browsing the source I can answer some of my  
questions:

 - What's the difference between storing bigger objects as blobs and as
 plain large strings?

Plain large strings cannot be streamed, for instance. Products like Zope  
chop up their file uploads into 64 kB chunks which are then stored as  
individual objects in the ZODB.

 - Can I stream in parts of a blob/large string without having to read all
 of it?

I can get a file handle to a blob. Strings are always read as a whole.

 - Where can I find example code on zodb blobs? E.g. how do I save a blob,
 how do I read it back in?

The ZODB/tests directory features a few blob doctests which provide all  
the necessary code to get started. Having this on zodb.org would be nice  
(especially since the doctests are already ReST-formatted).
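
A hedged sketch of the basics, following those doctests (assumes ZODB 3.9,
where FileStorage accepts a blob_dir argument; names are placeholders):

    import transaction
    from ZODB import DB
    from ZODB.blob import Blob
    from ZODB.FileStorage import FileStorage

    db = DB(FileStorage('Data.fs', blob_dir='blobs'))
    root = db.open().root()

    # save a blob
    root['asset'] = blob = Blob()
    f = blob.open('w')
    f.write(b'lots of binary data')
    f.close()
    transaction.commit()

    # read it back in: open() returns a real file handle, so the data
    # can be streamed in chunks instead of loaded wholesale
    f = blob.open('r')
    chunk = f.read(65536)
    f.close()
    db.close()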

Additionally I made some quick performance tests. I committed 1 kB sized  
objects and I can do about 40 transactions/s if one object is changed per  
transaction. For 100 kB objects it's also around 40 transactions/s. Only  
for object sizes bigger than that does the raw I/O throughput seem to  
start to matter.

Still don't know the answers to these:

- Does it make sense to use ZODB in this scenario? My data is not suited  
well for an RDBMS.
- Are there more complications to blobs other than a slightly different  
backup procedure?
- Is it ok to use cross-database references? Or is this better avoided at  
all cost?

And new questions:

- Does the _p_invalidate hooking as outlined at  
http://www.mail-archive.com/zodb-dev@zope.org/msg00637.html work reliably?
- Are there any performance penalties by using very large invalidation  
queues (i.e. 300,000 objects) to reduce client cache verification time?  
 From what I've read it only seems to consume memory.

Thanks,
-Matthias