Hi Claudiu,

On 18.08.13 20:07, Claudiu Saftoiu wrote:
I wonder: if you have a problem that an SQL database would suit so well that
you're mimicking an SQL database with ZODB, why not just use an SQL database? It
doesn't sound like you'll gain much from being able to persist objects, which is
one of the main reasons to use an object database...


This is because I hate to set up DB servers in the first place, lose all the
flexibility of Python, create import scripts which deal with the limitations
of the RDBMS, ...

Of course, it probably makes sense to switch to an SQL database in the end.
I just wanted to keep things in Python as long as possible, to explore the
data without having to understand the relations first.

I need to squeeze and treat and brush the data, before I use something else.
This is pretty much like switching from Python to C - it is the very last thing
that I want to do, because Python -> SQLDB is like Python -> C:

You carve things into stone, get lots of constraints, and lose flexibility.

In this case I went a bit over the top, but I'm already quite pleased with today's
approach: 25 btrees of namedtuple records are very nice to explore.
Using a tuple cache (also as zodb/durus), I can create and save the database
in 20 minutes, resulting in a compressed size of 300 MB. Quite a starter...

cheers - chris


On Aug 18, 2013, at 12:17 PM, Christian Tismer <tis...@stackless.com> wrote:

On 18.08.13 17:09, Jim Fulton wrote:
On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer <tis...@stackless.com> wrote:
<snip>
Explaining very concisely, now.

I don't think I/we understand your problem well enough to answer. If data has a 
very low shelf life, then replacing it frequently might make sense. If the 
schema changes that frequently, I'd ask why. If this is a data analysis 
application, you might be better served by tools designed for that.
Is Python still the way to go, or should I stop this and use something like
PostgreSQL? (And I doubt that this would give a benefit, actually).
Ditto.

Would you implement a column store, and how would you do that?
Ditto.

Right now, everything gets too large, and I'm quite desperate. Therefore,
I'm asking the master, which you definitely are!
"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

Beyond that there are lots of dimensions of scale that ZODB
doesn't handle well (e.g. large transaction rates, very
high availability).

It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)

Ok, just the sketch of it to make things clearer, don't waste time on this ;-)

We get a medication prescription database in a certain serialized format
which is standard in Germany for all pharmacy support companies.

This database comes in ~25 files == tables in a zip file every two weeks.
The DB is actually a structured set of SQL tables with references et al.

I actually did not want to change the design and simply recreated their table
structure in ZODB, with tables as btrees that contain tuples for the
records, so this is basically the SQL model, mimicked in ZODB.

What is tedious is the fact that the database gets incremental updates all the
time: changed prices, packing info, etc.
We need to cope with millions of prescriptions that come from certain dates
and therefore need to query different versions of the database.

I just hate the huge redundancy that these database versions would have
and tried to find a way to put this all into a single Zodb with a way to
time-travel to every version.
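
One way to picture time-travel without storing every release in full is to keep, per key, only the values that changed, and to look up the latest one at or before a given date. The sketch below is an illustration under assumptions, not the design used here: `VersionedTable`, its methods, and the sample key are hypothetical, plain dicts stand in for persistent mappings, and deleted records are not handled.

```python
import bisect
from datetime import date

class VersionedTable:
    """Hypothetical store that keeps a value only when it changes."""

    def __init__(self):
        self._dates = {}    # key -> sorted list of release dates
        self._values = {}   # key -> values, parallel to _dates

    def put(self, key, release, value):
        dates = self._dates.setdefault(key, [])
        values = self._values.setdefault(key, [])
        if dates and release <= dates[-1]:
            raise ValueError("releases must be loaded in increasing order")
        if values and values[-1] == value:
            return  # unchanged since the last stored release
        dates.append(release)
        values.append(value)

    def get(self, key, as_of):
        """Return the value that was current on the date as_of."""
        dates = self._dates.get(key, [])
        i = bisect.bisect_right(dates, as_of)
        if i == 0:
            raise KeyError((key, as_of))
        return self._values[key][i - 1]

prices = VersionedTable()
prices.put("A100", date(2013, 7, 1), 499)
prices.put("A100", date(2013, 7, 15), 499)   # unchanged, nothing stored
prices.put("A100", date(2013, 8, 1), 525)
```

A prescription dated 20 July would then be priced with `prices.get("A100", date(2013, 7, 20))`, which reads the July value even though the August release is already loaded.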

The weird thing is that the DB also changes its structure over time:

- new fields are added, old fields dropped.

That's the reason why I thought to store the tables by column, with each column
being a BTree of its own. Is that feasible at all?
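
The by-column layout can be sketched as one mapping per column, each keyed by row id, so adding or dropping a field touches only that column. This is a minimal illustration with hypothetical names (`ColumnTable`, `insert`, `row`). Plain dicts are used so it runs without ZODB; passing a BTree class such as `BTrees.OOBTree.OOBTree` as `mapping_factory` is assumed to work but is not tested here, and nothing in it says how ZODB would behave at this scale.

```python
class ColumnTable:
    """Hypothetical table stored as one mapping per column."""

    def __init__(self, mapping_factory=dict):
        self._factory = mapping_factory
        self.columns = {}   # column name -> mapping of row id -> value

    def add_column(self, name):
        self.columns.setdefault(name, self._factory())

    def drop_column(self, name):
        self.columns.pop(name, None)

    def insert(self, row_id, **fields):
        for name, value in fields.items():
            self.add_column(name)
            self.columns[name][row_id] = value

    def row(self, row_id):
        """Reassemble one record from the columns that hold a value for it."""
        return {
            name: col[row_id]
            for name, col in self.columns.items()
            if row_id in col
        }

drugs = ColumnTable()
drugs.insert(1, name="Aspirin", price=499)
drugs.insert(2, name="Ibuprofen")
drugs.add_column("pack_size")          # a field that appears in a later release
drugs.columns["pack_size"][1] = 20
drugs.drop_column("price")             # a field that is dropped again
first = drugs.row(1)
```

Reading a whole record now costs one lookup per column, which is the price paid for cheap schema changes.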

Of the 25 tables, there are 4 quite large, like
4 tables x 500,000 rows x 100 columns,
== 200,000,000 cells in one database.

With a btree bucket size of ~60, this gives ~ 3,333,333 buckets.
With multiple versions, this will be even more.
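
The estimate above can be reproduced in a few lines. The figures are the ones quoted in this mail; the bucket size of 60 is the rough value given there, not a measured one. The second count assumes one BTree per column, so buckets never span columns and each column rounds up on its own. Interior tree nodes are not counted.

```python
tables, rows, columns = 4, 500_000, 100
bucket_size = 60  # rough figure from the mail, an assumption

cells = tables * rows * columns

# Estimate as in the mail: all cells divided by the bucket size.
buckets_flat = cells // bucket_size

# One BTree per column: round up per column, then multiply.
buckets_per_column = -(-rows // bucket_size)   # ceiling division
buckets_by_column = buckets_per_column * tables * columns

print(cells, buckets_flat, buckets_by_column)
```

Both counts land at roughly 3.3 million leaf buckets for a single version of the database.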

-- Can Zodb handle so many objects and still open the db fast?
-- Or will the huge index kill performance?

That's all I'm asking before doing another experiment ;-)

but don't waste time, just telling you the story -- chris

--
Christian Tismer             :^)   <mailto:tis...@stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


