Re: [ZODB-Dev] polite advice request

2013-08-20 Thread Joerg Baach
On 19/08/13 00:39, Alan Runyan wrote:
> I just wrote up some thoughts on ZODB.
> Might be useful for others - doubtful - but maybe.

For me this is really useful. So, thanks a lot, lots of new things learned!

Cheers,

  Joerg
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] polite advice request

2013-08-19 Thread Christian Tismer

Issue resolved; see the end.

On 19.08.13 00:39, Alan Runyan wrote:

Would you implement a column store, and how would you do that?

Ditto.

So many Dittos, it sounds like a Rush Limbaugh talk show :)


"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

One app we have contains 26,344,368 objects.
ZODB is the least of its concerns.


It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)

This is fair.  ZODB is intimately tied to the application design so
it is a bit difficult for someone to qualify what they are doing
without having to explain the application design.

This sucks from a newbie's point of view, but it's reality.

I just wrote up some thoughts on ZODB.
Might be useful for others - doubtful - but maybe.

https://docs.google.com/document/d/12RGOTSMrl0CttkCZJ5rp-TSaakAY2Pn4VnWhVMcFMQw/edit?usp=sharing

Anyway, Tismer: if you write up more thoughts, I will read them.



Hey, nice write-up, thanks a lot!

On 19.08.13 09:33, Dylan Jay wrote:

In some ways the ZODB is less flexible. It requires you to understand more
about how you will access the data before you import it than an SQL
database does. This is because the data structure defines how you can query
it in a ZODB.
For example, if you need multiple indexes to your data, then to make it
efficient you might choose a different data structure, whereas in SQL you
can add indexes after the fact. Whichever way you go, however, you are
always better off thinking about how you will access your data first. For
example, when you reimport the data, do you need to do a lookup on each
item to see if it's there and merge, or will you just delete the lot and
start from scratch?

Having said this, you might look at a project like souper that tries to support
tabular-type data without having to think too much about the data structures.


I looked a bit into souper, maybe I'll try.

Right now I'm happy with this very dumb brute-force solution:

I turned all 25 tables into a column store - a very simple implementation
with no keys, nothing.
I just took the original table data, sorted it by primary key, and then
built a persistent list for each column.

This unoptimized solution has very little overhead. The primary key can be
searched by bisect, which is right now all we need.
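The scheme above can be sketched in plain Python. Ordinary lists stand in for ZODB's PersistentList, and the column names ("pzn", "price") are illustrative; in the real thing, each list would hang off the ZODB root object:

```python
import bisect

# Minimal sketch of the column store described above: one sorted list of
# primary keys plus one aligned list per column.  Plain lists stand in
# for ZODB PersistentLists; "pzn" and "price" are made-up column names,
# with the first column acting as the primary key.

def build_column_store(rows, columns):
    """Sort rows by the primary key (columns[0]) and split into columns."""
    rows = sorted(rows, key=lambda r: r[columns[0]])
    return {col: [r[col] for r in rows] for col in columns}

def lookup(store, pk_col, pk, col):
    """Find the row with primary key `pk` via bisect and return one cell."""
    keys = store[pk_col]
    i = bisect.bisect_left(keys, pk)
    if i == len(keys) or keys[i] != pk:
        raise KeyError(pk)
    return store[col][i]

store = build_column_store(
    [{"pzn": 2, "price": 5.0}, {"pzn": 1, "price": 3.5}],
    ["pzn", "price"],
)
print(lookup(store, "pzn", 1, "price"))  # 3.5
```

Since the key column is kept sorted, lookups are O(log n) without any index structure, which matches the "no keys, nothing" description.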

I used ZlibStorage, with a stunning effect:

The database is now 44.5 MB, it loads the few columns that we need
in a fraction of a second, and the original serialization format
took 44.4 MB as a ZIP file. :-D

So the former bloat of almost a GB is gone, versions are cheap, and I don't
try to do further reduction of size or calculate deltas between versions,
but happily use the small, absolute column store databases
which I calculate every two weeks, together with an index database.

cheers - chris

--
Christian Tismer :^)   
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [ZODB-Dev] polite advice request

2013-08-19 Thread Dylan Jay
In some ways the ZODB is less flexible. It requires you to understand more
about how you will access the data before you import it than an SQL
database does. This is because the data structure defines how you can query
it in a ZODB.
For example, if you need multiple indexes to your data, then to make it
efficient you might choose a different data structure, whereas in SQL you
can add indexes after the fact. Whichever way you go, however, you are
always better off thinking about how you will access your data first. For
example, when you reimport the data, do you need to do a lookup on each
item to see if it's there and merge, or will you just delete the lot and
start from scratch?

Having said this, you might look at a project like souper that tries to support
tabular-type data without having to think too much about the data structures.
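The point about extra indexes can be sketched like this. Plain dicts and sets stand in for the OOBTree/TreeSet a real ZODB app would use, and the table and column names are made up:

```python
# Sketch: in ZODB, a secondary index is a data structure you maintain
# yourself.  Plain dicts/sets stand in for BTrees here; "table" and
# "by_name" are illustrative names.

table = {}      # pk -> record tuple (the "table")
by_name = {}    # name -> set of pks (the hand-maintained extra index)

def insert(pk, name, price):
    table[pk] = (name, price)
    by_name.setdefault(name, set()).add(pk)

insert(1, "aspirin", 3.5)
insert(2, "aspirin", 4.0)
insert(3, "ibuprofen", 5.0)

# Query by the indexed column without scanning the whole table:
print(sorted(by_name["aspirin"]))  # [1, 2]
```

Unlike SQL's `CREATE INDEX`, every insert, update, and delete must keep the index in sync, which is why the access pattern has to be known up front.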


On 19/08/2013, at 1:09 AM, Jim Fulton  wrote:

> On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  
> wrote:
>> Hi Jim et all!
>> 
>> I am struggling with a weird data base, and my goal is to show off how
>> great this works with (zodb|durus, the latter already failed pretty much).
>> 
>> Just to give you an impression of the size of the problem:
>> 
>> There are about 25 tables, each with currently 450,000 records.
>> After all the changes since 20120101, there were 700,000 records involved
>> and morphed for each table.
>> 
>> These records have some relevant data, but extend to something like 95
>> additional columns which are pretty cumbersome.
>> 
>> This database is pretty huge and contains lots of irrelevant data.
>> 
>> When I create the full database in native dumb style (create everything
>> as tuples), this crap becomes huge and nearly untractable by Python.
>> 
>> I managed to build some versions, but see further:
>> 
>> In addition to the 25-table snapshot, this database mutates every 2 weeks!
>> Most of the time, there are a few thousand updates.
>> But sometimes, the whole database changes, because they decided to
>> remove and add some columns, which creates a huge update that changes
>> almost everything.
>> 
>> I am trying to cope with that in a better way.
>> I examined lots of approaches to cope with such structures and tried some
>> things with btree forests.
>> 
>> After all, it turned out that structural changes of the database (2 columns
>> removed, 5 inserted) result in huge updates with no real effect.
>> 
>> Question:
>> Did you have that problem, and can you give me some advice?
>> I was thinking of switching the database to a column-oriented layout, since
>> this way I could probably get rid of big deltas which just re-arrange very
>> many columns.
>> 
>> But the overhead for doing this seems to be huge, again.
>> 
>> Do you have a good implementation of a column store?
>> I would like to implement a database that tracks everything, but is able to
>> cope
>> with such massive but simple changes.
>> 
>> In effect, I don't want to keep all the modified records, but have some
>> function
>> that creates the currently relevant tuples on-demand.
>> Even that seems difficult. And the whole problem is quite trivial; it just
>> suffers from Python's tendency to create so very many objects.
>> 
>> 
>> 
>> So my question, again:
> 
> I doubt I understand them. :)
> 
>> - you have 25 tables
> 
> Of course, ZODB doesn't have tables.
> 
> We have applications with many more data types.
> 
> We also have applications with many more collections,
> which are often heterogeneous.
> 
> In ZODB data types and collections are generally
> orthogonal.
> 
> Good OO database design tries to avoid
> queries/joins in favor of object traversal.
> 
>> 
>> - tables are huge (500,000 to 1,000,000 records)
> 
> We have larger collections. 
> 
>> - highly redundant (very many things could be resolved by a function with
>> special cases)
>> 
>> - a new version comes every two weeks
>> 
>> - I need to be able to inquire every version
> 
> Not sure what this means.
> 
> 
>> How would you treat this?
> 
> I don't know what you're referring to as
> "this".
> 
> There are a number of strategies
> for schema migration, ranging from ones as simple
> as providing defaults for new attributes
> in classes, to custom __setstate__ scripts,
> to in-place data migration, to *potentially*
> database transformation during replication.
> 
>> What would you actually store?
> 
> Um, that's too vague a question.
> 
>> Would you generate a full DB every 2 weeks, or would you (as I do) try to
>> find a structure that knows about the differences?
> 
> I don't think I/we understand your problem well enough to
> answer.  If data has a very low shelf life, then replacing it frequently
> might make sense.  If the schema changes that frequently, I'd
> ask why.  If this is a data analysis application, you might be better
> served by tools designed for that.
> 
>> Is Python still the way to go, or should I stop this and use something like
>> PostgreSQL? (And I doubt that this would give a benefi

Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Christian Tismer

On 18.08.13 18:34, Jim Fulton wrote:

On Sun, Aug 18, 2013 at 12:17 PM, Christian Tismer  wrote:
...

We get a medication prescription database in a certain serialized format
which is standard in Germany for all pharmacy support companies.

This database comes in ~25 files == tables in a zip file every two weeks.
The DB is actually a structured set of SQL tables with references et al.

So you get an entire database snapshot every 2 weeks?


I actually did not want to change the design and simply created the table
structure that they have, using ZODB, with tables as btrees that contain
tuples for the records, so this is basically the SQL model, mimicked in
Zodb.

OK.  I don't see what advantage you hope to get from ZODB.


I want its flexibility. I need Python and ZODB to transform the data tables
before I understand them. I use Python to stress, inquire, and validate my
implementation, and their data structures, before I trust it and maybe turn
it (painfully) into an SQL DB. Maybe not at all, as I learn from playing
with ZODB.


Have you ever tried to "play" with an SQL DB?
This is very painful and boring to set up and get right.
I only do that after I have studied the data with Python.
In this case, simply looking at pickled huge dicts did not scale, because of
too much data. That was the reason to dive into ZODB. With success.




What is annoying is the fact that the database gets incremental updates
all the time: changed prices, packing info, etc.

Are these just data updates? Or schema updates too?


At first I was told that there are data updates only. Then, due to my
validation analysis during parsing, I found out that there were structural
schema changes as well. Some were just relaxations or strengthened
constraints, but there were three major changes lately that involved whole
tables by inserting and removing columns.
The whole catastrophe, so to say.

As always, when the customer swears "this will never happen", you should be
prepared to implement exactly that impossible case. :-)




We need to cope with millions of recipes that come from certain dates
and therefore need to inquire different versions of the database.

I don't understand this. What's a "recipe"?  Why do you need to
consider old versions of the database?



Not recipes, but prescriptions. (Unfortunately these two words are the same
in German.)

We get millions of these every month and have to use the right data from the
DB version which was active at that time when the prescription was issued.

That made me want to create a "time machine" interface to the DB without the
need to have several GB of that crap as slightly different variations of
basically the same stuff.
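The "time machine" lookup amounts to a bisect over the snapshot release dates: given a prescription's issue date, pick the snapshot that was active then. A sketch, with made-up dates:

```python
import bisect
from datetime import date

# Illustrative release dates of the biweekly snapshots, kept sorted.
releases = [date(2013, 7, 1), date(2013, 7, 15), date(2013, 8, 1)]

def snapshot_for(issued):
    """Return the release date of the snapshot active on `issued`."""
    i = bisect.bisect_right(releases, issued) - 1
    if i < 0:
        raise LookupError("prescription predates the first snapshot")
    return releases[i]

print(snapshot_for(date(2013, 7, 20)))  # 2013-07-15
```

The snapshot date can then key into whichever per-version structure (separate DBs, or shared column BTrees) holds the actual data.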

Made some promising experiments today with column btrees.
ZODB is performing well with 100 million buckets!

cheers - Chris




Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Christian Tismer

Leo,
You seem to understand me perfectly!
Have we met before?

ciao - chris

 On 19.08.13 00:31, Leonardo Rochael Almeida wrote:

AFAICT, Christian's problem is that an SQL database would not be such a
good fit, due to the "time travel" requirement. IIUC, he has to look
up records as they were in the past, including whatever fields they
had in the past, even if they're no longer part of the schema for the
current data. To do this in SQL:

  * EITHER he creates a new SQL database (or a new set of tables on the
same database) for each new revision of the incoming information, and
each database (or set of tables) would then be free to have its own
schema, but there would be lots of duplication in the data that hasn't
changed between databases,

  * OR he'll have to build a time-travel superstructure on his one
database [1], adding time range columns to each table indicating the
validity of each record, and having lots of duplicated records
differing only in the time-range and a few fields. Not to mention the
fact that the schema for these tables would contain the union of all
columns that were valid at any one point in time, and lots of NULLS in
these columns.

[1] http://en.wikipedia.org/wiki/Temporal_database

So, I believe Christian is considering the flexible (or rather,
non-existent) schema of ZODB (and perhaps the built-in time-travel
capabilities) as a pretty good fit for his problem.

But he seems to be worried about data volume and its impact on
performance. He's also wondering how to best design the storage of
this data on ZODB taking into account the fact that the schema changes
frequently.

If (as he indicates) he stores the data as tuples in BTrees (one BTree
per "table", keyed by the primary key of the original table), he'll be
forced to rewrite all the tuples of each BTree (table) that changes
schema, which could mean almost as much duplication as the "one SQL
Database per revision" case.

On the other hand, he seems to speculate that perhaps he could store
one BTree per table COLUMN (per revision?), keyed by the primary key
of the original table. This way, each new incoming data revision would
only need to touch the data that actually changed, and schema changes
would mean the deletion or addition of entire BTrees, w/o having to
touch the unchanged data.

Cheers,

Leo


On Sun, Aug 18, 2013 at 3:07 PM, Claudiu Saftoiu  wrote:

I wonder: if you have a problem for which an SQL database would be such a good 
fit that you're mimicking an SQL database with ZODB, why not just use an SQL 
database? It doesn't sound like you'll gain much from being able to persist 
objects, which is one of the main reasons to use an object database...


On Aug 18, 2013, at 12:17 PM, Christian Tismer  wrote:


On 18.08.13 17:09, Jim Fulton wrote:

On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  wrote:


Explaining very concisely, now.


I don't think I/we understand your problem well enough to answer. If data has a 
very low shelf life, then replacing it frequently might make sense. If the 
schema changes that frequently, I'd ask why. If this is a data analysis 
application, you might be better served by tools designed for that.

Is Python still the way to go, or should I stop this and use something like
PostgreSQL? (And I doubt that this would give a benefit, actually).

Ditto,


Would you implement a column store, and how would you do that?

Ditto.


Right now, everything gets too large, and I'm quite desperate. Therefore,
I'm
asking the master, which you definitely are!

"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

Beyond that there are lots of dimensions of scale that ZODB
doesn't handle well (e.g. large transaction rates, very
high availability).

It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)


Ok, just the sketch of it to make things clearer, don't waste time on this ;-)

We get a medication prescription database in a certain serialized format
which is standard in Germany for all pharmacy support companies.

This database comes in ~25 files == tables in a zip file every two weeks.
The DB is actually a structured set of SQL tables with references et al.

I actually did not want to change the design and simply created the table
structure that they have, using ZODB, with tables as btrees that contain
tuples for the records, so this is basically the SQL model, mimicked in Zodb.

What is annoying is the fact that the database gets incremental updates all
the time: changed prices, packing info, etc.
We need to cope with millions of recipes that come from certain dates
and therefore need to inquire different versions of the database.

I just hate the huge redundancy that these database versions would have
and tried to find a 

Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Alan Runyan
> > Would you implement a column store, and how would you do that?
>
> Ditto.

So many Dittos, it sounds like a Rush Limbaugh talk show :)

> "large" can mean many things. The examples you give don't
> seem very large in terms of storage, at least not for ZODB.

One app we have contains 26,344,368 objects.
ZODB is the least of its concerns.

> It's really hard to make specific recommendations without
> knowing more about the problem. (And it's likely that someone
> wouldn't be able to spend the time necessary to learn more
> about the problem without a stake in it. IOW, don't assume I'll
> read a much longer post getting into details. :)

This is fair.  ZODB is intimately tied to the application design so
it is a bit difficult for someone to qualify what they are doing
without having to explain the application design.

This sucks from a newbie's point of view, but it's reality.

I just wrote up some thoughts on ZODB.
Might be useful for others - doubtful - but maybe.

https://docs.google.com/document/d/12RGOTSMrl0CttkCZJ5rp-TSaakAY2Pn4VnWhVMcFMQw/edit?usp=sharing

Anyway, Tismer: if you write up more thoughts, I will read them.

Not guaranteeing a response.

cheers
-- 
Alan Runyan


Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Christian Tismer

Ah, thanks, mabe ;-)

On 18.08.13 19:56, Jim Fulton wrote:

On Sun, Aug 18, 2013 at 1:40 PM, [mabe]  wrote:


He meant prescription.

In German, Rezept is the word for both prescription and recipe (as in
cooking). Easy to confuse for us Germans in English :)

Great.  Now I don't know what he meant by prescription. :) Does it
matter?  Might it as easily be foos and bars?

Christian,

Are you saying that you might need to access items
from an old database that aren't in the current snapshot?


Yes, prescription, sorry.
Yes, we need to look into different versions of the continuously updated
database. As I do it now, this creates a slightly different, read-only
database every two weeks. Not that big a deal after I built the first DB
today; we can probably live with < 300 MB of database per version
(using zlibstorage).
It is just my optimizer brain, and the fact that the whole history of the
stuff since 2012-01-01 fits into 125 MB of ZIP files, as delta-updates.

There must be a solution that utilizes this incremental update stuff nicely.
I wanted to use a versioned variant of a btree, until I found out that even
the table layout changed a bit three times, which creates a huge update.

cheers - chris

p.s.:
I needed to patch zlibstorage for Python 3.
Where can I put a pull request?


Jim



On 08/18/2013 06:34 PM, Jim Fulton wrote:

On Sun, Aug 18, 2013 at 12:17 PM, Christian Tismer
 wrote:

We need to cope with millions of recipes that come from certain
dates and therefore need to inquire different versions of the
database.

I don't understand this. What's a "recipe"?  Why do you need to
consider old versions of the database?










Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Leonardo Rochael Almeida
AFAICT, Christian's problem is that an SQL database would not be such a
good fit, due to the "time travel" requirement. IIUC, he has to look
up records as they were in the past, including whatever fields they
had in the past, even if they're no longer part of the schema for the
current data. To do this in SQL:

 * EITHER he creates a new SQL database (or a new set of tables on the
same database) for each new revision of the incoming information, and
each database (or set of tables) would then be free to have its own
schema, but there would be lots of duplication in the data that hasn't
changed between databases,

 * OR he'll have to build a time-travel superstructure on his one
database [1], adding time range columns to each table indicating the
validity of each record, and having lots of duplicated records
differing only in the time-range and a few fields. Not to mention the
fact that the schema for these tables would contain the union of all
columns that were valid at any one point in time, and lots of NULLS in
these columns.

[1] http://en.wikipedia.org/wiki/Temporal_database

So, I believe Christian is considering the flexible (or rather,
non-existent) schema of ZODB (and perhaps the built-in time-travel
capabilities) as a pretty good fit for his problem.

But he seems to be worried about data volume and its impact on
performance. He's also wondering how to best design the storage of
this data on ZODB taking into account the fact that the schema changes
frequently.

If (as he indicates) he stores the data as tuples in BTrees (one BTree
per "table", keyed by the primary key of the original table), he'll be
forced to rewrite all the tuples of each BTree (table) that changes
schema, which could mean almost as much duplication as the "one SQL
Database per revision" case.

On the other hand, he seems to speculate that perhaps he could store
one BTree per table COLUMN (per revision?), keyed by the primary key
of the original table. This way, each new incoming data revision would
only need to touch the data that actually changed, and schema changes
would mean the deletion or addition of entire BTrees, w/o having to
touch the unchanged data.
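The per-column layout Leo describes can be sketched like this. Plain dicts stand in for ZODB BTrees, and the revision and column names are illustrative; the key idea is that an unchanged column is literally the same persistent object in both revisions:

```python
# Sketch: a revision is a mapping column-name -> column-"BTree" (dicts
# stand in for IOBTrees keyed by primary key).  A new revision shares
# every unchanged column object and copies only what changed.

rev1 = {
    "price": {1: 3.5, 2: 4.0},
    "name": {1: "aspirin", 2: "ibuprofen"},
}

# Two weeks later: one price changed, the schema is otherwise the same.
rev2 = dict(rev1)                       # share all columns by reference...
rev2["price"] = dict(rev1["price"])     # ...copy only the changed column
rev2["price"][2] = 4.2

# A schema change is just adding/removing a whole column mapping:
rev2["atc_code"] = {1: "N02BA01", 2: "M01AE01"}   # made-up values

print(rev2["name"] is rev1["name"])  # True: unchanged column is shared
```

With real BTrees the sharing is even finer-grained, since only the touched buckets of the copied column get new database records.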

Cheers,

Leo


On Sun, Aug 18, 2013 at 3:07 PM, Claudiu Saftoiu  wrote:
> I wonder: if you have a problem for which an SQL database would be such a good 
> fit that you're mimicking an SQL database with ZODB, why not just use an SQL 
> database? It doesn't sound like you'll gain much from being able to persist 
> objects, which is one of the main reasons to use an object database...
>
>
> On Aug 18, 2013, at 12:17 PM, Christian Tismer  wrote:
>
>> On 18.08.13 17:09, Jim Fulton wrote:
>>> On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  
>>> wrote:
>>> 
>>
>> Explaining very concisely, now.
>>
>>> I don't think I/we understand your problem well enough to answer. If data 
>>> has a very low shelf life, then replacing it frequently might make sense. 
>>> If the schema changes that frequently, I'd ask why. If this is a data 
>>> analysis application, you might be better served by tools designed for that.
 Is Python still the way to go, or should I stop this and use something like
 PostgreSQL? (And I doubt that this would give a benefit, actually).
>>> Ditto,
>>>
 Would you implement a column store, and how would you do that?
>>> Ditto.
>>>
 Right now, everything gets too large, and I'm quite desperate. Therefore,
 I'm
 asking the master, which you definitely are!
>>> "large" can mean many things. The examples you give don't
>>> seem very large in terms of storage, at least not for ZODB.
>>>
>>> Beyond that there are lots of dimensions of scale that ZODB
>>> doesn't handle well (e.g. large transaction rates, very
>>> high availability).
>>>
>>> It's really hard to make specific recommendations without
>>> knowing more about the problem. (And it's likely that someone
>>> wouldn't be able to spend the time necessary to learn more
>>> about the problem without a stake in it. IOW, don't assume I'll
>>> read a much longer post getting into details. :)
>>>
>>
>> Ok, just the sketch of it to make things clearer, don't waste time on this 
>> ;-)
>>
>> We get a medication prescription database in a certain serialized format
>> which is standard in Germany for all pharmacy support companies.
>>
>> This database comes in ~25 files == tables in a zip file every two weeks.
>> The DB is actually a structured set of SQL tables with references et al.
>>
>> I actually did not want to change the design and simply created the table
>> structure that they have, using ZODB, with tables as btrees that contain
>> tuples for the records, so this is basically the SQL model, mimicked in Zodb.
>>
>> What is annoying is the fact that the database gets incremental updates
>> all the time: changed prices, packing info, etc.
>> We need to cope with millions of recipes that come from certain dates
>> and therefore need to inquire different versions of the database.
>>
>> I just hate the huge redundancy 

Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Christian Tismer

Hi Claudiu,

On 18.08.13 20:07, Claudiu Saftoiu wrote:

I wonder: if you have a problem for which an SQL database would be such a good 
fit that you're mimicking an SQL database with ZODB, why not just use an SQL 
database? It doesn't sound like you'll gain much from being able to persist 
objects, which is one of the main reasons to use an object database...


This is because I hate to create DB servers in the first place, lose all
the flexibility of Python, create import scripts which deal with the
limitations of the RDBMS, ...

Of course, it probably makes sense to switch to an SQL database in the end.
I just wanted to keep things in Python as long as possible, to explore the
data and not have to understand the relations in the first place.

I need to squeeze and treat and brush the data before I use something else.
This is pretty much like switching from Python to C - it is the very last
thing that I want to do, because Python -> SQL DB is like Python -> C:
you are carving things into stone, get lots of constraints, and lose
flexibility.


In this case I was a bit over the top, but I'm already quite pleased with
today's approach; 25 btrees of namedtuple records are very nice to explore.
Utilizing a tuple cache (also as zodb/durus), I can create and save the
database in 20 minutes, resulting in a compressed size of 300 MB. Quite a
starter...

cheers - chris



On Aug 18, 2013, at 12:17 PM, Christian Tismer  wrote:


On 18.08.13 17:09, Jim Fulton wrote:

On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  wrote:


Explaining very concisely, now.


I don't think I/we understand your problem well enough to answer. If data has a 
very low shelf life, then replacing it frequently might make sense. If the 
schema changes that frequently, I'd ask why. If this is a data analysis 
application, you might be better served by tools designed for that.

Is Python still the way to go, or should I stop this and use something like
PostgreSQL? (And I doubt that this would give a benefit, actually).

Ditto,


Would you implement a column store, and how would you do that?

Ditto.


Right now, everything gets too large, and I'm quite desperate. Therefore,
I'm
asking the master, which you definitely are!

"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

Beyond that there are lots of dimensions of scale that ZODB
doesn't handle well (e.g. large transaction rates, very
high availability).

It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)


Ok, just the sketch of it to make things clearer, don't waste time on this ;-)

We get a medication prescription database in a certain serialized format
which is standard in Germany for all pharmacy support companies.

This database comes in ~25 files == tables in a zip file every two weeks.
The DB is actually a structured set of SQL tables with references et al.

I actually did not want to change the design and simply created the table
structure that they have, using ZODB, with tables as btrees that contain
tuples for the records, so this is basically the SQL model, mimicked in Zodb.

What is annoying is the fact that the database gets incremental updates all
the time: changed prices, packing info, etc.
We need to cope with millions of recipes that come from certain dates
and therefore need to inquire different versions of the database.

I just hate the huge redundancy that these database versions would have
and tried to find a way to put this all into a single Zodb with a way to
time-travel to every version.

The weird thing is that the DB also changes its structure over time:

- new fields are added, old fields dropped.

That's the reason why I thought to store the tables by column, each column
being a BTree of its own. Is that feasible at all?

Of the 25 tables, there are 4 quite large, like
4 tables x 500,000 rows x 100 columns,
== 200,000,000 cells in one database.

With a btree bucket size of ~60, this gives ~3,333,333 buckets.
With multiple versions, this will be even more.
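The arithmetic above can be checked mechanically (the bucket fan-out of ~60 is just the rough figure assumed in the mail, not an exact BTrees constant):

```python
# Back-of-envelope check of the numbers quoted above.
tables, rows, cols = 4, 500_000, 100
cells = tables * rows * cols            # one BTree entry per cell

bucket_size = 60                        # assumed rough bucket fan-out
buckets = cells // bucket_size

print(cells, buckets)  # 200000000 3333333
```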

-- Can Zodb handle so many objects and still open the db fast?
-- Or will the huge index kill performance?

That's all I'm asking before doing another experiment ;-)

but don't waste time, just telling you the story -- chris

--
Christian Tismer :^)   
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/


Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Claudiu Saftoiu
I wonder: if you have a problem that an SQL database would be so good for that
you're mimicking an SQL database with ZODB, why not just use an SQL database? It
doesn't sound like you'll gain much from being able to persist objects, which is
one of the main reasons to use an object database...


On Aug 18, 2013, at 12:17 PM, Christian Tismer  wrote:

> On 18.08.13 17:09, Jim Fulton wrote:
>> On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  
>> wrote:
>> 
> 
> Explaining very concisely, now.
> 
>> I don't think I/we understand your problem well enough to answer. If data 
>> has a very low shelf life, then replacing it frequently might make sense. If 
>> the schema changes that frequently, I'd ask why. If this is a data analysis 
>> application, you might be better served by tools designed for that.
>>> Is Python still the way to go, or should I stop this and use something like
>>> PostgreSQL? (And I doubt that this would give a benefit, actually).
>> Ditto.
>> 
>>> Would you implement a column store, and how would you do that?
>> Ditto.
>> 
>>> Right now, everything gets too large, and I'm quite desperate. Therefore,
>>> I'm
>>> asking the master, which you definitely are!
>> "large" can mean many things. The examples you give don't
>> seem very large in terms of storage, at least not for ZODB.
>> 
>> Beyond that there are lots of dimensions of scale that ZODB
>> doesn't handle well (e.g. large transaction rates, very
>> high availability).
>> 
>> It's really hard to make specific recommendations without
>> knowing more about the problem. (And it's likely that someone
>> wouldn't be able to spend the time necessary to learn more
>> about the problem without a stake in it. IOW, don't assume I'll
>> read a much longer post getting into details. :)
>> 
> 
> Ok, just the sketch of it to make things clearer, don't waste time on this ;-)
> 
> We get a medication prescription database in a certain serialized format
> which is standard in Germany for all pharmacy support companies.
> 
> This database comes in ~25 files == tables in a zip file every two weeks.
> The DB is actually a structured set of SQL tables with references et al.
> 
> I actually did not want to change the design and simply created the table
> structure that they have, using ZODB, with tables as btrees that contain
> tuples for the records, so this is basically the SQL model, mimicked in Zodb.
> 
> What is annoying is that the database gets incremental updates all the time:
> changed prices, packing info, etc.
> We need to cope with millions of recipes that come from certain dates
> and therefore need to query different versions of the database.
> 
> I just hate the huge redundancy that these database versions would have
> and tried to find a way to put this all into a single Zodb with a way to
> time-travel to every version.
> 
> The weird thing is that the DB also changes its structure over time:
> 
> - new fields are added, old fields dropped.
> 
> That's the reason why I thought to store the tables by column, with each
> column being a BTree of its own. Is that feasible at all?
> 
> Of the 25 tables, there are 4 quite large, like
> 4 tables x 500,000 rows x 100 columns,
> == 200,000,000 cells in one database.
> 
> With a btree bucket size of ~60, this gives ~ 3,333,333 buckets.
> With multiple versions, this will be even more.
> 
> -- Can Zodb handle so many objects and still open the db fast?
> -- Or will the huge index kill performance?
> 
> That's all I'm asking before doing another experiment ;-)
> 
> but don't waste time, just telling you the story -- chris
> 
> 
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Jim Fulton
On Sun, Aug 18, 2013 at 1:40 PM, [mabe]  wrote:
> He meant prescription.
>
> In German, Rezept is the word for both prescription and recipe (as in
> cooking). Easy for us Germans to confuse in English :)

Great.  Now I don't know what he meant by prescription. :) Does it
matter?  Might it as easily be foos and bars?

Christian,

Are you saying that you might need to access items
from an old database that aren't in the current snapshot?

Jim


>
> On 08/18/2013 06:34 PM, Jim Fulton wrote:
>> On Sun, Aug 18, 2013 at 12:17 PM, Christian Tismer
>>  wrote:
>>> We need to cope with millions of recipes that come from certain
>>> dates and therefore need to inquire different versions of the
>>> database.
>>
>> I don't understand this. What's a "recipe"?  Why do you need to
>> consider old versions of the database?
>
>



-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [ZODB-Dev] polite advice request

2013-08-18 Thread [mabe]

He meant prescription.

In German, Rezept is the word for both prescription and recipe (as in
cooking). Easy for us Germans to confuse in English :)

On 08/18/2013 06:34 PM, Jim Fulton wrote:
> On Sun, Aug 18, 2013 at 12:17 PM, Christian Tismer 
>  wrote:
>> We need to cope with millions of recipes that come from certain 
>> dates and therefore need to inquire different versions of the 
>> database.
> 
> I don't understand this. What's a "recipe"?  Why do you need to 
> consider old versions of the database?




Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Jim Fulton
On Sun, Aug 18, 2013 at 12:17 PM, Christian Tismer  wrote:
...
> We get a medication prescription database in a certain serialized format
> which is standard in Germany for all pharmacy support companies.
>
> This database comes in ~25 files == tables in a zip file every two weeks.
> The DB is actually a structured set of SQL tables with references et al.

So you get an entire database snapshot every 2 weeks?

> I actually did not want to change the design and simply created the table
> structure that they have, using ZODB, with tables as btrees that contain
> tuples for the records, so this is basically the SQL model, mimicked in
> Zodb.

OK.  I don't see what advantage you hope to get from ZODB.

> What is annoying is that the database gets incremental updates all the time:
> changed prices, packing info, etc.

Are these just data updates? Or schema updates too?

> We need to cope with millions of recipes that come from certain dates
> and therefore need to query different versions of the database.

I don't understand this. What's a "recipe"?  Why do you need to
consider old versions of the database?

Jim



Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Christian Tismer

On 18.08.13 17:09, Jim Fulton wrote:

On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  wrote:



Explaining very concisely, now.

I don't think I/we understand your problem well enough to answer. If 
data has a very low shelf life, then replacing it frequently might 
make sense. If the schema changes that frequently, I'd ask why. If this 
is a data analysis application, you might be better served by tools 
designed for that.

Is Python still the way to go, or should I stop this and use something like
PostgreSQL? (And I doubt that this would give a benefit, actually).

Ditto.


Would you implement a column store, and how would you do that?

Ditto.


Right now, everything gets too large, and I'm quite desperate. Therefore,
I'm
asking the master, which you definately are!

"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

Beyond that there are lots of dimensions of scale that ZODB
doesn't handle well (e.g. large transaction rates, very
high availability).

It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)



Ok, just the sketch of it to make things clearer, don't waste time on this ;-)


We get a medication prescription database in a certain serialized format
which is standard in Germany for all pharmacy support companies.

This database comes in ~25 files == tables in a zip file every two weeks.
The DB is actually a structured set of SQL tables with references et al.

I actually did not want to change the design and simply created the table
structure that they have, using ZODB, with tables as BTrees that contain
tuples for the records, so this is basically the SQL model, mimicked in ZODB.
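
The row-oriented layout described here might look roughly like the sketch
below. This is illustrative only: a plain dict stands in for a
BTrees.OOBTree so the snippet runs without ZODB installed, and the schema,
table, and field names (COLUMNS, drugs, pzn, etc.) are invented:

```python
# Sketch of the row-oriented layout: one "table" per BTree, keyed by
# primary key, each value a tuple of column values (the SQL model mimicked).
# A plain dict stands in for BTrees.OOBTree so this runs without ZODB.

COLUMNS = ("pzn", "name", "price")  # hypothetical schema

def make_table(rows):
    """Build a table mapping primary key -> record tuple."""
    table = {}  # in real code: BTrees.OOBTree.OOBTree()
    for row in rows:
        table[row[0]] = tuple(row)
    return table

drugs = make_table([
    (111, "Aspirin", 4.99),
    (222, "Ibuprofen", 6.49),
])

# A record lookup is a key lookup; a column lookup is tuple indexing:
record = drugs[111]
price = record[COLUMNS.index("price")]
```

The drawback, as discussed below, is that any schema change reshapes every
tuple in every table.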


What is annoying is that the database gets incremental updates all the time:
changed prices, packing info, etc.
We need to cope with millions of recipes that come from certain dates
and therefore need to query different versions of the database.

I just hate the huge redundancy that these database versions would have
and tried to find a way to put this all into a single ZODB with a way to
time-travel to every version.

The weird thing is that the DB also changes its structure over time:

- new fields are added, old fields dropped.

That's the reason why I thought to store the tables by column, with each
column being a BTree of its own. Is that feasible at all?
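
The per-column layout being asked about can be sketched like this (again a
plain dict stands in for one BTree per column, and all names are invented;
this is a toy, not real ZODB code):

```python
# Column-oriented sketch: one mapping per column, each keyed by primary key.
# Adding or dropping a column then touches only one mapping instead of
# rewriting every record tuple.

def rows_to_columns(column_names, rows):
    """Transpose row tuples into one {pk: value} mapping per column."""
    columns = {name: {} for name in column_names}  # each dict ~ one OOBTree
    for row in rows:
        pk = row[0]
        for name, value in zip(column_names, row):
            columns[name][pk] = value
    return columns

cols = rows_to_columns(("pzn", "name", "price"),
                       [(111, "Aspirin", 4.99), (222, "Ibuprofen", 6.49)])

# Dropping a column is one deletion; the other columns never change shape:
del cols["price"]
# Adding a column populates a fresh mapping:
cols["vat"] = {pk: 0.19 for pk in cols["pzn"]}
```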

Of the 25 tables, there are 4 quite large, like
4 tables x 500,000 rows x 100 columns,
== 200,000,000 cells in one database.

With a btree bucket size of ~60, this gives ~ 3,333,333 buckets.
With multiple versions, this will be even more.

-- Can ZODB handle so many objects and still open the DB fast?
-- Or will the huge index kill performance?

That's all I'm asking before doing another experiment ;-)

but don't waste time, just telling you the story -- chris




Re: [ZODB-Dev] polite advice request

2013-08-18 Thread Jim Fulton
On Fri, Aug 16, 2013 at 11:49 PM, Christian Tismer  wrote:
> Hi Jim et al.!
>
> I am struggling with a weird database, and my goal is to show off how
> great this works with ZODB (Durus, the alternative, already pretty much failed).
>
> Just to give you an impression of the size of the problem:
>
> There are about 25 tables, each with currently 450,000 records.
> Counting all the changes since 2012-01-01, some 700,000 records have been
> involved and modified for each table.
>
> These records have some relevant data, but extend to something like 95
> additional columns which are pretty cumbersome.
>
> This database is pretty huge and contains lots of irrelevant data.
>
> When I create the full database in native dumb style (create everything
> as tuples), this crap becomes huge and nearly intractable for Python.
>
> I managed to build some versions, but see further:
>
> In addition to the 25-table snapshots, this database mutates every 2 weeks!
> Most of the time, there are a few thousand updates.
> But sometimes, the whole database changes, because they decided to
> remove and add some columns, which creates a huge update that changes
> almost everything.
>
> I am trying to cope with that in a better way.
> I examined lots of approaches to cope with such structures and tried some
> things with btree forests.
>
> After all, it turned out that structural changes of the database (2 columns
> removed, 5 inserted) result in huge updates with no real effect.
>
> Question:
> Did you have that problem, and can you give me some advice?
> I was thinking to switch the database to a column-oriented layout, since
> this way I could probably get rid of big deltas which just re-arrange very
> many columns.
>
> But the overhead for doing this seems to be huge, again.
>
> Do you have a good implementation of a column store?
> I would like to implement a database that tracks everything, but is able to
> cope
> with such massive but simple changes.
>
> In effect, I don't want to keep all the modified records, but have some
> function
> that creates the currently relevant tuples on-demand.
> Even that seems difficult. And the whole problem is quite trivial, it just
> suffers
> from Python's idea to create so very many objects.
>
> 
>
> So my question, again:

I doubt I understand them. :)

> - you have 25 tables

Of course, ZODB doesn't have tables.

We have applications with many more data types.

We also have applications with many more collections,
which are often heterogeneous.

In ZODB data types and collections are generally
orthogonal.

Good OO database design tries to avoid
queries/joins in favor of object traversal.
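
As an illustration of traversal instead of joins (a sketch with invented
classes; in real ZODB code these would subclass persistent.Persistent, but
plain classes show the idea):

```python
# Instead of joining a 'prescriptions' table to a 'drugs' table by id,
# the prescription object simply references the drug object directly.

class Drug:
    def __init__(self, name, price):
        self.name = name
        self.price = price

class Prescription:
    def __init__(self, patient, drug):
        self.patient = patient
        self.drug = drug  # direct object reference, no foreign key

aspirin = Drug("Aspirin", 4.99)
rx = Prescription("Jane Doe", aspirin)

# The "join" is plain attribute traversal:
cost = rx.drug.price
```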

>
> - tables are huge (500,000 to 1,000,000 records)

We have larger collections. 

> - highly redundant (very many things could be resolved by a function with
> special cases)
>
> - a new version comes every two weeks
>
> - I need to be able to inquire every version

Not sure what this means.


> How would you treat this?

I don't know what you're referring to as
"this".

There are a number of strategies for
schema migration, ranging from something
as simple as providing defaults for new
attributes in classes, to custom __setstate__
scripts, to in-place data migration, to *potentially*
database transformation during replication.
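
The first two strategies can be sketched as follows. This is a plain-Python
toy with invented field names, not real ZODB code: in ZODB the class would
derive from persistent.Persistent, and __setstate__ would be invoked
automatically when an old pickle is loaded (here it is called by hand):

```python
class MedRecord:  # in ZODB: class MedRecord(persistent.Persistent)
    # Class-level default: old instances that lack the attribute
    # transparently see this value without any migration step.
    vat = 0.19

    def __setstate__(self, state):
        # Migrate an old on-disk state at load time: rename or drop
        # obsolete fields here. 'old_price_field' is a made-up example.
        if "old_price_field" in state:
            state["price"] = state.pop("old_price_field")
        self.__dict__.update(state)

# Simulate loading an instance stored under the old schema:
rec = MedRecord.__new__(MedRecord)
rec.__setstate__({"name": "Aspirin", "old_price_field": 4.99})
# rec.price is now 4.99; rec.vat falls back to the class default 0.19
```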

> What would you actually store?

Um, that's too vague a question.

> Would you generate a full DB every 2 weeks, or would you (as I do) try to
> find a structure that knows about the differences?

I don't think I/we understand your problem well enough to
answer.  If data has a very low shelf life, then replacing it frequently
might make sense.  If the schema changes that frequently, I'd ask why.
If this is a data analysis application, you might be better
served by tools designed for that.

> Is Python still the way to go, or should I stop this and use something like
> PostgreSQL? (And I doubt that this would give a benefit, actually).

Ditto.

> Would you implement a column store, and how would you do that?

Ditto.

>
> Right now, everything gets too large, and I'm quite desperate. Therefore,
> I'm
> asking the master, which you definitely are!

"large" can mean many things. The examples you give don't
seem very large in terms of storage, at least not for ZODB.

Beyond that there are lots of dimensions of scale that ZODB
doesn't handle well (e.g. large transaction rates, very
high availability).

It's really hard to make specific recommendations without
knowing more about the problem. (And it's likely that someone
wouldn't be able to spend the time necessary to learn more
about the problem without a stake in it. IOW, don't assume I'll
read a much longer post getting into details. :)

Jim



[ZODB-Dev] polite advice request

2013-08-16 Thread Christian Tismer

Hi Jim et al.!

I am struggling with a weird database, and my goal is to show off how
great this works with ZODB (Durus, the alternative, already pretty much failed).

Just to give you an impression of the size of the problem:

There are about 25 tables, each with currently 450,000 records.
Counting all the changes since 2012-01-01, some 700,000 records have been
involved and modified for each table.

These records have some relevant data, but extend to something like 95
additional columns which are pretty cumbersome.

This database is pretty huge and contains lots of irrelevant data.

When I create the full database in native dumb style (create everything
as tuples), this crap becomes huge and nearly intractable for Python.

I managed to build some versions, but see further:

In addition to the 25-table snapshots, this database mutates every 2 weeks!
Most of the time, there are a few thousand updates.
But sometimes, the whole database changes, because they decided to
remove and add some columns, which creates a huge update that changes
almost everything.

I am trying to cope with that in a better way.
I examined lots of approaches to cope with such structures and tried some
things with btree forests.

After all, it turned out that structural changes of the database (2 columns
removed, 5 inserted) result in huge updates with no real effect.

Question:
Did you have that problem, and can you give me some advice?
I was thinking to switch the database to a column-oriented layout, since
this way I could probably get rid of big deltas which just re-arrange very
many columns.
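
The delta idea mentioned above can be sketched like this: store one full base
snapshot and, per version, only the cells that changed plus the keys that
disappeared. Plain dicts stand in for BTrees, and the data is invented:

```python
def diff(old, new):
    """Compute the cells to change and the keys to delete
    to get from snapshot `old` to snapshot `new`."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = [k for k in old if k not in new]
    return changed, deleted

def apply_delta(base, changed, deleted):
    """Reconstruct a version from a base snapshot and its delta."""
    snap = dict(base)
    snap.update(changed)
    for k in deleted:
        del snap[k]
    return snap

v1 = {111: ("Aspirin", 4.99), 222: ("Ibuprofen", 6.49)}
v2 = {111: ("Aspirin", 5.29), 333: ("Paracetamol", 3.99)}

changed, deleted = diff(v1, v2)
# Only two changed cells and one deletion are stored for v2,
# yet every version remains reconstructible from the base:
restored = apply_delta(v1, changed, deleted)
```

Time travel to any version then means replaying deltas from the base (or
from periodic checkpoints), trading reconstruction cost for storage.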

But the overhead for doing this seems to be huge, again.

Do you have a good implementation of a column store?
I would like to implement a database that tracks everything, but is able to
cope with such massive but simple changes.

In effect, I don't want to keep all the modified records, but have some
function that creates the currently relevant tuples on demand.
Even that seems difficult. And the whole problem is quite trivial; it just
suffers from Python's idea to create so very many objects.



So my question, again:

- you have 25 tables

- tables are huge (500,000 to 1,000,000 records)

- highly redundant (very many things could be resolved by a function
  with special cases)

- a new version comes every two weeks

- I need to be able to inquire every version

How would you treat this?

What would you actually store?

Would you generate a full DB every 2 weeks, or would you (as I do) try to
find a structure that knows about the differences?

Is Python still the way to go, or should I stop this and use something like
PostgreSQL? (And I doubt that this would give a benefit, actually).

Would you implement a column store, and how would you do that?


Right now, everything gets too large, and I'm quite desperate. Therefore, I'm
asking the master, which you definitely are!

cheers -- Chris

