On Saturday 10 August 2002 11:25 am, Johan Carlsson [Torped] wrote:
> At 08:59 2002-08-09 -0400, Casey Duncan said:
> >__record_schema__ is simply a dictionary which maps field names to column
> >positions (ints) so that the record knows the index of each field in the
> >record tuples.
> >See line 154 of Catalog.py to see how it is initialized to the Metadata
> >plus a few extra columns for catalog rid and scores.
> Hi Casey (and zope-dev),
> After some experimenting I realized that :-)
> One of the reasons I asked was because I am thinking about
> how to implement a "SELECT col1 as 'name', ..." type
> of feature for ZCatalogs.
> I'm not entirely sure it's a good idea to start with, but
> I'm thinking along the lines of large ZCatalogs (by large I mean
> a lot of columns in the self.data structure).
> If all columns are copied the brains would grow larger as well
> and by selecting explicitly which columns should be copied to
> the brain they would be lighter.
> Now that I understand how the data tuples are copied to the brain
> I'm not at all sure adding a filter when copying the tuple will optimize
> things, because of the overhead in the filter process.
This occurs lazily so the savings would be heavily dependent on the
application. For most web apps presenting small batches of records, the
savings in limiting columns returned would be pretty minimal.
The general usage is to put a minimal set of columns in metadata, only enough
to create a results page and load the objects in cases where either large,
dynamic or otherwise arbitrary data elements are needed.
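To make the lazy behaviour concrete, here is a minimal pure-Python sketch. It is not the ZCatalog code; the class names (Record, LazyRecords) and the field names are hypothetical. It only assumes what is described above: a schema dictionary mapping field names to column positions, and records that are built when a result is accessed rather than up front.

```python
# Illustrative sketch only; Record and LazyRecords are hypothetical names,
# not classes from ZCatalog.

class Record:
    """Tuple-backed record that resolves field names through a schema dict."""

    def __init__(self, schema, row):
        self._schema = schema  # field name -> column position (int)
        self._row = row        # the stored data tuple

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._row[self._schema[name]]
        except KeyError:
            raise AttributeError(name)


class LazyRecords:
    """Sequence that builds Record objects on access, not in advance."""

    def __init__(self, schema, rows):
        self._schema = schema
        self._rows = rows
        self.built = 0  # number of records actually constructed

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        self.built += 1
        return Record(self._schema, self._rows[index])


# Hypothetical schema: two metadata columns plus an extra column for the rid.
schema = {"title": 0, "modified": 1, "rid": 2}
rows = [("Doc %d" % i, "2002-08-10", i) for i in range(10000)]

results = LazyRecords(schema, rows)
batch = results[0:20]                 # a typical small results page
titles = [r.title for r in batch]
print(results.built, "records built out of", len(results))
```

With a batch of 20 out of 10000 rows, only 20 records are ever constructed, so trimming columns from each one saves little in this usage pattern.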
> (The way that I "solved" the group/calc part of my "project", I don't think
> it will lead to memory bloat. I'm going to implement a LazyGroupMap
> which takes an extra parameter (a list of IISets). Each brain created
> in the LazyMap will have methods for calculations directly on the self.data
> in the Catalog. The data itself will not be stored.
> There will most probably be a precalculate method that calculates all
> variables that are applicable and caches the result.)
Sounds like a pretty good solution. However, I would be hesitant to create
direct dependencies on the internal Catalog data structures if you can help
it (sometimes you can't though).
> One way to reduce memory consumption in wide Catalogs would be
> to have LazyBrains (vertical laziness; there might be reasons
> why that would be a bad idea which I'm not aware of).
That would pretty much require a rewrite of the Catalog as the data structures
would need to be completely different. It would introduce significant
database overhead since each metadata field would need to be loaded
individually. I think that would negate whatever performance benefit metadata
might have over simply loading the objects.
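A toy counting model of that overhead, under the loudly stated assumption that each separately stored value costs one load. CountingStore and both read functions are hypothetical; real ZODB loads happen per persistent object, so actual numbers would differ.

```python
# Toy model only: counts "loads" to compare row-wise and field-wise storage.
# CountingStore is a hypothetical stand-in, not a ZODB API.

class CountingStore:
    def __init__(self):
        self.loads = 0

    def load(self, value):
        self.loads += 1
        return value


def read_row_wise(store, rows, wanted):
    """One load fetches the whole tuple; unwanted columns come along free."""
    out = []
    for row in rows:
        record = store.load(row)
        out.append(tuple(record[i] for i in wanted))
    return out


def read_field_wise(store, rows, wanted):
    """Every wanted field is stored, and therefore loaded, on its own."""
    out = []
    for row in rows:
        out.append(tuple(store.load(row[i]) for i in wanted))
    return out


rows = [(i, "title %d" % i, "author", "2002-08-10", 0.5) for i in range(20)]
wanted = [0, 1, 2]  # three of the five columns

row_store, field_store = CountingStore(), CountingStore()
same = read_row_wise(row_store, rows, wanted) == read_field_wise(field_store, rows, wanted)
print(row_store.loads, field_store.loads, same)
```

Under this model the field-wise layout needs one load per requested field per record, while the tuple layout needs one load per record regardless of width.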
> Another way would be to have multiple data attributes in the Catalog, like
> tables, and to join the tuples from them with a "from table1, table2".
> In this way it would be possible to control the width of the brains.
> It would also be possible for the object indexing itself to tell the
> Catalog in which "tables" it should store metadata.
Yes, this would be better. You could have different sets of metadata for each
catalog record. You would select which one you wanted at query time.
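A rough sketch of what that could look like, purely as a starting point for discussion. Nothing here exists in the Catalog; MultiMetadataStore, its methods and the table names are all hypothetical.

```python
# Hypothetical sketch of several named metadata sets per catalog record.
# None of these names come from ZCatalog.
from types import SimpleNamespace


class MultiMetadataStore:
    def __init__(self):
        # table name -> (schema dict, {rid: row tuple})
        self._tables = {}

    def add_table(self, name, fields):
        schema = {field: pos for pos, field in enumerate(fields)}
        self._tables[name] = (schema, {})

    def index_object(self, rid, obj, tables):
        """Store metadata for obj only in the tables it asks for."""
        for name in tables:
            schema, rows = self._tables[name]
            ordered = sorted(schema, key=schema.get)  # fields in column order
            rows[rid] = tuple(getattr(obj, field, None) for field in ordered)

    def fetch(self, rid, table):
        """Return one metadata set, chosen at query time."""
        schema, rows = self._tables[table]
        row = rows[rid]
        return {field: row[pos] for field, pos in schema.items()}


store = MultiMetadataStore()
store.add_table("listing", ["title", "modified"])
store.add_table("detail", ["title", "author", "summary"])

doc = SimpleNamespace(title="Report", modified="2002-08-10",
                      author="jc", summary="Quarterly numbers")
store.index_object(1, doc, tables=["listing", "detail"])

narrow = store.fetch(1, "listing")   # light record for a results page
wide = store.fetch(1, "detail")      # wider record only when asked for
print(narrow)
print(wide)
```

The query picks the table, so the width of what comes back is decided per query rather than fixed for the whole catalog.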
> There have been some proposals (ObjectHub et al) which I read some
> time ago. I didn't feel then that they were what I was looking for.
> Please tell me if there have been any proposals or discussions regarding this.
I don't think so. If you feel strongly about this, write up a proposal and
provide some use cases for discussion.
> Johan Carlsson
Zope-Dev maillist - [EMAIL PROTECTED]