Re: [Zope-dev] __record_schema__ of Brains (Was: Record.pyd)

2002-08-11 Thread Johan Carlsson [Torped]

At 21:28 2002-08-10 -0400, Casey Duncan said:
On Saturday 10 August 2002 11:25 am, Johan Carlsson [Torped] wrote:
  Now that I understand how the data tuples are copied to the brain,
  I'm not at all sure adding a filter when copying the tuple will optimize
  things, because of the overhead of the filter process.

This occurs lazily, so the savings would be heavily dependent on the
application. For most web apps presenting small batches of records, the
savings from limiting the columns returned would be pretty minimal.

But there must have been some thought behind implementing Record.pyd in C,
though of course I suppose Record.pyd was first used for ZSQL?

An easy filter would be to let __record_schema__ control which columns to
save. As it works today, __record_schema__ must map names onto a sequence of
positions starting at 0, so I can't specify arbitrary indexes into the tuple
like this:

__record_schema__ = {'hey': 12, 'dude': 22}

Maybe this would be easy to change in Record.pyd, or I could just implement
it in a special brain base class?
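
For illustration, a pure-Python brain base class could do that filtering
itself. All the names and example data below are made up; this is just a
sketch of the idea, not how Record.pyd actually works:

# Hypothetical sketch: a brain base class that only keeps the columns
# named in __record_schema__, which may point at arbitrary positions
# in the catalog's wide data tuple (not just 0..n-1).
class FilteringBrain:
    __record_schema__ = {'hey': 12, 'dude': 22}

    def __init__(self, data):
        # copy only the selected columns; the wide tuple is not retained
        for name, idx in self.__record_schema__.items():
            setattr(self, name, data[idx])

    def has_key(self, key):
        return key in self.__record_schema__

row = tuple(range(30))         # stand-in for a wide self.data tuple
brain = FilteringBrain(row)
print(brain.hey, brain.dude)   # -> 12 22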

After revisiting Record.c I realized that the tuple from the catalog's
self.data is stored either as a tuple (or as a C array, I suppose?) in a
Record, or as attributes, depending on what you provide to the constructor.
I suppose copying the data to a C array is much faster than creating
attributes on each brain, but if the array is large and the number of
attributes that need to be set is small, it might be the other way around.
I have no idea where they would break even.

Maybe I will just settle for having two different brain base classes and use
whichever suits the current need.
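
Where the two approaches break even could at least be probed with a quick
micro-benchmark; a rough sketch, with all names and numbers invented:

# Rough micro-benchmark sketch: keeping the whole wide tuple by
# reference vs. copying a small subset of columns to attributes.
import timeit

WIDE_ROW = tuple(range(100))         # stand-in for a wide data tuple
WANTED = {'a': 3, 'b': 57, 'c': 98}  # small subset of columns

class TupleBrain:
    def __init__(self, data):
        self._data = data            # one reference; indexed on access

class AttrBrain:
    def __init__(self, data):
        for name, idx in WANTED.items():
            setattr(self, name, data[idx])   # eager copy of columns

print('tuple:', timeit.timeit(lambda: TupleBrain(WIDE_ROW), number=100000))
print('attrs:', timeit.timeit(lambda: AttrBrain(WIDE_ROW), number=100000))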

The general usage is to put a minimal set of columns in metadata, only enough
to create a results page, and to load the objects in cases where large,
dynamic or otherwise arbitrary data elements are needed.

Yes, and that is somewhat restricting.
My current applications use several different catalogs to get the
width of the meta_data down. The downside of this approach is
that I end up with a lot of catalogs and many times more management
work, e.g. I must reindex all the catalogs instead of just one.

My primary goals are:
1. Get a general ZCatalog that can be used for all ZCatalog requirements
(not only site searches).
2. Implement features that remove the need for an external RDBMS (for
instance, report generation is hard with ZCatalogs because of the lack of
grouping/statistics).
3. Make ZCatalogs easier to manage. For instance, the need to update index
and meta_data definitions every time you change your application's data
structure is annoying, especially during development. Objects could tell the
ZCatalog which meta_data and indexes they want, removing the need to add
them manually (a sketch of what that could look like follows below). Of
course, you would still need to clean up the ZCatalog from time to time.
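
As a sketch of point 3: catalog_requirements and ensure_requirements are
invented names, while addIndex/addColumn/indexes/schema are the existing
ZCatalog methods, if I remember the signatures right:

# Hypothetical protocol: the object tells the catalog what it needs,
# and the catalog adds any missing indexes/metadata columns.
class Article:
    def catalog_requirements(self):
        # invented method; nothing like this exists in ZCatalog today
        return {
            'indexes': {'title': 'TextIndex', 'created': 'FieldIndex'},
            'metadata': ['title', 'created', 'author'],
        }

def ensure_requirements(catalog, obj):
    req = obj.catalog_requirements()
    for name, index_type in req['indexes'].items():
        if name not in catalog.indexes():
            catalog.addIndex(name, index_type)
    for name in req['metadata']:
        if name not in catalog.schema():
            catalog.addColumn(name)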


  (The way that I solved the group/calc part of my project, I don't think
  it will lead to memory bloat. I'm going to implement a LazyGroupMap
  which takes an extra parameter (a list of IISets). Each brain created
  in the LazyMap will have methods for calculations directly on the
  self.data in the Catalog. The data itself will not be stored.
  There will most probably be a pre-calculate method that calculates all
  applicable variables and caches the result.)

Sounds like a pretty good solution. However, I would be hesitant to create
direct dependencies on the internal Catalog data structures if you can help
it (sometimes you can't, though).

I could soften the dependency by providing the catalog with an interface for
calculations, giving the brain a reference to the catalog itself, and using
the interface through that reference.
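
Something along these lines, maybe (the interface and all names are
invented; the FakeCatalog is just a stand-in to make the sketch complete):

# Sketch: the brain only knows a small calculations interface on the
# catalog, not the catalog's internal data structures.
class GroupBrain:
    def __init__(self, catalog, rids):
        self._catalog = catalog   # used only through column_values()
        self._rids = rids         # an IISet of record ids in this group

    def sum(self, column):
        return sum(self._catalog.column_values(self._rids, column))

    def avg(self, column):
        values = list(self._catalog.column_values(self._rids, column))
        return sum(values) / float(len(values))

class FakeCatalog:
    # stand-in implementing the hypothetical interface
    data = {1: (10,), 2: (20,), 3: (30,)}
    columns = {'price': 0}

    def column_values(self, rids, column):
        idx = self.columns[column]
        return [self.data[rid][idx] for rid in rids]

b = GroupBrain(FakeCatalog(), [1, 2, 3])
print(b.sum('price'), b.avg('price'))   # -> 60 20.0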


  One way to reduce memory consumption in wide Catalogs would be
  to have LazyBrains (vertical laziness; there might be reasons
  why that would be a bad idea which I'm not aware of).

That would pretty much require a rewrite of the Catalog, as the data
structures would need to be completely different. It would introduce
significant database overhead, since each metadata field would need to be
loaded individually. I think that would negate whatever performance benefit
metadata might have over simply loading the objects.

I'm not sure that it would be necessary to change the data structure; the
brain could use the same method the LazyMap uses to load the data.
But a LazyBrain would need to copy all applicable data at once to be
efficient. The difference would be that the brain would not fetch any data
until the first attribute is accessed. On that first access, all applicable
data would be copied to attributes according to __record_schema__.

This would probably not be more efficient for regular use of brains, but
calculated group brains wouldn't need to store the data at all if they
only used calculated fields.
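
A minimal sketch of that vertical laziness (the fetch callable and the
column names are stand-ins I made up):

# Sketch: the brain fetches nothing until the first attribute access,
# then copies all schema columns at once.
class LazyBrain:
    __record_schema__ = {'title': 0, 'author': 1}   # invented columns

    def __init__(self, fetch, rid):
        # `fetch` stands in for however the data row would be loaded
        self.__dict__['_fetch'] = fetch
        self.__dict__['_rid'] = rid

    def __getattr__(self, name):
        # only called for attributes not yet in __dict__
        if name not in self.__record_schema__:
            raise AttributeError(name)
        data = self._fetch(self._rid)        # first access loads the row
        for key, idx in self.__record_schema__.items():
            self.__dict__[key] = data[idx]   # populate everything at once
        return self.__dict__[name]

rows = {7: ('A title', 'J. Carlsson')}
brain = LazyBrain(rows.get, 7)   # nothing fetched yet
print(brain.title)               # first access triggers the copy
print(brain.author)              # now an ordinary attribute lookup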


  Another way would be to have multiple data attributes in the Catalog, like
  tables, and to join the tuples from them with a "from table1, table2"
  statement.
  In this way it would be possible to control the width of the brains.

Re: [Zope-dev] __record_schema__ of Brains (Was: Record.pyd)

2002-08-10 Thread Johan Carlsson [Torped]

At 08:59 2002-08-09 -0400, Casey Duncan said:
__record_schema__ is simply a dictionary which maps field names to column
positions (ints) so that the record knows the index of each field in the
record tuples.

See line 154 of Catalog.py to see how it is initialized to the Metadata
schema plus a few extra columns for catalog rid and scores.


Hi Casey (and zope-dev),
Thanks!
After some experimenting I realized that :-)

One of the reasons I asked is that I am thinking about
how to implement a SELECT col1 AS 'name', ... type
of feature for ZCatalogs.

I'm not entirely sure it's a good idea to start with, but
I'm thinking along the lines of large ZCatalogs (by large I mean
a lot of columns in the self.data structure).
If all columns are copied, the brains grow larger as well;
by selecting explicitly which columns should be copied to
the brain, the brains would be lighter.
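
The SELECT col1 AS 'name' part could be little more than rewriting the
schema mapping; a sketch with invented column names:

# Sketch: build a narrowed, renamed __record_schema__ from a
# SELECT-like specification of {exposed_name: catalog_column}.
def select_schema(full_schema, selection):
    return dict((new, full_schema[col]) for new, col in selection.items())

catalog_schema = {'getTitle': 0, 'getAuthor': 1, 'modified': 2}
narrow = select_schema(catalog_schema,
                       {'title': 'getTitle', 'author': 'getAuthor'})
print(narrow)   # {'title': 0, 'author': 1}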

Now that I understand how the data tuples are copied to the brain,
I'm not at all sure adding a filter when copying the tuple will optimize
things, because of the overhead of the filter process.

(The way that I solved the group/calc part of my project, I don't think
it will lead to memory bloat. I'm going to implement a LazyGroupMap
which takes an extra parameter (a list of IISets). Each brain created
in the LazyMap will have methods for calculations directly on the self.data
in the Catalog. The data itself will not be stored.
There will most probably be a pre-calculate method that calculates all
applicable variables and caches the result.)
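
The shape I have in mind is roughly this (a sketch only; Zope's real
LazyMap caches computed items in much this way, but none of these names
are real):

# Sketch of the LazyGroupMap idea: like a LazyMap, but each element is
# a group brain built from an IISet of rids instead of a single rid.
class LazyGroupMap:
    def __init__(self, brain_factory, groups):
        self._factory = brain_factory   # e.g. rids -> a group brain
        self._groups = groups           # a list of IISets, one per group
        self._cache = {}

    def __len__(self):
        return len(self._groups)

    def __getitem__(self, i):
        # brains are created lazily and cached on first access
        if i not in self._cache:
            self._cache[i] = self._factory(self._groups[i])
        return self._cache[i]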

One way to reduce memory consumption in wide Catalogs would be
to have LazyBrains (vertical laziness; there might be reasons
why that would be a bad idea which I'm not aware of).

Another way would be to have multiple data attributes in the Catalog, like
tables, and to join the tuples from them with a "from table1, table2"
statement.
In this way it would be possible to control the width of the brains.
It would also be possible for the object being indexed to tell the catalog
in which tables it should store its meta data.
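
For example (plain dictionaries standing in for the per-table data
structures; everything here is invented):

# Sketch: metadata split over several "tables", joined by rid, so a
# brain only pays for the tables it actually uses.
table1 = {1: ('A title', 'news')}     # rid -> (title, section)
table2 = {1: (1029456000.0, 'jc')}    # rid -> (modified, owner)

def joined_row(rid, tables):
    # concatenate this rid's tuples from the requested tables,
    # like a "from table1, table2" over the same record id
    row = ()
    for table in tables:
        row += table[rid]
    return row

print(joined_row(1, [table1, table2]))
# -> ('A title', 'news', 1029456000.0, 'jc')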

There have been some proposals (ObjectHub et al.) which I read some
time ago. I didn't feel then that they were what I was looking for.
Please tell me if there have been any proposals or discussions regarding
this.

Regards,
Johan Carlsson




-- 
Torped Strategi och Kommunikation AB
Johan Carlsson
[EMAIL PROTECTED]

Mail:
Birkagatan 9
SE-113 36  Stockholm
Sweden

Visit:
Västmannagatan 67, Stockholm, Sweden

Phone +46-(0)8-32 31 23
Fax +46-(0)8-32 31 83
Mobile +46-(0)70-558 25 24
http://www.torped.se
http://www.easypublisher.com


Re: [Zope-dev] __record_schema__ of Brains (Was: Record.pyd)

2002-08-09 Thread Casey Duncan

__record_schema__ is simply a dictionary which maps field names to column 
positions (ints) so that the record knows the index of each field in the 
record tuples.

See line 154 of Catalog.py to see how it is initialized to the Metadata schema 
plus a few extra columns for catalog rid and scores.
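
For example, for a metadata schema of Title and id it ends up as something
like this (the extra column names are from memory and may differ slightly;
check the source):

__record_schema__ = {
    'Title': 0,
    'id': 1,
    'data_record_id_': 2,                 # catalog rid
    'data_record_score_': 3,              # raw score
    'data_record_normalized_score_': 4,   # normalized score
}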

-Casey

On Friday 09 August 2002 07:17 am, Johan Carlsson [Torped] wrote:
 Hi,
 I'm back on the Brain track :-)
 What function does the __record_schema__ attribute of the Brains have?

 Does it do anything else besides providing the has_key feature?

     def has_key(self, key):
         return self.__record_schema__.has_key(key)

 Best Regards,
 Johan Carlsson

