Re: [U2] Improving performance

Tony Gravagno Sat, 10 Nov 2012 13:12:32 -0800

A join is a join in any DBMS. In SQL you'd do something  like this:

SELECT DONOR.id, PERSON.first_name, PERSON.last_name,
       DONOR.given_amt, DONOR.pledge_amt, PERSON.email
   FROM DONOR
   INNER JOIN PERSON
   ON DONOR.id=PERSON.id
   WHERE PERSON.spouse_last='Smith'
   ORDER BY PERSON.last_name BY PERSON.first_name


We trust that a RDBMS will cache each person related to each donor,
just as in MV we trust that the PERSON item won't go out of memory
while we are processing a given DONOR item. Further, if we're talking
about polygamists and two Donor Persons have a single Donor spouse, we
trust that the translate will keep the spouse in memory for as long as
possible as well. So this isn't a situation where MV should or should
not be one's friend -  every platform faces the same challenge.

In your example where the ID is the same in both items, it's certain
barring some .000000000001% possibility (time slice issue where some
other process consumes all memory ... not really possible) that the
Person won't be in memory for the Donor.

But your question becomes more valid when, for example, you're sorting
pledges by date, and you hit the same donor/person/spouse at many
different times throughout the rendering process. If you have 80,000
pledge records and record 1 references Donor1, then you get to the
last record which also happens to reference Donor1 after so many other
records have consumed cache and forced flushing, will Donor1 and
related records still be in memory? I believe the answer would be that
it depends on how much memory is available, how big those records are,
and what each MV implementation does with data that it caches like
this. Only an engineer at Rocket can answer the question for current
Universe and Unidata platforms.

Not much of an answer, just refining the question...
HTH
T

> From: Jeff Butera
> 
> At the outset, I'm a lover of Pick/MV but also have experience with
> 1NF (mostly mysql/postgresql).  Here's my delimma:
> 
> For the sake of simplicity, let's say I have two tables with the
following
> data attributes:
> 
> PERSON: id, first_name, last_name, email, phone, spouse
> 
> DONOR: id, given_amt, pledge_amt
> 
> The ID for PERSON and DONOR are same, and spouse is a X-pointer to
> another PERSON record (if populated).  Often our fundraisers write
> reports out of DONOR and often want spouse info (name, email,
> phone).
> This is typically accomplished with I-desc in DONOR such as:
> 
> spouse_first:
> TRANS('PERSON',TRANS('PERSON',@ID,'spouse','X'),'first_name','X')
> 
> spouse_last:
> TRANS('PERSON',TRANS('PERSON',@ID,'spouse','X'),'last_name','X')
> 
> spouse_email:
> TRANS('PERSON',TRANS('PERSON',@ID,'spouse','X'),'email','X')
> 
> spouse_phone:
> TRANS('PERSON',TRANS('PERSON',@ID,'spouse','X'),'phone','X')
> 
> Thus, for each DONOR record in my report if I want the four fields
> above I've got to perform 8 reads of the PERSON table.  Sure,
caching
> will improve this performance - but let's forget that for the
moment.
> So for
> 80000 DONOR records I'm doing 640000 PERSON reads.
> 
> Am I correct about how TRANS is working in these cases?
> 
> In an attempt to improve performance, I've taken to replicating the
> spouse field in the DONOR record by using an update trigger on
> PERSON to
> ensure spouse is updated in DONOR as well.   By replicating the
spouse
> in DONOR, I eliminate one TRANS in each field and cut my 640000
> PERSON reads to 320000 reads.
> 
> That said, I'm looking for other means - this is one place where
Pick
> isn't my friend as we often have reports written out of various
tables
> and need to use 32+ TRANS statements to pull in PERSON demographic
> info.  Any insight appreciated.


_______________________________________________
U2-Users mailing list
[email protected]
http://listserver.u2ug.org/mailman/listinfo/u2-users

Re: [U2] Improving performance

Reply via email to