RE: [Users] Related to replication from multiple database nodes to one database node

Abbate, Joseph M Wed, 31 Aug 2005 07:35:57 -0700

Hello Ajay,

You wrote ...


> I completely understood the things that you have explained to me.

It doesn't seem like I'm getting through because you didn't answer my
question:

    If node A does NOT forward data to the other nodes, WHO will?

Let's address the basic purpose of replication first.  As implied by the
term Consistent Distributed Data Sets (CDDS), the main use of
replication is to keep tables in one database completely in sync (in
"near real-time") with the same named tables in another database.  If
database A has a customer table with 3,245 rows, then Replicator's
objective is to keep those same 3,245 rows in database B (and C, D and
E, in your case).  If you add a new customer in B, it has to be added to
all four other databases; otherwise there's no point in talking about
CONSISTENT data sets.

If you have two full peer databases, the paths to achieve consistency
are straightforward:  A to B and B to A.  If you define the path from B
to A and do NOT define the path from A to B, then updates that are made
in A will be lost and the databases will be inconsistent with each
other.  The only way you could avoid defining one of the paths with two
databases is if one of them was to be read-only.  In other words, if you
only define the path from B to A, then A should never be updated
locally.  Otherwise, if you allow updates in A they will not be
replicated to B and again the databases will be inconsistent with each
other.
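
To make the two-database case concrete, here is a minimal Python sketch.
It is only a simplified model, not Replicator code: a path is written as
an (originator, local, target) triple, and the helper names reached()
and missed() are made up for this illustration.

```python
def reached(origin, paths):
    """Return the set of databases that receive an update made at `origin`."""
    seen = {origin}
    changed = True
    while changed:
        changed = False
        for orig, local, target in paths:
            # A path applies only to updates that originated at `orig`
            # and that have already arrived at `local`.
            if orig == origin and local in seen and target not in seen:
                seen.add(target)
                changed = True
    return seen


def missed(nodes, paths, updatable):
    """Map each locally updatable database to the databases its updates never reach."""
    gaps = {}
    for origin in updatable:
        lost = set(nodes) - reached(origin, paths)
        if lost:
            gaps[origin] = lost
    return gaps


nodes = {"A", "B"}
only_b_to_a = [("B", "B", "A")]
both_ways = [("B", "B", "A"), ("A", "A", "B")]

print(missed(nodes, only_b_to_a, updatable={"A", "B"}))  # {'A': {'B'}}
print(missed(nodes, both_ways, updatable={"A", "B"}))    # {}
print(missed(nodes, only_b_to_a, updatable={"B"}))       # {}
```

The first call shows the inconsistency: updates made in A never reach B.
The last call shows the read-only exception: if A is never updated
locally, the single path from B to A is sufficient.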

Another use for Replicator is to distribute a data set.  Instead of
having five customer tables each with 3,245 rows, each of what you're
calling "remote" nodes could have a portion of the table, e.g., node B
only customers with numbers from 1000 to 1999, node C only customer
numbers from 2000 to 2999, and so forth (this is just an example and is
NOT a good production technique).  The "master" node could then have the
full table, possibly with its own customers.  This scheme distributes
the data set in what is called horizontal partitioning.  It's analogous
to a five layer cake in the "master" node with each "remote" node having
a copy of its "slice" of the cake.  From this perspective, each "remote"
slice can be viewed as full peer to read-only and the "master" slice (if
it exists) isn't replicated anywhere.  If a customer is added at node B
it's replicated to A so that users at the master node can see the entire
table, but users at A shouldn't update "remote" slices in the full table
at A.
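
The slicing rule can be sketched in a few lines of Python.  This is an
illustration only: the ranges for B and C come from the example above,
the ranges for D and E are assumed to continue the same pattern, any
number outside those ranges is assumed to belong to the master, and the
function names are hypothetical.

```python
# Customer-number ranges owned by each "remote" node.
SLICES = {
    "B": range(1000, 2000),
    "C": range(2000, 3000),
    "D": range(3000, 4000),  # assumed continuation of the pattern
    "E": range(4000, 5000),  # assumed continuation of the pattern
}
MASTER = "A"


def owner(customer_no):
    """Return the node that owns (and may update) this customer."""
    for node, numbers in SLICES.items():
        if customer_no in numbers:
            return node
    # Assumption: anything outside the remote ranges is the master's own slice.
    return MASTER


def may_update(node, customer_no):
    """A node should update only rows in its own slice."""
    return owner(customer_no) == node


def replication_targets(customer_no):
    """Remote slices are copied to the master; the master slice goes nowhere."""
    return [] if owner(customer_no) == MASTER else [MASTER]


print(owner(1500))                # B
print(may_update("A", 1500))      # False: A holds a read-only copy of B's slice
print(replication_targets(1500))  # ['A']
print(replication_targets(500))   # []
```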

Please review the above and advise whether the tables that you intend to
replicate are going to be full copies of each other at each remote node
and the master or whether they will be horizontally partitioned.


Joe Abbate 
Senior Software Engineer 
Computer Associates 
[EMAIL PROTECTED] 

 


________________________________

        From: Ajay Dalvi [mailto:[EMAIL PROTECTED] 
        Sent: Wednesday, August 31, 2005 4:17 AM
        To: [email protected]
        Cc: Abbate, Joseph M
        Subject: RE: [Users] Related to replication from multiple database nodes to one database node
        
        
        Hi Joseph,
        Thanks for your guidance.
        I would like to add a few points to my mail from yesterday.
        
        As per my requirement, the various peers (remote nodes) will
        just feed a single repository of all data.
        I will explain this with a diagram.
        My current setup can be depicted as follows:
        
                B       C
                 \     /
                  \   /
                    A
                  /   \
                 /     \
                D       E
                  
        Node A will be the master node and all other nodes are remote nodes.
        The requirement is that data from all remote nodes (B, C, D, E)
        should be replicated to the master node (A). In this case master
        node A is not going to forward the data to the other remote nodes.
        So in the above case the data propagation paths will run from the
        remote nodes to the master node.
        With nodes A to E numbered 10, 20, 30, 40 and 50, the data
        propagation paths will be as follows:
        
                Original     Local     Target
                20           20        10
                30           30        10
                40           40        10
                50           50        10
        
        If I define the paths as above, then at each remote node records
        are transferred into the input queue but are not processed any
        further, as if all the nodes were deadlocked.
        But if I add at least one path in the reverse direction, i.e. from
        the master node to a remote node:

                Original     Local     Target
                20           20        10
                10           10        20    // extra path added from master to remote
                30           30        10
                40           40        10
                50           50        10

        With the above paths the replication setup works fine and data
        from the remote nodes gets transferred to the master node. But the
        caveat is that if any record is changed on the master (a.k.a.
        Repository Server), it will try to replicate that record to the
        remote. If the record exists on the remote server (e.g. B, or 20 in
        the above case) it gets modified. But if it does not exist on
        remote server B (the reason could be that the record was
        replicated from C, or 30) then there is an error, "archive append
        error". This is obviously not desired.
            
        I would like to know if there is a way to avoid this extra path
        from master to remote node, which I added just to make it work,
        OR
        if there is a way to define one-way replication where a bunch of
        nodes replicate data to one repository node.
        
        I would appreciate it if you could confirm whether this is the way
        Ingres replication works, i.e. do we need to define paths from
        every node to every other node in the replication setup?
        
        
        
        Thanks
        
        -Ajay
        
        
        
        
        On Mon, 2005-08-29 at 20:17, Abbate, Joseph M wrote: 

                Hi Ajay,
                
                You wrote ...
                
                > Node A will be Master node and all other nodes are remote nodes. The
                > requirement is that from all remote nodes (B,C,D,E)  data should be
                > replicated to the master node (A). In this case master node A is not
                > going to forward the data to other remote nodes.
                
                If node A does NOT forward data to the other nodes, WHO will?  Consider
                a customer table.  Unless the table is horizontally partitioned, so that
                node B has one segment, C has another segment, etc., and the master node
                has all four segments, any update to a customer in B *has* to be
                propagated to all four other nodes, so either B replicates directly to
                the other four, or it replicates to the master and the master forwards,
                or some combination thereof.  Otherwise the address of some customer
                will be correct in B and A but not in C, D or E.
                
                > So is there any way that will allow me to avoid  this extra path from
                > master to remote node.
                
                As I stated earlier, you have to imagine yourself at *each* node and
                ensure that an update reaches *all* desired targets.  I only started to
                give you the configuration, but as I said you have to have 20 paths when
                you're done.  This is the full configuration for a star, five-node
                scheme:
                
                  Orig   Local   Target
                   20      20      10       B copies to master
                   20      10      30       Master sends on to other three
                   20      10      40
                   20      10      50
                   30      30      10       C copies to master
                   30      10      20       Master sends on to other three
                   30      10      40
                   30      10      50
                   40      40      10       D copies to master
                   40      10      20       Master sends on to other three
                   40      10      30
                   40      10      50
                   50      50      10       E copies to master
                   50      10      20       Master sends on to other three
                   50      10      30
                   50      10      40
                   10      10      20       Master propagates to all four
                   10      10      30       (for completeness)
                   10      10      40
                   10      10      50
                
                The last four may never be used but they should still be specified.
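
For anyone who prefers to generate the list rather than type it, the 20
paths follow a simple pattern.  The Python sketch below is only an
illustration of that pattern; star_paths() is a made-up helper name and
the output is a plain listing, not something Replicator reads directly.

```python
def star_paths(master, remotes):
    """Build (orig, local, target) triples for a star with one master."""
    paths = []
    for remote in remotes:
        # The remote copies its own updates to the master ...
        paths.append((remote, remote, master))
        # ... and the master sends them on to every other remote.
        for other in remotes:
            if other != remote:
                paths.append((remote, master, other))
    # For completeness, the master propagates its own updates to all remotes.
    for remote in remotes:
        paths.append((master, master, remote))
    return paths


paths = star_paths(10, [20, 30, 40, 50])
print(len(paths))  # 20
print("Orig  Local  Target")
for orig, local, target in paths:
    print(f"{orig:4d} {local:6d} {target:7d}")
```

With four remotes this gives 4 + 12 + 4 = 20 paths, in the same order as
the table above.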
                
                Regards,
                
                
                Joe Abbate
                Senior Software Engineer
                Computer Associates
                [EMAIL PROTECTED]
                _______________________________________________
                Users mailing list
                [email protected]
                http://ingres.ca.com/mailman/listinfo/users

