As promised, here is Robert Houben's input to your question, Baker!!! :)

For anyone who doesn't know me, I was the lead designer and developer of the PK 
Harmony product which we demoed at PC Labs at the Spectrum show in 1986 (over 
20 years ago!)  I've been involved in data communications since the early 
1980s and I'm still intimately involved in it, so I think I have some 
expertise in the matter! ;)

I put the ad marker in so the moderators won't flip.  I don't believe that 
anyone markets PK Harmony anymore (that was another company) so I shouldn't 
need it for that, but just in case...  Also, I may accidentally reference some 
products that I worked on that my present company markets, so we'll have to 
comply! ;)  What I say here can be applied to any product currently on the 
market.

There are several factors that affect throughput and performance when 
transferring data between systems (any systems).  I'll list them, then go 
through each one, with special emphasis on how they are impacted by MultiValue 
processing.  I use SQL Server as the example target; in some cases your target 
will be different, but most of what I say is either still relevant or at the 
very least worth thinking about:

- I/O bandwidth and contention
- CPU speed and contention
- Disk bandwidth and contention
- Synchronization
- End to end latency
- Transformation

I/O Bandwidth and Contention:
=============================
The first thing to look at is I/O bandwidth and contention.  There are 
products available that let you set up two endpoints, push data through, and 
measure the throughput.  If you have a 10 Mbit LAN, you will never exceed 10 
Mbits.  If you have a busy network, and your two endpoints need to go through 
multiple routers, you will undoubtedly have less than 10 Mbits (or 100 Mbits) 
to work with.  There is a hard limit, determined by your network environment, 
on how much data you can push through.  Although this is not usually the most 
limiting factor, I've been amazed at how often people who had smoking 
throughput pushing data between two applications on the same machine are 
surprised when they move one of those applications to another system and 
suddenly run into a bottleneck on the network.
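To put numbers on that ceiling, here's a back-of-envelope sketch in Python. 
The 500 MB data set and the 70% efficiency figure are illustrative 
assumptions, not measurements:

```python
# Back-of-envelope: time to move a data set across links of various speeds.
# Real throughput is always lower than the raw rate; `efficiency` is an
# assumed fraction left after protocol overhead and contention.

def transfer_seconds(megabytes, link_mbits, efficiency=0.7):
    """Estimate seconds to move `megabytes` over a `link_mbits` link."""
    bits = megabytes * 8_000_000                     # MB -> bits (decimal units)
    usable_bits_per_sec = link_mbits * 1_000_000 * efficiency
    return bits / usable_bits_per_sec

for mbits in (10, 100, 1000):
    print(f"{mbits:>4} Mbit link: {transfer_seconds(500, mbits):8.1f} s for 500 MB")
```

However fast both endpoints are, a 10 Mbit segment caps a 500 MB push at 
minutes, not seconds, which is exactly the surprise people hit when one of 
the applications moves off the local machine.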

CPU Speed and Contention:
=========================
The next thing to consider is CPU speed and contention.  On a typical 
MultiValue system you will find yourself disk constrained, but if you are 
doing a lot of transformation (we'll look at that later) then CPU may become 
the limiting factor.  Also worth remembering: whenever you can push processing 
from a shared CPU resource (your MultiValue system) to a dedicated resource 
(the client's desktop), you can significantly increase performance.

Disk Bandwidth and Contention:
==============================
Next up is disk bandwidth and contention.  This can be a hugely significant 
factor.  If you look at most OLTP-type MultiValue applications, you will see 
that the CPU sits mostly idle (over the years it seems to average about 10%).  
Not all of the disk activity is file access, BTW; in many cases what you are 
encountering is context switches and internal program space being managed in 
virtual memory.  Again, as with CPU, moving as much of that work as you can 
from the shared resource to the dedicated resource will ALWAYS be a good thing 
for performance.

Synchronization:
================
Next is synchronization.  Actually, most MultiValue databases are MUCH better 
at this than SQL Server! :)  Still, whenever you run the risk of contention 
over locks, you can encounter significant performance problems.  In most cases 
when doing this type of thing, on the MultiValue side, you will be reading or 
writing without any locks.  You may need to think about what happens if 
another user is on the system and tries to write to the same record you are 
writing to; when that happens you have no reasonable choice but to take the 
hit.  On SQL Server, you want to choose the cursor model that best suits what 
you are doing, and possibly force an exclusive table lock, or just do it when 
no one is on the system.  On an "almost related" note, you may wish to size 
your SQL database *before* you start the push.  SQL Server will automatically 
grow the database, but this is expensive.  You are better off sizing it first, 
then doing the push.

End to End Latency:
===================
End-to-end latency is another issue.  Multi-threaded systems allow you to be 
retrieving and transforming data while you are also working with the previous 
row.  This type of processing does not tend to happen on the MultiValue 
system; you really need the dedicated resource to do this for you.
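A minimal sketch of that overlap in Python, using a bounded queue so a 
stand-in transform runs while the next row is still being fetched. The row 
data and the uppercase "transform" are purely illustrative:

```python
import queue
import threading

def fetch_rows(q, rows):
    """Producer: simulates pulling raw rows from the MultiValue side."""
    for row in rows:
        q.put(row)
    q.put(None)  # sentinel: no more rows

def transform_and_load(q, results):
    """Consumer: transforms each row while the producer fetches the next."""
    while True:
        row = q.get()
        if row is None:
            break
        results.append(row.upper())  # stand-in for real transformation

rows = ["row1", "row2", "row3"]
results = []
q = queue.Queue(maxsize=2)  # small buffer keeps memory bounded
producer = threading.Thread(target=fetch_rows, args=(q, rows))
consumer = threading.Thread(target=transform_and_load, args=(q, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)
```

The queue keeps the fetch and the transform in flight at the same time, so 
the end-to-end time approaches the longer of the two stages rather than their 
sum.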

Transformation:
===============
Finally, we come to Transformation.  This is the kicker.  [AD]I had a prospect 
who was looking at our Direct product, who also had some people who wrote a 
program.  This program took their MultiValue data, and pushed it raw to a file 
on disk at the other end.  Then they tried to compare that to what we were 
doing.  The problem with that approach was that they had MultiValues and 
SubValue marks, they had dates, times, masked decimals and other unusual 
constructs that were meaningless to any non-MultiValue target that they could 
have chosen.  Needless to say, their home-grown benchmark app outperformed our 
product.  It also happened to be a meaningless comparison. [/AD]

Someone has to process the MultiValues, SubValues and data types.  Doing it in 
BASIC, which on all MultiValue systems is a stack-based language, has 
performance issues associated with it.  If you are familiar with the 
immutable-string issue in Java and .NET, and the reason you use the 
StringBuilder or StringBuffer classes to build changing strings in those 
languages, MultiValue BASIC actually has the same issue under the covers.  It 
also garbage collects, so the comparison is amazingly accurate.  Doing this 
work on the MultiValue side causes performance problems.
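Python shares the immutable-string trait, so it can stand in for an 
illustration (the field values are made up).  The first loop is the naive 
concatenation pattern; the second is the StringBuilder-style 
accumulate-then-join idiom:

```python
# Immutable strings: conceptually, every `buf += piece` allocates a new
# string and copies the old contents, so building a long row this way is
# O(n^2) overall -- the trap StringBuilder/StringBuffer exist to avoid.
# (CPython can sometimes optimize += in place, but the general point holds.)
marks = ["val1", "val2", "val3"]

# Quadratic pattern (what naive BASIC-style concatenation amounts to):
buf = ""
for m in marks:
    buf += m + "|"

# Linear pattern (the StringBuilder idiom): accumulate, then join once.
parts = []
for m in marks:
    parts.append(m + "|")
row = "".join(parts)

assert buf == row == "val1|val2|val3|"
```

Same output, very different cost curve once the rows get wide, and that cost 
is exactly what you don't want to pay on the shared MultiValue CPU.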

Evolution of MultiValue Data Transfer:
======================================
So, in the evolution of data transfer products that I've been involved in over 
the years, a number of milestones have been reached, and these are some of them:

Serial I/O Replaced with TCP/IP:
================================
The original PK Harmony (and even the original ODBC) products allowed you to 
use serial I/O to communicate with MultiValue systems.  In many cases, that 
was the only way available at the time.  There were problems with buffer sizes 
and lossy boundaries in serial I/O that required you to have an 
error-correcting packeting structure at both ends.  This meant that you were 
doing this type of stuff in MultiValue BASIC. Yuck!!!  The move to TCP/IP for 
communications allowed us to stop worrying about these things and just stream 
the data out with minimal packeting structure.
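As a sketch of what "minimal packeting structure" can mean over TCP 
(hypothetical record data, not PK Harmony's actual wire format): a simple 
length prefix marks record boundaries, and the transport itself handles 
ordering and error correction, so no checksums or retransmit logic are needed 
at the application level:

```python
import socket
import struct
import threading

def send_record(sock, payload: bytes):
    """Length-prefixed framing: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exactly(sock, n):
    """TCP is a byte stream, so loop until exactly n bytes arrive."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed early")
        data += chunk
    return data

def recv_record(sock) -> bytes:
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

# Demo over loopback:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

received = []
def serve():
    conn, _ = server.accept()
    received.append(recv_record(conn))
    conn.close()

t = threading.Thread(target=serve)
t.start()
client = socket.create_connection(("127.0.0.1", port))
send_record(client, b"CUST001\xfeSmith\xfdJohn")  # attribute/value marks pass through untouched
client.close()
t.join()
server.close()
print(received[0])
```

Note that the raw attribute and value marks travel as ordinary bytes; the 
receiver decides what to do with them, which is the whole point of pushing 
transformation to the dedicated resource.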

ANSI SQL:
=========
Relational products require a relational engine, and that engine must reside 
on the database side.  The transformation effort of taking a complex 
ANSI-compliant SQL statement and translating it to run *correctly* on a 
MultiValue system often overshadows all other performance characteristics.  
Some products in the past have taken shortcuts, and those shortcuts result in 
SQL statements that return inconsistent results depending on the fields you 
reference (MultiValue/SubValue counts change).  If you don't take the 
shortcuts, you take the hit on performance.  Sometimes you just can't win... :(
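To see why those shortcuts give inconsistent results, consider a toy record 
(made-up data) with two multivalued fields of different counts: the flattened 
row count changes with the fields you reference:

```python
# A MultiValue record: PHONES has 3 values, EMAILS has 2 (made-up data).
record = {
    "ID": "C100",
    "PHONES": ["555-0001", "555-0002", "555-0003"],
    "EMAILS": ["a@example.com", "b@example.com"],
}

def explode(record, fields):
    """Naively flatten one record into rows, one per multivalue position.

    The row count depends on which multivalued fields you reference --
    exactly the inconsistency a shortcut SQL translation exhibits.
    """
    depth = max(len(record[f]) for f in fields if isinstance(record[f], list))
    rows = []
    for i in range(depth):
        row = {}
        for f in fields:
            v = record[f]
            if isinstance(v, list):
                row[f] = v[i] if i < len(v) else None
            else:
                row[f] = v  # singlevalued field repeats on every row
        rows.append(row)
    return rows

print(len(explode(record, ["ID", "PHONES"])))            # 3 rows
print(len(explode(record, ["ID", "EMAILS"])))            # 2 rows
print(len(explode(record, ["ID", "PHONES", "EMAILS"])))  # 3 rows, with a NULL
```

Reference PHONES and you get 3 rows; reference EMAILS and you get 2; 
reference both and you get 3 rows with a padded NULL.  A correct translation 
has to account for that, and accounting for it is where the performance goes.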

Shared Resources vs. Dedicated:
===============================
[AD]We finally made a decision to produce a product set that did not require 
ANSI SQL, that allowed us to push the raw data and a metadata record (from our 
mapping tool) to the dedicated resource, so that the dedicated resource could 
do the heavy lifting.  This was our Direct product set.  We feel that this hits 
the sweet spot.[/AD]

The Sweet Spot:
===============
Over my more than 20 years of MultiValue data communications, I've come to see 
a certain set of characteristics as a sweet spot.  Here, for what it's worth, 
are those characteristics of a data transfer solution:

- Favor dedicated resources to shared
- Do transformation on the dedicated resource
- Streaming I/O using the transport layer
- As little packeting structure as possible
- Avoid imposing ANSI SQL on MultiValue - recognize the differences and get 
over them
- Think about synchronization issues - they may be unavoidable, but where they 
aren't they can cost you big time
- Use multi-threading to mitigate end-to-end delay



Robert Houben
CTO

Logo: FusionWare Corporation - Enterprise Service Bus (ESB), Service-Oriented 
Architecture (SOA)

604-633-9891 #158
 mailto:[EMAIL PROTECTED]
http://www.fusionware.net


/AD


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Baker Hughes
Sent: Tuesday, October 23, 2007 12:15 PM
To: u2-users@listserver.u2ug.org
Subject: RE: [U2] Fastest Bi-Directional data transfer btwn MV and non MV dbms

Janet,

<snip/>
I can setup a conference call with one of Developers.

We have been in the transferring MultiValue data to other data sources
since the early 80's (PK Harmony to start with, anyone remember). We may
have some good input for you.

</snip>
I'm not in a position to buy anything, really just trying to think
through the questions posted.
It would be lovely to have your developer join the thread and describe
how PKH/FW does it's magic.
Not expecting him to share code, of course, just a few thoughts about
your approach is all.

Sorry to draw you into the cross fire, that's why I said what I did
about ads; maybe I should've put it at the top though.

sincere regards,
-Baker
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
-------
