RE: [U2] Fastest Bi-Directional data transfer btwn MV and non MV dbms [AD}

Baker Hughes Wed, 24 Oct 2007 15:31:49 -0700

Thank you Robert and Janet. Overly kind of you Robert to take the time
to distill some insights into this reply.


You give more consideration to the overhead of data Transformation and
make an almost convincing argument to do it on the dedicated target,
assumedly something relational/non-MV. The anecdote you give is an
interesting one about the benchmark attempt, which sounded half-baked by
the MV programmers. I'd still be interested to see a real comparative
benchmark with thorough transformation done on the MV side before
jettison. [Ad] I've written an extensive ETL myself that was used to
"normalize"/extract MV data from 27 UniData systems [due to their
untimely merger-induced demise]. I even used WRITESEQs instead of
WRITEBLK and it was still extremely fast. [/Ad] Most of us have a long
history of transformation if we've been doing EDI - flattening our
dimensioned data into the ANSI standards. I honestly raised an eyebrow
at your thought that a non-MV DB could transform MV data better/faster.
But you've done a good bit of it and apparently written some things to
accomplish it, and I revere your experience at this.

hmmm ... maybe the transformation issue (and others you've outlined to a
lesser extent) is why it's such a long leap for MV-based BI tools to
mash disparate data stores.

Sincere regards,
-Baker

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Janet Bond
Sent: Wednesday, October 24, 2007 1:35 PM
To: u2-users@listserver.u2ug.org
Subject: RE: [U2] Fastest Bi-Directional data transfer btwn MV and non
MV dbms [AD}

As promised here is Robert Houben's input to your question Baker!!! :)

For anyone who doesn't know me, I was the lead designer and developer of
the PK Harmony product which we demoed at PC Labs at the Spectrum show
in 1986 (over 20 years ago!)  I've been involved in data communications
since the early 1980's and I'm still intimately involved in it, so I
think that I have some expertise in the matter! ;)

I put the ad marker in so the moderators won't flip.  I don't believe
that anyone markets PK Harmony anymore (that was another company) so I
shouldn't need it for that, but just in case...  Also, I may
accidentally reference some products that I worked on that my present
company markets, so we'll have to comply! ;)  What I say here can be
applied to any product currently on the market.

There are several factors that affect throughput and performance when
transferring data between systems (any systems).  I'll detail these and
then go through them, with some special emphasis for how they are
impacted by MultiValue processing.  I use SQL Server as the example
target. In some cases your target is different, but most of what I say
is either still relevant or at the very least, worth thinking about:

- I/O bandwidth and contention
- CPU speed and contention
- Disk bandwidth and contention
- Synchronization
- End to end latency
- Transformation

I/O Bandwidth and Contention:
=============================
The first thing to look at is I/O bandwidth and contention.  There are
products that you can get that will allow you to set up two endpoints
and push data through, and measure the throughput.  If you have a 10MBit
LAN, you will never exceed 10 MBits.  If you have a busy network, and
your two endpoints need to go through multiple routers, you will
undoubtedly have less than 10 MBits (or 100MBits) to work with.  There
is a hard limit, determined by your network environment, to how much
data you can push through.  Although this is not usually the most
limiting factor, I've been amazed how often people who had smoking
throughput pushing data between two applications on the same machine are
surprised when they lose a ton of performance after moving one of these
applications to another system and suddenly running into a bottleneck on
the network.
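
To put a number on that hard limit, here is a back-of-the-envelope
sketch.  The 2 GB extract size is a made-up illustration value, not a
measurement, and the function name is hypothetical:

```python
def min_transfer_seconds(payload_bytes: int, link_mbits: float,
                         usable_fraction: float = 1.0) -> float:
    """Lower bound on transfer time: payload bits / usable link bits per second."""
    bits = payload_bytes * 8
    usable_bits_per_second = link_mbits * 1_000_000 * usable_fraction
    return bits / usable_bits_per_second


payload = 2 * 10**9  # hypothetical 2 GB extract (assumption for illustration)

print(min_transfer_seconds(payload, 10))    # 1600.0 seconds on a 10 MBit LAN
print(min_transfer_seconds(payload, 100))   # 160.0 seconds on a 100 MBit LAN
# A shared or routed network only gets part of the link, e.g. 70% of it:
print(min_transfer_seconds(payload, 100, usable_fraction=0.7))
```

No amount of tuning at either endpoint gets you under these figures;
they are the floor before protocol overhead is even counted.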

CPU Speed and Contention:
=========================
The other thing to consider is CPU speed and contention.  On a typical
MultiValue system, you will find yourself disk constrained, but if you
are doing a lot of transformation (we'll look at that later) then you
may find that CPU becomes the limiting factor.  Also consider that
whenever you can push processing from a shared CPU resource (your
MultiValue system) to a dedicated resource (the client's desktop), you
can significantly increase performance.

Disk Bandwidth and Contention:
==============================
Next up is Disk bandwidth and contention.  This can be a hugely
significant factor.  If you look at most OLTP-type MultiValue
applications, you will see that the CPU sits mostly idle (utilization
seems over the years to average about 10%).  Not all of that waiting is
file access, BTW; in many cases what you are encountering is context
switches and internal program space being managed in virtual memory.
Again, as with CPU, moving as much of that from the shared resource to
the dedicated resource as you can will ALWAYS be a good thing for
performance.

Synchronization:
================
Next is synchronization.  Actually, most MultiValue databases are MUCH
better at this than SQL Server! :)  Still, whenever you run the risk of
contention over locks, you can encounter significant performance
problems.  In most cases when doing this type of thing, on the
MultiValue side, you will be reading or writing without any locks.  You
may need to think about what happens if another user is on the system
and tries to write to the same record you are writing to.  When this
happens you have no reasonable choice but to take the hit.  On SQL
Server, you want to choose the cursor model that best suits what you are
doing, and possibly force an exclusive table lock, or just do it when no
one is on the system.  On an "almost related" note, you may wish to size
your SQL Database *before* you start the push.  SQL Server will
automatically resize the database, but this is expensive.  You are
better off to size it first, then do the push.
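
As a sketch of the "size it first" step: the helper below only builds a
T-SQL ALTER DATABASE ... MODIFY FILE statement as a string.  The
database name, logical file name and target size are hypothetical, and
how you execute the statement (and what size is sensible) depends on
your environment.  Note that SQL Server expects the new size to be
larger than the file's current size.

```python
def presize_statement(database: str, logical_file: str, size_mb: int) -> str:
    """Build a T-SQL statement that grows a data file to size_mb megabytes."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    # Bracket-quote the database name, doubling any closing bracket.
    db = "[" + database.replace("]", "]]") + "]"
    # The logical file name is passed as a string literal; double single quotes.
    name = "N'" + logical_file.replace("'", "''") + "'"
    return f"ALTER DATABASE {db} MODIFY FILE (NAME = {name}, SIZE = {size_mb}MB);"


# Hypothetical names and size, for illustration only.
print(presize_statement("Staging", "Staging_Data", 20480))
```

Run the resulting statement once, before the push starts, so the load
itself does not pay for repeated automatic growth.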

End to End Latency:
===================
End to end latency is another issue.  Multi-threaded systems allow you
to be retrieving and transforming data while you are also working with
the previous row.  This type of processing does not tend to happen on
the MultiValue system.  You really need to use the dedicated resource to
do this for you.
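
A minimal sketch of that overlap on the dedicated side, using three
threads joined by queues.  This is not how any particular product does
it; the stage names and the upper-casing "transformation" are
stand-ins.  In CPython the threads mainly overlap I/O waits (network
reads, database writes) rather than CPU work.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def reader(raw_rows, out_q):
    """Stage 1: pull raw rows from the source (here just an iterable)."""
    for row in raw_rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q):
    """Stage 2: transform each row while stage 1 is already fetching the next."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(row.upper())  # stand-in for the real transformation


def writer(in_q, sink):
    """Stage 3: write transformed rows to the target (here a list)."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            return
        sink.append(row)


def run_pipeline(raw_rows):
    q1, q2 = queue.Queue(maxsize=100), queue.Queue(maxsize=100)
    sink = []
    threads = [
        threading.Thread(target=reader, args=(raw_rows, q1)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


print(run_pipeline(["a", "b", "c"]))  # ['A', 'B', 'C']
```

The bounded queues keep a fast reader from running far ahead of a slow
writer, and one thread per stage with FIFO queues preserves row order.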

Transformation:
===============
Finally, we come to Transformation.  This is the kicker.  [AD]I had a
prospect who was looking at our Direct product, who also had some people
who wrote a program.  This program took their MultiValue data, and
pushed it raw to a file on disk at the other end.  Then they tried to
compare that to what we were doing.  The problem with that approach was
that they had MultiValues and SubValue marks, they had dates, times,
masked decimals and other unusual constructs that were meaningless to
any non-MultiValue target that they could have chosen.  Needless to say,
their home-grown benchmark app outperformed our product.  It also
happened to be a meaningless comparison. [/AD]
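
To make "meaningless to any non-MultiValue target" concrete, here is a
sketch of what the receiving side has to do with a raw record.  It
assumes the usual delimiter characters (attribute mark 254, value mark
253, subvalue mark 252), internal dates counted in days from 31 Dec
1967, internal times in seconds since midnight, and masked decimals
stored as scaled integers.  The sample record and function names are
invented for illustration.

```python
from datetime import date, time, timedelta
from decimal import Decimal

AM, VM, SVM = chr(254), chr(253), chr(252)  # attribute, value, subvalue marks
DAY_ZERO = date(1967, 12, 31)               # internal date 0


def split_record(raw: str):
    """Return attributes -> values -> subvalues as nested lists of strings."""
    return [[v.split(SVM) for v in a.split(VM)] for a in raw.split(AM)]


def decode_date(internal: str) -> date:
    return DAY_ZERO + timedelta(days=int(internal))


def decode_time(internal: str) -> time:
    seconds = int(internal)
    return time(seconds // 3600, (seconds % 3600) // 60, seconds % 60)


def decode_md(internal: str, places: int) -> Decimal:
    """Undo an MDn-style mask: '12345' with 2 places is 123.45."""
    return Decimal(int(internal)).scaleb(-places)


# Invented sample: id, date, time, two prices, and a field with subvalues.
raw = AM.join(["1001", "14542", "55909", "12345" + VM + "6789",
               "A" + SVM + "B" + VM + "C"])
rec = split_record(raw)

print(decode_date(rec[1][0][0]))               # 2007-10-24
print(decode_time(rec[2][0][0]))               # 15:31:49
print([decode_md(v[0], 2) for v in rec[3]])    # [Decimal('123.45'), Decimal('67.89')]
print(rec[4])                                  # [['A', 'B'], ['C']]
```

A benchmark that skips all of this and just ships `raw` is measuring a
different job.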

Someone has to process the MultiValues, SubValues and data types.  Doing
it in BASIC, which on all MultiValue systems is a stack-based language,
has performance issues associated with it.  If you are familiar with the
immutable string issue in Java and .NET and the reason why you use
StringBuilder or StringBuffer classes to process changing strings in
these languages, MultiValue BASIC actually has the same issue under the
covers.  It also garbage collects, so the comparison is amazingly
accurate.  Doing this on the MultiValue side causes performance
problems.
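
For readers who haven't met the immutable string issue, here is the
same pattern in Python, whose strings are also immutable.  This is an
analogy only, not MultiValue BASIC, and no timings are claimed: CPython
can sometimes optimize `+=` in place, so measure on your own data.

```python
def build_row_concat(fields):
    """Naive approach: each += may copy everything built so far."""
    out = ""
    for i, field in enumerate(fields):
        if i:
            out += ","
        out += field
    return out


def build_row_join(fields):
    """StringBuilder-style approach: collect pieces, build the result once."""
    return ",".join(fields)


fields = ["1001", "2007-10-24", "123.45"]
print(build_row_concat(fields))  # 1001,2007-10-24,123.45
print(build_row_join(fields))    # 1001,2007-10-24,123.45
```

Both produce the same row; the difference is how many intermediate
strings get created and thrown away along the way.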

Evolution of MultiValue Data Transfer:
======================================
So, in the evolution of data transfer products that I've been involved
in over the years, a number of milestones have been reached, and these
are some of them:

Serial I/O Replaced with TCP/IP:
================================
The original PK Harmony (and even original ODBC) products allowed you to
use Serial I/O to communicate with the MultiValue systems.  In many
cases, that was the only available way at the time.  There were problems
with buffer sizes, and lossy boundaries in Serial I/O, that required you
to have an error correcting packeting structure at both ends.  This
meant that you were doing this type of stuff in MultiValue/BASIC.
Yuck!!!  The move to TCP/IP for communications allowed us to stop
worrying about these things and just stream the data out with minimal
packeting structure.
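
As an illustration of "minimal packeting structure" over a reliable
stream, here is a sketch that frames each record with nothing but a
4-byte length prefix.  There are no checksums or retransmits, because
TCP already handles those.  The frame layout is an assumption for
illustration, not the wire format of any product, and `io.BytesIO`
stands in for a socket file object.

```python
import io
import struct


def write_frame(stream, payload: bytes) -> None:
    """Write one record as: 4-byte big-endian length, then the raw bytes."""
    stream.write(struct.pack("!I", len(payload)))
    stream.write(payload)


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a stream may return short reads."""
    chunks, remaining = [], n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream):
    """Return the next record's bytes, or None at a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        header += read_exact(stream, 4 - len(header))
    (length,) = struct.unpack("!I", header)
    return read_exact(stream, length)


buf = io.BytesIO()                      # stand-in for a TCP connection
write_frame(buf, b"1001\xfe14542")      # raw record, marks passed through untouched
write_frame(buf, b"1002\xfe14543")
buf.seek(0)
print(read_frame(buf), read_frame(buf), read_frame(buf))
```

Because the length prefix says where each record ends, the payload can
carry any byte values, including the MultiValue delimiter characters.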

ANSI SQL:
=========
Relational products require a relational engine. That engine must reside
on the database.  The transformation effort of taking a complex ANSI
compliant SQL statement and translating it to run *correctly* on a
MultiValue system often overshadows all other performance
characteristics.  Some products in the past have taken shortcuts. These
shortcuts result in SQL Statements that return inconsistent results,
depending on the fields you reference (MultiValue/SubValue counts
change). If you don't take the shortcuts, you get hit with performance.
Sometimes you just can't win... :(
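
A toy illustration of the "counts change" problem.  This is not the
behavior of any specific product, just one plausible shortcut: flatten
a record to as many rows as the longest field referenced.  The record
and field names are invented.

```python
# One customer record with two multivalued fields of different depth.
record = {
    "CUSTOMER": ["ACME"],
    "PHONE": ["555-0100", "555-0101"],
    "ITEM": ["bolt", "nut", "washer"],
}


def naive_flatten(record, fields):
    """Shortcut flattening: one row per value of the deepest referenced field."""
    columns = [record[name] for name in fields]
    depth = max(len(col) for col in columns)
    rows = []
    for i in range(depth):
        row = []
        for col in columns:
            if len(col) == 1:
                row.append(col[0])   # single value repeats on every row
            elif i < len(col):
                row.append(col[i])
            else:
                row.append(None)     # shorter multivalued field is padded
        rows.append(tuple(row))
    return rows


print(len(naive_flatten(record, ["CUSTOMER"])))           # 1
print(len(naive_flatten(record, ["CUSTOMER", "PHONE"])))  # 2
print(len(naive_flatten(record, ["CUSTOMER", "ITEM"])))   # 3
```

The same single customer comes back as one, two or three rows depending
only on which columns the query happens to mention, so anything that
counts or sums over customers gives inconsistent answers.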

Shared Resources vs. Dedicated:
===============================
[AD]We finally made a decision to produce a product set that did not
require ANSI SQL, that allowed us to push the raw data and a metadata
record (from our mapping tool) to the dedicated resource, so that the
dedicated resource could do the heavy lifting.  This was our Direct
product set.  We feel that this hits the sweet spot.[/AD]
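
The general shape of "raw data plus a metadata record" can be sketched
as follows.  The metadata layout, column names and conversion codes
here are hypothetical and are not the Direct product's actual format.
The MultiValue side ships the record untouched; the dedicated side
looks up what to do with each attribute.

```python
from datetime import date, timedelta
from decimal import Decimal

AM = chr(254)  # attribute mark

# Converters keyed by a conversion code (codes chosen for illustration).
CONVERTERS = {
    "": lambda s: s,                                             # pass through
    "D": lambda s: date(1967, 12, 31) + timedelta(days=int(s)),  # internal date
    "MD2": lambda s: Decimal(int(s)).scaleb(-2),                 # 2 implied decimals
}

# Hypothetical metadata record: (column name, 1-based attribute number, code).
METADATA = [
    ("order_id", 1, ""),
    ("order_date", 2, "D"),
    ("total", 3, "MD2"),
]


def transform(raw: str, metadata):
    """Runs on the dedicated resource: raw record in, typed row out."""
    attrs = raw.split(AM)
    return {name: CONVERTERS[code](attrs[pos - 1]) for name, pos, code in metadata}


raw = AM.join(["1001", "14542", "12345"])   # what the MultiValue side sends
print(transform(raw, METADATA))
```

The MultiValue system does no per-field work at all in this
arrangement, which is the point of pushing the heavy lifting across.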

The Sweet Spot:
===============
Over my more than 20 years of MultiValue data communications, I've come
to see a certain set of characteristics as a sweet spot.  Here, for what
it's worth, are those characteristics of a data transfer solution:

- Favor dedicated resources to shared
- Do transformation on the dedicated resource
- Streaming I/O using transport layer
- As little packeting structure as possible
- Avoid imposing ANSI SQL on MultiValue - recognize the differences and
get over them
- Think about synchronization issues - they may be unavoidable, but
where they aren't they can cost you big time
- Use multi-threading to mitigate end-to-end delay



Robert Houben
CTO

FusionWare Corporation - Enterprise Service Bus (ESB),
Service-Oriented Architecture (SOA)

604-633-9891 #158
 mailto:[EMAIL PROTECTED]
http://www.fusionware.net


/AD


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Baker Hughes
Sent: Tuesday, October 23, 2007 12:15 PM
To: u2-users@listserver.u2ug.org
Subject: RE: [U2] Fastest Bi-Directional data transfer btwn MV and non
MV dbms

Janet,

<snip/>
I can set up a conference call with one of our Developers.

We have been transferring MultiValue data to other data sources
since the early 80's (PK Harmony to start with, anyone remember?). We may
have some good input for you.

</snip>
I'm not in a position to buy anything, really just trying to think
through the questions posted.
It would be lovely to have your developer join the thread and describe
how PKH/FW does its magic.
Not expecting him to share code, of course, just a few thoughts about
your approach is all.

Sorry to draw you into the cross fire, that's why I said what I did
about ads; maybe I should've put it at the top though.

sincere regards,
-Baker
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
