Thank you, Robert and Janet. Overly kind of you, Robert, to take the time to distill some insights into this reply.
You give more consideration to the overhead of data Transformation and make an almost convincing argument to do it on the dedicated target, assumedly something relational/non-MV. The anecdote you give about the benchmark attempt is an interesting one; the attempt by the MV programmers sounded half-baked. I'd still be interested to see a real comparative benchmark with thorough transformation done on the MV side before jettisoning that approach.

[Ad] I've written an extensive ETL myself that was used to "normalize"/extract MV data from 27 UniData systems [due to their untimely merger-induced demise]. I even used WRITESEQ's instead of WRITEBLK and it was still extremely fast. [/Ad]

Most of us have a long history of transformation if we've been doing EDI - flattening our dimensioned data into the ANSI standards. I honestly raised an eyebrow at your thought that a non-MV DB could transform MV data better/faster. But you've done a good bit of it and apparently written some things to accomplish it, and I revere your experience at this.

hmmm ... maybe the transformation issue (and others you've outlined to a lesser extent) is why it's such a long leap for MV-based BI tools to mash disparate data stores.

Sincere regards,
-Baker

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Janet Bond
Sent: Wednesday, October 24, 2007 1:35 PM
To: u2-users@listserver.u2ug.org
Subject: RE: [U2] Fastest Bi-Directional data transfer btwn MV and non MV dbms

[AD]
As promised, here is Robert Houben's input to your question, Baker!!! :)

For anyone who doesn't know me, I was the lead designer and developer of the PK Harmony product, which we demoed at PC Labs at the Spectrum show in 1986 (over 20 years ago!). I've been involved in data communications since the early 1980's and I'm still intimately involved in it, so I think that I have some expertise in the matter! ;) I put the ad marker in so the moderators won't flip.
I don't believe that anyone markets PK Harmony anymore (that was another company) so I shouldn't need it for that, but just in case... Also, I may accidentally reference some products that I worked on that my present company markets, so we'll have to comply! ;) What I say here can be applied to any product currently on the market.

There are several factors that affect throughput and performance when transferring data between systems (any systems). I'll list these and then go through them, with some special emphasis on how they are impacted by MultiValue processing. I use SQL Server as the example target. In some cases your target is different, but most of what I say is either still relevant or, at the very least, worth thinking about:

- I/O bandwidth and contention
- CPU speed and contention
- Disk bandwidth and contention
- Synchronization
- End to end latency
- Transformation

I/O Bandwidth and Contention:
=============================
The first thing to look at is I/O bandwidth and contention. There are products that you can get that will allow you to set up two endpoints, push data through, and measure the throughput. If you have a 10 MBit LAN, you will never exceed 10 MBits per second. If you have a busy network, and your two endpoints need to go through multiple routers, you will undoubtedly have less than 10 MBits (or 100 MBits) per second to work with. There is a hard limit, determined by your network environment, to how much data you can push through. Although this is not usually the most limiting factor, I've been amazed at how often people who had smoking throughput pushing data between two applications on the same machine are surprised to lose a ton of performance when they move one of these applications to another system and suddenly run into a bottleneck on the network.

CPU Speed and Contention:
=========================
The other thing to consider is CPU speed and contention.
On a typical MultiValue system, you will find yourself disk constrained, but if you are doing a lot of transformation (we'll look at that later) then you may find that the CPU becomes a limiting factor. The other thing to consider is that whenever you can push processing from a shared CPU resource (your MultiValue system) to a dedicated resource (the client's desktop), you can significantly increase performance.

Disk Bandwidth and Contention:
==============================
Next up is Disk bandwidth and contention. This can be a hugely significant factor. If you look at most OLTP type MultiValue applications, you will see that the CPU sits mostly idle (utilization seems over the years to average about 10%). Not all of this is file access, BTW; in many cases what you are encountering is context switches and internal program space being managed in virtual memory. Again, as with CPU, moving as much of that from the shared resource to the dedicated resource as you can will ALWAYS be a good thing for performance.

Synchronization:
================
Next is synchronization. Actually, most MultiValue databases are MUCH better at this than SQL Server! :) Still, whenever you run the risk of contention over locks, you can encounter significant performance problems. In most cases when doing this type of thing, on the MultiValue side, you will be reading or writing without any locks. You may need to think about what happens if another user is on the system and tries to write to the same record you are writing to. When this happens you have no reasonable choice but to take the hit. On SQL Server, you want to choose the cursor model that best suits what you are doing, and possibly force an exclusive table lock, or just do it when no one is on the system.

On an "almost related" note, you may wish to size your SQL Database *before* you start the push. SQL Server will automatically resize the database, but this is expensive. You are better off sizing it first, then doing the push.

End to End Latency:
===================
End to end latency is another issue. Multi-threaded systems allow you to be retrieving and transforming data while you are also working with the previous row. This type of processing does not tend to happen on the MultiValue system. You really need to use the dedicated resource to do this for you.

Transformation:
===============
Finally, we come to Transformation. This is the kicker.

[AD]I had a prospect who was looking at our Direct product, who also had some people who wrote a program. This program took their MultiValue data and pushed it raw to a file on disk at the other end. Then they tried to compare that to what we were doing. The problem with that approach was that they had MultiValue and SubValue marks, they had dates, times, masked decimals and other unusual constructs that were meaningless to any non-MultiValue target that they could have chosen. Needless to say, their home-grown benchmark app outperformed our product. It also happened to be a meaningless comparison. [/AD]

Someone has to process the MultiValues, SubValues and data types. Doing it in BASIC, which on all MultiValue systems is a stack-based language, has performance issues associated with it. If you are familiar with the immutable string issue in Java and .NET, and the reason why you use the StringBuilder or StringBuffer classes to process changing strings in these languages, MultiValue BASIC actually has the same issue under the covers. It also garbage collects, so the comparison is amazingly accurate. Doing this on the MultiValue side causes performance problems.
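
To make the transformation step concrete, here is a minimal Python sketch of the kind of work the dedicated resource has to do with a raw MultiValue record before a relational target can use it. It is only an illustration under stated assumptions, not how any particular product does it: the order record layout, the field names and the `flatten_order` helper are all hypothetical. The only things taken as given are the usual MultiValue conventions: attribute, value and subvalue marks as characters 254, 253 and 252, internal dates counted in days from 31 Dec 1967, and masked decimals stored as scaled integers (MD2 style).

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, List

# Conventional MultiValue delimiters: attribute, value and subvalue marks.
AM, VM, SVM = chr(254), chr(253), chr(252)

# Internal dates are stored as a day count; day 0 is 31 Dec 1967.
DAY_ZERO = date(1967, 12, 31)


def internal_date(days: str) -> date:
    """Convert an internal day count into a calendar date."""
    return DAY_ZERO + timedelta(days=int(days))


def masked_decimal(raw: str, places: int = 2) -> Decimal:
    """Convert a scaled integer (e.g. MD2 storage) into a decimal value."""
    return Decimal(int(raw)).scaleb(-places)


def flatten_order(order_id: str, raw: str) -> List[Dict[str, object]]:
    """Turn one raw order record into one flat row per order line.

    Hypothetical layout, assumed for illustration only:
      attribute 1: order date (internal format)
      attribute 2: product codes (multivalued)
      attribute 3: quantities (multivalued, associated with attribute 2)
      attribute 4: unit prices (multivalued, stored as MD2)
      attribute 5: serial numbers (multivalued, subvalued per order line)
    """
    attrs = raw.split(AM)
    order_date = internal_date(attrs[0])
    products = attrs[1].split(VM)
    quantities = attrs[2].split(VM)
    prices = attrs[3].split(VM)
    serials = attrs[4].split(VM)

    rows = []
    lines = zip(products, quantities, prices, serials)
    for line_no, (product, qty, price, serial_list) in enumerate(lines, start=1):
        rows.append({
            "order_id": order_id,
            "line_no": line_no,
            "order_date": order_date,
            "product": product,
            "quantity": int(qty),
            "unit_price": masked_decimal(price),
            "serials": serial_list.split(SVM) if serial_list else [],
        })
    return rows


# A made-up record: two order lines, the first with two serial numbers.
sample = AM.join([
    "14542",
    VM.join(["WIDGET", "GADGET"]),
    VM.join(["2", "1"]),
    VM.join(["1250", "99900"]),
    VM.join([SVM.join(["SN1", "SN2"]), "SN9"]),
])
rows = flatten_order("1001", sample)
```

Each element of `rows` corresponds to one row of a child table on the relational side; the serial numbers would normally go to a further child table keyed on order and line number. A raw push skips every one of these steps, which is why a raw push and a transformed push are not comparable.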

Evolution of MultiValue Data Transfer:
======================================
So, in the evolution of data transfer products that I've been involved in over the years, a number of milestones have been reached, and these are some of them:

Serial I/O Replaced with TCP/IP:
================================
The original PK Harmony (and even original ODBC) products allowed you to use Serial I/O to communicate with the MultiValue systems. In many cases, that was the only available way at the time. There were problems with buffer sizes and lossy boundaries in Serial I/O that required you to have an error-correcting packeting structure at both ends. This meant that you were doing this type of stuff in MultiValue/BASIC. Yuck!!! The move to TCP/IP for communications allowed us to stop worrying about these things and just stream the data out with minimal packeting structure.

ANSI SQL:
=========
Relational products require a relational engine. That engine must reside on the database. The transformation effort of taking a complex ANSI compliant SQL statement and translating it to run *correctly* on a MultiValue system often overshadows all other performance characteristics. Some products in the past have taken shortcuts. These shortcuts result in SQL statements that return inconsistent results, depending on the fields you reference (MultiValue/SubValue counts change). If you don't take the shortcuts, you take a performance hit. Sometimes you just can't win... :(

Shared Resources vs. Dedicated:
===============================
[AD]We finally made a decision to produce a product set that did not require ANSI SQL, that allowed us to push the raw data and a metadata record (from our mapping tool) to the dedicated resource, so that the dedicated resource could do the heavy lifting. This was our Direct product set.
We feel that this hits the sweet spot.[/AD]

The Sweet Spot:
===============
Over my more than 20 years of MultiValue data communications, I've come to see a certain set of characteristics as a sweet spot. Here, for what it's worth, are those characteristics of a data transfer solution:

- Favor dedicated resources to shared
- Do transformation on the dedicated resource
- Streaming I/O using transport layer
- As little packeting structure as possible
- Avoid imposing ANSI SQL on MultiValue - recognize the differences and get over them
- Think about synchronization issues - they may be unavoidable, but where they aren't they can cost you big time
- Use multi-threading to mitigate end-to-end delay

Robert Houben
CTO
FusionWare Corporation - Enterprise Service Bus (ESB), Service-Oriented Architecture (SOA)
604-633-9891 #158
mailto:[EMAIL PROTECTED]
http://www.fusionware.net
[/AD]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Baker Hughes
Sent: Tuesday, October 23, 2007 12:15 PM
To: u2-users@listserver.u2ug.org
Subject: RE: [U2] Fastest Bi-Directional data transfer btwn MV and non MV dbms

Janet,

<snip/>
I can set up a conference call with one of our Developers. We have been transferring MultiValue data to other data sources since the early 80's (PK Harmony to start with, anyone remember?). We may have some good input for you.
</snip>

I'm not in a position to buy anything, really just trying to think through the questions posted. It would be lovely to have your developer join the thread and describe how PKH/FW does its magic. Not expecting him to share code, of course, just a few thoughts about your approach is all.

Sorry to draw you into the cross fire; that's why I said what I did about ads. Maybe I should've put it at the top though.

sincere regards,
-Baker
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/