Re: [U2] Migration (OT)

Dawn Wolthuis Fri, 24 Dec 2010 20:06:29 -0800

Oh wow, it sure is hard not to jump in on this one, but it's Christmas
Eve so I will render quick opinion and hope that the conversation is
still going on when I really have a chance to respond.


A couple of opinions --

1. Were it not for the standardization of SQL, a majority of BI would
be more easily accomplished using data modeled for and stored in an MV
database over a SQL DBMS.

2. The MV model is more the future than is the legacy SQL DBMS data
model. Thinking that the RDBMS model has trumped the MV model makes
sense on the one hand, but the RDBMS model is now on the wane while
the MV model is part of the group of data models that will wax again,
some under the category "NoSQL." So, were I to write new software
(hey, I am working with a team that is doing just that), I would
choose one of the non-RDBMS models (Ok, I chose Pick/MV) rather than
the old-fashioned, restrictive, and so-last-century SQL DBMS tools. I
did choose a database with the fastest SQL implementation against the
MV model, however. Given that for some applications SaaS software will
replace software deployed by other means, user organizations will care
a lot less about whether the DBMS is Oracle or some other big dog.

That's it for now, back to family. Merry Christmas and Happy Holidays.  --dawn

On Fri, Dec 24, 2010 at 12:39 PM, Robert Houben <robert.hou...@fwic.net> wrote:
> Yes, you were close.
>
> I deal with lots of different MV platforms, and depending on what platform 
> you are on, indexing either doesn't exist or may or may not use b-tree (most 
> do).  In my experience, if you are trying to use more than one field to sort, 
> at best, MV will give you the first one sorted, then it has to manually do 
> the others.   So, let me amend my statement to read like this: "Sorting tends 
> to be less efficient in MV than in SQL."
>
> I'm not really sure if MV is more efficient than SQL on reads. Sometimes it 
> looks like it is, sometimes it looks like it isn't.  I admit that I waffle on 
> that.  They are close enough that it doesn't make a big difference, but I'm 
> inclined anecdotally to lean to the feeling that MV is maybe a bit faster.  
> It's hard to say, since the moment you include actual multivalues, MV wins 
> hands-down, because of the disk-access situation.
>
> My reference to inserting into a file with a modulo of 1 was in counterpoint 
> to my observation that a poorly sized SQL database can cause really bad 
> performance on inserts.  My point is, if you don't get the design right, your 
> mileage will vary regardless of the dbms.  With any dbms, failure to do 
> reasonable administration will bite you in the end, and it's unfair to blame 
> the dbms for that.  That's operator error.
>
> The fact that relational, by definition, encourages normalization, and the 
> resulting distributed table structure from that normalization, ensures that a 
> relational database will require more indexes.  However, if you decide to 
> create an MV account with all the files normalized, as we were discussing, 
> and use indexes to link them, then you will have much the same situation.  If 
> you nest the data in the traditional MV manner, then you will have way less 
> indexes, and the elimination of duplicated keys will reduce you disk 
> requirements significantly.  Again, if you use MV to take advantage of 
> multivalues and what they give you, you can gain some significant benefits 
> from it.
>
> MV can be much more suited to online transactional stuff.  Relational is much 
> better at generic reporting, data mining and things of that ilk, simply 
> because it provides standard metadata that follows relational set theory, 
> which, for better or worse, is how BI and data mining tools do their thing.  
> It also provides discoverable metadata, based on standards, that the tools 
> vendors understand and require.  Each platform has its benefits and its 
> trade-offs.
>
> The thought of writing and maintaining a full-blown application using SQL as 
> the underlying dbms compared with MV makes my head hurt (I've done both.)  
> With the relational approach you wind up having to create an object model 
> that exposes the relationships.  Effectively, what the DMBS won't do for you, 
> you have to do for yourself, in code.  This becomes brittle and difficult to 
> maintain.  MV is *so* much easier to write an application in.
>
> Unfortunately, the debate over which is better is a bit late.  The train has 
> left the station.  If I was writing a new app, the difficulty of marketing 
> and selling a "proprietary" MV platform is a serious problem that I would 
> have to overcome in my business plan.  How do I convince my prospects that 
> the vendor is stable, when I'm not convinced of that myself in some cases?  
> Even where I think they are, you say the vendor's name and the prospect says 
> "Who?" That's a real problem!
>
> I'm also not aware of a new application, built from scratch, of any 
> significant size, that has been written against MV for a long time.  Does 
> anyone out there know of one? I'm really curious if one has been written and 
> achieved significant market penetration.  I'm not considering new releases of 
> old apps like R&R or Dynix or Epicor, I'm talking about a new product written 
> from scratch.
>
> Anyways, we've digressed seriously from our original discussion.  The point 
> is, there are fairly common SQL constructs that defy simple migration to MV, 
> just as there are unusual uses of multivalues, subvalues, and transient data 
> types in MV that defy simple migration to SQL.  You could probably move 
> 75-80% of your tables to MV if you decided how to handle multi-part keys.  
> Then you'd have to look for the implicit, hidden, foreign key relationships 
> that your automatic search missed, and figure out how to deal with them. And 
> you'd have to probably auto-create dictionaries and define indexes to enable 
> your app to efficiently find all the children in a parent/child relationship.
>
> You may also choose to merge some child relationships into the parent MV file 
> instead of making it a separate table, but you could do that in a separate 
> step.
>
> In the end, there will undoubtedly be some application-specific stuff that 
> you do, but then, that's almost always the case, when dealing with a dbms, 
> isn't it!
> -----Original Message-----
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Mecki Foerthmann
> Sent: Friday, December 24, 2010 9:25 AM
> To: U2 Users List
> Subject: Re: [U2] Migration (OT)
>
> So I was more or less right then. ;-)
>
> Afaik mv-indexing uses a b-tree structure for better performance.
> Therefore your statement that "you don't get anything pre-sorted" might not 
> be quite right.
>
> And which one of your statements shall we believe when you claim that "a read 
> on a primary key is about as efficient as a MV hashed read" and "for direct 
> reads, MV seems to have a slight advantage"? It's either one or the other.
> And talking about inserting millions of records into a file with a modulo of 
> 1 isn't really helpful now, or is it?
>
> I use relational for data warehousing, analytics and reporting myself.
> Relational can be fast but once you start using joins...
> Nevertheless, in a well designed mv database you will need a lot less 
> indexing than in a similar relational one.
> And as you admit yourself, when it comes to writing (inserting) mv can't be 
> beat.
> So for operational data, mv would be my choice of db anytime.
>
> And no, we are not talking about just creating a relational model using a 
> mv-database.
> That defies the purpose and is what this tread imho was about.
> After all first normal form is just a sub-set of mv, while in sql-land it's 
> the only one you have!
> MV only excels when you convert dependant sub-tables into mv fields.
> And that's where any attempts to fully automate this process has to fail.
>
>
> On 24/12/2010 15:50, Robert Houben wrote:
>> A read on a primary key is about as efficient as an MV hashed read.  Each 
>> has their trade-offs. Get the modulo wrong and your MV read can be nasty.  
>> You can get a SQL table in trouble, too.  You really can't beat getting all 
>> your data in one disk head movement, but we were talking about just creating 
>> a single PICK file for each table, to keep the migration "simple".
>>
>> There are some excellent articles on how indexes work in SQL Server that you 
>> may wish to read.  Your understanding below is partially correct in places, 
>> but indexes do NOT copy all the data.  Just the indexed fields. Note that a 
>> PICK index has to copy all the item-ids and all the indexed fields, but you 
>> don't get anything pre-sorted, just hashed. Note that the primary key sorts 
>> the actual table. All subsequent indexes are actually sorted copies of the 
>> indexed columns with keys or some other references to the real records. If 
>> your query only uses fields in the index, you won't ever read the real 
>> table's data.
>>
>> SQL will beat MV every time when you sort fields that are indexed.
>> For direct reads, MV seems to have a slight advantage.  Inserts and
>> updates that affect indexed fields are slower in SQL (inserts are
>> painfully slow if you fail to size your SQL table well, but try
>> "inserting" millions of records into a file with a modulo of 1...)
>>
>> Here's one short article with some diagrams that show what goes on in 
>> clustered and non-clustered indexes:
>> http://technet.microsoft.com/en-us/library/aa964133(SQL.90).aspx
>>
>>
>> -----Original Message-----
>> From: u2-users-boun...@listserver.u2ug.org
>> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Mecki
>> Foerthmann
>> Sent: Friday, December 24, 2010 5:56 AM
>> To: U2 Users List
>> Subject: Re: [U2] Migration
>>
>> I was under the impression that when a relational table is being "indexed" 
>> the DBMS creates and maintains a sorted copy of the original table for every 
>> indexed field.
>> That means for clustered indices tables sorted by every conceivable 
>> combination need to be maintained, having a huge impact on space 
>> requirements and performance.
>> So when you are looking for a row or rows that match the indexed key,
>> in a RDBMS an algorithm can be used to locate the row(s) instead of
>> reading in the whole table and comparing every single row. So you
>> could for instance go to the middle of the (already pre-sorted) table
>> (if you have
>> 1000000 Million rows in your table you read row 500000) and check if the key 
>> is greater equal or smaller than the value you are looking for and carry on 
>> that way until you have a match. That way you eliminate half the number of 
>> rows you need to compare with every read. Of course they probably use much 
>> more sophisticated algorithms these days.
>> But regardless what algorithm they may use it has to be slower than the 
>> hashing algorithm used by mv as long as you have well sized files using 
>> sensible item ids.
>> To save space some RDBMS may also have implemented reduced datasets so they 
>> may just hold the indexed keys in the row instead of duplicating the whole 
>> data; in which case the primary index somehow needs to be used to retrieve 
>> the data in the row afterwards.
>> So as you can see even in a RDBMS there is calculation going on when indices 
>> are used.
>> I would actually go so far to say that relational databases don't use real 
>> indices at all. They just duplicate the dataset sorted in different ways.
>> But that of course is a matter of how you define what a "real" index is 
>> supposed to be.
>>
>> Well, and when you want to "sort" then you just read the already sorted 
>> table into memory instead of the original - so it can be a lot faster than 
>> reading a list of item ids from an index file and then reading the items one 
>> by one from the data file using the hashing algorithm as it is done in the 
>> mv-world.
>> That is also the reason why mv can only use one index at a time and why we 
>> don't need joins.
>>
>>
>> On 24/12/2010 04:45, fft2...@aol.com wrote:
>>> In a message dated 12/23/2010 4:28:38 PM Pacific Standard Time,
>>> antli...@youngman.org.uk writes:
>>>
>>>
>>>>> SQL uses indexes.  MV uses cross references to item-ids (MV
>>>>> sometimes
>>>> supports indexes, but they don't always work as well as in the
>>>> relational
>>>> world.)
>>>> I don't know as that is true ... or are you using the word "index"
>>>> to mean something completely different to me? I'll agree the
>>>> implementation of indices can be buggy, but surely that's true of 
>>>> relational engines too?
>>>>
>>> I'm not quite sure I'm confortable with the idea (expressed in the
>>> prior-prior posting of which I here quote and enquote the reponse)
>>> that MV uses cross-references to item-ids.
>>>
>>> To me a hash table, isn't the same thing as a cross-reference which
>>> sounds a lot like a secondary key.  Hashing calculates an exact jump
>>> point at which a group of related records are kept.  They are related
>>> by having the same hash value.  But the hash value itself isn't looked up, 
>>> it's a calculation.
>>>
>>> I wonder if you can setup a first normal form table in such a way,
>>> that it maintains a constant sorted order ?  Sorting on the primary
>>> key, would then be merely display time bound, there is no effort to
>>> it.  I suppose you could even pick up and lay down the database
>>> periodically so the sort order matches the actual disk layout.  Pick
>>> could never do something like that.  There is always going to be
>>> effort involved in any sorting, even if you're simply traversing the index 
>>> tree and grabbing the underlying data records.
>>>
>>> But back to reality, I don't think SQL works this way anyway.  The
>>> perception that it sorts much faster is probably related more closely
>>> to the horsepower behind the scenes.  Pick systems tend to be
>>> installed on slower systems because they are so efficient and most
>>> users are cheap with their database, and expensive with their
>>> graphics.  So they install the SQL type databases on speedy machines.
>>> _______________________________________________
>>> U2-Users mailing list
>>> U2-Users@listserver.u2ug.org
>>> http://listserver.u2ug.org/mailman/listinfo/u2-users
>>>
>> _______________________________________________
>> U2-Users mailing list
>> U2-Users@listserver.u2ug.org
>> http://listserver.u2ug.org/mailman/listinfo/u2-users
>> _______________________________________________
>> U2-Users mailing list
>> U2-Users@listserver.u2ug.org
>> http://listserver.u2ug.org/mailman/listinfo/u2-users
>>
> _______________________________________________
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
> _______________________________________________
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
>



-- 
Dawn M. Wolthuis

Take and give some delight today
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

Re: [U2] Migration (OT)

Reply via email to