RE: physical order of rows on disk

Zabach, Elke Wed, 27 Mar 2002 04:11:48 -0800

Hi,

there are no plans to allow the definition of 'primary key nonclustered'.
If one defines a primary key this will always be the 'clustered index'.


BTW: what do you think, DB2 does if you say PRIMARY KEY NONCLUSTERED ?
         I think, we can assume, it will create a unique index.
 
Elke
SAP Labs Berlin


> -----Original Message-----
> From: Riesch Nicolas [mailto:[EMAIL PROTECTED]]
> Sent: Mittwoch, 27. M�rz 2002 10:50
> To: Zabach, Elke; Riesch Nicolas; [EMAIL PROTECTED]
> Subject: RE: physical order of rows on disk
> 
> 
> thank you for your answer.
> 
> I work with Microsoft SQL Server, with Datacom DB on 
> mainframe, and a little
> with DB2.
> In order to manage and store the data, all these database 
> systems store the
> data by chunks called "pages", of equal size.
> SQL Server manage data by 8Kb pages, for instance.
> sap db does the same.
> The files or devices where data are to be stored are divided 
> in pages, and
> each page is given a physical id, allowing the database 
> system to write and
> read this page.
> Most database system number the pages in a file or device 
> sequentially,
> starting from 1, 2, 3, 4, ...
> 
> All the pages containing the rows of a table make a linked 
> list, each page
> having a pointer to the next page and to the previous page.
> When new data are inserted, new pages are allocated here and 
> there, more or
> less at random location.
> Whenever a new page is allocated, it is inserted in the 
> linked list at the
> proper location.
> Suppose we have a table "employees" :
> 
>  create table employees
>  ( emp_id int not null,
>    name varchar(30) not null,
>    birthdate date null
>   )
> 
> Suppose for now that there is a way to define a "cluster 
> order", which is
> the order of the rows in the leaf pages of the B-tree of the 
> "employees"
> table.
> Suppose we define this "cluster order" to be on the field "name".
> 
> now, we insert the following records:
> 
> 022 gerber
> 294 suter
> 438 rolfo
> 402 black
> 806 davidson
> 403 einstein
> 104 adams
> 968 holmes
> 302 weber
> 294 jackson
> 732 hofer
> 101 bowles
> 110 fitzgerald
> 
> These records will be inserted in the linked list of pages 
> with a particular
> order, called "physical order", or "clustered index" (for sql 
> server), or
> "cluster index" (for db2), or "native key" (for datacom db). 
> All these terms
> are equivalent. Here, let's use the term "cluster order", as 
> it is the term
> I prefer.
> 
> As we have defined the "cluster order" on field "name", the 
> database system
> can store the rows like this:
> 
> page id:  40023
> previous page: null
> next page: 7298
>   104 adams
>   402 black
>   101 bowles
> 
> page id:  7298
> previous page: 40023
> next page: 1870
>   806 davidson
>   403 einstein
> 
> page id:  1870
> previous page: 7298
> next page: 4513
>   110 fitzgerald
>   022 gerber
> 
> page id:  4513
> previous page: 1870
> next page: 17322
>   732 hofer
>   968 holmes
>   294 jackson
> 
> page id:  17322
> previous page: 4513
> next page: null
>   438 rolfo
>   294 suter
>   302 weber
> 
> We see that in our example, the database system has allocated 
> the physical
> pages whose physical id are:
> 40023 <--> 7298 <--> 1870 <--> 4513 <--> 17322
> 
> Of course, those pages have been allocated at random 
> locations. We see that
> because the physical ids of allocated pages are not sequential.
> When the database system needs to scan the table "employees", 
> it retrieves
> the first page (here, the page with id 40023), and then 
> follows the pointer
> to next page, in order to retrieve the following page. It 
> continues this
> way, reading all linked pages until the last page (id 17322).
> 
> As the pages are retrieved, the rows are thus also retrieved in the
> "clustered order", from "adams" to "weber".
> 
> Why is the "cluster order" so important ?
> 
> Well, it is of no use if you just want to retrieve just one 
> precise record.
> If you want to retrieve the employee whose id is 968 
> ("holmes") and there is
> an index on field "emp_id", the database system will just 
> read the page no
> 4513.
>   select * from employees where emp_id=968
> This statement can be done just with one physical read from disk. (We
> suppose that the index pages of higher level are already in 
> memory buffer
> cache)
> 
> But if you want to retrieve a range of data, for instance the 
> employees
> whose name begins with "h":
>   select * from employees where name like 'h*'
> The database system will read the first page containing an 
> employee whose
> name begins with "h", for instance "holmes".
> But in this same page, the following records also contain 
> employees whose
> name begins with "h", as the records are ordered in the pages 
> by "name"
> alphabetical order.
> To retrieve the next matching employees, the database system 
> just need to
> follow the page link and read the next page and so on.
> But each time a page is retrieved, it contains many matching 
> employees,
> which is very good.
> 
> Suppose now that the employees table is physically ordered on 
> "emp_id".
> Then, the rows on leaf pages of the table are ordered on increasing
> "emp_id".
> If we have a secondary index on "name", the database system needs to
> retrieve one page for each matching row.
> And when a page containing the data of "holmes", for 
> instance, is retrieved,
> it is extremely improbable (if the table is large) that it 
> contains another
> employee whose name begins with "h".
> ( this example is not so good, but you get the idea)
> 
> - So, we can say that retrieving a range of data is more IO 
> costly if we use
> a secondary index.
>   This is because each leaf page retrieve contains most of 
> the time just one
> matching record.
> 
> - On the opposite, it is very efficient if the data are ordered with a
> "cluster order" like the range we want to retrieve.
>   This way, each leaf page retrieved contains many matching rows.
>  
> That's why it is very important, when creating a table, to define the
> "cluster order".
> It is even the first thing you do, just after having defined 
> the column.
> 
> In all databases, there is a mean to define a "cluster order" 
> when creating
> a table.
> And the "cluster order" is something independant or the "primary key".
> 
> In sap db, it seems that the "cluster order" is defined by 
> the "primary
> key", which seems quite strange to me.
> 
> As I said in my previous message, in sap db, it seems that if 
> I waqnt to
> define a "cluster order" for a table, I must define it as a 
> "primary key".
> And then, when I want to define the "real logical primary key" on this
> table, I must declare it as a unique index.
> Doing this achieves the result I want, but it is not very pretty.
> 
> That's why I wonder if sap db will include in its syntax a 
> way of defining
> the "cluster order" independantly or the "primary key" definition.
> All other big db systems allow this feature, so ...
> 
> Just to give an idea, here is the syntax for sql server, but db2 uses
> something approaching:
> 
>  create table employees
>  ( emp_id int not null,
>    name varchar(30) not null,
>    birthdate date null,
>    primary key nonclustered (emp_id)
>   )
>  create index clustered idx_employees on employees(name)
> 
> Here, the rows are ordered on the leaf pages with the 
> clustered index order
> (name).
> And independantly of this, here is a primary key that 
> uniquely identifies
> the records on emp_id.
> 
> So, my question is:
> 
> In sapdb, when you define a "primary key", it is at the same time the
> cluster ordering ?
> You cannot define a "cluster ordering" different from the 
> "primary key" ?
> 
> 
> ( as for the utility of defining an appropriate cluster 
> index, it allows to
> reduce considerably the number of read/write on disk for 
> range retrieval
> queries, if chosen properly.
>   For instance, the cluster of employees table is on "name", 
> as we often
> want to display list of employees by alphabetical order, 
> knowing the first
> letters of the name.
>   For payments records, we want the table to have a cluster order on (
> payment_year, payment_month), as we often make queries 
> summing amount for
> payments made during one month or a range of month.
>   and so on...
> )
> 
> 
> > -----Message d'origine-----
> > De: Zabach, Elke [SMTP:[EMAIL PROTECTED]]
> > Date:       mardi, 26. mars 2002 17:46
> > �:  'Riesch Nicolas'; [EMAIL PROTECTED]
> > Objet:      RE: physical order of rows on disk
> > 
> > Riesch Nicolas wrote:
> > 
> > > I have read the documentation many times, but I am note 
> sure I have
> > > understood everything.
> > > Can you tell me if all the following affirmations are correct ?
> > > 
> > > 1) the following two ways of creating a table are exactly 
> the same ?:
> > > 
> > > create table employees
> > > ( company_id int key not null,
> > >   emp_id int key not null,
> > >   name varchar(30) not null,
> > >   birthdate date null
> > > )
> > > 
> > > create table employees
> > > ( company_id int not null,
> > >   emp_id int not null,
> > >   name varchar(30) not null,
> > >   birthdate date null,
> > >   primary key (company_id, emp_id)
> > > )
> > > 
> > > Is is correct to say that the PRIMARY KEY can be defined 
> > > either with the
> > > "key" keyword at the column level, or with the "primary key 
> > > (col1, col2,
> > > ...)" syntax ?
> > 
> >      For now it is the same, but in future, the first 
> version will not
> > exist
> >      any longer. It is non-standard --> forget the first version
> > 
> > > 2) A table is physically stored as a B-tree on disk.
> > > The rows are ordered physically with the same order as the 
> > > PRIMARY KEY, as
> > > defined during the creation of the table.
> > > The physical ordering of rows on disks is called 
> "clustered index" or
> > > "native key" in other database systems.
> > > So, with sapdb, the clustered index is always the PRIMARY KEY ?
> > 
> >      A B*-tree is a logical point of view. For this you are right.
> >      But the tree is not stored from left to right on 
> consecutive pages on
> > disk.
> >      If a new page is needed, it is used whereever it may 
> be stored on
> > disk.
> >      It is inserted in the B*-tree whereever needed and 
> there it belongs
> >      to the logical sight from left to right, i.e. from small
> > primkey-values
> >      to higher ones.
> >      If we would like to store one table in correct 
> physically order each
> > insert
> >      would need to push all rows behind it to the right...
> >  
> > > 3) So, is it impossible to define a table with rows having a 
> > > particular
> > > physical order, and a PRIMARY KEY defining another order ?
> > > Suppose I have this table:
> > > create table employees
> > > ( emp_id int not null,
> > >   name varchar(30) not null,
> > >   birthdate date null,
> > >   primary key (emp_id)
> > > )
> > > 
> > > I want to have the rows of this table physically ordered on 
> > > disk by the
> > > field "name".
> > > But if I create an index on "name", the physical order of the 
> > > rows is not
> > > changed. It just creates another B-tree for the index, 
> > > pointing on the real
> > > rows of the table.
> > > So, the only way I see to achieve this is to do the following :
> > > 
> > > create table employees
> > > ( emp_id int not null,
> > >   name varchar(30) not null,
> > >   birthdate date null,
> > >   primary key (name, emp_id)
> > > )
> > > create unique index pk_employees  on employees (emp_id)
> > > 
> > > Then I insert those data:
> > > 
> > > insert into employees
> > >   values (20, 'smith',null)
> > > insert into employees
> > >   values (10, 'watson',null)
> > > insert into employees
> > >   values (30, 'holmes',null)
> > > 
> > > Then I display the content of the table with "select * from 
> > > employees".
> > > The result is:
> > > 
> > > 30  holmes  <null>  
> > > 20  smith   <null>  
> > > 10  watson  <null>  
> > > 
> > > Here, we see that the physical order is alphabetically by 
> > > name, as I wanted
> > > it to be.
> > > 
> > > And the secondary unique index on "emp_id" is acting as the 
> > > real logical
> > > "primary key".
> > > 
> > > Is is right to say that what is defined as "primary key" in 
> > > sapdb is in fact
> > > the "clusted index" (or "physical index", or "native key") of 
> > > the table,
> > > which defines the physical order of the rows on disk ?
> > > 
> > > Is there no other way to force another index than the 
> > > "primary key" to be
> > > the "physical index" ?
> > 
> > 
> >     As I wrote, a      physical      sequence on disk can never be
> > assured.
> >     But what do you expect if you store according to name? 
> Why is it so
> > necessary for you?
> > 
> > Elke
> > SAP Labs Berlin
> 
> 
> **********************************************************************
> This message is being transmitted to you without prior inspection 
> by Geneva Insurance.
> 
> The message engages only the person who authored it and in 
> no way is binding on Geneva Insurance.
> 
> The latter can under no circumstances be held responsible 
> for the contents of this message, neither contractually nor in 
> any other manner.
> 
> If the case arises, any commitments made by the author of 
> this message in the name of Geneva Insurance have no legal 
> validity unless confirmed in a letter duly signed by persons 
> authorized to represent the company.
> 
> [EMAIL PROTECTED]
> **********************************************************************
> 
_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general

RE: physical order of rows on disk

Reply via email to