RE: physical order of rows on disk

Riesch Nicolas Wed, 27 Mar 2002 04:18:51 -0800

thank you for your answer.

I work with Microsoft SQL Server, with Datacom DB on mainframe, and a little
with DB2.
In order to manage and store the data, all these database systems store the
data by chunks called "pages", of equal size.
SQL Server manage data by 8Kb pages, for instance.
sap db does the same.
The files or devices where data are to be stored are divided in pages, and
each page is given a physical id, allowing the database system to write and
read this page.
Most database system number the pages in a file or device sequentially,
starting from 1, 2, 3, 4, ...

All the pages containing the rows of a table make a linked list, each page
having a pointer to the next page and to the previous page.
When new data are inserted, new pages are allocated here and there, more or
less at random location.
Whenever a new page is allocated, it is inserted in the linked list at the
proper location.
Suppose we have a table "employees" :

 create table employees
 ( emp_id int not null,
   name varchar(30) not null,
   birthdate date null
  )

Suppose for now that there is a way to define a "cluster order", which is
the order of the rows in the leaf pages of the B-tree of the "employees"
table.
Suppose we define this "cluster order" to be on the field "name".

now, we insert the following records:

022 gerber
294 suter
438 rolfo
402 black
806 davidson
403 einstein
104 adams
968 holmes
302 weber
294 jackson
732 hofer
101 bowles
110 fitzgerald

These records will be inserted in the linked list of pages with a particular
order, called "physical order", or "clustered index" (for sql server), or
"cluster index" (for db2), or "native key" (for datacom db). All these terms
are equivalent. Here, let's use the term "cluster order", as it is the term
I prefer.

As we have defined the "cluster order" on field "name", the database system
can store the rows like this:

page id:  40023
previous page: null
next page: 7298
  104 adams
  402 black
  101 bowles

page id:  7298
previous page: 40023
next page: 1870
  806 davidson
  403 einstein

page id:  1870
previous page: 7298
next page: 4513
  110 fitzgerald
  022 gerber

page id:  4513
previous page: 1870
next page: 17322
  732 hofer
  968 holmes
  294 jackson

page id:  17322
previous page: 4513
next page: null
  438 rolfo
  294 suter
  302 weber

We see that in our example, the database system has allocated the physical
pages whose physical id are:
40023 <--> 7298 <--> 1870 <--> 4513 <--> 17322

Of course, those pages have been allocated at random locations. We see that
because the physical ids of allocated pages are not sequential.
When the database system needs to scan the table "employees", it retrieves
the first page (here, the page with id 40023), and then follows the pointer
to next page, in order to retrieve the following page. It continues this
way, reading all linked pages until the last page (id 17322).

As the pages are retrieved, the rows are thus also retrieved in the
"clustered order", from "adams" to "weber".

Why is the "cluster order" so important ?

Well, it is of no use if you just want to retrieve just one precise record.
If you want to retrieve the employee whose id is 968 ("holmes") and there is
an index on field "emp_id", the database system will just read the page no
4513.
  select * from employees where emp_id=968
This statement can be done just with one physical read from disk. (We
suppose that the index pages of higher level are already in memory buffer
cache)

But if you want to retrieve a range of data, for instance the employees
whose name begins with "h":
  select * from employees where name like 'h*'
The database system will read the first page containing an employee whose
name begins with "h", for instance "holmes".
But in this same page, the following records also contain employees whose
name begins with "h", as the records are ordered in the pages by "name"
alphabetical order.
To retrieve the next matching employees, the database system just need to
follow the page link and read the next page and so on.
But each time a page is retrieved, it contains many matching employees,
which is very good.

Suppose now that the employees table is physically ordered on "emp_id".
Then, the rows on leaf pages of the table are ordered on increasing
"emp_id".
If we have a secondary index on "name", the database system needs to
retrieve one page for each matching row.
And when a page containing the data of "holmes", for instance, is retrieved,
it is extremely improbable (if the table is large) that it contains another
employee whose name begins with "h".
( this example is not so good, but you get the idea)

- So, we can say that retrieving a range of data is more IO costly if we use
a secondary index.
  This is because each leaf page retrieve contains most of the time just one
matching record.

- On the opposite, it is very efficient if the data are ordered with a
"cluster order" like the range we want to retrieve.
  This way, each leaf page retrieved contains many matching rows.

That's why it is very important, when creating a table, to define the
"cluster order".
It is even the first thing you do, just after having defined the column.

In all databases, there is a mean to define a "cluster order" when creating
a table.
And the "cluster order" is something independant or the "primary key".

In sap db, it seems that the "cluster order" is defined by the "primary
key", which seems quite strange to me.

As I said in my previous message, in sap db, it seems that if I waqnt to
define a "cluster order" for a table, I must define it as a "primary key".
And then, when I want to define the "real logical primary key" on this
table, I must declare it as a unique index.
Doing this achieves the result I want, but it is not very pretty.

That's why I wonder if sap db will include in its syntax a way of defining
the "cluster order" independantly or the "primary key" definition.
All other big db systems allow this feature, so ...

Just to give an idea, here is the syntax for sql server, but db2 uses
something approaching:

 create table employees
 ( emp_id int not null,
   name varchar(30) not null,
   birthdate date null,
   primary key nonclustered (emp_id)
  )
 create index clustered idx_employees on employees(name)

Here, the rows are ordered on the leaf pages with the clustered index order
(name).
And independantly of this, here is a primary key that uniquely identifies
the records on emp_id.

So, my question is:

In sapdb, when you define a "primary key", it is at the same time the
cluster ordering ?
You cannot define a "cluster ordering" different from the "primary key" ?

( as for the utility of defining an appropriate cluster index, it allows to
reduce considerably the number of read/write on disk for range retrieval
queries, if chosen properly.
  For instance, the cluster of employees table is on "name", as we often
want to display list of employees by alphabetical order, knowing the first
letters of the name.
  For payments records, we want the table to have a cluster order on (
payment_year, payment_month), as we often make queries summing amount for
payments made during one month or a range of month.
  and so on...
)

> -----Message d'origine-----
> De:   Zabach, Elke [SMTP:[EMAIL PROTECTED]]
> Date: mardi, 26. mars 2002 17:46
> �:    'Riesch Nicolas'; [EMAIL PROTECTED]
> Objet:        RE: physical order of rows on disk
> 
> Riesch Nicolas wrote:
> 
> > I have read the documentation many times, but I am note sure I have
> > understood everything.
> > Can you tell me if all the following affirmations are correct ?
> > 
> > 1) the following two ways of creating a table are exactly the same ?:
> > 
> > create table employees
> > ( company_id int key not null,
> >   emp_id int key not null,
> >   name varchar(30) not null,
> >   birthdate date null
> > )
> > 
> > create table employees
> > ( company_id int not null,
> >   emp_id int not null,
> >   name varchar(30) not null,
> >   birthdate date null,
> >   primary key (company_id, emp_id)
> > )
> > 
> > Is is correct to say that the PRIMARY KEY can be defined 
> > either with the
> > "key" keyword at the column level, or with the "primary key 
> > (col1, col2,
> > ...)" syntax ?
> 
>      For now it is the same, but in future, the first version will not
> exist
>      any longer. It is non-standard --> forget the first version
> 
> > 2) A table is physically stored as a B-tree on disk.
> > The rows are ordered physically with the same order as the 
> > PRIMARY KEY, as
> > defined during the creation of the table.
> > The physical ordering of rows on disks is called "clustered index" or
> > "native key" in other database systems.
> > So, with sapdb, the clustered index is always the PRIMARY KEY ?
> 
>      A B*-tree is a logical point of view. For this you are right.
>      But the tree is not stored from left to right on consecutive pages on
> disk.
>      If a new page is needed, it is used whereever it may be stored on
> disk.
>      It is inserted in the B*-tree whereever needed and there it belongs
>      to the logical sight from left to right, i.e. from small
> primkey-values
>      to higher ones.
>      If we would like to store one table in correct physically order each
> insert
>      would need to push all rows behind it to the right...
>  
> > 3) So, is it impossible to define a table with rows having a 
> > particular
> > physical order, and a PRIMARY KEY defining another order ?
> > Suppose I have this table:
> > create table employees
> > ( emp_id int not null,
> >   name varchar(30) not null,
> >   birthdate date null,
> >   primary key (emp_id)
> > )
> > 
> > I want to have the rows of this table physically ordered on 
> > disk by the
> > field "name".
> > But if I create an index on "name", the physical order of the 
> > rows is not
> > changed. It just creates another B-tree for the index, 
> > pointing on the real
> > rows of the table.
> > So, the only way I see to achieve this is to do the following :
> > 
> > create table employees
> > ( emp_id int not null,
> >   name varchar(30) not null,
> >   birthdate date null,
> >   primary key (name, emp_id)
> > )
> > create unique index pk_employees  on employees (emp_id)
> > 
> > Then I insert those data:
> > 
> > insert into employees
> >   values (20, 'smith',null)
> > insert into employees
> >   values (10, 'watson',null)
> > insert into employees
> >   values (30, 'holmes',null)
> > 
> > Then I display the content of the table with "select * from 
> > employees".
> > The result is:
> > 
> > 30  holmes  <null>  
> > 20  smith   <null>  
> > 10  watson  <null>  
> > 
> > Here, we see that the physical order is alphabetically by 
> > name, as I wanted
> > it to be.
> > 
> > And the secondary unique index on "emp_id" is acting as the 
> > real logical
> > "primary key".
> > 
> > Is is right to say that what is defined as "primary key" in 
> > sapdb is in fact
> > the "clusted index" (or "physical index", or "native key") of 
> > the table,
> > which defines the physical order of the rows on disk ?
> > 
> > Is there no other way to force another index than the 
> > "primary key" to be
> > the "physical index" ?
> 
> 
>     As I wrote, a      physical      sequence on disk can never be
> assured.
>     But what do you expect if you store according to name? Why is it so
> necessary for you?
> 
> Elke
> SAP Labs Berlin

**********************************************************************
This message is being transmitted to you without prior inspection 
by Geneva Insurance.

The message engages only the person who authored it and in 
no way is binding on Geneva Insurance.

The latter can under no circumstances be held responsible 
for the contents of this message, neither contractually nor in 
any other manner.

If the case arises, any commitments made by the author of 
this message in the name of Geneva Insurance have no legal 
validity unless confirmed in a letter duly signed by persons 
authorized to represent the company.

[EMAIL PROTECTED]
**********************************************************************

_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general

RE: physical order of rows on disk

Reply via email to