Hi,
there are no plans to allow the definition of 'primary key nonclustered'.
If one defines a primary key this will always be the 'clustered index'.
BTW: what do you think, DB2 does if you say PRIMARY KEY NONCLUSTERED ?
I think, we can assume, it will create a unique index.
Elke
SAP Labs Berlin
> -----Original Message-----
> From: Riesch Nicolas [mailto:[EMAIL PROTECTED]]
> Sent: Mittwoch, 27. M�rz 2002 10:50
> To: Zabach, Elke; Riesch Nicolas; [EMAIL PROTECTED]
> Subject: RE: physical order of rows on disk
>
>
> thank you for your answer.
>
> I work with Microsoft SQL Server, with Datacom DB on
> mainframe, and a little
> with DB2.
> In order to manage and store the data, all these database
> systems store the
> data by chunks called "pages", of equal size.
> SQL Server manage data by 8Kb pages, for instance.
> sap db does the same.
> The files or devices where data are to be stored are divided
> in pages, and
> each page is given a physical id, allowing the database
> system to write and
> read this page.
> Most database system number the pages in a file or device
> sequentially,
> starting from 1, 2, 3, 4, ...
>
> All the pages containing the rows of a table make a linked
> list, each page
> having a pointer to the next page and to the previous page.
> When new data are inserted, new pages are allocated here and
> there, more or
> less at random location.
> Whenever a new page is allocated, it is inserted in the
> linked list at the
> proper location.
> Suppose we have a table "employees" :
>
> create table employees
> ( emp_id int not null,
> name varchar(30) not null,
> birthdate date null
> )
>
> Suppose for now that there is a way to define a "cluster
> order", which is
> the order of the rows in the leaf pages of the B-tree of the
> "employees"
> table.
> Suppose we define this "cluster order" to be on the field "name".
>
> now, we insert the following records:
>
> 022 gerber
> 294 suter
> 438 rolfo
> 402 black
> 806 davidson
> 403 einstein
> 104 adams
> 968 holmes
> 302 weber
> 294 jackson
> 732 hofer
> 101 bowles
> 110 fitzgerald
>
> These records will be inserted in the linked list of pages
> with a particular
> order, called "physical order", or "clustered index" (for sql
> server), or
> "cluster index" (for db2), or "native key" (for datacom db).
> All these terms
> are equivalent. Here, let's use the term "cluster order", as
> it is the term
> I prefer.
>
> As we have defined the "cluster order" on field "name", the
> database system
> can store the rows like this:
>
> page id: 40023
> previous page: null
> next page: 7298
> 104 adams
> 402 black
> 101 bowles
>
> page id: 7298
> previous page: 40023
> next page: 1870
> 806 davidson
> 403 einstein
>
> page id: 1870
> previous page: 7298
> next page: 4513
> 110 fitzgerald
> 022 gerber
>
> page id: 4513
> previous page: 1870
> next page: 17322
> 732 hofer
> 968 holmes
> 294 jackson
>
> page id: 17322
> previous page: 4513
> next page: null
> 438 rolfo
> 294 suter
> 302 weber
>
> We see that in our example, the database system has allocated
> the physical
> pages whose physical id are:
> 40023 <--> 7298 <--> 1870 <--> 4513 <--> 17322
>
> Of course, those pages have been allocated at random
> locations. We see that
> because the physical ids of allocated pages are not sequential.
> When the database system needs to scan the table "employees",
> it retrieves
> the first page (here, the page with id 40023), and then
> follows the pointer
> to next page, in order to retrieve the following page. It
> continues this
> way, reading all linked pages until the last page (id 17322).
>
> As the pages are retrieved, the rows are thus also retrieved in the
> "clustered order", from "adams" to "weber".
>
> Why is the "cluster order" so important ?
>
> Well, it is of no use if you just want to retrieve just one
> precise record.
> If you want to retrieve the employee whose id is 968
> ("holmes") and there is
> an index on field "emp_id", the database system will just
> read the page no
> 4513.
> select * from employees where emp_id=968
> This statement can be done just with one physical read from disk. (We
> suppose that the index pages of higher level are already in
> memory buffer
> cache)
>
> But if you want to retrieve a range of data, for instance the
> employees
> whose name begins with "h":
> select * from employees where name like 'h*'
> The database system will read the first page containing an
> employee whose
> name begins with "h", for instance "holmes".
> But in this same page, the following records also contain
> employees whose
> name begins with "h", as the records are ordered in the pages
> by "name"
> alphabetical order.
> To retrieve the next matching employees, the database system
> just need to
> follow the page link and read the next page and so on.
> But each time a page is retrieved, it contains many matching
> employees,
> which is very good.
>
> Suppose now that the employees table is physically ordered on
> "emp_id".
> Then, the rows on leaf pages of the table are ordered on increasing
> "emp_id".
> If we have a secondary index on "name", the database system needs to
> retrieve one page for each matching row.
> And when a page containing the data of "holmes", for
> instance, is retrieved,
> it is extremely improbable (if the table is large) that it
> contains another
> employee whose name begins with "h".
> ( this example is not so good, but you get the idea)
>
> - So, we can say that retrieving a range of data is more IO
> costly if we use
> a secondary index.
> This is because each leaf page retrieve contains most of
> the time just one
> matching record.
>
> - On the opposite, it is very efficient if the data are ordered with a
> "cluster order" like the range we want to retrieve.
> This way, each leaf page retrieved contains many matching rows.
>
> That's why it is very important, when creating a table, to define the
> "cluster order".
> It is even the first thing you do, just after having defined
> the column.
>
> In all databases, there is a mean to define a "cluster order"
> when creating
> a table.
> And the "cluster order" is something independant or the "primary key".
>
> In sap db, it seems that the "cluster order" is defined by
> the "primary
> key", which seems quite strange to me.
>
> As I said in my previous message, in sap db, it seems that if
> I waqnt to
> define a "cluster order" for a table, I must define it as a
> "primary key".
> And then, when I want to define the "real logical primary key" on this
> table, I must declare it as a unique index.
> Doing this achieves the result I want, but it is not very pretty.
>
> That's why I wonder if sap db will include in its syntax a
> way of defining
> the "cluster order" independantly or the "primary key" definition.
> All other big db systems allow this feature, so ...
>
> Just to give an idea, here is the syntax for sql server, but db2 uses
> something approaching:
>
> create table employees
> ( emp_id int not null,
> name varchar(30) not null,
> birthdate date null,
> primary key nonclustered (emp_id)
> )
> create index clustered idx_employees on employees(name)
>
> Here, the rows are ordered on the leaf pages with the
> clustered index order
> (name).
> And independantly of this, here is a primary key that
> uniquely identifies
> the records on emp_id.
>
> So, my question is:
>
> In sapdb, when you define a "primary key", it is at the same time the
> cluster ordering ?
> You cannot define a "cluster ordering" different from the
> "primary key" ?
>
>
> ( as for the utility of defining an appropriate cluster
> index, it allows to
> reduce considerably the number of read/write on disk for
> range retrieval
> queries, if chosen properly.
> For instance, the cluster of employees table is on "name",
> as we often
> want to display list of employees by alphabetical order,
> knowing the first
> letters of the name.
> For payments records, we want the table to have a cluster order on (
> payment_year, payment_month), as we often make queries
> summing amount for
> payments made during one month or a range of month.
> and so on...
> )
>
>
> > -----Message d'origine-----
> > De: Zabach, Elke [SMTP:[EMAIL PROTECTED]]
> > Date: mardi, 26. mars 2002 17:46
> > �: 'Riesch Nicolas'; [EMAIL PROTECTED]
> > Objet: RE: physical order of rows on disk
> >
> > Riesch Nicolas wrote:
> >
> > > I have read the documentation many times, but I am note
> sure I have
> > > understood everything.
> > > Can you tell me if all the following affirmations are correct ?
> > >
> > > 1) the following two ways of creating a table are exactly
> the same ?:
> > >
> > > create table employees
> > > ( company_id int key not null,
> > > emp_id int key not null,
> > > name varchar(30) not null,
> > > birthdate date null
> > > )
> > >
> > > create table employees
> > > ( company_id int not null,
> > > emp_id int not null,
> > > name varchar(30) not null,
> > > birthdate date null,
> > > primary key (company_id, emp_id)
> > > )
> > >
> > > Is is correct to say that the PRIMARY KEY can be defined
> > > either with the
> > > "key" keyword at the column level, or with the "primary key
> > > (col1, col2,
> > > ...)" syntax ?
> >
> > For now it is the same, but in future, the first
> version will not
> > exist
> > any longer. It is non-standard --> forget the first version
> >
> > > 2) A table is physically stored as a B-tree on disk.
> > > The rows are ordered physically with the same order as the
> > > PRIMARY KEY, as
> > > defined during the creation of the table.
> > > The physical ordering of rows on disks is called
> "clustered index" or
> > > "native key" in other database systems.
> > > So, with sapdb, the clustered index is always the PRIMARY KEY ?
> >
> > A B*-tree is a logical point of view. For this you are right.
> > But the tree is not stored from left to right on
> consecutive pages on
> > disk.
> > If a new page is needed, it is used whereever it may
> be stored on
> > disk.
> > It is inserted in the B*-tree whereever needed and
> there it belongs
> > to the logical sight from left to right, i.e. from small
> > primkey-values
> > to higher ones.
> > If we would like to store one table in correct
> physically order each
> > insert
> > would need to push all rows behind it to the right...
> >
> > > 3) So, is it impossible to define a table with rows having a
> > > particular
> > > physical order, and a PRIMARY KEY defining another order ?
> > > Suppose I have this table:
> > > create table employees
> > > ( emp_id int not null,
> > > name varchar(30) not null,
> > > birthdate date null,
> > > primary key (emp_id)
> > > )
> > >
> > > I want to have the rows of this table physically ordered on
> > > disk by the
> > > field "name".
> > > But if I create an index on "name", the physical order of the
> > > rows is not
> > > changed. It just creates another B-tree for the index,
> > > pointing on the real
> > > rows of the table.
> > > So, the only way I see to achieve this is to do the following :
> > >
> > > create table employees
> > > ( emp_id int not null,
> > > name varchar(30) not null,
> > > birthdate date null,
> > > primary key (name, emp_id)
> > > )
> > > create unique index pk_employees on employees (emp_id)
> > >
> > > Then I insert those data:
> > >
> > > insert into employees
> > > values (20, 'smith',null)
> > > insert into employees
> > > values (10, 'watson',null)
> > > insert into employees
> > > values (30, 'holmes',null)
> > >
> > > Then I display the content of the table with "select * from
> > > employees".
> > > The result is:
> > >
> > > 30 holmes <null>
> > > 20 smith <null>
> > > 10 watson <null>
> > >
> > > Here, we see that the physical order is alphabetically by
> > > name, as I wanted
> > > it to be.
> > >
> > > And the secondary unique index on "emp_id" is acting as the
> > > real logical
> > > "primary key".
> > >
> > > Is is right to say that what is defined as "primary key" in
> > > sapdb is in fact
> > > the "clusted index" (or "physical index", or "native key") of
> > > the table,
> > > which defines the physical order of the rows on disk ?
> > >
> > > Is there no other way to force another index than the
> > > "primary key" to be
> > > the "physical index" ?
> >
> >
> > As I wrote, a physical sequence on disk can never be
> > assured.
> > But what do you expect if you store according to name?
> Why is it so
> > necessary for you?
> >
> > Elke
> > SAP Labs Berlin
>
>
> **********************************************************************
> This message is being transmitted to you without prior inspection
> by Geneva Insurance.
>
> The message engages only the person who authored it and in
> no way is binding on Geneva Insurance.
>
> The latter can under no circumstances be held responsible
> for the contents of this message, neither contractually nor in
> any other manner.
>
> If the case arises, any commitments made by the author of
> this message in the name of Geneva Insurance have no legal
> validity unless confirmed in a letter duly signed by persons
> authorized to represent the company.
>
> [EMAIL PROTECTED]
> **********************************************************************
>
_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general