Re: [sqlite] Index size in file
Kees Nuyt wrote: On Wed, 3 Oct 2007 15:39:06 +0200, you wrote: I created an index on a TEXT column as I want to be able to I noticed a large increase in the file size. Looking at the binary of the file, I see that the index has a copy of all the data being indexed. 1. Is this necassary? 2. Is there a way to keep the index only in memory and not in the file. Clive A long time ago I saw index systems (don't remember, perhaps a mainframe with indexed sequential files), where the B-Tree used simple 'key compression'. Some encoding scheme which replaces the key field by a structure (repeat_length, keypart). Best shown in an example (my keys are 'frank', 'franklin', 'fred', 'google', 'gopher'): - store the full key 'frank' of the first entry in the page as (0,frank) - store 'franklin' as (5,lin), meaning: take the first five characters of the previous key and concatenate the rest. - store 'fred' as (2,ed) - store 'google' as (0,google) - store 'gopher' as (2,pher) This works nicely for large indexes with long keys and a lot of repetition. Of course the effort to handle insertions and deletions is significant. A simpler algorithm is "prefix b-trees" where only the significant part of the keys is held in the interior and leaf nodes. It can reduce the size of the index. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
Let's assume that my whole database can be in the cache. If my indexes have duplicate data, then I will either need a bigger cache or have to page out row data in favour of index data. In that case it will either be slower or require more memory to keep duplicate data for the indexes as opposed to referencing the original data. Clive John Stanton <[EMAIL PROTECTED]> on 05/10/2007 00:54:21 Please respond to sqlite-users@sqlite.org To: sqlite-users@sqlite.org cc:(bcc: clive/Emultek) Subject: Re: [sqlite] Index size in file Trevor Talbot wrote: > On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote: > >>A B-Tree index holds keys in sorted sequence. They are in random >>sequence in the database. That requires holding the keys in the B-Tree >>nodes. > > > Actually, it doesn't strictly require that; it could store references > to the keys. An obvious tradeoff is I/O; an index walk is less useful > if you have to do random seeks to the locations of row data just to > get the keys to walk the tree in the first place. IOW in simplistic > terms, an index walk suddenly doubles in disk I/O. > > The information on SQL Server would be interesting, as I know it > stores sort keys under some conditions, which is effectively duplicate > data. > One would need to be a paleontologist to measure the performance of an ordered index with indirect key references. - To unsubscribe, send email to [EMAIL PROTECTED] - This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
On Wed, 3 Oct 2007 15:39:06 +0200, you wrote: > > > >I created an index on a TEXT column as I want to be able to >I noticed a large increase in the file size. >Looking at the binary of the file, I see that the index has a copy of all the >data being indexed. >1. Is this necassary? >2. Is there a way to keep the index only in memory and not in the file. > >Clive A long time ago I saw index systems (don't remember, perhaps a mainframe with indexed sequential files), where the B-Tree used simple 'key compression'. Some encoding scheme which replaces the key field by a structure (repeat_length, keypart). Best shown in an example (my keys are 'frank', 'franklin', 'fred', 'google', 'gopher'): - store the full key 'frank' of the first entry in the page as (0,frank) - store 'franklin' as (5,lin), meaning: take the first five characters of the previous key and concatenate the rest. - store 'fred' as (2,ed) - store 'google' as (0,google) - store 'gopher' as (2,pher) This works nicely for large indexes with long keys and a lot of repetition. Of course the effort to handle insertions and deletions is significant. -- ( Kees Nuyt ) c[_] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
Trevor Talbot wrote: On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote: A B-Tree index holds keys in sorted sequence. They are in random sequence in the database. That requires holding the keys in the B-Tree nodes. Actually, it doesn't strictly require that; it could store references to the keys. An obvious tradeoff is I/O; an index walk is less useful if you have to do random seeks to the locations of row data just to get the keys to walk the tree in the first place. IOW in simplistic terms, an index walk suddenly doubles in disk I/O. The information on SQL Server would be interesting, as I know it stores sort keys under some conditions, which is effectively duplicate data. One would need to be a paleontologist to measure the performance of an ordered index with indirect key references. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
[EMAIL PROTECTED] wrote: > Regarding: > >>> Looking at the binary of the file, I see that the index has a copy of >>> all the data being indexed. >>> 1. Is this necassary? >>> > > Unless you're programming for a cellphone or some other embedded gadget, > you might want to calculate the cost (on the margin) of the disk storage > for the estimated amount of duplicated data. > > You might find it's vastly less than the cost of your time to think > about it. YMMV. > I'm as other developpong for smart/pocket/- devices but sqlite is very rich for this platform - too rich :) mak - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] Index size in file
Actually yes, I am programming for a cellphone and you are right, that is the only reason I am thinking about it! Clive "Griggs, Donald" <[EMAIL PROTECTED]> on 04/10/2007 21:23:17 Please respond to sqlite-users@sqlite.org To: sqlite-users@sqlite.org cc:(bcc: clive/Emultek) Subject: RE: [sqlite] Index size in file Regarding: >>Looking at the binary of the file, I see that the index has a copy of >>all the data being indexed. >>1. Is this necassary? Unless you're programming for a cellphone or some other embedded gadget, you might want to calculate the cost (on the margin) of the disk storage for the estimated amount of duplicated data. You might find it's vastly less than the cost of your time to think about it. YMMV. This message has been scanned for viruses by MailControl - www.mailcontrol.com - To unsubscribe, send email to [EMAIL PROTECTED] - This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] Index size in file
Regarding: >>Looking at the binary of the file, I see that the index has a copy of >>all the data being indexed. >>1. Is this necassary? Unless you're programming for a cellphone or some other embedded gadget, you might want to calculate the cost (on the margin) of the disk storage for the estimated amount of duplicated data. You might find it's vastly less than the cost of your time to think about it. YMMV. This message has been scanned for viruses by MailControl - www.mailcontrol.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
On 10/4/07, John Stanton <[EMAIL PROTECTED]> wrote: > A B-Tree index holds keys in sorted sequence. They are in random > sequence in the database. That requires holding the keys in the B-Tree > nodes. Actually, it doesn't strictly require that; it could store references to the keys. An obvious tradeoff is I/O; an index walk is less useful if you have to do random seeks to the locations of row data just to get the keys to walk the tree in the first place. IOW in simplistic terms, an index walk suddenly doubles in disk I/O. The information on SQL Server would be interesting, as I know it stores sort keys under some conditions, which is effectively duplicate data. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
>From what I read SQL Server has 2 basic types of index: 1. Clustered, that holds the single instance of the data itself (actaully the whole row) 2. Non-clustered that hold a pointer to the single instance of the data, but not the data itself. Clive John Stanton <[EMAIL PROTECTED]> on 04/10/2007 20:02:16 Please respond to sqlite-users@sqlite.org To: sqlite-users@sqlite.org cc:(bcc: clive/Emultek) Subject: Re: [sqlite] Index size in file A B-Tree index holds keys in sorted sequence. They are in random sequence in the database. That requires holding the keys in the B-Tree nodes. A hashed type access does not have that storage overhead, but it does not deliver the rows in sorted sequence. [EMAIL PROTECTED] wrote: > > > I am not an expert on indexes, however it does seem strange to me that a > database should keep duplicate data in it. > This prompted me to look up how indexes are stored in other databases. To tell > the truth I only looked at one, and that is SQL Server. > They do not store any duplicate data. If you would like the reference I can give > it to you. > I agree with you that it means an extra lookup that could make things slower, ( > I say could because you use more space in the cache which could result in more > reads) > Anyway given that that is the way it is implemented, does anyone know if it is > possible to create an index in memory? > > Clive > > > > > > John Stanton <[EMAIL PROTECTED]> on 03/10/2007 17:36:58 > > Please respond to sqlite-users@sqlite.org > > To: sqlite-users@sqlite.org > cc:(bcc: clive/Emultek) > > Subject: Re: [sqlite] Index size in file > > > > An index which does not hold keys is not an index. If you don't want to > allocate space for indexing then you put up with slow performance and > use row searches. > > [EMAIL PROTECTED] wrote: > >> >>I created an index on a TEXT column as I want to be able to >>I noticed a large increase in the file size. >>Looking at the binary of the file, I see that the index has a copy of all the >>data being indexed. >>1. Is this necassary? >>2. Is there a way to keep the index only in memory and not in the file. >> >>Clive >> >> >> >>- >>To unsubscribe, send email to [EMAIL PROTECTED] >>- >> > > > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - > > > > > > > > > This footnote confirms that this email message has been scanned by > PineApp Mail-SeCure for the presence of malicious code, vandals & computer > viruses. > > > > > > > > > > > > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - > - To unsubscribe, send email to [EMAIL PROTECTED] - This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
A B-Tree index holds keys in sorted sequence. They are in random sequence in the database. That requires holding the keys in the B-Tree nodes. A hashed type access does not have that storage overhead, but it does not deliver the rows in sorted sequence. [EMAIL PROTECTED] wrote: I am not an expert on indexes, however it does seem strange to me that a database should keep duplicate data in it. This prompted me to look up how indexes are stored in other databases. To tell the truth I only looked at one, and that is SQL Server. They do not store any duplicate data. If you would like the reference I can give it to you. I agree with you that it means an extra lookup that could make things slower, ( I say could because you use more space in the cache which could result in more reads) Anyway given that that is the way it is implemented, does anyone know if it is possible to create an index in memory? Clive John Stanton <[EMAIL PROTECTED]> on 03/10/2007 17:36:58 Please respond to sqlite-users@sqlite.org To: sqlite-users@sqlite.org cc:(bcc: clive/Emultek) Subject: Re: [sqlite] Index size in file An index which does not hold keys is not an index. If you don't want to allocate space for indexing then you put up with slow performance and use row searches. [EMAIL PROTECTED] wrote: I created an index on a TEXT column as I want to be able to I noticed a large increase in the file size. Looking at the binary of the file, I see that the index has a copy of all the data being indexed. 1. Is this necassary? 2. Is there a way to keep the index only in memory and not in the file. Clive - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
I am not an expert on indexes, however it does seem strange to me that a database should keep duplicate data in it. This prompted me to look up how indexes are stored in other databases. To tell the truth I only looked at one, and that is SQL Server. They do not store any duplicate data. If you would like the reference I can give it to you. I agree with you that it means an extra lookup that could make things slower, ( I say could because you use more space in the cache which could result in more reads) Anyway given that that is the way it is implemented, does anyone know if it is possible to create an index in memory? Clive John Stanton <[EMAIL PROTECTED]> on 03/10/2007 17:36:58 Please respond to sqlite-users@sqlite.org To: sqlite-users@sqlite.org cc:(bcc: clive/Emultek) Subject: Re: [sqlite] Index size in file An index which does not hold keys is not an index. If you don't want to allocate space for indexing then you put up with slow performance and use row searches. [EMAIL PROTECTED] wrote: > > > I created an index on a TEXT column as I want to be able to > I noticed a large increase in the file size. > Looking at the binary of the file, I see that the index has a copy of all the > data being indexed. > 1. Is this necassary? > 2. Is there a way to keep the index only in memory and not in the file. > > Clive > > > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - > - To unsubscribe, send email to [EMAIL PROTECTED] - This footnote confirms that this email message has been scanned by PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Index size in file
An index which does not hold keys is not an index. If you don't want to allocate space for indexing then you put up with slow performance and use row searches. [EMAIL PROTECTED] wrote: I created an index on a TEXT column as I want to be able to I noticed a large increase in the file size. Looking at the binary of the file, I see that the index has a copy of all the data being indexed. 1. Is this necassary? 2. Is there a way to keep the index only in memory and not in the file. Clive - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -