[sqlite] Putting an index on a boolean
2015-12-12 22:50 GMT+01:00 Keith Medcalf : > > > One other point: The use of grave accents to quote column names is a > > > mysql-ism. SQLite also supports that for compatibility. But you > > > still shouldn't do it. The proper SQL-standard way is double-quote. > > > ?That is funny: I did not use them at first (or double). But I am using > 'DB > > Browser for SQLite' and this shows those, so I thought that > > ?I should use them.? > > Like the Windows Veneer of Long File Names and permitting embedded > "special characters" in filenames, you only need to use quotes identifying > field names if you are making the atrocious error of using restricted > symbols or words as column names. If you do not do that, then you do not > need them. > ?I can go back to not using them. :-D -- Cecil Westerhof
[sqlite] Putting an index on a boolean
2015-12-12 22:44 GMT+01:00 Keith Medcalf : > > The first question(s) I would ask are: > Are all the fields case sensitive? (according to your definition they are) > ?Do you mean the name or the contents? ? > Are any of them, other than the primary key, unique? (according to your > definition they are not) > ?Only the primary key is unique. Maybe I should make name unique to, but it could be possible to have to projects with the same name in different groups I think. ? > Other than the isPersonal column all of the columns permit a NULL entry. > Is this your intent (it is what is written) > ?When thinking more about it, that should only be groupID. I think the others should always be filled. ? > One would presume (based on the English meanings of the column names) > that the groupid groups multiple projects together. How do you intend to > insure the group name is correct since you are repeating it in every > record. Do you not think it may be more appropriate to have a table of > groups, and have the projects.groupid be a foreign key into the groups > table? > ?I have a groups table also. But did not show it, because I did not think it important. It is: CREATE TABLE "groups" ( "groupID" TEXT PRIMARY KEY, "name"TEXT ); And probably I should make name unique.? -- Cecil Westerhof
[sqlite] Putting an index on a boolean
2015-12-12 22:12 GMT+01:00 Mark Hamburg : > Though to the extent that speed is proportional to data size, it would be > good to use something other than hexadecimal to store UUIDs. Binary blobs > would be the most compact, but ASCII85 encoding would work well if you need > strings. > > Also, if these values are reused repeatedly as I suspect projectID and > groupID might be, then it may be useful to intern them into a table and use > integer keys. We got a noticeable performance improvement when I made that > sort of change recently in our project. (I also implemented a > string-to-integer-to-string cache that sits ahead of hitting the database.) > ?That was something I was wondering about. I was saving this question for when I got further ahead. But it does not hurt to have info about this. :-D
[sqlite] Putting an index on a boolean
2015-12-12 22:07 GMT+01:00 Darren Duncan : > On 2015-12-12 12:56 PM, Cecil Westerhof wrote: > >> By the way: I am thinking about using UUID for projectID and groupID, >>> but I >>> heard somewhere that it was a bad idea to use UUID for an indexed field. >>> Is >>> this true?? >>> >>> I think you might have misunderstood. UUID is almost always a good >>> field to index. >>> >> >> ?I was told because of the nature of random UUID (what I will be using) it >> is hard to create a good index. The article said that data that is really >> random cannot be indexed very efficient. But I do not have to worry about >> it then. :-) It has been a few years back, so it is also possible that the >> problem is solved nowadays. >> > > Cecil, it isn't about randomness, it is about uniqueness or cardinality. > The fields that index the best are ones with many different values, in > particular key fields where every record has a different value from every > other record. UUIDs have this quality in spades. It is even more important > to index such fields if you will either be searching/filtering on them or > if they are the parent in a foreign key constraint. This has always been > the case, its not a new thing. > ?That was what that (old) article said: because the data was completely random it was hard to create a balanced tree for the index. I did find it a little strange, but I am not an expert on creating balanced trees for an index. But again: I am happy that it is not a point. -- Cecil Westerhof
[sqlite] Putting an index on a boolean
2015-12-12 21:45 GMT+01:00 Richard Hipp : > On 12/12/15, Cecil Westerhof wrote: > > I am playing with SQLite. I am thinking about writing an application for > > projects. At the moment I have the following: > > > > CREATE TABLE `projects` ( > > `projectID` TEXTPRIMARY KEY, > > `groupID` TEXT, > > `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)), > > `name` TEXT, > > `description` TEXT, > > `outcome` TEXT > > ); > > CREATE INDEX projects_groupID_idx > > ON projects(groupID); > > CREATE INDEX projects_isPersonal_idx > > ON projects(isPersonal); > > > > ?I like to differentiate between personal and non personal projects. Is > it > > a good idea to put a index on isPersonal? > > No, usually not. An exception would be if the boolean is almost > always true or almost always false. Then using a partial index on > (the infrequent value of) that boolean might make sense. > ?I do not think that will be the case. It is even possible that one time it is mostly true and another time mostly false. I will remove the index. > > By the way: I am thinking about using UUID for projectID and groupID, > but I > > heard somewhere that it was a bad idea to use UUID for an indexed field. > Is > > this true?? > > > > I think you might have misunderstood. UUID is almost always a good > field to index. > ?I was told because of the nature of random UUID (what I will be using) it is hard to create a good index. The article said that data that is really random cannot be indexed very efficient. But I do not have to worry about it then. :-) It has been a few years back, so it is also possible that the problem is solved nowadays. ? > One other point: The use of grave accents to quote column names is a > mysql-ism. SQLite also supports that for compatibility. But you > still shouldn't do it. The proper SQL-standard way is double-quote. > ?That is funny: I did not use them at first (or double). But I am using 'DB Browser for SQLite' and this shows those, so I thought that ? ?I should use them.? -- Cecil Westerhof
[sqlite] Putting an index on a boolean
I am playing with SQLite. I am thinking about writing an application for projects. At the moment I have the following: CREATE TABLE `projects` ( `projectID` TEXTPRIMARY KEY, `groupID` TEXT, `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)), `name` TEXT, `description` TEXT, `outcome` TEXT ); CREATE INDEX projects_groupID_idx ON projects(groupID); CREATE INDEX projects_isPersonal_idx ON projects(isPersonal); ?I like to differentiate between personal and non personal projects. Is it a good idea to put a index on isPersonal? By the way: I am thinking about using UUID for projectID and groupID, but I heard somewhere that it was a bad idea to use UUID for an indexed field. Is this true?? -- Cecil Westerhof
[sqlite] Putting an index on a boolean
> > One other point: The use of grave accents to quote column names is a > > mysql-ism. SQLite also supports that for compatibility. But you > > still shouldn't do it. The proper SQL-standard way is double-quote. > ?That is funny: I did not use them at first (or double). But I am using 'DB > Browser for SQLite' and this shows those, so I thought that > ?I should use them.? Like the Windows Veneer of Long File Names and permitting embedded "special characters" in filenames, you only need to use quotes identifying field names if you are making the atrocious error of using restricted symbols or words as column names. If you do not do that, then you do not need them.
[sqlite] Putting an index on a boolean
The first question(s) I would ask are: Are all the fields case sensitive? (according to your definition they are) Are any of them, other than the primary key, unique? (according to your definition they are not) Other than the isPersonal column all of the columns permit a NULL entry. Is this your intent (it is what is written) One would presume (based on the English meanings of the column names) that the groupid groups multiple projects together. How do you intend to insure the group name is correct since you are repeating it in every record. Do you not think it may be more appropriate to have a table of groups, and have the projects.groupid be a foreign key into the groups table? Using "big long inscrutable strings or binary blobs" for keys is relatively unimportant compared to answering the questions above. You could also just use great big long strings of random data as keys (which is what a uuid is). It will only serve to make it difficult for a human to debug your database. If you are going to do that, then create a separate table of uuid to simple-integer-sequence keys. That way the uuids are kept where they belong, completely outside of the need for human viewing. Yet you can still use them as unique long strings to attach to simple abstract keys if you are so addicted. (I have a set of big rusty pinking shears for use on people who use uuid's as keys. It is almost impossible to "just look" at something that uses uuids as keys and be able to tell what you are looking at (though it is a little better on my 104" monitor, but it is still impossible to "see" relations. Using simple small integers is far better, though YMMV if you have eidetic memory.) > -Original Message- > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- > bounces at mailinglists.sqlite.org] On Behalf Of Cecil Westerhof > Sent: Saturday, 12 December, 2015 15:32 > To: SQLite mailing list > Subject: [sqlite] Putting an index on a boolean > > I am playing with SQLite. I am thinking about writing an application for > projects. At the moment I have the following: > > CREATE TABLE `projects` ( > `projectID` TEXTPRIMARY KEY, > `groupID` TEXT, > `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)), > `name` TEXT, > `description` TEXT, > `outcome` TEXT > ); > CREATE INDEX projects_groupID_idx > ON projects(groupID); > CREATE INDEX projects_isPersonal_idx > ON projects(isPersonal); > > ?I like to differentiate between personal and non personal projects. Is it > a good idea to put a index on isPersonal? > > > By the way: I am thinking about using UUID for projectID and groupID, but > I > heard somewhere that it was a bad idea to use UUID for an indexed field. > Is > this true?? > > -- > Cecil Westerhof > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Putting an index on a boolean
On 12/12/15, Cecil Westerhof wrote: > I am playing with SQLite. I am thinking about writing an application for > projects. At the moment I have the following: > > CREATE TABLE `projects` ( > `projectID` TEXTPRIMARY KEY, > `groupID` TEXT, > `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)), > `name` TEXT, > `description` TEXT, > `outcome` TEXT > ); > CREATE INDEX projects_groupID_idx > ON projects(groupID); > CREATE INDEX projects_isPersonal_idx > ON projects(isPersonal); > > ?I like to differentiate between personal and non personal projects. Is it > a good idea to put a index on isPersonal? No, usually not. An exception would be if the boolean is almost always true or almost always false. Then using a partial index on (the infrequent value of) that boolean might make sense. CREATE TABLE t1( id INTEGER PRIMARY KEY, isPersonal BOOLEAN -- usually false ); CREATE INDEX t1x ON t1(isPersonal) WHERE isPersonal; Then queries of the form: SELECT * FROM t1 WHERE isPersonal; Would use the index, but the index will not take up much space. > > > By the way: I am thinking about using UUID for projectID and groupID, but I > heard somewhere that it was a bad idea to use UUID for an indexed field. Is > this true?? > I think you might have misunderstood. UUID is almost always a good field to index. One other point: The use of grave accents to quote column names is a mysql-ism. SQLite also supports that for compatibility. But you still shouldn't do it. The proper SQL-standard way is double-quote. -- D. Richard Hipp drh at sqlite.org
[sqlite] Putting an index on a boolean
For my part, in a database I designed that used a SHA-256 hash for a unique identifier that was then a foreign key from many other tables, I stored that as an integer and not as a hex string. If UUIDs are similarly numbers fundamentally, they possibly could do likewise. I agree with Mark's comment re binary. -- Darren Duncan On 2015-12-12 1:12 PM, Mark Hamburg wrote: > Though to the extent that speed is proportional to data size, it would be > good to use something other than hexadecimal to store UUIDs. Binary blobs > would be the most compact, but ASCII85 encoding would work well if you need > strings. > > Also, if these values are reused repeatedly as I suspect projectID and > groupID might be, then it may be useful to intern them into a table and use > integer keys. We got a noticeable performance improvement when I made that > sort of change recently in our project. (I also implemented a > string-to-integer-to-string cache that sits ahead of hitting the database.) > > Mark > >> On Dec 12, 2015, at 1:07 PM, Darren Duncan >> wrote: >> >> On 2015-12-12 12:56 PM, Cecil Westerhof wrote: > By the way: I am thinking about using UUID for projectID and groupID, but I > heard somewhere that it was a bad idea to use UUID for an indexed field. Is > this true?? I think you might have misunderstood. UUID is almost always a good field to index. >>> >>> ?I was told because of the nature of random UUID (what I will be using) it >>> is hard to create a good index. The article said that data that is really >>> random cannot be indexed very efficient. But I do not have to worry about >>> it then. :-) It has been a few years back, so it is also possible that the >>> problem is solved nowadays. >> >> Cecil, it isn't about randomness, it is about uniqueness or cardinality. >> The fields that index the best are ones with many different values, in >> particular key fields where every record has a different value from every >> other record. UUIDs have this quality in spades. It is even more important >> to index such fields if you will either be searching/filtering on them or if >> they are the parent in a foreign key constraint. This has always been the >> case, its not a new thing. -- Darren Duncan
[sqlite] Putting an index on a boolean
Though to the extent that speed is proportional to data size, it would be good to use something other than hexadecimal to store UUIDs. Binary blobs would be the most compact, but ASCII85 encoding would work well if you need strings. Also, if these values are reused repeatedly as I suspect projectID and groupID might be, then it may be useful to intern them into a table and use integer keys. We got a noticeable performance improvement when I made that sort of change recently in our project. (I also implemented a string-to-integer-to-string cache that sits ahead of hitting the database.) Mark > On Dec 12, 2015, at 1:07 PM, Darren Duncan wrote: > > On 2015-12-12 12:56 PM, Cecil Westerhof wrote: By the way: I am thinking about using UUID for projectID and groupID, >>> but I heard somewhere that it was a bad idea to use UUID for an indexed field. >>> Is this true?? >>> >>> I think you might have misunderstood. UUID is almost always a good >>> field to index. >> >> ?I was told because of the nature of random UUID (what I will be using) it >> is hard to create a good index. The article said that data that is really >> random cannot be indexed very efficient. But I do not have to worry about >> it then. :-) It has been a few years back, so it is also possible that the >> problem is solved nowadays. > > Cecil, it isn't about randomness, it is about uniqueness or cardinality. The > fields that index the best are ones with many different values, in particular > key fields where every record has a different value from every other record. > UUIDs have this quality in spades. It is even more important to index such > fields if you will either be searching/filtering on them or if they are the > parent in a foreign key constraint. This has always been the case, its not a > new thing. -- Darren Duncan > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Putting an index on a boolean
On 2015-12-12 12:56 PM, Cecil Westerhof wrote: >>> By the way: I am thinking about using UUID for projectID and groupID, >> but I >>> heard somewhere that it was a bad idea to use UUID for an indexed field. >> Is >>> this true?? >> >> I think you might have misunderstood. UUID is almost always a good >> field to index. > > ?I was told because of the nature of random UUID (what I will be using) it > is hard to create a good index. The article said that data that is really > random cannot be indexed very efficient. But I do not have to worry about > it then. :-) It has been a few years back, so it is also possible that the > problem is solved nowadays. Cecil, it isn't about randomness, it is about uniqueness or cardinality. The fields that index the best are ones with many different values, in particular key fields where every record has a different value from every other record. UUIDs have this quality in spades. It is even more important to index such fields if you will either be searching/filtering on them or if they are the parent in a foreign key constraint. This has always been the case, its not a new thing. -- Darren Duncan