[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
2015-12-12 22:50 GMT+01:00 Keith Medcalf :

> > > One other point:  The use of grave accents to quote column names is a
> > > mysql-ism.  SQLite also supports that for compatibility.  But you
> > > still shouldn't do it.  The proper SQL-standard way is double-quote.
>
> > ?That is funny: I did not use them at first (or double). But I am using
> 'DB
> > Browser for SQLite' and this shows those, so I thought that
> > ?I should use them.?
>
> Like the Windows Veneer of Long File Names and permitting embedded
> "special characters" in filenames, you only need to use quotes identifying
> field names if you are making the atrocious error of using restricted
> symbols or words as column names.  If you do not do that, then you do not
> need them.
>

?I can go back to not using them. :-D

-- 
Cecil Westerhof


[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
2015-12-12 22:44 GMT+01:00 Keith Medcalf :

>
> The first question(s) I would ask are:
>  Are all the fields case sensitive? (according to your definition they are)
>

?Do you mean the name or the contents?

?


>  Are any of them, other than the primary key, unique? (according to your
> definition they are not)
>

?Only the primary key is unique. Maybe I should make name unique to, but it
could be possible to have to projects with the same name in different
groups I think.

?


>  Other than the isPersonal column all of the columns permit a NULL entry.
> Is this your intent (it is what is written)
>

?When thinking more about it, that should only be groupID. I think the
others should always be filled.

?


>  One would presume (based on the English meanings of the column names)
> that the groupid groups multiple projects together.  How do you intend to
> insure the group name is correct since you are repeating it in every
> record.  Do you not think it may be more appropriate to have a table of
> groups, and have the projects.groupid be a foreign key into the groups
> table?
>

?I have a groups table also. But did not show it, because I did not think
it important. It is:
CREATE TABLE "groups" (
"groupID" TEXT PRIMARY KEY,
"name"TEXT
);

And probably I should make name unique.?


-- 
Cecil Westerhof


[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
2015-12-12 22:12 GMT+01:00 Mark Hamburg :

> Though to the extent that speed is proportional to data size, it would be
> good to use something other than hexadecimal to store UUIDs. Binary blobs
> would be the most compact, but ASCII85 encoding would work well if you need
> strings.
>
> Also, if these values are reused repeatedly as I suspect projectID and
> groupID might be, then it may be useful to intern them into a table and use
> integer keys. We got a noticeable performance improvement when I made that
> sort of change recently in our project. (I also implemented a
> string-to-integer-to-string cache that sits ahead of hitting the database.)
>

?That was something I was wondering about. I was saving this question for
when I got further ahead. But it does not hurt to have info about this. :-D


[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
2015-12-12 22:07 GMT+01:00 Darren Duncan :

> On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
>
>> By the way: I am thinking about using UUID for projectID and groupID,

>>> but I
>>>
 heard somewhere that it was a bad idea to use UUID for an indexed field.

>>> Is
>>>
 this true??

>>>
>>> I think you might have misunderstood.  UUID is almost always a good
>>> field to index.
>>>
>>
>> ?I was told because of the nature of random UUID (what I will be using) it
>> is hard to create a good index. The article said that data that is really
>> random cannot be indexed very efficient. But I do not have to worry about
>> it then. :-) It has been a few years back, so it is also possible that the
>> problem is solved nowadays.
>>
>
> Cecil, it isn't about randomness, it is about uniqueness or cardinality.
> The fields that index the best are ones with many different values, in
> particular key fields where every record has a different value from every
> other record. UUIDs have this quality in spades.  It is even more important
> to index such fields if you will either be searching/filtering on them or
> if they are the parent in a foreign key constraint.  This has always been
> the case, its not a new thing.
>

?That was what that (old) article said: because the data was completely
random it was hard to create a balanced tree for the index. I did find it a
little strange, but I am not an expert on creating balanced trees for an
index. But again: I am happy that it is not a point.

-- 
Cecil Westerhof


[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
2015-12-12 21:45 GMT+01:00 Richard Hipp :

> On 12/12/15, Cecil Westerhof  wrote:
> > I am playing with SQLite. I am thinking about writing an application for
> > projects. At the moment I have the following:
> >
> > CREATE TABLE `projects` (
> > `projectID` TEXTPRIMARY KEY,
> > `groupID`   TEXT,
> > `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)),
> > `name`  TEXT,
> > `description`   TEXT,
> > `outcome`   TEXT
> > );
> > CREATE INDEX projects_groupID_idx
> > ON projects(groupID);
> > CREATE INDEX projects_isPersonal_idx
> > ON projects(isPersonal);
> >
> > ?I like to differentiate between personal and non personal projects. Is
> it
> > a good idea to put a index on isPersonal?
>
> No, usually not.  An exception would be if the boolean is almost
> always true or almost always false.  Then using a partial index on
> (the infrequent value of) that boolean might make sense.
>

?I do not think that will be the case. It is even possible that one time it
is mostly true and another time mostly false. I will remove the index.



> > By the way: I am thinking about using UUID for projectID and groupID,
> but I
> > heard somewhere that it was a bad idea to use UUID for an indexed field.
> Is
> > this true??
> >
>
> I think you might have misunderstood.  UUID is almost always a good
> field to index.
>

?I was told because of the nature of random UUID (what I will be using) it
is hard to create a good index. The article said that data that is really
random cannot be indexed very efficient. But I do not have to worry about
it then. :-) It has been a few years back, so it is also possible that the
problem is solved nowadays.
?



> One other point:  The use of grave accents to quote column names is a
> mysql-ism.  SQLite also supports that for compatibility.  But you
> still shouldn't do it.  The proper SQL-standard way is double-quote.
>

?That is funny: I did not use them at first (or double). But I am using 'DB
Browser for SQLite' and this shows those, so I thought that ?

?I should use them.?

-- 
Cecil Westerhof


[sqlite] Putting an index on a boolean

2015-12-12 Thread Cecil Westerhof
I am playing with SQLite. I am thinking about writing an application for
projects. At the moment I have the following:

CREATE TABLE `projects` (
`projectID` TEXTPRIMARY KEY,
`groupID`   TEXT,
`isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)),
`name`  TEXT,
`description`   TEXT,
`outcome`   TEXT
);
CREATE INDEX projects_groupID_idx
ON projects(groupID);
CREATE INDEX projects_isPersonal_idx
ON projects(isPersonal);

?I like to differentiate between personal and non personal projects. Is it
a good idea to put a index on isPersonal?


By the way: I am thinking about using UUID for projectID and groupID, but I
heard somewhere that it was a bad idea to use UUID for an indexed field. Is
this true??

-- 
Cecil Westerhof


[sqlite] Putting an index on a boolean

2015-12-12 Thread Keith Medcalf
> > One other point:  The use of grave accents to quote column names is a
> > mysql-ism.  SQLite also supports that for compatibility.  But you
> > still shouldn't do it.  The proper SQL-standard way is double-quote.

> ?That is funny: I did not use them at first (or double). But I am using 'DB
> Browser for SQLite' and this shows those, so I thought that 
> ?I should use them.?

Like the Windows Veneer of Long File Names and permitting embedded "special 
characters" in filenames, you only need to use quotes identifying field names 
if you are making the atrocious error of using restricted symbols or words as 
column names.  If you do not do that, then you do not need them.






[sqlite] Putting an index on a boolean

2015-12-12 Thread Keith Medcalf

The first question(s) I would ask are:
 Are all the fields case sensitive? (according to your definition they are)
 Are any of them, other than the primary key, unique? (according to your 
definition they are not)
 Other than the isPersonal column all of the columns permit a NULL entry.  Is 
this your intent (it is what is written)
 One would presume (based on the English meanings of the column names) that the 
groupid groups multiple projects together.  How do you intend to insure the 
group name is correct since you are repeating it in every record.  Do you not 
think it may be more appropriate to have a table of groups, and have the 
projects.groupid be a foreign key into the groups table?

Using "big long inscrutable strings or binary blobs" for keys is relatively 
unimportant compared to answering the questions above.  You could also just use 
great big long strings of random data as keys (which is what a uuid is).  It 
will only serve to make it difficult for a human to debug your database.  If 
you are going to do that, then create a separate table of uuid to 
simple-integer-sequence keys.  That way the uuids are kept where they belong, 
completely outside of the need for human viewing.  Yet you can still use them 
as unique long strings to attach to simple abstract keys if you are so addicted.

(I have a set of big rusty pinking shears for use on people who use uuid's as 
keys.  It is almost impossible to "just look" at something that uses uuids as 
keys and be able to tell what you are looking at (though it is a little better 
on my 104" monitor, but it is still impossible to "see" relations.  Using 
simple small integers is far better, though YMMV if you have eidetic memory.) 

> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> bounces at mailinglists.sqlite.org] On Behalf Of Cecil Westerhof
> Sent: Saturday, 12 December, 2015 15:32
> To: SQLite mailing list
> Subject: [sqlite] Putting an index on a boolean
> 
> I am playing with SQLite. I am thinking about writing an application for
> projects. At the moment I have the following:
> 
> CREATE TABLE `projects` (
> `projectID` TEXTPRIMARY KEY,
> `groupID`   TEXT,
> `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)),
> `name`  TEXT,
> `description`   TEXT,
> `outcome`   TEXT
> );
> CREATE INDEX projects_groupID_idx
> ON projects(groupID);
> CREATE INDEX projects_isPersonal_idx
> ON projects(isPersonal);
> 
> ?I like to differentiate between personal and non personal projects. Is it
> a good idea to put a index on isPersonal?
> 
> 
> By the way: I am thinking about using UUID for projectID and groupID, but
> I
> heard somewhere that it was a bad idea to use UUID for an indexed field.
> Is
> this true??
> 
> --
> Cecil Westerhof
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users





[sqlite] Putting an index on a boolean

2015-12-12 Thread Richard Hipp
On 12/12/15, Cecil Westerhof  wrote:
> I am playing with SQLite. I am thinking about writing an application for
> projects. At the moment I have the following:
>
> CREATE TABLE `projects` (
> `projectID` TEXTPRIMARY KEY,
> `groupID`   TEXT,
> `isPersonal`INTEGER NOT NULL CHECK(isPersonal in (0, 1)),
> `name`  TEXT,
> `description`   TEXT,
> `outcome`   TEXT
> );
> CREATE INDEX projects_groupID_idx
> ON projects(groupID);
> CREATE INDEX projects_isPersonal_idx
> ON projects(isPersonal);
>
> ?I like to differentiate between personal and non personal projects. Is it
> a good idea to put a index on isPersonal?

No, usually not.  An exception would be if the boolean is almost
always true or almost always false.  Then using a partial index on
(the infrequent value of) that boolean might make sense.

   CREATE TABLE t1(
  id INTEGER PRIMARY KEY,
  isPersonal BOOLEAN -- usually false
   );
   CREATE INDEX t1x ON t1(isPersonal) WHERE isPersonal;

Then queries of the form:

   SELECT * FROM t1 WHERE isPersonal;

Would use the index, but the index will not take up much space.

>
>
> By the way: I am thinking about using UUID for projectID and groupID, but I
> heard somewhere that it was a bad idea to use UUID for an indexed field. Is
> this true??
>

I think you might have misunderstood.  UUID is almost always a good
field to index.

One other point:  The use of grave accents to quote column names is a
mysql-ism.  SQLite also supports that for compatibility.  But you
still shouldn't do it.  The proper SQL-standard way is double-quote.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Putting an index on a boolean

2015-12-12 Thread Darren Duncan
For my part, in a database I designed that used a SHA-256 hash for a unique 
identifier that was then a foreign key from many other tables, I stored that as 
an integer and not as a hex string.  If UUIDs are similarly numbers 
fundamentally, they possibly could do likewise.  I agree with Mark's comment re 
binary. -- Darren Duncan

On 2015-12-12 1:12 PM, Mark Hamburg wrote:
> Though to the extent that speed is proportional to data size, it would be 
> good to use something other than hexadecimal to store UUIDs. Binary blobs 
> would be the most compact, but ASCII85 encoding would work well if you need 
> strings.
>
> Also, if these values are reused repeatedly as I suspect projectID and 
> groupID might be, then it may be useful to intern them into a table and use 
> integer keys. We got a noticeable performance improvement when I made that 
> sort of change recently in our project. (I also implemented a 
> string-to-integer-to-string cache that sits ahead of hitting the database.)
>
> Mark
>
>> On Dec 12, 2015, at 1:07 PM, Darren Duncan  
>> wrote:
>>
>> On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
> By the way: I am thinking about using UUID for projectID and groupID,
 but I
> heard somewhere that it was a bad idea to use UUID for an indexed field.
 Is
> this true??

 I think you might have misunderstood.  UUID is almost always a good
 field to index.
>>>
>>> ?I was told because of the nature of random UUID (what I will be using) it
>>> is hard to create a good index. The article said that data that is really
>>> random cannot be indexed very efficient. But I do not have to worry about
>>> it then. :-) It has been a few years back, so it is also possible that the
>>> problem is solved nowadays.
>>
>> Cecil, it isn't about randomness, it is about uniqueness or cardinality.  
>> The fields that index the best are ones with many different values, in 
>> particular key fields where every record has a different value from every 
>> other record. UUIDs have this quality in spades.  It is even more important 
>> to index such fields if you will either be searching/filtering on them or if 
>> they are the parent in a foreign key constraint.  This has always been the 
>> case, its not a new thing. -- Darren Duncan



[sqlite] Putting an index on a boolean

2015-12-12 Thread Mark Hamburg
Though to the extent that speed is proportional to data size, it would be good 
to use something other than hexadecimal to store UUIDs. Binary blobs would be 
the most compact, but ASCII85 encoding would work well if you need strings.

Also, if these values are reused repeatedly as I suspect projectID and groupID 
might be, then it may be useful to intern them into a table and use integer 
keys. We got a noticeable performance improvement when I made that sort of 
change recently in our project. (I also implemented a 
string-to-integer-to-string cache that sits ahead of hitting the database.)

Mark

> On Dec 12, 2015, at 1:07 PM, Darren Duncan  wrote:
> 
> On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
 By the way: I am thinking about using UUID for projectID and groupID,
>>> but I
 heard somewhere that it was a bad idea to use UUID for an indexed field.
>>> Is
 this true??
>>> 
>>> I think you might have misunderstood.  UUID is almost always a good
>>> field to index.
>> 
>> ?I was told because of the nature of random UUID (what I will be using) it
>> is hard to create a good index. The article said that data that is really
>> random cannot be indexed very efficient. But I do not have to worry about
>> it then. :-) It has been a few years back, so it is also possible that the
>> problem is solved nowadays.
> 
> Cecil, it isn't about randomness, it is about uniqueness or cardinality.  The 
> fields that index the best are ones with many different values, in particular 
> key fields where every record has a different value from every other record. 
> UUIDs have this quality in spades.  It is even more important to index such 
> fields if you will either be searching/filtering on them or if they are the 
> parent in a foreign key constraint.  This has always been the case, its not a 
> new thing. -- Darren Duncan
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Putting an index on a boolean

2015-12-12 Thread Darren Duncan
On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
>>> By the way: I am thinking about using UUID for projectID and groupID,
>> but I
>>> heard somewhere that it was a bad idea to use UUID for an indexed field.
>> Is
>>> this true??
>>
>> I think you might have misunderstood.  UUID is almost always a good
>> field to index.
>
> ?I was told because of the nature of random UUID (what I will be using) it
> is hard to create a good index. The article said that data that is really
> random cannot be indexed very efficient. But I do not have to worry about
> it then. :-) It has been a few years back, so it is also possible that the
> problem is solved nowadays.

Cecil, it isn't about randomness, it is about uniqueness or cardinality.  The 
fields that index the best are ones with many different values, in particular 
key fields where every record has a different value from every other record. 
UUIDs have this quality in spades.  It is even more important to index such 
fields if you will either be searching/filtering on them or if they are the 
parent in a foreign key constraint.  This has always been the case, its not a 
new thing. -- Darren Duncan