On 2017/09/03 4:16 PM, Joseph L. Casale wrote:
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of R Smith
Sent: Sunday, September 3, 2017 7:51 AM
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] SQLITE bug
Lastly, a comment I've made possibly more than once on this list: There
is no imperative to trust the SQL engine with ID assignments. You are
free to (and I prefer to) assign IDs yourself.
What exactly do you feel you benefit by taking ownership of the ID, specifically
that of which you feel supersedes the obvious perils in the cases you noted?
Several things, I'll restricted this to the top three because TLDR.
First and foremost, precise control. I can decide on a per-system basis
which IDs will be available. Now the use-case in SQLite terms is obvious
since every SQLite database is a unique system unto itself (no central
server) and I can control which IDs (or ID blocks) will be used. The use
case for data restoration or synchronization between keyed DBs is
obvious too, so I'll concentrate on the less obvious things. For one, I
can rest assured no tampering will break the systems when the PKs are
controlled (like some naughty software/person/bug adjusting the
sqlite3_sequence table or just adding a row that has an ID near the
64bit limit).
For a more non-SQLite reason, I can control distributed systems
specifying which ID-blocks are assigned by which system as a way to
control system-group-wide unique key IDs without resorting to GUIDs and
the like[1]. I can also check for non-DB based corruption (the DB
mechanism is fine but the data isn't working like it should because of
bugs, tampering or other faults) based on violation of the predefined
Key assignment ranges - at least as one of the checks.
The next item pertains to all SQL DBs. Lower linking complexity - this
is the most important, but a bit hard to explain. If you have multiple
linked tables (as one often does) then I can control the linked IDs
between them. A small example from a typical use case we have: We store
Addresses in a table (A single company might have more than one site and
address), and to every address is a linked row in another table with GPS
information (Lat, Long, google maps api links, etc.) and yet another
table holds cached maps image blobs and the like - so sometimes an ID is
used for an address which do not have any cached maps, but the next
address ID added will add the same ID for the GPS table and the Maps
table (often skipping a PK ID or two to achieve this). Also, later I can
add cached maps for any skipped item by simply inserting with its same ID.
The first thing to note about the above is that I can simply say in
pseudo code:
i = calc_new_id();
INSERT INTO Addresses(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);
INSERT INTO Maps(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);
INSERT INTO GPS(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);
whereas in a AUTOINCREMENT based conventional approach might end up
something like:
INSERT INTO Address(ID, V1, V2, ...) VALUES (NULL, :P1, :P2, ....);
i = getLastInsertID();
INSERT INTO Maps(ID, KeyToAddress, V1, V2 ...) VALUES (NULL, :i, :P1,
:P2, ....);
m = getLastInsertID();
INSERT INTO GPS(ID, KeyToAddress, KeyToMap, V1, V2 ...) VALUES (NULL,
:i, :m, :P1, :P2, ....);
with some possible variation depending on your needs.
Debugging systems like the first example above is much easier in human
terms since, when manually cross-checking, I don't have to remember 3
different IDs... The Address at ID 177 has Map data at ID 177 in the
Maps table and GPS data at ID 177 in the GPS table... you see the
pattern easily. It also obviates the need for an additional FK column in
the subsequent tables to "map" to the parent address, since the PK in
itself /IS/ the FK to the parent. (I might still add FKs to gain
cascading functionality or where the relation is one-to-many).
You control EVERY other piece of data you push into the DB, why not the
Key too? Or put differently, why would you rather do in principle:
INSERT Stuff;
Get the Key for it;
Use the key to insert more stuff or use elsewhere if needed; [2]
Than:
Make the Key;
Use the key to insert stuff or use elsewhere if needed;
I know the "Make the Key" step might be a little bit of effort for quick
little DBs, so I too use the auto-increment for them, but for everything
substantial, the amount of coding to do this pales in comparison with
the amount of code you write to do normal system checking and testing,
AND, using your own specified keys can often save some code on the
checking side.
[Let me admit here that in many non-intrinsic applications my
"calc_new_id();" function often resolves to simply "SELECT MAX(ID)+1
FROM..." and a validation check, often no more than a 2-line function,
but in other systems it may get as complex as contacting a central
server to gain/verify the next pool of insert IDs for the local system
to use.]
Lastly - One small gripe I have with auto-increment is that it lulls
system programmers into wanting to make INT PK's for everything, even
stuff where that is clearly the wrong/unneeded approach. An Order-entry
system should have the full Order-Number as the PK, not an INT. I often
see people making a table like this and just out of habit throwing an
INT PRIMARY KEY AUTOINCREMENT in there with the next column being the:
OrderNo TEXT COLLATE NOCASE UNIQUE - I ask you: if it walks like a PK,
and talks like a PK... isn't it the real PK?.
There is nothing inherently relational about integers, it's a computer
sourced convention used more for its ordinal properties than relational
properties. Nobody talks about person 44's children.. It's John's
children. Of course it has merit on the point of storing the referring
INT in many other tables being more efficient than including the entire
TEXT value, and in some systems the lookup is faster (in MSSQL
definitely, but I have not tested it in SQLite). This is however an
ever-dwindling advantage.
Oh yes, and another reason: My OCD. :)
Now just to be clear: I am not advocating the banning of AUTOINCREMENT,
just trying to point out that it's a hammer graciously provided by DB
engines, but no rule requires you to use it, and some jobs are not nails
- especially if you have an expectation of what the ID should be (like
the OP did).
Cheers,
Ryan
[1] I do actually use and suggest to use correctly calculated GUIDs for
system-wide Unique references which avoids ID blocks or pools, but, a
GUID is the most human-unfriendly ID possible and takes more space than
an INT, so I only resort to it when the scale of the system justifies,
but that's another debate and you can find discussion threads on it in
this very list.
[2] If you do not care at all what the Key is (non-linked tables where
you won't use the Key for anything else), then Auto-Increment is just
dandy, of course.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users