On 2017/09/03 4:16 PM, Joseph L. Casale wrote:
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of R Smith
Sent: Sunday, September 3, 2017 7:51 AM
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] SQLITE bug

Lastly, a comment I've made possibly more than once on this list: There
is no imperative to trust the SQL engine with ID assignments. You are
free to (and I prefer to) assign IDs yourself.
What exactly do you feel you benefit by taking ownership of the ID, specifically
that of which you feel supersedes the obvious perils in the cases you noted?

Several things, I'll restricted this to the top three because TLDR.

First and foremost, precise control. I can decide on a per-system basis which IDs will be available. Now the use-case in SQLite terms is obvious since every SQLite database is a unique system unto itself (no central server) and I can control which IDs (or ID blocks) will be used. The use case for data restoration or synchronization between keyed DBs is obvious too, so I'll concentrate on the less obvious things. For one, I can rest assured no tampering will break the systems when the PKs are controlled (like some naughty software/person/bug adjusting the sqlite3_sequence table or just adding a row that has an ID near the 64bit limit).

For a more non-SQLite reason, I can control distributed systems specifying which ID-blocks are assigned by which system as a way to control system-group-wide unique key IDs without resorting to GUIDs and the like[1].  I can also check for non-DB based corruption (the DB mechanism is fine but the data isn't working like it should because of bugs, tampering or other faults) based on violation of the predefined Key assignment ranges - at least as one of the checks.


The next item pertains to all SQL DBs. Lower linking complexity - this is the most important, but a bit hard to explain. If you have multiple linked tables (as one often does) then I can control the linked IDs between them. A small example from a typical use case we have: We store Addresses in a table (A single company might have more than one site and address), and to every address is a linked row in another table with GPS information (Lat, Long, google maps api links, etc.) and yet another table holds cached maps image blobs and the like - so sometimes an ID is used for an address which do not have any cached maps, but the next address ID added will add the same ID for the GPS table and the Maps table (often skipping a PK ID or two to achieve this). Also, later I can add cached maps for any skipped item by simply inserting with its same ID.

The first thing to note about the above is that I can simply say in pseudo code:
  i = calc_new_id();
  INSERT INTO Addresses(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);
  INSERT INTO Maps(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);
  INSERT INTO GPS(ID, V1, V2, ...) VALUES (:i, :P1, :P2, ....);

whereas in a AUTOINCREMENT based conventional approach might end up something like:
  INSERT INTO Address(ID, V1, V2, ...) VALUES (NULL, :P1, :P2, ....);
  i = getLastInsertID();
  INSERT INTO Maps(ID, KeyToAddress, V1, V2 ...) VALUES (NULL, :i, :P1, :P2, ....);
  m = getLastInsertID();
  INSERT INTO GPS(ID, KeyToAddress, KeyToMap, V1, V2 ...) VALUES (NULL, :i, :m, :P1, :P2, ....);

with some possible variation depending on your needs.

Debugging systems like the first example above is much easier in human terms since, when manually cross-checking, I don't have to remember 3 different IDs... The Address at ID 177 has Map data at ID 177 in the Maps table and GPS data at ID 177 in the GPS table... you see the pattern easily. It also obviates the need for an additional FK column in the subsequent tables to "map" to the parent address, since the PK in itself /IS/ the FK to the parent. (I might still add FKs to gain cascading functionality or where the relation is one-to-many).

You control EVERY other piece of data you push into the DB, why not the Key too? Or put differently, why would you rather do in principle:
  INSERT Stuff;
  Get the Key for it;
  Use the key to insert more stuff or use elsewhere if needed; [2]

Than:
  Make the Key;
  Use the key to insert stuff or use elsewhere if needed;

I know the "Make the Key" step might be a little bit of effort for quick little DBs, so I too use the auto-increment for them, but for everything substantial, the amount of coding to do this pales in comparison with the amount of code you write to do normal system checking and testing, AND, using your own specified keys can often save some code on the checking side.

[Let me admit here that in many non-intrinsic applications my "calc_new_id();" function often resolves to simply "SELECT MAX(ID)+1 FROM..." and a validation check, often no more than a 2-line function, but in other systems it may get as complex as contacting a central server to gain/verify the next pool of insert IDs for the local system to use.]


Lastly - One small gripe I have with auto-increment is that it lulls system programmers into wanting to make INT PK's for everything, even stuff where that is clearly the wrong/unneeded approach. An Order-entry system should have the full Order-Number as the PK, not an INT.  I often see people making a table like this and just out of habit throwing an INT PRIMARY KEY AUTOINCREMENT in there with the next column being the: OrderNo TEXT COLLATE NOCASE UNIQUE - I ask you: if it walks like a PK, and talks like a PK... isn't it the real PK?.

There is nothing inherently relational about integers, it's a computer sourced convention used more for its ordinal properties than relational properties. Nobody talks about person 44's children.. It's John's children.   Of course it has merit on the point of storing the referring INT in many other tables being more efficient than including the entire TEXT value, and in some systems the lookup is faster (in MSSQL definitely, but I have not tested it in SQLite). This is however an ever-dwindling advantage.

Oh yes, and another reason: My OCD.  :)

Now just to be clear: I am not advocating the banning of AUTOINCREMENT, just trying to point out that it's a hammer graciously provided by DB engines, but no rule requires you to use it, and some jobs are not nails - especially if you have an expectation of what the ID should be (like the OP did).


Cheers,
Ryan


[1] I do actually use and suggest to use correctly calculated GUIDs for system-wide Unique references which avoids ID blocks or pools, but, a GUID is the most human-unfriendly ID possible and takes more space than an INT, so I only resort to it when the scale of the system justifies, but that's another debate and you can find discussion threads on it in this very list.

[2] If you do not care at all what the Key is (non-linked tables where you won't use the Key for anything else), then Auto-Increment is just dandy, of course.


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to