Hi Stephen,
On 2013/07/27 13:22, Stephen Chrzanowski wrote:
Just out of curiosity, does SQLite keep a record of how large the content
of a blob value is? If I make a fictitious call like [ *select
length(BlobField) from A_Table where TheID=1* ], does SQLite actually count
each and every byte from start to finish, or does it reference an extra
bit/nibble/byte/word/integer/int64 internal record on that field?
SQLite stores a type identifier and a length specifier for every value - i.e. it records the length of variable-length types such as
text and blob. You can even determine, fully deterministically, exactly how many bytes a data field takes up from its type and
length specifiers and its data section. There is no byte-counting.
http://www.sqlite.org/different.html#flex
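For illustration, a minimal Python sketch of that exact query (the sqlite3 module is in the standard library; the 10 MB blob is made
up, and the table matches your example):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE A_Table (TheID INTEGER PRIMARY KEY, BlobField BLOB)")
    con.execute("INSERT INTO A_Table VALUES (1, ?)", (b"\x00" * 10_000_000,))

    # The length comes from the stored length specifier in the record
    # header; SQLite does not scan the blob's 10 million bytes to answer.
    (n,) = con.execute(
        "SELECT length(BlobField) FROM A_Table WHERE TheID = 1").fetchone()
    print(n)  # 10000000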
I know in some languages, depending on the declaration type of the
variable, getting the length of a string can be done by looking at the very
first set of bytes of the string instead of counting characters from start
to finish. For instance, back in the Delphi 7 days, I could use *
LengthOfString:=SubjectString[0];* or something to that effect. I can't
recall if that initial character was a byte, word, or integer (8-bit,
16-bit, 32-bit) but it never failed me. ;)
In the first Pascal environments a string had a maximum length of 255 characters (256 bytes if you include the 1-byte length
specifier). From a programming point of view this worked a lot better for keeping short strings on the stack than the C habit of the
day: either setting memory aside for every string and handling it via direct pointers (with an extra zero character denoting the end
of the string, so the length is not limited to 255), or using the more stack-compatible character arrays. However, this solved a
problem (memory usage, the speed of requesting memory from the OS, etc.) that was soon to be no problem at all, so that kind of
string was quickly relegated to the "ShortString" type in Delphi (by Delphi 1, in fact) and more conventional memory strings took
over, along with another brilliant invention called reference counting. Since then, all String types have really just been pointers
to byte sequences in memory ending in a #00, very much compatible with the C model (and others).
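To picture the two layouts side by side, a rough sketch in Python bytes (not real Pascal or C memory, just the idea):

    # Pascal-style: a 1-byte length prefix (hence the 255 limit).
    pascal_str = bytes([5]) + b"Hello"

    # C-style: bytes terminated by a #00, no stored length.
    c_str = b"Hello" + b"\x00"

    print(pascal_str[0])         # length is one lookup at index 0 -> 5
    print(c_str.index(b"\x00"))  # length means scanning to the terminator -> 5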
The fact that you could still do that in Delphi 7 is only due to backward compatibility maintained by the folks at Borland, and it
has no significance in D7 itself, or indeed in any of the newer RAD Studio incarnations by Embarcadero. Interestingly, this backward
compatibility did provide some advantage, in that a memory table was kept (as is still the case today) of strings and references to
their multiple instances, so getting the length of a string was simply a reference lookup instead of byte-counting until #00 - which
was the normal C method back then. You can still declare a ShortString (or AnsiString) in Delphi, but they are no more efficient
than a long one these days. Also, normal strings can now be almost 4GB long (32-bit counter), or 2GB for an MBCS or WideString
allowing UTF-8 and other formats - none of which worked (or works) with ShortStrings, which are ANSI-only.
I'm in a self-imposed situation where I'm kind of looking at the "best
method for the sake of speed" on how to most efficiently implement the
design of the table and the code to get information about the blob data.
Once the data is in the field, that record will never change; it will only
ever be deleted. So I'm pondering whether I should just use length(BlobField)
or (as some would say, "best practice") reference a numerical field that
stores the length of the chunk of data, assigned on the insert.
Length() should be as efficient as any lookup system you could implement.
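To make the two options concrete, a small sketch (the BlobLen column in the second variant is hypothetical):

    import sqlite3

    con = sqlite3.connect(":memory:")
    # Option 1: no extra column; ask SQLite, which already stores the length.
    con.execute("CREATE TABLE A_Table (TheID INTEGER PRIMARY KEY, BlobField BLOB)")
    con.execute("INSERT INTO A_Table VALUES (1, ?)", (b"x" * 1024,))
    print(con.execute(
        "SELECT length(BlobField) FROM A_Table WHERE TheID = 1").fetchone()[0])

    # Option 2: a redundant BlobLen column you must keep in sync on every
    # insert - extra bookkeeping for no speed win, per the point above.
    #   CREATE TABLE A_Table (TheID INTEGER PRIMARY KEY,
    #                         BlobField BLOB, BlobLen INTEGER)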
Considerations:
- This application will NOT be run on a limited OS, like Android or iOS.
- What I'm storing is in the 10s of megabytes per RECORD, so the size and
number of fields isn't exactly a serious concern.
- Writing the data to the table isn't a significant ordeal as far as speed
is concerned, but to validate that I'm not writing multiple copies of the same
file to the database, I'm looking at the file's time stamp (accurate to the
minute) and the physical size of the file (see the sketch after this list).
The application that writes what I'm archiving, first of all, isn't mine, and
second, saves data under 13 file names and overwrites them... hence my attempt
at archiving.
- I have absolutely NO idea how many rows this particular table is going to
contain eventually. I may be looking at maybe 30-ish, or 300-ish, or
3000-ish. (I'm getting a bit nauseous thinking about that 3000 mark and the
potential size of the database - not because of the capabilities of SQLite
(I have total faith it can handle the sheer size) but because of the capacity
of my SSD. ;))
Well, rest assured, we all use SQLite databases with several millions of records and many gigabytes of data. The size should be easy
to guess at: what is the average size of the files you need to archive, and how frequently do you need to archive them? If you can
get those two figures, you can calculate the rate of growth of the data and estimate its size after x amount of time. On a more
general note, if you EVER log or archive stuff, please do it to a second drive, or an external one, NOT to your main SSD :)
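For example, with illustrative figures (13 files as you describe, ~30 MB each, and a daily archive run - the daily rate is an
assumption):

    # Back-of-envelope growth estimate:
    avg_mb, files_per_day, days = 30, 13, 365
    print(avg_mb * files_per_day * days / 1024)  # ~139 GB per year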
Cheers!
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users