Re: [sqlite] FTS index size
> So, indeed, there's room for improvement! Thanks for looking into it, I'm interested in your results. That said, I still believe that the best way would be to make the current FTS implementation independent of the VIRTUAL TABLE structure, but be able to accept more general structures, namely ordinary SQLite tables. The reasons were mentioned in my last e-mail, one of them was that if I have a field in FTS table, it can't be indexed (using ordinary SQLite index, not only FTS), which is sometimes desirable (see Path field in my Songs table example). Jiri ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS index size
> Again, you've given a relatively broad description of what you're > trying to do. I could make up a bunch of stuff and answer my own > question, but you'd probably rather than I considered the problem > _you_ are having. Ok, I'll try to be as specific as possible. The main table I have is (the real version has much more fields, but it isn't important for our example): CREATE TABLE Songs ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Artist TEXT COLLATE IUNICODE, Album TEXT COLLATE IUNICODE, SongTitle TEXT COLLATE IUNICODE, Path TEXT COLLATE IUNICODE, Year INTEGER, Bitrate INTEGER) This table can have even >100k records, even close to million and is mostly accessed by SELECTing all fields of some records, i.e.: SELECT * FROM Songs WHERE {something} In order to use FTS3, I could take all the text fields from Songs table and move them to a FTS3 table: CREATE TABLE SongsBase ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Year INTEGER, Bitrate INTEGER) CREATE VIRTUAL TABLE SongsText USING FTS3(TOKENIZE mm, Artist, Album, SongTitle, Path) This way I would lose my custom collation (IUNICODE), which would be quite a problem, particularly for Path field (and if you're asking, yes, I'd like to include Path in the full-text index). Another problem is that joined SELECT on SongsBase and SongsText is slower than SELECT on the original Songs table. So, the only solution using FTS3 seems to be to use the original Songs table and add SongsText table, automatically updated by triggers like: CREATE TRIGGER update_songs UPDATE OF Artist,Album,SongTitle,Path ON Songs BEGIN UPDATE SongsText SET Artist=new.Artist, Album=new.Album, SongTitle=new.Title, Path=new.Path WHERE rowid=new.id; END; This solution probably isn't bad, but according to my knowledge of FTS3, it unnecessarily occupies some DB space (all text fields are actually stored twice, once in Songs and once in SongsText). Any ideas or recommedations? Thanks, Jiri ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS index size
> It's hard to say - you might want to describe your tables and the join > by simply including the SQL itself, rather than a loose description of > it. It may be that there's a simple change which will clear things > up. Ok, what exactly I mean is a table containing tracks in an audio library, i.e.: Table Tracks: ID INT Title TEXT Artist TEXT Album TEXT Length INT Bitrate INT ... (and much more) The table can be split in two parts, where TEXT fields would be indexed by FTS, but I'd like to avoid it, due to several reasons, particularly because of performance. > My question to SQLite experts is whether there is any solution to this > choice between speed and DB size. Preferrably I'd need to specify that just > some fields of an ordinary table participate in FTS3 index, but that > obviously isn't possible, at least not yet. Are there any plans for this? > No plans at this time. The goal of the current functionality > breakdown was to keep SQLite core from having to understand too much > about FTS, and FTS from having to understand too much SQL. >From my point of view, it would be best to separate FTS index data from the actual text data (I think that the idea was already mentioned in another thread here). It would have several advantages and interesting points. That said, I haven't studied the FTS implementation and don't have any idea whether this is technically possible with reasonable effort. Btw, whan I mean could be implemented like this: CREATE VIRTUAL INDEX ind USING fts4(TOKENIZE simple) ON Tracks(title, artist, album); Jiri ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS index size
Hello, from what I have read about FTS3, it stores the original data as well as the index needed for fast full-text access. Therefore, in several posts here it was recommended to use two tables joined one-to-one in case it's needed to store both text data and some other, possibly numeric data. Let's use an example: Table1: ID Int Value1 Int Value2 Int Table2 (FTS3): Text1 Text2 However, according to my tests, the join needed in case I want to get all data in tables Table1 and Table2 takes some time, it looks like it's almost twice as slow as having all the data in one table only. So, another option is to 'denormalize' the schema above to: Table12: ID Int Value1 Int Value2 Int Text1 Text Text2 Text but then I need another FTS3 table and so all values in Text1 and Text2 fields are actually duplicated in DB, right? My question to SQLite experts is whether there is any solution to this choice between speed and DB size. Preferrably I'd need to specify that just some fields of an ordinary table participate in FTS3 index, but that obviously isn't possible, at least not yet. Are there any plans for this? Thanks, Jiri ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Assertion failed - checkActiveVdbeCnt()
Hi, one of our users have recently reported a failed assertion in checkActiveVdbeCnt(), i.e. on: assert( cnt==db->activeVdbeCnt ); Could any SQLite expert direct me to where should I search for source of this problem, please? Can it be a bug in SQLite? Or in our incorrect usage? In such a case, what could be wrong (note that it all works well for thousands of users)? Thanks, Jiri ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Possible UNICODE LIKE, upper(), lower() function solution
Nice! I can imagine that this could be also used e.g. in FTS3, particularly unaccent() function could make searching for international users better. Thans for sharing your code, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite is in Android
> http://www.sqlite.org/famous.html Btw, you write there: > There are unconfirmed reports on the internet that Apple also uses SQLite in > the iPhone and in the iPod touch. I'm pretty sure that SQLite is used there, I browsed my phone and saw several instances of SQLite there, I'd say that they use it almost for any data storage in iPhone... ;-) Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] FTS Usage
Hello, I'm considering usage of FTS, but from the documentation it's a little unclear to me what's the recommended way of using it, particularly in the following scenario: Let's say that I already have some database structure containing several tables with mixes type of data (some string fields that I would like to index by FTS and others that I wouldn't need to index). How should I use FTS in this case? Create a new FTS table and store all text data there? Would it actually be a duplication, or can FTS store only index, without the actual full strings? Thanks for your help, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] TryEnterCriticalSection compilation
Hi, I read the past thread on TryEnterCriticalSection() compilation. I wonder whether it couldn't be included in SQLite sources as follows: BOOL (WINAPI *TryEnterCriticalSection)( LPCRITICAL_SECTION lpCriticalSection ); SQLITE_API int sqlite3_mutex_try(sqlite3_mutex *p){ if (!TryEnterCriticalSection) { TryEnterCriticalSection = GetProcAddress( LoadLibraryA("kernel32.dll"), "TryEnterCriticalSection"); } if (!TryEnterCriticalSection) return SQLITE_BUSY; else { // .. the standard sqlite3_mutex_try() follows here... } } The initialization code could certainly be moved to SQLite initialization. This way, it would work fine under any Windows version. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Threads in 3.5.0
Hello, I was glad to read in http://sqlite.org/34to35.html document that: Restrictions on the use of the same database connection by multiple threads have been dropped. It is now safe for multiple threads to use the same database connection at the same time. I'd like if someone could clarify this: Does it mean that threads can be freely use with 3.5? It there any limitation at all? I.e., if I create a DB connection, can I freely execute statements in several threads? Is any kind of synchronization needed? Thanks, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] SQL logic error or missing
Hi, Some time ago I asked here about a strange and very rare SQLite DB problem that puzzles me, but unfortunatelly I got no answer. I'll try to describe it again and hopefully someone will be able to comment it... Very rarely (I have just 5 debug logs from all our beta testers) executing 'COMMIT' statement returns SQLITE_ERROR + 'SQL logic error or missing database'. Analysis of the debug logs and source codes doesn't show any problem, there simply begins a transaction, some SQL statements are executed and COMMIT should finish it - but it doesn't. Unfortunatelly SQLITE_ERROR is quite a generic term that doesn't tell much. I tried to go through the places SQLite returns this message (I don't have any deeper understanding of SQLite sources) and one place that seems to be related to my problem is in sqlite3PagerCommitPhaseTwo(), namely: if( pPager->state
[sqlite] Unknown SQLITE_ERROR problem
Hi, Rarely (I have 4 debug logs from all our beta testers) executing 'COMMIT' statement returns SQLITE_ERROR + 'SQL logic error or missing database'. Analysis of the debug logs and source codes doesn't show any problem, there simply begins a transaction, some SQL statements are executed and COMMIT should finish it - but it doesn't. Unfortunatelly SQLITE_ERROR is quite a generic term that doesn't tell much. I tried to go through the places SQLite returns this message (I don't have any deeper understanding of SQLite sources) and one place that seems to be related to my problem is in sqlite3PagerCommitPhaseTwo(), namely: if( pPager->state
Re: [sqlite] Threading issues in SQLite
After several experiments and tests, I ended up using one SQLite connection in several threads. This certainly doesn't currently work in SQLite natively, but if you properly use some locking mechanism (CriticalSections in my case), it works fine. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] VACUUM problems
Hello, I recently got some debug logs from my users that indicate problems with VACUUM command. The error message is: SQL logic error or missing database (1). This is using the latest SQLite (3.4.1) Windows DLL. I tried to rule out my coding mistakes, at the moment of the problem there is only one connection to the DB, this connection doesn't have any other SQLite command open, there isn't any transaction in progress - all this confirmed using debug logs and also safety measures in the code (CriticalSections, etc.). Does anyone has any idea what could be wrong? Any suggestiong regarding getting more information useful for debugging the problem? Thanks, Jiri P.S.: Except for this, everything seems to work fine for quite large user base - and SQLite is used pretty heavily in my app. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Unicode collation
After reading some more it seems language and country tags alone aren't enough. Inside the same country, the same language and even the same culture, different orderings can be used depending on the ordering objective. As an example, there can be a specific order used in phone lists and other for dictionaries. That's why I left my original suggestion of en_AU in favor of UNIL_en_AU and UNIS_en_AU, because Linguistic and String ordering seem to be the most often needed types. This probably doesn't cover all situations, but definitely the most often needed. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
My idea is to implement the UCA collation in SQLite (with the usual OMIT_* #ifdef's), using the DUCET table as base, and if people need the tailoring part for localized sorting, have it be optional by having a "sqlite_collation_data" table with the needed locale data included on the database. That would certainly be great if this is implemented. Note, however, that it doesn't fully solve the issues described in this thread - i.e. if you create a DB by some SQLite version and then use it by a newer version, where some elements were added/modified in DUCET, indexes of these DBs wouldn't be compatible. It can be resolved in several ways, e.g. as suggested to have all DUCET data stored in a special table in SQLite database. It's just a matter of choosing a well-balanced solution... Btw, even if this is implemented, there is still a need for a standardization such new collation names. E.g. that new language neutral collation could be called Unicode or DUCET? And how about language specific collations? After some thoughts, I'd suggest something like UNIL_en_AU (where UNIL means Unicode linguistic - i.e. some characters are properly ignored, given for example by an ordering of 'con', 'coop', 'co-op') and UNIS_en_AU (where UNIS means Unicode strings - i.e. special characters aren't ignored, so that above words would be ordered as 'co-op', 'con', 'coop'). Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
> Actually, reading one of the links you posted > (http://blogs.msdn.com/michkap/archive/2005/05/04/414520.aspx - > everybody please read it before continuing in discussion), I got (what > I think is a great) idea: Let's include version information about > collation algorithm we are using. So, extending my previous > suggestion, the SQLite collation title would look like > 'en_AU_Win_2_1', where the individual parts mean: > 'en_AU' - obviously a sort order > 'Win' - that we are basing the ordering on default Windows methods > (otherwise could be e.g. 'ICU') > '2_1' - version string as retrieved from GetNLSVersion() function and > NLSVERSIONINFO struct (see the link above for details). I don't see how that is different than the previous difficulty of having incompatible databases between systems. The only new thing is we can check it, but the database will have inconsistent behaviour anyway. As I wrote, my main intention was to handle incompatibilities among applications reading a database on a given system - and this would handle the issue perfectly. > This way we are absolutely sure that we don't corrupt SQLite database. > While I'm working on a single system, the version string wouldn't > change. Then, e.g. if I open SQLite database with 'en_AU_Win_2_1' > collation on a system NLSVERSIONINFO returns version 2.3, I know I > have to reindex, change collation title to 'en_AU_Win_2_3' and then > can safely use the database. For small databases, the task of re-indexing is not big, but you are forgetting you can have a database in a shared network folder, used by PC's in different parts of the world and even different OSs (with samba/cifs). That's why I like Trevor's idea so much. You are right in this, my proposal handles some problems, but can't handle this situation. That said, it at least allows sharing databases under compatible platforms - which is still an improvement over the current situation where every application has its own collation and there's absolutely no compatibility. Trevor proposed method doesn't need that, because the collation data goes with the database (which is the big advantage to me). And if you want to use another collation, you only need to import the collation data needed for that locale. Well, Trevor's proposal sounds good indeed, but realistically, I don't see any big chances it to be implemented - collations aren't that simple and maintaining all the related issues without relying on some external code is probably too much to expect. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
After thinking a bit, it occurs to me that there's a compromise for the Unicode case that might be workable. The algorithm for collation is pretty stable, it's just the locale data that's the problem. If SQLite understands the algorithm, then locale data can go into special tables in the database itself. Applications manipulating the database schema would need to have the relevant collation data on hand to fill in the database, but other apps concerned with only the data could operate without any special knowledge. This approach keeps the database internally self-consistent while avoiding platform and versioning issues. Something like this could probably work, but it still seems to me that we are assigning SQLite tasks that OS or some other libraries should handle. Actually, reading one of the links you posted (http://blogs.msdn.com/michkap/archive/2005/05/04/414520.aspx - everybody please read it before continuing in discussion), I got (what I think is a great) idea: Let's include version information about collation algorithm we are using. So, extending my previous suggestion, the SQLite collation title would look like 'en_AU_Win_2_1', where the individual parts mean: 'en_AU' - obviously a sort order 'Win' - that we are basing the ordering on default Windows methods (otherwise could be e.g. 'ICU') '2_1' - version string as retrieved from GetNLSVersion() function and NLSVERSIONINFO struct (see the link above for details). This way we are absolutely sure that we don't corrupt SQLite database. While I'm working on a single system, the version string wouldn't change. Then, e.g. if I open SQLite database with 'en_AU_Win_2_1' collation on a system NLSVERSIONINFO returns version 2.3, I know I have to reindex, change collation title to 'en_AU_Win_2_3' and then can safely use the database. It seems to me to be a really cross-platform solution and what's best - without any coding on SQLite side necessary (even though possible to handle some parts of this proposal internally). Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Unicode collation
Right now sqlite database files are portable across systems as-is. You're proposing they should need to be explicitly prepared for transport? Remember, the risk is silent data corruption. This is not a trivial matter. Well, I do understand that Unicode standard is quite a complicated thing, but we got a little farther in the discussion than I originally intended. My problem isn't that I'd like to transport SQLite database between platforms, let's stay for example with Windows - as I already wrote, if I create a database where I want to fully support Unicode strings (including correct sorting), I have to define some custom collation. This results in the fact that I can't open (or edit) the database in any other DB application across Windows, because they don't know the collation I defined. So, in fact, Unicode SQLite databases aren't portable even within a single platform! Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Unicode collation
Unfortunately, Unicode specifies no such thing. There is no such thing as _the_ Unicode collation, if only because collation rules depend on locale. Yes, what I mean is to define how individual locales are identified in SQLite, like that already suggested 'en_AU', 'tr_TR', etc. Implementations vary significantly. In addition, Unicode keeps defining new characters. More recently defined characters may or may not be supported by a particular implementation's collation tables. Ok, you are probably right, but I guess that this shouldn't stop us. Or am I the only one who sees this as a problem? Is it really that big issue that particular implementations can differ? For example, doesn't VACUUM recreate indexes, so that they would be accurate after moving to another platform (and if it doesn't, shouldn't it?). Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
A moments reflection convinces me that this is not a good idea as stated. But perhaps it can be the seed for a idea that will actually work. Is there some way of adding customizations to the database file itself, or perhaps to a separate file in a standard place the SQLite always knows to look, so that custom enhancements and extensions to SQLite can be accomodated in standard tools? Worth thinking about, perhaps I tried to think about this in as many details as possible and I'd say that the original problem that I'd like to solve here (i.e. collation of Unicode character sets) doesn't require any special complex handling in SQLite. We really should just define how will individual collations be named, so that Unicode SQLite databases become portable. The only argument against this was that individual implementations of Unicode standard (i.e. mainly internal Windows methods or ICU library) could differ. However, is it really a problem? I'd say that it isn't. Unicode specifies exactly how characters should be sorted and so _if_ there really are any differences between Windows and ICU implementations, they should be considered as a bug and be solved there. I.e. it isn't anything SQLite should be responsible for. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
So the choices seem to be: (1) Databases that corrupt if you move across platforms. (2) A 10MB database engine (3) Leave things as they are I prefer to go with (3) Yes, based on the facts I agree that (3) is the best way to go - a developer can decide whether to use Windows methods only, or be cross-platform and use ICU or anything else. However, it isn't what I mean. I suggest to publish some kind of a standard that database developers would use. I.e. SQLite web would define, that if you want to make a Unicode database that can be openned by other applications, you should define collation e.g. as 'en_AU' for Australian English. This way, if I want let users of my application to open DB in e.g. in some database editor, I can use this standard and users will be able to open it in many applications following the standard. So, what I propose aren't actually any changes in SQLite, but rather publishing something that would guide all SQLite developers. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
Ok, drh, could you share your thoughts about this. Could something like this, i.e. some kind of recommendation be created, so that all applications are consistent in Unicode support? Thanks, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode collation
The reason is as you've surmised. Not all systems have full unicode support (I'm not sure, but if I had to guess, I would say very few systems do). Including an implementation with SQLite would bloat the library to at least several times it's current size. I know really well only Windows where it's only about using CompareString() function, i.e. almost no code in SQLite. Another reason is that it's a pretty complex topic. Supporting most European languages would probably be possible without too much trouble, but once you get into Asian and Middle-eastern languages I think it's much harder. As far as I know, there wouldn't be any complexity on SQLite's side - it's only about calling proper methods (be it CompareString() on Windows or ICU methods elsewhere), i.e. again pretty much no code needed in SQLite. There is an extension packaged with SQLite sources that uses the ICU library to provide locale dependent collation sequences and case folding. See here for details: ICU is nice, but pretty large. Since I develop for Windows, I'd rather not distribute it with my application considering that this is alredy provided in Windows in reasonable quality. Anyway, I guess that the question isn't mainly about how to implement this in SQLite, but about the problem that SQLite doesn't define any standard how to handle Unicode. Currently, any application that needs to work with Unicode data has to define its own collation and name it 'tr_TR', 'turkish', 'MyTurkish', or any other way. The result is a big mess and no chance of opening SQLite database in other application than it was designed for. So, why don't we (or you - SQLite developers) define how to name collations (e.g. that 'tr_TR', 'en_AU', ... standard?) and then every database complying this would be perfectly portable. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Unicode collation
Hello, I wonder whether there are any plans to include internally proper Unicode comparisons? Don't get me wrong, I think that it's great that SQLite supports custom collations, there's absolutely no problem to handle it in internally for my database, but problem is that if I define UNICODE collation, no other application knows about it and so users can't open it in any SQLite DB editor. Nowadays applications without Unicode support slowly become rare, as I see, I'm not the first one asking for this kind of support in SQLite. Is there any technical reason why not to include UNICODE and e.g. IUNICODE (for case-insensitive comparisons) collations in SQLite? Is it because of some systems that don't have (full) Unicode support? In such a case, I guess that it could be a compile-time option. Thanks, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Equal distribution from random rows
Is there a way I can modify my query to attain a much more equal distribution? It doesn't have to be perfect, but right now it is too noticiably weighted. What about this: SELECT * FROM Table LIMIT 1 OFFSET round((CAST(random(*) as float)/(9223372036854775807)+1)/2*(SELECT COUNT(*) FROM Table)) It possibly looks a little weird, but it seems to me to be a logical way of how to get Uniform distributed samples from a table. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Database disk image is malformed (11)
Hello, bug reports from our users indicate that SQLite ocassionaly (or rather rarely) returns 'database disk image is malformed (11)' error. However, there doesn't seem to be any good reason for this behaviour, everything else seems to be fine. I have searched some older posts here and one user indicated that re-running the query always fixed this problem for him - I wonder, is this a known issue? Any other resolution? Btw, we use several threads in the application, but in 'SQLite nice' manner, i.e. each thread uses only its own connection. Also, at least on one ocassion it happened to user when there wasn't any other thread running. Reported on Win XP. Thanks for help, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Problem with Unicode surrogates
Ok, I reviewed the sources (utf.c) and I'd say that it's still incorrect. Actually, it doesn't test for surrogates at all, the problem I was experiencing was caused by the fact that SQLite reads unallocated memory when there's an unpaired surrogate present as the last character of string - see READ_UTF16LE macro. I tried searching cvstrac, but haven't find this issue there, haven't anyone entered it or even fixed? Was my description clear? Should I enter the issue myself? It isn't anything huge, but at least in can result in AV in SQLite... Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Problem with Unicode surrogates
> Re. that 0xE000 character, should I submit a bugreport somewhere? You already did. Thanks. http://www.sqlite.org/cvstrac/chngview?cn=4017 Ok, I reviewed the sources (utf.c) and I'd say that it's still incorrect. Actually, it doesn't test for surrogates at all, the problem I was experiencing was caused by the fact that SQLite reads unallocated memory when there's an unpaired surrogate present as the last character of string - see READ_UTF16LE macro. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Problem with Unicode surrogates
The Unicode standard is beside the point. There is lots of code that does not handle charsets and encodings correctly, which can open vulnerabilities to metacharacter injection. (Examples of this class of problem are SQL injection, XSS and format string exploits.) I can't agree. SQLite itself wouldn't be vurnelable at all by accepting any UTF-16 string (including invalid ones). Certainly, it could cause problems to some applications using SQLite, but SQLite can't be responsible for poorly written applications using it, can it? Anyway, it certainly can't be called a bug if SQLite returns error when I try to prepare an SQL statement with invalid characters. However, it should be clear what SQLite considers as an invalid character, is it only an unpaired surrogate, anything that Unicode standard defines as a 'noncharacter' or even any character that currently isn't defined by Unicode standard (which would be pretty bad in my opinion)? Re. that 0xE000 character, should I submit a bugreport somewhere? Thanks, Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] One more SQLite threading question
This almost immediately raises "library routine called out of sequence". It occurs as soon as the processing of A and B overlap, that means A is preparing statement #2 while B is still executing #1. Have you tried using Mutex or some other way to prevevent really simultaneous calling of SQLite methods? My guess that it should work well then, but I haven't tried it myself... Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Problem with Unicode surrogates
> What I would propose is to be more robust in handling such incorrect Unicode > strings, so that application can e.g. insert really any string, not only > those that comply Unicode standards. > Doing this can potentially lead to security exploits in programs that use SQLite. If you want to handle ill-formed UTF8 strings, use a BLOB. 1. To explain a bit more - I don't plan to handle ill-formed UTF-16 (really UTF-16, not UTF-8) strings, it's just that strings to DB application come from various sources, e.g. are read from some files and such strings can easily be incorrect. That said, I'd still expect that I can insert such strings to an ordinary text field in DB. As for security exploits, I don't see any, Unicode 4.0 standard allows applications to ignore such incorrect characters. Citation: Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as removing it from the text. Note that Unicode conformance freely allows the removal of these characters. (See C10 in Section 3.2, Conformance Requirements.) [End of citation] 2. No matter how you feel about 1., there's another problem: SQLite fails e.g. on 0xE000 UTF-16 character, which, as far as I know, isn't illegal. As a different example, SQLite doesn't fail on 0x character, which is by definition of Unicode standard a 'noncharacter' and isn't allowed in open interchange of Unicode text data. So, the upshot is, that I think SQLite should simply discard any Unicode 'noncharacters' in SQL statements and don't consider such statements as invalid. Jiri - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] One more SQLite threading question
Hello, I have tried to search all the documentation about threading in SQLite, but I'm still somewhat confused. It's often suggested to create a pool of sqlite3 structures, but what if I would like to have only only sqlite3 connection and serialize all the DB operations to one thread (name it 'A') that would prepare and execute all the queries. I guess that this would work well... However, it would be too time consuming to serialize every call to sqlite3_step(), so I wonder whether it can be called in another thread. So my scenario is: 1. Thread B wants to open a query 'SELECT * FROM Tbl1' 2. Thread A is used to prepare the query 'SELECT * FROM Tbl1' using sqlite3_prepare16_v2() and using the same sqlite3 DB connection as for possibly some other running SQL statements. 3. Thread B now repeatedly calls sqlite3_step(), sqlite3_column_text16() and similar functions in order to get all rows from DB. 4. Thread A is used to call sqlite3_finalize() on the openned query. So my questions are: a. Would the code described above work. b. In step 3., do I have to somehow make sure that calls to sqlite3_step() don't interfere with other SQLite processing in thread A, e.g. by Windows CriticalSections? Is anything like this also needed for sqlite3_column_text16()? Thanks for any explanation, Jiri
[sqlite] Problem with Unicode surrogates
Hello, what I describe below probably can't be called a SQLite bug, but in my opinion it could be fixed in SQLite anyway. The problem happens on Windows platform when using Sqlite3_Prepare16(), i.e. function accepting Unicode strings in UTF-16 encoding. When I for example perform a query: SELECT * FROM Table1 Where Field1='XY' where instead of X is 0xD800 Unicode (UTF-16) character, the query fails. It's apparently because 0xD800 is one of UTF-16 surrogates, i.e. the next character is expected to be something like 0xDD00, which isn't our case. What I would propose is to be more robust in handling such incorrect Unicode strings, so that application can e.g. insert really any string, not only those that comply Unicode standards. Thanks, Jiri
RE: [sqlite] Problems with multiple threads?
> If it is inconvenient to rollback and retry the entire transaction, then start the transaction initially with BEGIN EXCLUSIVE. > This will acquire the reserved lock immediately (instead of waiting to the first write occurs) and so you will either get an > SQLITE_BUSY right away (when it is a simple matter to just rerun the BEGIN EXCLUSIVE statement until it works) or you can be > assured of never getting another SQLITE_BUSY again until you try to COMMIT (and there too, you can simply rerun COMMIT > repeatedly until it works.) Thanks, I overlooked that by default transactions are DEFERRED in SQLite. It really fixes the problem. As I think about it, if I make _all_ transactions in my application IMMEDIATE, there shouldn't be any risk of a deadlock, right? Thanks, Jiri
RE: [sqlite] Problems with multiple threads?
Thanks for an additional explanation, I used sqlite3_get_autocommit() for debugging and it helped me to find out that it really was my fault. There was an incorrect processing after COMMIT returned SQLITE_BUSY. So sorry for this. However, right after fixing this, I found another problem. It certainly can be my fault, but I don't see how could it be: If I don't use transactions, multiple threads seem to proceed well, but then right after I add BEGIN and COMMIT to some place, all threads lock eventually. I debugged it and found that _all_ threads accessing SQLite are in a loop waiting for an action to proceed and all of them keep getting SQLITE_BUSY result. I wonder, can it be somehow my fault, or is it some kind of a dead-lock in SQLite? In the simpliest form it takes only two threads to reproduce and the problem looks like: Thread 1: BEGIN TRANSACTION<-- proceeded INSERT INTO ... <-- Processing stops here, waiting in a loop, Sqlite3_step() infinitely returns SQLITE_BUSY COMMIT Thread 2: // no explicit transaction start here DELETE FROM ... <-- Processing stops here, waiting in a loop, Sqlite3_step() infinitely returns SQLITE_BUSY Thread 2 can also look like this in order to reproduce the problem (situation in Thread 1 remains the same): BEGIN TRANSACTION<-- proceeded DELETE FROM ... <-- proceeded COMMIT <-- Processing stops here, waiting in a loop, Sqlite3_step() infinitely returns SQLITE_BUSY No other thread calls any SQLite function. Do you have any idea what could be wrong? Thanks, Jiri
[sqlite] Problems with multiple threads?
Hello, I'm trying to use SQLite in an application where it's needed to work with one database in mutliple threads. Based on all the info I read in SQLite documentation I create a new database connection for every new thread created. Each thread does some SELECTs, INSERTs or UPDATEs, but there isn't any schema modification. If multiple threads are running, I encounter some strange problems: 1. Occasionally after running Sqlite3_step() I get SQLITE_CANTOPEN ('Unable to open the database file') error. I found out that it can be fixed by running the query again, i.e. again calling Sqlite3_Prepare(). So this isn't a big issue, but still I wonder why this error message is returned? Shouldn't SQLITE_BUSY or SQLITE_LOCKED be returned instead? 2. More serious issue is that after I enable transaction usage (not used in 1.) sometimes (again, it's random) after calling 'BEGIN TRANSACTION' I get an error SQLITE_ERROR ('cannot start a transaction within a transaction'). Problem is that I definitely am not already in a transaction. The only reason for this seems to be that there's >1 thread running, with only 1 thread running there's no problem. I tried this also with the latest version of SQLite (3.3.6 on Windows). Any idea what can I do about it? Thanks, Jiri