Re: [sqlite] FTS index size

2008-07-24 Thread Jiri Hajek
> So, indeed, there's room for improvement!

Thanks for looking into it, I'm interested in your results. That said, I
still believe that the best way would be to make the current FTS
implementation independent of the VIRTUAL TABLE structure, but be able to
accept more general structures, namely ordinary SQLite tables. The reasons
were mentioned in my last e-mail, one of them was that if I have a field in
FTS table, it can't be indexed (using ordinary SQLite index, not only FTS),
which is sometimes desirable (see Path field in my Songs table example).

Jiri

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS index size

2008-07-23 Thread Jiri Hajek
> Again, you've given a relatively broad description of what you're
> trying to do.  I could make up a bunch of stuff and answer my own
> question, but you'd probably rather than I considered the problem
> _you_ are having.

Ok, I'll try to be as specific as possible. The main table I have is (the
real version has much more fields, but it isn't important for our example):

CREATE TABLE Songs (
  ID INTEGER PRIMARY KEY AUTOINCREMENT,
  Artist TEXT COLLATE IUNICODE,
  Album TEXT COLLATE IUNICODE,
  SongTitle TEXT COLLATE IUNICODE,
  Path TEXT COLLATE IUNICODE,
  Year INTEGER,
  Bitrate INTEGER)

This table can have even >100k records, even close to million and is mostly
accessed by SELECTing all fields of some records, i.e.:

SELECT * FROM Songs WHERE {something}

In order to use FTS3, I could take all the text fields from Songs table and
move them to a FTS3 table:

CREATE TABLE SongsBase (
  ID INTEGER PRIMARY KEY AUTOINCREMENT,
  Year INTEGER,
  Bitrate INTEGER)

CREATE VIRTUAL TABLE SongsText USING FTS3(TOKENIZE mm,
  Artist,
  Album,
  SongTitle,
  Path)

This way I would lose my custom collation (IUNICODE), which would be quite a
problem, particularly for Path field (and if you're asking, yes, I'd like to
include Path in the full-text index). Another problem is that joined SELECT
on SongsBase and SongsText is slower than SELECT on the original Songs
table.

So, the only solution using FTS3 seems to be to use the original Songs table
and add SongsText table, automatically updated by triggers like:

CREATE TRIGGER update_songs UPDATE OF Artist,Album,SongTitle,Path ON Songs
BEGIN
  UPDATE SongsText SET Artist=new.Artist, Album=new.Album,
SongTitle=new.Title, Path=new.Path WHERE rowid=new.id;
END;

This solution probably isn't bad, but according to my knowledge of FTS3, it
unnecessarily occupies some DB space (all text fields are actually stored
twice, once in Songs and once in SongsText).

Any ideas or recommedations?

Thanks,
Jiri

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS index size

2008-07-22 Thread Jiri Hajek
> It's hard to say - you might want to describe your tables and the join
> by simply including the SQL itself, rather than a loose description of
> it.  It may be that there's a simple change which will clear things
> up.

Ok, what exactly I mean is a table containing tracks in an audio library,
i.e.:

Table Tracks:
ID INT
Title TEXT
Artist TEXT
Album TEXT
Length INT
Bitrate INT
... (and much more)

The table can be split in two parts, where TEXT fields would be indexed by
FTS, but I'd like to avoid it, due to several reasons, particularly because
of performance.

> My question to SQLite experts is whether there is any solution to this
> choice between speed and DB size. Preferrably I'd need to specify that
just
> some fields of an ordinary table participate in FTS3 index, but that
> obviously isn't possible, at least not yet. Are there any plans for this?

> No plans at this time.  The goal of the current functionality
> breakdown was to keep SQLite core from having to understand too much
> about FTS, and FTS from having to understand too much SQL.

>From my point of view, it would be best to separate FTS index data from the
actual text data (I think that the idea was already mentioned in another
thread here). It would have several advantages and interesting points. That
said, I haven't studied the FTS implementation and don't have any idea
whether this is technically possible with reasonable effort.

Btw, whan I mean could be implemented like this:

CREATE VIRTUAL INDEX ind USING fts4(TOKENIZE simple) ON Tracks(title,
artist, album);

Jiri

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS index size

2008-07-18 Thread Jiri Hajek
Hello,

from what I have read about FTS3, it stores the original data as well as the
index needed for fast full-text access. Therefore, in several posts here it
was recommended to use two tables joined one-to-one in case it's needed to
store both text data and some other, possibly numeric data. Let's use an
example:

Table1:
ID Int
Value1 Int
Value2 Int

Table2 (FTS3):
Text1
Text2

However, according to my tests, the join needed in case I want to get all
data in tables Table1 and Table2 takes some time, it looks like it's almost
twice as slow as having all the data in one table only.

So, another option is to 'denormalize' the schema above to:

Table12:
ID Int
Value1 Int
Value2 Int
Text1 Text
Text2 Text

but then I need another FTS3 table and so all values in Text1 and Text2
fields are actually duplicated in DB, right?

My question to SQLite experts is whether there is any solution to this
choice between speed and DB size. Preferrably I'd need to specify that just
some fields of an ordinary table participate in FTS3 index, but that
obviously isn't possible, at least not yet. Are there any plans for this?

Thanks,
Jiri

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Assertion failed - checkActiveVdbeCnt()

2008-04-10 Thread Jiri Hajek
Hi,

one of our users have recently reported a failed assertion in
checkActiveVdbeCnt(), i.e. on:

assert( cnt==db->activeVdbeCnt );

Could any SQLite expert direct me to where should I search for source
of this problem, please? Can it be a bug in SQLite? Or in our
incorrect usage? In such a case, what could be wrong (note that it all
works well for thousands of users)?

Thanks,
Jiri
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Possible UNICODE LIKE, upper(), lower() function solution

2008-01-04 Thread Jiri Hajek
Nice! I can imagine that this could be also used e.g. in FTS3,
particularly unaccent() function could make searching for
international users better.

Thans for sharing your code,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite is in Android

2007-12-05 Thread Jiri Hajek
> http://www.sqlite.org/famous.html

Btw, you write there:

> There are unconfirmed reports on the internet that Apple also uses SQLite in 
> the iPhone and in the iPod touch.

I'm pretty sure that SQLite is used there, I browsed my phone and saw
several instances of SQLite there, I'd say that they use it almost for
any data storage in iPhone... ;-)

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] FTS Usage

2007-11-29 Thread Jiri Hajek
Hello,

I'm considering usage of FTS, but from the documentation it's a little
unclear to me what's the recommended way of using it, particularly in
the following scenario:

Let's say that I already have some database structure containing
several tables with mixes type of data (some string fields that I
would like to index by FTS and others that I wouldn't need to index).
How should I use FTS in this case? Create a new FTS table and store
all text data there? Would it actually be a duplication, or can FTS
store only index, without the actual full strings?

Thanks for your help,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] TryEnterCriticalSection compilation

2007-10-04 Thread Jiri Hajek
Hi, I read the past thread on TryEnterCriticalSection() compilation. I
wonder whether it couldn't be included in SQLite sources as follows:

BOOL (WINAPI *TryEnterCriticalSection)(
   LPCRITICAL_SECTION lpCriticalSection
);

SQLITE_API int sqlite3_mutex_try(sqlite3_mutex *p){
  if (!TryEnterCriticalSection)
  {
  TryEnterCriticalSection = GetProcAddress(
LoadLibraryA("kernel32.dll"), "TryEnterCriticalSection");
  }

  if (!TryEnterCriticalSection)
return SQLITE_BUSY;
  else
  {
//  .. the standard sqlite3_mutex_try() follows here...
  }
}

The initialization code could certainly be moved to SQLite initialization.

This way, it would work fine under any Windows version.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Threads in 3.5.0

2007-10-04 Thread Jiri Hajek
Hello, I was glad to read in http://sqlite.org/34to35.html document that:

Restrictions on the use of the same database connection by multiple
threads have been dropped. It is now safe for multiple threads to use
the same database connection at the same time.

I'd like if someone could clarify this: Does it mean that threads can
be freely use with 3.5? It there any limitation at all? I.e., if I
create a DB connection, can I freely execute statements in several
threads? Is any kind of synchronization needed?

Thanks,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] SQL logic error or missing

2007-08-31 Thread Jiri Hajek
Hi,

Some time ago I asked here about a strange and very rare SQLite DB
problem that puzzles me, but unfortunatelly I got no answer. I'll try
to describe it again and hopefully someone will be able to comment
it...

Very rarely (I have just 5 debug logs from all our beta testers) executing
'COMMIT' statement returns SQLITE_ERROR + 'SQL logic error or missing
database'. Analysis of the debug logs and source codes doesn't show
any problem, there simply begins a transaction, some SQL statements
are executed and COMMIT should finish it - but it doesn't.

Unfortunatelly SQLITE_ERROR is quite a generic term that doesn't tell
much. I tried to go through the places SQLite returns this message (I
don't have any deeper understanding of SQLite sources) and one place
that seems to be related to my problem is in
sqlite3PagerCommitPhaseTwo(), namely:

  if( pPager->state

[sqlite] Unknown SQLITE_ERROR problem

2007-08-15 Thread Jiri Hajek
Hi,

Rarely (I have 4 debug logs from all our beta testers) executing
'COMMIT' statement returns SQLITE_ERROR + 'SQL logic error or missing
database'. Analysis of the debug logs and source codes doesn't show
any problem, there simply begins a transaction, some SQL statements
are executed and COMMIT should finish it - but it doesn't.

Unfortunatelly SQLITE_ERROR is quite a generic term that doesn't tell
much. I tried to go through the places SQLite returns this message (I
don't have any deeper understanding of SQLite sources) and one place
that seems to be related to my problem is in
sqlite3PagerCommitPhaseTwo(), namely:

  if( pPager->state

Re: [sqlite] Threading issues in SQLite

2007-08-11 Thread Jiri Hajek
After several experiments and tests, I ended up using one SQLite
connection in several threads. This certainly doesn't currently work
in SQLite natively, but if you properly use some locking mechanism
(CriticalSections in my case), it works fine.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] VACUUM problems

2007-08-03 Thread Jiri Hajek
Hello,

I recently got some debug logs from my users that indicate problems
with VACUUM command. The error message is:

SQL logic error or missing database (1).

This is using the latest SQLite (3.4.1) Windows DLL.

I tried to rule out my coding mistakes, at the moment of the problem
there is only one connection to the DB, this connection doesn't have
any other SQLite command open, there isn't any transaction in progress
- all this confirmed using debug logs and also safety measures in the
code (CriticalSections, etc.).

Does anyone has any idea what could be wrong? Any suggestiong
regarding getting more information useful for debugging the problem?

Thanks,
Jiri

P.S.: Except for this, everything seems to work fine for quite large
user base - and SQLite is used pretty heavily in my app.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Unicode collation

2007-06-28 Thread Jiri Hajek

After reading some more it seems language and country tags alone
aren't enough. Inside the same country, the same language and even the
same culture, different orderings can be used depending on the
ordering objective. As an example, there can be a specific order used
in phone lists and other for dictionaries.


That's why I left my original suggestion of en_AU in favor of
UNIL_en_AU and UNIS_en_AU, because Linguistic and String ordering seem
to be the most often needed types. This probably doesn't cover all
situations, but definitely the most often needed.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-28 Thread Jiri Hajek

My idea is to implement the UCA collation in SQLite (with the usual
OMIT_* #ifdef's), using the DUCET table as base, and if people need
the tailoring part for localized sorting, have it be optional by
having a "sqlite_collation_data" table with the needed locale data
included on the database.


That would certainly be great if this is implemented. Note, however,
that it doesn't fully solve the issues described in this thread - i.e.
if you create a DB by some SQLite version and then use it by a newer
version, where some elements were
added/modified in DUCET, indexes of these DBs wouldn't be compatible.
It can be resolved in several ways, e.g. as suggested to have all
DUCET data stored in a special table in SQLite database. It's just a
matter of choosing a well-balanced solution...

Btw, even if this is implemented, there is still a need for a
standardization such new collation names. E.g. that new language
neutral collation could be called Unicode or DUCET? And how about
language specific collations? After some thoughts, I'd suggest
something like UNIL_en_AU (where UNIL means Unicode linguistic - i.e.
some characters are properly ignored, given for example by an ordering
of 'con', 'coop', 'co-op') and UNIS_en_AU (where UNIS means Unicode
strings - i.e. special characters aren't ignored, so that above words
would be ordered as 'co-op', 'con', 'coop').

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-27 Thread Jiri Hajek

> Actually, reading one of the links you posted
> (http://blogs.msdn.com/michkap/archive/2005/05/04/414520.aspx -
> everybody please read it before continuing in discussion), I got (what
> I think is a great) idea: Let's include version information about
> collation algorithm we are using. So, extending my previous
> suggestion, the SQLite collation title would look like
> 'en_AU_Win_2_1', where the individual parts mean:
>  'en_AU' - obviously a sort order
>  'Win' - that we are basing the ordering on default Windows methods
> (otherwise could be e.g. 'ICU')
>  '2_1' - version string as retrieved from GetNLSVersion() function and
> NLSVERSIONINFO struct (see the link above for details).

I don't see how that is different than the previous difficulty of
having incompatible databases between systems. The only new thing is
we can check it, but the database will have inconsistent behaviour
anyway.


As I wrote, my main intention was to handle incompatibilities among
applications reading a database on a given system - and this would
handle the issue perfectly.


> This way we are absolutely sure that we don't corrupt SQLite database.
> While I'm working on a single system, the version string wouldn't
> change. Then, e.g. if I open SQLite database with 'en_AU_Win_2_1'
> collation on a system NLSVERSIONINFO returns version 2.3, I know I
> have to reindex, change collation title to 'en_AU_Win_2_3' and then
> can safely use the database.

For small databases, the task of re-indexing is not big, but you are
forgetting you can have a database in a shared network folder, used by
PC's in different parts of the world and even different OSs (with
samba/cifs). That's why I like Trevor's idea so much.


You are right in this, my proposal handles some problems, but can't
handle this situation. That said, it at least allows sharing databases
under compatible platforms - which is still an improvement over the
current situation where every application has its own collation and
there's absolutely no compatibility.


 Trevor proposed method doesn't need that, because the collation data
goes with the database (which is the big advantage to me). And if you
want to use another collation, you only need to import the collation
data needed for that locale.


Well, Trevor's proposal sounds good indeed, but realistically, I don't
see any big chances it to be implemented - collations aren't that
simple and maintaining all the related issues without relying on some
external code is probably too much to expect.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-27 Thread Jiri Hajek

After thinking a bit, it occurs to me that there's a compromise for
the Unicode case that might be workable.  The algorithm for collation
is pretty stable, it's just the locale data that's the problem.  If
SQLite understands the algorithm, then locale data can go into special
tables in the database itself.

Applications manipulating the database schema would need to have the
relevant collation data on hand to fill in the database, but other
apps concerned with only the data could operate without any special
knowledge.  This approach keeps the database internally
self-consistent while avoiding platform and versioning issues.


Something like this could probably work, but it still seems to me that
we are assigning SQLite tasks that OS or some other libraries should
handle.

Actually, reading one of the links you posted
(http://blogs.msdn.com/michkap/archive/2005/05/04/414520.aspx -
everybody please read it before continuing in discussion), I got (what
I think is a great) idea: Let's include version information about
collation algorithm we are using. So, extending my previous
suggestion, the SQLite collation title would look like
'en_AU_Win_2_1', where the individual parts mean:
'en_AU' - obviously a sort order
'Win' - that we are basing the ordering on default Windows methods
(otherwise could be e.g. 'ICU')
'2_1' - version string as retrieved from GetNLSVersion() function and
NLSVERSIONINFO struct (see the link above for details).

This way we are absolutely sure that we don't corrupt SQLite database.
While I'm working on a single system, the version string wouldn't
change. Then, e.g. if I open SQLite database with 'en_AU_Win_2_1'
collation on a system NLSVERSIONINFO returns version 2.3, I know I
have to reindex, change collation title to 'en_AU_Win_2_3' and then
can safely use the database.

It seems to me to be a really cross-platform solution and what's best
- without any coding on SQLite side necessary (even though possible to
handle some parts of this proposal internally).

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Unicode collation

2007-06-27 Thread Jiri Hajek

Right now sqlite database files are portable across systems as-is.
You're proposing they should need to be explicitly prepared for
transport?  Remember, the risk is silent data corruption.  This is not
a trivial matter.


Well, I do understand that Unicode standard is quite a complicated
thing, but we got a little farther in the discussion than I originally
intended. My problem isn't that I'd like to transport SQLite database
between platforms, let's stay for example with Windows - as I already
wrote, if I create a database where I want to fully support Unicode
strings (including correct sorting), I have to define some custom
collation. This results in the fact that I can't open (or edit) the
database in any other DB application across Windows, because they
don't know the collation I defined. So, in fact, Unicode SQLite
databases aren't portable even within a single platform!

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Unicode collation

2007-06-27 Thread Jiri Hajek

Unfortunately, Unicode specifies no such thing. There is no such thing
as _the_ Unicode collation, if only because collation rules depend on
locale.


Yes, what I mean is to define how individual locales are identified in
SQLite, like that already suggested 'en_AU', 'tr_TR', etc.


Implementations vary significantly. In addition, Unicode keeps
defining new characters. More recently defined characters may or may not
be supported by a particular implementation's collation tables.


Ok, you are probably right, but I guess that this shouldn't stop us.
Or am I the only one who sees this as a problem?

Is it really that big issue that particular implementations can
differ? For example, doesn't VACUUM recreate indexes, so that they
would be accurate after moving to another platform (and if it doesn't,
shouldn't it?).

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-27 Thread Jiri Hajek

A moments reflection convinces me that this is not a good idea
as stated.  But perhaps it can be the seed for a idea that will
actually work.  Is there some way of adding customizations to
the database file itself, or perhaps to a separate file in a
standard place the SQLite always knows to look, so that
custom enhancements and extensions to SQLite can be accomodated
in standard tools?  Worth thinking about, perhaps


I tried to think about this in as many details as possible and I'd say
that the original problem that I'd like to solve here (i.e. collation
of Unicode character sets) doesn't require any special complex
handling in SQLite. We really should just define how will individual
collations be named, so that Unicode SQLite databases become portable.

The only argument against this was that individual implementations of
Unicode standard (i.e. mainly internal Windows methods or ICU library)
could differ. However, is it really a problem? I'd say that it isn't.
Unicode specifies exactly how characters should be sorted and so _if_
there really are any differences between Windows and ICU
implementations, they should be considered as a bug and be solved
there. I.e. it isn't anything SQLite should be responsible for.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-25 Thread Jiri Hajek

So the choices seem to be:

 (1) Databases that corrupt if you move across platforms.
 (2) A 10MB database engine
 (3) Leave things as they are

I prefer to go with (3)


Yes, based on the facts I agree that (3) is the best way to go - a
developer can decide whether to use Windows methods only, or be
cross-platform and use ICU or anything else.

However, it isn't what I mean. I suggest to publish some kind of a
standard that database developers would use. I.e. SQLite web would
define, that if you want to make a Unicode database that can be
openned by other applications, you should define collation e.g. as
'en_AU' for Australian English. This way, if I want let users of my
application to open DB in e.g. in some database editor, I can use this
standard and users will be able to open it in many applications
following the standard.

So, what I propose aren't actually any changes in SQLite, but rather
publishing something that would guide all SQLite developers.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-25 Thread Jiri Hajek

Ok, drh, could you share your thoughts about this. Could something
like this, i.e. some kind of recommendation be created, so that all
applications are consistent in Unicode support?

Thanks,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode collation

2007-06-23 Thread Jiri Hajek

The reason is as you've surmised. Not all systems have full unicode
support (I'm not sure, but if I had to guess, I would say very few
systems do). Including an implementation with SQLite would bloat
the library to at least several times it's current size.


I know really well only Windows where it's only about using
CompareString() function, i.e. almost no code in SQLite.


Another reason is that it's a pretty complex topic. Supporting
most European languages would probably be possible without too
much trouble, but once you get into Asian and Middle-eastern
languages I think it's much harder.


As far as I know, there wouldn't be any complexity on SQLite's side -
it's only about calling proper methods (be it CompareString() on
Windows or ICU methods elsewhere), i.e. again pretty much no code
needed in SQLite.


There is an extension packaged with SQLite sources that uses the
ICU library to provide locale dependent collation sequences and
case folding. See here for details:


ICU is nice, but pretty large. Since I develop for Windows, I'd rather
not distribute it with my application considering that this is alredy
provided in Windows in reasonable quality.

Anyway, I guess that the question isn't mainly about how to implement
this in SQLite, but about the problem that SQLite doesn't define any
standard how to handle Unicode. Currently, any application that needs
to work with Unicode data has to define its own collation and name it
'tr_TR', 'turkish', 'MyTurkish', or any other way. The result is a big
mess and no chance of opening SQLite database in other application
than it was designed for.

So, why don't we (or you - SQLite developers) define how to name
collations (e.g. that 'tr_TR', 'en_AU', ... standard?) and then every
database complying this would be perfectly portable.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Unicode collation

2007-06-22 Thread Jiri Hajek

Hello,

I wonder whether there are any plans to include internally proper
Unicode comparisons? Don't get me wrong, I think that it's great that
SQLite supports custom collations, there's absolutely no problem to
handle it in internally for my database, but problem is that if I
define UNICODE collation, no other application knows about it and so
users can't open it in any SQLite DB editor.

Nowadays applications without Unicode support slowly become rare, as I
see, I'm not the first one asking for this kind of support in SQLite.
Is there any technical reason why not to include UNICODE and e.g.
IUNICODE (for case-insensitive comparisons) collations in SQLite? Is
it because of some systems that don't have (full) Unicode support? In
such a case, I guess that it could be a compile-time option.

Thanks,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Equal distribution from random rows

2007-05-30 Thread Jiri Hajek

  Is there a way I can modify my query to attain a much more equal
distribution? It doesn't have to be perfect, but right now it is too
noticiably weighted.


What about this:

SELECT *  FROM Table LIMIT 1 OFFSET round((CAST(random(*) as
float)/(9223372036854775807)+1)/2*(SELECT COUNT(*) FROM Table))

It possibly looks a little weird, but it seems to me to be a logical
way of how to get Uniform distributed samples from a table.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Database disk image is malformed (11)

2007-05-23 Thread Jiri Hajek

Hello,

bug reports from our users indicate that SQLite ocassionaly (or rather
rarely) returns 'database disk image is malformed (11)' error.
However, there doesn't seem to be any good reason for this behaviour,
everything else seems to be fine.

I have searched some older posts here and one user indicated that
re-running the query always fixed this problem for him - I wonder, is
this a known issue? Any other resolution?

Btw, we use several threads in the application, but in 'SQLite nice'
manner, i.e. each thread uses only its own connection. Also, at least
on one ocassion it happened to user when there wasn't any other thread
running. Reported on Win XP.

Thanks for help,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Problem with Unicode surrogates

2007-05-22 Thread Jiri Hajek

Ok, I reviewed the sources (utf.c) and I'd say that it's still
incorrect. Actually, it doesn't test for surrogates at all, the
problem I was experiencing was caused by the fact that SQLite reads
unallocated memory when there's an unpaired surrogate present as the
last character of string - see READ_UTF16LE macro.


I tried searching cvstrac, but haven't find this issue there, haven't
anyone entered it or even fixed? Was my description clear? Should I
enter the issue myself? It isn't anything huge, but at least in can
result in AV in SQLite...

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Problem with Unicode surrogates

2007-05-17 Thread Jiri Hajek

> Re. that 0xE000 character, should I submit a bugreport somewhere?

You already did. Thanks.

 http://www.sqlite.org/cvstrac/chngview?cn=4017


Ok, I reviewed the sources (utf.c) and I'd say that it's still
incorrect. Actually, it doesn't test for surrogates at all, the
problem I was experiencing was caused by the fact that SQLite reads
unallocated memory when there's an unpaired surrogate present as the
last character of string - see READ_UTF16LE macro.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Problem with Unicode surrogates

2007-05-17 Thread Jiri Hajek

The Unicode standard is beside the point. There is lots of code
that does not handle charsets and encodings correctly, which can
open vulnerabilities to metacharacter injection. (Examples of
this class of problem are SQL injection, XSS and format string
exploits.)


I can't agree. SQLite itself wouldn't be vurnelable at all by
accepting any UTF-16 string (including invalid ones). Certainly, it
could cause problems to some applications using SQLite, but SQLite
can't be responsible for poorly written applications using it, can it?

Anyway, it certainly can't be called a bug if SQLite returns error
when I try to prepare an SQL statement with invalid characters.
However, it should be clear what SQLite considers as an invalid
character, is it only an unpaired surrogate, anything that Unicode
standard defines as a 'noncharacter' or even any character that
currently isn't defined by Unicode standard (which would be pretty bad
in my opinion)?

Re. that 0xE000 character, should I submit a bugreport somewhere?

Thanks,
Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] One more SQLite threading question

2007-05-17 Thread Jiri Hajek

This almost immediately raises
"library routine called out of sequence". It occurs as soon as the
processing of A and B overlap, that means A is preparing statement #2
while B is still executing #1.


Have you tried using Mutex or some other way to prevevent really
simultaneous calling of SQLite methods? My guess that it should work
well then, but I haven't tried it myself...

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Problem with Unicode surrogates

2007-05-16 Thread Jiri Hajek

> What I would propose is to be more robust in handling such incorrect Unicode
> strings, so that application can e.g. insert really any string, not only
> those that comply Unicode standards.
>

Doing this can potentially lead to security exploits in programs
that use SQLite.  If you want to handle ill-formed UTF8 strings,
use a BLOB.


1. To explain a bit more - I don't plan to handle ill-formed UTF-16
(really UTF-16, not UTF-8) strings, it's just that strings to DB
application come from various sources, e.g. are read from some files
and such strings can easily be incorrect. That said, I'd still expect
that I can insert such strings to an ordinary text field in DB.

As for security exploits, I don't see any, Unicode 4.0 standard allows
applications to ignore such incorrect characters. Citation:

Applications are free to use any of these noncharacter code points
internally but should never attempt to exchange them. If a
noncharacter is received in open interchange, an application is not
required to interpret it in any way. It is good practice, however, to
recognize it as a noncharacter and to take appropriate action, such as
removing it from the text. Note that Unicode conformance freely allows
the removal of these characters. (See C10 in Section 3.2, Conformance
Requirements.) [End of citation]

2. No matter how you feel about 1., there's another problem: SQLite
fails e.g. on 0xE000 UTF-16 character, which, as far as I know, isn't
illegal. As a different example, SQLite doesn't fail on 0x
character, which is by definition of Unicode standard a 'noncharacter'
and isn't allowed in open interchange of Unicode text data.


So, the upshot is, that I think SQLite should simply discard any
Unicode 'noncharacters' in SQL statements and don't consider such
statements as invalid.

Jiri

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] One more SQLite threading question

2007-05-14 Thread Jiri Hajek

Hello,

I have tried to search all the documentation about threading in SQLite, but
I'm still somewhat confused.

It's often suggested to create a pool of sqlite3 structures, but what if I
would like to have only only sqlite3 connection and serialize all the DB
operations to one thread (name it 'A') that would prepare and execute all
the queries. I guess that this would work well...

However, it would be too time consuming to serialize every call to
sqlite3_step(), so I wonder whether it can be called in another thread. So
my scenario is:

1. Thread B wants to open a query 'SELECT * FROM Tbl1'
2. Thread A is used to prepare the query 'SELECT * FROM Tbl1' using
sqlite3_prepare16_v2() and using the same sqlite3 DB connection as for
possibly some other running SQL statements.
3. Thread B now repeatedly calls sqlite3_step(), sqlite3_column_text16() and
similar functions in order to get all rows from DB.
4. Thread A is used to call sqlite3_finalize() on the openned query.

So my questions are:
a. Would the code described above work.
b. In step 3., do I have to somehow make sure that calls to sqlite3_step()
don't interfere with other SQLite processing in thread A, e.g. by Windows
CriticalSections? Is anything like this also needed for
sqlite3_column_text16()?

Thanks for any explanation,
Jiri


[sqlite] Problem with Unicode surrogates

2007-05-12 Thread Jiri Hajek

Hello,



what I describe below probably can't be called a SQLite bug, but in my
opinion it could be fixed in SQLite anyway.



The problem happens on Windows platform when using Sqlite3_Prepare16(), i.e.
function accepting Unicode strings in UTF-16 encoding. When I for example
perform a query:



SELECT * FROM Table1 Where Field1='XY'



where instead of X is 0xD800 Unicode (UTF-16) character, the query fails.
It's apparently because 0xD800 is one of UTF-16 surrogates, i.e. the next
character is expected to be something like 0xDD00, which isn't our case.



What I would propose is to be more robust in handling such incorrect Unicode
strings, so that application can e.g. insert really any string, not only
those that comply Unicode standards.



Thanks,

Jiri


RE: [sqlite] Problems with multiple threads?

2006-06-07 Thread Jiri Hajek
> If it is inconvenient to rollback and retry the entire transaction, then
start the transaction initially with BEGIN EXCLUSIVE.  
> This will acquire the reserved lock immediately (instead of waiting to the
first write occurs) and so you will either get an 
> SQLITE_BUSY right away (when it is a simple matter to just rerun the BEGIN
EXCLUSIVE statement until it works) or you can be
> assured of never getting another SQLITE_BUSY again until you try to COMMIT
(and there too, you can simply rerun COMMIT 
> repeatedly until it works.)

Thanks, I overlooked that by default transactions are DEFERRED in SQLite. It
really fixes the problem.

As I think about it, if I make _all_ transactions in my application
IMMEDIATE, there shouldn't be any risk of a deadlock, right?

Thanks,
Jiri



RE: [sqlite] Problems with multiple threads?

2006-06-07 Thread Jiri Hajek
Thanks for an additional explanation, I used sqlite3_get_autocommit() for
debugging and it helped me to find out that it really was my fault. There
was an incorrect processing after COMMIT returned SQLITE_BUSY. So sorry for
this.

However, right after fixing this, I found another problem. It certainly can
be my fault, but I don't see how could it be: If I don't use transactions,
multiple threads seem to proceed well, but then right after I add BEGIN and
COMMIT to some place, all threads lock eventually. I debugged it and found
that _all_ threads accessing SQLite are in a loop waiting for an action to
proceed and all of them keep getting SQLITE_BUSY result. I wonder, can it be
somehow my fault, or is it some kind of a dead-lock in SQLite?

In the simpliest form it takes only two threads to reproduce and the problem
looks like:

Thread 1:

BEGIN TRANSACTION<-- proceeded
INSERT INTO ...  <-- Processing stops here, waiting in a
loop, Sqlite3_step() infinitely returns SQLITE_BUSY
COMMIT


Thread 2:

  // no explicit transaction start here
DELETE FROM ...   <-- Processing stops here, waiting in a
loop, Sqlite3_step() infinitely returns SQLITE_BUSY


Thread 2 can also look like this in order to reproduce the problem
(situation in Thread 1 remains the same):

BEGIN TRANSACTION<-- proceeded
DELETE FROM ...  <-- proceeded 
COMMIT <-- Processing stops here, waiting in
a loop, Sqlite3_step() infinitely returns SQLITE_BUSY

No other thread calls any SQLite function.

Do you have any idea what could be wrong?

Thanks,
Jiri




[sqlite] Problems with multiple threads?

2006-06-06 Thread Jiri Hajek
Hello,

I'm trying to use SQLite in an application where it's needed to work with
one database in mutliple threads. Based on all the info I read in SQLite
documentation I create a new database connection for every new thread
created. Each thread does some SELECTs, INSERTs or UPDATEs, but there isn't
any schema modification. If multiple threads are running, I encounter some
strange problems:

1. Occasionally after running Sqlite3_step() I get SQLITE_CANTOPEN ('Unable
to open the database file') error. I found out that it can be fixed by
running the query again, i.e. again calling Sqlite3_Prepare(). So this isn't
a big issue, but still I wonder why this error message is returned?
Shouldn't SQLITE_BUSY or SQLITE_LOCKED be returned instead?

2. More serious issue is that after I enable transaction usage (not used in
1.) sometimes (again, it's random) after calling 'BEGIN TRANSACTION' I get
an error SQLITE_ERROR ('cannot start a transaction within a transaction').
Problem is that I definitely am not already in a transaction. The only
reason for this seems to be that there's >1 thread running, with only 1
thread running there's no problem.

I tried this also with the latest version of SQLite (3.3.6 on Windows).

Any idea what can I do about it?

Thanks,
Jiri