Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-23 Thread Igor Tandetnik

On 4/23/2012 2:34 PM, Pete wrote:

Folks,
I'm a bit late to this discussion but what are the new PRAGMAs referred to
here?


They were discussed hypothetically. As in, wouldn't it be nice if there 
were pragmas to control date and time formats.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-23 Thread Pete
Folks,
I'm a bit late to this discussion but what are the new PRAGMAs referred to
here?  I don't see them listed in the documentation - do they exist or are
they a suggestion for future implementation?

On Tue, Apr 3, 2012 at 9:00 AM, <sqlite-users-requ...@sqlite.org> wrote:

> Message: 2
> Date: Mon, 2 Apr 2012 20:38:37 +0400
> From: Alexey Pechnikov <pechni...@mobigroup.ru>
> To: sqlite-users@sqlite.org
> Subject: [sqlite] Suggestion about hard-coded time string format
>-MM-DD
> Message-ID:
>

Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-20 Thread Yves Goergen
On 02.04.2012 18:38 CE(S)T, Alexey Pechnikov wrote:
> Why we can't control this? As example, in Russia the date format is
> DD.MM. and is needed the patch
> http://sqlite.mobigroup.ru/fdiff?v1=288ad2e1e017565c=720cb1015e95af7a
> 
> I think the new pragmas DATEFORMAT and TIMEFORMAT will be helpful for
> internationalization. These may be used for parsing and formatting dates.

So is there an SQLite feature to accept floating point numbers with a
decimal comma instead of the English (and programming language standard)
point? After all, why doesn't SQLite accept a Russian translation of all
those SQL commands? Not even considering the code page issues (see
Unicode comments above)...

I also agree that a database should just stick to standard
representation of data, not to user-specific or local.

The time-to-string function is useful for selecting and grouping. You
can group by a day of all months, for example. But the best knowledge
about interpreting a local date representation is surely still in your
application, not in any database system.

-- 
Yves Goergen "LonelyPixel" 
Visit my web laboratory at http://beta.unclassified.de
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-12 Thread Alexey Pechnikov
2012/4/2 Jay A. Kreibich 

>  It would make more sense to just implement a strptime() SQL function
>  to compliment the existing strftime() function.  That would allow
>  SQLite to understand and convert any incoming date-time format
>  without depending on specific build parameters.
>
>
But may strptime() function returns datetime in format '%Y-%m-%d
%H:%M:%S'?.. It's produce some overhead by expensive (string parsing)
operations.

select strptime('%d.%m.%Y %H:%M:%S', '12.11.2001 18:31:01');
2001-11-12 18:31:01
select strftime('%s', strptime('%d.%m.%Y %H:%M:%S', '12.11.2001 18:31:01'));
1005589861


-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/04/12 14:34, Simon Slavin wrote:
> I wonder whether it respects languages.

A German user living in France has some Swedish names in their list to be
sorted.  Do you use Swedish sort order, German sort order, German name
order or French order?

There is a Unicode Collation Algorithm that tries to help:

  http://unicode.org/reports/tr10/

Something you may want to consider is to transliterate the text into the
closest ASCII sounds and then sort on that.  It will help when you have
multiple languages represented, some text missing accents etc.

If you search for "unidecode" you'll find a toolkit for doing just that in
various languages.  The original was written by Sean Burke and the
thinking is documented here:

  http://interglacial.com/~sburke/tpj/as_html/tpj22.html

That site appears dead so the wayback machine helps:


http://web.archive.org/web/20091014161106/http://interglacial.com/~sburke/tpj/as_html/tpj22.html

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk96ODwACgkQmOOfHg372QTOOgCZAQ95J0sYZZYU5LkmxA2+WtoV
8VQAoMy1FH/wiuaekSIdyspXIGt1pc+1
=ehka
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Nico Williams
On Mon, Apr 2, 2012 at 5:46 PM, Simon Slavin  wrote:
> Replace part of that routine with something which specifies the locale rather 
> than fetching it from OS settings.  And store the locale used with the index, 
> as a COLLATE setting.  Thus leaving it up to whoever writes the CREATE 
> command to decide which locale was used.  I find that acceptable.  This does 
> still give you the problem Jean-Christophe noted of sorting multilanguage 
> lists of names, but that's inherent in Unicode.  Encountering the problem 
> just means you're implementing Unicode properly.

If only it were that easy.  A plain C locale (i.e., byte-wise)
collation will result in "encountering the problem", but you won't be
"implementing Unicode properly", you won't be implementing it at all!

Even if you use some Unicode collation, if you don't handle
normalization insensitivity then you're not really doing it right
either.

Consider that HFS+ on MacOS X always normalizes to NFD on
file/directory create.  But all user input methods I've seen to date
produce NFC for all Latin-* characters!  This means that if someone
does a cut-n-paste of filenames from an HFS+ filesystem then there
will be a very difficult-to-detect conflict.

Unicode is hard.  I want a doll that says that.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 10:44pm, Igor Tandetnik  wrote:

> On 4/2/2012 5:34 PM, Simon Slavin wrote:
>> Hmm.  It's there:
>> 
>> 
>> 
>> "The following function compares two Unicode strings of possibly different 
>> lengths.
>> 
>> — Function: int u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, 
>> size_t n2)
>> — Function: int u16_cmp2 (const uint16_t *s1, size_t n1, const uint16_t *s2, 
>> size_t n2)
>> — Function: int u32_cmp2 (const uint32_t *s1, size_t n1, const uint32_t *s2, 
>> size_t n2)
>> Compares s1 and s2, lexicographically. Returns a negative value if s1 
>> compares smaller than s2, a positive value if s1 compares larger than s2, or 
>> 0 if they compare equal."
>> 
>> I wonder whether it respects languages.
> 
> These don't,

You know, I don't care that much.  Unicode sorting even without languages would 
be a nice plugin for SQLite3, if that makes things so much simpler.

> but u8_strcoll et al supposedly do, based on LC_COLLATE locale category. 
> Herein lies the problem: if you build an index using these functions while 
> running under locale A, then try to run queries against this database in an 
> application running with locale B, bad things happen. From the point of view 
> of the second application, the index is corrupted.

Replace part of that routine with something which specifies the locale rather 
than fetching it from OS settings.  And store the locale used with the index, 
as a COLLATE setting.  Thus leaving it up to whoever writes the CREATE command 
to decide which locale was used.  I find that acceptable.  This does still give 
you the problem Jean-Christophe noted of sorting multilanguage lists of names, 
but that's inherent in Unicode.  Encountering the problem just means you're 
implementing Unicode properly.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Nico Williams
On Mon, Apr 2, 2012 at 1:25 PM, Richard Hipp  wrote:
> On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin  wrote:
>> I think ... a higher priority than that would be handling Unicode
>> correctly.  And having Unicode support would be useful in writing the code
>> which handles dates.
>>
> size of SQLite library:  approx 500 KB
> size of ICU library: approx 21,919 KB
>
> The ICU library (needed to handle Unicode "correctly") is over 40x larger
> than SQLite.  Can you understand then why we don't want to make SQLite
> dependent upon ICU?
>
> If you really need correct ICU support, SQLite will optionally link with
> ICU and use it.  But *requiring* SQLite to link against ICU is a
> deal-breaker for many users.

Also, Unicode collation is typically orders of magnitude slower than
US-ASCII collation.  This comes up a lot in other contexts,
particularly as the various OSes have begun defaulting to Unicode
locales.  I've seen ls(1) of directories with millions of files run as
fast as the output device permits when run in the C locale (in less
than 1 second when tmpfs), but take many minutes when in a UTF-8
locale, and that's without any use of normalization.  But mostly this
is a result of Unicode collation in libc being awful.

The OpenSolaris u8_textprep code is designed to make u8_str*cmp()
really fast, though not quite as fast as the C locale strcmp(), when
strings are mostly ASCII and even when they are not because
u8_textprep does no memory allocation for normalization-insensitive
comparison and has a fast-path for comparing substrings of two or more
ASCII codepoints.  This is the main reason that I'd recommend
u8_textprep...

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Nico Williams
On Mon, Apr 2, 2012 at 4:24 PM, Simon Slavin  wrote:
> On 2 Apr 2012, at 9:58pm, Alexey Pechnikov  wrote:
>> Description: Unicode string library for C
>> The 'libunistring' library implements Unicode strings (in the UTF-8,
>> UTF-16, and UTF-32 encodings), together with functions for
>> Unicode characters (character names, classifications, properties) and
>> functions for string processing (formatted output, width, word
>> breaks, line breaks, normalization, case folding, regular expressions).
>
> Trying to figure out what SQLite would want from Unicode characters I don't 
> end up with any of those.  I think all it wants is sorting, so SQLite can 
> make an index properly.  And I don't really care whether it's case-sensitive 
> or not since my software can do case conversion on input.  Because they're in 
> standard functions, string length and substring substitution would be nice 
> but I can live without them working properly.

SQLite3 needs:

 - string comparison with normalization-insensitivity, unless
   SQLite3 were to normalize TEXT values on INSERT/UPDATE,
   but I don't recommend that, except that for indexes it's
   required; see below

 - string comparison with case-insensitivity as an option (for LIKE)

 - string normalization and case-folding functions, which are
   needed for computing index key prefixes for LIKE and GLOB
   patterns that use globbing, so that the index cursors can be
   positioned correctly

 - preferably a way to specify a collation for Unicode (i.e., a
   language, since collation rules may vary by language)

 - preferably a way to specify not to use locale environment
   variables (see Igor's comments)

 - functionality needed to implement SQLite3's built-in string functions
- i.e., trim(), ltrim(), rtrim(), replace(), substr(), lower(),
upper(), min(), max(), and length()

Incidentally, length() claims to return a count of characters, but it
actually counts *codepoints*.  Counting characters is a lot harder
than counting codepoints...  Codepoint counting in UTF-* is trivial;
character counting requires tables of combining codepoint ranges and
code to skip combining codepoints.  Counting graphemes is harder
still.  Getting these things right is non-trivial.  Ideally there
would be an option to the length() function to request counts of
different possible things: UTF-8 units (bytes), UTF-16 units,
codepoints, characters, glyphs, and graphemes, though just stopping at
characters would do.

Similar comments apply to string indices in functions like substr()!

In practice one should want to count characters when dealing with
sub-string operations, but storage units when dealing with
transmission.  Using codepoint counts in substr() risks breaking
combining codepoint sequences and thus producing garbage.

I think the OpenSolaris u8_textprep code is good enough for the
collation requirements, but it probably isn't sufficient for the
SQLite3 string functions, but I'd have to look carefully.  I suspect
that ICU and libunistring meet all the requirements.

> One problem is that, as someone explained to me last year, sorting of unicode 
> characters depends on which language you're using (and other things if you're 
> fussy).  So for every index you make you'd have to declare the language, and 
> SQLite would have to store it.

SQLite3 allows you to specify collations though, so that's not that
big a deal.  For a web application, say, it's very difficult to
implement sorting that satisfies all possible users because indexes
can't provide a globally sufficient collation, not unless you were
willing to have a multitude of indexes.  Sorting, then, has to be done
on result sets -- that is, without the benefit of indexes in most
cases, which means it will be slow for any queries that return large
row sets.

In practice though this is not that big a deal.  And there will be a
tendency to simplify collations.  For example the Royal Spanish
Academy no longer requires that 'ch' sort after 'c', nor that 'll'
sort after 'l' [*].  I suspect most users won't really care, but
whether they do will depend on the application and the user.

> I was trying to figure out whether SQLite could make use of the OS's unicode 
> library (using different compilation directives for each platform which 
> supports unicode) but I'm only really familiar with the Mac operating system 
> and I don't know how Windows or Linux does these things.

There's no standard C libraries that deal with Unicode in sufficient
detail.  In particular the wchar_t functions are useless for the
purposes of SQLite3 because they try to hide too much detail, and
because in some cases they attempt to hide even the codeset.

[*] http://servicios.larioja.com/romanpaladino/g02.htm claims that the
Academy changed this in 1994, and that people started noticing this in
phone books in 1996, and that they complained.

http://es.wikipedia.org/wiki/Ortograf%C3%ADa_del_espa%C3%B1ol goes
into 

Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Jean-Christophe Deschamps


At 23:44 02/04/2012, you wrote:


I wonder whether it respects languages.


These don't, but u8_strcoll et al supposedly do, based on LC_COLLATE 
locale category. Herein lies the problem: if you build an index using 
these functions while running under locale A, then try to run queries 
against this database in an application running with locale B, bad 
things happen. From the point of view of the second application, the 
index is corrupted.


That is: the collation used for this exact purpose becomes a strong 
metadata of the table but there's no easy way to deal with that.


Not only, but say you have a table of worldwide customers (I do have 
clients in 39 countries today), which exact locale are you going to 
use?  I know this question has no answer (and that's the main grief I 
have with Unicode).


A workable perspective is to come up with a "decent if not perfect" way 
to deal with unaccenting and getting rid of the locale concept.


For those ready to cut some corners and to give a rough idea of what 
can be done easily if you're ready to live with some compromises (since 
*-no-* perfect solution exists), my C shared library implementing a 
large number of string and misc. functions (with both UTF-8 & UTF-16 
interfaces) dealing with a weak form of "unaccented Unicode v5.1" is 
currently a 143Kb Win x86 DLL and runs reasonably fast.


JcD 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Igor Tandetnik

On 4/2/2012 5:34 PM, Simon Slavin wrote:

Hmm.  It's there:



"The following function compares two Unicode strings of possibly different 
lengths.

— Function: int u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, 
size_t n2)
— Function: int u16_cmp2 (const uint16_t *s1, size_t n1, const uint16_t *s2, 
size_t n2)
— Function: int u32_cmp2 (const uint32_t *s1, size_t n1, const uint32_t *s2, 
size_t n2)
Compares s1 and s2, lexicographically. Returns a negative value if s1 compares 
smaller than s2, a positive value if s1 compares larger than s2, or 0 if they 
compare equal."

I wonder whether it respects languages.


These don't, but u8_strcoll et al supposedly do, based on LC_COLLATE 
locale category. Herein lies the problem: if you build an index using 
these functions while running under locale A, then try to run queries 
against this database in an application running with locale B, bad 
things happen. From the point of view of the second application, the 
index is corrupted.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 10:24pm, Simon Slavin  wrote:

> On 2 Apr 2012, at 9:58pm, Alexey Pechnikov  wrote:
> 
>> Description: Unicode string library for C
>> The 'libunistring' library implements Unicode strings (in the UTF-8,
>> UTF-16, and UTF-32 encodings), together with functions for
>> Unicode characters (character names, classifications, properties) and
>> functions for string processing (formatted output, width, word
>> breaks, line breaks, normalization, case folding, regular expressions).
>> 
>> This package contains the shared library.
>> Homepage: http://www.gnu.org/software/libunistring/
> 
> Trying to figure out what SQLite would want from Unicode characters I don't 
> end up with any of those.  I think all it wants is sorting, so SQLite can 
> make an index properly.

Hmm.  It's there:



"The following function compares two Unicode strings of possibly different 
lengths.

— Function: int u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, 
size_t n2)
— Function: int u16_cmp2 (const uint16_t *s1, size_t n1, const uint16_t *s2, 
size_t n2)
— Function: int u32_cmp2 (const uint32_t *s1, size_t n1, const uint32_t *s2, 
size_t n2)
Compares s1 and s2, lexicographically. Returns a negative value if s1 compares 
smaller than s2, a positive value if s1 compares larger than s2, or 0 if they 
compare equal."

I wonder whether it respects languages.  I don't think so, but I can live 
without it.  Time to hack up an external function for SQLite and see how little 
of the library I need to make it work.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 9:58pm, Alexey Pechnikov  wrote:

> Description: Unicode string library for C
> The 'libunistring' library implements Unicode strings (in the UTF-8,
> UTF-16, and UTF-32 encodings), together with functions for
> Unicode characters (character names, classifications, properties) and
> functions for string processing (formatted output, width, word
> breaks, line breaks, normalization, case folding, regular expressions).

Trying to figure out what SQLite would want from Unicode characters I don't end 
up with any of those.  I think all it wants is sorting, so SQLite can make an 
index properly.  And I don't really care whether it's case-sensitive or not 
since my software can do case conversion on input.  Because they're in standard 
functions, string length and substring substitution would be nice but I can 
live without them working properly.

One problem is that, as someone explained to me last year, sorting of unicode 
characters depends on which language you're using (and other things if you're 
fussy).  So for every index you make you'd have to declare the language, and 
SQLite would have to store it.

I was trying to figure out whether SQLite could make use of the OS's unicode 
library (using different compilation directives for each platform which 
supports unicode) but I'm only really familiar with the Mac operating system 
and I don't know how Windows or Linux does these things.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Nico Williams
On Mon, Apr 2, 2012 at 3:58 PM, Alexey Pechnikov  wrote:
> And see too:
>
> Homepage: http://www.gnu.org/software/libunistring/

Thanks!  That looks like a dream come true (even if LGPL).  It's much
more complete than the OpenSolaris u8_textprep stuff, which would be
particularly useful for SQLite3 string functions.  But for just
collations the u8_textprep stuff is good enough, and probably much
smaller (though I've not yet examined the size of libunistring).

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Alexey Pechnikov
And see too:

# LANG=C aptitude show libunistring0
Package: libunistring0
New: yes
State: not installed
Version: 0.9.3-3
Priority: optional
Section: libs
Maintainer: Andreas Rottmann 
Uncompressed Size: 1221 k
Depends: libc6 (>= 2.3)
Description: Unicode string library for C
 The 'libunistring' library implements Unicode strings (in the UTF-8,
UTF-16, and UTF-32 encodings), together with functions for
 Unicode characters (character names, classifications, properties) and
functions for string processing (formatted output, width, word
 breaks, line breaks, normalization, case folding, regular expressions).

 This package contains the shared library.
Homepage: http://www.gnu.org/software/libunistring/


2012/4/2 Nico Williams 

> On Mon, Apr 2, 2012 at 1:25 PM, Richard Hipp  wrote:
> > On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin 
> wrote:
> >> I think ... a higher priority than that would be handling Unicode
> >> correctly.  And having Unicode support would be useful in writing the
> code
> >> which handles dates.
> >>
> >>
> > size of SQLite library:  approx 500 KB
> > size of ICU library: approx 21,919 KB
> >
> > The ICU library (needed to handle Unicode "correctly") is over 40x larger
> > than SQLite.  Can you understand then why we don't want to make SQLite
> > dependent upon ICU?
>
> I completely agree.  It'd be nice if SQLite3 could have an option for
> a weak dependency on ICU.  I.e., if it can be found with dlopen(),
> then use it, else not; a pragma could be used by applications to check
> whether SQLite3 found ICU, or to require that it always be found for
> any given DB file.  Along these lines it'd be nice if a DB file could
> record required loadable extensions and provide a single pragma to
> load them all, with errors returned when compiling or running
> statements other than pragmas until the required extensions are
> loaded.
>
> > If you really need correct ICU support, SQLite will optionally link with
> > ICU and use it.  But *requiring* SQLite to link against ICU is a
> > deal-breaker for many users.
>
> FYI, the OpenSolaris Unicode library is significantly smaller, most
> likely because it deals only with normalization, case conversion, and
> codepoint prohibitions -- everything needed for "stringprep" and
> normalization- and case-insensitive string comparison.  The data
> tables reside in a header file and are 1.8 MB.  The code is 55KB, +
> 3KB of headers.  The license for this is CDDL (a per-file copyleft, as
> opposed to project-wide copyleft), which is not quite as free as ICU's
> license (which is essentially a two-clause BSD license), but probably
> good enough for most projects that use SQLite3.  FreeBSD, for example,
> includes u8_textprep, probably because it also includes ZFS (which
> depends on u8_textprep).  This is still many times larger than
> SQLite3, but still more than ten times smaller than ICU.
>
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/unicode/
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/
> (scroll down to u8_textprep.h and u8_textprep_data.h)
>
> Nico
> --
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Nico Williams
On Mon, Apr 2, 2012 at 1:25 PM, Richard Hipp  wrote:
> On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin  wrote:
>> I think ... a higher priority than that would be handling Unicode
>> correctly.  And having Unicode support would be useful in writing the code
>> which handles dates.
>>
>>
> size of SQLite library:  approx 500 KB
> size of ICU library: approx 21,919 KB
>
> The ICU library (needed to handle Unicode "correctly") is over 40x larger
> than SQLite.  Can you understand then why we don't want to make SQLite
> dependent upon ICU?

I completely agree.  It'd be nice if SQLite3 could have an option for
a weak dependency on ICU.  I.e., if it can be found with dlopen(),
then use it, else not; a pragma could be used by applications to check
whether SQLite3 found ICU, or to require that it always be found for
any given DB file.  Along these lines it'd be nice if a DB file could
record required loadable extensions and provide a single pragma to
load them all, with errors returned when compiling or running
statements other than pragmas until the required extensions are
loaded.

> If you really need correct ICU support, SQLite will optionally link with
> ICU and use it.  But *requiring* SQLite to link against ICU is a
> deal-breaker for many users.

FYI, the OpenSolaris Unicode library is significantly smaller, most
likely because it deals only with normalization, case conversion, and
codepoint prohibitions -- everything needed for "stringprep" and
normalization- and case-insensitive string comparison.  The data
tables reside in a header file and are 1.8 MB.  The code is 55KB, +
3KB of headers.  The license for this is CDDL (a per-file copyleft, as
opposed to project-wide copyleft), which is not quite as free as ICU's
license (which is essentially a two-clause BSD license), but probably
good enough for most projects that use SQLite3.  FreeBSD, for example,
includes u8_textprep, probably because it also includes ZFS (which
depends on u8_textprep).  This is still many times larger than
SQLite3, but still more than ten times smaller than ICU.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/unicode/
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/
(scroll down to u8_textprep.h and u8_textprep_data.h)

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 7:25pm, Richard Hipp  wrote:

> On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin  wrote:
> 
>> I think ... a higher priority than that would be handling Unicode
>> correctly.  And having Unicode support would be useful in writing the code
>> which handles dates.
> 
> size of SQLite library:  approx 500 KB
> size of ICU library: approx 21,919 KB
> 
> The ICU library (needed to handle Unicode "correctly") is over 40x larger
> than SQLite.  Can you understand then why we don't want to make SQLite
> dependent upon ICU?

Yep.  That's why you don't do it.  And handling other things to make data 
human-readable using Unicode characters would be wrong too.

Simon.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Alexey Pechnikov
Why you need ICU library to parse datetime strings?!! The my previous patch
to parse Russian dates has only single row without any external libs.

2012/4/2 Richard Hipp 

> On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin  wrote:
>
> >
> > I think ... a higher priority than that would be handling Unicode
> > correctly.  And having Unicode support would be useful in writing the
> code
> > which handles dates.
> >
> >
> size of SQLite library:  approx 500 KB
> size of ICU library: approx 21,919 KB
>
> The ICU library (needed to handle Unicode "correctly") is over 40x larger
> than SQLite.  Can you understand then why we don't want to make SQLite
> dependent upon ICU?
>
> If you really need correct ICU support, SQLite will optionally link with
> ICU and use it.  But *requiring* SQLite to link against ICU is a
> deal-breaker for many users.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Richard Hipp
On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin  wrote:

>
> I think ... a higher priority than that would be handling Unicode
> correctly.  And having Unicode support would be useful in writing the code
> which handles dates.
>
>
size of SQLite library:  approx 500 KB
size of ICU library: approx 21,919 KB

The ICU library (needed to handle Unicode "correctly") is over 40x larger
than SQLite.  Can you understand then why we don't want to make SQLite
dependent upon ICU?

If you really need correct ICU support, SQLite will optionally link with
ICU and use it.  But *requiring* SQLite to link against ICU is a
deal-breaker for many users.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 6:18pm, Alexey Pechnikov  wrote:

> Please see http://www.sqlite.org/lang_datefunc.html and you will be
> frustrated because SQLite can format date/time to different string formats
> but can't read the produced date/time strings.

I am not frustrated.  That function does not require any understanding of 
locales, timezones, or Summer Time.  It doesn't 'understand' dates at all, it's 
just a version of sprintf() useful for dates.  Easy to write, easy to test.

Turning dates into arbitrary strings is easy.  Adjusting for time zones is 
difficult.  Analyzing a string to turn it into a date is difficult (some 
strings are ambiguous), and may require support for Unicode which is very 
difficult indeed.  This is a complicated thing and putting such functions on 
tiny devices like a GPS Unit would be a big waste.

If you want external functions, SQLite supplies very nice facilities which let 
you plug them in.  No need to patch the source code.  So you can write your 
functions to convert your dates however you want, with whatever understanding 
of locales you want.

I think a future version of SQLite might handle dates.  But a higher priority 
than that would be handling Unicode correctly.  And having Unicode support 
would be useful in writing the code which handles dates.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Alexey Pechnikov
> 2012/4/2 Simon Slavin 

Please see http://www.sqlite.org/lang_datefunc.html and you will be
frustrated because SQLite can format date/time to different string formats
but can't read the produced date/time strings.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Jay A. Kreibich
On Mon, Apr 02, 2012 at 08:38:37PM +0400, Alexey Pechnikov scratched on the 
wall:
> Why we can't control this? As example, in Russia the date format is
> DD.MM. and is needed the patch
> http://sqlite.mobigroup.ru/fdiff?v1=288ad2e1e017565c=720cb1015e95af7a
> 
> I think the new pragmas DATEFORMAT and TIMEFORMAT will be helpful for
> internationalization. These may be used for parsing and formatting dates.

  It would make more sense to just implement a strptime() SQL function
  to compliment the existing strftime() function.  That would allow
  SQLite to understand and convert any incoming date-time format
  without depending on specific build parameters.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Simon Slavin

On 2 Apr 2012, at 5:38pm, Alexey Pechnikov  wrote:

> Why we can't control this? As example, in Russia the date format is
> DD.MM. and is needed the patch
> http://sqlite.mobigroup.ru/fdiff?v1=288ad2e1e017565c=720cb1015e95af7a
> 
> I think the new pragmas DATEFORMAT and TIMEFORMAT will be helpful for
> internationalization. These may be used for parsing and formatting dates.

I take the view that parsing and formatting data should be done by your 
software.  Your software must be aware of time zones, Summer time adjustments, 
and whether you want your months numbered or spelled.  It needs to deal with 
people entering gibberish as a date.  If needs to know, if a date was entered 
by a user in Russia, and printed by a user in Germany, whether it has to be 
adjusted for local time.

SQLite is a database system.  It's used for /storing/ information.  And that 
has nothing to do with what language you speak.  A date is a date, a time is a 
time, and you can be storing time as UTC or in your local timezone.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Suggestion about hard-coded time string format YYYY-MM-DD

2012-04-02 Thread Alexey Pechnikov
Why we can't control this? As example, in Russia the date format is
DD.MM. and is needed the patch
http://sqlite.mobigroup.ru/fdiff?v1=288ad2e1e017565c=720cb1015e95af7a

I think the new pragmas DATEFORMAT and TIMEFORMAT will be helpful for
internationalization. These may be used for parsing and formatting dates.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users