Re: [sqlite] Data race (file race) in pager

2012-05-04 Thread Dan Kennedy

On 05/04/2012 11:21 PM, Paul Thomson wrote:

I am working on a tool that (among other things) can detect data
races, including file access races. I have detected a file race in
SQLite on the database file that appears to be real, although I am not
certain - I have no experience with SQLite. I compiled SQLite with:
#define SQLITE_THREADSAFE 2

I use a simple test case where two threads call opendb on the same
file, write into the same table and then close the database. The file
race appears to occur due to sqlite3PagerReadFileheader (as the
database is opened?) in one thread and pager_write_pagelist in the
other. It looks as though the page that was written was pgno 1. Can
any experts explain whether these two accesses are in fact
synchronised, or if the race is benign?


Is the issue that PagerReadFileheader() may be called to read the
file-header at the same time as the first page of the database
(which contains the file-header being read) is being written by the
second connection?

If so, it's a known issue. Immediately after opening a db file,
SQLite reads the first 100 bytes of it. Since it holds no locks
on the file at this point, some other process may be writing at
the same time. So the data read cannot be trusted.

SQLite knows this. The only field it uses from the file-header
read at this point is the page-size field (as in "PRAGMA page_size",
default 1024 bytes). Later on, when it actually accesses the database,
it obtains a SHARED lock and reads the first page of data from
the file - using the page-size garnered from the unsafe read made
of the file-header. Once the first page of the db is loaded, it takes
another look at the page-size field in the header (part of the first
page). If it turns out that the page-size is not as expected, it
throws away any data in the cache and tries to read the first page
again, this time hopefully with the correct page size.

So the call to sqlite3PagerReadFileheader() is just an optimization
that increases the chances that when the database is first read
SQLite will guess the correct page-size - and not have to load,
discard, and reload the first page of the db.
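
For reference, the field in question is the one reported and set with the
page_size pragma. A minimal sketch (values illustrative):

   PRAGMA page_size;          -- report the page size (default 1024 bytes)
   PRAGMA page_size = 4096;   -- takes effect on a new db, or after a VACUUM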

The proprietary zipvfs extension (implemented as a VFS layer that
intercepts all IO calls) always returns a buffer full of zeroes
when SQLite attempts the initial 100 byte read. And it seems to
work Ok.

The tool you're working on sounds pretty cool. Is it going to
be open-source?


Re: [sqlite] Multi-lingual support?

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 10:49 PM, Dale E. Edmons  wrote:

> Hi,
>
> I've built an extensive database that has three copies of the Jewish
> Tanach in it.  When trying to get sqlite3 to handle Hebrew (utf8 for
> starters) it seems to be trying to manipulate the text and it ends up
> backwards.  My editors and nearly all command-line utilities don't corrupt
> the data, but sqlite3 does, and I can't do any search queries.
>

What program are you using to access your SQLite database?  Are you using
the command-line shell that comes with SQLite, a program you wrote
yourself, or some third-party program?


>
> Since I've built my entire database in sqlite3, I'd much rather keep it.
>  Is there anybody that is knowledgeable about Hebrew support?  I'm just
> learning Hebrew myself, so I'm no guru.
>
> Any help or suggestions will be appreciated.
>
> D.E.
>
> --
> "Any society that would give up a little liberty to gain
> a little security will deserve neither and lose both."
>  Benjamin Franklin



-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] Multi-lingual support?

2012-05-04 Thread Dale E. Edmons

Hi,

I've built an extensive database that has three copies of the Jewish 
Tanach in it.  When trying to get sqlite3 to handle Hebrew (utf8 for 
starters) it seems to be trying to manipulate the text and it ends up 
backwards.  My editors and nearly all command-line utilities don't 
corrupt the data, but sqlite3 does, and I can't do any search queries.


Since I've built my entire database in sqlite3, I'd much rather keep 
it.  Is there anybody that is knowledgeable about Hebrew support?  I'm
just learning Hebrew myself, so I'm no guru.


Any help or suggestions will be appreciated.

D.E.

--
"Any society that would give up a little liberty to gain
a little security will deserve neither and lose both."
  Benjamin Franklin 




Re: [sqlite] Details on New Features

2012-05-04 Thread Nico Williams
On Fri, May 4, 2012 at 4:04 PM, Richard Hipp  wrote:
> Correction:  The one that it encounters first, since subsequent rows of the
> same value will not trigger a new copy of values into the output registers,
> since only a new min/max does that.

But surely that's non-deterministic, or you'd like users to think so.

I think that leaves SQLite3's handling of non-aggregate expressions in
GROUP BY queries as... not fully satisfying.  I think there really has
to be a way to flag such queries as erroneous.

Nico
--


Re: [sqlite] Details on New Features

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 5:02 PM, Richard Hipp  wrote:

>
>
> On Fri, May 4, 2012 at 5:01 PM, Ralf Junker  wrote:
>
>> On 04.05.2012 16:39, Richard Hipp wrote:
>>
>> > If a single min() or max() aggregate function appears in a query, then
>> any
>> > other columns that are not contained within aggregate functions and that
>> > are not elements of the GROUP BY will take values from one of the same
>> rows
>> > that satisfied the one min() or max() aggregate function.
>>
>> Given that more than one row satisfies the one min() or max() aggregate
>> function (think of multiple, identical smallest or largest values).
>> Which row will SQLite pick?
>>
>
> The row that it encounters last.
>

Correction:  The one that it encounters first, since subsequent rows of the
same value will not trigger a new copy of values into the output registers,
since only a new min/max does that.
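
To make the tie case concrete, a minimal sketch (assuming a plain
rowid-order scan; schema is illustrative):

   CREATE TABLE t(x, y);
   INSERT INTO t VALUES(5, 'first');
   INSERT INTO t VALUES(5, 'second');
   SELECT max(x), y FROM t;  -- 5|first: the second 5 is not a new max, so y is not recopied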


>
>
>>
>> Ralf
>
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
>



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Details on New Features

2012-05-04 Thread Igor Tandetnik

On 5/4/2012 5:01 PM, Ralf Junker wrote:

On 04.05.2012 16:39, Richard Hipp wrote:


If a single min() or max() aggregate function appears in a query, then any
other columns that are not contained within aggregate functions and that
are not elements of the GROUP BY will take values from one of the same rows
that satisfied the one min() or max() aggregate function.


Given that more than one row satisfies the one min() or max() aggregate
function (think of multiple, identical smallest or largest values).
Which row will SQLite pick?


One of them.
--
Igor Tandetnik



Re: [sqlite] Details on New Features

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 5:01 PM, Ralf Junker  wrote:

> On 04.05.2012 16:39, Richard Hipp wrote:
>
> > If a single min() or max() aggregate function appears in a query, then
> any
> > other columns that are not contained within aggregate functions and that
> > are not elements of the GROUP BY will take values from one of the same
> rows
> > that satisfied the one min() or max() aggregate function.
>
> Given that more than one row satisfies the one min() or max() aggregate
> function (think of multiple, identical smallest or largest values).
> Which row will SQLite pick?
>

The row that it encounters last.


>
> Ralf



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Details on New Features

2012-05-04 Thread Marc L. Allen
The last one it saw.  It's not deterministic.

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Ralf Junker
> Sent: Friday, May 04, 2012 5:01 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Details on New Features
> 
> On 04.05.2012 16:39, Richard Hipp wrote:
> 
> > If a single min() or max() aggregate function appears in a query,
> then
> > any other columns that are not contained within aggregate functions
> > and that are not elements of the GROUP BY will take values from one
> of
> > the same rows that satisfied the one min() or max() aggregate
> function.
> 
> Given that more than one row satisfies the one min() or max() aggregate
> function (think of multiple, identical smallest or largest values).
> Which row will SQLite pick?
> 
> Ralf


Re: [sqlite] Details on New Features

2012-05-04 Thread Ralf Junker
On 04.05.2012 16:39, Richard Hipp wrote:

> If a single min() or max() aggregate function appears in a query, then any
> other columns that are not contained within aggregate functions and that
> are not elements of the GROUP BY will take values from one of the same rows
> that satisfied the one min() or max() aggregate function.

Given that more than one row satisfies the one min() or max() aggregate
function (think of multiple, identical smallest or largest values).
Which row will SQLite pick?

Ralf


Re: [sqlite] Locking/Concurrency

2012-05-04 Thread Igor Tandetnik

On 5/4/2012 1:52 PM, KUSHAL SHAH wrote:

I am trying to use SQLite in my .NET project. Client APIs are from 
System.Data.SqLite. Can you please help with below:

It seems that multiple
threads can actually read simultaneously from a sqlite Db. However, I am
confused about the write part. Will SQLite manage the write requests or the
user has to have specific flags/locks on the connection?


Every connection has an associated mutex, and every SQLite API call
is synchronized on this mutex.



I couldn’t find good documentation around it.


http://sqlite.org/threadsafe.html


Specifically, I am looking for the
following scenarios:

1.   I am reading the Db on one thread and another thread is
writing to that Db. Do I need specific flags on each of the connections? If so,
which ones?


Ah, so it's two separate connections? Earlier, you were talking about 
"the" connection. With multiple connections, different rules apply:


http://sqlite.org/lockingv3.html
http://sqlite.org/wal.html

Unless you enable WAL mode, you won't be able to read on one connection 
and write on another simultaneously.



2.   Both threads want to write to the Db? What
flags/locks can I have in my code to achieve that, if at all?


You can't have two writing connections, even in WAL mode. The two 
threads may share the same connection, but then they would be 
effectively serialized anyway, so you won't get any benefit from having 
two of them.
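
A minimal sketch of the WAL setup mentioned above (issued on the database
once; the one-writer-at-a-time rule still applies):

   PRAGMA journal_mode=WAL;  -- readers on other connections no longer block the writer

On the C API side, sqlite3_busy_timeout() is also worth setting so that a
briefly blocked connection retries instead of failing with SQLITE_BUSY.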

--
Igor Tandetnik



[sqlite] Locking/Concurrency

2012-05-04 Thread KUSHAL SHAH
I am trying to use SQLite in my .NET project. Client APIs are from 
System.Data.SqLite. Can you please help with below:
 
It seems that multiple
threads can actually read simultaneously from a sqlite Db. However, I am
confused about the write part. Will SQLite manage the write requests or the
user has to have specific flags/locks on the connection? And if so, which
locks/flags to use and how? I couldn’t find good documentation around it.
 
Specifically, I am looking for the
following scenarios:
 
1.   I am reading the Db on one thread and another thread is
writing to that Db. Do I need specific flags on each of the connections? If so,
which ones?
2.   Both threads want to write to the Db? What
flags/locks can I have in my code to achieve that, if at all? 
 
Thanks,
Kushal.


Re: [sqlite] Details on New Features

2012-05-04 Thread Roger Binns

On 04/05/12 09:42, Nico Williams wrote:
> A pragma by which to cause SQLite3 to return an error instead might be
> useful, but then, it's SQL_Lite_.

What I have always wanted for SQLite is some sort of "lint" mode.  It
would tell you when your queries rely on undefined behaviour, make no
sense or are suboptimal.

Roger


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Oliver Schneider
On 2012-05-02 14:06, peter korinis wrote:
> Thank you all. 
> Looks like I'm stuck with the CLI though I have contacted Nucleon software
> support ... tried CLI yesterday but need more practice.
> Is there a good reference book you would recommend for SQLite?
Absolutely. The one by Mike Owens was a great read. The first two or
three chapters explain SQL as a whole. Even though I _had_ to work with
MySQL and stuff before, it was this book that actually "enlightened" me
w.r.t. some of the cool features SQL (RDBMSs) give you.

Name: "The Definitive Guide to SQLite"

There is another one from O'Reilly whose author is also on this list. I
think I would have found this one harder to grasp without reading the
Owens book before, though.

// Oliver

PS: re-sending after the mailing list choked on my message with S/MIME
signature.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Oliver Schneider
On 2012-05-01 20:41, Baruch Burstein wrote:
> It is already wrapped in a transaction.
> I seem to remember seeing somewhere that the .import command doesn't
> understand escaping, e.g.
> 
> "one","two,three"
> 
> will get imported as
> 
> "one" | "two | three"
> 
> (the quotes are part of the data, and the second column was split into two
> by the comma)
> Just a point to be aware of.
That could be mitigated by writing a small script that "converts" the
CSV contents line-wise to SQL-statements, right?


// Oliver

PS: re-sending after the mailing list choked on my message with S/MIME
signature.


Re: [sqlite] Details on New Features

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 12:44 PM, Eric Sink  wrote:

>
> Is this new syntax likely to perform any better than the traditional way
> of writing the query?
>

Dunno.  Depends on which "traditional way" you are talking about, I suppose.

Here's how it works:  SQLite internally maintains a set of "registers"
(objects in which it can store any valid value) for the result set.  As it
is evaluating each row of the inputs to an aggregate query, if there is a
min() or max() agg function that hits a new minimum or maximum value, then
the other values of the current input row are copied into the output
registers.  If the min() or max() did not reach a new low or high, then the
values are not copied.  So, after all input rows have been examined, the
output registers will contain values that correspond to the last min() or
max() row.

The above should make it clear what happens if you have multiple min()
and/or max() functions in the same aggregate query.  The one that reached
its extreme last is the one whose row values will appear in the output
set.  Of course, the order of the input rows is usually undefined, so there
are no guarantees about which row that will actually be.  You know that
whatever values appear in the final output will correspond to the extreme
of at least one of the various min()/max() functions - you just don't know
which one.
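
A minimal sketch of that behavior (schema and values illustrative):

   CREATE TABLE t(x, y);
   INSERT INTO t VALUES(1,'a');
   INSERT INTO t VALUES(3,'b');
   INSERT INTO t VALUES(2,'c');
   SELECT max(x), y FROM t;           -- 3|b : y was copied when x hit its maximum
   SELECT min(x), max(x), y FROM t;   -- 1|3|b : y follows whichever extreme was hit last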

BTW, this ability for min()/max() to trigger the copying of input values
into the output register - it is not a capability that is exposed to
application-defined functions.  So you cannot add new functions that work
exactly like the built-in min() and max().  You can override the built-in
min() and max() aggregates, but if you do, you lose the magical processing
that makes output values all come from the row that contained the min or
max value.


>
> --
> E
>
>
> On May 4, 2012, at 11:42 AM, Nico Williams  wrote:
>
> > On Fri, May 4, 2012 at 9:20 AM, Richard Hipp  wrote:
> >>> Queries of the form: "SELECT max(x), y FROM table" returns the
> >>> value of y on the same row that contains the maximum x value.
> >>>
> >>> Is that standard SQL behavior?  I'd have expected that to return one
> row
> >>> for every row in the table.  To get the behavior described above, I'd
> use
> >>> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
> >>
> >> It is definitely NOT standard behavior.  The standard behavior is
> >> undefined.  Or (with many SQL engines) it will throw an error if you
> have a
> >> term in the result set that is not part of an aggregate function or an
> >> element of the GROUP BY clause.  But lots of newbies expect SQL to work
> as
> >> described in the 3.7.11 release comments, and we used to get support
> >> questions because it did not.  And so rather than continue to answer the
> >> questions over and over, I figured it would be easier to tweak SQLite to
> >> reliably do what newbies expect.  I never anticipated that this change
> >> would be so controversial or confusing.
> >
> > This is very clever.  I'm not sure that an informative error message
> > wouldn't have been better, but I think you made the right choice given
> > SQLite3's previous behavior.  A pragma by which to cause SQLite3 to
> > return an error instead might be useful, but then, it's SQL_Lite_.
> >
> > Is there any way to define aggregate functions that pick a row for
> > providing column values in non-aggregate expressions?  E.g., you could
> > have a median() or mode(), no?  I don't think this is important, am
> > just curious.
> >
> > Nico
> > --



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Details on New Features

2012-05-04 Thread Eric Sink

Is this new syntax likely to perform any better than the traditional way of 
writing the query?

--
E


On May 4, 2012, at 11:42 AM, Nico Williams  wrote:

> On Fri, May 4, 2012 at 9:20 AM, Richard Hipp  wrote:
>>> Queries of the form: "SELECT max(x), y FROM table" returns the
>>> value of y on the same row that contains the maximum x value.
>>> 
>>> Is that standard SQL behavior?  I'd have expected that to return one row
>>> for every row in the table.  To get the behavior described above, I'd use
>>> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>> 
>> It is definitely NOT standard behavior.  The standard behavior is
>> undefined.  Or (with many SQL engines) it will throw an error if you have a
>> term in the result set that is not part of an aggregate function or an
>> element of the GROUP BY clause.  But lots of newbies expect SQL to work as
>> described in the 3.7.11 release comments, and we used to get support
>> questions because it did not.  And so rather than continue to answer the
>> questions over and over, I figured it would be easier to tweak SQLite to
>> reliably do what newbies expect.  I never anticipated that this change
>> would be so controversial or confusing.
> 
> This is very clever.  I'm not sure that an informative error message
> wouldn't have been better, but I think you made the right choice given
> SQLite3's previous behavior.  A pragma by which to cause SQLite3 to
> return an error instead might be useful, but then, it's SQL_Lite_.
> 
> Is there any way to define aggregate functions that pick a row for
> providing column values in non-aggregate expressions?  E.g., you could
> have a median() or mode(), no?  I don't think this is important, am
> just curious.
> 
> Nico
> --


Re: [sqlite] Details on New Features

2012-05-04 Thread Nico Williams
On Fri, May 4, 2012 at 9:20 AM, Richard Hipp  wrote:
>>         Queries of the form: "SELECT max(x), y FROM table" returns the
>> value of y on the same row that contains the maximum x value.
>>
>> Is that standard SQL behavior?  I'd have expected that to return one row
>> for every row in the table.  To get the behavior described above, I'd use
>> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>
> It is definitely NOT standard behavior.  The standard behavior is
> undefined.  Or (with many SQL engines) it will throw an error if you have a
> term in the result set that is not part of an aggregate function or an
> element of the GROUP BY clause.  But lots of newbies expect SQL to work as
> described in the 3.7.11 release comments, and we used to get support
> questions because it did not.  And so rather than continue to answer the
> questions over and over, I figured it would be easier to tweak SQLite to
> reliably do what newbies expect.  I never anticipated that this change
> would be so controversial or confusing.

This is very clever.  I'm not sure that an informative error message
wouldn't have been better, but I think you made the right choice given
SQLite3's previous behavior.  A pragma by which to cause SQLite3 to
return an error instead might be useful, but then, it's SQL_Lite_.

Is there any way to define aggregate functions that pick a row for
providing column values in non-aggregate expressions?  E.g., you could
have a median() or mode(), no?  I don't think this is important, am
just curious.

Nico
--


Re: [sqlite] Data race (file race) in pager

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 12:21 PM, Paul Thomson  wrote:

> I am working on a tool that (among other things) can detect data
> races, including file access races. I have detected a file race in
> SQLite on the database file that appears to be real, although I am not
> certain - I have no experience with SQLite. I compiled SQLite with:
> #define SQLITE_THREADSAFE 2
>
> I use a simple test case where two threads call opendb


There is no such function "opendb()" in SQLite, either as an API or
internal.  Did you mean sqlite3_open()?



> on the same
> file, write into the same table and then close the database. The file
> race appears to occur due to sqlite3PagerReadFileheader (as the
> database is opened?) in one thread and pager_write_pagelist in the
> other. It looks as though the page that was written was pgno 1. Can
> any experts explain whether these two accesses are in fact
> synchronised, or if the race is benign?
>

Each database connection will have its own page cache, so access to
internal data structures need not be synchronized.  Access to the disk file
is synchronized using file locks - the exact workings of which depends on
the operating system, which you have left unspecified.


>
> I am using sqlite-amalgamation-3071100. I will later upload the test
> case, trace (showing all calls that are made by each thread) and the
> stack trace for each thread when the race occurs, unless it becomes
> obvious that the above functions cannot race. What is best way to
> provide these files? (attachments? upload to a website and provide a
> link?)
>
> Thanks,
> Paul



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Data Input Techniques

2012-05-04 Thread Nigel Verity

Dear All
Thanks for all the help regarding ways of improving the speed of loading large 
volumes of data from a text file into a Sqlite table from within a bespoke 
application. The solution was indeed to place all the INSERT instructions 
between a BEGIN and a COMMIT. I can now load approx 50,000 records in about 4 
seconds. Previously it was > 15 minutes!
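
For the archives, the pattern is simply (table name hypothetical):

BEGIN;
INSERT INTO mytable VALUES ('record 1');
INSERT INTO mytable VALUES ('record 2');
-- ... one INSERT per record ...
COMMIT;
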
A question was raised about what SQLiteman is. It's an open source admin tool 
for, as its name suggests, sqlite databases. It runs on Linux (not sure about 
Windows) and I highly recommend it. It should be ideal for loading the contents 
of a CSV file.
Regards
Nige  


[sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread joe.fis...@tanguaylab.com
I'm using R (R Studio) with large SQLite databases and it's a dream come 
true.
Here's a simple example of a ToxCast chemical table inside a Substance 
database.


# Load the following libraries into R:
#   DBI     - R database interface
#   RSQLite - SQLite interface for R
library(DBI)
library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, dbname = "substance.db", flags=SQLITE_RO)
dbListTables(con)
dbGetQuery(con, "select * from toxcast_phase2")
dbGetQuery(con, "select id, dsstox_rid, cas_no, formula, 
molecular_weight, chemical_name from toxcast_phase2 where id < 10")

dbDisconnect(con)
dbUnloadDriver(drv)

Once you have access to the SQLite database, you can of course direct 
your queries into a Vector, Matrix, or Array object and utilize in R as 
needed.


Awesome!




[sqlite] Data race (file race) in pager

2012-05-04 Thread Paul Thomson
I am working on a tool that (among other things) can detect data
races, including file access races. I have detected a file race in
SQLite on the database file that appears to be real, although I am not
certain - I have no experience with SQLite. I compiled SQLite with:
#define SQLITE_THREADSAFE 2

I use a simple test case where two threads call opendb on the same
file, write into the same table and then close the database. The file
race appears to occur due to sqlite3PagerReadFileheader (as the
database is opened?) in one thread and pager_write_pagelist in the
other. It looks as though the page that was written was pgno 1. Can
any experts explain whether these two accesses are in fact
synchronised, or if the race is benign?

I am using sqlite-amalgamation-3071100. I will later upload the test
case, trace (showing all calls that are made by each thread) and the
stack trace for each thread when the race occurs, unless it becomes
obvious that the above functions cannot race. What is best way to
provide these files? (attachments? upload to a website and provide a
link?)

Thanks,
Paul


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Black, Michael (IS)
Here's a utility to import a comma separated file (does not work for quoted 
strings or strings with commas).



Figures out the # of columns automagically from the csv file.



All wrapped in a singled transaction.

Shows progress every 100,000 inserts.



Usage: csvimport filename databasename tablename



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

#define BUFSIZE 1000000
#define MOD 100000

char *sep=",\r\n";
// Add comma delimited file to existing database/table
// Strings cannot have commas in them and quotes will be retained
int main(int argc, char *argv[]) {
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int rc;
  int ncol=0;
  int nline=0;
  char *buf=malloc(BUFSIZE);
  char sql[8192];
  FILE *fp;
  char *filename;
  char *databasename;
  char *tablename;
  if (argc != 4) {
    fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
    exit(1);
  }
  filename = argv[1];
  databasename = argv[2];
  tablename = argv[3];
  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
  if (rc) {
    fprintf(stderr,"Error opening database '%s': %s\n",databasename,sqlite3_errmsg(db));
    exit(1);
  }
  sprintf(sql,"insert into %s values (",tablename);
  fp=fopen(filename,"r");
  if (fp == NULL) {
    perror(filename);
    exit(1);
  }
  buf[BUFSIZE-1] = '*';  // sentinel to detect lines longer than the buffer
  fgets(buf,BUFSIZE,fp);
  if (buf[BUFSIZE-1] != '*') {
    fprintf(stderr,"BUFSIZE not big enough...aborting\n");
    exit(1);
  }
  // count the columns
  char *p=strtok(buf,sep);
  ncol=0;
  while(p) {
    ++ncol;
    strcat(sql,ncol==1?"":",");
    strcat(sql,"?");
    p=strtok(NULL,sep);
  }
  printf("%d columns detected\n",ncol);
  strcat(sql,")");
  puts(sql);
  rewind(fp);
  // Let's wrap things in a transaction
  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
  if (rc) {
    fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  // prepare our statement
  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
  if (rc) {
    fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  // Loop over the file
  while(fgets(buf,BUFSIZE,fp)) {
    char *p=strtok(buf,sep);
    int i=1;
    ++nline;
    if ((nline % MOD)==0) {
      printf("%d\r",nline);
      fflush(stdout);
    }
    // bind the columns as text; the table will take care of conversion to column types
    while(p) {
      rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
      if (rc) {
        fprintf(stderr,"bind_text failed on '%s': %s\n",sql,sqlite3_errmsg(db));
        exit(1);
      }
      ++i;
      p=strtok(NULL,sep);
    }
    if (--i != ncol) {
      fprintf(stderr,"expected %d cols, got %d cols on line#%d\n",ncol,i,nline);
    } else {
      rc = sqlite3_step(stmt);
      if (rc != SQLITE_DONE) {
        fprintf(stderr,"Insert failed on '%s': %s\n",sql,sqlite3_errmsg(db));
        exit(1);
      }
      rc = sqlite3_reset(stmt);
      if (rc) {
        fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
        exit(1);
      }
    }
  }
  rc=sqlite3_finalize(stmt);
  if (rc) {
    fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
    exit(1);
  }
  printf("%d inserts, committing...\n",nline);
  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
  if (rc) {
    fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
    exit(1);
  }
  rc=sqlite3_close(db);
  if (rc) {
    fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
    exit(1);
  }
  fclose(fp);
  return 0;
}



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of peter korinis [kori...@earthlink.net]
Sent: Friday, May 04, 2012 10:23 AM
To: 'General Discussion of SQLite Database'
Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file

I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only  query, sort, etc. type transactions.
So I want to load two 22GB csv files into an empty 46 column table. (I
intend to test load with 999 records by 46 col file.) initially I only have
1 index on a record # ... am not positive several other fields that I want
to index may not be missing data in some records (I assume that will error
out if I make those an index).
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using sqlite3 CLI ".import" command. Is there a way I
can monitor the progress of the load, with only minimal impact on
performance ? I've started several loads only to find out hours later that
nothing has been loaded.

Thanks for helpful info.

peter



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Simon Slavin

On 4 May 2012, at 4:23pm, peter korinis  wrote:

> I have scaled down the attributes of interest to 46 columns (discarding the
> other 550). No columns are calculated. No updates to this file ... one user
> ... only  query, sort, etc. type transactions. 
> So I want to load two 22GB csv files into an empty 46 column table. (I
> intend to test load with 999 records by 46 col file.) initially I only have
> 1 index on a record # ... am not positive several other fields that I want
> to index may not be missing data in some records (I assume that will error
> out if I make those an index). 
> After I get the data loaded and inspect for nulls in prospective index
> attributes, can I add indices?

Yes.  You have done 'CREATE TABLE'.  The import process will execute many
'INSERT' commands.  And you can run 'CREATE INDEX' commands after your data is
already present in the table.  It will work the way you want.
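
A minimal sketch of that sequence in the shell (names made up, the 46
columns abbreviated to three):

   CREATE TABLE t46(recno INTEGER, a TEXT, b TEXT);
   .separator ,
   .import test999.csv t46
   CREATE INDEX t46_recno ON t46(recno);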

> I was planning to load using sqlite3 CLI ".import" command. Is there a way I
> can monitor the progress of the load, with only minimal impact on
> performance ?

Using File Explorer or the Finder or whatever, just keep an information window 
on the database file open.  You should see the file size increasing constantly 
while the command is working.  Alternatively you might use Task Manager or 
Activity Monitor to monitor something else that's going on.  For example, how 
many octets that process has written to disk.

> I've started several loads only to find out hours later that
> nothing has been loaded.

Yes, that's very annoying.  You keep thinking "It must be finished soon."

Simon.


Re: [sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
On Fri, May 4, 2012 at 10:46 AM, Gabor Grothendieck
 wrote:
> On Fri, May 4, 2012 at 10:39 AM, Richard Hipp  wrote:
>> On Fri, May 4, 2012 at 10:33 AM, Gabor Grothendieck wrote:
>>
>>> On Fri, May 4, 2012 at 10:20 AM, Richard Hipp  wrote:
>>> > On Fri, May 4, 2012 at 10:06 AM, Rob Richardson <
>>> rdrichard...@rad-con.com>wrote:
>>> >
>>> >> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
>>> >>         Queries of the form: "SELECT max(x), y FROM table" returns the
>>> >> value of y on the same row that contains the maximum x value.
>>> >>
>>> >> Is that standard SQL behavior?  I'd have expected that to return one row
>>> >> for every row in the table.  To get the behavior described above, I'd
>>> use
>>> >> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>>> >>
>>> >
>>> > It is definitely NOT standard behavior.  The standard behavior is
>>> > undefined.  Or (with many SQL engines) it will throw an error if you
>>> have a
>>> > term in the result set that is not part of an aggregate function or an
>>> > element of the GROUP BY clause.  But lots of newbies expect SQL to work
>>> as
>>> > described in the 3.7.11 release comments, and we used to get support
>>> > questions because it did not.  And so rather than continue to answer the
>>> > questions over and over, I figured it would be easier to tweak SQLite to
>>> > reliably do what newbies expect.  I never anticipated that this change
>>> > would be so controversial or confusing.
>>> >
>>> > All the existing, portable, documented ways to find the maximum element
>>> of
>>> > one column while simultaneously finding the other elements in the same
>>> row,
>>> > continue to work as they always have.  You are not required to use this
>>> new
>>> > approach.  In fact, if you want your SQL to be portable, you should
>>> > probably avoid it.  By adding this feature, we had hoped to help
>>> > application developers avoid a common SQL programming error.  That's all.
>>> > There is nothing profound going on here.
>>>
>>> Can't anyone answer the question directly?  I would still like to know
>>> precisely what works and what does not.  It's not possible to rely on
>>> general SQL documentation for this so I think it's important to
>>> document it exactly.  Otherwise, we are left to examine the source
>>> code or use trial and error (and these methods only tell you how it
>>> works but not how it's intended to work and they could be different if
>>> there are bugs).  If the documentation does exist please point me to
>>> it but I did not find it on my own.
>>>
>>
>> If a single min() or max() aggregate function appears in a query, then any
>> other columns that are not contained within aggregate functions and that
>> are not elements of the GROUP BY will take values from one of the same rows
>> that satisfied the one min() or max() aggregate function.
>
> Thanks!  I expect that this will be useful for me for at least quick
> and dirty computations.
>
> I suggest that this statement be added to the docs if its not already there.

Also the other part of my question.  What were the improvements to the
csv import?


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only  query, sort, etc. type transactions. 
So I want to load two 22GB csv files into an empty 46 column table. (I
intend to test load with 999 records by 46 col file.) initially I only have
1 index on a record # ... am not positive several other fields that I want
to index may not be missing data in some records (I assume that will error
out if I make those an index). 
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using sqlite3 CLI ".import" command. Is there a way I
can monitor the progress of the load, with only minimal impact on
performance ? I've started several loads only to find out hours later that
nothing has been loaded.

Thanks for helpful info.

peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Valentin Davydov
Sent: Friday, May 04, 2012 9:43 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user
> with no dev support for a pilot project (single user, no updates, just
> queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows 
> x 600 columns (fields all <15 char). Seems like a DBMS will allow me 
> to query it in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of
data, especially selecting small subsets based on simple criteria. However
before trying to do some real work you have to understand the structure of
your data, realize your possible queries and carefully design database
schema (tables and, equally important, indices). Perhaps, putting all data
in a single 600-column table is not a good idea (though allowed
technically), especially if your columns are equal by their physical nature:
it is not so easy to select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel 
> dual-proc with 4GB RAM + 200GB free disk space.

Well-indexed database of small data pieces usually takes up several times
more disk space than the raw data. Probably 200GB would not be enough,
depending mostly on the number of indices. Consider dedicating a separate
disk (or even RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I 
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager 
> add-on but it would not load the csv files - 'csv worker failed'. So I 
> tried Database Master from Nucleon but it failed after loading (it 
> took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". 
> I tried to create another table in the same db but could not with same 
> error message. The DB size shows as 10,000KB (that looks suspiciously 
> like a size setting?).

Try bare sqlite shell instead of those external tools. It should take at
least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3
terabytes of disk space, contains more than 10^10 records and still provides
small selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find
there. SQLite isn't very good for calculation of complex aggregate
functions, but works fine in simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and
perhaps CACHE_SIZE to fill most of the available RAM would help a bit. 
Also, don't forget to turn off journaling and wrap all in a single
transaction when creating database for the first time.

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Simon Slavin

On 4 May 2012, at 4:02pm, peter korinis  wrote:

> Sqlitespy looks good ... I will try it. 
> website says download contains sqlite itself, which I already have - will
> there be a problem using ...spy with existing sqlite?

SQLite is not a single library which has to live somewhere on your computer.  
It is C code which each programmer includes in their program.  You can have 
twenty programs on your disk, each using a different version of SQLite, without 
problems.

> I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
> a 999x46 comma-delimited file into a previously created empty table with 46
> col. (if this works I will load two 22M row x 46 col csv files into that
> table.) does this cmd work this way or must I create INSERT statements to do
> 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands.  You don't want them at the 
end of commands which start with a dot.  Apart from that you have something 
worth trying.  Why not make a tiny test case with three lines of two columns 
before you get started with the 46-column monster ?

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
Sqlitespy looks good ... I will try it. 
website says download contains sqlite itself, which I already have - will
there be a problem using ...spy with existing sqlite?

I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" to load
a 999x46 comma-delimited file into a previously created empty table with 46
col. (if this works I will load two 22M row x 46 col csv files into that
table.) does this cmd work this way or must I create INSERT statements to do
999 inserts (later 44M inserts)?

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Oliver Peters
Sent: Thursday, May 03, 2012 7:23 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 03.05.2012 19:59, peter korinis wrote:
> I have R but really haven't used it much. I know it's a great stats 
> package and great for data reduction ... but I want to perform queries 
> against my 44GB of data, filtering records by a variety of attributes, 
> comparing those subsets in a variety of ad hoc ways, perhaps 
> summing/counting other fields, etc.

I prefer

http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index

for creating the database (tables, views) and doing the queries cause it's
fast and reliable but I prefer to write(!) SQL code and not to create it
through a generator (as it is done in Access).

sqlitespy can't do the import job; I always do this with the CLI by creating
INSERT statements with my scripting language in a separate file

since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3); INSERT INTO
table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

this is a great advantage if you need to do many INSERTs cause your file
won't become so large


> This is the kind of job excel is good at ... but the data is too big!
> Seems like a database plus a good query GUI or some BI app would work. Is R
> a good query tool?

afaik there is no other way than to write (!) SQL Code - depending on 
the problem this can be done in an R script or directly in the database 
(i.e. as a VIEW) or as a combination

[...]

Oliver


Re: [sqlite] Details on New Features

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 10:33 AM, Gabor Grothendieck  wrote:

> On Fri, May 4, 2012 at 10:20 AM, Richard Hipp  wrote:
> > On Fri, May 4, 2012 at 10:06 AM, Rob Richardson <
> rdrichard...@rad-con.com>wrote:
> >
> >> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
> >> Queries of the form: "SELECT max(x), y FROM table" returns the
> >> value of y on the same row that contains the maximum x value.
> >>
> >> Is that standard SQL behavior?  I'd have expected that to return one row
> >> for every row in the table.  To get the behavior described above, I'd
> use
> >> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
> >>
> >
> > It is definitely NOT standard behavior.  The standard behavior is
> > undefined.  Or (with many SQL engines) it will throw an error if you
> have a
> > term in the result set that is not part of an aggregate function or an
> > element of the GROUP BY clause.  But lots of newbies expect SQL to work
> as
> > described in the 3.7.11 release comments, and we used to get support
> > questions because it did not.  And so rather than continue to answer the
> > questions over and over, I figured it would be easier to tweak SQLite to
> > reliably do what newbies expect.  I never anticipated that this change
> > would be so controversial or confusing.
> >
> > All the existing, portable, documented ways to find the maximum element
> of
> > one column while simultaneously finding the other elements in the same
> row,
> > continue to work as they always have.  You are not required to use this
> new
> > approach.  In fact, if you want your SQL to be portable, you should
> > probably avoid it.  By adding this feature, we had hoped to help
> > application developers avoid a common SQL programming error.  That's all.
> > There is nothing profound going on here.
>
> Can't anyone answer the question directly?  I would still like to know
> precisely what works and what does not.  It's not possible to rely on
> general SQL documentation for this so I think it's important to
> document it exactly.  Otherwise, we are left to examine the source
> code or use trial and error (and these methods only tell you how it
> works but not how it's intended to work and they could be different if
> there are bugs).  If the documentation does exist please point me to
> it but I did not find it on my own.
>

If a single min() or max() aggregate function appears in a query, then any
other columns that are not contained within aggregate functions and that
are not elements of the GROUP BY will take values from one of the same rows
that satisfied the one min() or max() aggregate function.






-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
On Fri, May 4, 2012 at 10:20 AM, Richard Hipp  wrote:
> On Fri, May 4, 2012 at 10:06 AM, Rob Richardson 
> wrote:
>
>> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
>>         Queries of the form: "SELECT max(x), y FROM table" returns the
>> value of y on the same row that contains the maximum x value.
>>
>> Is that standard SQL behavior?  I'd have expected that to return one row
>> for every row in the table.  To get the behavior described above, I'd use
>> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>>
>
> It is definitely NOT standard behavior.  The standard behavior is
> undefined.  Or (with many SQL engines) it will throw an error if you have a
> term in the result set that is not part of an aggregate function or an
> element of the GROUP BY clause.  But lots of newbies expect SQL to work as
> described in the 3.7.11 release comments, and we used to get support
> questions because it did not.  And so rather than continue to answer the
> questions over and over, I figured it would be easier to tweak SQLite to
> reliably do what newbies expect.  I never anticipated that this change
> would be so controversial or confusing.
>
> All the existing, portable, documented ways to find the maximum element of
> one column while simultaneously finding the other elements in the same row,
> continue to work as they always have.  You are not required to use this new
> approach.  In fact, if you want your SQL to be portable, you should
> probably avoid it.  By adding this feature, we had hoped to help
> application developers avoid a common SQL programming error.  That's all.
> There is nothing profound going on here.

Can't anyone answer the question directly?  I would still like to know
precisely what works and what does not.  It's not possible to rely on
general SQL documentation for this so I think it's important to
document it exactly.  Otherwise, we are left to examine the source
code or use trial and error (and these methods only tell you how it
works but not how it's intended to work and they could be different if
there are bugs).  If the documentation does exist please point me to
it but I did not find it on my own.


Re: [sqlite] Details on New Features

2012-05-04 Thread Richard Hipp
On Fri, May 4, 2012 at 10:06 AM, Rob Richardson wrote:

> Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
> Queries of the form: "SELECT max(x), y FROM table" returns the
> value of y on the same row that contains the maximum x value.
>
> Is that standard SQL behavior?  I'd have expected that to return one row
> for every row in the table.  To get the behavior described above, I'd use
> "SELECT x, y FROM table WHERE x = (SELECT max(x) FROM table)".
>

It is definitely NOT standard behavior.  The standard behavior is
undefined.  Or (with many SQL engines) it will throw an error if you have a
term in the result set that is not part of an aggregate function or an
element of the GROUP BY clause.  But lots of newbies expect SQL to work as
described in the 3.7.11 release comments, and we used to get support
questions because it did not.  And so rather than continue to answer the
questions over and over, I figured it would be easier to tweak SQLite to
reliably do what newbies expect.  I never anticipated that this change
would be so controversial or confusing.

All the existing, portable, documented ways to find the maximum element of
one column while simultaneously finding the other elements in the same row,
continue to work as they always have.  You are not required to use this new
approach.  In fact, if you want your SQL to be portable, you should
probably avoid it.  By adding this feature, we had hoped to help
application developers avoid a common SQL programming error.  That's all.
There is nothing profound going on here.



>
> RobR



-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Details on New Features

2012-05-04 Thread Rob Richardson
Gabor Grothendieck mentioned a new feature of SQLite in 3.7.11:
Queries of the form: "SELECT max(x), y FROM table" returns the value of 
y on the same row that contains the maximum x value.

Is that standard SQL behavior?  I'd have expected that to return one row for 
every row in the table.  To get the behavior described above, I'd use "SELECT 
x, y FROM table WHERE x = (SELECT max(x) FROM table)".

RobR


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Valentin Davydov
On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data. 

Yes, SQLite is quite capable of doing simple analysis of such amounts of data,
especially selecting small subsets based on simple criteria. However before
trying to do some real work you have to understand the structure of your data,
realize your possible queries and carefully design database schema (tables
and, equally important, indices). Perhaps, putting all data in a single
600-column table is not a good idea (though allowed technically), especially
if your columns are equal by their physical nature: it is not so easy to
select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
> with 4GB RAM + 200GB free disk space.

Well-indexed database of small data pieces usually takes up several times 
more disk space than the raw data. Probably 200GB would not be enough, 
depending mostly on the number of indices. Consider dedicating a separate
disk (or even RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).

Try bare sqlite shell instead of those external tools. It should take 
at least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes 
of disk space, contains more than 10^10 records and still provides small 
selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find 
there. SQLite isn't very good for calculation of complex aggregate functions,
but works fine in simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and 
perhaps CACHE_SIZE to fill most of the available RAM would help a bit. 
Also, don't forget to turn off journaling and wrap all in a single 
transaction when creating database for the first time.
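
A minimal sketch of those settings (values illustrative; journaling off is
only safe for a one-off initial load):

   PRAGMA page_size = 4096;      -- match the filesystem cluster size, set before loading
   PRAGMA cache_size = 200000;   -- in pages; size it to the available RAM
   PRAGMA journal_mode = OFF;    -- no rollback journal during the import
   BEGIN;
   -- ... bulk INSERTs ...
   COMMIT;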

Valentin Davydov.


[sqlite] Details on New Features

2012-05-04 Thread Gabor Grothendieck
In this link:

  http://sqlite.org/releaselog/3_7_11.html

it refers to these new features:

Queries of the form: "SELECT max(x), y FROM table" returns the value
of y on the same row that contains the maximum x value.

Improvements to the handling of CSV inputs in the command-line shell

Is there documentation somewhere that defines exactly what these mean?
 Does the max(x) apply to min(x) too?  does the max have to precede
the y?  Can there be multiple y's?  What precisely are the
"improvements" in handling of CSV inputs?


[sqlite] draft status is missing "is recommended."

2012-05-04 Thread Carlos Milon Silva
Status: Version 3.7.12 of SQLite is recommended for all new development. 
Upgrading from version 3.7.6.3, 3.7.7, 3.7.7.1, 3.7.8, 3.7.9, or 3.7.11 
is optional. Upgrading from all other SQLite versions.
