[sqlite] I/O error on creating index with 3.7.14
Using 3.7.14, when creating an index on a 27G database (on the table that contains almost all of the data), I consistently (on Windows XP and Debian GNU/Linux, on three different machines) get a disk I/O error. This does not happen using 3.7.13 (only tested on Debian GNU/Linux), nor does it happen when creating the same index on a smaller (~2G) version of the database.

I ran the process under strace; the final relevant lines (as far as I can judge, knowing nothing of this) are:

lseek(5, 23934032896, SEEK_SET) = 23934032896
write(5, "\231\216\344\271\213\351\235\240\345\261\261\n+\367K\340*\5\2I \1\4\v\301\351\276\215\345\202\276\346\271"..., 1024) = 1024
lseek(5, 23934033920, SEEK_SET) = 23934033920
write(5, "\207\345\210\245\347\267\207\347\264\240\346\211\200\350\254\202\346\211\223\351\274\223\345\274\204\n+\367u\334*\5"..., 632) = 632
lseek(5, 0, SEEK_SET) = 0
read(5, 0x7f5462cb06b8, 18446744071873782392) = -1 EFAULT (Bad address)
close(5) = 0

The command that causes the error is "CREATE INDEX IF NOT EXISTS TextNGramIndex ON TextNGram (text, ngram, size)".

The database schema is (without the failing index):

CREATE TABLE Text (
    id INTEGER PRIMARY KEY ASC,
    filename TEXT UNIQUE NOT NULL,
    checksum TEXT NOT NULL,
    label TEXT NOT NULL
);
CREATE TABLE TextHasNGram (
    text INTEGER NOT NULL REFERENCES Text (id),
    size INTEGER NOT NULL
);
CREATE TABLE TextNGram (
    text INTEGER NOT NULL REFERENCES Text (id),
    ngram TEXT NOT NULL,
    size INTEGER NOT NULL,
    count INTEGER NOT NULL
);
CREATE UNIQUE INDEX TextHasNGramIndex ON TextHasNGram (text, size);
CREATE INDEX TextIndexLabel ON Text (label);

Is there more information I should provide as part of a proper bug report, or is this a known issue, or have I missed a trick somewhere?

Jamie
[sqlite] FW: how to select " char in sqlite
char bh1[320];
memset(bh1, 0, 320);
strcpy(bh1, "sqlite3 -html -header t9_engine.db \"select id,partnumber,substr(\'\',1,180) as img,pcs from engine where id>7;\" >> n.html");
system(bh1);   // here this couldn't work

The error is:

sqlite3 -html -header t9_engine.db "select id,partnumber,substr('',1,180) as img,pcs from engine where id>7;" >> n.html
Error: near "'\',1,180) as img,pcs from engine where id>7;\" >> n.html");"

This could work:

strcpy(bh1, "sqlite3 -html -header t9_engine.db \"select id,partnumber,'' as img,pcs from engine where id>7;\" >> n.html");
system(bh1);

The result is:

8 AA34841687 000 INSONO-SOUS-MOTEUR-- 1

Here I want to add the " character around 'C:\t9\images\INSONO-SOUS-MOTEUR.jpg', and the best way is change to >
Re: [sqlite] light weight write barriers
On Thu, Oct 25, 2012 at 11:03:13AM -0700, da...@lang.hm wrote:
> I agree, this is why I'm trying to figure out the recommended way to
> do this without needing to do full commits.
>
> Since in most cases it's acceptable to lose the last few chunks
> written, if we had some way of specifying ordering, without having
> to specify "write this NOW", the solution would be pretty obvious.

Well, using data journalling with ext3/4 may do what you want. If you don't do any fsync, the changes will get written every 5 seconds when the automatic journal sync happens (and sub-4k writes will also get coalesced to a 5 second granularity).

Even with plain text files, it's pretty easy to tell whether the final record is partially written after a crash: just look for a trailing newline.

Better yet, if you are writing to multiple log files with data journalling, all of the writes will happen at the same time, and they will be streamed to the file system journal, minimizing random writes for at least the journal writes.

- Ted
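A minimal sketch of that trailing-newline check, assuming records are newline-terminated (the helper name and file handling are illustrative, not from the original mail):

/* After a crash, the last record of a newline-terminated log is
 * complete iff the file's last byte is '\n'. Returns 1 if complete,
 * 0 if truncated, -1 if the file can't be examined. */
#include <unistd.h>
#include <fcntl.h>

int last_record_complete(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char c = 0;
    int complete = 0;
    off_t end = lseek(fd, -1, SEEK_END);   /* seek to the last byte */
    if (end >= 0 && read(fd, &c, 1) == 1)
        complete = (c == '\n');            /* complete iff it's a newline */
    close(fd);
    return complete;
}

On recovery the application would simply truncate the file back to the last newline before resuming appends.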
Re: [sqlite] light weight write barriers
On Thu, 25 Oct 2012, Theodore Ts'o wrote:

> Or does rsyslog *really* need to issue an fsync after each log
> message? Or could it batch updates so that every N seconds, it
> flushes writes to the disk?

In part this depends on how paranoid the admin is. By default rsyslog doesn't do fsyncs, but admins can configure it to do so and can configure the batch size.

However, what I'm talking about here is not normal message traffic, it's the case where the admin has decided that they don't want to use the normal in-memory queues, they want the queues to be on disk so that if the system crashes the queued data will still be there to be processed after the crash. (In addition, this can get used to cover cases where you want queue sizes larger than your available RAM.)

In this case, the extreme, and only at the explicit direction of the admin, is to fsync after every message. The norm is that it's acceptable to lose the last few messages, but losing a chunk out of the middle of the queue file can cause a whole lot more to be lost, passing the threshold of acceptable.

> Sometimes, the answer is not to try to create exotic database-like
> functionality in the file system --- the answer is to be more
> intelligent at the application layer. Not only will the application
> be more portable, it will also in the end be more efficient, since
> even with the most exotic database technologies, the most efficient
> transactional commit is the unneeded commit that you optimize away
> at the application layer.

I agree, this is why I'm trying to figure out the recommended way to do this without needing to do full commits.

Since in most cases it's acceptable to lose the last few chunks written, if we had some way of specifying ordering, without having to specify "write this NOW", the solution would be pretty obvious.

David Lang
Re: [sqlite] Disk I/O Error
Caio Honma wrote:
>
> I was trying to query from a huge table (150,000,000+ rows) using a left
> join statement, using mode csv and output to a file. After several
> minutes I got the Disk I/O error. I searched this mailing list for some
> possible answers (tried integrity_check, but the db was ok), then I
> installed Process Monitor.
>

Can you enable logging via the SQLITE_CONFIG_LOG configuration option before calling any other SQLite APIs? This should help to narrow down the operation returning the error.

http://www.sqlite.org/c3ref/c_config_getmalloc.html#sqliteconfiglog

Example:

void logging_callback(void *notUsed, int iCode, const char *zErrmsg){
  /* handle output here... */
  fprintf(stdout, "%d: %s\n", iCode, zErrmsg);
}

sqlite3_config(SQLITE_CONFIG_LOG, logging_callback, 0);

--
Joe Mistachkin
[sqlite] Disk I/O Error
Hi!

I was trying to query from a huge table (150,000,000+ rows) using a left join statement, using mode csv and output to a file. After several minutes I got the Disk I/O error. I searched this mailing list for some possible answers (tried integrity_check, but the db was ok), then I installed Process Monitor.

I saw the results after the query, but the only unusual result was a RegQueryValue pointing to HKLM\System\CurrentControlSet\Control\Nls\CustomLocale\EMPTY with the result NAME NOT FOUND and a RegQueryValue pointing to HKLM\System\CurrentControlSet\Control\Nls\Language\InstallLanguageFallback that returned with a BUFFER OVERFLOW.

Does anybody know a way to solve this error?

Thanks in advance and sorry about my English.

Caio

PS: The DB is located on my hard drive and I'm using Windows 7 x64.
Re: [sqlite] segfault in 3.7.13 (and older) with UNION ALL
This bug was fixed on 2012-08-28 (by the checkin at http://www.sqlite.org/src/info/200a81358c). The fix is available in SQLite versions 3.7.14 and 3.7.14.1 as well as on trunk.

On Thu, Oct 25, 2012 at 11:50 AM, Steve Pomeroy wrote:
> This bug was originally found on a version of sqlite3 (tested in 3.5.9
> and 3.7.11) in Android (2.2.1 and 4.1.2 respectively), but I've been
> able to recreate it on sqlite3 3.7.13 from Debian/wheezy.
>
> Test case:
>
> CREATE TABLE t1(a,b,c);
> INSERT INTO t1 VALUES(1,2,3), (4,5,6);
> CREATE TABLE t2(d,e,f);
> INSERT INTO t2 VALUES(3,6,9), (4,8,12);
>
> # This works as expected
> SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
>                SELECT d AS x, e AS y FROM t2);
> #1|2
> #4|5
> #3|6
> #4|8
>
> # this doesn't crash...
> SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
>                SELECT d AS x, e AS y, f AS z FROM t2);
> #1|2
> #4|5
> #3|6
> #4|8
>
> # but this causes a segfault (it's missing column "y" from the second
> # inner SELECT):
> SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
>                SELECT d AS x FROM t2);
>
> --
> Steve Pomeroy
> MIT Mobile Experience Lab
> http://mobile.mit.edu/

--
D. Richard Hipp
d...@sqlite.org
[sqlite] segfault in 3.7.13 (and older) with UNION ALL
This bug was originally found on a version of sqlite3 (tested in 3.5.9 and 3.7.11) in Android (2.2.1 and 4.1.2 respectively), but I've been able to recreate it on sqlite3 3.7.13 from Debian/wheezy.

Test case:

CREATE TABLE t1(a,b,c);
INSERT INTO t1 VALUES(1,2,3), (4,5,6);
CREATE TABLE t2(d,e,f);
INSERT INTO t2 VALUES(3,6,9), (4,8,12);

# This works as expected
SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
               SELECT d AS x, e AS y FROM t2);
#1|2
#4|5
#3|6
#4|8

# this doesn't crash...
SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
               SELECT d AS x, e AS y, f AS z FROM t2);
#1|2
#4|5
#3|6
#4|8

# but this causes a segfault (it's missing column "y" from the second
# inner SELECT):
SELECT * FROM (SELECT a AS x, b AS y FROM t1 UNION ALL
               SELECT d AS x FROM t2);

--
Steve Pomeroy
MIT Mobile Experience Lab
http://mobile.mit.edu/
Re: [sqlite] light weight write barriers
On 25 Oct 2012, at 2:04am, da...@lang.hm wrote:

> But unless you are a filesystem, how can you make sure that the message data
> is written to file1 before you write the metadata about the message to file2?

Wait for long enough for the disk subsystem to clear its backlog of write commands. A few seconds should do it.

> right now it seems that there is no way for an application to do this other
> than doing a fsync(file1) before writing the metadata to file2

No, as I've posted previously to this thread, you can assume that fsync() literally does nothing. It really is implemented as a 'noop' in many cases.

Simon.
Re: [sqlite] light weight write barriers
> > Hopefully, eventually the storage developers will realize the value
> > behind ordered commands and learn corresponding SCSI facilities to
> > deal with them.
>
> Eventually, drive manufacturers will realize that trying to price
> gouge people who want advanced features such as TCQ, DIF/DIX, is the
> best way to guarantee that most people won't bother to purchase them,
> and hence the features will remain largely unused.

I doubt they care. The profit on high-end features from the people who really need them, I would bet, far exceeds any other benefit of giving it to others. Welcome to capitalism 8)

Plus - spinning rust for those end users is on the way out, SATA to flash is a bit of a hack, and people are already putting a lot of focus onto things like NVM Express.

Alan
Re: [sqlite] transaction commit is successful despite I/O error
I guess this can stand as one more argument for not using SQLite over NFS. Another argument can be found here: http://www.sqlite.org/lockingv3.html#how_to_corrupt.

Pavel

On Wed, Oct 24, 2012 at 2:01 PM, Efim Dyadkin wrote:
> I was probably not quite specific, so I would like to rephrase the problem
> and give more details.
>
> I run a default configuration of SQLite on Linux and I don't use WAL or
> anything that changes transaction behavior. A database is located on a remote
> file server and accessed via NFS. There is a single process only that
> accesses the database. The problem is that when the auto-mounter is stopped
> (on the machine where the process is running) in the middle of a database
> transaction, commit returns successfully though it leaves a hot journal on
> disk, because the journal is not seen for unlink (ENOENT)!!!
>
> Note, when the auto-mounter is down, you can still operate on a database
> file and its journal if they were opened before the stop.
>
> The problem can be reproduced very easily with an SQLite command shell:
> 1) Run the shell
> 2) Begin exclusive transaction
> 3) Insert a few records
> 4) Stop auto-mounter
> 5) Optionally insert a few more records (the stopped auto-mounter does not
>    affect these inserts because the database and journal are already open)
> 6) Commit (no errors!!!)
> 7) Quit the shell
>
> If you then restart the auto-mounter and check the database directory you
> will find a hot journal! If you reopen the database the transaction will be
> rolled back and the recently committed changes will be gone!
>
> Thanks for your feedback,
> Efim
Re: [sqlite] light weight write barriers
On Wed, Oct 24, 2012 at 11:58:49PM -0700, da...@lang.hm wrote:
> The frustrating thing is that when people point out how things like
> sqlite are so horribly slow, the reply seems to be "well, that's
> what you get for doing so many fsyncs, don't do that", when there is
> a 'problem' like the KDE "config loss" problem a few years ago, the
> response is "well, that's what you get for not doing fsync"

Sure... but the answer is to only do the fsync's when you need to. For example, if GNOME and KDE are rewriting the entire registry file each time the application changes a single registry key, then sure, if you rewrite the entire registry file and fsync after each rewrite before you replace the file, you will be safe. And if the application needs to update dozens or hundreds of registry keys (or every time the window gets moved or resized), then yes, it will be slow. But the application didn't have to do that! It could have updated all the registry keys in memory, and then updated the registry file periodically instead.

Similarly, Firefox didn't need to do a sqlite commit after every single time its history file was written, causing a third of a megabyte of write traffic each time you clicked on a web page. It could have batched its updates to the history file, since most of the time, you don't care about making sure the web history is written to stable store before you're allowed to click on a web page and visit the next web page.

Or does rsyslog *really* need to issue an fsync after each log message? Or could it batch updates so that every N seconds, it flushes writes to the disk?

(And this is a problem with most Android applications as well. Apparently the framework APIs are such that it's easier for an application to treat each sqlite statement as an atomic update, so many/most application writers don't use explicit transaction boundaries, so updates don't get batched even though it would be more efficient if they did so.)

Sometimes, the answer is not to try to create exotic database-like functionality in the file system --- the answer is to be more intelligent at the application layer. Not only will the application be more portable, it will also in the end be more efficient, since even with the most exotic database technologies, the most efficient transactional commit is the unneeded commit that you optimize away at the application layer.

- Ted
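To make the batching concrete, here is a minimal sketch using the SQLite C API; the database and table names are made up. The whole change is wrapping the statements in one explicit transaction, so a thousand inserts pay for one commit (and one sync) instead of a thousand:

/* Explicit transaction batching: without BEGIN/COMMIT each INSERT is
 * its own transaction and each pays for a full journal commit.
 * "history.db" and the visits table are hypothetical examples. */
#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;
    if (sqlite3_open("history.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS visits(url TEXT)", 0, 0, 0);

    sqlite3_exec(db, "BEGIN", 0, 0, 0);            /* one transaction... */
    sqlite3_prepare_v2(db, "INSERT INTO visits VALUES(?)", -1, &stmt, 0);
    for (int i = 0; i < 1000; i++) {
        sqlite3_bind_text(stmt, 1, "http://example.com/", -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT", 0, 0, 0);           /* ...one commit, one sync */

    sqlite3_close(db);
    return 0;
}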
Re: [sqlite] light weight write barriers
On Thu, Oct 25, 2012 at 02:03:25PM +0100, Alan Cox wrote:
>
> I doubt they care. The profit on high end features from the people who
> really need them I would bet far exceeds any other benefit of giving it to
> others. Welcome to capitalism 8)

Yes, but it's a question of pricing. If they had priced it just a wee bit higher, then there would have been incentive to add support for TCQ so it could actually be used in various Linux file systems, since there would have been lots of users of it. But as it is, the folks who are purchasing huge, vast numbers of these drives --- such as the large cloud providers: Amazon, Facebook, Rackspace, et al. --- will choose to purchase large numbers of commodity drives, and then find ways to work around the missing functionality in userspace.

For example, DIF/DIX would be nice, and if it were available for cheap, I could imagine it being used. But you can accomplish the same thing in userspace, and in fact at Google I've implemented a special not-for-mainline patch which spikes out stable writes (required for DIF/DIX) because it has significant performance overhead, and DIF/DIX has zero benefit if you're not willing to shell out $$$ for hardware that supports it.

Maybe the HDD manufacturers have been able to price-gouge a small number of enterprise I/T shops with more dollars than sense, but personally, I'm not convinced they picked an optimal pricing strategy. Put another way, I accept that Toyota should price a Lexus ES higher than a Camry, but if it's priced at, say, 3x the price of a Camry instead of 20% more, they might find that precious few people are willing to pay that kind of money for what is essentially the same car with minor luxury tweaks added to it.

> Plus - spinning rust for those end users is on the way out, SATA to flash
> is a bit of a hack and people are already putting a lot of focus onto
> things like NVM Express.

Yeah, I don't buy that. One, flash is still too expensive. Two, the capital costs to build enough silicon foundries to replace the current production volume of HDDs is way too expensive for any company to afford (the cloud providers are buying *huge* numbers of HDDs) --- and that's assuming companies wouldn't choose to use those foundries for products with larger margins --- such as, for example, CPU/GPU chips. :-)

And third and finally, if you study the long-term trends in terms of data retention time (going down), program and read disturb (going up), and write endurance (going down) as a function of feature size and/or time, you'd be wise to treat flash as nothing more than a short-term cache, and not as a long-term stable store. If end users completely give up on spinning rust, and store all of their precious family pictures on flash storage, after a couple of years they are likely going to be very disappointed. Speaking personally, I wouldn't want to have anything on flash for more than a few months at *most* before I made sure I had another copy saved on spinning rust platters for long-term retention.

- Ted
Re: [sqlite] Which non-default options are used for the shell?
Baruch Burstein wrote:
> I know FTS is enabled (SQLITE_ENABLE_FTS*), but are any other flags used in
> compiling the default shell?

sqlite> pragma compile_options;
CURDIR
ENABLE_FTS3
ENABLE_RTREE
TEMP_STORE=1
THREADSAFE=0

Regards,
Clemens
[sqlite] Which non-default options are used for the shell?
I know FTS is enabled (SQLITE_ENABLE_FTS*), but are any other flags used in compiling the default shell?

--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
Re: [sqlite] Bug report concerning the juliandate function
On Wed, Oct 24, 2012 at 1:06 PM, wrote:
> Dear sir or madam,
>
> I have tested sqlite version 3.6 and version 3.7.14.1 with the following
> inputs:
>
> select julianday('2005-02-29');
> select julianday('2005-02-30');
>
> In both cases, it should have returned NULL, but it returned a number.

The date-time functions all accept day numbers up to and including 31 on all months, even months that have fewer days. These extra days on the end of each month roll over into the following month. So, for example, all of the following are equivalent:

julianday('2005-02-29') == julianday('2005-03-01')
julianday('2005-02-30') == julianday('2005-03-02')
julianday('2005-02-31') == julianday('2005-03-03')
julianday('2005-04-31') == julianday('2005-05-01')

> In the first case, the year 2005 is not a leap year. Therefore, this date
> is not valid.
> In the second case, February never has 30 days.
>
> Regards,
>
> Kagan Kayal

--
D. Richard Hipp
d...@sqlite.org
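A sketch of one way to detect such rollover in an application, relying only on the normalization behavior described above: re-render the value with date() and compare it to the input (the literals here are just the ones from the report):

sqlite> SELECT date('2005-02-29');                 -- rolls over
2005-03-01
sqlite> SELECT date('2005-02-29') = '2005-02-29';  -- 0: not a real date
0
sqlite> SELECT date('2005-02-28') = '2005-02-28';  -- 1: a real date
1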
Re: [sqlite] light weight write barriers
On Thu, 25 Oct 2012, Theodore Ts'o wrote:

> On Wed, Oct 24, 2012 at 03:03:00PM -0700, da...@lang.hm wrote:
>> Like what is being described for sqlite, losing the tail end of the
>> messages is not a big problem under normal conditions. But there is
>> a need to be sure that what is there is complete up to the point
>> where it's lost.
>>
>> this is similar in concept to write-ahead-logs done for databases
>> (without the absolute durability requirement)
>
> If that's what you require, and you are using ext3/4, using data
> journalling might meet your requirements. It's something you can
> enable on a per-file basis, via chattr +j; you don't have to force
> all file systems to use data journaling via the data=journalled
> mount option.
>
> The potential downsides that you may or may not care about for this
> particular application:
>
> (a) This will definitely have a performance impact, especially if
> you are doing lots of small (less than 4k) writes, since the data
> blocks will get run through the journal, and only later be written
> to their final location on disk.
>
> (b) You don't get atomicity if the write spans a 4k block boundary.
> All of the bytes before i_size will be written, so you don't have to
> worry about "holes"; but the last message written to the log file
> might be truncated.
>
> (c) There will be a performance impact, since the contents of data
> blocks will be written at least twice (once to the journal, and once
> to the final location on disk). If you do lots of small, sub-4k
> writes, the performance might be even worse, since data blocks might
> be written multiple times to the journal.

I'll have to dig into this option. In the case of rsyslog it sounds like it could work (not as good as a filesystem-independent way of doing things, but better than full fsyncs).

Truncated messages are not great, but they are a detectable, and acceptable, risk.

While the average message size is much smaller than 4K (on my network it's ~250 bytes), the metadata that's broken out expands this somewhat, and we can afford to waste disk space if it makes things safer or more efficient.

If we do update-in-place with flags with each message, each message will need to be written up to three times (on receipt, being processed, finished processing). With high message burst rates, I'm worried that we would fill up the journal; is there a good way to deal with this? I believe that ext4 can put the journal on a different device from the filesystem, would this help a lot?

If you were to put the journal for an ext4 filesystem on a ram disk, you would lose the data recovery protection of the journal, but could you use this trick to get ordered data writes onto the filesystem?

David Lang
Re: [sqlite] light weight write barriers
On Thu, 25 Oct 2012, Theodore Ts'o wrote:

> On Thu, Oct 25, 2012 at 12:18:47AM -0500, Nico Williams wrote:
>> By trusting fsync(). And if you don't care about immediate Durability
>> you can run the fsync() in a background thread and mark the associated
>> transaction as completed in the next transaction to be written after
>> the fsync() completes.
>
> The challenge is when you have entangled metadata updates. That is,
> you update file A, and file B, and file A and B might share metadata.
> In order to sync file A, you also have to update part of the metadata
> for the updates to file B, which means calculating the dependencies of
> what you have to drag in can get very complicated. You can keep track
> of what bits of the metadata you have to undo and then redo before
> writing out the metadata for fsync(A), but that basically means you
> have to implement soft updates, and all of the complexity this
> implies: http://lwn.net/Articles/339337/
>
> If you can keep all of the metadata separate, this can be somewhat
> mitigated, but usually the block allocation records (regardless of
> whether you use a tree, or a bitmap, or some other data structure)
> tend to have entanglement problems.

hmm, two thoughts occur to me.

1. to avoid entanglement, put the two files in separate directories

2. take advantage of entanglement to enforce ordering

thread 1 (repeated): write new message to file 1, spawn new thread to fsync
thread 2: write to file 2 that messages 1-5 are being worked on
thread 2 (later): write to file 2 that messages 1-5 are done

When thread 1 spawns the new thread to do the fsync, the system will be forced to write the data to file 2 as of the time it does the fsync. This should make it so that you never have data written to file2 that refers to data that hasn't been written to file1 yet.

> It certainly is not impossible; RDBMS's have implemented this. On the
> other hand, they generally aren't as fast as file systems for
> non-transactional workloads, and people really care about performance
> on those sorts of workloads for file systems.

The RDBMS's have implemented stronger guarantees than what we need here.

A few years ago I was investigating this for logging. With the reliable (RDBMS-style), but inefficient, disk queue that rsyslog has, writing to a high-end Fusion-io SSD, ext2 resulted in ~8K logs/sec, ext3 resulted in ~2K logs/sec, and JFS/XFS resulted in ~4K logs/sec (ext4 wasn't considered stable enough at the time to be tested).

> Still, if you want to try to implement such a thing, by all means,
> give it a try. But I think you'll find that creating a file system
> that can compete with existing file systems for performance, and
> *then* also supports a transactional model, is going to be quite a
> challenge.

The question is trying to figure out a way to get ordering right with existing filesystems (preferably without using something too tied to a single filesystem implementation), not to try and create a new one.

The frustrating thing is that when people point out how things like sqlite are so horribly slow, the reply seems to be "well, that's what you get for doing so many fsyncs, don't do that", while when there is a 'problem' like the KDE "config loss" problem a few years ago, the response is "well, that's what you get for not doing fsync".

Both responses are correct, from a purely technical point of view. But what's missing is any way to get the result of ordered I/O that will let you do something pretty fast, but with the guarantee that, if you lose data in a crash, the only loss you are risking is that your most recent data may be missing (either for one file, or using multiple files if that's what it takes).

Since this topic came up again, I figured I'd poke a bit and try to either get educated on how to do this "right" or try and see if there's something that could be added to the kernel to make it possible for userspace programs to do this.

What I think userspace really needs is something like a barrier function call: "for this fd, don't re-order writes as they go down through the stack".

If the hardware is going to reorder things once it hits the hardware, this is going to hurt performance (how much depends on a lot of stuff), but the filesystems are able to make their journals work, so there should be some way to let userspace do some sort of similar ordering.

David Lang
Re: [sqlite] light weight write barriers
On Thu, Oct 25, 2012 at 12:18:47AM -0500, Nico Williams wrote:
>
> By trusting fsync(). And if you don't care about immediate Durability
> you can run the fsync() in a background thread and mark the associated
> transaction as completed in the next transaction to be written after
> the fsync() completes.

The challenge is when you have entangled metadata updates. That is, you update file A, and file B, and file A and B might share metadata. In order to sync file A, you also have to update part of the metadata for the updates to file B, which means calculating the dependencies of what you have to drag in can get very complicated. You can keep track of what bits of the metadata you have to undo and then redo before writing out the metadata for fsync(A), but that basically means you have to implement soft updates, and all of the complexity this implies: http://lwn.net/Articles/339337/

If you can keep all of the metadata separate, this can be somewhat mitigated, but usually the block allocation records (regardless of whether you use a tree, or a bitmap, or some other data structure) tend to have entanglement problems.

It certainly is not impossible; RDBMS's have implemented this. On the other hand, they generally aren't as fast as file systems for non-transactional workloads, and people really care about performance on those sorts of workloads for file systems. (About a decade ago, Oracle tried to claim that you could run file system workloads using an Oracle database as a back-end. Everyone laughed at them, and the idea died a quick, merciful death.)

Still, if you want to try to implement such a thing, by all means, give it a try. But I think you'll find that creating a file system that can compete with existing file systems for performance, and *then* also supports a transactional model, is going to be quite a challenge.

- Ted
Re: [sqlite] light weight write barriers
On Wed, Oct 24, 2012 at 03:03:00PM -0700, da...@lang.hm wrote:
> Like what is being described for sqlite, losing the tail end of the
> messages is not a big problem under normal conditions. But there is
> a need to be sure that what is there is complete up to the point
> where it's lost.
>
> this is similar in concept to write-ahead-logs done for databases
> (without the absolute durability requirement)

If that's what you require, and you are using ext3/4, using data journalling might meet your requirements. It's something you can enable on a per-file basis, via chattr +j; you don't have to force all file systems to use data journaling via the data=journalled mount option.

The potential downsides that you may or may not care about for this particular application:

(a) This will definitely have a performance impact, especially if you are doing lots of small (less than 4k) writes, since the data blocks will get run through the journal, and only later be written to their final location on disk.

(b) You don't get atomicity if the write spans a 4k block boundary. All of the bytes before i_size will be written, so you don't have to worry about "holes"; but the last message written to the log file might be truncated.

(c) There will be a performance impact, since the contents of data blocks will be written at least twice (once to the journal, and once to the final location on disk). If you do lots of small, sub-4k writes, the performance might be even worse, since data blocks might be written multiple times to the journal.

- Ted
Re: [sqlite] light weight write barriers
On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:
> Yes, SCSI has full support for ordered/simple commands designed
> exactly for that task: to have a steady flow of commands even in the
> case when some of them are ordered.

SCSI does, yes --- *if* the device actually implements Tagged Command Queuing (TCQ). Not all devices do.

More importantly, SATA drives do *not* have this capability, and when you compare the price of SATA drives to uber-expensive "enterprise drives", it's not surprising that most people don't actually use SCSI/SAS drives that have implemented TCQ.

SATA's Native Command Queuing (NCQ) is not equivalent; this allows the drive to reorder requests (in particular read requests) so they can be serviced more efficiently, but it does *not* allow the OS to specify a partial, relative ordering of requests.

Yes, you can turn off writeback caching, but that has pretty huge performance costs; and there is the FUA bit, but that's just an unconditional high-priority bypass of the writeback cache, which is useful in some cases, but which again does not give the OS the ability to specify a partial order, while letting the drive reorder other requests for efficiency/performance's sake, since the drive has a lot more information about the optimal way to reorder requests based on the current location of the drive head and where certain blocks may have been remapped due to bad block sparing, etc.

> Hopefully, eventually the storage developers will realize the value
> behind ordered commands and learn corresponding SCSI facilities to
> deal with them.

Eventually, drive manufacturers will realize that trying to price gouge people who want advanced features such as TCQ, DIF/DIX, is the best way to guarantee that most people won't bother to purchase them, and hence the features will remain largely unused.

- Ted
Re: [sqlite] light weight write barriers
On Wed, 24 Oct 2012, Nico Williams wrote:

> On Wed, Oct 24, 2012 at 5:03 PM, wrote:
>> I'm doing some work with rsyslog and its disk-based queues and there
>> is a similar issue there. The good news is that we can have a version
>> that is linux specific (rsyslog is used on other OSs, but there is an
>> existing queue implementation that they can use; if the faster one is
>> linux-only, but is significantly faster, that's just a win for Linux)
>>
>> Like what is being described for sqlite, losing the tail end of the
>> messages is not a big problem under normal conditions. But there is a
>> need to be sure that what is there is complete up to the point where
>> it's lost.
>>
>> this is similar in concept to write-ahead-logs done for databases
>> (without the absolute durability requirement)
>>
>> [...]
>>
>> I am not fully understanding how what you are describing (COW,
>> separate fsync threads, etc) would be implemented on top of existing
>> filesystems. Most of what you are describing seems like it requires
>> access to the underlying storage to implement.
>>
>> could you give a more detailed explanation?
>
> COW is "copy on write", which is actually a bit of a misnomer -- all
> COW means is that blocks aren't over-written; instead new blocks are
> written. In particular this means that inodes, indirect blocks, data
> blocks, and so on, that are changed are actually written to new
> locations, and the on-disk format needs to handle this indirection.

so how can you do this, and keep the writes in order (especially between two files), without being the filesystem?

> As for fsync() and background threads... fsync() is synchronous, but
> in this scheme we want it to happen asynchronously and then we want to
> update each transaction with a pointer to the last transaction that is
> known stable given an fsync()'s return.

If you could specify ordering between two writes, I could see a process along the lines of:

Append new message to file1.
Append tiny status updates to file2.
Every million messages, move to new files. Once the last message has been processed for the old set of files, delete them.

Since file2 is small, you can reconstruct state fairly cheaply.

But unless you are a filesystem, how can you make sure that the message data is written to file1 before you write the metadata about the message to file2?

Right now it seems that there is no way for an application to do this other than doing an fsync(file1) before writing the metadata to file2.

And there is no way for the application to tell the filesystem to write the data in file2 in order (to make sure that block 3 is not written and then have the system crash before block 2 is written), so the application needs to do frequent fsync(file2) calls.

If you need complete durability of your data, there are well documented ways of enforcing it (including the lwn.net article http://lwn.net/Articles/457667/ ). But if you don't need the guarantee that your data is on disk now, and you just need to have it ordered so that if you crash you can be guaranteed only to lose data off of the tail of your file, there doesn't seem to be any way to do this other than using the fsync() hammer and waiting for the overhead of forcing the data to disk now.

Or, as I type this, it occurs to me that you may be saying that every time you want an ordering guarantee, spawn a new thread to do the fsync and then just keep processing. The fsync will happen at some point, and the writes will not be re-ordered across the fsync, but you can keep going, writing more data while the fsync's are pending.

Then if you have a filesystem and I/O subsystem that can consolidate the fsyncs from all the different threads together into one I/O operation without having to flush the entire I/O queue for each one, you can get acceptable performance, with ordering. If the system crashes, data that hasn't had its fsync() complete will be the only thing that is lost.

David Lang
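A minimal sketch of that spawn-a-thread-per-fsync idea, purely illustrative: the file name is made up, error handling is minimal, and fsync() here still only orders what has reached the file by the time it runs:

/* Write a record, then hand the fsync to a detached background
 * thread so the writer never blocks on the flush. */
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

static void *fsync_worker(void *arg)
{
    int fd = (int)(long)arg;
    fsync(fd);                 /* durability point for all writes so far */
    return NULL;
}

static void write_and_async_fsync(int fd, const char *msg)
{
    write(fd, msg, strlen(msg));          /* append the record */
    pthread_t t;
    pthread_attr_t a;
    pthread_attr_init(&a);
    pthread_attr_setdetachstate(&a, PTHREAD_CREATE_DETACHED);
    pthread_create(&t, &a, fsync_worker, (void *)(long)fd);
    pthread_attr_destroy(&a);
    /* caller continues immediately; the fsync completes in background */
}

int main(void)
{
    int fd = open("queue.log", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) return 1;
    write_and_async_fsync(fd, "message 1\n");
    write_and_async_fsync(fd, "message 2\n");
    sleep(1);   /* a real program would wait for pending fsyncs before exit */
    close(fd);
    return 0;
}

Whether the kernel can actually consolidate the resulting flushes, rather than draining the whole I/O queue for each one, is exactly the open question in this thread.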
Re: [sqlite] light weight write barriers
On Wed, 24 Oct 2012, Nico Williams wrote:

>> Before that happens, people will keep returning again and again with
>> those simple questions: why must the queue be flushed for any ordered
>> operation? Isn't it an obvious overkill?
>
> That [cache flushing] is not what's being asked for here. Just a
> light-weight barrier. My proposal works without having to add new
> system calls: a) use a COW format, b) have background threads doing
> fsync()s, c) in each transaction's root block note the last
> known-committed (from a completed fsync()) transaction's root block,
> d) have an array of well-known uberblocks large enough to accommodate
> as many transactions as possible without having to wait for any one
> fsync() to complete, e) do not reclaim space from any one past
> transaction until at least one subsequent transaction is fully
> committed. This obtains ACI- transaction semantics (survives power
> failures but without durability for the last N transactions at
> power-failure time) without requiring changes to the OS at all, and
> with support for delayed D (durability) notification.

I'm doing some work with rsyslog and its disk-based queues and there is a similar issue there. The good news is that we can have a version that is linux specific (rsyslog is used on other OSs, but there is an existing queue implementation that they can use; if the faster one is linux-only, but is significantly faster, that's just a win for Linux).

Like what is being described for sqlite, losing the tail end of the messages is not a big problem under normal conditions. But there is a need to be sure that what is there is complete up to the point where it's lost.

this is similar in concept to write-ahead-logs done for databases (without the absolute durability requirement)

1. new messages arrive and get added to the end of the queue file.
2. a thread updates the queue to indicate that it is in the process of delivering a block of messages
3. the thread updates the queue to indicate that the block of messages has been delivered
4. garbage collection happens to delete the old messages to free up space (if queues go into files, this can just be to limit the file size, spilling to multiple files; when an old file is completely marked as delivered, delete it)

I am not fully understanding how what you are describing (COW, separate fsync threads, etc) would be implemented on top of existing filesystems. Most of what you are describing seems like it requires access to the underlying storage to implement.

could you give a more detailed explanation?

David Lang
Re: [sqlite] transaction commit is successful despite I/O error
I was probably not quite specific, so I would like to rephrase the problem and give more details.

I run a default configuration of SQLite on Linux and I don't use WAL or anything that changes transaction behavior. A database is located on a remote file server and accessed via NFS. There is a single process only that accesses the database. The problem is that when the auto-mounter is stopped (on the machine where the process is running) in the middle of a database transaction, commit returns successfully though it leaves a hot journal on disk, because the journal is not seen for unlink (ENOENT)!!!

Note, when the auto-mounter is down, you can still operate on a database file and its journal if they were opened before the stop.

The problem can be reproduced very easily with an SQLite command shell:

1) Run the shell
2) Begin exclusive transaction
3) Insert a few records
4) Stop auto-mounter
5) Optionally insert a few more records (the stopped auto-mounter does not affect these inserts because the database and journal are already open)
6) Commit (no errors!!!)
7) Quit the shell

If you then restart the auto-mounter and check the database directory you will find a hot journal! If you reopen the database the transaction will be rolled back and the recently committed changes will be gone!

Thanks for your feedback,
Efim
[sqlite] Bug report concerning the juliandate function
Dear sir or madam,

I have tested sqlite version 3.6 and version 3.7.14.1 with the following inputs:

select julianday('2005-02-29');
select julianday('2005-02-30');

In both cases, it should have returned NULL, but it returned a number.

In the first case, the year 2005 is not a leap year; therefore, this date is not valid. In the second case, February never has 30 days.

Regards,

Kagan Kayal
Re: [sqlite] how to select " char in sqlite
On Thu, 25 Oct 2012 01:20:24 +, YAN HONG YE wrote:

> char bh1[320];
> memset(bh1,0,320);
> strcpy(bh1,"sqlite3 -html -header t9_engine.db \"select id,partnumber,substr(\'\',1,180) as img,pcs from engine where id>7;\" >> n.html");
> system(bh1); //here couldn't work

As we said before, "here couldn't work" is insufficient to describe your problem. Please show us the contents of bh1 just before the system() call, and the error codes and error messages returned.

It is bad practice to shell out to the sqlite3 commandline tool from within a program. As you see, you'll get all kinds of quoting issues (probably the root cause of the failure here). You are supposed to use the API, either the C API http://sqlite.org/c3ref/intro.html or some API provided by a "wrapper" for the computer language your application is coded in. There is excellent sample code on the internet, like http://icculus.org/~chunky/stuff/sqlite3_example/sqlite3_example_bind.c

By the way, I don't think the substr() is at the right place.

Hope this helps.

--
Groet, Cordialement, Pozdrawiam, Regards,
Kees Nuyt
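A minimal sketch of the C-API route suggested above: run the query directly and write the HTML yourself, so no shell quoting is involved and the " character can be emitted literally. The "imgpath" column and the HTML row layout are assumptions about the original schema, not what sqlite3 -html produces:

/* Query the engine table through the SQLite C API and emit HTML rows
 * with literal " characters -- no escaping battle with system().
 * The 'imgpath' column is hypothetical. */
#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;
    if (sqlite3_open("t9_engine.db", &db) != SQLITE_OK) return 1;

    const char *sql =
        "SELECT id, partnumber, imgpath, pcs FROM engine WHERE id > ?";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK) return 1;
    sqlite3_bind_int(stmt, 1, 7);              /* id > 7, as in the post */

    FILE *out = fopen("n.html", "a");
    if (out == NULL) return 1;
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        fprintf(out, "<tr><td>%d</td><td>%s</td>"
                     "<td><img src=\"%s\"></td><td>%d</td></tr>\n",
                sqlite3_column_int(stmt, 0),
                (const char *)sqlite3_column_text(stmt, 1),
                (const char *)sqlite3_column_text(stmt, 2),
                sqlite3_column_int(stmt, 3));
    }
    fclose(out);
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}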