Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-12 Thread Yue
Hello,
Thanks for your advice!
It works pretty well now that I've dropped the rowid and made the other field
the primary key.
I also enabled WAL mode, at the cost of a little read performance.
Overall, it works really well now.

Really appreciate your help.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-12 Thread Yue
Thanks for the advice!

I would never have thought of this. I really learned a lot.

It's a customized Android phone-type device.





Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-11 Thread Eric Grange
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE
> DS = 15;".

Not zipvfs-specific, but that kind of update can be quite inefficient if
the record is large (in terms of bytes, or records per database page), as
your table declaration hints.

This will be especially prominent with compression, as a single byte
changed early in the uncompressed stream can lead to changing pretty much
every byte of the compressed data, and so is bound to defeat any
optimizations the storage could have to handle unmodified pages and data.
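This is easy to demonstrate with plain zlib (the codec the OP is using): compressing the same 4 KiB "page" after flipping one early byte typically produces a stream that diverges early, so nearly the whole compressed page has to be rewritten. A quick sketch (the page contents are arbitrary stand-in data):

```python
import zlib

page = bytes(range(256)) * 16             # a 4 KiB stand-in for a database page
original = zlib.compress(page)

mutated = bytearray(page)
mutated[3] ^= 0xFF                        # flip a single byte near the start
recompressed = zlib.compress(bytes(mutated))

# Once the deflate bitstream diverges, essentially everything after that
# point changes, so the storage layer sees an almost entirely new page.
changed = sum(a != b for a, b in zip(original, recompressed))
print(f"{changed} of {len(original)} compressed bytes differ")
```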

Depending on your update patterns and frequencies, it could thus be
beneficial to split your table into two tables (or more) with a common
primary key:
- one table for the rarely changed fields (even better if they are
immutable)
- one table (or more) for the fields that change often

This will increase the database size and reduce insert performance, and it
can complicate querying when you need everything (extra inner joins), but
it can help reduce the amount of raw data that is affected by updates quite
drastically.
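The split can be sketched with Python's bundled sqlite3; the `ITEM_STATIC`/`ITEM_HOT` table names and the choice of which fields count as "hot" are hypothetical, not from the original schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Rarely-changed fields live in one table; the frequently updated FIELD4
# goes in a narrow companion table sharing the same key.
con.executescript("""
    CREATE TABLE ITEM_STATIC (
        DS     TEXT PRIMARY KEY,
        FIELD0 NUMERIC,
        FIELD1 NUMERIC
        -- ... remaining rarely-changed columns ...
    );
    CREATE TABLE ITEM_HOT (
        DS     TEXT PRIMARY KEY,
        FIELD4 NUMERIC
    );
""")
con.execute("INSERT INTO ITEM_STATIC VALUES ('15', 1, 2)")
con.execute("INSERT INTO ITEM_HOT VALUES ('15', 0)")

# Updates now touch only the narrow table, so far fewer bytes
# (and, under compression, far fewer pages) change per update.
con.execute("UPDATE ITEM_HOT SET FIELD4 = 10 WHERE DS = '15'")

# Reading everything back costs an extra join.
row = con.execute("""
    SELECT s.FIELD0, s.FIELD1, h.FIELD4
    FROM ITEM_STATIC AS s JOIN ITEM_HOT AS h USING (DS)
    WHERE s.DS = '15'
""").fetchone()
print(row)  # (1, 2, 10)
```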

I have been able to use the above strategy successfully, though that was
under Windows compressed folders, so YMMV.


On Thu, Sep 7, 2017 at 9:13 PM, Yue Wu  wrote:

> Hello,
> As mentioned in the subject, our goal is to improve the performance of
> batch SQL updates.
> The update SQL is as follows:
>
> > UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
> = 15;".
>
> Any suggestions are very welcome.
>
> Some background:
> We have an app running on Android 4.2 using zipvfs.
> The page size is 4096
> The cache size is -2000
>
> Table schema for ITEM
>
> > CREATE TABLE ITEM (
> > FIELD0 NUMERIC,
> > FIELD1 NUMERIC,
> > DS TEXT,
> > FIELD2 TEXT,
> > FIELD3 TEXT,
> > FIELD4 NUMERIC,
> > FIELD5 NUMERIC,
> > FIELD6 NUMERIC,
> > FIELD7 NUMERIC,
> > FIELD8 NUMERIC,
> > FIELD9 NUMERIC,
> > FIELD10 NUMERIC,
> > FIELD11 TEXT);
>
>
> The third column: "DS" is what we query by almost all the time. We also
> created index:
>
> > CREATE INDEX DS_INDEX ON ITEM(DS);
>
>
> There are 30 million records: a zipvfs size of 471 MB vs. an uncompressed db
> (the one shipped with Android 4.2) size of 1.39 GB.
> The zipvfs db uses zlib and aes128, which are the defaults.
>
> Thanks
> --
>
> Yue Wu |  Android Developer
>


Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-09 Thread Simon Slavin


On 9 Sep 2017, at 9:11pm, Teg  wrote:

> Is  it a phone or tablet?  I'd wonder if the CPU is going into thermal
> throttle  mode.  As I understand it most phones and tablets can't keep
> the CPU's running full speed without eventually throttling the CPU.

I would never have thought of that unless I had the device in my hands.  Well 
done.

Yes.  You’re trying to process 100,000 updates, and your device is working flat 
out for 7 minutes, or for 20 minutes.  How hot is it?  If you do it unplugged, 
how much does that drain the battery?

I don’t have experience with Android, but I can tell you that iOS will have to 
throttle any process that attempts to process data continuously for 7 
minutes.

Simon.


Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-09 Thread Teg
Hello Yue,

Is  it a phone or tablet?  I'd wonder if the CPU is going into thermal
throttle  mode.  As I understand it most phones and tablets can't keep
the CPU's running full speed without eventually throttling the CPU.

To me, the fact that the uncompressed version, while moving much more data, is
more than 2 times faster suggests the bottleneck isn't the file I/O.

Thursday, September 7, 2017, 3:13:04 PM, you wrote:

YW> Hello,
YW> As mentioned in the subject, our goal is to improve the performance of
YW> batch SQL updates.
YW> The update SQL is as follows:

>> UPDATE ITEM SET FIELD4 =? WHERE DS=?

YW> We run 100,000 updates in a single transaction. The zipvfs version takes
YW> about 20 min while uncompressed version takes about 7 min.
YW> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
YW> = 15;".

YW> Any suggestions are very welcome.

YW> Some background:
YW> We have an app running on Android 4.2 using zipvfs.
YW> The page size is 4096
YW> The cache size is -2000

YW> Table schema for ITEM

>> CREATE TABLE ITEM (
>> FIELD0 NUMERIC,
>> FIELD1 NUMERIC,
>> DS TEXT,
>> FIELD2 TEXT,
>> FIELD3 TEXT,
>> FIELD4 NUMERIC,
>> FIELD5 NUMERIC,
>> FIELD6 NUMERIC,
>> FIELD7 NUMERIC,
>> FIELD8 NUMERIC,
>> FIELD9 NUMERIC,
>> FIELD10 NUMERIC,
>> FIELD11 TEXT);


YW> The third column: "DS" is what we query by almost all the time. We also
YW> created index:

>> CREATE INDEX DS_INDEX ON ITEM(DS);


YW> There are 30 million records: a zipvfs size of 471 MB vs. an uncompressed db
YW> (the one shipped with Android 4.2) size of 1.39 GB.
YW> The zipvfs db uses zlib and aes128, which are the defaults.

YW> Thanks



-- 
Teg  mailto:t...@djii.com



Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-09 Thread Yue
Thanks for the advice. I'll try them now.





Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-08 Thread Barry Smith
Are your updates sorted by DS? If they are, then sequential updates are more 
likely to hit the same db pages while searching the index, resulting in more 
cache hits and fewer decompression operations. This would have less benefit if 
the 100k DS values of your updates are randomly distributed through the 30m 
available, and more of an effect if they're tightly clustered in some range.
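A minimal sketch of that idea, using Python's bundled sqlite3 and a cut-down version of the OP's schema (the sample DS values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ITEM (DS TEXT, FIELD4 NUMERIC)")
con.execute("CREATE INDEX DS_INDEX ON ITEM(DS)")
con.executemany("INSERT INTO ITEM VALUES (?, 0)",
                [(str(i),) for i in range(1000)])

# (FIELD4, DS) pairs arriving in arbitrary order.
updates = [(10, "900"), (11, "5"), (12, "450")]

# Applying them in DS order keeps consecutive updates on neighbouring index
# and table pages: more cache hits and, under zipvfs, fewer pages
# decompressed and recompressed.
con.executemany("UPDATE ITEM SET FIELD4 = ? WHERE DS = ?",
                sorted(updates, key=lambda u: u[1]))
con.commit()

val = con.execute("SELECT FIELD4 FROM ITEM WHERE DS = '450'").fetchone()[0]
print(val)  # 12
```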

If you mainly query by DS, is that the same as saying it is your primary key? 
If so, have you tried declaring it so, making the table WITHOUT ROWID, and 
dropping the index? It may help, since you wouldn't have to decompress 
both the index pages and the table pages.
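Assuming DS really is unique, the layout would look like this (columns trimmed for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# With DS as the PRIMARY KEY of a WITHOUT ROWID table, the row data lives
# in the primary-key b-tree itself, so an update by DS touches one tree
# instead of a separate index plus the table.
con.execute("""
    CREATE TABLE ITEM (
        DS     TEXT PRIMARY KEY,
        FIELD4 NUMERIC
        -- ... other columns ...
    ) WITHOUT ROWID
""")
con.execute("INSERT INTO ITEM VALUES ('15', 0)")
con.execute("UPDATE ITEM SET FIELD4 = 10 WHERE DS = '15'")

val = con.execute("SELECT FIELD4 FROM ITEM WHERE DS = '15'").fetchone()[0]
print(val)  # 10
```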

> On 8 Sep 2017, at 12:33 pm, Dominique Pellé  wrote:
> 
> Yue Wu  wrote:
> 
>> Hello,
>> As mentioned in the subject, our goal is to improve the performance of
>> batch SQL updates.
>> The update SQL is as follows:
>> 
>>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>> 
>> We run 100,000 updates in a single transaction. The zipvfs version takes
>> about 20 min while uncompressed version takes about 7 min.
> 
> Which compression algorithm do you use with zipvfs?
> Try LZ4; it's much faster than zlib at compression and
> decompression, at the cost of compressing a bit less:
> 
> https://github.com/lz4/lz4
> 
> Or try zstd, which can compress more than zlib and faster (especially
> on 64-bit architectures), but it's not as fast as LZ4:
> 
> http://facebook.github.io/zstd/
> 
> Also, what is your SQLite page size?
> Reducing the SQLite page size probably helps to speed up
> updates, since zipvfs compresses by pages.
> 
> Dominique


Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-08 Thread Dominique Pellé
Yue Wu  wrote:

> Hello,
> As mentioned in the subject, our goal is to improve the performance of
> batch SQL updates.
> The update SQL is as follows:
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.

Which compression algorithm do you use with zipvfs?
Try LZ4; it's much faster than zlib at compression and
decompression, at the cost of compressing a bit less:

https://github.com/lz4/lz4

Or try zstd, which can compress more than zlib and faster (especially
on 64-bit architectures), but it's not as fast as LZ4:

http://facebook.github.io/zstd/

Also, what is your SQLite page size?
Reducing the SQLite page size probably helps to speed up
updates, since zipvfs compresses by pages.
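For reference, a page-size change only takes effect on a fresh database (or after a VACUUM rebuilds an existing one); a sketch with an invented file name:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "small_pages.db")
con = sqlite3.connect(path)

# Set page_size before the first page is written; smaller pages mean less
# data to recompress per changed record under a page-compressing VFS.
con.execute("PRAGMA page_size = 1024")
con.execute("CREATE TABLE ITEM (DS TEXT PRIMARY KEY, FIELD4 NUMERIC)")
con.commit()

size = con.execute("PRAGMA page_size").fetchone()[0]
print(size)  # 1024
```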

Dominique


[sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-08 Thread Yue Wu
Hello,
As mentioned in the subject, our goal is to improve the performance of
batch SQL updates.
The update SQL is as follows:

> UPDATE ITEM SET FIELD4 =? WHERE DS=?

We run 100,000 updates in a single transaction. The zipvfs version takes
about 20 min while uncompressed version takes about 7 min.
I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
= 15;".

Any suggestions are very welcome.

Some background:
We have an app running on Android 4.2 using zipvfs.
The page size is 4096
The cache size is -2000

Table schema for ITEM

> CREATE TABLE ITEM (
> FIELD0 NUMERIC,
> FIELD1 NUMERIC,
> DS TEXT,
> FIELD2 TEXT,
> FIELD3 TEXT,
> FIELD4 NUMERIC,
> FIELD5 NUMERIC,
> FIELD6 NUMERIC,
> FIELD7 NUMERIC,
> FIELD8 NUMERIC,
> FIELD9 NUMERIC,
> FIELD10 NUMERIC,
> FIELD11 TEXT);


The third column: "DS" is what we query by almost all the time. We also
created index:

> CREATE INDEX DS_INDEX ON ITEM(DS);


There are 30 million records: a zipvfs size of 471 MB vs. an uncompressed db
(the one shipped with Android 4.2) size of 1.39 GB.
The zipvfs db uses zlib and aes128, which are the defaults.

Thanks
-- 

Yue Wu |  Android Developer
sqlite> EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------
0     Init           0     28    0                    00  Start at 28
1     Null           0     1     2                    00  r[1..2]=NULL
2     OpenWrite      0     3     0     13             08  root=3 iDb=0; ITEM
3     OpenWrite      1     4     0     k(2,,)         02  root=4 iDb=0; ds_index
4     Integer        15    16    0                    00  r[16]=15
5     Affinity       16    1     0     B              00  affinity(r[16])
6     SeekGE         1     27    16    1              00  key=r[16]
7       IdxGT        1     27    16    1              00  key=r[16]
8       DeferredSeek 1     0     0                    00  Move 0 to 1.rowid if needed
9       Rowid        0     2     0                    00  r[2]=rowid
10      IsNull       2     27    0                    00  if r[2]==NULL goto 27
11      Column       0     0     3                    00  r[3]=ITEM.FIELD0
12      Column       0     1     4                    00  r[4]=ITEM.FIELD1
13      Column       0     2     5                    00  r[5]=ITEM.DS
14      Column       0     3     6                    00  r[6]=ITEM.FIELD2
15      Column       0     4     7                    00  r[7]=ITEM.FIELD3
16      Integer      10    8     0                    00  r[8]=10
17      Column       0     6     9                    00  r[9]=ITEM.FIELD5
18      Column       0     7     10                   00  r[10]=ITEM.FIELD6
19      Column       0     8     11                   00  r[11]=ITEM.FIELD7
20      Column       0     9     12                   00  r[12]=ITEM.FIELD8
21      Column       0     10    13                   00  r[13]=ITEM.FIELD9
22      Column       0     11    14                   00  r[14]=ITEM.FIELD10
23      Column       0     12    15                   00  r[15]=ITEM.FIELD11
24      MakeRecord   3     13    17    CCBBBCCCB      00  r[17]=mkrec(r[3..15])
25      Insert       0     17    2     ITEM           07  intkey=r[2] data=r[17]
26    Next           1     7     1                    00
27    Halt           0     0     0                    00
28    Transaction    0     1     4     0              01  usesStmtJournal=0
29    Goto           0     1     0                    00


Re: [sqlite] seeking advice

2014-12-10 Thread Paul
Hello, Rene

> Hi there,
> 
> I have to store and retrieve up to 2000 parameters.
> The parameters can have real and integer values.
> The max. change rate is 100ms and the max. duration is up to some hours.
> 
> The simple solution would be to use plain binary files. It's fast but not 
> flexible.
> 
> So I came to sqlite. Before starting I would like to ask if someone could 
> give 
> me some advice which way to go. Here are my current versions.
> 
> Version 1: One big table with time stamp and one column for each parameter
> - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all values 
> with only SQLITE_MAX_VARIABLE_NUMBER=999?
> - What about space consumption if NULL values are given?

You can always recompile SQLite with the desired parameters.
Or you can write your own implementation of a virtual table.
In my opinion, a virtual table would be the most efficient 
way to feed data into SQLite.
Example:

INSERT INTO my_table(timestamp, param_1, ..., param_n) 
SELECT datetime('now'), _1, _2, ..., _n FROM my_virtual_table;

Though implementing a virtual table correctly will take time
that you could spend elsewhere.
The rule of thumb: do not bother with a virtual table if CPU is not the bottleneck.

Use this approach if you always need to query all of the parameters.
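If recompiling or a virtual table is too much, a plainer workaround for the 999-placeholder limit is to split one wide row across an INSERT plus follow-up UPDATEs. A sketch with invented table/column names (1500 parameters here, since the default column limit is 2000 including the timestamp):

```python
import sqlite3

NUM_PARAMS = 1500   # stays under the default 2000-column limit (ts + params)
MAX_VARS = 999      # historical default SQLITE_MAX_VARIABLE_NUMBER

cols = [f"param_{i}" for i in range(NUM_PARAMS)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (ts INTEGER PRIMARY KEY, "
            + ", ".join(c + " NUMERIC" for c in cols) + ")")

def insert_row(ts, values):
    """Insert one wide row without ever binding more than MAX_VARS values."""
    head = MAX_VARS - 1  # leave one placeholder for ts
    con.execute(
        "INSERT INTO samples (ts, " + ", ".join(cols[:head]) + ") VALUES ("
        + ", ".join("?" * (head + 1)) + ")",
        [ts] + values[:head])
    # Fill in the remaining columns with follow-up UPDATEs, again keeping
    # each statement under the placeholder limit.
    for start in range(head, NUM_PARAMS, MAX_VARS - 1):
        chunk = cols[start:start + MAX_VARS - 1]
        con.execute(
            "UPDATE samples SET " + ", ".join(c + " = ?" for c in chunk)
            + " WHERE ts = ?",
            values[start:start + len(chunk)] + [ts])

insert_row(0, list(range(NUM_PARAMS)))
val = con.execute("SELECT param_1499 FROM samples WHERE ts = 0").fetchone()[0]
print(val)  # 1499
```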

> 
> Version 2: One table for each parameter with time stamp and value
> - Does this scale for up to 2000 parameters?
> - Can sqlite handle so much tables?
> 

First of all, it depends on your usage. If at any given point of
time you may need only one or a few parameters and don't care about
the others, you should group them in separate tables, or even give each
its own table. In this case, querying individual parameters or 
groups of parameters will be faster, since SQLite will not pick up 
extra data from the disk. But you must realise that each table will have 
its own copy of the timestamp, and potentially a ROWID (for non-WITHOUT ROWID
tables), that will eat up extra space. Also, inserts/updates will be much more
expensive, because data in the database will be fragmented.
So, for example, if inserting one row into a 2000-column table takes
writing a couple of pages to disk, inserting into 2000 tables would cause 
at least 2000 pages to be written.

Consider also that in this case it would be a nightmare to 
make sure every timestamp value is associated with a value of every parameter
(if you have to provide this guarantee).

Yes, SQLite can handle this many tables.

> Version 3: One table with time stamp, parameter id and parameter value
> - Is it working when all values change?
> - Is retrieving values for one parameter fast?

For me, this looks as the most efficient way of implementing what you want.

Retrieving speed will depend on the indexes you have.
If, when querying parameters, you always specify a timestamp
or time range, you can get away with a compound index (timestamp, parameter_id).
Then selects like:

SELECT value FROM table WHERE timestamp = X;
SELECT value FROM table WHERE timestamp = X AND ...;
SELECT value FROM table WHERE timestamp > X;
SELECT value FROM table WHERE timestamp < X;
SELECT value FROM table WHERE timestamp > X AND timestamp < Y;
SELECT value FROM table WHERE timestamp IN (X1, ..., Xn);


will always be efficient.

Do not create a separate index like this; make it the PRIMARY KEY.
And experiment with how WITHOUT ROWID performs.
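A sketch of that "version 3" layout with a compound primary key (table and column names invented, small sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Compound primary key instead of a separate index; WITHOUT ROWID stores the
# rows directly in the (timestamp, parameter_id) b-tree.
con.execute("""
    CREATE TABLE params (
        timestamp    INTEGER,
        parameter_id INTEGER,
        value        NUMERIC,
        PRIMARY KEY (timestamp, parameter_id)
    ) WITHOUT ROWID
""")

rows = [(t, p, t * 1000 + p) for t in range(3) for p in range(5)]
con.executemany("INSERT INTO params VALUES (?, ?, ?)", rows)

# Queries constrained on timestamp (the leftmost key column) walk a
# contiguous range of the primary-key tree.
vals = [v for (v,) in con.execute(
    "SELECT value FROM params WHERE timestamp = 1 ORDER BY parameter_id")]
print(vals)  # [1000, 1001, 1002, 1003, 1004]
```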


> 
> I will write and read the data at the same time. But writing should have 
> priority. I could live with a potential loss of data because of a power 
> failure. So I will try the settings of the options "synchronous" and 
> "journal_mode". Anything else I should consider?
> 

You may also play around with temp_store and cache_size pragmas.


Re: [sqlite] seeking advice

2014-12-09 Thread RSmith


On 2014/12/09 22:41, Rene Zaumseil wrote:

Hi there,

I have to store and retrieve up to 2000 parameters.
The parameters can have real and integer values.
The max. change rate is 100ms and the max. duration is up to some hours.

The simple solution would be to use plain binary files. It's fast but not
flexible.

So I came to sqlite. Before starting I would like to ask if someone could give
me some advice which way to go. Here are my current versions.

Version 1: One big table with time stamp and one column for each parameter
   - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all values
with only SQLITE_MAX_VARIABLE_NUMBER=999?
   - What about space consumption if NULL values are given?

Version 2: One table for each parameter with time stamp and value
   - Does this scale for up to 2000 parameters?
   - Can sqlite handle so much tables?

Version 3: One table with time stamp, parameter id and parameter value
   - Is it working when all values change?
   - Is retrieving values for one parameter fast?


Definitely No. 3 always. SQLite will retrieve a value from among 2000 items on an indexed column in under 1ms (as will any other 
localized DB). Even I/O lag will be masked by the cache at that size.


Assuming the parameter names do not change all the time and the time-stamp 
column is not indexed, writing will be similarly fast using a suitable journal 
mode. On this point, you might simply use an in-memory DB, since you do not care 
about data loss on power failure (use ":memory:" as the file name); that will be 
even faster, but with an on-disk DB you will have a snapshot of the current 
parameter set remain if your application fails, which might assist debugging.


Also, when making that DB, be sure to use "WITHOUT ROWID", specifying the 
parameter name as the primary key. Use NUMERIC affinity for the time-stamp and 
value columns.
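The suggestions above can be sketched like this (column names are hypothetical; swap ":memory:" for a file path to keep the on-disk snapshot):

```python
import sqlite3

# ":memory:" keeps everything in RAM: fastest, but gone on exit.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE params (
        name  TEXT PRIMARY KEY,
        ts    NUMERIC,
        value NUMERIC
    ) WITHOUT ROWID
""")

# Refresh the current parameter set each cycle; the primary key makes
# INSERT OR REPLACE overwrite the previous sample for the same name.
con.execute("INSERT OR REPLACE INTO params VALUES ('pressure', 0.1, 42.5)")
con.execute("INSERT OR REPLACE INTO params VALUES ('pressure', 0.2, 43.0)")

ts, value = con.execute(
    "SELECT ts, value FROM params WHERE name = 'pressure'").fetchone()
print(ts, value)  # 0.2 43.0
```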


Cheers,
Ryan






Re: [sqlite] seeking advice

2014-12-09 Thread Simon Slavin

On 9 Dec 2014, at 8:41pm, Rene Zaumseil  wrote:

> Version 3: One table with time stamp, parameter id and parameter value
>  - Is it working when all values change?
>  - Is retrieving values for one parameter fast?

That one.  Versions 1 & 2 will both, technically, work, but they're an abuse of 
how SQL should be used and will result in horrible code.

The speed for retrieving all parameters will be bound by your programming 
language.  SQLite will do its side of the job very quickly.  And since columns 
have just affinity and not type, having some values INTEGER and other REAL will 
work fine.

> I will write and read the data on the same time. But writing should have 
> priority.

Multithread ?  Multiprocess ?  Neither are needed, but those are the things you 
need to decide on next.

Also, do not forget to pick a journal mode and to set a busy_timeout.  Ah, I 
see you already mentioned journal_mode.  Good.
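For completeness, both settings are one-line PRAGMAs on the connection (the file name here is invented):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "params.db")
con = sqlite3.connect(path)

# WAL lets the writer proceed while readers are active; busy_timeout makes a
# blocked connection retry for up to 5 s instead of failing immediately.
mode = con.execute("PRAGMA journal_mode = WAL").fetchone()[0]
con.execute("PRAGMA busy_timeout = 5000")
print(mode)  # wal
```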

Simon.


[sqlite] seeking advice

2014-12-09 Thread Rene Zaumseil
Hi there,

I have to store and retrieve up to 2000 parameters.
The parameters can have real and integer values.
The max. change rate is 100ms and the max. duration is up to some hours.

The simple solution would be to use plain binary files. It's fast but not 
flexible.

So I came to sqlite. Before starting I would like to ask if someone could give 
me some advice which way to go. Here are my current versions.

Version 1: One big table with time stamp and one column for each parameter
  - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all values 
with only SQLITE_MAX_VARIABLE_NUMBER=999?
  - What about space consumption if NULL values are given?

Version 2: One table for each parameter with time stamp and value
  - Does this scale for up to 2000 parameters?
  - Can sqlite handle so much tables?

Version 3: One table with time stamp, parameter id and parameter value
  - Is it working when all values change?
  - Is retrieving values for one parameter fast?

I will write and read the data at the same time. But writing should have 
priority. I could live with a potential loss of data because of a power 
failure. So I will try the settings of the options "synchronous" and 
"journal_mode". Anything else I should consider?


Thank you
rene


RE: [sqlite] Seeking advice on On Demand Loading

2007-09-28 Thread Samuel R. Neff

I've found that the best trade-off in performance and memory for on-demand
loading is to first run a query which retrieves all the IDs of the items
you want in the list, and store that list in memory.  Then you can use it to
run a second query for the full data, using a where clause with "ID IN (...)".
This is much faster than using LIMIT/OFFSET, but has the overhead of maintaining
an array (or whatever) of IDs (fine for a few thousand; millions could be
problematic, but then again, a list with millions of records is problematic
no matter how you do it).
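The two-query pattern looks roughly like this (table name and 1000-row sample data invented; the slice plays the role of the Listview's visible "records 445 to 456"):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [(i, f"name{i}", "x" * 100) for i in range(1, 1001)])

# Query 1: fetch only the ids, in display order, and keep them in memory.
ids = [i for (i,) in con.execute("SELECT id FROM items ORDER BY name")]

# Query 2 (on demand): fetch full rows for just the visible slice.
visible = ids[444:456]
placeholders = ", ".join("?" * len(visible))
rows = con.execute(
    "SELECT id, name, body FROM items WHERE id IN (%s)" % placeholders,
    visible).fetchall()
print(len(rows))  # 12
```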

HTH,

Sam


---
We’re Hiring! Seeking a passionate developer to join our team building
products. Position is in the Washington D.C. metro area. If interested
contact [EMAIL PROTECTED]
 
-Original Message-
From: Olaf Beckman Lapré [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 28, 2007 3:48 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] Seeking advice on On Demand Loading

Hello,

I'm looking for some advice on implementing 'On Demand Loading' in my
program. Some GUI controls such as a Listview have a 'virtual mode' where
the control only needs to be loaded with the items currently shown. The
Listview will call a callback function where it indicates which items need
to be loaded. BTW I'm using wxWidgets for my GUI library.

My question centers on how to implement this using SQLite as I'm not really
sure how to retrieve 'record numbers 445 to 456' using SQL statements
efficiently.

Any thoughts on this?

Kind regards, Olaf



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Seeking advice on On Demand Loading

2007-09-28 Thread Olaf Beckman Lapré
Hello,

I'm looking for some advice on implementing 'On Demand Loading' in my program. 
Some GUI controls such as a Listview have a 'virtual mode' where the control 
only needs to be loaded with the items currently shown. The Listview will call 
a callback function where it indicates which items need to be loaded. BTW I'm 
using wxWidgets for my GUI library.

My question centers on how to implement this using SQLite as I'm not really 
sure how to retrieve 'record numbers 445 to 456' using SQL statements 
efficiently.

Any thoughts on this?

Kind regards, Olaf