Re: Design pattern for bulk data handling

2014-02-09 Thread Ben
Thank you for the comments everyone, however I think I must have phrased the 
question badly. I am not looking for assistance on speeding up SQLite, more for 
designing a cleaner API that works well with this. I am already pretty familiar 
with SQLite and it's various options.

To try and clarify:

I have an database object that looks something like this:

@interface MyDB : NSObject
- (id)initWithFileAtURL:(NSURL *)fileURL;
- (id)executeQuery:(NSString *)sql completionHandler:(myBlock)handler;
@end

Using -executeQuery:completionHandler: passes the sql string to an NSOperation 
which does the database-touching work and passes results back through the 
completion block.

I'm now trying to work out a tidy API of executing (up to) millions of SQL 
statements without building up a huge queue of operations. After sleeping on 
the matter, my current idea is to add a method to the MyDB class like:
- (void)importFromURL:(NSURL *)fileURL withOptions:(NSDictionary *)options 
completionHandler:(myBlock)handler;

This would allow me to pass key/value pairs with options for table or column 
mappings and have all the database-specific API grouped in one place.

- Ben


On 8 Feb 2014, at 18:20, Ben ben_cocoa_dev_l...@yahoo.co.uk wrote:

 Hi list,
 
 I'm looking for the right design pattern for providing different types of 
 access to an SQLite database.
 
 Currently I have a database object where queries are run on a serial 
 NSOperationQueue and each operation has a completion block for reporting its 
 results. The operation queue is an implementation detail behind a more basic 
 API.
 
 This is fine for most things, except that I sometimes need faster access to 
 the underlying database - for example, when importing/exporting data. In 
 these cases I'm after bulk data throughput without the overhead of 
 creating/destroying many NSOperations with completion handlers since there 
 can be in the order of millions of statements to handle.
 
 In the past I had simply broken encapsulation by exposing the underlying 
 database engine API. I'd rather find a better method than this for the future.
 
 Options considered so far:
 
 1. Continue exposing underlying DB engine
 2. Add a simpler synchronous API for use on a separate thread
 3. Expose NSOperationQueue and allow custom import operations to be queued
 4. Add importing functions to the database object itself
 
 Does anyone have any suggestions as to how best to approach this problem?
 
 - Ben


___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Design pattern for bulk data handling

2014-02-08 Thread Ben
Hi list,

I'm looking for the right design pattern for providing different types of 
access to an SQLite database.

Currently I have a database object where queries are run on a serial 
NSOperationQueue and each operation has a completion block for reporting its 
results. The operation queue is an implementation detail behind a more basic 
API.

This is fine for most things, except that I sometimes need faster access to the 
underlying database - for example, when importing/exporting data. In these 
cases I'm after bulk data throughput without the overhead of 
creating/destroying many NSOperations with completion handlers since there can 
be in the order of millions of statements to handle.

In the past I had simply broken encapsulation by exposing the underlying 
database engine API. I'd rather find a better method than this for the future.

Options considered so far:

1. Continue exposing underlying DB engine
2. Add a simpler synchronous API for use on a separate thread
3. Expose NSOperationQueue and allow custom import operations to be queued
4. Add importing functions to the database object itself

Does anyone have any suggestions as to how best to approach this problem?

- Ben
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Design pattern for bulk data handling

2014-02-08 Thread Jens Alfke

On Feb 8, 2014, at 10:20 AM, Ben ben_cocoa_dev_l...@yahoo.co.uk wrote:

 This is fine for most things, except that I sometimes need faster access to 
 the underlying database - for example, when importing/exporting data. In 
 these cases I'm after bulk data throughput without the overhead of 
 creating/destroying many NSOperations with completion handlers since there 
 can be in the order of millions of statements to handle.

You can read arbitrarily large amounts of data with a single statement, so 
that's not an issue. But in general it takes many statements to insert a lot of 
rows. (And for performance reasons you really want to group all of those 
statements in a single transaction, or you'll lose an order of magnitude of 
performance.)

What I'd do is provide a new operation type that does a bulk-insert or 
bulk-update. When run the operation performs a series of SQL statements 
starting with a BEGIN and ending with an END.

—Jens
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Design pattern for bulk data handling

2014-02-08 Thread Tito Ciuro
Also, consider dropping the table indexes before you start inserting. After the 
you're done inserting, rebuild the indexes. This should give you an an 
additional performance boost.

For even more speed, consider turning off the following setting:

http://www.sqlite.org/pragma.html#pragma_synchronous

*warning*: read the link carefully and understand what it does before using it!

-- Tito


 On Feb 8, 2014, at 14:25, Jens Alfke j...@mooseyard.com wrote:
 
 
 On Feb 8, 2014, at 10:20 AM, Ben ben_cocoa_dev_l...@yahoo.co.uk wrote:
 
 This is fine for most things, except that I sometimes need faster access to 
 the underlying database - for example, when importing/exporting data. In 
 these cases I'm after bulk data throughput without the overhead of 
 creating/destroying many NSOperations with completion handlers since there 
 can be in the order of millions of statements to handle.
 
 You can read arbitrarily large amounts of data with a single statement, so 
 that's not an issue. But in general it takes many statements to insert a lot 
 of rows. (And for performance reasons you really want to group all of those 
 statements in a single transaction, or you'll lose an order of magnitude of 
 performance.)
 
 What I'd do is provide a new operation type that does a bulk-insert or 
 bulk-update. When run the operation performs a series of SQL statements 
 starting with a BEGIN and ending with an END.
 
 —Jens
 ___
 
 Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
 
 Please do not post admin requests or moderator comments to the list.
 Contact the moderators at cocoa-dev-admins(at)lists.apple.com
 
 Help/Unsubscribe/Update your Subscription:
 https://lists.apple.com/mailman/options/cocoa-dev/tciuro%40mac.com
 
 This email sent to tci...@mac.com

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Design pattern for bulk data handling

2014-02-08 Thread Jens Alfke

On Feb 8, 2014, at 2:54 PM, Tito Ciuro tci...@mac.com wrote:

 For even more speed, consider turning off the following setting:
 http://www.sqlite.org/pragma.html#pragma_synchronous


(This is getting slightly off-topic, but…)

The only time I'd set synchronous to OFF is while populating a _new_ database 
file, as part of some sort of safe save process where you create the file in 
a temporary directory, write to it, and only move it into the real directory at 
the end.

Basically when synchronous is OFF it's entirely possible that a power failure 
or kernel panic could corrupt the database. (I've seen this happen several 
times. It's not just theoretical.) So don't use it on a database file that has 
any data you mind losing, or that's in a location where the user might try to 
re-open it later (because if she does, you'll get a support ticket about your 
app crashing or failing to open the file.)

—Jens
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com