Re: [sqlite] large db performance considerations

2009-06-08 Thread Paul Perry
Hi Simon,   Thanks for your response.

> You need an index, obviously.  You can't have an index on a VIRTUAL
> table.

Yes, I did try on a regular table (not FTS3) and added the indices I
described (one each on Author, Content and Category) and still did not
notice any performance improvement.  And the db size increased a fair
amount.

> Could you instead create a TEMPORARY table and read the data into
> it ?  That can have indices on all the fields you care about.  You can
> even copy all the entries from the VIRTUAL table into it in one
> instruction using the form of INSERT that has a SELECT in it.  It's
> faster to load the data into the table first, then create the indices.
That is a good idea.  I will experiment with this.  The database is
basically read only.  So I could create some other tables.


> The other aspect of this is to wonder why you're using a VIRTUAL table
> in the first place.  Why aren't you searching the source data instead
> of a virtual table ?  The SELECT you described doesn't use the fts3
> feature.
Yes, When the project began, I was planning on using full text search.  As
the project moved forward, I ended up not using it at this point, although
future enhancements do call for using full text search.

For this iteration, if there is a better way, without using FTS, I could
certainly remove it and use a regular table.

Thanks.

Paul

On Sun, Jun 7, 2009 at 11:00 PM, Simon Slavin
<slav...@hearsay.demon.co.uk>wrote:

>
> On 8 Jun 2009, at 4:45am, Paul Perry wrote:
>
> > The latest content I have added, has over 31,000 records.  A basic
> > select,
> > such as:
> >
> > Select Content from BookContent where DocumentID = 10;
> > takes nearly 15 seconds.  Are there any ways to optimize this?
>
> You need an index, obviously.  You can't have an index on a VIRTUAL
> table.
>
> Could you instead create a TEMPORARY table and read the data into
> it ?  That can have indices on all the fields you care about.  You can
> even copy all the entries from the VIRTUAL table into it in one
> instruction using the form of INSERT that has a SELECT in it.  It's
> faster to load the data into the table first, then create the indices.
>
> The other aspect of this is to wonder why you're using a VIRTUAL table
> in the first place.  Why aren't you searching the source data instead
> of a virtual table ?  The SELECT you described doesn't use the fts3
> feature.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] large db performance considerations

2009-06-07 Thread Paul Perry
Hello, I was wondering if anybody has a tip for optimizing the query time on
a table.  I have read several web pages about optimizing SQLite queries, but
have not been able to come to any conclusions.

I have a table:

CREATE VIRTUAL TABLE BookContent using fts3(
 DocumentID INTEGER PRIMARY KEY,
 DocumentType INTEGER,
 ParentDocumentID INTEGER,
 Category TEXT,
 Content TEXT,
 Author TEXT,
 SrcFileIdent INTEGER,
 SrcLineNumber INTEGER);

The key pieces of information are the Category, Content and Author.
Performance is fine with up to 2000 records.  I can do basic queries, and
all works well.

The latest content I have added, has over 31,000 records.  A basic select,
such as:

Select Content from BookContent where DocumentID = 10;
takes nearly 15 seconds.  Are there any ways to optimize this?

The first thing I did was consider made it a regular SQLite table
(droped the FTS support), and used the same table schema, and added an index
on Author, Content and Category.  However, I did not notice any change in
time, and the database size grew from 11MB to 15MB.

Any suggestions would be greatly qppreciated.

Thank you,
Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] First Program using sqlite3.dll - Linker Errors

2009-03-28 Thread Paul Perry
Add the lib file you generated with lib.exe into your project.  Just
pointing to the directory is not enough.  That should fix up your link
errors.




On Sat, Mar 28, 2009 at 10:53 AM, Neil Jeffery wrote:

>
> Hi All,
>
>
>
> I am attempting to write a program which will use the sqlite3.dll file.  I
> am developing in Visual Studio 2008 Express on Windows Vista.
>
>
>
> I downloaded this file: http://www.sqlite.org/sqlitedll-3_6_11.zip which
> contains the dll and a definitions file.  I have then used lib.exe to
> generate a.lib file.  The lib file is currently in C:\SQLITE\.
>
>
>
> I have done the following to alter the Linker configuration in the project:
>
>
>
> 1)Right Click Project -> Properties
>
> 2)Configuration Properties -> Linker
>
> 3)I entered "C:\SQLITE\" in the "Additional Library Directories" property
> field
>
>
>
> I believe that I need a sqlite3 header file in order to use the functions
> in my code.  This is not included in the file I downloaded above.  I have
> thus downloaded http://www.sqlite.org/sqlite-amalgamation-3_6_11.zip and
> taken sqlite3.h from this and added it to my project.
>
>
>
> I have only one source file in the project at present which is as
> http://www.sqlite.org/quickstart.html.  Trying to build throws up compile
> errors about the exit keyword, so I have added #include  as well.
>
>
>
> Now the project compiles without warnings or errors, but I see the
> following Linker Errors:
>
>
>
> 1>MAIN.obj : error LNK2019: unresolved external symbol _sqlite3_free
> referenced in function _main
>
> 1>MAIN.obj : error LNK2019: unresolved external symbol _sqlite3_exec
> referenced in function _main
>
> 1>MAIN.obj : error LNK2019: unresolved external symbol _sqlite3_close
> referenced in function _main
>
> 1>MAIN.obj : error LNK2019: unresolved external symbol _sqlite3_errmsg
> referenced in function _main
>
> 1>MAIN.obj : error LNK2019: unresolved external symbol _sqlite3_open
> referenced in function _main
>
> 1>*PATH*\Projects\Testing\SQLITE1\Debug\SQLITE1.exe
> : fatal error LNK1120: 5 unresolved externals
>
>
>
> I have never worked with DLLs before so I could be something quite
> fundamental wrong.
>
>
>
> Any help or thoughts would be welcome.
>
>
>
> Many thanks,
>
>
>
> Neil
>
> _
> View your Twitter and Flickr updates from one place – Learn more!
> http://clk.atdmt.com/UKM/go/137984870/direct/01/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Formatted text with fts3

2009-03-10 Thread Paul Perry
Scott,

Thanks so much for your thoughtful response.  Things can get complicated
(especially with full text searching) when using formatted text.

What I really want to do is retain the formatting, to display in a HTML
file.  The text out of the database would be wrapped by a basic HTML outter
document, and the formatted text would mainly reference style sheets.  This
is for a desktop application, so I am not trying to target a browser.

I was thinking it is ok to limit myself to full open and close tags, and
also to not have a single word be broken up, as you illustrated nicely.

Thereby, I was thinking if I just filtered out (or removed) the basic tags I
am using, such as:

Some text. this is more text.  Followed by more.

by simply not putting the tags into the index (it technically wouldn't be
fully formed xml with a header and such) , it would still retain the text
for searching but ignore the tags.  I agree that trying to handle generic
HTML would be a bear.  The content is not coming from the web, so I have
control over that as well.  I guess there could be problems with snippets or
with the offsets of the text.

I'll have to take a closer look into the provided parsers, and maybe there
is a way that any tags, could just *not *be put into the index.

I read the documents pointed out and the readme.  It looks like most of the
info on it is the source code, so I will need to take some time to dig
deeper and understand it.  I am hoping to start with the current source code
and add a filter.

I read that some folks from Google did some of the work on the fts3 module.
They probably drop all tags when they search for text, and that allows for a
more reliable search.  I'm just guessing.

Thanks again,
Paul

On Tue, Mar 10, 2009 at 11:22 AM, Scott Hess <sh...@google.com> wrote:

> The fts module doesn't do anything "interesting" with embedded
> meta-data in the interests of simplicity.  Stripping the info out
> before inserting is probably easiest, but has the downsides of
> duplication (assuming you need to keep the raw data elsewhere), and it
> means that queries involving snippets and the like may be funky.
> Probably the best way to go about this would be to convert all tags to
> a single whitespace character.
>
> Building a custom tokenizer is certainly doable, but could be a
> frustrating goal if you intend to process generic HTML you find on the
> web, just because of the number of heuristics you'll have to layer in.
>  As a first pass, you might just treat tags as word breaks as you
> iterate over the input.  But there definitely are cases where HTML
> markup happens within words, so you might need something a bit more
> sophisticated.  There is some level of support for returning tokens
> which are not literally present in the input.  For instance, for the
> input 'this' you could return 'this' and indicate that it
> corresponds to 11 characters in the input, and everything should work.
>  I'm not sure anyone has ever exercised this aspect of things
> strongly, though, so it's possible that things don't work as intended
> when you do that.
>
> Before going either direction, you should probably sit down and figure
> out what exactly you're going to do with the results you get from the
> table.  If you want to, say, present them on a web page, then your
> problems are just beginning, because the tag nesting will open up
> layout issues and security problems.  It may be that thinking through
> that part of the system will help you figure out an appropriate
> approach for this part of the system (for instance, if you decide to
> strip tags for other reasons, then it all becomes easy!).
>
> -scott
>
>
> On Tue, Mar 10, 2009 at 6:56 AM, Paul Perry <paulmjpe...@gmail.com> wrote:
> > Thank you for the pointers Alexandre and Alexey.
> >
> > I spent about 30 minutes looking into the parser, and it looks like it is
> a
> > possibility.  I'll require a more in-depth understanding in order to do
> > this.  I would probably start with the simple parser, and go from there.
> >
> >> I think to prepare html before insert is more simple. You can transform
> >
> > html into "right" format for fts3 parser.
> > I would actually like to retain the tagged (html) formatting in the
> > database, thereby, when it is retrieved it can be displayed as rich text.
> >
> > Thanks,
> > Paul
> >
> >
> >
> > On Tue, Mar 10, 2009 at 4:32 AM, Alexey Pechnikov <
> pechni...@mobigroup.ru>wrote:
> >
> >> Hello!
> >>
> >> On Tuesday 10 March 2009 06:16:16 Alexandre Courbot wrote:
> >> > Never did this myself, but I think you can do what you need by writing
> >> > your own tokeni

Re: [sqlite] Formatted text with fts3

2009-03-10 Thread Paul Perry
Thank you for the pointers Alexandre and Alexey.

I spent about 30 minutes looking into the parser, and it looks like it is a
possibility.  I'll require a more in-depth understanding in order to do
this.  I would probably start with the simple parser, and go from there.

> I think to prepare html before insert is more simple. You can transform >
html into "right" format for fts3 parser.
I would actually like to retain the tagged (html) formatting in the
database, thereby, when it is retrieved it can be displayed as rich text.

Thanks,
Paul



On Tue, Mar 10, 2009 at 4:32 AM, Alexey Pechnikov wrote:

> Hello!
>
> On Tuesday 10 March 2009 06:16:16 Alexandre Courbot wrote:
> > Never did this myself, but I think you can do what you need by writing
> > your own tokenizer:
> >
> >
> http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/fts3/README.tokenizers
>
> It's not good advice for a few documented module.
>
> I think to prepare html before insert is more simple. You can transform
> html
> into "right" format for fts3 parser.
>
> Best regards.
>  ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Formatted text with fts3

2009-03-09 Thread Paul Perry
I've been looking into the full text search capabilities of SQLite and it
looks like exactly what I need.  Cool stuff, especially in the confines of
all the other great features of SQLite.

The question is if I could put formatted text in a text field, and not have
that show up in a full text search..  I would be fine with basic tags, and
would also be fine limiting it to only open and close, as in:

 

and not use the somewhat shorter format (as appropriate) of .  That
is, if it makes it any easier.


I have done some testing, (not surprisingly) and fts will return the text in
the tag.  Actually, I would probably be surprised if it did not, after-all
that is what it is doing.  I was wondering if there is anyway to manually
remove tags from the virtual table somehow (maybe remove some rows from
*table*_content, *table*_segdir, or *table*_segments).  Alas, I looked into
the content of those tables, and there is nothing too obvious that I could
see to be done from just peering into them.

If anybody has some ideas on this, it would be great.  I could always
provide formatting by having another table which would match up that
document, along with a formatting code and the starting and ending character
it applies to.  That does get more complicated, especially when preparing
the text.  I would really prefer having the tags intermixed with the text,
that is if I can prevent them from being in the full text search.

Any ideas would be loved.

Cheers,
Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users