Re: [sqlite] [FTS3] Understanding the Flow of data through the tokenizer

2011-07-24 Thread Dan Kennedy
On 07/24/2011 08:16 PM, Abhinav Upadhyay wrote:
> Hi,
>
> I am trying to write my own custom tokenizer to filter stopwords in addition
> to doing normalization and stemming. I have gone through the
> comments in fts3_tokenizer.h and also read the implementation of the
> simple tokenizer. While overall I understand what I need to
> do to implement this tokenizer, I still cannot visualize how the
> FTS engine calls the tokenizer and what data, in what form, it passes to
> it.
>
> Does the FTS engine pass the complete document data to the tokenizer,
> or does it pass chunks of data, or individual words? I need to
> understand this part because the next function needs to set the
> offsets accordingly. Just by going through the code of the simple
> tokenizer I could not completely comprehend it (it would have been
> easier if I could debug it).
>
> By the next function I mean this:
>
>   int (*xNext)(
>     sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
>     const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
>     int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
>     int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
>     int *piPosition      /* OUT: Number of tokens returned before this one */
>   );
>
> Could you explain the role of these parameters: piStartOffset and
> piEndOffset?

Each time xNext() returns SQLITE_OK to return a new token, xNext()
should set:

   *piStartOffset to the number of bytes in the input buffer before
   the start of the token being returned,

   *piEndOffset to *piStartOffset plus the number of bytes in the
   token text, and

   *piPosition to the number of tokens that occur in the input buffer
   before the token being returned.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] System.DllNotFoundException: SQLite.Interop.DLL

2011-07-24 Thread Joe Mistachkin

Very odd, with the "UseSqliteStandard" property enabled, that error message
should not be possible.

How are you compiling the SQLite.Interop project?

When you compiled the System.Data.SQLite assembly, did you use the real
MSBuild (i.e. not XBuild)?

The command line I gave you would have produced a "Debug" build; did you
grab the assembly from that directory?

For a "Release" build, you can use the following command line (all on one
line):

MSBuild System.Data.SQLite.2008.csproj /t:Rebuild /p:Configuration=Release /p:UseInteropDll=false /p:UseSqliteStandard=true

One potential problem I thought of the other day was the casing of the file
extension on the DLLs. I checked in a change to make the file extensions for
the DLLs all lower case.

Another possible issue (although this one was Windows-specific) is:

https://bugzilla.novell.com/show_bug.cgi?id=636915

--
Joe Mistachkin



Re: [sqlite] SELECT query first run is VERY slow

2011-07-24 Thread Григорий Григоренко
ANALYZE took about 15 minutes.


On 24 July 2011, 17:21, Tito Ciuro wrote:
> Hi,
> 
> It has worked fairly well with small databases, but I see the problem with 
> medium to large files. Have you tried to run ANALYZE on your database? I'm 
> curious to know how long it takes.
> 
> -- Tito


Re: [sqlite] Selected string differs to Inserted one

2011-07-24 Thread Teg
Hello Andrew,

I convert all my strings to UTF-8 before inserting or selecting. You
probably need to look into something like that too. Filenames for the
DB files have to be UTF-8 too, or you'll sometimes have problems opening
them.

My test folder has an umlaut in the path, so this code gets exercised
every time I run my program.

C

Sunday, July 24, 2011, 10:39:51 AM, you wrote:

AL> Dear All

AL> I am very new to SQLite and am trying to convert a Windows Forms
AL> C#.NET (VS2010 SP1) application from an Access database to an
AL> SQLite one.  All seems to have gone extremely well but I have come
AL> across one problem that has held me up for several days now.

AL> I have a table with a TEXT field that does not return the same
AL> string that was inserted.  The character that misbehaves is Greek
AL> letter "Ø".  In fact any character from the high end of the ANSI
AL> or ASCII table shows the problem.  

AL> Since I have done nothing special in creating the database I
AL> believe it is encoded UTF-8 by default.  I assume that my inserted
AL> string is similarly encoded UTF-8 since I have done nothing
AL> special in my C# code to change this.  The data in the database
AL> seems to be correct since the SQLite Administrator program correctly
AL> displays the data.

AL> Why does the Selected data come back wrong, and how can I correct
AL> this?  Clearly this is a bit of a non-problem, otherwise every
AL> other .NET user would be screaming but I have searched for several
AL> days now without finding a solution.

AL> Can anyone help me please? 

AL> Thanks in advance

AL> ~A

AL> Andrew Leeder  




-- 
Best regards,
 Teg    mailto:t...@djii.com



Re: [sqlite] Selected string differs to Inserted one

2011-07-24 Thread Stephan Beal
On Sun, Jul 24, 2011 at 4:39 PM, Andrew Leeder wrote:

> I have a table with a TEXT field that does not return the same string that
> was inserted.  The character that misbehaves is Greek letter "Ø".  In fact
> any character from the high end of the ANSI or ASCII table shows the
> problem.
>

ASCII actually stops at 127, so anything above that is no longer ASCII.
There are about a bazillion different 8-bit encodings based on ASCII,
which define what lives at byte values 128-255. I suspect you're using latin1,
which is not utf8-compatible (but is ascii-compatible up to value 127).

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/


[sqlite] Selected string differs to Inserted one

2011-07-24 Thread Andrew Leeder
Dear All

I am very new to SQLite and am trying to convert a Windows Forms C#.NET (VS2010 
SP1) application from an Access database to an SQLite one.  All seems to have 
gone extremely well but I have come across one problem that has held me up for 
several days now.

I have a table with a TEXT field that does not return the same string that was 
inserted.  The character that misbehaves is Greek letter "Ø".  In fact any 
character from the high end of the ANSI or ASCII table shows the problem.

Since I have done nothing special in creating the database I believe it is 
encoded UTF-8 by default.  I assume that my inserted string is similarly 
encoded UTF-8 since I have done nothing special in my C# code to change this.  
The data in the database seems to be correct since the SQLite Administrator 
program correctly displays the data.

Why does the Selected data come back wrong, and how can I correct this?
Clearly this is a bit of a non-problem, otherwise every other .NET user would be
screaming but I have searched for several days now without finding a solution.

Can anyone help me please? 

Thanks in advance

~A

Andrew Leeder  



Re: [sqlite] SELECT query first run is VERY slow

2011-07-24 Thread Tito Ciuro
Hi,

It has worked fairly well with small databases, but I see the problem with 
medium to large files. Have you tried to run ANALYZE on your database? I'm 
curious to know how long it takes.

-- Tito

On Jul 24, 2011, at 8:26 AM, Григорий Григоренко wrote:

>> 
>> Perhaps my post dated Aug. 19, 2009 will help a little bit:
>> 
>> http://www.cocoabuilder.com/archive/cocoa/242954-core-data-dog-slow-when-using-first-time-after-boot.html
>> 
>> -- Tito
>> 
> 
> Thanks for sharing. As I understand it, "warming file" is a way to cache the
> whole database.
> 
> After everything is cached, scattered reads from the database run faster.
> 
> Unfortunately, in my case the database is ~3.5 GB; it's too big for this
> strategy.
> 
> Even if I read at 25 MB/s, it would take 3500 / 25 = ~140 seconds just to
> read the whole db file.
> 
> And, more importantly, I have only 2 GB of RAM.
> 
> Anyway, thanks for sharing. I guess these cases are similar.
> 
> 
> To me the problem looks like this:
> 
> SQLite needs to read (cache) a lot (too much?) from the db during first-time
> query execution, even if the query uses a nicely matched index and returns nothing.
> 
> And SQLite does lots of scattered reads during query execution, without
> trying to batch reads or similar. That's why file caching helps.
> 
> If that's true, I'm not sure there's a simple and nice solution.
> 
> I'll try some ideas (including normalization) and report results in this 
> topic next week.
> 



[sqlite] [FTS3] Understanding the Flow of data through the tokenizer

2011-07-24 Thread Abhinav Upadhyay
Hi,

I am trying to write my own custom tokenizer to filter stopwords in addition
to doing normalization and stemming. I have gone through the
comments in fts3_tokenizer.h and also read the implementation of the
simple tokenizer. While overall I understand what I need to
do to implement this tokenizer, I still cannot visualize how the
FTS engine calls the tokenizer and what data, in what form, it passes to
it.

Does the FTS engine pass the complete document data to the tokenizer,
or does it pass chunks of data, or individual words? I need to
understand this part because the next function needs to set the
offsets accordingly. Just by going through the code of the simple
tokenizer I could not completely comprehend it (it would have been
easier if I could debug it).

By the next function I mean this:

  int (*xNext)(
    sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
    const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
    int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
    int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
    int *piPosition      /* OUT: Number of tokens returned before this one */
  );

Could you explain the role of these parameters: piStartOffset and
piEndOffset?

Thanks
Abhinav


Re: [sqlite] System.DllNotFoundException: SQLite.Interop.DLL

2011-07-24 Thread Grant Dunoon
Thanks Joe but no luck.

I tracked down the folders that are referenced by LD_LIBRARY_PATH on the
Linux OS and copied SQLite.Interop.DLL into them, with no luck.

>> MSBuild System.Data.SQLite.2008.csproj /t:Rebuild /p:UseInteropDll=false /p:UseSqliteStandard=true

I then built the DLL as above, but now I'm getting: Unable to find an
entry point named 'sqlite3_open_interop' in DLL 'System.Data.SQLite.DLL'.

This happens on both Windows and Linux.

Not sure what is going on.

Thanks
Grant




On 23 July 2011 16:15, Joe Mistachkin  wrote:

> >
> > I upgraded to the latest Managed Only System.Data.SQLite 1.0.74.0, which I
> > understand requires renaming sqlite.so to SQLite.Interop.dll /
> > SQLite.Interop.DLL (I tried both) and placing it in the app's bin folder.
> >
>
> Alternatively, you can compile the System.Data.SQLite project using:
>
> MSBuild System.Data.SQLite.2008.csproj /t:Rebuild /p:UseInteropDll=false /p:UseSqliteStandard=true
>
> This will produce a managed assembly that looks for a shared library named
> "sqlite3".
>
> >
> > When I run the app and try to access the database I'm getting the
> > error: System.DllNotFoundException: SQLite.Interop.DLL.
> >
>
> I seem to recall that the LD_LIBRARY_PATH may need to be modified to
> actually look in the bin folder for the application?  I could be wrong here
> because I am not an expert on Mono.
>
> --
> Joe Mistachkin





Re: [sqlite] SELECT query first run is VERY slow

2011-07-24 Thread Григорий Григоренко
> 
> Perhaps my post dated Aug. 19, 2009 will help a little bit:
> 
> http://www.cocoabuilder.com/archive/cocoa/242954-core-data-dog-slow-when-using-first-time-after-boot.html
> 
> -- Tito
> 

Thanks for sharing. As I understand it, "warming file" is a way to cache the
whole database.

After everything is cached, scattered reads from the database run faster.

Unfortunately, in my case the database is ~3.5 GB; it's too big for this strategy.

Even if I read at 25 MB/s, it would take 3500 / 25 = ~140 seconds just to
read the whole db file.

And, more importantly, I have only 2 GB of RAM.

Anyway, thanks for sharing. I guess these cases are similar.


To me the problem looks like this:

SQLite needs to read (cache) a lot (too much?) from the db during first-time
query execution, even if the query uses a nicely matched index and returns nothing.

And SQLite does lots of scattered reads during query execution, without trying
to batch reads or similar. That's why file caching helps.

If that's true, I'm not sure there's a simple and nice solution.

I'll try some ideas (including normalization) and report results in this topic 
next week.
