[digikam] [Bug 391115] Very slow search, full text search (FTS) proposition for huge speedup.

2022-01-10 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=391115

--- Comment #10 from Michał Karwowski  ---
Hi, nice that you are writing.

Actually, I'm unable to check it even in next 4 weeks :/
For so many reasons, that it's even hard to explain.
I believe that from mid-February i will have resources, a bit of time etc

/Michał Karwowski


pon., 10 sty 2022 o 09:11  napisał(a):

> https://bugs.kde.org/show_bug.cgi?id=391115
>
> --- Comment #9 from caulier.gil...@gmail.com ---
> Hi Michal and happy new year,
>
> Can you check if problem remain with digiKam 7.5.0 pre-release bundle
> available
> here :
>
> https://files.kde.org/digikam/
>
> Thanks in advance
>
> Gilles Caulier
>
> --
> You are receiving this mail because:
> You reported the bug.

-- 
You are receiving this mail because:
You are watching all bug changes.

[digikam] [Bug 325653] MIGRATION : database conversion is very slow

2018-02-26 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=325653

Michał Karwowski <pro...@gmail.com> changed:

   What|Removed |Added

 CC||pro...@gmail.com

--- Comment #6 from Michał Karwowski <pro...@gmail.com> ---
I'm confirming, that in DigiKam 5.8.0 - migration from MySQL to SQLite is very
slow.

To speed up, I have created RAM disk to store SQLite database, but still, it
takes ages.

I notice, that during migration .jurnal file is created and deleted over and
over again.

That's why I think it might be related to single insert transaction commits.

According to http://www.sqlite.org/faq.html#q19
---
Transaction speed is limited by disk drive speed because (by default) SQLite
actually waits until the data really is safely stored on the disk surface
before the transaction is complete. That way, if you suddenly lose power or if
your OS crashes, your data is still safe. For details, read about atomic commit
in SQLite..

By default, each INSERT statement is its own transaction. But if you surround
multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped
into a single transaction. The time needed to commit the transaction is
amortized over all the enclosed insert statements and so the time per insert
statement is greatly reduced.
--

Thus probably it would be a good idea to wrap inserts by some transaction.



Moreover, we could try to do some tweaks to improve performance.
We can ask SQLite not to delete and recreate file over and over again for each
insert, by using PRAGMA JOURNAL_MODE.
With value eg. 'TRUNCATE', we can keep journal file, but it will be cleared.

If we can accept loose of imported data on eg power failure, we can use some
pragmas to speed up.

See: https://blog.devart.com/increasing-sqlite-performance.html
See:
https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite

-- 
You are receiving this mail because:
You are watching all bug changes.

[digikam] [Bug 391115] Very slow search, full text search (FTS) proposition for huge speedup.

2018-02-26 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=391115

--- Comment #5 from Michał Karwowski <pro...@gmail.com> ---
(In reply to Michał Karwowski from comment #4)
> Thanks for 'DebugView' - I didn't know such tool exists.
> 
> I made more detail research on the problem.
> 
> 
> To be precise, my total media count from DigiKam database stats is about
> 108012.
> (I didn't add all folders to DigiKam)
> 
> When I run the application I see in a log:
> [8144] digikam.database: Search result: 1512168
> 
> But: "1 512 168" is much more than "108 012"!
> Strange.. what does it mean?
> 
> 
> ==
> ==
> Text filter (available on the right) 
> ==
> ==
> 
> The problem with speed appears to be related to "Text filter" which I use.
> Filtering pictures with that filter takes from 20 seconds to 3 minutes.
> 
> DebugView doesn't show logs for database on "text filter".
> Thus now, I believe that's it's not related to database engine,
> but rather internal filtering system.
> 
> Filter query: 'tajlandia' took 2 minutes 31 seconds (to display results -
> 8586 items).
> Filter query: 'żaba' took 1 minute 25 seconds (to display results - 1862
> items).
> etc.
> Clearing filter also takes lots of time.
> 
> I did tests with offline and online albums.
> By term offline I mean disconnected USB drive with albums.
> By term online I mean connected USB drive with albums.
> Measurements I have don after the inilial digikam scan.
> 
> 
> ==
> ==
> Search (available on the left) 
> ==
> ==
> When I use search I see digikam.database logs on DebugView.
> This suggests that for searching database query is made in contrast to
> filtering.
> 
> To be clear I use: digiKam  5.8.0 with mariadb 10.2.10 winx64
> 
> 
> Query for: "gosia" (164794 results?)
> 
> 1448  21:55:15[10424] digikam.general: keywords changed to ' 
> "gosia" '
> 1449  21:55:15[10424] digikam.geoiface:   
> 1450  21:55:15[10424] digikam.geoiface:   
> 1451  21:55:15[10424] digikam.geoiface:   
> 1452  21:55:15[10424] digikam.general: Using  8  CPU core to 
> run threads  
> 1453  21:55:15[10424] digikam.general: Action Thread run  1  
> new jobs 
> 1454  21:55:15[10424] digikam.database: " ( (  
> (Albums.relativePath LIKE
> ?) OR (Images.name LIKE ?) OR (Images.id IN(SELECT imageid FROM
> ImageTags WHERE tagid IN(SELECT id FROM Tags WHERE name LIKE ?))) OR
> (Albums.caption LIKE ?) OR (Albums.collection LIKE ?) OR (Images.id IN 
> (SELECT imageid FROM ImageComments   WHERE type=? AND comment LIKE ?)) OR
> (Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=? AND comment
> LIKE ?))  ) ) "   
> 1455  21:55:15[10424] digikam.database: Search query: 
> 1458  21:55:16[10424] digikam.database: Search result: 164794 
> # Result is ready but not visible
> 
> 1459  21:55:19[10424] digikam.general: Cancel Main Thread 
> 1507  21:55:22[10424] digikam.general: One job is done
> 1508  21:55:24[10424] digikam.general: Cancel Main Thread 
> 1509  21:55:45[10424] digikam.geoiface:   
> 1512  21:55:50[10424] digikam.general: keywords changed to ' 
> "gosia" '
> 1513  21:55:50[10424] digikam.general: same keywords as 
> before,
> ignoring...   
> 1514  21:55:58[10424] digikam.general: keywords changed to ' 
> "ekp" '  
> # application is responsive again
> 
> 
> 
> 
> Query for: "ekp" (394240 results?)
> 
> 
> 1514  21:55:58[10424] digikam.general: keywords changed to ' 
> "ekp" '  
> 1515  21:55:58[10424] digikam.geoiface:   
> 1516  21:55:58[10424] digikam.geoiface:   
> 1517  21:55:58[10424] digikam.geoiface:   
> 1518  21:55:58[10424] digikam.general: Using  8  CPU core to 
> run threads  
> 1519  21:55:58[10424] digikam.general: Action Thread run  1  
> new jobs 
> 1520  21:55:58[10424] digikam.database: " ( (  
> (Albums.relativePath LIKE
> ?) OR (Images.name LIKE ?) OR (Images.id IN(SELECT imageid FROM

[digikam] [Bug 391115] Very slow search, full text search (FTS) proposition for huge speedup.

2018-02-26 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=391115

--- Comment #4 from Michał Karwowski <pro...@gmail.com> ---
Thanks for 'DebugView' - I didn't know such tool exists.

I made more detail research on the problem.


To be precise, my total media count from DigiKam database stats is about
108012.
(I didn't add all folders to DigiKam)

When I run the application I see in a log:
[8144] digikam.database: Search result: 1512168

But: "1 512 168" is much more than "108 012"!
Strange.. what does it mean?


==
==
Text filter (available on the right) 
==
==

The problem with speed appears to be related to "Text filter" which I use.
Filtering pictures with that filter takes from 20 seconds to 3 minutes.

DebugView doesn't show logs for database on "text filter".
Thus now, I believe that's it's not related to database engine,
but rather internal filtering system.

Filter query: 'tajlandia' took 2 minutes 31 seconds (to display results - 8586
items).
Filter query: 'żaba' took 1 minute 25 seconds (to display results - 1862
items).
etc.
Clearing filter also takes lots of time.

I did tests with offline and online albums.
By term offline I mean disconnected USB drive with albums.
By term online I mean connected USB drive with albums.
Measurements I have don after the inilial digikam scan.


==
==
Search (available on the left) 
==
==
When I use search I see digikam.database logs on DebugView.
This suggests that for searching database query is made in contrast to
filtering.

To be clear I use: digiKam  5.8.0 with mariadb 10.2.10 winx64


Query for: "gosia" (164794 results?)

144821:55:15[10424] digikam.general: keywords changed to '
"gosia" '
144921:55:15[10424] digikam.geoiface:   
145021:55:15[10424] digikam.geoiface:   
145121:55:15[10424] digikam.geoiface:   
145221:55:15[10424] digikam.general: Using  8  CPU core to
run threads  
145321:55:15[10424] digikam.general: Action Thread run  1 
new jobs 
145421:55:15[10424] digikam.database: " ( ( 
(Albums.relativePath LIKE ?) OR (Images.name LIKE ?) OR (Images.id IN   
(SELECT imageid FROM ImageTags WHERE tagid IN(SELECT id FROM Tags WHERE
name LIKE ?))) OR (Albums.caption LIKE ?) OR (Albums.collection LIKE ?) OR
(Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=? AND comment
LIKE ?)) OR (Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=?
AND comment LIKE ?))  ) ) "  
145521:55:15[10424] digikam.database: Search query: 
145821:55:16[10424] digikam.database: Search result: 164794 
# Result is ready but not visible

145921:55:19[10424] digikam.general: Cancel Main Thread 
150721:55:22[10424] digikam.general: One job is done
150821:55:24[10424] digikam.general: Cancel Main Thread 
150921:55:45[10424] digikam.geoiface:   
151221:55:50[10424] digikam.general: keywords changed to '
"gosia" '
151321:55:50[10424] digikam.general: same keywords as
before, ignoring...   
151421:55:58[10424] digikam.general: keywords changed to '
"ekp" '  
# application is responsive again




Query for: "ekp" (394240 results?)


151421:55:58[10424] digikam.general: keywords changed to '
"ekp" '  
151521:55:58[10424] digikam.geoiface:   
151621:55:58[10424] digikam.geoiface:   
151721:55:58[10424] digikam.geoiface:   
151821:55:58[10424] digikam.general: Using  8  CPU core to
run threads  
151921:55:58[10424] digikam.general: Action Thread run  1 
new jobs 
152021:55:58[10424] digikam.database: " ( ( 
(Albums.relativePath LIKE ?) OR (Images.name LIKE ?) OR (Images.id IN   
(SELECT imageid FROM ImageTags WHERE tagid IN(SELECT id FROM Tags WHERE
name LIKE ?))) OR (Albums.caption LIKE ?) OR (Albums.collection LIKE ?) OR
(Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=? AND comment
LIKE ?)) OR (Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=?
AND comment LIKE ?))  ) ) "  
152121:55:58[10424] digikam.database: Search query: 
152421:56:00[10424] digikam.database: Search result: 394240 
152521:56:08[10424] digikam.general: Cancel Main Threa

[digikam] [Bug 391115] Very slow search, full text search (FTS) proposition for huge speedup.

2018-02-26 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=391115

--- Comment #2 from Michał Karwowski <pro...@gmail.com> ---
2018-02-26 19:20 GMT+01:00 <bugzilla_nore...@kde.org>:

> https://bugs.kde.org/show_bug.cgi?id=391115
>
> caulier.gil...@gmail.com changed:
>
>What|Removed |Added
> 
> 
>  CC||caulier.gil...@gmail.com
>
> --- Comment #1 from caulier.gil...@gmail.com ---
> Thanks for this analysis and for the proposal to improve SQL queries.
>
> Just to be sure, which digiKam version do you use exactly ?
>

*I'm using DigiKam 5.8.0 (on Windows 7 x64, currently with MySQL - mariadb
10.2.10 winx64).*

DigiKam seems to be best choice in the future (if search speed would be
improved).
I have tried many applications in different configurations but most of them
have some issues.
Some programs doesn't allow to search for file/folder names or time range
(limitting to time range from NOW only), sometimes accidently disconecting
USB drive couses huge library to disapear for ever).
DigiKam doesn't has such problems. In DigiKam we had speed issues which
disqualifies this application for larger databases.

It seems to be realy good application. I spend many hours to find out, why
searches in digikam are so slow. It is written in C++, and Qt -
technologies which I completly dont undertand :(
But I home introducing full text search (instead of 'like' queries) would
be quite easy for people having that technologies.


best regards,
Michał Karwowski


>
> Gilles Caulier
>
> --
> You are receiving this mail because:
> You reported the bug.

-- 
You are receiving this mail because:
You are watching all bug changes.

[digikam] [Bug 391115] New: Very slow search, full text search (FTS) proposition for huge speedup.

2018-02-26 Thread Michał Karwowski
https://bugs.kde.org/show_bug.cgi?id=391115

Bug ID: 391115
   Summary: Very slow search, full text search (FTS) proposition
for huge speedup.
   Product: digikam
   Version: unspecified
  Platform: Other
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: NOR
 Component: Database
  Assignee: digikam-bugs-n...@kde.org
  Reporter: pro...@gmail.com
  Target Milestone: ---

Hi.

I have more than 10 (100k+) pictures in my DigiKam library.
DigiKam seems to be VERY SLOW with such amount of pictures.

Using Sqlite database on RamDisk or using MySql database doesn't realy help
when I try to filter (search) pictures.


I notice that the speed issue is caused, by inefficent queries.

I cought SQL queris which looks like:

SELECT * from .. WHERE some_key LIKE '%query%'  (but ofcourse query is done
across multiple tables)



Queries to database with 'like' operator are very SLOW BY DESIGN.

Such queries requires FULL TABLE SCAN (reading every record and comparing it
with a query).

Nowadays many database backends support builtin full text search engines.

Even Sqlite have such one!


Sqlite


-
EXAMPLE SPEED GAIN by using FTS index instead of 'like' operator.
from: https://www.sqlite.org/fts3.html
-
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* 0.03 seconds
*/
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds
*/



http://www.sqlitetutorial.net/sqlite-full-text-search/
https://sqlite.org/fts5.html

=
PostgreSQL
=
https://www.postgresql.org/docs/9.5/static/textsearch.html

=
MySQL
=
https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html


I belive introducing Full text search to DigiKam, could drasticly improve
search/filtering performace.

Without this feature, this application is useless.
Queries TAKES MINUTES (on i7, 16GB ram, SSD).

Moreover it's not clear for the user if DigiKam is realy doing anything during
search. After minutes results appear.

Later scrollig is as fast as usually, the problem is with searchin/filtering.

-- 
You are receiving this mail because:
You are watching all bug changes.