Hi,

We are working on using Tracker and faced some performance issues.
The objective is to retrieve multiple informations about audio files.
So we do a "big" Sparql query on a test base almost 10 000 files and we face a 
really slow reply causing dbus to timeout.

First we thought it come from the a limit of the complexity of the Sparql query 
causing a "parser stack overflow" error that is described here:
http://mail.gnome.org/archives/tracker-list/2009-December/msg00106.html
But after simplifying the query we discover that the problem may come from the 
conversion of the Sparql to SQL query in 
tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala
It seems that the SQL generated query is not totally efficient and may be 
improved.

After further investigations, it appears that
tracker-0.7.14/src/libtracker-data/tracker-sparql-query.vala is the source of 
the problem. There is an issue with the way the SQL query is generated.

Here is the Sparql query that we did:
                "SELECT ?url ?title ?len ?creadate ?artnam ?sbpm ?genre ?bpm 
?albtit ?ntrack ?com ?lastaddmod WHERE " //?nbtack
                "{ ?x a nmm:MusicPiece . "
                "optional { ?x nie:isStoredAs ?url } ."
                "optional { ?x nie:title ?title } ."
                "optional { ?x nie:comment ?com } ."
                "optional { ?x nie:contentCreated ?creadate } ."
                "optional { ?x tracker:added ?lastaddmod } ."
                "optional { ?x nfo:genre ?genre } ."
                "optional { ?x nfo:averageBitrate ?sbpm} ."
                "optional { ?x nfo:sampleRate ?bpm } ."
                "optional { ?x nmm:length ?len } ."
                "optional { ?x nmm:performer ?artist . ?artist nmm:artistName 
?artnam } ."
                "optional { ?x nmm:musicAlbum ?album . ?album nmm:albumTitle 
?albtit } ."
                "optional { ?x nmm:trackNumber ?ntrack } ."
        //       "optional { ?x nmm:albumTrackCount ?nbtrack} ."  //Removed 
because of the parser limited stack size
                " } LIMIT %d OFFSET %d";

And attached to this mail [ugly-tracker-sql-sparql.txt] is the automatically 
generated SQL (Reformatted to be human readable).
Trying directly on SQLite this query with the 11k files dataset takes around 2 
/ 3 minutes to return a result.

In the second attached file [good-tracker-sql.txt] is the same SQL request, 
simplified by me, that is like tracker is supposed to generate the SQL from the 
Sparql query.
That query executed with the same 11k files dataset on SQLite, gave a result in 
only 4 / 5 seconds. (And there, the complexity of the Sparql query could be 
increased).

I tried to modify myself the source code for this function, but it was too 
difficult for me because of lack of knowledge on the variables used by the 
Tracker's internal Sparql parser.

So I think you should modify the code in tracker-sparql-query.vala to generate 
a query like this by changing the creation of LEFT JOIN by the creation of a 
SELECT sub-query in the function translate_group_graph_pattern.

I know the common SQL "tips" on Internet that say that LEFT (and RIGHT) JOIN 
are more efficient than sub-queries. But here, always having ID as an index in 
all the tables could explain the difference.

A good idea could be to use SELECT sub-queries inside the main SELECT for the 
optionals of a same subject.

Regards,


Florent Viard
Software Development Engineer
........................................
[email protected]
www.lacie.com







This e-mail and any attachment are confidential and intended solely for the use 
of the individual to whom it is addressed. If you are not the intended 
recipient, please telephone or email the sender and delete this message and any 
attachment from your system. Unauthorized publication, use, dissemination, 
forwarding, printing or copying of this e-mail and its associated attachments 
is strictly prohibited.
SELECT 
(SELECT Uri FROM "rdfs:Resource" WHERE ID = "url_u"), "title_u", "len_u", 
strftime ("%Y-%m-%dT%H:%M:%SZ", "creadate_u", "unixepoch"), "artnam_u", CAST 
("sbpm_u" AS TEXT), "genre_u", CAST ("bpm_u" AS TEXT), "albtit_u", "ntrack_u", 
"com_u", strftime ("%Y-%m-%dT%H:%M:%SZ", "lastaddmod_u", "unixepoch") 
FROM 
(
SELECT * FROM(
            SELECT t10_g."ntrack_u", t9_g."x_u", t9_g."artnam_u", t9_g."len_u", 
t9_g."artist_u", t9_g."url_u", t9_g."com_u", t9_g."bpm_u", t9_g."sbpm_u",       
                        t9_g."album_u", t9_g."title_u", t9_g."lastaddmod_u", 
t9_g."genre_u", t9_g."albtit_u", t9_g."creadate_u" 
            FROM 
                (
                SELECT * FROM (
                        SELECT t8_g."albtit_u", t8_g."album_u", t7_g."x_u", 
t7_g."artnam_u", t7_g."len_u", t7_g."artist_u", t7_g."url_u", t7_g."com_u",     
                            t7_g."bpm_u", t7_g."sbpm_u", t7_g."title_u", 
t7_g."lastaddmod_u", t7_g."genre_u", t7_g."creadate_u" FROM 
                            (
                            SELECT * FROM 
                                (
                                SELECT t6_g."artnam_u", t6_g."artist_u", 
t5_g."x_u", t5_g."len_u", t5_g."url_u", t5_g."com_u", t5_g."bpm_u", 
t5_g."sbpm_u",                                             t5_g."title_u", 
t5_g."lastaddmod_u", t5_g."genre_u", t5_g."creadate_u" FROM 
                                    (
                                    SELECT * FROM 
                                        (SELECT t4_g."len_u", t3_g."x_u", 
t3_g."url_u", t3_g."com_u", t3_g."bpm_u", t3_g."sbpm_u", t3_g."title_u",        
                                              t3_g."lastaddmod_u", 
t3_g."genre_u", t3_g."creadate_u" FROM 
                                            (SELECT * FROM 
                                                (SELECT t2_g."bpm_u", 
t1_g."x_u", t1_g."url_u", t1_g."com_u", t1_g."sbpm_u", t1_g."title_u",          
                                                  t1_g."lastaddmod_u", 
t1_g."genre_u", t1_g."creadate_u" FROM 
                                                    (SELECT 
"nmm:MusicPiece1"."ID" AS "x_u", "nie:InformationElement2"."nie:isStoredAs" AS 
"url_u",                                                             
"nie:InformationElement2"."nie:title" AS "title_u", 
"nie:InformationElement2"."nie:comment" AS                                      
                    "com_u", "nie:InformationElement2"."nie:contentCreated" AS 
"creadate_u",                                                                
"rdfs:Resource3"."tracker:added" AS "lastaddmod_u", "nfo:Media4"."nfo:genre" AS 
"genre_u",                                                              
"nfo:Media4"."nfo:averageBitrate" AS "sbpm_u" FROM "nmm:MusicPiece" AS 
"nmm:MusicPiece1",                                                              
 "nie:InformationElement" AS "nie:InformationElement2", "rdfs:Resource" AS 
"rdfs:Resource3",                                                             
"nfo:Media" AS "nfo:Media4" WHERE "nmm:MusicPiece1"."ID" = 
"nie:InformationElement2"."ID" AND                                              
             "nie:InformationElement2"."ID" = "rdfs:Resource3"."ID" AND 
"rdfs:Resource3"."ID" = "nfo:Media4"."ID"
                                                    )
                                                    AS t1_g LEFT JOIN 
                                                    (SELECT "nfo:Audio5"."ID" 
AS "x_u", "nfo:Audio5"."nfo:sampleRate" AS "bpm_u" FROM "nfo:Audio" AS          
                                                  "nfo:Audio5" WHERE "bpm_u" IS 
NOT NULL
                                                    )
                                                    AS t2_g ON t1_g."x_u" = 
t2_g."x_u"
                                                )
                                            )
                                            AS t3_g LEFT JOIN 
                                            (SELECT "nmm:MusicPiece6"."ID" AS 
"x_u", "nmm:MusicPiece6"."nmm:length" AS "len_u" FROM "nmm:MusicPiece" AS       
                                          "nmm:MusicPiece6" WHERE "len_u" IS 
NOT NULL
                                            ) AS t4_g ON t3_g."x_u" = t4_g."x_u"
                                        )
                                    ) AS t5_g LEFT JOIN
                                    (SELECT "nmm:MusicPiece7"."ID" AS "x_u", 
"nmm:MusicPiece7"."nmm:performer" AS "artist_u",                                
           "nmm:Artist8"."nmm:artistName" AS "artnam_u" FROM "nmm:MusicPiece" 
AS "nmm:MusicPiece7", "nmm:Artist" AS                                           
     "nmm:Artist8" WHERE "nmm:MusicPiece7"."nmm:performer" = "nmm:Artist8"."ID" 
AND "artnam_u" IS NOT NULL
                                    )
                                    AS t6_g ON t5_g."x_u" = t6_g."x_u"
                                )
                            )
                            AS t7_g LEFT JOIN (
                                SELECT "nmm:MusicPiece9"."ID" AS "x_u", 
"nmm:MusicPiece9"."nmm:musicAlbum" AS "album_u",                                
                "nmm:MusicAlbum10"."nmm:albumTitle" AS "albtit_u" FROM 
"nmm:MusicPiece" AS "nmm:MusicPiece9", "nmm:MusicAlbum" AS                      
                         "nmm:MusicAlbum10" WHERE 
"nmm:MusicPiece9"."nmm:musicAlbum" = "nmm:MusicAlbum10"."ID" AND "albtit_u" IS 
NOT NULL
                            ) AS t8_g ON t7_g."x_u" = t8_g."x_u"
                        )
                ) AS t9_g
                LEFT JOIN (
                SELECT "nmm:MusicPiece11"."ID" AS "x_u", 
"nmm:MusicPiece11"."nmm:trackNumber" AS "ntrack_u" FROM "nmm:MusicPiece" AS 
"nmm:MusicPiece11"                         WHERE "ntrack_u" IS NOT NULL
                ) AS t10_g ON t9_g."x_u" = t10_g."x_u"
        )
)
LIMIT ? OFFSET ?
SELECT t1_g."x_u", t1_g."url_u", t1_g."com_u", t1_g."sbpm_u", t1_g."title_u", 
t1_g."lastaddmod_u", t1_g."genre_u", t1_g."creadate_u", t1_g."ntrack_u", 
t1_g."bpm_u", t1_g."len_u", t1_g."artnam_u", t1_g."albtit_u" FROM 
        (SELECT "nmm:MusicPiece1"."ID" AS "x_u", 
"nie:InformationElement2"."nie:isStoredAs" AS "url_u", 
"nie:InformationElement2"."nie:title" AS "title_u",                     
"nie:InformationElement2"."nie:comment" AS "com_u", 
"nie:InformationElement2"."nie:contentCreated" AS "creadate_u",                 
    "rdfs:Resource3"."tracker:added" AS "lastaddmod_u", 
"nfo:Media4"."nfo:genre" AS "genre_u", "nfo:Media4"."nfo:averageBitrate" AS 
"sbpm_u",                       "nmm:MusicPiece1"."nmm:trackNumber" AS 
"ntrack_u", (SELECT "nfo:Audio"."nfo:sampleRate" FROM "nfo:Audio" WHERE 
"nfo:Audio"."ID" =                       
"nie:InformationElement2"."nie:isStoredAs") AS "bpm_u", 
"nmm:MusicPiece1"."nmm:length" AS "len_u", (SELECT 
"nmm:Artist"."nmm:artistName" FROM                   "nmm:Artist" WHERE 
"nmm:MusicPiece1"."nmm:performer" = "nmm:Artist"."ID") AS "artnam_u", (SELECT 
"nmm:MusicAlbum"."nmm:albumTitle" FROM                 "nmm:MusicAlbum" WHERE 
"nmm:MusicPiece1"."nmm:musicAlbum" = "nmm:MusicAlbum"."ID") AS "albtit_u"
         FROM "nmm:MusicPiece" AS "nmm:MusicPiece1", "nie:InformationElement" 
AS "nie:InformationElement2", "rdfs:Resource" AS "rdfs:Resource3", "nfo:Media"  
                  AS "nfo:Media4"
        WHERE "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID" AND 
"nie:InformationElement2"."ID" = "rdfs:Resource3"."ID" AND 
"rdfs:Resource3"."ID" =                   "nfo:Media4"."ID"
       ) AS t1_g
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list

Reply via email to