Package: bacula-console-qt
Version: 5.2.6+dfsg-9
Severity: important

Dear Maintainer,

For recovery of files by means of bRestore, it is necessary to execute at first 
.bvfs_update.

When trying to build the PathHierarchy with .bvfs_update, the update for big 
jobs take forever... 

Steps to Reproduce: 
- have a big dataset (> 1 million files AND a deep directory hierarchy)
- .bvfs_update job=N
- mysql used 100% CPU very mach time
- at this time no run jobs on the schedule

Additional Information: 
I updated bacula from version 5.0 to 5.2 (debian 6 -> 7). In the new bat 
version the "Browse Cataloged Files" function became deprecated. It is offered 
to use bRestore. My bat hanged at bRestore choice. At this time mysql uses 100% 
of Cpu

I enable slow query log on mysql. 

The analysis of slow query is lower

TOP 1 from "#mysqldumpslow mysql-slow.log"

Count: 249  Time=243.29s (60579s)  Lock=0.00s (0s)  Rows=0.0 (0), 
bacula[bacula]@localhost
  INSERT INTO PathVisibility (PathId, JobId)  SELECT a.PathId,N FROM ( SELECT 
DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN  PathVisibility AS p 
ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN (SELECT PathId FROM 
PathVisibility WHERE JobId=N) AS b ON (a.PathId = b.PathId) WHERE b.PathId IS 
NULL

I carried out the analysis and optimization of this query.

Original query:
        INSERT INTO PathVisibility (PathId, JobId) 
        SELECT a.PathId,N 
        FROM ( 
                SELECT DISTINCT h.PPathId AS PathId 
                FROM PathHierarchy AS h 
                JOIN  PathVisibility AS p 
                        ON (h.PathId=p.PathId) 
                WHERE p.JobId=N) AS a 
        LEFT JOIN (
                        SELECT PathId 
                        FROM PathVisibility 
                        WHERE JobId=N) AS b 
                ON (a.PathId = b.PathId) 
        WHERE b.PathId IS NULL

Optimized query:
        INSERT INTO PathVisibility (PathId, JobId) 
        SELECT a.PathId,N 
        FROM ( 
                SELECT DISTINCT h.PPathId AS PathId 
                FROM PathHierarchy AS h 
                JOIN  PathVisibility AS p 
                        ON (h.PathId=p.PathId) 
                WHERE p.JobId=N) AS a 
        LEFT JOIN PathVisibility AS b 
                ON (b.JobId=N and a.PathId = b.PathId) 
        WHERE b.PathId IS NULL

Results of query are identical.

In original query in join on the right there is a temporary table (subquery 
"SELECT PathId FROM PathVisibility WHERE JobId=N"). The temporary table has no 
index across the field of PathId. For each record of the left table there is a 
full search of the right table.

The table PathVisibility has a compound index on the fields JobId and PathId. 
In the optimized query join is carried out with use of this index.

After introduction of editings and recompilation bacula of .bvfs_update it was 
executed in 17 minutes. With old inquiry in 16-17 hours it didn't end and was 
still very far from end.

Count: 19052  Time=0.01s (152s)  Lock=0.00s (1s)  Rows=0.0 (0), 
bacula[bacula]@localhost
  INSERT INTO PathVisibility (PathId, JobId)  SELECT a.PathId,N FROM ( SELECT 
DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN  PathVisibility AS p 
ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN PathVisibility AS b ON 
(JobId=N and a.PathId = b.PathId) WHERE b.PathId IS NULL

The similar situation was stated in 
http://sourceforge.net/mailarchive/forum.php?thread_name=3ccaa3a6fb746ee27273ef0b1db9045b%40bugs.bacula.org&forum_name=bacula-bugs

I have a base in mysql. So detailed check is necessary for other bases. I asked 
the friend to check on postgresql for a lot of work. After 1,5 hours of 
processing of one big job were compelled to stop postgresql.

Patch:

diff -Naur bacula-5.2.6+dfsg/src/cats/bvfs.c_orig 
bacula-5.2.6+dfsg/src/cats/bvfs.c
--- bacula-5.2.6+dfsg/src/cats/bvfs.c_orig      2012-06-02 13:26:27.000000000 
+0400
+++ bacula-5.2.6+dfsg/src/cats/bvfs.c   2014-02-06 10:42:34.798440879 +0400
@@ -404,10 +404,9 @@
      "SELECT DISTINCT h.PPathId AS PathId "
        "FROM PathHierarchy AS h "
        "JOIN  PathVisibility AS p ON (h.PathId=p.PathId) "
-      "WHERE p.JobId=%s) AS a LEFT JOIN "
-       "(SELECT PathId "
-          "FROM PathVisibility "
-         "WHERE JobId=%s) AS b ON (a.PathId = b.PathId) "
+      "WHERE p.JobId=%s) AS a "
+     "LEFT JOIN PathVisibility AS b "
+       "ON (b.JobId=%s and a.PathId = b.PathId) "
    "WHERE b.PathId IS NULL",  jobid, jobid, jobid);
    }



-- System Information:
Debian Release: 7.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash

Versions of packages bacula-console-qt depends on:
ii  bacula-common  5.2.6+dfsg-9
ii  libc6          2.13-38
ii  libcap2        1:2.22-1.2
ii  libgcc1        1:4.7.2-5
ii  libqtcore4     4:4.8.2+dfsg-11
ii  libqtgui4      4:4.8.2+dfsg-11
ii  libssl1.0.0    1.0.1e-2+deb7u3
ii  libstdc++6     4.7.2-5
ii  libwrap0       7.6.q-24
ii  zlib1g         1:1.2.7.dfsg-13

bacula-console-qt recommends no packages.

bacula-console-qt suggests no packages.

-- no debconf information


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org

Reply via email to