Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello, Graham Keeling wrote: Hello, I now believe that the 'taking hours' problem that I was having was down to having additional indexes on my File table, as Eric suggested. I am using mysql-5.0.45. I had these indexes: JobId JobId, PathId, FilenameId PathId FilenameId Now I have these indexes: JobId JobId, PathId, FilenameId The queries on my 'real' database now take about a second, rather than half a day. Nice to see that my first advise was the good one. A suggestion - perhaps the following comment in src/cats/make_mysql_tables.in could be changed to include a warning: # # Possibly add one or more of the following indexes # to the above File table if your Verifies are # too slow. # # INDEX (PathId), # INDEX (FilenameId), # INDEX (FilenameId, PathId) # INDEX (JobId), # However, I also tested the 3.0.3 and 5.0.1 queries using Eric's test script and the much larger database that it generates. I found that there is a definite slowdown. Results from do_bench(10,13, 220). In this case, the slowdown is about 15%. new|220|220|312 old|220|220|268 graham|220|220|158 Result 'graham' is the time it takes to do a query that I came up with that looks similar to the postgresql query, but uses the mysql group by trick that is frowned upon: SELECT MAX(JobTDate) AS JobTDate, JobId, FileId, FileIndex, PathId, FilenameId, LStat, MD5 FROM (SELECT JobTDate, JobId, FileId, FileIndex, PathId, FilenameId, LStat, MD5 FROM (SELECT FileId, JobId, PathId, FilenameId, FileIndex, LStat, MD5 FROM File WHERE JobId IN ($jobid) UNION ALL SELECT File.FileId, File.JobId, PathId, FilenameId, File.FileIndex, LStat, MD5 FROM BaseFiles JOIN File USING (FileId) WHERE BaseFiles.JobId IN ($jobid) ) AS T JOIN Job USING (JobId) ORDER BY FilenameId, PathId, JobTDate DESC ) AS U GROUP BY PathId, FilenameId I don't think it will work, the documentation saids that if two records have the same FilenameId, PathId, the value for LStat, MD5, JobId and FileIndex will be a random one (which is not acceptable in our case). Thanks for your feeback and your help. Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28247276.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello, I now believe that the 'taking hours' problem that I was having was down to having additional indexes on my File table, as Eric suggested. I am using mysql-5.0.45. I had these indexes: JobId JobId, PathId, FilenameId PathId FilenameId Now I have these indexes: JobId JobId, PathId, FilenameId The queries on my 'real' database now take about a second, rather than half a day. A suggestion - perhaps the following comment in src/cats/make_mysql_tables.in could be changed to include a warning: # # Possibly add one or more of the following indexes # to the above File table if your Verifies are # too slow. # # INDEX (PathId), # INDEX (FilenameId), # INDEX (FilenameId, PathId) # INDEX (JobId), # However, I also tested the 3.0.3 and 5.0.1 queries using Eric's test script and the much larger database that it generates. I found that there is a definite slowdown. Results from do_bench(10,13, 220). In this case, the slowdown is about 15%. new|220|220|312 old|220|220|268 graham|220|220|158 Result 'graham' is the time it takes to do a query that I came up with that looks similar to the postgresql query, but uses the mysql group by trick that is frowned upon: SELECT MAX(JobTDate) AS JobTDate, JobId, FileId, FileIndex, PathId, FilenameId, LStat, MD5 FROM (SELECT JobTDate, JobId, FileId, FileIndex, PathId, FilenameId, LStat, MD5 FROM (SELECT FileId, JobId, PathId, FilenameId, FileIndex, LStat, MD5 FROM File WHERE JobId IN ($jobid) UNION ALL SELECT File.FileId, File.JobId, PathId, FilenameId, File.FileIndex, LStat, MD5 FROM BaseFiles JOIN File USING (FileId) WHERE BaseFiles.JobId IN ($jobid) ) AS T JOIN Job USING (JobId) ORDER BY FilenameId, PathId, JobTDate DESC ) AS U GROUP BY PathId, FilenameId -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Thu, Apr 08, 2010 at 12:29:05PM -0700, ebollengier wrote: Graham Keeling wrote: On Thu, Apr 08, 2010 at 07:44:14AM -0700, ebollengier wrote: Hello, Graham Keeling wrote: Hello, I'm still waiting for my test database to fill up with Eric's data (actually, it's full now, but generating the right indexes is taking lots of time). But, I have another proposed solution, better than the last one I made. My previous solution was still taking a very very long time for a backup of a particular client that I had. Removing mention of BaseFiles did not help for this client. However, the following did, and it doesn't break Base jobs. Eric, I would appreciate it if you could give this a go on your test machine. It removes nasty the join on JobTDate by replacing it with a join on JobId (which makes more sense, and is also an index on Job and File). It's not possible, for example when you bscan a volume, or you run a copy, old records with old JobTDate have new FileIds or JobIds. You can't trust this field. It will work in many situations, but also will fail in an horrible way on many others... The first version (3.0.3) was using FileId, and it was quite good, but it could result wrong results on some cases, and for restore, you don't have choice, you need the exact one. I don't understand this at all. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. The postgres query doesn't even mention JobTDate. In fact, the postgres query is using StartTime to do the ordering. And JobTDate is equivalent to StartTime (can be changed in PostgreSQL or in MySQL) I still don't understand your original point. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. To clarify, can you tell me which fields in the File table you can trust? This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId' that I was complaining about before. The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate DESC on the innermost select. I would love that, it's what DISTINCT ON() on PostgreSQL does. But, unfortunately in SQL, you can only get fields that have a group function (like MAX, MIN, AVG) and fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at the same time) from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same. This is not only a MySQL gotcha, it's an SQL property. Unless i'm mistaken this paragraph, I'm sure that we can't use it :-( OK, I understand what you are saying here. I appreciate your help, and I hope that we will find a solution for those that have this performance problem (that I can't reproduce myself). I am probably out of ideas now, other than: a) reverting to 3.0.3, b) reverting to the 3.0.3 queries, or c) switching to postgres (with all the horrible migration problems that will cause). Even if we found a workaround for MySQL, Postgres will stay far more faster (For the 2M file query, postgres was about 12s, MySQL 3.0.3 at 60s and MySQL 5.0.1 at 90s) Can you confirm that the BaseJob doesn't change your timing ? I think that I must have made some sort of mistake when I was testing that. I do not believe that it changes my timing enough to help with the problem that I have. I also think that there must be some kind of mistake with your test program because the slowness of the query between the 3.0.3 query and the 5.0.1 query is very, very massive on my real database. I will attempt to prove this soon (I still haven't set up your test database properly). -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Graham Keeling wrote: I don't understand this at all. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. The postgres query doesn't even mention JobTDate. In fact, the postgres query is using StartTime to do the ordering. And JobTDate is equivalent to StartTime (can be changed in PostgreSQL or in MySQL) I still don't understand your original point. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. No, the PostgreSQL query is using StartTime to find the latest record for a given (FilenameId, PathId) tuple, it's not broken. see sql_cmd.c. The DISTINCT ON() permits to use other fields of this record directly (which seems to be the normal way to do things) To clarify, can you tell me which fields in the File table you can trust? You don't have direct File table fields that gives you info about the backup time. (only somewhere in the LStat field, but it will be hard to extract, performance wouldn't be there, and you have to join the File table again to get MD5 and the rest of the LStat field). The main difference between the 3.0.3 query and the 5.0.x one, is that instead of using FileId to join the table a second time to get all info, it uses JobId, FilenameId and PathId. We already have an index on those 3 criteria, and it should be a bit slower, but nothing to do with hours or days of work. What is possible, is that MySQL is trying to use an other index (that you added), and the result is worst than the original plan with the composed index. Even if we found a workaround for MySQL, Postgres will stay far more faster (For the 2M file query, postgres was about 12s, MySQL 3.0.3 at 60s and MySQL 5.0.1 at 90s) Can you confirm that the BaseJob doesn't change your timing ? I think that I must have made some sort of mistake when I was testing that. I do not believe that it changes my timing enough to help with the problem that I have. I also think that there must be some kind of mistake with your test program because the slowness of the query between the 3.0.3 query and the 5.0.1 query is very, very massive on my real database. I'm really thinking that the problem is on the MySQL side (bad version perhaps), or on your modifications (my tests shows that with a FilenameId, PathId index, results are 10 times slower than with the default indexes) What version of MySQL are you using ? (and on which OS) I will attempt to prove this soon (I still haven't set up your test database properly). Here, it takes less than 1 hour (maybe 20 mins), to get the job done... (It's a Ubuntu with MySQL 5.0.51a, and i'm also using MySQL 5.1.45) Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28190830.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Fri, Apr 09, 2010 at 05:27:44AM -0700, ebollengier wrote: I'm really thinking that the problem is on the MySQL side (bad version perhaps), or on your modifications (my tests shows that with a FilenameId, PathId index, results are 10 times slower than with the default indexes) What version of MySQL are you using ? (and on which OS) I've been testing this with the default bacula indexes. I have: Kernel 2.4.33.3, mysql 5.0.45. The reporter of bug 1472 (the same problem), mnalis, has: Kernel 2.6.32+23, mysql 5.1.43-1~bpo50+1. What are you using? -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Wed, Apr 07, 2010 at 09:58:51AM -0700, ebollengier wrote: ebollengier wrote: Graham Keeling wrote: On Wed, Apr 07, 2010 at 08:22:09AM -0700, ebollengier wrote: I tweaked my test to compare both queries, and it shows no difference with and without base job part... If you want to test queries on your side with your data, you can download my tool (accurate-test.pl) on http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD If you can tweak my script to reproduce your problem, i would be able to fix things. http://old.nabble.com/file/p28166612/diff-with-without-basejob.png I'm currently running your script to generate the test database. I think that is going to take a long time, so I'll leave it overnight. This is your first problem, on my server (just a workstation), it takes less than 10s to add 200,000 records... In fact, it's 4seconds, here... JobId=15 files=2 Insert takes 4secs for 20 records JobId=16 files=5 Insert takes 11secs for 50 records JobId=20 files=20 Insert takes 43secs for 200 records I was running these three simultaneously, as described in the script. I assume that fill_table.pl is the same as docs_techlogs_accurate-test.pl. # filename=1 ./fill_table.pl # path=1 ./fill_table.pl # file=1 ./fill_table.pl However, I was also not running it on the machine that I did my original tests on because it didn't have pwgen and the perl DBD-mysql thing installed. But I have installed them now, so I will switch back to that machine. Running one instance looks comparable with yours: tserv tmp # file=1 docs_techlogs_accurate-test.pl JobId=1 files=2 Insert takes 9secs for 20 records JobId=2 files=2 Insert takes 6secs for 20 records JobId=3 files=2 Insert takes 8secs for 20 records JobId=4 files=2 (but I think it is still going to take some time) -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello, I'm still waiting for my test database to fill up with Eric's data (actually, it's full now, but generating the right indexes is taking lots of time). But, I have another proposed solution, better than the last one I made. My previous solution was still taking a very very long time for a backup of a particular client that I had. Removing mention of BaseFiles did not help for this client. However, the following did, and it doesn't break Base jobs. Eric, I would appreciate it if you could give this a go on your test machine. It removes nasty the join on JobTDate by replacing it with a join on JobId (which makes more sense, and is also an index on Job and File). This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId' that I was complaining about before. The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate DESC on the innermost select. SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS FilenameId, LStat, MD5 FROM Job, File, ( SELECT JobId, MAX(JobTDate) AS JobTDate, PathId, FilenameId FROM ( SELECT JobId, JobTDate, PathId, FilenameId FROM File JOIN Job USING (JobId) WHERE File.JobId IN ($jobids) UNION ALL SELECT BaseJobId, JobTDate, PathId, FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON(BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN ($jobids) ORDER BY JobTDate DESC ) AS tmp GROUP BY PathId, FilenameId ) AS T1 WHERE T1.JobId = Job.JobId AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WHERE FileIndex 0 ORDER BY Path.Path, Filename.Name, Temp.JobId ASC; -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello, Graham Keeling wrote: Hello, I'm still waiting for my test database to fill up with Eric's data (actually, it's full now, but generating the right indexes is taking lots of time). But, I have another proposed solution, better than the last one I made. My previous solution was still taking a very very long time for a backup of a particular client that I had. Removing mention of BaseFiles did not help for this client. However, the following did, and it doesn't break Base jobs. Eric, I would appreciate it if you could give this a go on your test machine. It removes nasty the join on JobTDate by replacing it with a join on JobId (which makes more sense, and is also an index on Job and File). It's not possible, for example when you bscan a volume, or you run a copy, old records with old JobTDate have new FileIds or JobIds. You can't trust this field. It will work in many situations, but also will fail in an horrible way on many others... The first version (3.0.3) was using FileId, and it was quite good, but it could result wrong results on some cases, and for restore, you don't have choice, you need the exact one. This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId' that I was complaining about before. The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate DESC on the innermost select. I would love that, it's what DISTINCT ON() on PostgreSQL does. But, unfortunately in SQL, you can only get fields that have a group function (like MAX, MIN, AVG) and fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at the same time) from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same. This is not only a MySQL gotcha, it's an SQL property. Unless i'm mistaken this paragraph, I'm sure that we can't use it :-( I appreciate your help, and I hope that we will find a solution for those that have this performance problem (that I can't reproduce myself). Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28179634.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Thu, Apr 08, 2010 at 07:44:14AM -0700, ebollengier wrote: Hello, Graham Keeling wrote: Hello, I'm still waiting for my test database to fill up with Eric's data (actually, it's full now, but generating the right indexes is taking lots of time). But, I have another proposed solution, better than the last one I made. My previous solution was still taking a very very long time for a backup of a particular client that I had. Removing mention of BaseFiles did not help for this client. However, the following did, and it doesn't break Base jobs. Eric, I would appreciate it if you could give this a go on your test machine. It removes nasty the join on JobTDate by replacing it with a join on JobId (which makes more sense, and is also an index on Job and File). It's not possible, for example when you bscan a volume, or you run a copy, old records with old JobTDate have new FileIds or JobIds. You can't trust this field. It will work in many situations, but also will fail in an horrible way on many others... The first version (3.0.3) was using FileId, and it was quite good, but it could result wrong results on some cases, and for restore, you don't have choice, you need the exact one. I don't understand this at all. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. The postgres query doesn't even mention JobTDate. In fact, the postgres query is using StartTime to do the ordering. This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId' that I was complaining about before. The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate DESC on the innermost select. I would love that, it's what DISTINCT ON() on PostgreSQL does. But, unfortunately in SQL, you can only get fields that have a group function (like MAX, MIN, AVG) and fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at the same time) from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same. This is not only a MySQL gotcha, it's an SQL property. Unless i'm mistaken this paragraph, I'm sure that we can't use it :-( OK, I understand what you are saying here. I appreciate your help, and I hope that we will find a solution for those that have this performance problem (that I can't reproduce myself). I am probably out of ideas now, other than: a) reverting to 3.0.3, b) reverting to the 3.0.3 queries, or c) switching to postgres (with all the horrible migration problems that will cause). -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Graham Keeling wrote: On Thu, Apr 08, 2010 at 07:44:14AM -0700, ebollengier wrote: Hello, Graham Keeling wrote: Hello, I'm still waiting for my test database to fill up with Eric's data (actually, it's full now, but generating the right indexes is taking lots of time). But, I have another proposed solution, better than the last one I made. My previous solution was still taking a very very long time for a backup of a particular client that I had. Removing mention of BaseFiles did not help for this client. However, the following did, and it doesn't break Base jobs. Eric, I would appreciate it if you could give this a go on your test machine. It removes nasty the join on JobTDate by replacing it with a join on JobId (which makes more sense, and is also an index on Job and File). It's not possible, for example when you bscan a volume, or you run a copy, old records with old JobTDate have new FileIds or JobIds. You can't trust this field. It will work in many situations, but also will fail in an horrible way on many others... The first version (3.0.3) was using FileId, and it was quite good, but it could result wrong results on some cases, and for restore, you don't have choice, you need the exact one. I don't understand this at all. If you cannot trust the JobIds or FileIds in the File table, then the postgres query is also broken. The postgres query doesn't even mention JobTDate. In fact, the postgres query is using StartTime to do the ordering. And JobTDate is equivalent to StartTime (can be changed in PostgreSQL or in MySQL) This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId' that I was complaining about before. The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate DESC on the innermost select. I would love that, it's what DISTINCT ON() on PostgreSQL does. But, unfortunately in SQL, you can only get fields that have a group function (like MAX, MIN, AVG) and fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at the same time) from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same. This is not only a MySQL gotcha, it's an SQL property. Unless i'm mistaken this paragraph, I'm sure that we can't use it :-( OK, I understand what you are saying here. I appreciate your help, and I hope that we will find a solution for those that have this performance problem (that I can't reproduce myself). I am probably out of ideas now, other than: a) reverting to 3.0.3, b) reverting to the 3.0.3 queries, or c) switching to postgres (with all the horrible migration problems that will cause). Even if we found a workaround for MySQL, Postgres will stay far more faster (For the 2M file query, postgres was about 12s, MySQL 3.0.3 at 60s and MySQL 5.0.1 at 90s) Can you confirm that the BaseJob doesn't change your timing ? Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28183478.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote: Hello Graham, Hello, thanks for your reply. Graham Keeling wrote: Hello, I'm using bacula-5.0.1. I have a 2.33GHz CPU with 2G of RAM. I am using MySQL. I had a VirtualFull scheduled for my client. My log says the following: Apr 4 18:56:02 Start Virtual Backup JobId 56, Job=Linux:cvs.2010-04-04_18.56.00_03 Apr 4 18:56:02 This Job is not an Accurate backup so is not equivalent to a Full backup. Apr 6 03:56:12 Bootstrap records written to /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr Apr 6 10:23:20 Ready to read from volume backup-0002 on device Tower 2.0 (/write/mnt/Tower 2). Apr 6 10:23:20 Labeled new Volume backup-0046 on device Tower 2.1 (/write/mnt/Tower 2). ...and now, backup-0046 is being written to. At about Apr 6 10:00, I was logged on to the director, and attempting to run a 'status, director' command on bconsole. bconsole printed a few bits of information, then paused. Presumably because JobId 56 had locked the database. After 10:23:20, the command printed the rest of its information and gave me the prompt back. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | Id | User | Host | db | Command | Time | State | Info | 3032 | root | localhost | bacula | Query | 22464 | Locked| UPDATE Job SE T JobStatus='R',Level='F',StartTime='2010-04-06 03:56:14',ClientId=2,JobTDate=12 70522574,PoolId=7,FileSetId=2 WHERE JobId=56 | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. The following is a listing of the number of files that each previous backup said it had backed up, which doesn't seem extraordinarily large to my mind. Full: 810,400 Incr: 52,487 Incr: 52,485 Incr: 52,485 Incr: 332,649 Incr: 52,492 Incr: 52,593 Incr: 332,677 Incr: 52,497 Incr: 52,497 Incr: 52,497 So, my questions are: Does it really need to take so long? Is it expected that this Job blocks everything else for 6 hours? I think that this part of the code doesn't use a dedicated connection, it would be nice to improve this area. (other part of the code are using a specific connection to do this work) Graham Keeling wrote: Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) Did you run tests with the original schema? It looks like you removed important index such as the one on JobId and you added others, and i'm not sure that MySQL can play properly with composed indexes (it doesn't looks like) in this treatment. In future, to avoid generating this sort of complaint, I will not fiddle with the default bacula indexes. However, I think that the problem is not to do with this. I shall explain below... I've made tests with 400,000,000 files and 40,000,000 filenames over 600 jobs, and it doesn't run fast as PostgreSQL (factor 6 between both engines), but it should handle the job rather
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Graham Keeling wrote: On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote: Hello Graham, Hello, thanks for your reply. Graham Keeling wrote: Hello, I'm using bacula-5.0.1. I have a 2.33GHz CPU with 2G of RAM. I am using MySQL. I had a VirtualFull scheduled for my client. My log says the following: Apr 4 18:56:02 Start Virtual Backup JobId 56, Job=Linux:cvs.2010-04-04_18.56.00_03 Apr 4 18:56:02 This Job is not an Accurate backup so is not equivalent to a Full backup. Apr 6 03:56:12 Bootstrap records written to /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr Apr 6 10:23:20 Ready to read from volume backup-0002 on device Tower 2.0 (/write/mnt/Tower 2). Apr 6 10:23:20 Labeled new Volume backup-0046 on device Tower 2.1 (/write/mnt/Tower 2). ...and now, backup-0046 is being written to. At about Apr 6 10:00, I was logged on to the director, and attempting to run a 'status, director' command on bconsole. bconsole printed a few bits of information, then paused. Presumably because JobId 56 had locked the database. After 10:23:20, the command printed the rest of its information and gave me the prompt back. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | Id | User | Host | db | Command | Time | State | Info | 3032 | root | localhost | bacula | Query | 22464 | Locked| UPDATE Job SE T JobStatus='R',Level='F',StartTime='2010-04-06 03:56:14',ClientId=2,JobTDate=12 70522574,PoolId=7,FileSetId=2 WHERE JobId=56 | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. The following is a listing of the number of files that each previous backup said it had backed up, which doesn't seem extraordinarily large to my mind. Full: 810,400 Incr: 52,487 Incr: 52,485 Incr: 52,485 Incr: 332,649 Incr: 52,492 Incr: 52,593 Incr: 332,677 Incr: 52,497 Incr: 52,497 Incr: 52,497 So, my questions are: Does it really need to take so long? Is it expected that this Job blocks everything else for 6 hours? I think that this part of the code doesn't use a dedicated connection, it would be nice to improve this area. (other part of the code are using a specific connection to do this work) Graham Keeling wrote: Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) Did you run tests with the original schema? It looks like you removed important index such as the one on JobId and you added others, and i'm not sure that MySQL can play properly with composed indexes (it doesn't looks like) in this treatment. In future, to avoid generating this sort of complaint, I will not fiddle with the default bacula indexes. However, I think that the problem is not to do with this. I shall explain below... I've made tests with 400,000,000 files and 40,000,000 filenames over 600 jobs, and it doesn't run fast as PostgreSQL (factor 6 between
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Wed, 7 Apr 2010 12:40:24 +0100, Graham Keeling said: On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote: Hello Graham, Hello, thanks for your reply. Graham Keeling wrote: Hello, I'm using bacula-5.0.1. I have a 2.33GHz CPU with 2G of RAM. I am using MySQL. I had a VirtualFull scheduled for my client. My log says the following: Apr 4 18:56:02 Start Virtual Backup JobId 56, Job=Linux:cvs.2010-04-04_18.56.00_03 Apr 4 18:56:02 This Job is not an Accurate backup so is not equivalent to a Full backup. Apr 6 03:56:12 Bootstrap records written to /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr Apr 6 10:23:20 Ready to read from volume backup-0002 on device Tower 2.0 (/write/mnt/Tower 2). Apr 6 10:23:20 Labeled new Volume backup-0046 on device Tower 2.1 (/write/mnt/Tower 2). ...and now, backup-0046 is being written to. At about Apr 6 10:00, I was logged on to the director, and attempting to run a 'status, director' command on bconsole. bconsole printed a few bits of information, then paused. Presumably because JobId 56 had locked the database. After 10:23:20, the command printed the rest of its information and gave me the prompt back. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | Id | User | Host | db | Command | Time | State | Info | 3032 | root | localhost | bacula | Query | 22464 | Locked| UPDATE Job SE T JobStatus='R',Level='F',StartTime='2010-04-06 03:56:14',ClientId=2,JobTDate=12 70522574,PoolId=7,FileSetId=2 WHERE JobId=56 | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. The following is a listing of the number of files that each previous backup said it had backed up, which doesn't seem extraordinarily large to my mind. Full: 810,400 Incr: 52,487 Incr: 52,485 Incr: 52,485 Incr: 332,649 Incr: 52,492 Incr: 52,593 Incr: 332,677 Incr: 52,497 Incr: 52,497 Incr: 52,497 So, my questions are: Does it really need to take so long? Is it expected that this Job blocks everything else for 6 hours? I think that this part of the code doesn't use a dedicated connection, it would be nice to improve this area. (other part of the code are using a specific connection to do this work) Graham Keeling wrote: Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) Did you run tests with the original schema? It looks like you removed important index such as the one on JobId and you added others, and i'm not sure that MySQL can play properly with composed indexes (it doesn't looks like) in this treatment. In future, to avoid generating this sort of complaint, I will not fiddle with the default bacula indexes. However, I think that the problem is not to do with this. I shall
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Wed, Apr 07, 2010 at 02:51:42PM +0100, Martin Simmons wrote: Does it still run quickly if keep that Job.JobId IN clause but use the numbers returned by SELECT DISTINCT BaseJobId FROM BaseFiles WHERE JobId IN (22,23,31,34,42,48,52) in place of the the nested select? In my case, there are no numbers returned. But, I assume that you are suggesting something like the command below. For me, it completes in under a second. SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS FilenameId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (JobId) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId, FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON(BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY PathId, FilenameId ) AS T1 WHERE ( Job.JobId IN (22,23,31,34,42,48,52) OR Job.JobId IN (22,23,31,34,42,48,52) ) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WHERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Since: a) I am not using Base jobs b) I am currently stuck with using MySQL c) There is not a 'proper' fix yet I am going to use the attached patch as a temporary solution to the problem. Index: src/cats/sql_cmds.c === RCS file: /cvs/netpilot/GPL/bacula-5.0.1/WORK/src/cats/sql_cmds.c,v retrieving revision 1.2 diff -u -r1.2 sql_cmds.c --- src/cats/sql_cmds.c 26 Mar 2010 10:36:03 - 1.2 +++ src/cats/sql_cmds.c 7 Apr 2010 13:56:10 - @@ -481,9 +481,7 @@ WHERE BaseFiles.JobId IN (%s) /* Use Max(JobTDate) to find */ ) AS tmp GROUP BY PathId, FilenameId/* the latest file version */ ) AS T1 -WHERE (Job.JobId IN ( /* Security, we force JobId to be valid */ -SELECT DISTINCT BaseJobId FROM BaseFiles WHERE JobId IN (%s)) -OR Job.JobId IN (%s)) +WHERE Job.JobId IN (%s) AND T1.JobTDate = Job.JobTDate /* Join on JobTDate to get the orginal */ AND Job.JobId = File.JobId /* Job/File record */ AND T1.PathId = File.PathId Index: src/cats/sql_get.c === RCS file: /cvs/netpilot/GPL/bacula-5.0.1/WORK/src/cats/sql_get.c,v retrieving revision 1.1 diff -u -r1.1 sql_get.c --- src/cats/sql_get.c 26 Mar 2010 10:20:41 - 1.1 +++ src/cats/sql_get.c 7 Apr 2010 13:56:10 - @@ -,7 +,7 @@ #ifdef new_db_get_file_list POOL_MEM buf2(PM_MESSAGE); Mmsg(buf2, select_recent_version_with_basejob[db_type], -jobids, jobids, jobids, jobids); +jobids, jobids, jobids); Mmsg(buf, SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( %s ) AS Temp -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
ebollengier wrote: Graham Keeling wrote: On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote: Hello Graham, Hello, thanks for your reply. Graham Keeling wrote: Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) Did you run tests with the original schema? It looks like you removed important index such as the one on JobId and you added others, and i'm not sure that MySQL can play properly with composed indexes (it doesn't looks like) in this treatment. In future, to avoid generating this sort of complaint, I will not fiddle with the default bacula indexes. However, I think that the problem is not to do with this. I shall explain below... I've made tests with 400,000,000 files and 40,000,000 filenames over 600 jobs, and it doesn't run fast as PostgreSQL (factor 6 between both engines), but it should handle the job rather quickly. (on my pc, for 2M files, it's between 1min and 1min30 (15seconds with PostgreSQL)) Here is the command that mysql takes hours over, and hence the one that I am using to test. In all of my tests, the result is the same whether I am using the default bacula indexes or my additional ones. SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS FilenameId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (JobId) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId, FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON(BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY PathId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM BaseFiles WHERE JobId IN (22,23,31,34,42,48,52) ) ) OR Job.JobId IN (22,23,31,34,42,48,52) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WHERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC; Since I am not using base jobs, I did the experiment of removing the lines from the query to do with base jobs. When I did that, the query changed from taking hours, to taking under a second. I then tracked it down to this specific part (i.e, remove this, and get a truly magnificent speed-up): (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM BaseFiles WHERE JobId IN (22,23,31,34,42,48,52) ) ) OR I think that mysql is repeatedly calling this section for each row that the SELECT returns. Now the question is what to do about it? It's very interesting, in my tests (with and without basejobs), it changes nothing... But i will check it again. (MySQL have a query cache, and if you run two times the same thing, the result is very fast) As we are using JobTDate to find the last record of each file, we use it to get JobId back from the Job table (yes, ugly, but MySQL doesn't have DISTINCT ON()), and if two jobs have the same JobTDate, we are in trouble. So, i limit the selection to this JobId list. Hope that you found the problem, it's rather easy to compute this list once, and use it after. Bye I tweaked my test to compare both queries, and it shows no difference with and without base job part... If you want to test queries on your side with your data, you can download my tool (accurate-test.pl) on http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD If you can tweak my script to reproduce your problem, i would be able to fix things. http://old.nabble.com/file/p28166612/diff-with-without-basejob.png Bye -- View this message in context:
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Wed, Apr 07, 2010 at 08:22:09AM -0700, ebollengier wrote: I tweaked my test to compare both queries, and it shows no difference with and without base job part... If you want to test queries on your side with your data, you can download my tool (accurate-test.pl) on http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD If you can tweak my script to reproduce your problem, i would be able to fix things. http://old.nabble.com/file/p28166612/diff-with-without-basejob.png I'm currently running your script to generate the test database. I think that is going to take a long time, so I'll leave it overnight. Meanwhile (unless I misunderstand), I think that your graph is labelled strangely. When I'm generating the file records, I get this: ... JobId=10 files=20 Insert takes 445secs for 200 records ... JobId=13 files=2 Insert takes 54secs for 20 records ... So, JobIds 10 + 13: 22 files 220 records Your script says: do_bench(10,13, 220); This is obviously the records number. Your graph uses this bigger number and labels it 'Files'. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Graham Keeling wrote: On Wed, Apr 07, 2010 at 08:22:09AM -0700, ebollengier wrote: I tweaked my test to compare both queries, and it shows no difference with and without base job part... If you want to test queries on your side with your data, you can download my tool (accurate-test.pl) on http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD If you can tweak my script to reproduce your problem, i would be able to fix things. http://old.nabble.com/file/p28166612/diff-with-without-basejob.png I'm currently running your script to generate the test database. I think that is going to take a long time, so I'll leave it overnight. This is your first problem, on my server (just a workstation), it takes less than 10s to add 200,000 records... Meanwhile (unless I misunderstand), I think that your graph is labelled strangely. When I'm generating the file records, I get this: ... JobId=10 files=20 Insert takes 445secs for 200 records ... JobId=13 files=2 Insert takes 54secs for 20 records ... So, JobIds 10 + 13: 22 files 220 records Your script says: do_bench(10,13, 220); This is obviously the records number. Your graph uses this bigger number and labels it 'Files'. The accurate query on jobid=10,13 will work on 2,000,000 + 200,000 files (this is the argument), as each filenames are different, you can expect to retrieve 2,2M files. It's quite possible that it miss a *10 mutiplication somewhere in debug messages (I insert files 10 by 10) Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28167751.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
ebollengier wrote: Graham Keeling wrote: On Wed, Apr 07, 2010 at 08:22:09AM -0700, ebollengier wrote: I tweaked my test to compare both queries, and it shows no difference with and without base job part... If you want to test queries on your side with your data, you can download my tool (accurate-test.pl) on http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD If you can tweak my script to reproduce your problem, i would be able to fix things. http://old.nabble.com/file/p28166612/diff-with-without-basejob.png I'm currently running your script to generate the test database. I think that is going to take a long time, so I'll leave it overnight. This is your first problem, on my server (just a workstation), it takes less than 10s to add 200,000 records... In fact, it's 4seconds, here... JobId=15 files=2 Insert takes 4secs for 20 records JobId=16 files=5 Insert takes 11secs for 50 records JobId=20 files=20 Insert takes 43secs for 200 records -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28167844.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello, I'm using bacula-5.0.1. I have a 2.33GHz CPU with 2G of RAM. I am using MySQL. I had a VirtualFull scheduled for my client. My log says the following: Apr 4 18:56:02 Start Virtual Backup JobId 56, Job=Linux:cvs.2010-04-04_18.56.00_03 Apr 4 18:56:02 This Job is not an Accurate backup so is not equivalent to a Full backup. Apr 6 03:56:12 Bootstrap records written to /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr Apr 6 10:23:20 Ready to read from volume backup-0002 on device Tower 2.0 (/write/mnt/Tower 2). Apr 6 10:23:20 Labeled new Volume backup-0046 on device Tower 2.1 (/write/mnt/Tower 2). ...and now, backup-0046 is being written to. At about Apr 6 10:00, I was logged on to the director, and attempting to run a 'status, director' command on bconsole. bconsole printed a few bits of information, then paused. Presumably because JobId 56 had locked the database. After 10:23:20, the command printed the rest of its information and gave me the prompt back. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | Id | User | Host | db | Command | Time | State | Info | 3032 | root | localhost | bacula | Query | 22464 | Locked| UPDATE Job SE T JobStatus='R',Level='F',StartTime='2010-04-06 03:56:14',ClientId=2,JobTDate=12 70522574,PoolId=7,FileSetId=2 WHERE JobId=56 | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON(BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. The following is a listing of the number of files that each previous backup said it had backed up, which doesn't seem extraordinarily large to my mind. Full: 810,400 Incr: 52,487 Incr: 52,485 Incr: 52,485 Incr: 332,649 Incr: 52,492 Incr: 52,593 Incr: 332,677 Incr: 52,497 Incr: 52,497 Incr: 52,497 So, my questions are: Does it really need to take so long? Is it expected that this Job blocks everything else for 6 hours? Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
On Tue, Apr 06, 2010 at 12:59:49PM +0200, Matija Nalis wrote: On Tue, Apr 06, 2010 at 11:13:44AM +0100, Graham Keeling wrote: I'm using bacula-5.0.1. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. Yes, it happens, especially with bigger File tables - what does Select count(*) from File returns for you ? 3239328 This slowdown was not present in 3.0.3 but is new to 5.0.x series and has been documented on http://bugs.bacula.org/view.php?id=1472 which is in process of being debugged currently Thanks for that. A colleague of mine did exactly start several jobs; very soon the query gets run and everything else stops this morning. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
Hello Graham, Graham Keeling wrote: Hello, I'm using bacula-5.0.1. I have a 2.33GHz CPU with 2G of RAM. I am using MySQL. I had a VirtualFull scheduled for my client. My log says the following: Apr 4 18:56:02 Start Virtual Backup JobId 56, Job=Linux:cvs.2010-04-04_18.56.00_03 Apr 4 18:56:02 This Job is not an Accurate backup so is not equivalent to a Full backup. Apr 6 03:56:12 Bootstrap records written to /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr Apr 6 10:23:20 Ready to read from volume backup-0002 on device Tower 2.0 (/write/mnt/Tower 2). Apr 6 10:23:20 Labeled new Volume backup-0046 on device Tower 2.1 (/write/mnt/Tower 2). ...and now, backup-0046 is being written to. At about Apr 6 10:00, I was logged on to the director, and attempting to run a 'status, director' command on bconsole. bconsole printed a few bits of information, then paused. Presumably because JobId 56 had locked the database. After 10:23:20, the command printed the rest of its information and gave me the prompt back. At about Apr 6 10:00, I logged into mysql, and ran 'show full processlist;', which gave the following. | Id | User | Host | db | Command | Time | State | Info | 3032 | root | localhost | bacula | Query | 22464 | Locked| UPDATE Job SE T JobStatus='R',Level='F',StartTime='2010-04-06 03:56:14',ClientId=2,JobTDate=12 70522574,PoolId=7,FileSetId=2 WHERE JobId=56 | 3033 | root | localhost | bacula | Query | 22464 | executing | SELECT Path.P ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS Filena meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, Fil enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job USING (Job Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate, PathId , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp GROUP BY P athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM B aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN (22,23,31,34,42, 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename ON (Fi lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId = Temp.PathId) WH ERE FileIndex 0 ORDER BY Temp.JobId, FileIndex ASC | 'Time 22464' is equivalent to about 6 hours, so I presume that after 'Bootstrap records written', bacula has been waiting for this sql command, during which time it could not do anything else. Backups of other clients were failing because I had Max Wait Time set to something less than 6 hours. The following is a listing of the number of files that each previous backup said it had backed up, which doesn't seem extraordinarily large to my mind. Full: 810,400 Incr: 52,487 Incr: 52,485 Incr: 52,485 Incr: 332,649 Incr: 52,492 Incr: 52,593 Incr: 332,677 Incr: 52,497 Incr: 52,497 Incr: 52,497 So, my questions are: Does it really need to take so long? Is it expected that this Job blocks everything else for 6 hours? I think that this part of the code doesn't use a dedicated connection, it would be nice to improve this area. (other part of the code are using a specific connection to do this work) Graham Keeling wrote: Is there anything I can do that would speed it up? Perhaps even more importantly, what was it doing for the 33 hour period between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the bootstrap records? On the Path table, I have these: PRIMARY KEY(PathId), INDEX (Path(255)) On the File table, I have these: PRIMARY KEY(FileId), INDEX (JobId, PathId, FilenameId), INDEX (PathId), INDEX (FilenameId) On the Job table, I have these: PRIMARY KEY(JobId), INDEX (Name(128)), INDEX (ClientId) Did you run tests with the original schema? It looks like you removed important index such as the one on JobId and you added others, and i'm not sure that MySQL can play properly with composed indexes (it doesn't looks like) in this treatment. I've made tests with 400,000,000 files and 40,000,000 filenames over 600 jobs, and it doesn't run fast as PostgreSQL (factor 6 between both engines), but it should handle the job rather quickly. (on my pc, for 2M files, it's between 1min and 1min30 (15seconds with PostgreSQL)) Bye -- View this message in context: http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28153906.html Sent from the Bacula - Users mailing list archive at Nabble.com. -- Download Intel#174; Parallel Studio