Re: [rt-users] Option to store attachments on the filesystem
On 12/22/2011 12:43 PM, Joe Harris wrote:
> I am looking into this type of functionality as well. We were thinking of an NFS share in a web directory to drop the attachment with a way to drop a link within the ticket. So the attachments may not even exist on the RT server, but there will be links in the ticket to a web server that houses the attachment.

It'd be nice if this was the case, but the big problem with having the web server directly serve your attachments is that suddenly you lose _all_ access control that RT normally provides around attachments. Even if attachments are stored on disk, they must be _served_ by RT, not the web server directly.

Thomas

RT Training Sessions (http://bestpractical.com/services/training.html) * Boston March 5 6, 2012
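To make the access-control point concrete, here is a minimal sketch in Python (not RT's own Perl, and not RT's actual rights model). The `AttachmentStore` class, its `readers` mapping, and the `AccessDenied` exception are all hypothetical names invented for illustration. The only idea taken from the message above is that the application checks the requester's rights before it reads the file from disk, so a direct web-server link never bypasses the check.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


class AccessDenied(Exception):
    """Raised when a user may not see the ticket that owns an attachment."""


@dataclass
class AttachmentStore:
    root: Path                                      # directory holding attachment files
    ticket_of: dict = field(default_factory=dict)   # attachment id -> ticket id
    readers: dict = field(default_factory=dict)     # ticket id -> set of user names

    def add(self, attachment_id, ticket_id, data):
        # The file lives on disk; only the ownership metadata stays in memory
        # (standing in for the database in this sketch).
        (self.root / str(attachment_id)).write_bytes(data)
        self.ticket_of[attachment_id] = ticket_id

    def serve(self, user, attachment_id):
        # Check rights first, then touch the filesystem.
        ticket_id = self.ticket_of[attachment_id]
        if user not in self.readers.get(ticket_id, set()):
            raise AccessDenied(f"{user} may not read ticket {ticket_id}")
        return (self.root / str(attachment_id)).read_bytes()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = AttachmentStore(root=Path(tmp))
        store.readers[42] = {"alice"}
        store.add(attachment_id=1, ticket_id=42, data=b"error log")
        allowed = store.serve("alice", 1)
        try:
            store.serve("mallory", 1)
            denied = False
        except AccessDenied:
            denied = True
        return allowed, denied
```

In this sketch the storage directory is never exposed by the web server; every read goes through `serve`, which is the property Thomas asks for.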
Re: [rt-users] Option to store attachments on the filesystem
Please keep replies on the list.

For the record, I'm not claiming that core RT shouldn't support attachments on disk in the future. I'm just trying to give you the relevant info for right now.

On 12/22/2011 08:58 PM, Geoff Mayes wrote:
> I searched but struck out. Could you provide some links? Why isn't it public? Any way I can take a look at it? :)

The extension was originally the result of customer work, and it hasn't been made public. Will it be made public? We don't know yet. For the time being, you'll need to contact sa...@bestpractical.com if you're interested in it.

The mailing list threads I found just now with a search:
http://www.gossamer-threads.com/lists/rt/users/92667
http://www.gossamer-threads.com/lists/rt/users/100964

> Is it that non-trivial? The Bugzilla in-house Attachments.pm module we used to use was 200 lines of Perl and that handled the main 8TB attachments datastore as well as the archives 30TB datastore, sorting out discrepancies between the two. And larger organizations will have the resources and expertise to do these kinds of things easily, so if RT really is for organizations of all sizes, then how does it cater to the non-trivial users in this matter?

Larger organizations take a variety of steps to help ensure RT performs well. Yes, occasionally that includes putting attachments on disk, but it also includes good database tuning and many other tweaks before that. The fact is that a 15 or 20GB database is simply not large at all; 15-20GB fits on a single USB flash drive.

> I just don't see how keeping attachments in the database is *optimal* for backing up and restoring the database when the database gets beyond 10-20GB, especially when an organization hasn't paid Oracle $5,000 so they can run hotcopy on their InnoDB databases.

As Ken from rice.edu said earlier, there are smarter backup solutions than dumping/restoring the entire DB every single time. He named a couple.

As for Oracle's $5k tool, you can get similar results with Percona's completely free, open-source hot backup/copy tool for InnoDB called XtraBackup: http://www.percona.com/software/percona-xtrabackup/

> How big are their databases and attachment tables? How do they do backups and restores? What are their disaster recovery plans?

These are customers that we can't provide that information on. There was at least one thread in 2010 or 2011 on the mailing list asking people to contribute stats about the sizes of their RT instance. Some notably large ones came up.

Thomas
Re: [rt-users] Option to store attachments on the filesystem
On Wed, Dec 21, 2011 at 11:12:04PM +, Geoff Mayes wrote:
> Hello RT Users and Developers,
>
> Our RT instance at the University of Oregon is outgrowing the standard settings in some ways. One way is with attachments. The size of our database is 15.3GB and 13.7GB of that comes from the Attachments table. If our attachments were stored on a high-performance fileserver (or locally if you prefer), our database would shrink to 1.6GB. This would have numerous positive ramifications:
>
> - Database dumps/backups would finish in 1/10 the time
> - Database restores would finish in 1/10 the time
> - Planned downtimes and disaster recovery situations could be more nimbly performed (scp'ing around the db dump, restoring, etc)
> - Backups could be taken much more frequently
> - More backups could be stored
> - MySQL replication would be more robust with less binary data to chew on
> - Larger attachments could be permitted because there would be less fear of the database growing too quickly
> - Reduced database load querying/inserting/deleting/joining attachments
>
> I've read in previous posts to this mailing list (see below) that the arguments against this are that (1) attachments on the filesystem can't be searched and (2) the data backing the application will not be in one tidy database package but instead spread out across the db and filesystem. For our instance we don't care about #1, and for #2, while I understand the argument, I would actually argue the opposite: when attachments are on a high-performance, redundant SAN managed by a dedicated storage team that I don't have to worry about, my job administering RT just got a whole lot easier because I only have to worry about ensuring the fileserver is mounted and $AttachmentsPath (just an example config option) is properly set.
>
> I worked previously at a company that ran one of the largest instances of Bugzilla in the world and we served up 30TB of attachments over a fileserver without any problems. Can you imagine those attachments in a MySQL database? When ticket tracking systems are no longer small-ish, moving attachments out of the database becomes a must.
>
> I'm not asking the RT folks to switch attachment storage to the filesystem instead of the database. My wish is that RT offers its administrators the ability to choose one or the other. I know this has been a hot topic in the past, but I was hoping we could revisit the issue.
>
> Best Practical folks -- are you open to this? If so, would it help the process if I did all the work and submitted a patch? If so, should I file a bug so that we can talk about the way you would like this implemented? Given my reading of the history of this issue, I think a lot of folks would benefit from this feature. I've included previous postings about this issue below. Let me know if I can help and how I can. We would love to upstream a patch so our local instance doesn't diverge too severely from you all.
>
> Thanks for your consideration,
> Geoff Mayes
>
> One of the first, meaty discussions:
> http://www.gossamer-threads.com/lists/rt/devel/706
> http://www.gossamer-threads.com/lists/rt/devel/37733
> http://www.gossamer-threads.com/lists/rt/users/39507
>
> The best discussion of the issue:
> http://www.gossamer-threads.com/lists/rt/users/67406
>
> Best Practical has recently worked on this issue:
> http://www.gossamer-threads.com/lists/rt/users/89596

Hi Geoff,

I had thought that something like this had already been implemented by Best Practical for a customer. Hopefully, they can provide some feedback regarding the utility and possible problems of such an approach from personal experience. Maybe they would consider releasing it as an extension.

As far as the assertion that a lot of folks would benefit from this feature, I doubt that would be the case for the vast majority of RT users. Most users can handle one-stop-shopping type applications with far fewer problems. Once you divorce the metadata repository from the actual ticket data, you add a whole slew of different failure modes that will require much more sophisticated administration processes to prevent, ameliorate, or recover from. Your reference to leveraging an existing SAN+SAN management team gives a hint to the increase in both complexity and cost of running an instance.

There is a wide range of RT users, from systems that manage a handful of tickets a week all the way to systems handling thousands of tickets or more a week. Those on the small end can/should use whatever DB backend they are familiar with, to simplify administration and avoid the "what did I do?!" errors due to a lack of familiarity. As you move towards larger implementations, your DB backend needs to be chosen based on its viability in an enterprise/large-scale environment. I do not know the level of your local MySQL expertise and I am certainly not a MySQL expert, but a 15GB database does not strike me as particularly large, by any
Re: [rt-users] Option to store attachments on the filesystem
I am looking into this type of functionality as well. We were thinking of an NFS share in a web directory to drop the attachment with a way to drop a link within the ticket. So the attachments may not even exist on the RT server, but there will be links in the ticket to a web server that houses the attachment.

On Dec 22, 2011, at 9:42 AM, k...@rice.edu k...@rice.edu wrote:
> On Wed, Dec 21, 2011 at 11:12:04PM +, Geoff Mayes wrote:
>> Hello RT Users and Developers,
>> [...]
Re: [rt-users] Option to store attachments on the filesystem
, I do know that there are big performance gains that come from storing attachments outside of the database. Check out the discussion of this issue by one of Bugzilla's core developers, and their work-in-progress implementation: https://bugzilla.mozilla.org/show_bug.cgi?id=577532. So moving attachments out of the database *is* an actual tuning option, just like the other options you mentioned. Why do something drastic like changing the database backend or performing complicated and expert-level tuning/sharding/partitioning, when I could just add a few config options to RT_SiteConfig.pm and run a script (for a pre-existing instance) that then sets up my instance to serve attachments from a filesystem instead of the db?

Here's one recent example of how our current database size is negatively impacting us: We upgraded from 3.8.4 to 4.0.4 yesterday and it took almost an hour to dump our database and almost an hour to import the database (we were upgrading MySQL and the OSes as well). And then we had to import it again because max_allowed_packet was set too small (which wouldn't have been a problem if attachments were outside the db: an anecdotal rather than a logical argument, but nonetheless a real-world occurrence, as errors happen), so add another hour instead of only another 10 minutes. If attachments were stored outside of the database, we could have reduced just the backup and import phases from 3 hours to 20 minutes. That is a huge difference, especially when your application is used by thousands of customers waiting to log back in.

The positive ramifications continue: internal development of RT is much faster with a small database because we can copy it around the network faster, perform imports in 1/10th the time, and keep our development database up-to-date much more easily. If someone knew of a simpler way to cut the dump and restore times to 1/10, I would love to hear it and be totally open to a different solution.

The main point I would like to restate is that larger or quickly-growing instances of RT are very different than smaller or slowly-growing instances. One pain point of the larger instances is the size of the database and how that affects backups, restores, disaster recoveries, and development. Having the option to store attachments outside of the database allows the larger RT instances to more easily manage their data for a much longer period of time.

Most importantly for the Best Practical folks, this option increases the appeal of RT to larger organizations, rather than only the small- to medium-sized market as stated at http://requesttracker.wikia.com/wiki/ManualIntroduction. The addition of this feature along with the recent SphinxSE option truly makes RT more feasible and attractive to larger organizations.

Kind regards,
Geoff Mayes

From: rt-users-boun...@lists.bestpractical.com [rt-users-boun...@lists.bestpractical.com] on behalf of Joe Harris [drey...@gmail.com]
Sent: Thursday, December 22, 2011 9:43 AM
To: rt-users@lists.bestpractical.com
Subject: Re: [rt-users] Option to store attachments on the filesystem

[...]
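The "3 hours to 20 minutes" estimate can be checked with a few lines of arithmetic. The sketch below is Python and rests on one loud assumption that the thread does not prove: dump and import time scale linearly with database size. The sizes are the ones from the original post, and the three one-hour phases are the dump, the import, and the repeated import described above. The function name `scaled_minutes` is made up for illustration.

```python
DB_TOTAL_GB = 15.3       # whole database, from the original post
ATTACHMENTS_GB = 13.7    # Attachments table, from the original post


def scaled_minutes(minutes, total_gb=DB_TOTAL_GB, removed_gb=ATTACHMENTS_GB):
    """Estimate a phase's duration once `removed_gb` has left the database.

    Assumption: duration is proportional to database size.
    """
    remaining_gb = total_gb - removed_gb
    return minutes * remaining_gb / total_gb


# Roughly one hour each, as reported for the 3.8.4 -> 4.0.4 upgrade.
phases = {"dump": 60, "import": 60, "re-import": 60}

actual = sum(phases.values())                                # 180 minutes
estimate = sum(scaled_minutes(m) for m in phases.values())   # a bit under 19 minutes
```

Under that assumption the remaining 1.6GB is about a tenth of the database, so the three phases shrink from 180 minutes to a little under 19, which matches the rounded figure in the message. Since textual ticket updates also live in the Attachments table, the real saving would be somewhat smaller than this upper bound.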
Re: [rt-users] Option to store attachments on the filesystem
On Thu, Dec 22, 2011 at 02:17:00PM -0700, Brent Wiese wrote:
>> Totally agree with this. An option to store attachments on the filesystem, however, is database-agnostic, so RT admins can select this option with MySQL, Oracle, Postgres, SQLite, etc.
>
> What if it wasn't DB agnostic? Does any Linux/open source DB provide a FILESTREAM option like MS SQL? It sounds like that would be an easy solution if it existed as, to the best of my knowledge, the app doesn't need to know any different when using FILESTREAM. The DB handles the disk/DB interaction.
>
> Brent

I think that PostgreSQL 9.1 includes support for Foreign Data Wrappers that could be used to access the filesystem contents in a similar fashion.

Regards,
Ken
Re: [rt-users] Option to store attachments on the filesystem
> I may be mistaken, but I thought that all ticket content is currently stored as an attachment in the DB and not just those available in the Create Ticket or Display Ticket screens.

You're right: all uploaded attachments (images, logs, PDFs, etc) as well as all textual ticket updates (comments, correspondence, etc) are stored in the Attachments table. I've only been working on RT for a couple weeks, but I've worked on Bugzilla for a few years, so it has been very interesting to compare the two. I am surprised that uploaded attachments and ticket comments/correspondence are stored in the same table! That feels like overloading and/or de-normalization to me.

So, yes, any implementation of the "store attachments on the filesystem" feature would have to change the current schema so that all textual ticket updates remain in the database, but all file uploads are stored on the filesystem. My brief reading of the patch submitted during a previous post about this issue (http://www.gossamer-threads.com/lists/rt/users/67406) shows that the current schema can be kept, but a clean, non-hackish implementation should probably change the schema.

I wonder if we've been misunderstanding each other in other places because of my lack of understanding on this matter until now. If so, I apologize.

> I do think that the more people count on the ticket system as a resource and expect it to be continuously available, the larger the consequences of adding additional moving parts to the system. It is very easy to trivialize the issues involved in managing a filestore, whether in a SQL DB or filesystem DB (or data store). When reliability and availability are important, many measures need to be taken to ensure access to all of the metadata+data and handle business continuity and disaster recovery.

As a general statement, I would agree with everything you wrote. I will say, though, that for our RT instance I'd take an NFS mount of attachments over an additional 14GB table in the database any day of the week. Additionally, our rate of RT usage is growing every month, so that 14GB Attachments table could easily become 40GB in two years. I like future-proofing.

> As an example, using a different DB product you can replicate the backend to a new instance and keep it in sync until the upgrade. Then you have effectively zero time to copy the DB because the work was done outside the critical path for the upgrade.

That's a good idea. Unfortunately, we couldn't have done this for our last upgrade because we moved from MySQL 4.1 to 5.1 (http://dev.mysql.com/doc/refman/5.1/en/replication-compatibility.html).

I'm pretty sure we are going to make this change to our instance. (I'm pushing for it.) I was just hoping to get this feature into the main RT repo to spread the love, de-duplicate the identical work done by others, and make future upgrades easier for us.
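The split described above (textual updates stay in the database, file uploads go to disk) could look roughly like the following Python sketch. It is not RT code and not the patch from the linked thread. The `store_blob` and `load_blob` helpers, the two-character directory fan-out, and the `digest` column in the example row are all assumptions chosen for illustration. The idea is that the database row keeps only a SHA-256 digest, and the file is written under a path derived from that digest.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(root: Path, data: bytes) -> str:
    """Write `data` under `root` and return the key to keep in the database row."""
    digest = hashlib.sha256(data).hexdigest()
    target = root / digest[:2] / digest          # fan out to avoid one huge directory
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)                      # rename into place once fully written
    return digest


def load_blob(root: Path, digest: str) -> bytes:
    """Read a blob back and verify it still matches the digest stored in the DB."""
    data = (root / digest[:2] / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"attachment {digest} is corrupt on disk")
    return data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical metadata row: the content column is empty, the digest points to disk.
        row = {"id": 1, "content_type": "application/pdf", "content": None}
        row["digest"] = store_blob(root, b"%PDF-1.4 example")
        again = store_blob(root, b"%PDF-1.4 example")   # same bytes, same key, no rewrite
        return row["digest"] == again, load_blob(root, row["digest"])
```

Keying files by content digest is one design choice among several; it gives de-duplication and a cheap integrity check, at the cost of needing reference tracking before a file can be deleted.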
Re: [rt-users] Option to store attachments on the filesystem
Hi Geoff,

There exists a Best Practical written extension for putting attachments on disk, but it is not public. (Multiple mailing list posts have mentioned it in the last year.)

As Ken points out, there are non-trivial issues and maintenance associated with using a database store and a filestore. This is one of a handful of reasons it isn't public.

On 12/22/2011 03:42 PM, Geoff Mayes wrote:
> Most importantly for the Best Practical folks, this option increases the appeal of RT to larger organizations, rather than only the small- to medium-sized market as stated at http://requesttracker.wikia.com/wiki/ManualIntroduction. The addition of this feature along with the recent SphinxSE option truly makes RT more feasible and attractive to larger organizations.

We deal with some huge RT instances at some huge organizations. The assertion on the wiki page you point to is simply incorrect.

Best,
Thomas
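One example of the maintenance that comes with running a database store next to a filestore is keeping the two in agreement. The Python sketch below is purely illustrative and says nothing about how the Best Practical extension works. The function names `keys_on_disk` and `reconcile` are hypothetical, and the sketch assumes each attachment is one file in a flat directory whose name matches a key recorded in the database.

```python
import tempfile
from pathlib import Path


def keys_on_disk(root: Path) -> set:
    """Names of all regular files directly under `root`."""
    return {p.name for p in root.iterdir() if p.is_file()}


def reconcile(db_keys, root: Path) -> dict:
    """Compare keys recorded in the database with files present on disk."""
    db_keys = set(db_keys)
    disk_keys = keys_on_disk(root)
    return {
        # Rows whose file is gone: a restore or a failed write went wrong.
        "missing_on_disk": sorted(db_keys - disk_keys),
        # Files no row points to: leftovers from deleted or rolled-back rows.
        "orphaned_on_disk": sorted(disk_keys - db_keys),
    }


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("a1", "b2", "zz"):
            (root / name).write_bytes(b"x")
        return reconcile({"a1", "b2", "c3"}, root)
```

A check like this would typically run after every restore, because a database backup and a filesystem backup taken at different moments can disagree in exactly these two ways.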
[rt-users] Option to store attachments on the filesystem
Hello RT Users and Developers,

Our RT instance at the University of Oregon is outgrowing the standard settings in some ways. One way is with attachments. The size of our database is 15.3GB and 13.7GB of that comes from the Attachments table. If our attachments were stored on a high-performance fileserver (or locally if you prefer), our database would shrink to 1.6GB. This would have numerous positive ramifications:

- Database dumps/backups would finish in 1/10 the time
- Database restores would finish in 1/10 the time
- Planned downtimes and disaster recovery situations could be more nimbly performed (scp'ing around the db dump, restoring, etc)
- Backups could be taken much more frequently
- More backups could be stored
- MySQL replication would be more robust with less binary data to chew on
- Larger attachments could be permitted because there would be less fear of the database growing too quickly
- Reduced database load querying/inserting/deleting/joining attachments

I've read in previous posts to this mailing list (see below) that the arguments against this are that (1) attachments on the filesystem can't be searched and (2) the data backing the application will not be in one tidy database package but instead spread out across the db and filesystem. For our instance we don't care about #1, and for #2, while I understand the argument, I would actually argue the opposite: when attachments are on a high-performance, redundant SAN managed by a dedicated storage team that I don't have to worry about, my job administering RT just got a whole lot easier because I only have to worry about ensuring the fileserver is mounted and $AttachmentsPath (just an example config option) is properly set.

I worked previously at a company that ran one of the largest instances of Bugzilla in the world and we served up 30TB of attachments over a fileserver without any problems. Can you imagine those attachments in a MySQL database? When ticket tracking systems are no longer small-ish, moving attachments out of the database becomes a must.

I'm not asking the RT folks to switch attachment storage to the filesystem instead of the database. My wish is that RT offers its administrators the ability to choose one or the other. I know this has been a hot topic in the past, but I was hoping we could revisit the issue.

Best Practical folks -- are you open to this? If so, would it help the process if I did all the work and submitted a patch? If so, should I file a bug so that we can talk about the way you would like this implemented? Given my reading of the history of this issue, I think a lot of folks would benefit from this feature. I've included previous postings about this issue below. Let me know if I can help and how I can. We would love to upstream a patch so our local instance doesn't diverge too severely from you all.

Thanks for your consideration,
Geoff Mayes

One of the first, meaty discussions:
http://www.gossamer-threads.com/lists/rt/devel/706
http://www.gossamer-threads.com/lists/rt/devel/37733
http://www.gossamer-threads.com/lists/rt/users/39507

The best discussion of the issue:
http://www.gossamer-threads.com/lists/rt/users/67406

Best Practical has recently worked on this issue:
http://www.gossamer-threads.com/lists/rt/users/89596