Hi Kevin and Joe,

Joe -- that's exactly what I did at a previous company where we served 30TB of 
Bugzilla attachments and it worked very well.  I plan to do that with RT too.

Kevin -- thanks so much for your detailed response.  Yeah, definitely, I think 
this is an important conversation to have and I am so happy that others are 
weighing in.

> As far as the assertion that "a lot of folks would benefit from this
> feature", I doubt that would be the case for the vast majority of RT
> users. Most users can handle "one-stop-shopping" type applications
> with far fewer problems.

Yes, you're right.  I was a bit overzealous in claiming that "a lot of folks 
would benefit from this feature."  By "a lot" I did not mean the majority but 
instead that a not insignificant number of RT users would be interested in this 
feature (per the previous list postings I referenced).

> Once you divorce the metadata repository
> from the actual ticket data, you add a whole slew of different failure
> modes that will require much more sophisticated administration processes
> to prevent, ameliorate, or recover from.

I don't think this is the case, especially if this feature is implemented well.
RT only needs to deal with filesystem attachments on the "Create Ticket" and
"Display Ticket" pages, where users upload or view attachments.  When
displaying a ticket, RT could apply a 5-second timeout when accessing the local
filesystem (over AJAX, so the rest of the page still loads quickly, entirely
from the database).  On a timeout or failure, RT would log the issue (and
optionally email the admin) and display a user-friendly message in the
attachments area of the UI stating that the attachments couldn't be displayed.
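
To make the timeout idea concrete, here is a rough sketch in Python (RT itself
is Perl, so this only illustrates the control flow).  The function name, the
message text, and the one-directory-per-ticket layout are all hypothetical,
not anything RT provides today.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from pathlib import Path

log = logging.getLogger("attachments")

# Hypothetical wording; the real message would be up to Best Practical.
UNAVAILABLE = "Attachments are temporarily unavailable."

# Shared worker pool so a slow fileserver never blocks the request thread.
_pool = ThreadPoolExecutor(max_workers=4)


def list_attachments(ticket_dir, timeout=5.0):
    """Return (filenames, error_message) for one ticket's attachment directory."""

    def scan():
        return sorted(p.name for p in Path(ticket_dir).iterdir() if p.is_file())

    future = _pool.submit(scan)
    try:
        return future.result(timeout=timeout), None
    except FutureTimeout:
        # The worker may still be stuck on a hung mount; we only stop waiting.
        log.error("timed out after %.1fs reading %s", timeout, ticket_dir)
    except OSError as exc:
        log.error("cannot read %s: %s", ticket_dir, exc)
    return [], UNAVAILABLE
```

The AJAX handler would call this and render either the file list or the
message, while the rest of the ticket page has already been served from the
database.  A real implementation would also have to deal with workers that
stay blocked on a dead mount, which this sketch does not attempt.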

I envision searching of attachments being disabled in RT if the user has chosen 
to store attachments on the filesystem and this would of course be 
well-documented for those choosing to install RT with filesystem attachments.  
RT could still keep the Attachments table and store metadata about files in 
that table when a user uploads a file, but this isn't necessary and I'd let 
Best Practical decide this.  Keeping attachments in the Transactions table 
isn't necessary, but it fits with the design of RT and would be nice to have.  
Again, these are all internal RT implementation details, and depending on how
well it is done, the local administrator of RT could have an incredibly simple 
and easy experience if they chose to store attachments on the filesystem 
instead of the database. 
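
As a sketch of the "metadata in the table, bytes on disk" split (again in
Python purely for illustration): the directory layout, the function name, and
the metadata fields below are my own assumptions, not RT's schema.

```python
import hashlib
from pathlib import Path


def store_attachment(root, ticket_id, filename, data):
    """Write data under root and return the metadata row to keep in the database."""
    digest = hashlib.sha256(data).hexdigest()
    # Hypothetical layout: <root>/<ticket id>/<sha256>, so user-supplied
    # filenames never become paths on disk.
    target = Path(root) / str(ticket_id) / digest
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return {
        "ticket_id": ticket_id,
        "filename": filename,          # original name, kept only as metadata
        "size": len(data),
        "sha256": digest,
        "path": str(target.relative_to(root)),
    }
```

The returned dictionary is what would go into the database; the database
never sees the file contents.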

The local filesystem or fileserver itself is, granted, an additional piece to 
manage, but it is an incredibly simple piece.  I would much rather manage files 
on a filesystem than files in a database when I could have a database be 1.6GB 
instead of 15GB.  And there are so many free (as well as expensive) ways to 
manage data (e.g. NFS, rsync, RAID, high-end redundant SANs, etc).  Most 
importantly, the binary data is separated from textual data, and separate 
backup schedules and schemes are then permitted.  Yay for modular design!  As I
mentioned above, I previously administered 30TB of attachments over NFS for a
different tracking system and it worked very well.

> Your reference to leveraging an existing SAN+SAN management team gives a
> hint to the increase in both complexity and cost of running an instance.

This cost is up to the user, depending on how secure and robust they would like
their data to be.  RT would only provide the option of filesystem or database
attachment storage.  The user can then decide whether the attachments are
served from a cheap local SATA disk or from a locally-mounted fileserver, be it
cheap or super-expensive.

> There are a wide range of RT users from systems that manage a handful
> of tickets a week all the way to systems handling thousands of tickets
> or more a week. Those on the small end can/should use whatever DB
> backend that they are familiar with to simplify administration and
> the "what did I do?!" errors due to a lack of familiarity.

Totally agree with this.  An option to store attachments on the filesystem, 
however, is database-agnostic, so RT admins can select this option with MySQL, 
Oracle, Postgres, SQLite, etc.

> As you move towards larger implementations, your DB backend needs to be
> chosen based on its viability in an enterprise/large-scale environment.
> I do not know the level of your local MySQL expertise and I am certainly
> not a MySQL expert, but a 15GB database does not strike me as particularly
> large, by any metric. Maybe you would benefit by changing your backend DB
> to something that scales better. I know that other DBs support tablespaces
> that can allow you to move certain tables to different filesystems/locations
> to provide for more parallel partitioning across more I/O resources.

Our desire to store attachments outside of the database, at this point, has 
little to do with application performance and everything to do with backups, 
disaster recovery, upgrades, and downtimes.  That being said, I do know that 
there are big performance gains that come from storing attachments outside of 
the database.  Check out the discussion of this issue by one of Bugzilla's core
developers, and their work-in-progress implementation:
https://bugzilla.mozilla.org/show_bug.cgi?id=577532.  So moving attachments out 
of the database *is* an actual tuning option, just like the other options you 
mentioned.  Why do something drastic like changing the database backend or 
performing complicated and expert-level tuning/sharding/partitioning, when I 
could just add a few config options to RT_SiteConfig.pm and run a script (for a 
pre-existing instance) that then sets up my instance to serve attachments from 
a filesystem instead of the db?
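
For a pre-existing instance, the one-off script I have in mind could look
roughly like the sketch below.  It is Python against an in-memory-style SQLite
connection purely for illustration; the simplified table (id, Content) and the
one-file-per-id layout are assumptions, not RT's actual schema or a finished
design.

```python
import sqlite3
from pathlib import Path


def export_attachments(conn, root):
    """Copy each non-empty Content blob to <root>/<id>, then blank it in the table."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Fine for a sketch; a real script would stream rows in batches rather
    # than load every blob into memory at once.
    rows = conn.execute(
        "SELECT id, Content FROM Attachments WHERE Content IS NOT NULL"
    ).fetchall()
    for att_id, content in rows:
        (root / str(att_id)).write_bytes(content)
        conn.execute(
            "UPDATE Attachments SET Content = NULL WHERE id = ?", (att_id,)
        )
    conn.commit()
    return len(rows)
```

After a run like this the table keeps one row per attachment, but the bulk of
the data lives under the chosen directory and can be backed up on its own
schedule.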

Here's one recent example of how our current database size is negatively 
impacting us:  We upgraded from 3.8.4 to 4.0.4 yesterday and it took almost an 
hour to dump our database and almost an hour to import the database (we were 
upgrading MySQL and the OSes as well).  And then we had to import it again
because max_allowed_packet was set too small (which wouldn't have been a
problem if attachments were outside the db; that is an anecdote rather than a
logical argument, but such errors do happen in the real world), so add another
hour instead of only another 10 minutes.  If attachments were stored outside
of the database, we could have reduced just the backup and import phases from
3 hours to 20 minutes.  That is a huge difference, especially when your
application is used by thousands of customers waiting to log back in.  The
positive ramifications continue: internal development of RT is much faster
with a small database because we can copy it around the network faster,
perform imports in 1/10th the time, and keep our development database
up-to-date much more easily.
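
To spell out the arithmetic behind that estimate: the sizes (15.3GB total,
13.7GB in the Attachments table) are from my original message quoted below,
"almost an hour" is rounded to 60 minutes per phase, and the big assumption is
that dump and import times scale linearly with database size.

```python
# Sizes from the original message quoted below.
db_total_gb = 15.3
attachments_gb = 13.7

# "Almost an hour" per phase, rounded to 60 minutes (an approximation).
phases_min = {"dump": 60, "import": 60, "repeated import": 60}


def estimate_minutes(phases, total_gb, removed_gb):
    """Scale each phase by the fraction of the database that remains.

    Assumes time is proportional to size, which is only roughly true.
    """
    remaining_fraction = (total_gb - removed_gb) / total_gb
    return {name: minutes * remaining_fraction for name, minutes in phases.items()}


estimate = estimate_minutes(phases_min, db_total_gb, attachments_gb)
print(f"before: {sum(phases_min.values())} min")
print(f"after:  {sum(estimate.values()):.0f} min")
```

Under those assumptions the three phases come out a little under 20 minutes
in total, which is where the "3 hours to 20 minutes" figure comes from.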

If someone knows of a simpler way to cut the dump and restore times to 1/10 of
what they are now, I would love to hear it and am totally open to a different
solution.

The main point I would like to restate is that larger or quickly-growing 
instances of RT are very different than smaller or slowly-growing instances.  
One pain point of the larger instances is the size of the database and how that 
affects backups, restores, disaster recoveries, and development.  Having the 
option to store attachments outside of the database allows the larger RT 
instances to more easily manage their data for a much longer period of time.  
Most importantly for the Best Practical folks, this option increases the appeal
of RT to larger organizations, beyond the "small- to medium-sized" market
stated at http://requesttracker.wikia.com/wiki/ManualIntroduction.  The
addition of this feature along with the recent SphinxSE option truly makes RT 
more feasible and attractive to larger organizations.

Kind regards, Geoff Mayes

________________________________________
From: rt-users-boun...@lists.bestpractical.com 
[rt-users-boun...@lists.bestpractical.com] on behalf of Joe Harris 
[drey...@gmail.com]
Sent: Thursday, December 22, 2011 9:43 AM
To: rt-users@lists.bestpractical.com
Subject: Re: [rt-users] Option to store attachments on the filesystem

I am looking into this type of functionality as well. We were thinking of an 
NFS share in a web directory to drop the attachment with a way to drop a link
within the ticket. So the attachments may not even exist on the RT server, but 
there will be links in the ticket to a web server that houses the attachment.




On Dec 22, 2011, at 9:42 AM, "k...@rice.edu" <k...@rice.edu> wrote:

> On Wed, Dec 21, 2011 at 11:12:04PM +0000, Geoff Mayes wrote:
>> Hello RT Users and Developers,
>>
>> Our RT instance at the University of Oregon is outgrowing the standard 
>> settings in some ways.  One way is with attachments.  The size of our 
>> database is 15.3GB and 13.7GB of that comes from the Attachments table.  If 
>> our attachments were stored on a high-performance fileserver (or locally if 
>> you prefer), our database would shrink to 1.6GB.  This would have numerous 
>> positive ramifications:
>>
>> - Database dumps/backups would finish in 1/10 the time
>> - Database restores would finish in 1/10 the time
>> - Planned downtimes and disaster recovery situations could be more nimbly 
>> performed (scp'ing around the db dump, restoring, etc)
>> - Backups could be taken much more frequently
>> - More backups could be stored
>> - MySQL replication would be more robust with less binary data to chew on
>> - Larger attachments could be permitted because there would be less fear of 
>> the database growing too quickly
>> - Reduced database load querying/inserting/deleting/joining attachments
>>
>> I've read in previous posts to this mailing list (see below) that the 
>> arguments against this are that (1) attachments on the filesystem can't be 
>> searched and (2) the data backing the application will not be in one tidy 
>> database package but instead spread out across the db and filesystem.  For 
>> our instance we don't care about #1, and for #2, while I understand the 
>> argument, I would actually argue the opposite: when attachments are on a 
>> high-performance, redundant SAN managed by a dedicated storage team that I 
>> don't have to worry about, my job administering RT just got a whole lot 
>> easier because I only have to worry about ensuring the fileserver is mounted 
>> and $AttachmentsPath (just an example config option) is properly set.  I 
>> worked previously at a company that ran one of the largest instances of 
>> Bugzilla in the world and we served up 30TB of attachments over a fileserver 
>> without any problems.  Can you imagine those attachments in a MySQL 
>> database?  When ticket tracking systems are no longer small-ish, moving
>> attachments out of the database becomes a must.
>>
>> I'm not asking the RT folks to switch attachment storage to the filesystem 
>> instead of the database.  My wish is that RT offers its administrators the 
>> ability to choose one or the other.  I know this has been a hot topic in the 
>> past, but I was hoping we could revisit the issue.  Best Practical folks -- 
>> are you open to this?  If so, would it help the process if I did all the 
>> work and submitted a patch?  If so, should I file a bug so that we can talk 
>> about the way you would like this implemented?
>>
>> Given my reading of the history of this issue, I think a lot of folks would 
>> benefit from this feature.  I've included previous postings about this issue 
>> below.  Let me know if I can help and how I can.  We would love to upstream 
>> a patch so our local instance doesn't diverge too severely from you all.
>>
>> Thanks for your consideration, Geoff Mayes
>>
>> One of the first, meaty discussions:
>> http://www.gossamer-threads.com/lists/rt/devel/706
>> http://www.gossamer-threads.com/lists/rt/devel/37733
>> http://www.gossamer-threads.com/lists/rt/users/39507
>> The best discussion of the issue:
>> http://www.gossamer-threads.com/lists/rt/users/67406
>> Best Practical has recently worked on this issue:
>> http://www.gossamer-threads.com/lists/rt/users/89596
>>
>
> Hi Geoff,
>
> I had thought that something like this had already been implemented
> by Best Practical for a customer. Hopefully, they can provide some
> feedback regarding the utility and possible problems of such an
> approach from personal experience. Maybe they would consider releasing
> it as an extension.
>
> As far as the assertion that "a lot of folks would benefit from this
> feature", I doubt that would be the case for the vast majority of RT
> users. Most users can handle "one-stop-shopping" type applications
> with far fewer problems. Once you divorce the metadata repository
> from the actual ticket data, you add a whole slew of different failure
> modes that will require much more sophisticated administration processes
> to prevent, ameliorate, or recover from. Your reference to leveraging
> an existing SAN+SAN management team gives a hint to the increase in
> both complexity and cost of running an instance.
>
> There are a wide range of RT users from systems that manage a handful
> of tickets a week all the way to systems handling thousands of tickets
> or more a week. Those on the small end can/should use whatever DB
> backend that they are familiar with to simplify administration and
> the "what did I do?!" errors due to a lack of familiarity. As you
> move towards larger implementations, your DB backend needs to be
> chosen based on its viability in an enterprise/large-scale environment.
> I do not know the level of your local MySQL expertise and I am certainly
> not a MySQL expert, but a 15GB database does not strike me as particularly
> large, by any metric. Maybe you would benefit by changing your backend DB
> to something that scales better. I know that other DBs support tablespaces
> that can allow you to move certain tables to different filesystems/locations
> to provide for more parallel partitioning across more I/O resources.
>
> Sorry for the slight ramble. I am looking forward to this discussion and
> if this feature is added some documentation describing when and when not
> to use it will be essential.
>
> Regards,
> Ken
>> --------
>> RT Training Sessions (http://bestpractical.com/services/training.html)
>> * Boston  March 5 & 6, 2012
