Re: [rt-users] Option to store attachments on the filesystem

2011-12-23 Thread Thomas Sibley
On 12/22/2011 12:43 PM, Joe Harris wrote:
 I am looking into this type of functionality as well. We were
 thinking of an NSF share in a web directory to drop the attachment
 with a way to drop a link within the ticket. So the attachments may
 not even exist on the RT server, but there will be links in the
 ticket to a web server that houses the attachment.

It'd be nice if this was the case, but the big problem with having the
web server directly serve your attachments is that suddenly you lose
_all_ access control that RT normally provides around attachments.

Even if attachments are stored on disk, they must be _served_ by RT, not
the web server directly.

Thomas

RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012


Re: [rt-users] Option to store attachments on the filesystem

2011-12-23 Thread Thomas Sibley
Please keep replies on the list.

For the record, I'm not claiming that core RT shouldn't support
attachments on disk in the future.  I'm just trying to give you the
relevant info for right now.

On 12/22/2011 08:58 PM, Geoff Mayes wrote:
 I searched but struck out.  Could you provide some links?  Why isn'
 it public?  Any way I can take a look at it?  :)

The extension was originally the result of customer work, and it hasn't
been made public.  Will it be made public?  We don't know yet.  For the
time being, you'll need to contact sa...@bestpractical.com if you're
interested in it.

The mailing list threads I found just now with a search:
http://www.gossamer-threads.com/lists/rt/users/92667
http://www.gossamer-threads.com/lists/rt/users/100964

 Is it that non-trivial?  The Bugzilla in-house Attachments.pm module
 we used to use was 200 lines of Perl and that handled the main 8TB
 attachments datastore as well as the archives 30TB datastore, sorting
 out discrepancies between the two.  And larger organizations will
 have the resources and expertise to do these kinds of things easily,
 so if RT really is for organizations of all sizes, then how does it
 cater to the non-trivial users in this matter?

Larger organizations take a variety of steps to help ensure RT performs
well.  Yes, occasionally that includes putting attachments on disk, but
it also includes good database tuning and many other tweaks before that.
 The fact is that a 15 or 20GB database is simply not large at all;
15-20GB fits on a single USB flash drive.

 I just don't see how keeping attachments in the database is *optimal*
 for backing up and restoring the database when the database gets
 beyond 10-20GB, especially when an organization hasn't paid Oracle
 $5,000 so they can run hotcopy on their InnoDB databases.

As Ken from rice.edu said earlier, there are smarter backup solutions
than dumping/restoring the entire DB every single time.  He named a couple.

As for Oracle's $5k tool, you can get similar results with Percona's
completely free, open-source hot backup/copy tool for InnoDB called
XtraBackup: http://www.percona.com/software/percona-xtrabackup/

 How big are their databases and attachment tables?  How do they do
 backups and restores?  What are their disaster recovery plans?

These are customers that we can't provide that information on.  There
was at least one thread in 2010 or 2011 on the mailing list asking
people to contribute stats about the sizes of their RT instance.  Some
notably large ones came up.

Thomas

RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012


Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread k...@rice.edu
On Wed, Dec 21, 2011 at 11:12:04PM +, Geoff Mayes wrote:
 Hello RT Users and Developers,
 
 Our RT instance at the University of Oregon is outgrowing the standard 
 settings in some ways.  One way is with attachments.  The size of our 
 database is 15.3GB and 13.7GB of that comes from the Attachments table.  If 
 our attachments were stored on a high-performance fileserver (or locally if 
 you prefer), our database would shrink to 1.6GB.  This would have numerous 
 positive ramifications:
 
 - Database dumps/backups would finish in 1/10 the time
 - Database restores would finish in 1/10 the time
 - Planned downtimes and disaster recovery situations could be more nimbly 
 performed (scp'ing around the db dump, restoring, etc)
 - Backups could be taken much more frequently
 - More backups could be stored
 - MySQL replication would be more robust with less binary data to chew on
 - Larger attachments could be permitted because there would be less fear of 
 the database growing too quickly
 - Reduced database load querying/inserting/deleting/joining attachments
 
 I've read in previous posts to this mailing list (see below) that the 
 arguments against this are that (1) attachments on the filesystem can't be 
 searched and (2) the data backing the application will not be in one tidy 
 database package but instead spread out across the db and filesystem.  For 
 our instance we don't care about #1, and for #2, while I understand the 
 argument, I would actually argue the opposite: when attachments are on a 
 high-performance, redundant SAN managed by a dedicated storage team that I 
 don't have to worry about, my job administering RT just got a whole lot 
 easier because I only have to worry about ensuring the fileserver is mounted 
 and $AttachmentsPath (just an example config option) is properly set.  I 
 worked previously at a company that ran one of the largest instances of 
 Bugzilla in the world and we served up 30TB of attachments over a fileserver 
 without any problems.  Can you imagine those attachments in a MySQL database? 
  When ticket tracking sys
 te
  ms are no longer small-ish, moving attachments out of the database becomes a 
 must.
 
 I'm not asking the RT folks to switch attachment storage to the filesystem 
 instead of the database.  My wish is that RT offers its administrators the 
 ability to choose one or the other.  I know this has been a hot topic in the 
 past, but I was hoping we could revisit the issue.  Best Practical folks -- 
 are you open to this?  If so, would it help the process if I did all the work 
 and submitted a patch?  If so, should I file a bug so that we can talk about 
 the way you would like this implemented?
 
 Given my reading of the history of this issue, I think a lot of folks would 
 benefit from this feature.  I've included previous postings about this issue 
 below.  Let me know if I can help and how I can.  We would love to upstream a 
 patch so our local instance doesn't diverge too severely from you all.
 
 Thanks for your consideration, Geoff Mayes
 
 One of the first, meaty discussions:
 http://www.gossamer-threads.com/lists/rt/devel/706
 http://www.gossamer-threads.com/lists/rt/devel/37733
 http://www.gossamer-threads.com/lists/rt/users/39507
 The best discussion of the issue:
 http://www.gossamer-threads.com/lists/rt/users/67406
 Best Practical has recently worked on this issue:
 http://www.gossamer-threads.com/lists/rt/users/89596
 

Hi Geoff,

I had thought that something like this had already been implemented
by Best Practical for a customer. Hopefully, they can provide some
feedback regarding the utility and possible problems of such an
approach from personal experience. Maybe they would consider releasing
it as an extenstion.

As far as the assertion that a lot of folks would benefit from this
feature, I doubt that would be the case for the vast majority of RT
users. Most users can handle one-stop-shopping type applications
with far fewer problems. Once you divorce the metadate repository
from the actual ticket data, you add a whole slew of different failure
modes that will require much more sophisticated administration processes
to prevent, ameliorate, or recover from. Your reference to leveraging
an existing SAN+SAN management team gives a hint to the increase in
both complexity and cost of running an instance.

There are a wide range of RT users from systems that manage a handful
of tickets a week all the way to systems handling thousands of tickets
or more a week. Those on the small end can/should use whatever DB
backend that they are familiar with to simplify administration and
the what did I do?! errors due to a lack of familiarity. As you
move towards larger implementations, your DB backend needs to be
chosen based on it viability in an enterprise/large-scale environment.
I do not know the level of your local MySQL expertise and I am certainly
not a MySQL expert, but a 15GB database does not strike me as particularly
large, by any 

Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread Joe Harris
I am looking into this type of functionality as well. We were thinking of an 
NSF share in a web directory to drop the attachment with a way to drop a link 
within the ticket. So the attachments may not even exist on the RT server, but 
there will be links in the ticket to a web server that houses the attachment. 




On Dec 22, 2011, at 9:42 AM, k...@rice.edu k...@rice.edu wrote:

 On Wed, Dec 21, 2011 at 11:12:04PM +, Geoff Mayes wrote:
 Hello RT Users and Developers,
 
 Our RT instance at the University of Oregon is outgrowing the standard 
 settings in some ways.  One way is with attachments.  The size of our 
 database is 15.3GB and 13.7GB of that comes from the Attachments table.  If 
 our attachments were stored on a high-performance fileserver (or locally if 
 you prefer), our database would shrink to 1.6GB.  This would have numerous 
 positive ramifications:
 
 - Database dumps/backups would finish in 1/10 the time
 - Database restores would finish in 1/10 the time
 - Planned downtimes and disaster recovery situations could be more nimbly 
 performed (scp'ing around the db dump, restoring, etc)
 - Backups could be taken much more frequently
 - More backups could be stored
 - MySQL replication would be more robust with less binary data to chew on
 - Larger attachments could be permitted because there would be less fear of 
 the database growing too quickly
 - Reduced database load querying/inserting/deleting/joining attachments
 
 I've read in previous posts to this mailing list (see below) that the 
 arguments against this are that (1) attachments on the filesystem can't be 
 searched and (2) the data backing the application will not be in one tidy 
 database package but instead spread out across the db and filesystem.  For 
 our instance we don't care about #1, and for #2, while I understand the 
 argument, I would actually argue the opposite: when attachments are on a 
 high-performance, redundant SAN managed by a dedicated storage team that I 
 don't have to worry about, my job administering RT just got a whole lot 
 easier because I only have to worry about ensuring the fileserver is mounted 
 and $AttachmentsPath (just an example config option) is properly set.  I 
 worked previously at a company that ran one of the largest instances of 
 Bugzilla in the world and we served up 30TB of attachments over a fileserver 
 without any problems.  Can you imagine those attachments in a MySQL 
 database?  When ticket tracking sy
 s
 te
 ms are no longer small-ish, moving attachments out of the database becomes a 
 must.
 
 I'm not asking the RT folks to switch attachment storage to the filesystem 
 instead of the database.  My wish is that RT offers its administrators the 
 ability to choose one or the other.  I know this has been a hot topic in the 
 past, but I was hoping we could revisit the issue.  Best Practical folks -- 
 are you open to this?  If so, would it help the process if I did all the 
 work and submitted a patch?  If so, should I file a bug so that we can talk 
 about the way you would like this implemented?
 
 Given my reading of the history of this issue, I think a lot of folks would 
 benefit from this feature.  I've included previous postings about this issue 
 below.  Let me know if I can help and how I can.  We would love to upstream 
 a patch so our local instance doesn't diverge too severely from you all.
 
 Thanks for your consideration, Geoff Mayes
 
 One of the first, meaty discussions:
 http://www.gossamer-threads.com/lists/rt/devel/706
 http://www.gossamer-threads.com/lists/rt/devel/37733
 http://www.gossamer-threads.com/lists/rt/users/39507
 The best discussion of the issue:
 http://www.gossamer-threads.com/lists/rt/users/67406
 Best Practical has recently worked on this issue:
 http://www.gossamer-threads.com/lists/rt/users/89596
 
 
 Hi Geoff,
 
 I had thought that something like this had already been implemented
 by Best Practical for a customer. Hopefully, they can provide some
 feedback regarding the utility and possible problems of such an
 approach from personal experience. Maybe they would consider releasing
 it as an extenstion.
 
 As far as the assertion that a lot of folks would benefit from this
 feature, I doubt that would be the case for the vast majority of RT
 users. Most users can handle one-stop-shopping type applications
 with far fewer problems. Once you divorce the metadate repository
 from the actual ticket data, you add a whole slew of different failure
 modes that will require much more sophisticated administration processes
 to prevent, ameliorate, or recover from. Your reference to leveraging
 an existing SAN+SAN management team gives a hint to the increase in
 both complexity and cost of running an instance.
 
 There are a wide range of RT users from systems that manage a handful
 of tickets a week all the way to systems handling thousands of tickets
 or more a week. Those on the small end can/should use whatever DB
 backend that they are 

Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread Geoff Mayes
, I do know that 
there are big performance gains that come from storing attachments outside of 
the database.  Check out one of Bugzilla's core developers discussion of this 
issue and their work-in-progress implementation: 
https://bugzilla.mozilla.org/show_bug.cgi?id=577532.  So moving attachments out 
of the database *is* an actual tuning option, just like the other options you 
mentioned.  Why do something drastic like changing the database backend or 
performing complicated and expert-level tuning/sharding/partitioning, when I 
could just add a few config options to RT_SiteConfig.pm and run a script (for a 
pre-existing instance) that then sets up my instance to serve attachments from 
a filesystem instead of the db?

Here's one recent example of how our current database size is negatively 
impacting us:  We upgraded from 3.8.4 to 4.0.4 yesterday and it took almost an 
hour to dump our database and almost an hour to import the database (we were 
upgrading MySQL and the OSes as well).  And then we had to import it again 
because max_packet_size was set too small (which wouldn't have been a problem 
if attachments were outside the db: anecdotal and not logical argument, but 
nonetheless a real-world occurrence as errors happen) so add another hour 
instead of only another 10 minutes.  If attachments were stored outside of the 
database, we could have reduced just the backup and import phases from 3 hours 
to 20 minutes.  That is a huge difference, especially when your application is 
used by thousands of customers waiting to log back in.  The positive 
ramifications continue: internal development of RT is much faster with a small 
database because we can copy them around the network faster, perform impo
 rts in 1/10th the time, and keep our development database up-to-date much 
easier.

If someone knew of a simpler way to cut the dump and restore times by 1/10, I 
would love to hear it and be totally open to a different solution.

The main point I would like to restate is that larger or quickly-growing 
instances of RT are very different than smaller or slowly-growing instances.  
One pain point of the larger instances is the size of the database and how that 
affects backups, restores, disaster recoveries, and development.  Having the 
option to store attachments outside of the database allows the larger RT 
instances to more easily manage their data for a much longer period of time.  
Most importantly for the Best Practical folks, this option increase the appeal 
of RT to larger organizations instead of the small- to medium-sized market as 
stated at http://requesttracker.wikia.com/wiki/ManualIntroduction.  The 
addition of this feature along with the recent SphinxSE option truly makes RT 
more feasible and attractive to larger organizations.

Kind regards, Geoff Mayes


From: rt-users-boun...@lists.bestpractical.com 
[rt-users-boun...@lists.bestpractical.com] on behalf of Joe Harris 
[drey...@gmail.com]
Sent: Thursday, December 22, 2011 9:43 AM
To: rt-users@lists.bestpractical.com
Subject: Re: [rt-users] Option to store attachments on the filesystem

I am looking into this type of functionality as well. We were thinking of an 
NSF share in a web directory to drop the attachment with a way to drop a link 
within the ticket. So the attachments may not even exist on the RT server, but 
there will be links in the ticket to a web server that houses the attachment.




On Dec 22, 2011, at 9:42 AM, k...@rice.edu k...@rice.edu wrote:

 On Wed, Dec 21, 2011 at 11:12:04PM +, Geoff Mayes wrote:
 Hello RT Users and Developers,

 Our RT instance at the University of Oregon is outgrowing the standard 
 settings in some ways.  One way is with attachments.  The size of our 
 database is 15.3GB and 13.7GB of that comes from the Attachments table.  If 
 our attachments were stored on a high-performance fileserver (or locally if 
 you prefer), our database would shrink to 1.6GB.  This would have numerous 
 positive ramifications:

 - Database dumps/backups would finish in 1/10 the time
 - Database restores would finish in 1/10 the time
 - Planned downtimes and disaster recovery situations could be more nimbly 
 performed (scp'ing around the db dump, restoring, etc)
 - Backups could be taken much more frequently
 - More backups could be stored
 - MySQL replication would be more robust with less binary data to chew on
 - Larger attachments could be permitted because there would be less fear of 
 the database growing too quickly
 - Reduced database load querying/inserting/deleting/joining attachments

 I've read in previous posts to this mailing list (see below) that the 
 arguments against this are that (1) attachments on the filesystem can't be 
 searched and (2) the data backing the application will not be in one tidy 
 database package but instead spread out across the db and filesystem.  For 
 our instance we don't care about #1, and for #2, while I understand the 
 argument

Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread k...@rice.edu
On Thu, Dec 22, 2011 at 02:17:00PM -0700, Brent Wiese wrote:
 Totally agree with this.  An option to store attachments on the filesystem, 
 however, is database-agnostic, so RT admins can select this option with 
 MySQL, Oracle, Postgres, SQLite, etc.
 
 What if it wasn't DB agnostic? 
 
 Does any linux/open source DB provide a FILESTREAM option like MS SQL? It 
 sounds like that would be an easy solution if it existed as, to the best of 
 my knowledge, the app doesn't need to know any different when using 
 FILESTREAM. The DB handles the disk/DB interaction.
 
 Brent

I think that PostgreSQL 9.1 includes support for Foreign Data Wrappers
that could be used to access the filesystem contents in a similar
fashion.

Regards,
Ken

RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012


Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread Geoff Mayes
 I may be mistaken, but I thought that all ticket content is currently stored
 as an attachment in the DB and not just those available in the Create Ticket 
 or
 Display Ticket screens.

You're right: all uploaded attachments (images, logs, PDFs, etc) as well as all 
textual ticket updates (comments, correspondence, etc) are stored in the 
Attachments table.  I've only been working on RT for a couple weeks, but I've 
worked on Bugzilla for a few years, so it has been very interesting to compare 
the two.  I am surprised that uploaded attachments and ticket 
comments/correspondence are stored in the same table!  That feels like 
overloading and/or de-normalization to me.  So, yes, any implementation of the 
store attachments on the filesystem feature would have to change the current 
schema so that all textual ticket updates remain in the database, but all file 
uploads are stored on the filesystem.  My brief reading of the patch submitted 
during a previous post about this issue 
(http://www.gossamer-threads.com/lists/rt/users/67406) shows that the current 
schema can be kept, but a clean, non-hackish implementation should probably 
change the schema.

I wonder if we've been misunderstanding each other in other places because of 
my lack of understanding on this matter until now.  If so, I apologize.

 I do think that more people count on the ticket system as a resource and 
 expect
 it to be continuously available, the larger the consequences of adding 
 additional
 moving parts to the system.
 It is very easy to trivialize the issues
 that need to manage a filestore, whether in a SQL DB or filesystem DB(or data
 store). When reliability and availability are important, many measures need to
 be taken to ensure access to all of the metadata+data and handle business
 continuity and disaster recovery.

As a general statement, I would agree with everything you wrote.  I will say, 
though, that for our RT instance I'd take an NFS mount of attachments over an 
additional 14GB table in the database any day of the week.  Additionally, our 
rate of RT usage is growing every month, so that 14GB Attachments table could 
easily become 40GB in two years.  I like future-proofing.

 As an example, using a different DB product you can replicate the backend to
 a new instance and keep it in sync until the upgrade. Then you have 
 effectively
 zero time to copy the DB because the work was done outside the critical path
 for the upgrade.

That's a good idea.  Unfortunately, we couldn't have done this for our last 
upgrade because we moved from MySQL 4.1 to 5.1 
(http://dev.mysql.com/doc/refman/5.1/en/replication-compatibility.html).

I'm pretty sure we are going to make this change to our instance.  (I'm pushing 
for it.)  I was just hoping to get this feature into the main RT repo to spread 
the love, de-duplicate the identical work done by others, and make future 
upgrades easier for us.

RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012


Re: [rt-users] Option to store attachments on the filesystem

2011-12-22 Thread Thomas Sibley
Hi Geoff,

There exists a Best Practical written extension for putting attachments
on disk, but it is not public.  (Multiple mailing list posts have
mentioned it in the last year.)

As Ken points out, there are non-trivial issues and maintenance
associated with using a database store and a filestore.  This is one of
handful of reasons it isn't public.

On 12/22/2011 03:42 PM, Geoff Mayes wrote:
 Most importantly for the Best Practical folks, this option increase 
 the appeal of RT to larger organizations instead of the small- to 
 medium-sized market as stated at 
 http://requesttracker.wikia.com/wiki/ManualIntroduction. The addition
 of this feature along with the recent SphinxSE option truly makes RT
 more feasible and attractive to larger organizations.

We deal with some huge RT instances at some huge organizations.  The
assertion on the wiki page you point to is simply incorrect.

Best,
Thomas

RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012


[rt-users] Option to store attachments on the filesystem

2011-12-21 Thread Geoff Mayes
Hello RT Users and Developers,

Our RT instance at the University of Oregon is outgrowing the standard settings 
in some ways.  One way is with attachments.  The size of our database is 15.3GB 
and 13.7GB of that comes from the Attachments table.  If our attachments were 
stored on a high-performance fileserver (or locally if you prefer), our 
database would shrink to 1.6GB.  This would have numerous positive 
ramifications:

- Database dumps/backups would finish in 1/10 the time
- Database restores would finish in 1/10 the time
- Planned downtimes and disaster recovery situations could be more nimbly 
performed (scp'ing around the db dump, restoring, etc)
- Backups could be taken much more frequently
- More backups could be stored
- MySQL replication would be more robust with less binary data to chew on
- Larger attachments could be permitted because there would be less fear of the 
database growing too quickly
- Reduced database load querying/inserting/deleting/joining attachments

I've read in previous posts to this mailing list (see below) that the arguments 
against this are that (1) attachments on the filesystem can't be searched and 
(2) the data backing the application will not be in one tidy database package 
but instead spread out across the db and filesystem.  For our instance we don't 
care about #1, and for #2, while I understand the argument, I would actually 
argue the opposite: when attachments are on a high-performance, redundant SAN 
managed by a dedicated storage team that I don't have to worry about, my job 
administering RT just got a whole lot easier because I only have to worry about 
ensuring the fileserver is mounted and $AttachmentsPath (just an example config 
option) is properly set.  I worked previously at a company that ran one of the 
largest instances of Bugzilla in the world and we served up 30TB of attachments 
over a fileserver without any problems.  Can you imagine those attachments in a 
MySQL database?  When ticket tracking syste
 ms are no longer small-ish, moving attachments out of the database becomes a 
must.

I'm not asking the RT folks to switch attachment storage to the filesystem 
instead of the database.  My wish is that RT offers its administrators the 
ability to choose one or the other.  I know this has been a hot topic in the 
past, but I was hoping we could revisit the issue.  Best Practical folks -- are 
you open to this?  If so, would it help the process if I did all the work and 
submitted a patch?  If so, should I file a bug so that we can talk about the 
way you would like this implemented?

Given my reading of the history of this issue, I think a lot of folks would 
benefit from this feature.  I've included previous postings about this issue 
below.  Let me know if I can help and how I can.  We would love to upstream a 
patch so our local instance doesn't diverge too severely from you all.

Thanks for your consideration, Geoff Mayes

One of the first, meaty discussions:
http://www.gossamer-threads.com/lists/rt/devel/706
http://www.gossamer-threads.com/lists/rt/devel/37733
http://www.gossamer-threads.com/lists/rt/users/39507
The best discussion of the issue:
http://www.gossamer-threads.com/lists/rt/users/67406
Best Practical has recently worked on this issue:
http://www.gossamer-threads.com/lists/rt/users/89596


RT Training Sessions (http://bestpractical.com/services/training.html)
* Boston  March 5  6, 2012