Re: Idea for reducing disk IO on tagging operations

2005-03-30 Thread Spiro Trikaliotis
Hello,

sorry for the late reply to this, but I was on vacation. Anyway, I
believe I might be able to contribute something to this discussion,
which even resulted in some code.

* On Sun, Mar 20, 2005 at 11:54:32PM + Dr. David Alan Gilbert wrote:

 OK, my conscience will let me carefully ignore NFS issues given the
 pain it causes me elsewhere (and I make my mechanism switchable).
 What happens if I only used the overwrite mechanism if
 none of the characters being modified crossed a 512 (e.g.) byte
 boundary offset in the file?  Since the spaces were actually
 written in a previous operation we can assume that the space
 is allocated and no allocation operation is going to happen
 at this point (mumble filesystem journalling mumble!).

IMHO, here, you are not correct. If I write X times a char Y into a
file, I cannot assume that memory for X characters has been allocated.
The file system can do some optimizations, compress the file (for
example, run-length encoding RLE: First character tells that X times the
same character will be written, and the character itself is written
afterwards), or anything else. Furthermore, think of so-called
sparse-files, which can be rather big - much bigger than your actual
medium is itself.

Because of this, even a block boundary in the file does not make much
sense, IMHO, for the general case, that is, arbitrary file systems.
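
To make the RLE point concrete, here is a toy run-length encoder in C. It is
purely hypothetical (the name rle_encode and the count/byte pair layout are
assumptions for illustration, not how any particular file system stores data),
but it shows why "X bytes written" need not mean "X bytes allocated".

```c
#include <assert.h>
#include <stddef.h>

/* Toy RLE: emit (count, byte) pairs with count in 1..255.
 * Returns the encoded length, or 0 if `out` is too small
 * (an empty input also encodes to length 0). */
size_t rle_encode(const unsigned char *in, size_t inlen,
                  unsigned char *out, size_t outlen)
{
    size_t i = 0, used = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = in[i];
        size_t run = 1;

        /* Extend the run while the same byte repeats, up to 255. */
        while (i + run < inlen && in[i + run] == c && run < 255)
            run++;
        if (outlen - used < 2)
            return 0;               /* no room for another pair */
        out[used++] = (unsigned char) run;
        out[used++] = c;
        i += run;
    }
    return used;
}
```

With this encoding, 1024 spaces take five pairs, and overwriting a single
space in the middle with another character splits a run, so the stored form
gets longer even though the logical file size is unchanged.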

Regards,
   Spiro.

-- 
Spiro R. Trikaliotis  http://cbm4win.sf.net/
http://www.trikaliotis.net/ http://www.viceteam.org/


___
Info-cvs mailing list
Info-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/info-cvs


Re: Idea for reducing disk IO on tagging operations

2005-03-28 Thread Dr. David Alan Gilbert
Hi,
  Well, I've had a crack at implementing the optimisation; and attached
is a patch which seems to work - but there is at least one nasty
hack in it; more about that in a sec.

To enable it you need to add:
  TagOverwriteEnable=yes
to the config file in the CVSROOT; without that it should not
change behaviour in any way (except adding that as a commented
out option with warning to a newly created repository).

It won't give you any performance benefit on the first tag, but should
give something on subsequent tags.  I see some improvement (~15%)
but it is variable, on a large repository that doesn't fit in
memory on my home machine.

It is my first dig into the CVS code base, so I would appreciate
(gentle) comments.

Now some details;
  1) The real nasty hack; this is something that I hadn't thought
  of (and I don't think anyone else noticed?) in my original
  description; the permissions on the rcs files are read-only
  so when I need to open them to overwrite I can't - this is a pain;
  this patch has a gratuitous (and obviously WRONG) hack of
  chmod'ing it before the open - I'm open for any suggestions *if*
  there is a right way of doing this. (This was a pain because
  it was at the very last stage of the patch that I noticed this!).

  2) I don't currently create the dummy ,foo, locking file.

  3) I haven't written any docs yet.

  4) I needed to get a couple of values out of rcsbuf_getkey and
  have shoved them in globals for the moment; I was looking for a
  neater way that wouldn't mean changing all the callers.

  5) I'm worried about the right types to use for file offsets
  in a portable way. (Has anyone tried cvs with rcs files over 2GB?)

The patch is against 1.12.9 which is the version my debian happened to
have.

As I say, suggestions - and experiences welcome.

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/
diff -ur orig/cvs-1.12.9/ChangeLog cvs-1.12.9/ChangeLog
--- orig/cvs-1.12.9/ChangeLog   2004-06-09 15:52:32.0 +0100
+++ cvs-1.12.9/ChangeLog2005-03-24 23:43:48.0 +
@@ -1,3 +1,6 @@
+2005-03-24  Dave Gilbert [EMAIL PROTECTED]
+  * Added fast tagging mechanism; rcs.h/c, parseinfo.c,mkmodules.c
+
 2004-06-09  Derek Price  [EMAIL PROTECTED]
 
* NEWS: Note Stefan  Sebastian's security fixes.
diff -ur orig/cvs-1.12.9/src/admin.c cvs-1.12.9/src/admin.c
--- orig/cvs-1.12.9/src/admin.c 2004-03-22 15:37:34.0 +
+++ cvs-1.12.9/src/admin.c  2005-03-27 20:39:38.0 +0100
@@ -792,7 +792,7 @@
 || (rev = RCS_tag2rev (rcs, p))) /* tag2rev may exit */
{
RCS_check_tag (tag); /* exit if not a valid tag */
-   RCS_settag (rcs, tag, rev);
+   RCS_settag (rcs, tag, rev, NULL);
free (rev);
}
 else
diff -ur orig/cvs-1.12.9/src/commit.c cvs-1.12.9/src/commit.c
--- orig/cvs-1.12.9/src/commit.c2004-06-09 15:52:37.0 +0100
+++ cvs-1.12.9/src/commit.c 2005-03-27 20:39:45.0 +0100
@@ -2144,7 +2144,7 @@
head = RCS_getversion (rcs, NULL, NULL, 0, (int *) NULL);
magicrev = RCS_magicrev (rcs, head);
 
-   retcode = RCS_settag (rcs, tag, magicrev);
+   retcode = RCS_settag (rcs, tag, magicrev, NULL);
RCS_rewrite (rcs, NULL, NULL);
 
free (head);
diff -ur orig/cvs-1.12.9/src/import.c cvs-1.12.9/src/import.c
--- orig/cvs-1.12.9/src/import.c2004-04-27 22:08:40.0 +0100
+++ cvs-1.12.9/src/import.c 2005-03-27 20:39:59.0 +0100
@@ -770,7 +770,7 @@
 if (noexec)
return (0);
 
-if ((retcode = RCS_settag(rcs, vtag, vbranch)) != 0)
+if ((retcode = RCS_settag(rcs, vtag, vbranch, NULL)) != 0)
 {
ierrno = errno;
fperrmsg (logfp, 0, retcode == -1 ? ierrno : 0,
@@ -792,7 +792,7 @@
 vers = Version_TS (finfo, NULL, vtag, NULL, 1, 0);
 for (i = 0; i < targc; i++)
 {
-   if ((retcode = RCS_settag (rcs, targv[i], vers->vn_rcs)) == 0)
+   if ((retcode = RCS_settag (rcs, targv[i], vers->vn_rcs, NULL)) == 0)
RCS_rewrite (rcs, NULL, NULL);
else
{
diff -ur orig/cvs-1.12.9/src/mkmodules.c cvs-1.12.9/src/mkmodules.c
--- orig/cvs-1.12.9/src/mkmodules.c 2004-05-29 05:48:52.0 +0100
+++ cvs-1.12.9/src/mkmodules.c  2005-03-24 23:43:38.0 +
@@ -349,6 +349,23 @@
     "# Be warned that these strings could be disabled in any new version of CVS.\n",
     "UseNewInfoFmtStrings=yes\n",
 #endif /* SUPPORT_OLD_INFO_FMT_STRINGS */
+    "# Options relating to the Tag overwrite optimisation\n",
+    "# ** WARNING ** Only enable this after reading the appropriate documentation\n",
+    "# since it can cause 

Re: Idea for reducing disk IO on tagging operations

2005-03-28 Thread Doug Lee
I followed this discussion only loosely and kept silent because I
suspect someone will shoot me to pieces for the complaint I'm about to
make, but now that we're to the stage of actual implementation, I
guess I'd like to say this anyway...

I have reservations about any system that makes whitespace significant
in a text file.  I can make an exception for indent levels, as used by
Python, because these are visible and errors are obvious without
resorting to odd tactics like hex editors, vi's :list command, etc.

I say I expect to be shot down because, of course, the proper theory
is that all in a CVS file is opaque and should not be depended upon by
CVS users.  True in theory, but in practice, sometimes I've found it
much quicker to fix, say, a log mistake by hand in a CVS file (yes I'm
aware of the section specifically addressing this in Cederqvist).  The
current danger to editing the file directly is real, but I think much
more easily avoided now than if we come to require a lot of consecutive
lines of just whitespace which, if mangled, could cause overwrites of
other data and suchlike.

I'll leave the message that spurred this commentary below for
reference, but I top-posted because it's really a new subject (well
that, and I suppose I have a bias against bottom-posting:  I'm
blind, and bottom-posting makes me read through the whole blooming
family tree of messages every time a new one comes along <grin>--but
I digress, and I'm not about to try to change the list's standard
on this).

To the author of the idea being discussed, my apologies for weighing
in so tardily.  I guess I'm guilty of having complacently assumed
nothing would happen.  I see your new behavior is made optional and a
user choice, which I appreciate.

On Mon, Mar 28, 2005 at 07:06:36PM +0100, Dr. David Alan Gilbert wrote:
Hi,
  Well, I've had a crack at implementing the optimisation; and attached
is a patch which seems to work - but there is at least one nasty
hack in it; more about that in a sec.

To enable it you need to add:
  TagOverwriteEnable=yes
to the config file in the CVSROOT; without that it should not
change behaviour in any way (except adding that as a commented
out option with warning to a newly created repository).

It won't give you any performance benefit on the first tag, but should
give something on subsequent tags.  I see some improvement (~15%)
but it is variable, on a large repository that doesn't fit in
memory on my home machine.

It is my first dig into the CVS code base, so I would appreciate
(gentle) comments.

Now some details;
  1) The real nasty hack; this is something that I hadn't thought
  of (and I don't think anyone else noticed?) in my original
  description; the permissions on the rcs files are read-only
  so when I need to open them to overwrite I can't - this is a pain;
  this patch has a gratuitous (and obviously WRONG) hack of
  chmod'ing it before the open - I'm open for any suggestions *if*
  there is a right way of doing this. (This was a pain because
  it was at the very last stage of the patch that I noticed this!).

  2) I don't currently create the dummy ,foo, locking file.

  3) I haven't written any docs yet.

  4) I needed to get a couple of values out of rcsbuf_getkey and
  have shoved them in globals for the moment; I was looking for a
  neater way that wouldn't mean changing all the callers.

  5) I'm worried about the right types to use for file offsets
  in a portable way. (Has anyone tried cvs with rcs files over 2GB?)

The patch is against 1.12.9 which is the version my debian happened to
have.

As I say, suggestions - and experiences welcome.

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/


Re: Idea for reducing disk IO on tagging operations

2005-03-28 Thread Dr. David Alan Gilbert
* Doug Lee ([EMAIL PROTECTED]) wrote:
 I followed this discussion only loosely and kept silent because I
 suspect someone will shoot me to pieces for the complaint I'm about to
 make, but now that we're to the stage of actual implementation, I
 guess I'd like to say this anyway...

Hey that's OK.

 I have reservations about any system that makes whitespace significant
 in a text file.  I can make an exception for indent levels, as used by
 Python, because these are visible and errors are obvious without
 resorting to odd tactics like hex editors, vi's :list command, etc.

Let me make it clear that this patch *in no way* makes whitespace
significant; in actual fact it only works because it isn't
significant.

What it does is put a glob of whitespace in when it is convenient;
nothing changes in the parsing or anything - so just like before
that whitespace is completely ignored.

The trick is that when it comes to add a tag it checks to see if there
is spare white space and if so overwrites it; if you removed
the white space or otherwise fettled with the file that is fine;
it won't perform the optimisation.
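
The check-then-overwrite step can be sketched on an in-memory buffer as
follows. This is a minimal illustration, not the code from the patch: the
name overwrite_padding and the buffer-based interface are assumptions, and
the real thing would of course work on the RCS file itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: try to place `text`
 * into the run of white space starting at buf[start].
 * Returns 1 if the text was written, 0 if there was not enough spare
 * white space (the caller would then fall back to a full rewrite). */
int overwrite_padding(char *buf, size_t buflen, size_t start, const char *text)
{
    size_t need, i;

    assert(buf != NULL && text != NULL);
    need = strlen(text);
    if (start > buflen || need > buflen - start)
        return 0;                       /* would run past the buffer */

    for (i = 0; i < need; i++) {
        char c = buf[start + i];
        if (c != ' ' && c != '\n' && c != '\t')
            return 0;                   /* real data in the way: give up */
    }

    memcpy(buf + start, text, need);    /* file length stays the same */
    return 1;
}
```

If somebody has stripped or edited the white space, the function simply
returns 0 and nothing is touched, which is the "that is fine" case above.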

Indeed this means that an existing cvs client can quite happily
read a repository which has had my patch inflicted on it.

(The existing cvs code that rewrites the file will remove any
excess white space you added up there anyway.)

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




RE: Idea for reducing disk IO on tagging operations

2005-03-28 Thread Jim.Hyslop
[top posting as a courtesy for Doug]

I haven't examined the patch, so I don't know how closely the implementation
matches the proposal, but if I understand the proposed changes, whitespace
is still insignificant, there's just more of it added as a buffer, as an
optimization to improve speed when applying tags. If the implementation is
carried out correctly, then the RCS file will still be compatible with other
RCS-compatible software, some of which could legitimately strip out the
extra whitespace (unless the general practise is to leave whitespace alone).

My only concern around this patch is to make sure robustness has not been
adversely affected. I don't know enough about third-party add-ons to know
for sure, or to comment on their use. 

I also like the fact that the change is optional, so that it can be disabled
if any particular platform is incompatible with the changes.

Doug Lee wrote:
 I have reservations about any system that makes whitespace significant
 in a text file.  I can make an exception for indent levels, as used by
 Python, because these are visible and errors are obvious without
 resorting to odd tactics like hex editors, vi's :list command, etc.
 
 I say I expect to be shot down because, of course, the proper theory
 is that all in a CVS file is opaque and should not be depended upon by
 CVS users.

-- 
Jim Hyslop
Senior Software Designer
Leitch Technology International Inc. ( http://www.leitch.com )
Columnist, C/C++ Users Journal ( http://www.cuj.com/experts )




Re: Idea for reducing disk IO on tagging operations

2005-03-28 Thread Doug Lee
On Mon, Mar 28, 2005 at 02:12:56PM -0500, Jim.Hyslop wrote:
 [top posting as a courtesy for Doug]

Thanks :) I just hope I don't cause a mess by that comment, which I
suppose was fuelled as much by lack of lunch as by anything. :-)

 I haven't examined the patch, so I don't know how closely the implementation
 matches the proposal, but if I understand the proposed changes, whitespace
 is still insignificant, there's just more of it added as a buffer, as an
 optimization to improve speed when applying tags. If the implementation is
 carried out correctly, then the RCS file will still be compatible with other
 RCS-compatible software, some of which could legitimately strip out the
 extra whitespace (unless the general practise is to leave whitespace alone).

You are correct, according to a message the author just sent me.
Consider my complaint dismissed, and thanks for the explanations.

 My only concern around this patch is to make sure robustness has not been
 adversely affected. I don't know enough about third-party add-ons to know
 for sure, or to comment on their use. 
 
 I also like the fact that the change is optional, so that it can be disabled
 if any particular platform is incompatible with the changes.
 
 Doug Lee wrote:
  I have reservations about any system that makes whitespace significant
  in a text file.  I can make an exception for indent levels, as used by
  Python, because these are visible and errors are obvious without
  resorting to odd tactics like hex editors, vi's :list command, etc.
  
  I say I expect to be shot down because, of course, the proper theory
  is that all in a CVS file is opaque and should not be depended upon by
  CVS users.
 
 -- 
 Jim Hyslop
 Senior Software Designer
 Leitch Technology International Inc. ( http://www.leitch.com )
 Columnist, C/C++ Users Journal ( http://www.cuj.com/experts )

-- 
Doug Lee   [EMAIL PROTECTED]http://www.dlee.org
Bartimaeus Group   [EMAIL PROTECTED]   http://www.bartsite.com
It is difficult to produce a television documentary that is both
incisive and probing when every twelve minutes one is interrupted by
dancing rabbits singing about toilet paper.  --Rod Serling




Re: Idea for reducing disk IO on tagging operations

2005-03-23 Thread Dr. David Alan Gilbert
* Jim Hyslop ([EMAIL PROTECTED]) wrote:
 Dr. David Alan Gilbert wrote:
   2) I could do with a better understanding of the directory locks;
   pointers? I've read the top of lock.c but it still doesn't tell me
   enough; for example there seem to be multiple lock files used - but
   then surely the creation of them isn't atomic? Or is there one lock
   file used for both reading and writing?
 The locking process is explained in the manual, at 
 https://www.cvshome.org/docs/manual/cvs-1.11.19/cvs_2.html#SEC17

Thanks Jim for pointing me at that (I'd had a good search through
the FAQ rather than the manual).

(and to Paul - apologies if I misquoted in that last email)

OK; this convinces me that I don't need to worry about cvs reading
my file while it is being modified.  Together with the restriction
of me only performing my trick if the write is entirely within
a block then I feel reasonably safe.

I'm going to have a crack at making this optimisation and will
forward a copy here for discussion when I've done it.

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-22 Thread Dr. David Alan Gilbert
* Mark D. Baushke ([EMAIL PROTECTED]) wrote:
 
 Paul Sander [EMAIL PROTECTED] writes:
 
 Actually, if you look closely, I believe that CVS will not do read-only
 RCS operations if a CVS write-lock exists for the directory. Of course,
 ViewCVS and CVSweb do it all the time as do many of the other add-ons.

I'm getting more worried about this one for 2 separate reasons:
  1) There is talk of cvs -n for diff and the like which seems to
  suggest it ignores locks.
  2) I could do with a better understanding of the directory locks;
  pointers? I've read the top of lock.c but it still doesn't tell me
  enough; for example there seem to be multiple lock files used - but
  then surely the creation of them isn't atomic? Or is there one lock
  file used for both reading and writing?


  There's also the interrupt issue:  Killing an update before it
  completes leaves the RCS file corrupt.  You'd have to build in some
  kind of crash recovery.  But RCS already has that by way of the comma
  file, which can simply be deleted.  Other crash recovery algorithms
  usually involve transaction logs that can be reversed and replayed, or
  the creation of backup copies.  None of these are more efficient than
  the existing RCS update protocol.
 
 Agreed. This is a very big deal.

Actually I'm becoming less worried by this; I'm failing to see any way
that a single system call write() to a block not crossing a block
boundary could partially fail; but I'm up for suggestions.
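
The "does not cross a block boundary" test is simple arithmetic. A minimal
sketch in C, where the function name within_one_block is hypothetical and
the block size is whatever figure one chooses (512 was the example earlier
in the thread); it says nothing about what a given file system actually
guarantees for such a write:

```c
#include <assert.h>

/* Returns 1 if the byte range [offset, offset + len) lies inside a
 * single block of `block_size` bytes, 0 otherwise.
 * Overflow of offset + len is not handled in this sketch. */
int within_one_block(unsigned long offset, unsigned long len,
                     unsigned long block_size)
{
    assert(block_size > 0);
    if (len == 0)
        return 1;                   /* nothing to write */
    if (len > block_size)
        return 0;                   /* cannot possibly fit */
    /* First and last byte must fall in the same block. */
    return offset / block_size == (offset + len - 1) / block_size;
}
```

For example, 12 bytes at offset 500 fit in the first 512-byte block, while
13 bytes at the same offset spill into the second one.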

Dave

 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-22 Thread Jim Hyslop
Dr. David Alan Gilbert wrote:
  2) I could do with a better understanding of the directory locks;
  pointers? I've read the top of lock.c but it still doesn't tell me
  enough; for example there seem to be multiple lock files used - but
  then surely the creation of them isn't atomic? Or is there one lock
  file used for both reading and writing?
The locking process is explained in the manual, at 
https://www.cvshome.org/docs/manual/cvs-1.11.19/cvs_2.html#SEC17

--
Jim



Re: Idea for reducing disk IO on tagging operations

2005-03-21 Thread Mark D. Baushke

Todd Denniston [EMAIL PROTECTED] writes:

 This reminds me of conversations held earlier in the list. I think
 several of them ended with something to the effect of 'putting the
 /tmp/ or LockDir which cvs uses on a RAM disk should make the whole
 thing _much_ faster'.

Yes.

Our testing also found that using the SAN was faster than the Solaris 9
RAM disk solution. So, that is what we are using these days.

If anyone is really serious about tuning and improving the performance
of their own CVS installation, there should be nothing stopping them
from tweaking the sources for experimentation and running their own
tests.

If you instrument CVS and find a particular hotspot and then find a way
to make that area more efficient without hurting the rest of the system,
let the bug-cvs@gnu.org list know your results and your patch and you
will likely find changes considered for inclusion in future versions of
CVS.

-- Mark




Re: Idea for reducing disk IO on tagging operations

2005-03-21 Thread Tony Aiuto

Tagging in particular is slow and I don't think cpu or ram is the
issue (it is a dual xeon with 3GB of RAM).
...

I'll be shot as a heretic, but the real solution is that tags don't
belong in the ,v files in the first place.  IMO the only useful purpose
of tags is to snapshot the entire code base in some way so that you can
roll back to it (or diff against it).   Tagging individual files doesn't
do anything to help you understand their relation over time to the rest
of the code base.

The entire system tag could be accomplished in O(# files) time (rather
than O(repository disk size)) by simply creating a manifest of each
file in the repository and its version at the time of the snapshot.
The snapshot/manifests become entities in their own right, so you should
be able to do things like list available snapshots, see when they
were created, add meta information to the snapshot ...
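
A rough sketch of such a manifest in C, to show that the work is one line
per file regardless of how big the ,v files are. Everything here is
hypothetical: the struct, the function name format_manifest, and the
"path<TAB>revision" line layout are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>

struct entry {
    const char *path;      /* file path relative to the repository root */
    const char *revision;  /* revision current at snapshot time */
};

/* Format a snapshot manifest into `out`: a header line naming the
 * snapshot, then one "path<TAB>revision" line per file.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the buffer is too small. */
int format_manifest(char *out, size_t outlen, const char *name,
                    const struct entry *files, size_t n)
{
    size_t used, i;
    int w;

    assert(out != NULL && name != NULL && (files != NULL || n == 0));
    w = snprintf(out, outlen, "snapshot %s\n", name);
    if (w < 0 || (size_t) w >= outlen)
        return -1;
    used = (size_t) w;

    for (i = 0; i < n; i++) {       /* O(# files), no ,v file is read */
        w = snprintf(out + used, outlen - used, "%s\t%s\n",
                     files[i].path, files[i].revision);
        if (w < 0 || (size_t) w >= outlen - used)
            return -1;
        used += (size_t) w;
    }
    return (int) used;
}
```

Listing snapshots or attaching meta information would then be operations on
these manifest files rather than on every RCS file in the tree.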

-Tony Aiuto




Idea for reducing disk IO on tagging operations

2005-03-20 Thread Dr. David Alan Gilbert
Hi,
  I maintain a system that is used to hold a rather large
CVS repository (~1GB give or take) which could do with being faster.
Tagging in particular is slow and I don't think cpu or ram is the
issue (it is a dual xeon with 3GB of RAM).

My suspicion is that at least one of the problems is that when
a tag is added most of the rcs files are rewritten giving a sudden
large amount of data that must be written to disc.

So - here are my questions/ideas - I'd appreciate comments to tell
me whether I'm on the right lines:
  1) As I understand it the tag data is the first of the 3 main
  data structures in the RCS file (tag, comments, diffs) and that
  when I do pretty much any CVS operation I rewrite the whole file -
  is this correct?

  2) White space appears to be irrelevant in RCS files; so adding
  arbitrary amounts in between sections should leave files still
  fully compatible with existing RCS/cvs tools.

  3) So the idea is that when I add a tag I add a bunch of white
  space after the tag (let's say 1KB of spaces split into 64 byte
  lines or similar); when I come to add the next tag I check if
  there is plenty of white space, if there is then instead of
  rewriting the file I just overwrite the white space with my
  new tag data; if there is no space then as I rewrite the
  file I add another lump of white space.

  4) Whether dummy white space is added and how much is controlled
  by the existing size of the RCS file; so an RCS file that is only
  a few KB won't have any space added; that way this mechanism doesn't
  slow down/bloat small repositories.  The amount of white space might
  be chosen to align data structures with disk block boundaries.

  5) My main concern is to do with concurrency/consistency requirements;
  is the file rewrite essential to ensure consistency, or is the
  locking that is carried out sufficient?
  
Does this make sense?
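
For point 3, generating the lump of white space might look like the sketch
below. The name make_padding is hypothetical, and the 64-byte line length is
just the "let's say" figure from above, not a tuned constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAD_LINE 64   /* bytes per padding line, newline included */

/* Fill out[0 .. total-1] with spaces, ending every 64-byte line (and
 * the final, possibly shorter, line) with a newline.
 * The result is NOT NUL-terminated; it is raw file content. */
void make_padding(char *out, size_t total)
{
    size_t i;

    assert(out != NULL);
    memset(out, ' ', total);
    for (i = PAD_LINE - 1; i < total; i += PAD_LINE)
        out[i] = '\n';              /* newline closes each full line */
    if (total > 0)
        out[total - 1] = '\n';      /* never leave a partial last line open */
}
```

A 1KB buffer filled this way is 16 lines of 63 spaces plus a newline, all of
which an RCS parser skips as white space.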

Dave

 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Mark D. Baushke

Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

 So - here are my questions/ideas - I'd appreciate comments to tell
 me whether I'm on the right lines:
   1) As I understand it the tag data is the
   first of the 3 main data structures in the RCS
   file (tag, comments, diffs) and that when I do
   pretty much any CVS operation I rewrite the
   whole file - is this correct?

CVS write operations on a foo.c,v repository file
will write ,foo.c, and then when the write
operation is successful and without any errors, it
does a rename (,foo.c,, foo.c,v); to make the
new version the official version. While the
,foo.c, file exists, RCS commands will consider
the file locked.
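
As a rough sketch of that sequence in C (not the actual CVS or RCS code; the
function names comma_name and rewrite_rcs are made up for illustration, and
directory components in the path are not handled):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the comma-file name for an RCS file in the current directory:
 * "foo.c,v" -> ",foo.c,". Returns 0 on success, -1 if `rcs` does not
 * end in ",v" or the result does not fit in `out`. */
int comma_name(const char *rcs, char *out, size_t outlen)
{
    size_t n;

    assert(rcs != NULL && out != NULL);
    n = strlen(rcs);
    if (n < 3 || strcmp(rcs + n - 2, ",v") != 0)
        return -1;
    if (n + 1 > outlen)             /* result is n chars plus the NUL */
        return -1;
    out[0] = ',';
    memcpy(out + 1, rcs, n - 2);    /* base name without the ",v" */
    out[n - 1] = ',';
    out[n] = '\0';
    return 0;
}

/* Write the complete new contents to the comma file, then rename it
 * over the RCS file. "wx" (C11) fails if the comma file already
 * exists, i.e. if somebody else holds the lock. */
int rewrite_rcs(const char *rcs, const char *data)
{
    char tmp[1024];
    FILE *fp;

    if (comma_name(rcs, tmp, sizeof tmp) != 0)
        return -1;
    fp = fopen(tmp, "wx");
    if (fp == NULL)
        return -1;
    if (fputs(data, fp) == EOF) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0 || rename(tmp, rcs) != 0) {
        remove(tmp);                /* drop the half-finished copy */
        return -1;
    }
    return 0;
}
```

If the process dies before the rename, foo.c,v is untouched and the stale
,foo.c, can simply be deleted.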

It is desirable to use RCS write semantics as many
other tools out there (cf, ViewCVS) use RCS on the
repository and want to obey RCS locking.

   2) White space appears to be irrelevant in RCS
   files; so adding arbitrary amounts in between
   sections should leave files still fully
   compatible with existing RCS/cvs tools.

Tools such as CVSup by default will canonicalize
the whitespace between sections (although this may
be configured). So, yes, whitespace is mostly
irrelevant between sections.

   3) So the idea is that when I add a tag I add
   a bunch of white space after the tag (let's say
   1KB of spaces split into 64 byte lines or
   similar); when I come to add the next tag I
   check if there is plenty of white space, if
   there is then instead of rewriting the file I
   just overwrite the white space with my new tag
   data; if there is no space then as I rewrite
   the file I add another lump of white space.

This has the potential to more easily corrupt the
RCS file if the operation is interrupted for any
reason.

   4) Whether dummy white space is added and how
   much is controlled by the existing size of the
   RCS file; so an RCS file that is only a few KB
   won't have any space added; that way this
   mechanism doesn't slow down/bloat small
   repositories. The amount of white space might
   be chosen to align data structures with disk
   block boundaries.
 
   5) My main concern is to do with
   concurrency/consistency requirements; is the
   file rewrite essential to ensure consistency,
   or is the locking that is carried out
   sufficient?
   
 Does this make sense?

It would be more robust to enhance CVS to use an
external database for tagging information instead
of putting the tagging information into the RCS
files directly than to rewrite parts of the RCS
file and hope that the operation didn't corrupt
the file along the way.

You may wish to consider looking at Meta-CVS as I
believe that Kaz keeps a lot of the branching
information outside of the RCS files already.

See http://users.footprints.net/~kaz/mcvs.html
for more details on Meta-CVS.

Good luck,
-- Mark




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Dr. David Alan Gilbert
[Resend: I sent it with the wrong 'from' address - apologies
if you get both]

* Mark D. Baushke ([EMAIL PROTECTED]) wrote:
 

Hi Mark,
  Thanks for your reply.

 Dr. David Alan Gilbert [EMAIL PROTECTED] writes:
 
  So - here are my questions/ideas - I'd appreciate comments to tell
  me whether I'm on the right lines:
1) As I understand it the tag data is the
first of the 3 main data structures in the RCS
file (tag, comments, diffs) and that when I do
pretty much any CVS operation I rewrite the
whole file - is this correct?
 
 CVS write operations on a foo.c,v repository file
 will write ,foo.c, and then when the write
 operation is successful and without any errors, it
 does a rename (,foo.c,, foo.c,v); to make the
 new version the official version. While the
 ,foo.c, file exists, RCS commands will consider
 the file locked.
 
 It is desirable to use RCS write semantics as many
 other tools out there (cf, ViewCVS) use RCS on the
 repository and want to obey RCS locking.

OK, if I create a dummy ,foo.c, before modifying (or create a hardlink
with that name to foo.c,v ?)  would that be sufficient?  Or perhaps create
the ,foo.c, as I normally would - but if I can use this overwrite trick
on the original then I just delete the ,foo.c, file.  Is the problem that
things are allowed to read the original foo.c,v while you are creating
the new version?

 be configured). So, yes, whitespace is mostly
 irrelevant between sections.

Great.

3) So the idea is that when I add a tag I add
a bunch of white space after the tag (let's say
1KB of spaces split into 64 byte lines or
similar); when I come to add the next tag I
check if there is plenty of white space, if
there is then instead of rewriting the file I
just overwrite the white space with my new tag
data; if there is no space then as I rewrite
the file I add another lump of white space.
 
 This has the potential to more easily corrupt the
 RCS file if the operation is interrupted for any
 reason.

The rewrite that adds the extra space would be performed using the existing
mechanism (with just some extra space added in RCS_rewrite);
so that can't be a problem.

So the issue is what happens if the interrupt occurs as I'm overwriting
the white space to add a tag; hmm yes; is it possible to guard against
this by using a single call to write(2) for that?  Is that the problem
you are thinking of?

 It would be more robust to enhance CVS to use an
 external database for tagging information instead
 of putting the tagging information into the RCS
 files directly than to rewrite parts of the RCS
 file and hope that the operation didn't corrupt
 the file along the way.

Sure, separating the tagging data out is much neater; but what I was
looking for here was a simple speed up which didn't require anything
extra and would be fully compatible with existing tools.

 You may wish to consider looking at Meta-CVS as I
 believe that Kaz keeps a lot of the branching
 information outside of the RCS files already.
 
 See http://users.footprints.net/~kaz/mcvs.html
 for more details on Meta-CVS.

If I was changing to another tool then I'd have a much larger set of
tools to consider (e.g.  subversion) but I'd rather stick with plain CVS
if I can - I've got clients on lots of (weird) OSs that work via pserver
and an infinite number of scripts built around CVS.

Thanks for the reply,

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Paul Sander
Everything that Mark says is true.  I'll add that some shops optimize 
their read operations under certain conditions, and such optimizations 
would break if the RCS files are updated in-place.

What happens is that, if the version of every file can be identified in 
advance (using version number, tag, or branch/timestamp pair) then they 
can invoke RCS directly to fetch file versions, read metadata, and so 
on.  This sidesteps CVS' overhead and can increase performance by as 
much as 50%.  Such operations will also succeed and not interfere with 
write operations to the repository, such as commits and the creation of 
new tags.  Moving tags or using cvs admin may sometimes cause race 
conditions that produce incorrect results, but that all depends on the 
nature of the changes being made at the time and how the readable 
versions have been identified.

The reason that such an optimization works is that RCS writes the
updated RCS file into the lock file, filesystem semantics always keep
the complete RCS file intact while it's being read, and pre-existing
data in the RCS file are not changed during write operations (except
for those race conditions I've identified above, which can be avoided).
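
The write-then-rename behaviour described above can be sketched as follows. This is an illustration in Python, not the RCS or CVS source; `comma_name` and `rewrite_via_comma_file` are hypothetical helper names. The point is that a reader holding the old file open keeps seeing a complete file, because the new contents only appear under the real name at the rename.

```python
import os


def comma_name(rcs_path):
    """Map foo.c,v to ,foo.c, (the comma file named in this thread)."""
    directory, base = os.path.split(rcs_path)
    if not base.endswith(",v"):
        raise ValueError("expected an RCS file name ending in ',v'")
    return os.path.join(directory, "," + base[:-1])


def rewrite_via_comma_file(rcs_path, new_contents):
    """Write the new version to the comma file, then rename it into place."""
    tmp = comma_name(rcs_path)
    # O_EXCL makes creation fail if another writer already holds the file.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        # Crash recovery is simply deleting the incomplete comma file.
        os.unlink(tmp)
        raise
    # On POSIX filesystems rename replaces the target in one step.
    os.rename(tmp, rcs_path)
```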

On Mar 20, 2005, at 8:28 AM, [EMAIL PROTECTED] wrote:

 Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

  So - here are my questions/ideas - I'd appreciate comments to tell
  me whether I'm on the right lines:

    1) As I understand it the tag data is the
    first of the 3 main data structures in the RCS
    file (tag, comments, diffs) and that when I do
    pretty much any CVS operation I rewrite the
    whole file - is this correct?

 CVS write operations on a foo.c,v repository file
 will write ,foo.c, and then when the write
 operation is successful and without any errors, it
 does a rename (,foo.c,, foo.c,v); to make the
 new version the official version. While the
 ,foo.c, file exists, RCS commands will consider
 the file locked.

 It is desirable to use RCS write semantics as many
 other tools out there (cf. ViewCVS) use RCS on the
 repository and want to obey RCS locking.

    2) White space appears to be irrelevant in RCS
    files; so adding arbitrary amounts in between
    sections should leave files still fully
    compatible with existing RCS/cvs tools.

 Tools such as CVSup by default will canonicalize
 the whitespace between sections (although this may
 be configured). So, yes, whitespace is mostly
 irrelevant between sections.

    3) So the idea is that when I add a tag I add
    a bunch of white space after the tag (let's say
    1KB of spaces split into 64 byte lines or
    similar); when I come to add the next tag I
    check if there is plenty of white space, if
    there is then instead of rewriting the file I
    just overwrite the white space with my new tag
    data; if there is no space then as I rewrite
    the file I add another lump of white space.

 This has the potential to more easily corrupt the
 RCS file if the operation is interrupted for any
 reason.

    4) Whether dummy white space is added and how
    much is controlled by the existing size of the
    RCS file; so an RCS file that is only a few KB
    won't have any space added; that way this
    mechanism doesn't slow down/bloat small
    repositories. The amount of white space might
    be chosen to align data structures with disk
    block boundaries.

    5) My main concern is to do with
    concurrency/consistency requirements; is the
    file rewrite essential to ensure consistency,
    or is the locking that is carried out
    sufficient?

  Does this make sense?

 It would be more robust to enhance CVS to use an
 external database for tagging information instead
 of putting the tagging information into the RCS
 files directly than to rewrite parts of the RCS
 file and hope that the operation didn't corrupt
 the file along the way.

 You may wish to consider looking at Meta-CVS as I
 believe that Kaz keeps a lot of the branching
 information outside of the RCS files already.

 See http://users.footprints.net/~kaz/mcvs.html
 for more details on Meta-CVS.

 Good luck,
 -- Mark

--
Paul Sander   | When a true genius appears in the world, you may
[EMAIL PROTECTED] | know him by this sign:  that all the dunces are in
  | confederacy against him.  -- Jonathan Swift, 
writer.




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Mark D. Baushke

Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

 * Mark D. Baushke ([EMAIL PROTECTED]) wrote:
 Hi Mark,
   Thanks for your reply.
 
  Dr. David Alan Gilbert [EMAIL PROTECTED] writes:
  
   So - here are my questions/ideas - I'd appreciate comments to tell
   me whether I'm on the right lines:
 1) As I understand it the tag data is the
 first of the 3 main data structures in the RCS
 file (tag, comments, diffs) and that when I do
 pretty much any CVS operation I rewrite the
 whole file - is this correct?
  
  CVS write operations on a foo.c,v repository file
  will write ,foo.c, and then when the write
  operation is successful and without any errors, it
  does a rename (,foo.c,, foo.c,v); to make the
  new version the official version. While the
  ,foo.c, file exists, RCS commands will consider
  the file locked.
  
  It is desirable to use RCS write semantics as many
  other tools out there (cf. ViewCVS) use RCS on the
  repository and want to obey RCS locking.
 
 OK, if I create a dummy ,foo.c, before
 modifying (or create a hardlink with that name
 to foo.c,v ?) would that be sufficient?

I would say that it is likely necessary, but may
not be sufficient.

 Or perhaps create the ,foo.c, as I normally
 would - but if I can use this overwrite trick on
 the original then I just delete the ,foo.c,
 file.

I am unclear how this lets you perform a speedup.

 Is the problem that things are allowed to read
 the original foo.c,v while you are creating the
 new version?

I am given to understand that many of the
ancillary tools that surround CVS make use of
being able to read a consistent ,v file at all
times.

 3) So the idea is that when I add a tag I add
 a bunch of white space after the tag (let's say
 1KB of spaces split into 64 byte lines or
 similar); when I come to add the next tag I
 check if there is plenty of white space, if
 there is then instead of rewriting the file I
 just overwrite the white space with my new tag
 data; if there is no space then as I rewrite
 the file I add another lump of white space.
  
  This has the potential to more easily corrupt the
  RCS file if the operation is interrupted for any
  reason.
 
 The rewrite that adds the extra space would be
 performed using the existing mechanism (with
 just some extra space added in
 RCS_rewrite), so that can't be a problem.

Adding extra data to the ,foo.c, file during the
normal write operation should not be a problem.
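
As a rough illustration of the padding being added during the normal rewrite, the "1KB of spaces split into 64 byte lines" from the proposal could be generated as below. The function names and the `min_file_size` threshold are hypothetical; the thread only says that files of a few KB would get no padding.

```python
def make_padding(total_bytes=1024, line_bytes=64):
    """Return total_bytes of whitespace broken into line_bytes-long lines."""
    if total_bytes < 0 or line_bytes < 1:
        raise ValueError("total_bytes must be >= 0 and line_bytes >= 1")
    full_lines, rest = divmod(total_bytes, line_bytes)
    # Each line is spaces plus a newline, so it is exactly line_bytes long.
    line = b" " * (line_bytes - 1) + b"\n"
    tail = (b" " * (rest - 1) + b"\n") if rest else b""
    return line * full_lines + tail


def padding_for(file_size, min_file_size=8192, total_bytes=1024):
    """Pad only larger RCS files; the 8 KB cut-off is an invented example."""
    if file_size < min_file_size:
        return b""
    return make_padding(total_bytes)
```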

 So the issue is what happens if the interrupt
 occurs as I'm overwriting the white space to add
 a tag; hmm yes; 

Correct. Depending on the filesystem kind and the
level of I/O, your rewrite could impact up to three
fileblocks and the directory data.

 is it possible to guard against this by using a
 single call to write(2) for that? 

Not for all possible filesystem types.

 Is that the problem you are thinking of?

Yes. Even worse things can happen in this regard
if the filesystem is a 'stateless' one such as an
NFS mounted directory (we keep advising folks
against using them, but I know for a fact that
they are still used).

  It would be more robust to enhance CVS to use an
  external database for tagging information instead
  of putting the tagging information into the RCS
  files directly than to rewrite parts of the RCS
  file and hope that the operation didn't corrupt
  the file along the way.
 
 Sure, separating the tagging data out is much
 neater; but what I was looking for here was a
 simple speed up which didn't require anything
 extra and would be fully compatible with
 existing tools.

And you are finding that existing tools torture
the assumptions you are able to make about the CVS
repository.

FWIW: (In my personal experience) using a SAN
solution for your repository storage allows you
much better throughput for all write operations in
the general case as the SAN can guarantee the
writes are okay before the disk actually does it.

Optimizing for tagging does not seem very useful
to me as we typically do not drop that many tags
on our repository.

  You may wish to consider looking at Meta-CVS
  as I believe that Kaz keeps a lot of the
  branching information outside of the RCS files
  already.
  
  See http://users.footprints.net/~kaz/mcvs.html
  for more details on Meta-CVS.
 
 If I was changing to another tool then I'd have
 a much larger set of tools to consider (e.g.
 subversion) but I'd rather stick with plain CVS
 if I can - I've got clients on lots of (weird)
 OSs that work via pserver and an infinite number
 of scripts built around CVS.

Indeed. Part of the difficulty with CVS
development has been worrying about legacy
software assumptions.

-- Mark



Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Dr. David Alan Gilbert
* Paul Sander ([EMAIL PROTECTED]) wrote:

Hi Paul,
  Thanks for the reply,

 Everything that Mark says is true.  I'll add that some shops optimize 
 their read operations under certain conditions, and such optimizations 
 would break if the RCS files are updated in-place.
 
 What happens is that, if the version of every file can be identified in 
 advance (using version number, tag, or branch/timestamp pair) then they 
 can invoke RCS directly to fetch file versions, read metadata, and so 
 on.  This sidesteps CVS' overhead and can increase performance by as 

So are these tricks *never* performed by cvs itself? i.e. would my
trick (if I can solve the interrupted write case) be completely
safe with any use of cvs as long as you didn't access the files
externally?

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Dr. David Alan Gilbert
* Mark D. Baushke ([EMAIL PROTECTED]) wrote:
 
 Dr. David Alan Gilbert [EMAIL PROTECTED] writes:
  
  OK, if I create a dummy ,foo.c, before
  modifying (or create a hardlink with that name
  to foo.c,v ?) would that be sufficient?
 
 I would say that it is likely necessary, but may
 not be sufficient.

Hmm ok.

  Or perhaps create the ,foo.c, as I normally
  would - but if I can use this overwrite trick on
  the original then I just delete the ,foo.c,
  file.
 
 I am unclear how this lets you perform a speedup.

I only create the ,foo.c, file - I don't write anything into it; the
existence of the file is enough to act as the RCS lock. If I can do my
in-place modification then I delete this file after doing it; if not, then
I proceed as normal and just write the ,foo.c, file and do the rename
as you normally would.
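
The control flow of that proposal can be sketched as below, under the assumption that the caller supplies the two strategies as callbacks. `tag_under_comma_lock`, `try_in_place` and `build_full_contents` are hypothetical names, not CVS functions. The sketch only shows the locking order; it does nothing about readers that ignore the comma file.

```python
import os


def tag_under_comma_lock(rcs_path, try_in_place, build_full_contents):
    """Take the comma file as a lock, then try in place or fall back."""
    directory, base = os.path.split(rcs_path)
    lock = os.path.join(directory, "," + base[:-1])  # foo.c,v -> ,foo.c,
    # Creating the file exclusively is what acts as the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        if try_in_place(rcs_path):
            # Fast path: the comma file stayed empty, so just remove it.
            os.unlink(lock)
            return "in-place"
        # Slow path: the usual full rewrite into the comma file.
        data = build_full_contents(rcs_path)
        while data:
            data = data[os.write(fd, data):]
        os.fsync(fd)
        os.rename(lock, rcs_path)
        return "rewritten"
    except BaseException:
        if os.path.exists(lock):
            os.unlink(lock)
        raise
    finally:
        os.close(fd)
```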

  Is the problem that things are allowed to read
  the original foo.c,v while you are creating the
  new version?
 
 I am given to understand that many of the
 ancillary tools that surround CVS make use of
 being able to read a consistent ,v file at all
 times.

This is very tricky; I don't think in our case we use any such tools
(we might have a cvs/web thing for browsing it, but this is probably
not critical); and as long as I can guarantee what I do is safe as far
as CVS itself is concerned I think I'd be prepared to go for it as a
configurable mechanism.

  So the issue is what happens if the interrupt
  occurs as I'm overwriting the white space to add
  a tag; hmm yes; 
 
 Correct. Depending on the filesystem kind and the
 level of I/O, your rewrite could impact up to three
 fileblocks and the directory data.
 
  is it possible to guard against this by using a
  single call to write(2) for that? 
 
 Not for all possible filesystem types.
 
  Is that the problem you are thinking of?
 
 Yes. Even worse things can happen in this regard
 if the filesystem is a 'stateless' one such as an
 NFS mounted directory (we keep advising folks
 against using them, but I know for a fact that
 they are still used).

OK, my conscience will let me carefully ignore NFS issues given the
pain it causes me elsewhere (and I make my mechanism switchable).
What happens if I only used the overwrite mechanism if
none of the characters being modified crossed a 512 (e.g.) byte
boundary offset in the file?  Since the spaces were actually
written in a previous operation we can assume that the space
is allocated and no allocation operation is going to happen
at this point (mumble filesystem journalling mumble!).
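
The boundary test itself is simple arithmetic, sketched here with a hypothetical helper name. It only answers whether a write stays inside one block of the file's byte offsets; it says nothing about how a given filesystem actually lays out or allocates that data.

```python
def stays_within_block(offset, length, block_size=512):
    """True if bytes [offset, offset + length) lie in one block_size block."""
    if offset < 0 or length < 1 or block_size < 1:
        raise ValueError("offset must be >= 0; length and block_size >= 1")
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block == last_block
```

For example, a 2-byte write at offset 510 stays inside the first 512-byte block, while a 3-byte write at the same offset spills into the second.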

  Sure, separating the tagging data out is much
  neater; but what I was looking for here was a
  simple speed up which didn't require anything
  extra and would be fully compatible with
  existing tools.
 
 And you are finding that existing tools torture
 the assumptions you are able to make about the CVS
 repository.

Nod; it is quite painful!

 FWIW: (In my personal experience) using a SAN
 solution for your repository storage allows you
 much better throughput for all write operations in
 the general case as the SAN can guarantee the
 writes are okay before the disk actually does it.

But when you throw a GB of writes at them in a short time from a tag
across our whole repository they aren't going to be happy - they are
going to want to get rid of that backlog of write data ASAP.

 Optimizing for tagging does not seem very useful
 to me as we typically do not drop that many tags
 on our repository.

In the company I work for we are very tag heavy, but more importantly
it is the tagging that gets in people's way and places the strain
on the write bandwidth of the discs/RAID.

Dave
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Paul Sander
On Mar 20, 2005, at 3:54 PM, [EMAIL PROTECTED] wrote:

 * Mark D. Baushke ([EMAIL PROTECTED]) wrote:
  Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

   OK, if I create a dummy ,foo.c, before
   modifying (or create a hardlink with that name
   to foo.c,v ?) would that be sufficient?

  I would say that it is likely necessary, but may
  not be sufficient.

 Hmm ok.

   Or perhaps create the ,foo.c, as I normally
   would - but if I can use this overwrite trick on
   the original then I just delete the ,foo.c,
   file.

  I am unclear how this lets you perform a speedup.

 I only create the ,foo.c, file - I don't write anything into it; the
 existence of the file is enough to act as the RCS lock; if I can do my
 in-place modification then I delete this file after doing it, if not
 then I proceed as normal and just write the ,foo.c, file and do the
 rename as you normally would.

You're forgetting something:  The RCS commands will complete read-only 
operations on RCS files even in the presence of the comma files owned 
by other processes.  Your update protocol introduces race conditions in 
which the RCS file is not self-consistent at all times.

There's also the interrupt issue:  Killing an update before it 
completes leaves the RCS file corrupt.  You'd have to build in some 
kind of crash recovery.  But RCS already has that by way of the comma 
file, which can simply be deleted.  Other crash recovery algorithms 
usually involve transaction logs that can be reversed and replayed, or 
the creation of backup copies.  None of these are more efficient than 
the existing RCS update protocol.

   So the issue is what happens if the interrupt
   occurs as I'm overwriting the white space to add
   a tag; hmm yes;

  Correct. Depending on the filesystem kind and the
  level of I/O, your rewrite could impact up to three
  fileblocks and the directory data.

   is it possible to guard against this by using a
   single call to write(2) for that?

  Not for all possible filesystem types.

You'd have to guarantee that the write is atomic and flushes results 
completely to disk, even in the presence of things like power failures. 
 It's hard to make this guarantee given all the buffering that goes on 
below the write(2) API.

  Optimizing for tagging does not seem very useful
  to me as we typically do not drop that many tags
  on our repository.

 In the company I work for we are very tag heavy, but more importantly
 it is the tagging that gets in people's way and places the strain
 on the write bandwidth of the discs/RAID.

I once built a successful system that tracked desirable configurations 
by building lists of file/version pairs, then committing and tagging 
the lists.  The lists were built by polling the Entries files in 
workspaces (and making sure there were no uncommitted changes).  This 
was fast and efficient, and it opens you up to use the optimization I 
mentioned earlier.  And if you rely on floating tags, such lists could 
track the history of the tags as well.

In addition, an algebra can be easily written to manipulate such lists. 
 Combine this with a way to link these lists with your defect tracking 
system, and you have the tools to build a very good change control 
system.
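
A small sketch of what such file/version lists and one operation on them might look like. It assumes the usual `CVS/Entries` line layout of `/name/revision/timestamp/options/tagdate`, with directory lines starting with `D`; the function names are hypothetical and this is not the system described above.

```python
def parse_entries(text):
    """Build a {file name: revision} list from the text of a CVS/Entries file."""
    manifest = {}
    for line in text.splitlines():
        if not line.startswith("/"):
            continue  # skip 'D' directory lines and anything unrecognised
        fields = line.split("/")
        if len(fields) < 6:
            continue  # malformed line
        manifest[fields[1]] = fields[2]
    return manifest


def changed_between(old, new):
    """One 'algebra' operation: files whose revision differs between lists.

    Maps each such file to (old revision, new revision); None means the
    file is absent from that list.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }
```

Checking for uncommitted changes, which the text above requires before recording a list, is left out of this sketch.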

--
Paul Sander   | Lets stick to the new mistakes and get rid of the old
[EMAIL PROTECTED] | ones -- William Brown




Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Mark D. Baushke

Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

 * Paul Sander ([EMAIL PROTECTED]) wrote:
 
 Hi Paul,
   Thanks for the reply,
 
  Everything that Mark says is true.  I'll add that some shops optimize 
  their read operations under certain conditions, and such optimizations 
  would break if the RCS files are updated in-place.
  
  What happens is that, if the version of every file can be identified in 
  advance (using version number, tag, or branch/timestamp pair) then they 
  can invoke RCS directly to fetch file versions, read metadata, and so 
  on.  This sidesteps CVS' overhead and can increase performance by as 
 
 So are these tricks *never* performed by cvs itself? 

Never? Hmmm... well, the CVS from cvshome.org will not read a foo.c,v
file while the CVS read-lock or a CVS write-lock is owned by another
process.

The real problem is dealing with filesystem errors while RCS is updating
the ,v file. I would not trust that the RCS write manipulations will
always fail in a safe manner.

 i.e. would my trick (if I can solve the interrupted write case) be
 completely safe with any use of cvs as long as you didn't access the
 files externally?

I am not able to say that it would ever be 'completely safe' to do as
you suggest. You would need to greatly harden the failure paths of CVS
to ensure that the file being modified is not just discarded in the
event of a filesystem error by CVS itself. I would not wish to attempt
to do it myself.

-- Mark


Re: Idea for reducing disk IO on tagging operations

2005-03-20 Thread Mark D. Baushke

Paul Sander [EMAIL PROTECTED] writes:

  I only create the ,foo.c, file - I don't write anything into it; the
  existence of the file is enough to act as the RCS lock; if I can do my
  in-place modification then I delete this file after doing it, if not
  then I proceed as normal and just write the ,foo.c, file and do the
  rename as you normally would.
 
 You're forgetting something:  The RCS commands will complete read-only
 operations on RCS files even in the presence of the comma files owned
 by other processes.  Your update protocol introduces race conditions
 in which the RCS file is not self-consistent at all times.

Actually, if you look closely, I believe that CVS will not do read-only
RCS operations if a CVS write-lock exists for the directory. Of course,
ViewCVS and CVSweb do it all the time as do many of the other add-ons.

 There's also the interrupt issue:  Killing an update before it
 completes leaves the RCS file corrupt.  You'd have to build in some
 kind of crash recovery.  But RCS already has that by way of the comma
 file, which can simply be deleted.  Other crash recovery algorithms
 usually involve transaction logs that can be reversed and replayed, or
 the creation of backup copies.  None of these are more efficient than
 the existing RCS update protocol.

Agreed. This is a very big deal.

Dr. David Alan Gilbert [EMAIL PROTECTED] writes:

  FWIW: (In my personal experience) using a SAN
  solution for your repository storage allows you
  much better throughput for all write operations in
  the general case as the SAN can guarantee the
  writes are okay before the disk actually does it.
 
 But when you throw a GB of writes at them in a short time from a tag
 across our whole repository they aren't going to be happy - they are
 going to want to get rid of that backlog of write data ASAP.

I believe you will find that the performance knee for a commercial SAN
that is well provisioned happens when you hit 2GB of sustained writes.
You are more likely to run into problems with bandwidth to the
Fibre Channel mesh first.

For us, I seem to recall that the actual bottleneck is the creation of
the /tmp/cvs-server$$ trees for a 'cvs tag' operation. So, your results
will also depend on how shallow or deep your module hierarchy runs.

  Optimizing for tagging does not seem very useful
  to me as we typically do not drop that many tags
  on our repository.
 
 In the company I work for we are very tag heavy, but more importantly
 it is the tagging that gets in people's way and places the strain on
 the write bandwidth of the discs/RAID.

Sure, a conventional RAID can be very expensive to rewrite all of the
files.

It is certainly possible that a close look at CVS performance
bottlenecks may find some places where improvements in throughput could
be gained. However, I am not at all certain that your particular
suggestion would be the best use of optimization time.

Enjoy!
-- Mark