Re: Kernel SCM saga.. (bk license?)

2005-04-12 Thread Ricky Beam
On Tue, 12 Apr 2005, Kedar Sovani wrote:
>I was wondering if working on git is, in any way, in violation of the
>BitKeeper license, which states that you cannot work on any other SCM
>(SCM-like?) tool for "x" amount of time after using BitKeeper?

Technically, yes, it is.  However, as BitMover has given the community
little other choice, I don't see how they could hold anyone to it.  They'd
have a hard time making that one-year clause stick given their abandonment
of the free product and refusal to grant licenses to OSDL employees.

Plus, there's nothing in the BKL specifically granting BitMover the
right to revoke the license, and thus the use of BK/Free, at their whim.
They have every right to stop developing, supporting, and distributing
BK/Free, but rescinding all BK/Free licenses just for spite does not
appear to be within their legal rights.

(Sorry Larry, but that's what you're doing.  Tridge was working on taking
 your toys apart -- he does that, what can I say.  He explicitly lied and
 said he would stop, but of course didn't.  And then you got all pissed
 at OSDL for not smiting him when, technically, they can't -- an employer
 is not responsible for the actions of their employees on their own time,
 on their own property, unrelated to their employ.  Sorry, but I know that
 one by heart :-))

--Ricky




Re: Kernel SCM saga..

2005-04-12 Thread Pavel Machek
Hi!

> > It's possible to generate another object with the same hash, but:
> 
> Yeah - the real check is that the modified object has to
> compile and do something useful for someone (the cracker
> if no one else).
> 
> Just getting a random bucket of bits substituted for a
> real kernel source file isn't going to get me into the
> cracker hall of fame, only into their odd-news of the
> day.

I actually have two different files with the same md5 sum in my local CVS
repository. It would be very wrong if CVS did not do the right thing
with those files.

Yes, I was playing with md5 (see "md5 to be considered harmfull
today"), and I wanted old versions of my "exploits" to be archived.
 
Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.


Re: Kernel SCM saga.. (bk license?)

2005-04-12 Thread Catalin Marinas
Kedar Sovani <[EMAIL PROTECTED]> wrote:
> I was wondering if working on git is, in any way, in violation of the
> BitKeeper license, which states that you cannot work on any other SCM
> (SCM-like?) tool for "x" amount of time after using BitKeeper?

That's valid only for the new BK license, which probably wasn't
accepted by Linus.

-- 
Catalin



Re: Kernel SCM saga.. (bk license?)

2005-04-12 Thread Kedar Sovani
I was wondering if working on git is, in any way, in violation of the
BitKeeper license, which states that you cannot work on any other SCM
(SCM-like?) tool for "x" amount of time after using BitKeeper?


Kedar. 

On Apr 8, 2005 10:12 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> 
> On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> >
> > I'm playing with monotone right now.  Superficially it looks like it
> > has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
> > I mean glacial.  A heavily sedated sloth with no legs is probably
> > faster.
> 
> Yes. The silly thing is, at least in my local tests it doesn't actually
> seem to be _doing_ anything while it's slow (there are no system calls
> except for a few memory allocations and de-allocations). It seems to have
> some exponential function on the number of pathnames involved etc.
> 
> I'm hoping they can fix it, though. The basic notions do not sound wrong.
> 
> In the meantime (and because monotone really _is_ that slow), here's a
> quick challenge for you, and any crazy hacker out there: if you want to
> play with something _really_ nasty (but also very _very_ fast), take a
> look at kernel.org:/pub/linux/kernel/people/torvalds/.
> 
> First one to send me the changelog tree of sparse-git (and a tool to
> commit and push/pull further changes) gets a gold star, and an honorable
> mention. I've put a hell of a lot of clues in there (*).
> 
> I've worked on it (and little else) for the last two days. Time for
> somebody else to tell me I'm crazy.
> 
> Linus
> 
> (*) It should be easier than it sounds. The database is designed so that
> you can do the equivalent of a nonmerging (ie pure superset) push/pull
> with just plain rsync, so replication really should be that easy (if
> somewhat bandwidth-intensive due to the whole-file format).
> 
> Never mind merging. It's not an SCM, it's a distribution and archival
> mechanism. I bet you could make a reasonable SCM on top of it, though.
> Another way of looking at it is to say that it's really a content-
> addressable filesystem, used to track directory trees.
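
A rough sketch in C of the content-addressable idea described above: hash
the contents (plus a small "type length" header) to name the object,
zlib-compress it, and write it under that name.  The header layout, the
objects/ path and the helper name here are illustrative assumptions, not
necessarily the exact on-disk format being described.

/* Sketch only: store a buffer in a content-addressable object store.
 * Assumes zlib and OpenSSL's SHA1(); the "blob <len>" header and the
 * objects/<hex> layout are illustrative, not git's exact format. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>
#include <openssl/sha.h>

static int store_object(const unsigned char *data, size_t len)
{
	char hdr[64];
	int hdrlen = snprintf(hdr, sizeof(hdr), "blob %zu", len) + 1; /* keep NUL */

	size_t total = hdrlen + len;
	unsigned char *buf = malloc(total);
	if (!buf)
		return -1;
	memcpy(buf, hdr, hdrlen);
	memcpy(buf + hdrlen, data, len);

	/* The object's name is the SHA1 of header + payload, in hex. */
	unsigned char sha1[SHA_DIGEST_LENGTH];
	SHA1(buf, total, sha1);

	char path[64] = "objects/";
	for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
		snprintf(path + 8 + i * 2, 3, "%02x", sha1[i]);

	/* Compress the whole thing before writing it out. */
	uLongf zlen = compressBound(total);
	unsigned char *zbuf = malloc(zlen);
	if (!zbuf) {
		free(buf);
		return -1;
	}
	compress(zbuf, &zlen, buf, total);

	FILE *f = fopen(path, "wb");
	int ok = (f && fwrite(zbuf, 1, zlen, f) == zlen) ? 0 : -1;
	if (f)
		fclose(f);
	free(zbuf);
	free(buf);
	return ok;
}

Replicating such a store with plain rsync is then just copying files whose
names never change, which is what makes the "pure superset" pull above so
simple.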


Re: Kernel SCM saga..

2005-04-11 Thread Jan Hudec
On Mon, Apr 11, 2005 at 04:56:06 +0200, Marcin Dalecki wrote:
> 
> On 2005-04-11, at 04:26, Miles Bader wrote:
> 
> >Marcin Dalecki <[EMAIL PROTECTED]> writes:
> >>Better don't waste your time with looking at Arch. Stick with patches
> >>you maintain by hand combined with some scripts containing a list of
> >apply commands and you should be still more productive than when using
> >>Arch.
> >
> >Arch has its problems, but please lay off the uninformed flamebait (the
> >"issues" you complain about are so utterly minor as to be laughable).
> 
> I wish you a lot of laughter after replying to an already 3 days old
> message, which was my final on Arch.

Marcin Dalecki <[EMAIL PROTECTED]> complained:
> Arch isn't a sound example of software design. Quite contrary to the
> random notes posted by its author, the following issues struck me at
> the time I evaluated it:
> [...]

I didn't comment on this the first time, but I see I should have. *NONE* of
the issues you complained about were issues of *DESIGN*. They were all
issues of *ENGINEERING*. *ENGINEERING* issues can be fixed. One of the
issues does not even exist any longer (the diff/patch one -- it now
checks that they are the right ones -- and in all other respects it is
*exactly* the same as depending on a library).

But what really matters here is the concept. Arch has a simple concept,
that works well. Others have different concepts, that work well or
almost well too (Darcs, Monotone).

---
 Jan 'Bulb' Hudec <[EMAIL PROTECTED]>




Re: Kernel SCM saga..

2005-04-10 Thread Marcin Dalecki
On 2005-04-11, at 04:26, Miles Bader wrote:
Marcin Dalecki <[EMAIL PROTECTED]> writes:
Better don't waste your time with looking at Arch. Stick with patches
you maintain by hand combined with some scripts containing a list of
apply commands and you should be still more productive than when using
Arch.
Arch has its problems, but please lay off the uninformed flamebait (the
"issues" you complain about are so utterly minor as to be laughable).
I wish you a lot of laughter after replying to an already 3 days old
message, which was my final on Arch.



Re: Kernel SCM saga..

2005-04-10 Thread Miles Bader
Marcin Dalecki <[EMAIL PROTECTED]> writes:
> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands and you should be still more productive than when using
> Arch.

Arch has its problems, but please lay off the uninformed flamebait (the
"issues" you complain about are so utterly minor as to be laughable).

-Miles
-- 
Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature.


Re: Kernel SCM saga..

2005-04-10 Thread Christian Parpart
On Monday 11 April 2005 12:33 am, you wrote:
[..]
> Well, I followed some of the instructions to mirror the kernel tree on
> svn.clkao.org/linux/cvs, and although it took around 12 hours to import
> 28232 versions, I seem to have a mirror of it on my own subversion
> server now. I think the svn.clkao.org mirror was taken from bkcvs... the
> last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33"

I'd love to see svk as a real choice for you guys, but I don't mind as long
as I get a door open using svn/svk ;);)

> I have no idea what's missing. What is everyone's favorite web frontend
> to subversion? 

Check out ViewCVS at: http://viewcvs.sourceforge.net/
This seems widely used (not just by me ^o^).

Regards,
Christian Parpart.

-- 
Netiquette: http://www.ietf.org/rfc/rfc1855.txt
 01:55:08 up 18 days, 15:01,  2 users,  load average: 0.27, 0.39, 0.36




Re: Kernel SCM saga..

2005-04-10 Thread Troy Benjegerdes
On Thu, Apr 07, 2005 at 02:29:24PM -0400, Daniel Phillips wrote:
> On Thursday 07 April 2005 14:13, Dmitry Yusupov wrote:
> > On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote:
> > > Three years ago, there was no fully working open source distributed scm
> > > code base to use as a starting point, so extending BK would have been the
> > > only easy alternative.  But since then the situation has changed.  There
> > > are now several working code bases to provide a good starting point:
> > > Monotone, Arch, SVK, Bazaar-ng and others.
> >
> > Right. For example, SVK is pretty mature project and very close to 1.0
> > release now. And it supports all kind of merges including Cherry-Picking
> > Mergeback:
> >
> > http://svk.elixus.org/?MergeFeatures
> 
> So for an interim way to get the patch flow back online, SVK is ready to try 
> _now_, and we only need a way to import the version graph?  (true/false)

Well, I followed some of the instructions to mirror the kernel tree on
svn.clkao.org/linux/cvs, and although it took around 12 hours to import
28232 versions, I seem to have a mirror of it on my own subversion
server now. I think the svn.clkao.org mirror was taken from bkcvs... the
last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33"

I have no idea what's missing. What is everyone's favorite web frontend
to subversion? I've got websvn (debian package) on there now, and it's a
bit sluggish, but it seems to work.

I hope to have time this week or next to actually make this machine
publicly accessible.


Re: Kernel SCM saga..

2005-04-10 Thread Paul Jackson
Ingo wrote:
> not the compression of every file separately.

ok

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: Kernel SCM saga..

2005-04-10 Thread Matthias Andree
Andrea Arcangeli schrieb am 2005-04-09:

> On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote:
> > really designed for something like an offline http grabber, in that you can 
> > just grab files purely by filename (and verify that you got them right by 
> > running sha1sum on the resulting local copy). So think "wget".
> 
> I'm not entirely convinced wget is going to be an efficient way to
> synchronize and fetch your tree, its simplicity is great though. It's a

wget is probably a VERY UNWISE choice:

http://www.derkeiler.com/Mailing-Lists/securityfocus/bugtraq/2004-12/0106.html

-- 
Matthias Andree


Re: Kernel SCM saga..

2005-04-10 Thread Paul Jackson
> It's possible to generate another object with the same hash, but:

Yeah - the real check is that the modified object has to
compile and do something useful for someone (the cracker
if no one else).

Just getting a random bucket of bits substituted for a
real kernel source file isn't going to get me into the
cracker hall of fame, only into their odd-news of the
day.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: Kernel SCM saga..

2005-04-10 Thread Ingo Molnar

* Paul Jackson <[EMAIL PROTECTED]> wrote:

> Ingo wrote:
> > With default gzip it's 3.3 seconds though,
> > and that still compresses it down to 57 MB.
> 
> Interesting.  I'm surprised how much a bunch of separate, modest sized
> files can be compressed.

sorry, what i measured was in essence the tarball. I.e. not the 
compression of every file separately. I should have been clear about 
that ...

Ingo


Re: Kernel SCM saga..

2005-04-10 Thread Paul Jackson
Ingo wrote:
> With default gzip it's 3.3 seconds though,
> and that still compresses it down to 57 MB.

Interesting.  I'm surprised how much a bunch of separate, modest sized
files can be compressed.

I'm unclear what matters most here.

Space on disk certainly isn't much of an issue.  Even with Andrew Morton
on our side, we still can't grow the kernel as fast as the disk drive
manufacturers can grow disk sizes.

Main memory size of the compressed history matters to Linus and his top
20 lieutenants doing full kernel source patching as a primary mission if
they can't fit the source _history_ in main memory.  But those people
are running 1 GByte or more of RAM - so whether it is 95, 57 or 45
MBytes, it fits fine.  The rest of us are mostly concerned with whether
a kernel build fits in memory.

Looking at an arch i386 kernel build tree I have at hand, I see the
following disk usage:

102 MBytes - BitKeeper/*
287 MBytes - */SCCS/* (outside of already counted BitKeeper/*)
232 MBytes - checked out source files
 94 MBytes - ELF and other build byproducts
---
715 MBytes - Total

Converting from bk to git, I guess this becomes:

 97 MBytes - git (zlib)
232 MBytes - checked out source files
 94 MBytes - ELF and other build byproducts
---
423 MBytes - Total

Size matters when it's a two-to-one difference, but when we are down to a
10% to 15% difference in the Total, it's presentation that matters.  The
above numbers tell me that this is not a pure size issue for local disk
or memory usage.

What does matter that I can see:

 1) Linus explicitly stated he wanted "a raw zlib compressed blob,
not a gzipped file", to encourage everyone to use the git tools to
access this data.  He did not "want people editing repository files
by hand."  I'm not sure what he gains here - it did annoy me for a
couple of hours before I decided fixing my supper was more important.

 2) The time to compress will be noticed by users as a delay when
checking in changes (I'm guessing zlib compresses relatively faster).

 3) The time to copy compressed data over the internet will be
noticed by users when upgrading kernel versions (gzip can
compress smaller).

 4) Decompress times are smaller so don't matter as much.

 5) Zlib has a nice library, and is patent free.  I don't know about gzip.

 6) As you note, zlib has rsync-friendly, recovery-friendly Z_PARTIAL_FLUSH.
I don't know about gzip.

My guess is that Linus finds (2) and (3) to balance each other, and that
(1) decides the point, in favor of zlib.  Well, that or a simpler
hypothesis, that he found the nice library (5) convenient, and (1)
sealed the deal, with the other tradeoffs passing through his
subconscious faster than he bothered to verbalize them.

You (Ingo) seem in your second message to be encouraging further
consideration of gzip, for its improved compression.

How will that matter to us, day to day?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: Kernel SCM saga..

2005-04-10 Thread Bill Davidsen
On Sun, 10 Apr 2005, Junio C Hamano wrote:

> >>>>> "DL" == David Lang <[EMAIL PROTECTED]> writes:
> 
> DL> just wanted to point out that recent news shows that sha1 isn't as
> DL> good as it was thought to be (far easier to deliberately create
> DL> collisions than it should be)
> 
> I suspect there is no need to do so...

It's possible to generate another object with the same hash, but:
 - you can't just take your desired object and do magic to make it hash
   right
 - it may not have the same length (almost certainly)
 - it's still non-trivial in terms of computation needed

> 
>   Message-ID: <[EMAIL PROTECTED]>
>   From: Linus Torvalds <[EMAIL PROTECTED]>
>   Subject: Re: Kernel SCM saga..
>   Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT)
> 
>   ...
> 
>   Linus 
> 
>   (*) yeah, yeah, I know about the current theoretical case, and I don't
>   care. Not only is it theoretical, the way my objects are packed you'd have
>   to not just generate the same SHA1 for it, it would have to _also_ still
>   be a valid zlib object _and_ get the header to match the "type + length"  
>   of object part. IOW, the object validity checks are actually even stricter
>   than just "sha1 matches".
> 

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



Re: Kernel SCM saga..

2005-04-10 Thread David Roundy
On Sun, Apr 10, 2005 at 11:24:07AM +0200, Giuseppe Bilotta wrote:
> On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:
> 
> > I've recently made some improvements recently which will reduce the
> > memory use
> 
> Does this include check for redundancy? ;)

Yeah, the only catch is that if the redundancy checks fail, we now may
leave the repository in an inconsistent, but repairable, state.  (Only a
cache of the pristine tree is affected.)  The recent improvements mostly
came by increasing the laziness of a few operations, which meant we don't
need to store the entire parsed tree (or parsed patch) in memory for
certain operations.
-- 
David Roundy
http://www.darcs.net


Re: Kernel SCM saga..

2005-04-10 Thread Ingo Molnar

* Paul Jackson <[EMAIL PROTECTED]> wrote:

> These 16817 files consume:
> 
>   224 MBytes uncompressed and
>95 MBytes compressed
> 
> (using zlib's minigzip, on a 4 KB page reiserfs.)

that's a 42.4% compressed size. Using a (much) more CPU-intense 
compression method (bzip2 -9), the compressed size is down to 45 MBytes.  
(a ratio of 20.2%)

using default 'gzip' i get 57 MB compressed.

> Since each change will get its own copy of the file, multiplying these
> two sizes (224 and 95) by 12.2 changes per file means the disk cost
> would be:
> 
>   2.73 GByte uncompressed, or
>   1.16 GBytes compressed.

with bzip2 -9 it would be 551 MBytes. It might as well be practical on 
faster CPUs, a full tree (224 MBytes, 45 MBytes compressed) decompresses 
in 24 seconds on a 3.4GHz P4 - single CPU. (and with dual core likely 
becoming the standard, we might as well divide that by two) With default 
gzip it's 3.3 seconds though, and that still compresses it down to 57 
MB.

Ingo


Re: Kernel SCM saga..

2005-04-10 Thread Ingo Molnar

* David S. Miller <[EMAIL PROTECTED]> wrote:

> On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > Also, I don't want people editing repository files by hand. Sure, the 
> > sha1 catches it, but still... I'd rather force the low-level ops to use 
> > the proper helper routines. Which is why it's a raw zlib compressed blob, 
> > not a gzipped file.
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.
> 
> It's happened to me more than once and I did lose data.
> 
> Without compression, I might be able to recover if something
> causes a block of zeros to be written to the middle of some
> repository file.  With compression, you pretty much just lose.

that depends on how you compress. You are perfectly right that with 
default zlib compression, where you start the compression stream and 
stop it at the end of the file, recovery in case of damage is very hard 
for the portion that comes _after_ the damaged section. You'd have to 
reconstruct the compression state which is akin to breaking a key.

But with zlib you can 'flush' the compression state every couple of 
blocks and basically get the same recovery properties, at some very 
minimal extra space cost (because when you flush out compression state 
you get some extra padding bytes).

Flushing has another advantage as well: a small delta (even if it 
increases/decreases the file size!) in the middle of a larger file will 
still be compressed to the same output both before and after the change 
area (modulo flush block size), which rsync can pick up just fine. (IIRC 
that is one of the reasons why Debian, when compressing .deb's, does 
zlib-flushes every couple of blocks, so that rsync/apt-get can pick up 
partial .deb's as well.)

the zlib option is i think Z_PARTIAL_FLUSH, i'm using it in Tux to do 
chunks of compression. The flushing cost is max 12 bytes or so, so if 
it's done every 4K we maximize the cost to 0.2%.

so flushing is both rsync-friendly and recovery-friendly.

(recovery isn't as simple as with plaintext, as you have to find the next 
'block' and the block length will be inevitably variable. But it should 
be pretty predictable, and tools might even exist.)
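
A minimal sketch of the chunked flushing described above, following zlib's
standard compress-a-stream pattern but flushing with Z_PARTIAL_FLUSH after
every 4K of input (the chunk size and function name are just for
illustration; error handling is trimmed):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 4096

static int deflate_with_flushes(FILE *in, FILE *out)
{
	unsigned char ibuf[CHUNK], obuf[CHUNK * 2];
	z_stream s;
	int flush;

	memset(&s, 0, sizeof(s));
	if (deflateInit(&s, Z_DEFAULT_COMPRESSION) != Z_OK)
		return -1;

	do {
		s.avail_in = fread(ibuf, 1, CHUNK, in);
		s.next_in = ibuf;
		/* Flush the compressor state at every chunk boundary so a
		 * damaged or changed chunk doesn't ruin everything after it. */
		flush = feof(in) ? Z_FINISH : Z_PARTIAL_FLUSH;
		do {
			s.next_out = obuf;
			s.avail_out = sizeof(obuf);
			deflate(&s, flush);
			fwrite(obuf, 1, sizeof(obuf) - s.avail_out, out);
		} while (s.avail_out == 0);
	} while (flush != Z_FINISH);

	deflateEnd(&s);
	return 0;
}

The extra output per flush is only a dozen bytes or so, which is why the
overhead stays well under 1% at a 4K flush interval.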

Ingo


Re: Kernel SCM saga..

2005-04-10 Thread Junio C Hamano
>>>>> "DL" == David Lang <[EMAIL PROTECTED]> writes:

DL> just wanted to point out that recent news shows that sha1 isn't as
DL> good as it was thought to be (far easier to deliberately create
DL> collisions than it should be)

I suspect there is no need to do so...

  Message-ID: <[EMAIL PROTECTED]>
  From: Linus Torvalds <[EMAIL PROTECTED]>
  Subject: Re: Kernel SCM saga..
  Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT)

  ...

  Linus 

  (*) yeah, yeah, I know about the current theoretical case, and I don't
  care. Not only is it theoretical, the way my objects are packed you'd have
  to not just generate the same SHA1 for it, it would have to _also_ still
  be a valid zlib object _and_ get the header to match the "type + length"  
  of object part. IOW, the object validity checks are actually even stricter
  than just "sha1 matches".



Re: Kernel SCM saga..

2005-04-10 Thread Giuseppe Bilotta
On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:

> I've recently made some improvements
> recently which will reduce the memory use

Does this include check for redundancy? ;)

-- 
Giuseppe "Oblomov" Bilotta

Hic manebimus optime



Re: Kernel SCM saga..

2005-04-10 Thread David Lang
On Sat, 9 Apr 2005, Linus Torvalds wrote:
The biggest irritation I have with the "tree" format I chose is actually
not the name (which is trivial), it's the sha1 part. Almost everything
else keeps the sha1 in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a sha1 - hey, the
binary representation is certainly denser and equivalent - but because an
ASCII representation there would have allowed me to much more easily
change the key format if I ever wanted to. Now it's very SHA1-specific.
Which I guess is fine - I don't really see any reason to change, and if I
do change, I could always just re-generate the whole tree. But I think it
would have been cleaner to have _that_ part in ASCII.
just wanted to point out that recent news shows that sha1 isn't as good as 
it was thought to be (far easier to deliberately create collisions than it 
should be)

this hasn't reached a point where you HAVE to quit using it (especially 
since you have the other validity checks in place), but it's a good reason 
to expect that you may want to change to something else in a few years.

it's a lot easier to change things now to make that move easier than once 
this is being used extensively.
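
A small sketch of the representation point: with the key stored as ASCII
hex, switching to a longer digest later just means a longer hex string,
whereas a fixed 20-byte binary field bakes SHA1's size into the format.
(The function and digest below are illustrative only.)

#include <stdio.h>

/* Turn a raw digest of any length into an ASCII hex key. */
static void key_to_hex(const unsigned char *digest, size_t len, char *out)
{
	for (size_t i = 0; i < len; i++)
		sprintf(out + 2 * i, "%02x", digest[i]);
	out[2 * len] = '\0';
}

int main(void)
{
	unsigned char sha1[20] = { 0xde, 0xad, 0xbe, 0xef }; /* dummy digest */
	char hex[2 * sizeof(sha1) + 1];

	key_to_hex(sha1, sizeof(sha1), hex);	/* 40 hex chars for sha1 */
	printf("%s\n", hex);
	return 0;
}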

David Lang
--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


Re: Kernel SCM saga..

2005-04-09 Thread Albert Cahalan
Linus Torvalds writes:

> NOTE! I detest the centralized SCM model, but if push comes to shove,
> and we just _can't_ get a reasonable parallel merge thing going in
> the short timeframe (ie month or two), I'll use something like SVN
> on a trusted site with just a few committers, and at least try to
> distribute the merging out over a few people rather than making _me_
> be the throttle.
>
> The reason I don't really want to do that is once we start doing
> it that way, I suspect we'll have a _really_ hard time stopping.
> I think it's a broken model. So I'd much rather try to have some
> pain in the short run and get a better model running, but I just
> wanted to let people know that I'm pragmatic enough that I realize
> that we may not have much choice.

I think you at least instinctively know this, but...

Centralized SCM means you have to grant and revoke commit access,
which means that Linux gets the disease of ugly BSD politics.

Under both the old pre-BitKeeper patch system and under BitKeeper,
developer rank is fuzzy. Everyone knows that some developers are
more central than others, but it isn't fully public and well-defined.
You can change things day by day without having to demote anyone.
While Linux development isn't completely without jealousy and pride,
few have stormed off (mostly IDE developers AFAIK) and none have
forked things as severely as OpenBSD and DragonflyBSD.

You may rank developer X higher than developer Y, but they have
only a guess as to how things are. Perhaps developer X would be
a prideful jerk if he knew. Perhaps developer Y would quit in
resentment if he knew.

Whatever you do, please avoid the BSD-style politics.

(the MAINTAINERS file is bad enough; it has caused problems)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> Almost everything
> else keeps the <sha1> in the ASCII hexadecimal representation, and I
> should have done that here too. Why? Not because it's a <sha1> - hey, the 
> binary representation is certainly denser and equivalent

Since the size of  ASCII sha1's is only about 18% larger
than the size of the same number of binary sha1's , I
don't see you gain much from the binary.

I cast my non-existent vote for making the sha1 ascii - while you still can ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Chris wrote:
> How many is a lot?  Are we talking 100k, 1m, 10m?

I pulled some numbers out of my bk tree for Linux.

I have 16817 source files.

They average 12.2 bitkeeper changes per file (counting the number of
changes visible from doing 'bk sccslog' on each of the 16817 files). 

These 16817 files consume:

224 MBytes uncompressed and
 95 MBytes compressed

(using zlib's minigzip, on a 4 KB page reiserfs.)

Since each change will get its own copy of the file, multiplying these
two sizes (224 and 95) by 12.2 changes per file means the disk cost
would be:

2.73 GByte uncompressed, or
1.16 GBytes compressed.

I was pleasantly surprised at the degree of compression, shrinking files
to 42% of their original size.  I expected that, since the classic rule of
thumb here to archive before compressing wasn't being followed (nor
should it be) and we were compressing lots of little files, we would save
fewer disk blocks than this.

Of course, since as Linus reminds us, it's disk buffers in memory,
not blocks on disk, that are precious, it's more like we will save
224 - 95 == 129 MBytes of RAM to hold one entire tree.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: Kernel SCM saga..

2005-04-09 Thread Phillip Lougher
On Apr 10, 2005 2:42 AM, Petr Baudis <[EMAIL PROTECTED]> wrote:
> Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter
> where Phillip Lougher <[EMAIL PROTECTED]> told me that...
> > On Apr 9, 2005 3:53 AM, Petr Baudis <[EMAIL PROTECTED]> wrote:
> >
> > >   FWIW, I made few small fixes (to prevent some trivial usage errors to
> > > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> > > gitlog.sh - heavily inspired by what already went through the mailing
> > > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> > > (including .dircache, even though it isn't shown in the index), the
> > > cumulative patch can be found below. The scripts aim to provide some
> > > (obviously very interim) more high-level interface for git.
> >
> > I did a bit of playing about with the changelog generate script,
> > trying to produce a faster version.  The attached version uses a
> > couple of improvements to be a lot faster (e.g. no recursion in the
> > common case of one parent).
> >
> > FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
> > 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
> > hardware.  Your mileage may of course vary.
> 
> Wow, really impressive! Great work, I've merged it (if you don't object,
> of course).

Of course I don't object...

> 
> Wondering why I wasn't in the Cc list, BTW.

Weird, it wasn't intentional.  I read LKML in Gmail (which I don't use
for much else), and just clicked "reply", expecting to do the right
thing.  Replying to this email has also left you off the CC list. 
Looking at the email source I believe it's probably to do with the
following:

Mail-Followup-To: Linus Torvalds <[EMAIL PROTECTED]>,
[EMAIL PROTECTED],
Kernel Mailing List <[EMAIL PROTECTED]>

I've CC'd you explicitly on this.

Phillip
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: Kernel SCM saga..

2005-04-09 Thread Petr Baudis
Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter
where Phillip Lougher <[EMAIL PROTECTED]> told me that...
> On Apr 9, 2005 3:53 AM, Petr Baudis <[EMAIL PROTECTED]> wrote:
> 
> >   FWIW, I made few small fixes (to prevent some trivial usage errors to
> > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> > gitlog.sh - heavily inspired by what already went through the mailing
> > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> > (including .dircache, even though it isn't shown in the index), the
> > cumulative patch can be found below. The scripts aim to provide some
> > (obviously very interim) more high-level interface for git.
> 
> I did a bit of playing about with the changelog generate script,
> trying to produce a faster version.  The attached version uses a
> couple of improvements to be a lot faster (e.g. no recursion in the
> common case of one parent).
> 
> FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
> 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
> hardware.  Your mileage may of course vary.

Wow, really impressive! Great work, I've merged it (if you don't object,
of course).

Wondering why I wasn't in the Cc list, BTW.

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Kernel SCM saga..

2005-04-09 Thread Phillip Lougher
On Apr 9, 2005 3:53 AM, Petr Baudis <[EMAIL PROTECTED]> wrote:

>   FWIW, I made few small fixes (to prevent some trivial usage errors to
> cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> gitlog.sh - heavily inspired by what already went through the mailing
> list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> (including .dircache, even though it isn't shown in the index), the
> cumulative patch can be found below. The scripts aim to provide some
> (obviously very interim) more high-level interface for git.

I did a bit of playing about with the changelog generate script,
trying to produce a faster version.  The attached version uses a
couple of improvements to be a lot faster (e.g. no recursion in the
common case of one parent).

FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
hardware.  Your mileage may of course vary.

Regards

Phillip

--
#!/bin/bash
#
# Walk the commit history from a given commit (or from .dircache/HEAD)
# and print the commit objects, using $TMP to remember which commits
# have already been handled.  Uses bash arrays, and expects the git
# "cat-file" binary to be in the PATH.

changelog() {
	local parents new_parent
	declare -a new_parent

	new_parent[0]=$1
	parents=1

	while [ $parents -gt 0 ]; do
		parent=${new_parent[$((parents-1))]}
		echo $parent >> $TMP
		cat-file commit $parent > $TMP_FILE

		echo me $parent
		cat $TMP_FILE
		echo -e "\n--\n"

		# collect any not-yet-seen parents of this commit
		parents=0
		while read type text; do
			if [ $type = 'committer' ]; then
				break;
			elif [ $type = 'parent' ] &&
				! grep -q $text $TMP ; then
				new_parent[$parents]=$text
				parents=$((parents+1))
			fi
		done < $TMP_FILE

		# recurse into all but the last parent; the outer loop
		# continues with the remaining one, so the common
		# single-parent case needs no recursion at all
		i=0
		while [ $i -lt $((parents-1)) ]; do
			changelog ${new_parent[$i]}
			i=$((i+1))
		done
	done
}

TMP=`mktemp`
TMP_FILE=`mktemp`

base=$1
if [ ! "$base" ]; then
	base=$(cat .dircache/HEAD)
fi
changelog $base
rm -rf $TMP $TMP_FILE
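
A minimal usage sketch (the script name and output file are made up; it
assumes you run it from the top of a git-managed tree, next to the
.dircache directory, with the git cat-file binary in your PATH):

$ chmod +x chlog.sh
$ ./chlog.sh > ChangeLog                  # start from .dircache/HEAD
$ ./chlog.sh <commit-sha1> > ChangeLog    # or start from an explicit commit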
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
David wrote:
> recovery is more difficult when you corrupt some
> file in your repository.

Agreed.  I too have recovered RCS and SCCS files by hand editing.


Linus wrote:
> I don't want people editing repostitory files by hand.

Tyrant! ;)

From Wikipedia:

A tyrant is a usurper of rightful power.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Chris Wedgwood
On Sat, Apr 09, 2005 at 04:13:51PM -0700, Linus Torvalds wrote:

> > I understand the arguments for compression, but I hate it for one
> > simple reason: recovery is more difficult when you corrupt some
> > file in your repository.

I've had this too.  Magic binary blobs are horrible here for data loss
which is why I'm not keen on subversion.

> Trust me, the way git does things, you'll have so much redundancy
> that you'll have to really _work_ at losing data.

It's not clear to me that compression should be *required* though.
Shouldn't we be able to turn this off in some cases?

> The bad news is that this is obviously why it does eat a lot of
> disk.

Disk is cheap, but sadly page-cache is not :-(

> Since it saves full-file commits, you're going to have a lot of
> (compressed) full files around.

How many is a lot?  Are we talking 100k, 1m, 10m?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Tupshin Harper
Roman Zippel wrote:
> It seems you exported the complete parent information and this is exactly
> the "nitty-gritty" I was "whining" about and which is not available via
> bkcvs or bkweb and it's the most crucial information to make the bk data
> useful outside of bk. Larry was previously very clear about this that he
> considers this proprietary bk meta data and anyone attempting to export
> this information is in violation with the free bk licence, so you indeed
> just took the important parts and this is/was explicitly verboten for
> normal bk users.
 

Yes, this is exactly the information that would be necessary to create a 
general interop tool between bk and darcs|arch|monotone, and is the 
fundamental objection I and others have had to open source projects 
using BK. Is Bitmover willing to grant a special dispensation to allow a 
lossless conversion of the linux history to another format?

-Tupshin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Linus Torvalds


On Sat, 9 Apr 2005, David S. Miller wrote:
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.

Trust me, the way git does things, you'll have so much redundancy that 
you'll have to really _work_ at losing data.

That's the good news.

The bad news is that this is obviously why it does eat a lot of disk. 
Since it saves full-file commits, you're going to have a lot of 
(compressed) full files around.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread David S. Miller
On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Also, I don't want people editing repostitory files by hand. Sure, the 
> sha1 catches it, but still... I'd rather force the low-level ops to use 
> the proper helper routines. Which is why it's a raw zlib compressed blob, 
> not a gzipped file.

I understand the arguments for compression, but I hate it for one
simple reason: recovery is more difficult when you corrupt some
file in your repository.

It's happened to me more than once and I did lose data.

Without compression, I might be able to recover if something
causes a block of zeros to be written to the middle of some
repository file.  With compression, you pretty much just lose.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Florian Weimer
* David Lang:

>> Databases supporting replication are called high end. You forgot
>> the cats dance around the network this issue involves.
>
> And Postgres (which is Free in all senses of the word) is high end by this 
> definition.

I'm not aware of *any* DBMS, commercial or not, which can perform
meaningful multi-master replication on tables which mainly consist of
text files as records.  All you can get is single-master replication
(which is well-understood), or some rather scary stuff which involves
throwing away updates, or taking extrema or averages (even automatic
3-way merges aren't available).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Ray Lee
On Sat, 2005-04-09 at 19:40 +0200, Roman Zippel wrote:
> On Sat, 9 Apr 2005, Eric D. Mudama wrote:
> > > For example bk does something like this:
> > > 
> > > A1 -> A2 -> A3 -> BM
> > >   \-> B1 -> B2 --^
> > > 
> > > and instead of creating the merge changeset, one could merge them like
> > > this:
> > > 
> > > A1 -> A2 -> A3 -> B1 -> B2

> > I believe that flattening the change graph makes history reproduction
> > impossible, or alternately, you are imposing on each developer to test
> > the merge results at B1 + A1..3 before submission, but in doing so,
> > the test time may require additional test periods etc and with
> > sufficient velocity, might never close.
> 
> The merge result has to be tested either way, so I'm not exactly sure, 
> what you're trying to say.

The kernel changes. A lot. And often.

With that in mind, if (for example) A2 and A3 are simple changes that
are quick to test and B1 is large, or complex, or requires hours (days,
weeks) of testing to validate, then a maintainer's decision can
legitimately be to rebase a tree (say, -mm) upon the B1 line of
development, and toss the A2 branch back to those developers with a
"Sorry it didn't work out, something here causes Unhappiness with B1,
can you track down the problem and try again?"

Ray

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Marcin Dalecki
On 2005-04-09, at 17:42, Paul Jackson wrote:
> Marcin wrote:
>> But what will impress you are either the price tag the
>> DB comes with or
>> the hardware it runs on :-)
> The payroll for the staffing to care and feed for these
> babies is often impressive as well.
Please don't forget the bill from the electric plant behind it!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Re: Kernel SCM saga..

2005-04-09 Thread Petr Baudis
Dear diary, on Sat, Apr 09, 2005 at 09:08:59AM CEST, I got a letter
where "Randy.Dunlap" <[EMAIL PROTECTED]> told me that...
> On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote:
..snip..
> |   FWIW, I made few small fixes (to prevent some trivial usage errors to
> | cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> | gitlog.sh - heavily inspired by what already went through the mailing
> | list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> | (including .dircache, even though it isn't shown in the index), the
> | cumulative patch can be found below. The scripts aim to provide some
> | (obviously very interim) more high-level interface for git.
> | 
> |   I'm now working on tree-diff.c which will (surprise!) produce a diff
> | of two trees (I'll finish it after I get some sleep, though), and then I
> | will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
> | show-diff. At that point I might get my hand on some pull more kind to
> | local changes.
> 
> Hi,

  Hi,

> I'll look at your scripts this weekend.  I've also been
> working on some, but mine are a bit more experimental (cruder)
> than yours are.  Anyway, here they are (attached) -- also
> available at http://developer.osdl.org/rddunlap/git/
> 
> gitin : checkin/commit
> gitwhat sha1 : what is that sha1 file (type and contents if blob or commit)
> gitlist (blob, commit, tree, or all) :
>   list all objects with type (commit, tree, blob, or all)

  thanks - I had a look, but so far I borrowed only the prompt message
from your gitin. ;-) I'm not sure if gitwhat would be useful for me in
any way and gitlist doesn't appear too practical to me either.

  In the meantime, I've made some progress too. I made ls-tree, which
will just convert the tree object to a human readable (and script
processable) form, and wrapper gitls.sh, which will also try to guess
the tree ID. parent-id will just return the commit ID(s) of the previous
commit(s), practical if you want to diff against the previous commit
easily etc.  And finally, there is gitdiff.sh, which will produce a diff
of any two trees.

  Everything is again available at http://pasky.or.cz/~pasky/dev/git/
and again including .dircache, even though it's invisible in the index.
The cumulative patch (against 0.03) is there as well as below, generated
by the

./gitdiff.sh 0af20307bb4c634722af0f9203dac7b3222c4a4f

command. The empty entries are changed modes (664 vs 644), I will yet
have to think about how to denote them if the content didn't change;
or I might ignore them altogether...?

  You can obviously fetch any arbitrary change by doing the appropriate
gitdiff.sh call. You can find the ids in the ChangeLog, which was
generated by the plain

./gitlog.sh

command. (That is for HEAD. 0af20307bb4c634722af0f9203dac7b3222c4a4f is
the last commit on the Linus' branch, pass that to gitlog.sh to get his
ChangeLog. ;-)

  Next, I will probably do some bk-style pull tool. Or perhaps first
a gitpatch.sh which will verify the sha1s and do the mode changes.

  Linus, could you please have a look and tell me what you think
about it so far?

  Thanks,

Petr Baudis

Index: Makefile
===
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/Makefile (mode:100664 sha1:270cd4f8a8bf10cd513b489c4aaf76c14d4504a7)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/Makefile (mode:100644 sha1:185ff422e68984e68da011509dec116f05fc6f8d)
@@ -1,7 +1,7 @@
 CFLAGS=-g -O3 -Wall
 CC=gcc
 
-PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache
+PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache ls-tree
 
 all: $(PROG)
 
@@ -30,6 +30,9 @@
 cat-file: cat-file.o read-cache.o
$(CC) $(CFLAGS) -o cat-file cat-file.o read-cache.o $(LIBS)
 
+ls-tree: ls-tree.o read-cache.o
+   $(CC) $(CFLAGS) -o ls-tree ls-tree.o read-cache.o $(LIBS)
+
 fsck-cache: fsck-cache.o read-cache.o
$(CC) $(CFLAGS) -o fsck-cache fsck-cache.o read-cache.o $(LIBS)
 
Index: README
===
Index: cache.h
===
Index: cat-file.c
===
Index: commit-tree.c
===
Index: fsck-cache.c
===
Index: gitadd.sh
===
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitadd.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitadd.sh (mode:100755 sha1:d23be758c0c9fc1cf9756bcd3ee4d7266c60a2c9)
@@ -0,0 +1,13 @@
+#!/bin/sh
+#
+# Add new file to a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes a list of file names at the command line, and schedules them
+# for addition to 

Re: Kernel SCM saga..

2005-04-09 Thread Roman Zippel
Hi,

On Sat, 9 Apr 2005, Eric D. Mudama wrote:

> > For example bk does something like this:
> > 
> > A1 -> A2 -> A3 -> BM
> >   \-> B1 -> B2 --^
> > 
> > and instead of creating the merge changeset, one could merge them like
> > this:
> > 
> > A1 -> A2 -> A3 -> B1 -> B2
> > 
> > This results in a simpler repository, which is more scalable and which
> > is easier for users to work with (e.g. binary bug search).
> > The disadvantage would be it will cause more minor conflicts, when changes
> > are pulled back into the original tree, but which should be easily
> > resolvable most of the time.
> 
> The kicker comes that B1 was developed based on A1, so any test
> results were based on B1 being a single changeset delta away from A1. 
> If the resulting 'BM' fails testing, and you've converted into the
> linear model above where B2 has failed, you lose the ability to
> isolate B1's changes and where they came from, to revalidate the
> developer's results.

What good does it do if you can revalidate the original B1? The important 
point is that the end result works and if it only fails in the merged 
version you have a big problem. The serialized version gives you the 
chance to test whether it fails in B1 or B2.

> I believe that flattening the change graph makes history reproduction
> impossible, or alternately, you are imposing on each developer to test
> the merge results at B1 + A1..3 before submission, but in doing so,
> the test time may require additional test periods etc and with
> sufficient velocity, might never close.

The merge result has to be tested either way, so I'm not exactly sure, 
what you're trying to say.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
>  (b) while I depend on the fact that if the SHA of an object matches, the 
>  objects are the same, I generally try to avoid the reverse 
>  dependency.

It might be a valid point that you want to leave the door open to using
a different (than SHA1) digest.  (So this means you're going to store it
as an ASCII string, right?)

But I don't see how that applies here.  Any optimization that avoids
rereading old versions if the digests match will never trigger on the
day you change digests.  No problem here - you doomed to reread the old
version in any case.

Either you got your logic backwards, or I need another cup of coffee.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> In "git", you usually care about 
> the old contents too.

True - in your case, you probably want the old contents
so might as well dig them out as soon as it becomes
convenient to have them.

I was objecting to your claim that you _had_ to dig out
the old contents to determine if a file changed.

You don't _have_ to ... but I agree that it's a good
time to do so.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Roman Zippel
Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Yes.  Per-file history is expensive in git, because of the way it is 
> indexed. Things are indexed by tree and by changeset, and there are no 
> per-file indexes.
> 
> You could create per-file _caches_ (*) on top of git if you wanted to make
> it behave more like a real SCM, but yes, it's all definitely optimized for
> the things that _I_ tend to care about, which is the whole-repository
> operations.

Per file history is also expensive for another reason. The basic reason is 
that I think that a hash based storage is not the best approach for SCM. 
It's lacking locality, so the more it grows the more it has to seek to 
collect all the data.
To reduce the space usage you could replace the parent file with a sha1 
reference + delta to the new file. This is basically what monotone does 
and might cause performance problems if you need to restore old versions 
(e.g. if you want to annotate a file).

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> (you need to remember to escape '%' 
> too when you do that ;).

No - don't have to.  Not if I don't mind giving fools that embed
newlines in paths second class service.

In my case, if I create a file named "foo\nbar", then backup and restore
it, I end up with a restored file named "foo%0Abar".  If I had backed up
another file named "foo%0Abar", and now restore it, it collides, and
last one to be restored wins.  If I really need the "foo\nbar" file back
as originally named, I will have to dig it out by hand.

I dare say that Linux kernel source does not require first class support
for newlines embedded in pathnames.

> ASCII isn't magical.

No - but it's damn convenient.  A lot of tools work on line-oriented
ASCII that don't work elsewhere.

I guess Perl-hackers won't care much, but those working with either
classic shell script tools or Python will find line formatted ASCII more
convenient.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Eric D. Mudama
On Apr 8, 2005 4:52 PM, Roman Zippel <[EMAIL PROTECTED]> wrote:
> The problem is you pay a price for this. There must be a reason developers
> were adding another GB of memory just to run BK.
> Preserving the complete merge history does indeed make repeated merges
> simpler, but it builds up complex meta data, which has to be managed
> forever. I doubt that this is really an advantage in the long term. I
> expect that we were better off serializing changesets in the main
> repository. For example bk does something like this:
> 
> A1 -> A2 -> A3 -> BM
>   \-> B1 -> B2 --^
> 
> and instead of creating the merge changeset, one could merge them like
> this:
> 
> A1 -> A2 -> A3 -> B1 -> B2
> 
> This results in a simpler repository, which is more scalable and which
> is easier for users to work with (e.g. binary bug search).
> The disadvantage would be it will cause more minor conflicts, when changes
> are pulled back into the original tree, but which should be easily
> resolvable most of the time.

The kicker comes that B1 was developed based on A1, so any test
results were based on B1 being a single changeset delta away from A1. 
If the resulting 'BM' fails testing, and you've converted into the
linear model above where B2 has failed, you lose the ability to
isolate B1's changes and where they came from, to revalidate the
developer's results.

With bugs and fixes that can be validated in a few hours, this may not
be a problem, but when chasing a bug that takes days or weeks to
manifest, that a developer swears they fixed, one has to be able to
reproduce their exact test environment.

I believe that flattening the change graph makes history reproduction
impossible, or alternately, you are imposing on each developer to test
the merge results at B1 + A1..3 before submission, but in doing so,
the test time may require additional test periods etc and with
sufficient velocity, might never close.  This is the problem CVS has
if you don't create micro branches for every single modification.

--eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Roman Zippel
Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Also, I suspect that BKCVS actually bothers to get more details out of a
> BK tree than I cared about. People have pestered Larry about it, so BKCVS
> exports a lot of the nitty-gritty (per-file comments etc) that just
> doesn't actually _matter_, but people whine about. Me, I don't care. My
> sparse-conversion just took the important parts.

As soon as you want to synchronize and merge two trees, you will know why 
this information does matter.
(/me looks closer at the sparse-conversion...)
It seems you exported the complete parent information and this is exactly 
the "nitty-gritty" I was "whining" about and which is not available via 
bkcvs or bkweb and it's the most crucial information to make the bk data 
useful outside of bk. Larry was previously very clear about this that he 
considers this proprietary bk meta data and anyone attempting to export 
this information is in violation with the free bk licence, so you indeed 
just took the important parts and this is/was explicitly verboten for 
normal bk users.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Linus Torvalds


On Sat, 9 Apr 2005, Paul Jackson wrote:
>
> > in order to avoid having to worry about special characters
> > they are NUL-terminated)
> 
> Would this be a possible alternative - newline terminated (convert any
> newlines embedded in filenames to the 3 chars '%0A', and leave it as an
> exercise to the reader to de-convert them.)

Sure, you could obviously do escaping (you need to remember to escape '%' 
too when you do that ;).

However, whenever you do escaping, that means that you're already going to 
have to use a tool to unpack the dang thing. So you didn't actually win 
anything. I pretty much guarantee that my existing format is easier to 
unpack than your escaped format.

ASCII isn't magical.

This is "fsck_tree()", which walks the unpacked tree representation and 
checks that it looks sane and marks the sha1's it finds as being 
needed (so that you can do reachability analysis in a second pass). It's 
not exactly complicated:

static int fsck_tree(unsigned char *sha1, void *data, unsigned long size)
{
	while (size) {
		int len = 1+strlen(data);
		unsigned char *file_sha1 = data + len;
		char *path = strchr(data, ' ');
		if (size < len + 20 || !path)
			return -1;
		data += len + 20;
		size -= len + 20;
		mark_needs_sha1(sha1, "blob", file_sha1);
	}
	return 0;
}

and there's one HUGE advantage to _not_ having escaping: sorting and
comparing.

If you escape things, you now have to decide how you sort filenames. Do
you sort them by the escaped representation, or by the "raw"  
representation? Do you always have to escape or unescape the name in order 
to sort it.

So I like ASCII as much as the next guy, but it's not a religion. If there 
isn't any point to it, there isn't any point to it.

The biggest irritation I have with the "tree" format I chose is actually
not the name (which is trivial), it's the <sha1> part. Almost everything
else keeps the <sha1> in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a <sha1> - hey, the 
binary representation is certainly denser and equivalent - but because an 
ASCII representation there would have allowed me to much more easily 
change the key format if I ever wanted to. Now it's very SHA1-specific.

Which I guess is fine - I don't really see any reason to change, and if I 
do change, I could always just re-generate the whole tree. But I think it 
would have been cleaner to have _that_ part in ASCII.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread David Roundy
On Thu, Apr 07, 2005 at 12:30:18PM +0200, Matthias Andree wrote:
> On Thu, 07 Apr 2005, Sergei Organov wrote:
> > darcs? 
> 
> Close. Some things:
> 
> 1. It's rather slow and quite CPU consuming and certainly I/O consuming
>at times - I keep, to try it out, leafnode-2 in a DARCS repo, which
>has a mere 20,000 lines in 140 files, with 1,436 changes so far, on a
>RAID-1 with two 7200/min disk drives, with an Athlon XP 2500+ with
>512 MB RAM. The repo has 1,700 files in 11.5 MB, the source itself
>189 files in 1.8 MB.
> 
>Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)
> 
>The maintainer himself states that there's still optimization required.

Indeed, there's still a lot of optimization to be done.  I've recently made
some improvements which will reduce the memory use (and speed
things up) for a few of the worst-performing commands.  No improvement to
the initial record, but on the plus side, that's only done once.  But I was
able to cut down the memory used checking out a kernel repository to 500m.
(Which, sadly enough, is a major improvement.)

You would do much better if you recorded the initial state one directory at
a time, since it's the size of the largest changeset that determines the
memory use on checkout, but that's ugly.

> Getting DARCS up to the task would probably require some polishing, and
> should probably be discussed with the DARCS maintainer before making
> this decision.
> 
> Don't get me wrong, DARCS looks promising, but I'm not convinced it's
> ready for the linux kernel yet.

Indeed, I do believe that darcs has a way to go before it'll perform
acceptably on the kernel.  On the other hand, tar seems to perform
unacceptably slow on the kernel, so I'm not sure how slow is too slow.
Definitely input from interested kernel developers on which commands are
too slow would be welcome.
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> If you want to have spaces
>  and newlines in your pathname, go wild.

So long as there is only one pathname in a record, you don't need
nul-terminators to allow spaces in the name.  The rest of the record
is well known, so the pathname is just whatever is left after chomping
off the rest of the record.

It's only the support for embedded newlines that forces you to use
nul-terminators.

Not worth it - in my view.  Rather, do just enough hackery that
such a pathname doesn't break you, even if it means not giving
full service to such names.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Linus Torvalds


On Sat, 9 Apr 2005, Paul Jackson wrote:
> 
> I must be missing something here ...
> 
> If the stat shows a possible change, then you shouldn't have to open the
> original version to determine if it really changed - just compute the
> SHA1 of the new file, and see if that changed from the original SHA1.

Yes. However, I've got two reasons for this:

 (a) it may actually be cheaper to just unpack the compressed thing than
 it is to compute the sha, _especially_ since it's very likely that
 you have to do that anyway (ie if it turns out that they _are_
 different, you need the unpacked data to then look at the
 differences).

 So when you come from your backup angle, you only care about "has it 
 changed", and you'll do a backup. In "git", you usually care about 
 the old contents too.

 (b) while I depend on the fact that if the SHA of an object matches, the 
 objects are the same, I generally try to avoid the reverse 
 dependency. Why? Because if I end up changing the way I pack objects,
 and still want to work with old objects, I may end up in the 
 situation that two identical objects could get different object 
 names.

I don't actually know how valid a point "(b)" is, and I don't think it's 
likely, but imagine that SHA1 ends up being broken (*) and I decide that I 
want to pack new objects with a new-and-improved-SHA256 or something. Such 
a thing would obviously mean that you end up with lots of _duplicate_ data 
(any new data that is repackaged with the new name will now cause a new 
git object), but "duplicate" is better than "broken".

I don't actually guarantee that "git" could handle that right, but I've
been idly trying to avoid locking myself into the mindset that "file
equality has to mean name equality over the long run". So while the system 
right now works on the 1:1 "name" <-> "content" mapping, it's possible 
that it _could_ work with a more relaxed 1:n "content" -> "name" mapping.

But it's entirely possible that I'm being a git about this.

Linus 

(*) yeah, yeah, I know about the current theoretical case, and I don't
care. Not only is it theoretical, the way my objects are packed you'd have
to not just generate the same SHA1 for it, it would have to _also_ still
be a valid zlib object _and_ get the header to match the "type + length"  
of object part. IOW, the object validity checks are actually even stricter
than just "sha1 matches".
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
> in order to avoid having to worry about special characters
> they are NUL-terminated)

Would this be a possible alternative - newline terminated (convert any
newlines embedded in filenames to the 3 chars '%0A', and leave it as an
exercise to the reader to de-convert them.)

Line formatted ASCII files are really nice - worth pissing on embedded
newlines in paths to obtain.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Marcin wrote:
> But what will impress you are either the price tag the 
> DB comes with or
> the hardware it runs on :-)

The payroll for the staffing to care and feed for these
babies is often impressive as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> then git will open exactly _one_ 
> file (no searching, no messing around), which contains absolutely nothing 
> except for the compressed (and SHA1-signed) old contents of the file. It 
> obviously _has_ to do that, because in order to know whether you've 
> changed it, it needs to now compare it to the original.

I must be missing something here ...

If the stat shows a possible change, then you shouldn't have to open the
original version to determine if it really changed - just compute the
SHA1 of the new file, and see if that changed from the original SHA1.
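
A minimal sketch of that check from the shell, using sha1sum from
coreutils (the cached-hash file name here is made up, and note that
git's object names are not plain sha1sum output - the blob is hashed
together with a small header - so this only illustrates the
compare-by-digest idea):

$ old=$(cat .sha1-cache/Makefile)           # previously recorded digest (hypothetical)
$ new=$(sha1sum Makefile | cut -d' ' -f1)   # digest of the working copy
$ [ "$old" = "$new" ] && echo unchanged || echo changed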

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then") 

On filesystems that don't have a stable inode number, I use the md5sum
of the full (relative to mount point) pathname as the inode number. 

Since these same file systems (not surprisingly) lack hard links as
well, the pathname _is_ essentially the stable inode number.
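
(For illustration, a rough shell equivalent of that pseudo-inode, with a
made-up path relative to the mount point:

$ echo -n "DCIM/100_0042.JPG" | md5sum | cut -d' ' -f1

The backup program itself does this in Python, as the comment quoted
below explains.)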


Off-topic details ...

This is on my backup program, which does a full snapshot of my 90 Gb
system, including some FAT file systems, in 6 or 7 minutes, plus time
proportional to actual changes.  I have given up finding a backup
program I can tolerate, and write my own.  It stores each md5sum unique
blob exactly once, but uses the same sort of tricks you describe to
detect changes from examining just the stat information so as to avoid
reading every damn byte on the disk.  It works with smb, fat, vfat,
ntfs, reiserfs, xfs, ext2/3, ...  A single manifest file, in plain
ascii, one file per line, captures a full snapshot, disk-to-disk, every
few hours.

This comment from my backup source explains more:

# Unfortunately, fat, vfat, smb, and ncpfs (Netware) file systems
# do not have unique disk-based persistent inode numbers.
# The kernel constructs transient inode numbers for inodes
# in its cache.  But after an umount and re-mount, the inode
# numbers are all different.  So we would end up recalculating
# the md5sums of all files in any such file systems.
#
# To avoid this, we keep track of which directories are on such
# file systems, and for files in any such directory, instead
# of using the inode value from stat'ing a file, we use the
# md5sum of its path as a pseudo-inode number.  This digest of
# a file's path has improved persistance over it's transiently
# assigned inode number.  Fields 5,6,7 (files total, free and
# avail) happen to be zero on file systems (fat, vfat, smb,
# ...) with no real inodes, so we we use this fallback means
# of getting a persistent pseudo-inode if a statvfs() call on
# its directory has fields 5,6,7 summing to zero:
#   sum(os.statvfs(dir)[5:8]) == 0
# We include that dir in the fat_directories set in this case.

fat_directories = sets.Set()# set of directory paths on FAT file systems

# The Python statvfs() on Linux is a tad expensive - the
# glibc statvfs(2) code does several system calls, including
# scanning /proc/mounts and stat'ing its entries.  We need
# to know for each file whether it is on a "fat" file system
# (see above), but for efficiency we only statvfs at mount
# points, then propagate the file system type from there down.

mountpoints = [m.split()[1] for m in open("/proc/mounts")]



-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Samium Gromoff
It seems that Tom Lord, the primary architect behind GNU Arch
has recently published an open letter to Linus Torvalds.

Because no open letter to Linus would be really open without an
accompanying reference post on lkml, here it is:

http://lists.seyza.com/pipermail/gnu-arch-dev/2005-April/001001.html

---
cheers,
   Samium Gromoff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Samium Gromoff
Ok, this was literally screaming for a rebuttal! :-)

> Arch isn't a sound example of software design. Quite contrary to the
> random notes posted by it's author the following issues did strike me
> the time I did evaluate it:
(Note that here you take a stab at the Arch design fundamentals, but
actually fail to substantiate it later)

> The application (tla) claims to have "intuitive" command names. However
> I didn't see that as given. Most of them where difficult to remember
> and appeared to be just infantile. I stopped looking further after I
> saw:
[ UI issues snipped, not really core design ]

Yes, some people perceive that there _are_ UI issues in Arch.
However, as strange as it may sound, some don`t feel so.

> As an added bonus it relies on the applications named by accident
> patch and diff and installed on the host in question as well as few
> other as well to
> operate.

This is called modularity and code reuse.

And given that patch and diff are installed by default on all of the
relevant developer machines i fail to see as to why it is by any
measure a derogatory.

(and the rest you speak about is tar and gzip)

> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands
> and you should be still more productive then when using Arch.

Sure, you should`ve had come up with something more based than that! :-)

Now to the real design issues...

Globally unique, meaningful, symbolic revision names -- the core of the
Arch namespace.

"Stone simple" on-disk format to store things -- a hierarchy
of directories with textual files and tarballs.

No smart server -- any sftp, ftp, webdav (or just http for read-only access)
server is exactly up to the task.

O(0) branching -- a 

Re: Kernel SCM saga..

2005-04-09 Thread Neil Brown
On Saturday April 9, [EMAIL PROTECTED] wrote:
> On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote:
> > On Saturday April 9, [EMAIL PROTECTED] wrote:
> > > 
> > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> > > files each) and 1.3s once the trees are cached locally. This is without
> > > comparing file contents, just meta-data. And it takes 19.33s to compare
> > > the file's md5 sums once the trees are cached. I don't know if there are
> > > ways to avoid some NFS operations when everything is cached.
> > > 
> > > Anyway, the system does not seem much efficient on hard links, it caches
> > > the files twice :-(
> > 
> > I suspect you'll be wanting to add a "no_subtree_check" export option
> > on your NFS server...
> 
> Thanks a lot, Neil ! This is very valuable information. I didn't
> understand such implications from the exports(5) man page, but it
> makes a great difference. And the diff sped up from 5.7 to 3.9s
> and from 19.3 to 15.3s.

No, that implication had never really occurred to me before either.
But when you said "caches the file twice" it suddenly made sense.
With subtree_check, the NFS file handle contains information about the
directory, and NFS uses the filehandle as the primary key to tell if
two things are the same or not.

Trond keeps prodding me to make no_subtree_check the default.  Maybe it
is time that I actually did
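
For reference, the option goes in /etc/exports on the server; a
hypothetical line and re-export (the path and client spec are made up):

/export/src   192.168.0.0/24(ro,no_subtree_check)
$ exportfs -ra    # re-read /etc/exports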

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Jan Hudec
On Sat, Apr 09, 2005 at 03:01:29 +0200, Marcin Dalecki wrote:
> 
> On 2005-04-07, at 09:44, Jan Hudec wrote:
> >
> >I have looked at most systems currently available. I would suggest
> >following for closer look on:
> >
> >1) GNU Arch/Bazaar. They use the same archive format, simple, have the
> >   concepts right. It may need some scripts or add ons. When Bazaar-NG
> >   is ready, it will be able to read the GNU Arch/Bazaar archives so
> >   switching should be easy.
> 
> Arch isn't a sound example of software design. Quite contrary to the 

I actually _do_ agree with you. I like Arch, but it's user interface
certainly is broken and some parts of it would sure needs some redesign.

> random notes posted by it's author the following issues did strike me 
> the time I did evaluate it:
> 
> The application (tla) claims to have "intuitive" command names. However
> I didn't see that as given. Most of them where difficult to remember
> and appeared to be just infantile. I stopped looking further after I 
> saw:
> 
> tla my-id instead of: tla user-id or even tla set id ...
> 
> tla make-archive instead of tla init

In this case, tla init would be a lot *worse*, because there are two
different things to initialize -- the archive and the tree. But
init-archive would be a little better, for consistency.

> tla my-default-archive [EMAIL PROTECTED]

This one is kinda broken. Even in concept it is.

> No more "My Compuer" please...
> 
> Repository addressing requires you to use informally defined
> very elaborated and typing error prone conventions:
> 
> mkdir ~/{archives}

*NO*. Using this name is STRONGLY recommended *AGAINST*. Tom once used
it in the example or in some of his archives and people started doing it,
but it's a complete bogosity and it is not required anywhere.

> tla make-archive [EMAIL PROTECTED] 
> ~/{archives}/2005-VersionPatrol
> 
> You notice the requirement for two commands to accomplish a single task 
> already well denoted by the second command? There is more of the same
> at quite a few places when you try to use it. You notice the triple
> zero it didn't catch?

I sure do. But the folks writing Bazaar are gradually fixing these.
There are a lot of them and it's not that long since they started, so
they have not fixed all of them yet, but I think they eventually will.

> As an added bonus it relies on the applications named by accident
> patch and diff and installed on the host in question as well as few 
> other as well to
> operate.

No. The build process actually checks that the diff and patch
applications are actually the GNU Diff and GNU Patch in sufficiently
recent version. That was not always the case, but it does now.

> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of 
> apply commands
> and you should be still more productive then when using Arch.

I don't agree with you. Using Arch is more productive (eg. because it
does merges), but certainly one could do a lot better than Arch does.

---
Jan 'Bulb' Hudec <[EMAIL PROTECTED]>




Re: Kernel SCM saga..

2005-04-09 Thread Willy Tarreau
On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote:
> On Saturday April 9, [EMAIL PROTECTED] wrote:
> > 
> > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> > files each) and 1.3s once the trees are cached locally. This is without
> > comparing file contents, just meta-data. And it takes 19.33s to compare
> > the file's md5 sums once the trees are cached. I don't know if there are
> > ways to avoid some NFS operations when everything is cached.
> > 
> > Anyway, the system does not seem much efficient on hard links, it caches
> > the files twice :-(
> 
> I suspect you'll be wanting to add a "no_subtree_check" export option
> on your NFS server...

Thanks a lot, Neil ! This is very valuable information. I didn't
understand such implications from the exports(5) man page, but it
makes a great difference. And the diff sped up from 5.7 to 3.9s
and from 19.3 to 15.3s.

Cheers,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Neil Brown
On Saturday April 9, [EMAIL PROTECTED] wrote:
> 
> I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> files each) and 1.3s once the trees are cached locally. This is without
> comparing file contents, just meta-data. And it takes 19.33s to compare
> the file's md5 sums once the trees are cached. I don't know if there are
> ways to avoid some NFS operations when everything is cached.
> 
> Anyway, the system does not seem much efficient on hard links, it caches
> the files twice :-(

I suspect you'll be wanting to add a "no_subtree_check" export option
on your NFS server...

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Willy Tarreau
On Fri, Apr 08, 2005 at 11:56:09AM -0700, Chris Wedgwood wrote:
> On Fri, Apr 08, 2005 at 11:47:10AM -0700, Linus Torvalds wrote:
> 
> > Don't use NFS for development. It sucks for BK too.
> 
> Some times NFS is unavoidable.
> 
> In the best case (see previous email wrt to only stat'ing the parent
> directories when you can) for a current kernel though you can get away
> with 894 stats --- over NFS that would probably be tolerable.
> 
> After claiming such an optimization is probably not worth while I'm
> now thinking for network filesystems it might be.

I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
files each) and 1.3s once the trees are cached locally. This is without
comparing file contents, just meta-data. And it takes 19.33s to compare
the file's md5 sums once the trees are cached. I don't know if there are
ways to avoid some NFS operations when everything is cached.

Anyway, the system does not seem very efficient with hard links; it caches
the files twice :-(

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Willy Tarreau
On Fri, Apr 08, 2005 at 12:03:49PM -0700, Linus Torvalds wrote:
 
> And if you do actively malicious things in your own directory, you get
> what you deserve. It's actually _hard_ to try to fool git into believing a
> file hasn't changed: you need to not only replace it with the exact same
> file length and ctime/mtime, you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then") and keep the mode the same. Oh,
> and uid/gid, but that was much me being silly.

It would be even easier to touch the tree with a known date before
patching (e.g. 1/1/70). It would protect against any accidental date
change if for any reason your system time went backwards while
working on the tree.
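
(For what it's worth, a rough Python equivalent of that touch trick --
purely illustrative; the tree name and epoch date below are made up:)

    import os

    def touch_tree(root, when=0.0):
        """Set atime/mtime of every file under root to one fixed, known date."""
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                os.utime(os.path.join(dirpath, name), (when, when))

    # touch_tree("linux-2.4.29")   # every file now dated 1/1/70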

Another trick I use when I build the 2.4-hf patches is to build a
list of filenames from the patches. It works only because I want
to keep all original patches and no change should appear outside
those patches. Using this + cp -al + diff -pruN makes the process
very fast. It would not work if I had to rebuild those patches from
hand-edited files of course.

Last but not least, it only takes 0.26 seconds on my dual athlon
1800 to find date/size changes between 2.6.11{,.7} and 4.7s if the
tool includes the md5 sum in its checks :

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
  --ignore-dot --only-new --ignore-sum linux-2.6.11/. linux-2.6.11.7/. |wc -l
 47

real    0m0.255s
user    0m0.094s
sys     0m0.162s

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
  --ignore-dot --only-new linux-2.6.11/. linux-2.6.11.7/. |wc -l
 47

real    0m4.705s
user    0m3.398s
sys     0m1.310s

(This was with 'flx', a tool a friend developed for file-system integrity
checking which we also use to build our packages). Anyway, what I wanted
to show is that once the trees are cached, even somewhat heavy operations
such as checksumming can be done occasionally (such as md5 for double
checking) without you waiting too long. And I don't think that a database
would provide all the comfort of a standard file-system (cp -al, rsync,
choice of tools, etc...).
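
(To make the idea concrete -- this is not flx, just a small Python sketch
of the same approach: trust size/mtime first, and only checksum when
explicitly asked. The directory names are hypothetical, and files that
exist only in the old tree are not reported.)

    import hashlib, os

    def md5sum(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def changed_files(old_root, new_root, use_md5=False):
        """Yield paths (relative to new_root) that differ from old_root."""
        for dirpath, dirnames, filenames in os.walk(new_root):
            for name in filenames:
                new_path = os.path.join(dirpath, name)
                rel = os.path.relpath(new_path, new_root)
                old_path = os.path.join(old_root, rel)
                try:
                    o, n = os.stat(old_path), os.stat(new_path)
                except OSError:          # file only exists on one side
                    yield rel
                    continue
                if (o.st_size, int(o.st_mtime)) != (n.st_size, int(n.st_mtime)):
                    yield rel            # cheap check: size or mtime differ
                elif use_md5 and md5sum(old_path) != md5sum(new_path):
                    yield rel            # expensive check, only when requested

    # e.g. list(changed_files("linux-2.6.11", "linux-2.6.11.7", use_md5=True))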

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Randy.Dunlap
On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote:

|   Hello,
| 
| Dear diary, on Fri, Apr 08, 2005 at 05:50:21PM CEST, I got a letter
| where Linus Torvalds <[EMAIL PROTECTED]> told me that...
| > 
| > 
| > On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote:
| > > 
| > > Here's a partial solution.  It does depend on a modified version of
| > > cat-file that behaves like cat.  I found it easier to have cat-file
| > > just dump the object indicated on stdout.  Trivial patch for that is
| > > included.
| > 
| > Your trivial patch is trivially incorrect, though. First off, some files
| > may be binary (and definitely are - the "tree" type object contains
| > pathnames, and in order to avoid having to worry about special characters
| > they are NUL-terminated), and your modified "cat-file" breaks that.  
| > 
| > Secondly, it doesn't check or print the tag.
| 
|   FWIW, I made few small fixes (to prevent some trivial usage errors to
| cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
| gitlog.sh - heavily inspired by what already went through the mailing
| list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
| (including .dircache, even though it isn't shown in the index), the
| cumulative patch can be found below. The scripts aim to provide some
| (obviously very interim) more high-level interface for git.
| 
|   I'm now working on tree-diff.c which will (surprise!) produce a diff
| of two trees (I'll finish it after I get some sleep, though), and then I
| will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
| show-diff. At that point I might get my hand on some pull more kind to
| local changes.

Hi,

I'll look at your scripts this weekend.  I've also been
working on some, but mine are a bit more experimental (cruder)
than yours are.  Anyway, here they are (attached) -- also
available at http://developer.osdl.org/rddunlap/git/

gitin : checkin/commit
gitwhat sha1 : what is that sha1 file (type and contents if blob or commit)
gitlist (blob, commit, tree, or all) :
list all objects with type (commit, tree, blob, or all)

---
~Randy


gitin
Description: Binary data


gitlist
Description: Binary data


gitwhat
Description: Binary data


Re: Kernel SCM saga..

2005-04-09 Thread Neil Brown
On Saturday April 9, [EMAIL PROTECTED] wrote:
> On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote:
> > On Saturday April 9, [EMAIL PROTECTED] wrote:
> > > 
> > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> > > files each) and 1.3s once the trees are cached locally. This is without
> > > comparing file contents, just meta-data. And it takes 19.33s to compare
> > > the file's md5 sums once the trees are cached. I don't know if there are
> > > ways to avoid some NFS operations when everything is cached.
> > > 
> > > Anyway, the system does not seem very efficient with hard links; it caches
> > > the files twice :-(
> > 
> > I suspect you'll be wanting to add a "no_subtree_check" export option
> > on your NFS server...
> 
> Thanks a lot, Neil! This is very valuable information. I didn't
> understand such implications from the exports(5) man page, but it
> makes a great difference. And the diff sped up from 5.7s to 3.9s
> and from 19.3s to 15.3s.

No, that implication had never really occurred to me before either.
But when you said "caches the file twice" it suddenly made sense.
With subtree_check, the NFS file handle contains information about the
directory, and NFS uses the filehandle as the primary key to tell if
two things are the same or not.

Trond keeps prodding me to make no_subtree_check the default.  Maybe it
is time that I actually did

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Samium Gromoff
Ok, this was literally screaming for a rebuttal! :-)

> Arch isn't a sound example of software design. Quite contrary to the
> random notes posted by it's author the following issues did strike me
> the time I did evaluate it:

(Note that here you take a stab at the Arch design fundamentals, but
actually fail to substantiate it later)

> The application (tla) claims to have intuitive command names. However
> I didn't see that as given. Most of them where difficult to remember
> and appeared to be just infantile. I stopped looking further after I
> saw:

[ UI issues snipped, not really core design ]

Yes, some people perceive that there _are_ UI issues in Arch.
However, as strange as it may sound, some don`t feel so.

> As an added bonus it relies on the applications named by accident
> patch and diff and installed on the host in question as well as few
> other as well to
> operate.

This is called modularity and code reuse.

And given that patch and diff are installed by default on all of the
relevant developer machines i fail to see as to why it is by any
measure a derogatory.

(and the rest you speak about is tar and gzip)

> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands
> and you should be still more productive then when using Arch.

Sure, you should`ve had come up with something more based than that! :-)

Now to the real design issues...

Globally unique, meaningful, symbolic revision names -- the core of the
Arch namespace.

Stone simple on-disk format to store things -- a hierarchy
of directories with textual files and tarballs.

No smart server -- any sftp, ftp, webdav (or just http for read-only access)
server is exactly up to the task.

O(0) branching -- a branch is simply a tag, a continuation

Re: Kernel SCM saga..

2005-04-09 Thread Samium Gromoff
It seems that Tom Lord, the primary architect behind GNU Arch
has recently published an open letter to Linus Torvalds.

Because no open letter to Linus would be really open without an
accompanying reference post on lkml, here it is:

http://lists.seyza.com/pipermail/gnu-arch-dev/2005-April/001001.html

---
cheers,
   Samium Gromoff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then")

On filesystems that don't have a stable inode number, I use the md5sum
of the full (relative to mount point) pathname as the inode number. 

Since these same file systems (not surprisingly) lack hard links as
well, the pathname _is_ essentially the stable inode number.


Off-topic details ...

This is on my backup program, which does a full snapshot of my 90 Gb
system, including some FAT file systems, in 6 or 7 minutes, plus time
proportional to actual changes.  I have given up finding a backup
program I can tolerate, and write my own.  It stores each md5sum unique
blob exactly once, but uses the same sort of tricks you describe to
detect changes from examining just the stat information so as to avoid
reading every damn byte on the disk.  It works with smb, fat, vfat,
ntfs, reiserfs, xfs, ext2/3, ...  A single manifest file, in plain
ascii, one file per line, captures a full snapshot, disk-to-disk, every
few hours.

This comment from my backup source explains more:

# Unfortunately, fat, vfat, smb, and ncpfs (Netware) file systems
# do not have unique disk-based persistent inode numbers.
# The kernel constructs transient inode numbers for inodes
# in its cache.  But after an umount and re-mount, the inode
# numbers are all different.  So we would end up recalculating
# the md5sums of all files in any such file systems.
#
# To avoid this, we keep track of which directories are on such
# file systems, and for files in any such directory, instead
# of using the inode value from stat'ing a file, we use the
# md5sum of its path as a pseudo-inode number.  This digest of
# a file's path has improved persistance over it's transiently
# assigned inode number.  Fields 5,6,7 (files total, free and
# avail) happen to be zero on file systems (fat, vfat, smb,
# ...) with no real inodes, so we we use this fallback means
# of getting a persistent pseudo-inode if a statvfs() call on
# its directory has fields 5,6,7 summing to zero:
#   sum(os.statvfs(dir)[5:8]) == 0
# We include that dir in the fat_directories set in this case.

fat_directories = sets.Set()  # set of directory paths on FAT file systems

# The Python statvfs() on Linux is a tad expensive - the
# glibc statvfs(2) code does several system calls, including
# scanning /proc/mounts and stat'ing its entries.  We need
# to know for each file whether it is on a fat file system
# (see above), but for efficiency we only statvfs at mount
# points, then propagate the file system type from there down.

mountpoints = [m.split()[1] for m in open("/proc/mounts")]
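
(The same fallback, condensed into a few lines of Python -- not the actual
backup program; the statvfs heuristic is taken straight from the comment
above:)

    import hashlib, os

    def is_fat_like(directory):
        """Heuristic from above: FAT/VFAT/SMB report 0 total/free/avail inodes."""
        st = os.statvfs(directory)
        return (st.f_files + st.f_ffree + st.f_favail) == 0

    def pseudo_inode(path):
        """Use the real inode where it is stable, else the md5 of the path."""
        if is_fat_like(os.path.dirname(path) or "."):
            return hashlib.md5(path.encode()).hexdigest()
        return os.stat(path).st_ino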



-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> then git will open have exactly _one_
> file (no searching, no messing around), which contains absolutely nothing
> except for the compressed (and SHA1-signed) old contents of the file. It
> obviously _has_ to do that, because in order to know whether you've
> changed it, it needs to now compare it to the original.

I must be missing something here ...

If the stat shows a possible change, then you shouldn't have to open the
original version to determine if it really changed - just compute the
SHA1 of the new file, and see if that changed from the original SHA1.
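
(Paul's suggestion in a few lines of Python, purely illustrative -- where
the recorded digest comes from is left out:)

    import hashlib

    def sha1_of(path):
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def really_changed(path, recorded_sha1):
        """Called only after stat() already flagged the file as suspicious."""
        return sha1_of(path) != recorded_sha1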

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Marcin wrote:
> But what will impress you are either the price tag the
> DB comes with or
> the hardware it runs on :-)

The payroll for the staffing to care and feed for these
babies is often impressive as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
> in order to avoid having to worry about special characters
> they are NUL-terminated)

Would this be a possible alternative - newline terminated (convert any
newlines embedded in filenames to the 3 chars '%0A', and leave it as an
exercise to the reader to de-convert them.)

Line formatted ASCII files are really nice - worth pissing on embedded
newlines in paths to obtain.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Linus Torvalds


On Sat, 9 Apr 2005, Paul Jackson wrote:
> 
> I must be missing something here ...
> 
> If the stat shows a possible change, then you shouldn't have to open the
> original version to determine if it really changed - just compute the
> SHA1 of the new file, and see if that changed from the original SHA1.

Yes. However, I've got two reasons for this:

 (a) it may actually be cheaper to just unpack the compressed thing than
 it is to compute the sha, _especially_ since it's very likely that
 you have to do that anyway (ie if it turns out that they _are_
 different, you need the unpacked data to then look at the
 differences).

 So when you come from your backup angle, you only care about "has it 
 changed", and you'll do a backup. In git, you usually care about 
 the old contents too.

 (b) while I depend on the fact that if the SHA of an object matches, the 
 objects are the same, I generally try to avoid the reverse 
 dependency. Why? Because if I end up changing the way I pack objects,
 and still want to work with old objects, I may end up in the 
 situation that two identical objects could get different object 
 names.

I don't actually know how valid a point (b) is, and I don't think it's 
likely, but imagine that SHA1 ends up being broken (*) and I decide that I 
want to pack new objects with a new-and-improved-SHA256 or something. Such 
a thing would obviously mean that you end up with lots of _duplicate_ data 
(any new data that is repackaged with the new name will now cause a new 
git object), but duplicate is better than broken.

I don't actually guarantee that git could handle that right, but I've
been idly trying to avoid locking myself into the mindset that file
equality has to mean name equality over the long run. So while the system 
right now works on the 1:1 name - content mapping, it's possible 
that it _could_ work with a more relaxed 1:n content - name mapping.

But it's entirely possible that I'm being a git about this.
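
(A toy illustration of point (a), not git's actual object format: if the
old contents are kept zlib-compressed somewhere, unpacking them both
answers "did it change?" and hands you the old text for the diff you
usually want next. The file layout here is invented.)

    import difflib, zlib

    def compare_against_stored(stored_path, work_path):
        """Return None if unchanged, else a unified diff against the old contents."""
        old = zlib.decompress(open(stored_path, "rb").read())
        new = open(work_path, "rb").read()
        if old == new:
            return None
        return "".join(difflib.unified_diff(
            old.decode(errors="replace").splitlines(keepends=True),
            new.decode(errors="replace").splitlines(keepends=True),
            "old/" + work_path, "new/" + work_path))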

Linus 

(*) yeah, yeah, I know about the current theoretical case, and I don't
care. Not only is it theoretical, the way my objects are packed you'd have
to not just generate the same SHA1 for it, it would have to _also_ still
be a valid zlib object _and_ get the header to match the type + length  
of object part. IOW, the object validity checks are actually even stricter
than just sha1 matches.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Paul Jackson
Linus wrote:
> If you want to have spaces
> and newlines in your pathname, go wild.

So long as there is only one pathname in a record, you don't need
nul-terminators to be allow spaces in the name.  The rest of the record
is well known, so the pathname is just whatever is left after chomping
off the rest of the record.

It's only the support for embedded newlines that forces you to use
nul-terminators.

Not worth it - in my view.  Rather, do just enough hackery that
such a pathname doesn't break you, even if it means not giving
full service to such names.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread David Roundy
On Thu, Apr 07, 2005 at 12:30:18PM +0200, Matthias Andree wrote:
> On Thu, 07 Apr 2005, Sergei Organov wrote:
> > darcs? http://www.abridgegame.org/darcs/
> 
> Close. Some things:
> 
> 1. It's rather slow and quite CPU consuming and certainly I/O consuming
>    at times - I keep, to try it out, leafnode-2 in a DARCS repo, which
>    has a mere 20,000 lines in 140 files, with 1,436 changes so far, on a
>    RAID-1 with two 7200/min disk drives, with an Athlon XP 2500+ with
>    512 MB RAM. The repo has 1,700 files in 11.5 MB, the source itself
>    189 files in 1.8 MB.
> 
>    Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)
> 
> The maintainer himself states that there's still optimization required.

Indeed, there's still a lot of optimization to be done.  I've recently made
some improvements which will reduce the memory use (and speed
things up) for a few of the worst-performing commands.  No improvement to
the initial record, but on the plus side, that's only done once.  But I was
able to cut down the memory used checking out a kernel repository to 500m.
(Which, sadly enough, is a major improvement.)

You would do much better if you recorded the initial state one directory at
a time, since it's the size of the largest changeset that determines the
memory use on checkout, but that's ugly.

 Getting DARCS up to the task would probably require some polishing, and
 should probably be discussed with the DARCS maintainer before making
 this decision.
 
 Don't get me wrong, DARCS looks promising, but I'm not convinced it's
 ready for the linux kernel yet.

Indeed, I do believe that darcs has a way to go before it'll perform
acceptably on the kernel.  On the other hand, tar seems to perform
unacceptably slow on the kernel, so I'm not sure how slow is too slow.
Definitely input from interested kernel developers on which commands are
too slow would be welcome.
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Linus Torvalds


On Sat, 9 Apr 2005, Paul Jackson wrote:

> > in order to avoid having to worry about special characters
> > they are NUL-terminated)
> 
> Would this be a possible alternative - newline terminated (convert any
> newlines embedded in filenames to the 3 chars '%0A', and leave it as an
> exercise to the reader to de-convert them.)

Sure, you could obviously do escaping (you need to remember to escape '%' 
too when you do that ;).

However, whenever you do escaping, that means that you're already going to 
have to use a tool to unpack the dang thing. So you didn't actually win 
anything. I pretty much guarantee that my existing format is easier to 
unpack than your escaped format.

ASCII isn't magical.

This is fsck_tree(), which walks the unpacked tree representation and 
checks that it looks sane and marks the sha1's it finds as being 
needed (so that you can do reachability analysis in a second pass). It's 
not exactly complicated:

static int fsck_tree(unsigned char *sha1, void *data, unsigned long size)
{
	while (size) {
		int len = 1+strlen(data);
		unsigned char *file_sha1 = data + len;
		char *path = strchr(data, ' ');
		if (size < len + 20 || !path)
			return -1;
		data += len + 20;
		size -= len + 20;
		mark_needs_sha1(sha1, "blob", file_sha1);
	}
	return 0;
}

and there's one HUGE advantage to _not_ having escaping: sorting and
comparing.

If you escape things, you now have to decide how you sort filenames. Do
you sort them by the escaped representation, or by the raw  
representation? Do you always have to escape or unescape the name in order 
to sort it.
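
(Both headaches in concrete form, using a hypothetical escaping scheme --
newline becomes %0A and % becomes %25 -- which is not anything git
actually does:)

    def esc(name):
        return name.replace("%", "%25").replace("\n", "%0A")

    # Forgetting to escape '%' makes two distinct names collide:
    naive = lambda n: n.replace("\n", "%0A")
    assert naive("a\nb") == naive("a%0Ab")

    # Even with correct escaping, raw order and escaped order can disagree,
    # so you must pick one representation to sort by and stick with it:
    names = ["a\nb", "a b"]
    assert sorted(names) != sorted(names, key=esc)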

So I like ASCII as much as the next guy, but it's not a religion. If there 
isn't any point to it, there isn't any point to it.

The biggest irritation I have with the tree format I chose is actually
not the name (which is trivial), it's the sha1 part. Almost everything
else keeps the sha1 in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a sha1 - hey, the 
binary representation is certainly denser and equivalent - but because an 
ASCII representation there would have allowed me to much more easily 
change the key format if I ever wanted to. Now it's very SHA1-specific.

Which I guess is fine - I don't really see any reason to change, and if I 
do change, I could always just re-generate the whole tree. But I think it 
would have been cleaner to have _that_ part in ASCII.
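
(The density difference in numbers -- plain hashlib, nothing git-specific:)

    import hashlib

    h = hashlib.sha1(b"some file contents")
    print(len(h.digest()))      # 20 bytes: the raw binary form stored in a tree entry
    print(len(h.hexdigest()))   # 40 characters: the ASCII hex form used elsewhere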

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Roman Zippel
Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Also, I suspect that BKCVS actually bothers to get more details out of a
> BK tree than I cared about. People have pestered Larry about it, so BKCVS
> exports a lot of the nitty-gritty (per-file comments etc) that just
> doesn't actually _matter_, but people whine about. Me, I don't care. My
> sparse-conversion just took the important parts.

As soon as you want to synchronize and merge two trees, you will know why 
this information does matter.
(/me looks closer at the sparse-conversion...)
It seems you exported the complete parent information and this is exactly 
the nitty-gritty I was whining about and which is not available via 
bkcvs or bkweb and it's the most crucial information to make the bk data 
useful outside of bk. Larry was previously very clear about this that he 
considers this proprietary bk meta data and anyone attempting to export 
this information is in violation with the free bk licence, so you indeed 
just took the important parts and this is/was explicitly verboten for 
normal bk users.

bye, Roman
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel SCM saga..

2005-04-09 Thread Eric D. Mudama
On Apr 8, 2005 4:52 PM, Roman Zippel [EMAIL PROTECTED] wrote:
> The problem is you pay a price for this. There must be a reason developers
> were adding another GB of memory just to run BK.
> Preserving the complete merge history does indeed make repeated merges
> simpler, but it builds up complex meta data, which has to be managed
> forever. I doubt that this is really an advantage in the long term. I
> expect that we were better off serializing changesets in the main
> repository. For example bk does something like this:
> 
> A1 - A2 - A3 - BM
>   \- B1 - B2 --^
> 
> and instead of creating the merge changeset, one could merge them like
> this:
> 
> A1 - A2 - A3 - B1 - B2
> 
> This results in a simpler repository, which is more scalable and which
> is easier for users to work with (e.g. binary bug search).
> The disadvantage would be it will cause more minor conflicts, when changes
> are pulled back into the original tree, but which should be easily
> resolvable most of the time.

The kicker comes that B1 was developed based on A1, so any test
results were based on B1 being a single changeset delta away from A1. 
If the resulting 'BM' fails testing, and you've converted into the
linear model above where B2 has failed, you lose the ability to
isolate B1's changes and where they came from, to revalidate the
developer's results.
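
(Eric's point in miniature, with Roman's example history written as a
child-to-parents map; the data is invented and has nothing to do with BK's
real metadata:)

    merged = {                     # child -> parents, merge changeset kept
        "A2": ["A1"], "A3": ["A2"],
        "B1": ["A1"], "B2": ["B1"],
        "BM": ["A3", "B2"],
    }
    flattened = {                  # same changes serialized onto one line
        "A2": ["A1"], "A3": ["A2"],
        "B1": ["A3"], "B2": ["B1"],
    }

    print(merged["B1"])     # ['A1'] -> the exact tree B1 was written and tested against
    print(flattened["B1"])  # ['A3'] -> that baseline is gone; B1 now sits on A3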

With bugs and fixes that can be validated in a few hours, this may not
be a problem, but when chasing a bug that takes days or weeks to
manifest, that a developer swears they fixed, one has to be able to
reproduce their exact test environment.

I believe that flattening the change graph makes history reproduction
impossible, or alternately, you are imposing on each developer to test
the merge results at B1 + A1..3 before submission, but in doing so,
the test time may require additional test periods etc and with
sufficient velocity, might never close.  This is the problem CVS has
if you don't create micro branches for every single modification.

--eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

