Re: GNU make to consider files checksum

2009-11-28 Thread Harald Dunkel

I would say that this thread went a little bit out of focus.
Maybe it is allowed to add my $.02?

Traditional Make does a check like

    if timestamp(file1) > timestamp(file2) then
        rebuild file2 from file1

The fundamental flaw in this is obvious: file1 can be replaced
anytime by another file with an older timestamp than file2. The
result is that file2 is not rebuilt, even though file1 has
changed.
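
The check and its failure mode can be sketched in C. This is only an illustration: stale_by_mtime is a hypothetical helper, not a function taken from GNU make, and the timestamps are plain time_t values.

```c
#include <time.h>

/* Hypothetical helper, not GNU make code: the classic rule
   "rebuild the target if the prerequisite is newer".  */
int
stale_by_mtime (time_t prereq_mtime, time_t target_mtime)
{
  return prereq_mtime > target_mtime;
}
```

Suppose file2 was built at time 200 and file1 is then replaced by a different version that carries timestamp 150. stale_by_mtime (150, 200) returns 0, so the rebuild is skipped even though the content of file1 changed.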

As a workaround you have to touch file1 on checkout, _and_ you
have to be sure that no builds are running. Both restrictions can
become a huge problem when working with automatic builds in a
large team with several parallel development branches. You get
more builds than necessary (which implies more tests to be run),
and everybody has to spend more time waiting.

Maybe you are used to these restrictions and don't consider them
a problem, or maybe you are simply not affected, but IMHO this is
_highly_ painful.

Of course you can always make it work somehow without touching
Make. I certainly do not want you to drop the traditional timestamp
check. But an optional checksum feature as suggested by Giuseppe
would be a reasonable extension.


It doesn't have to be perfect, it just needs to be better.


Regards

Harri


___
Bug-make mailing list
Bug-make@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-make


RE: GNU make to consider files checksum

2009-10-05 Thread lasse.makholm

Tim Murphy wrote:
> I think that checksumming might benefit some targets.  It would be
> nice to be able to implement different methods for different targets
> - because not all methods work well in all circumstances.
>
> I have one example where every single file in a huge build includes 1
> particular header file.  The file defines macros which are the
> features that are enabled or disabled in the build.
>
> We know which features are used by particular components so in theory
> we could work out not to rebuild components that are not influenced by
> what's happened to the header file.  e.g. we could switch on a feature
> or add a new feature without forcing a rebuild of the entire source
> base.

You can do that already today by simply splitting your global feature
header file into smaller pieces and letting targets depend on only
the relevant pieces rather than everything...

Of course that means you have to know which targets need which
pieces of the feature set to define the correct dependencies but
that's the price you pay for properly functioning incremental
rebuilds. You can't have your cake and eat it too...

This sort of dependency generation can usually be automated fairly
easily, but it is highly dependent on your software architecture,
build structure, programming language, etc...

For that reason it belongs in your makefiles and not in GNU make
itself... 

> This requires something like md5 but also some kind of filter to
> determine what kinds of changes are significant to the particular
> target that you are testing the dependency for

IMNSHO, this is not a problem that make can (or should even attempt
to) solve for you. This filter, as you call it, would have to know a lot
about the syntax of your header and code files, which makes it a bad
candidate for a core make feature.

This is the classic global.h problem of large software builds...

> You can emulate md5 checksum dependencies in make of course, using
> temporary marker files, but it's a bit ugly and complicated..

This problem is not strictly related to MD5 summing. With MD5 summing
instead of timestamps, your global header file would still change and
cause a full rebuild because this is what you explicitly asked for by saying
that all targets depend on it.

/Lasse




Re: GNU make to consider files checksum

2009-10-05 Thread Tim Murphy
Hi :-)

2009/10/5  lasse.makh...@nokia.com:

> Tim Murphy wrote:
>> I think that checksumming might benefit some targets.  It would be
>> nice to be able to implement different methods for different targets
>> - because not all methods work well in all circumstances.
>>
>> I have one example where every single file in a huge build includes 1
>> particular header file.  The file defines macros which are the
>> features that are enabled or disabled in the build.
>>
>> We know which features are used by particular components so in theory
>> we could work out not to rebuild components that are not influenced by
>> what's happened to the header file.  e.g. we could switch on a feature
>> or add a new feature without forcing a rebuild of the entire source
>> base.
>
> You can do that already today by simply splitting your global feature
> header file into smaller pieces and letting targets depend on only
> the relevant pieces rather than everything...

Yes, we have thought of that.  It's a good answer but it's hard to get
changes like that through into the absolutely enormous thing that it's
all used in.  That's just a human problem but it's more real than any
of the technical problems.  So it's not a solution we can use tomorrow
morning.

On the other hand, what if the header file is stdio.h?  You can't
really do anything about that.  What if the change is just a comment?

So I was searching for an example and that wasn't the greatest one.
What I really wanted to say was that for large files, an md5 checksum
is potentially a slow way to determine how out of date something is
but for small files it might be much more effective than a timestamp
and still be quick.

i.e. if you invent a new dependency mechanism then you need to be able
to balance where it is used so that it doesn't  end up making some
tasks worse.


>> This requires something like md5 but also some kind of filter to
>> determine what kinds of changes are significant to the particular
>> target that you are testing the dependency for
>
> IMNSHO, this is not a problem that make can (or should even attempt
> to) solve for you. This filter, as you call it, would have to know a lot
> about the syntax of your header and code files, which makes it a bad
> candidate for a core make feature.

The filter would have to be external (i.e. not part of make) but it
could be much faster if it was a loadable plugin.

>> You can emulate md5 checksum dependencies in make of course, using
>> temporary marker files, but it's a bit ugly and complicated..
>
> This problem is not strictly related to MD5 summing. With MD5 summing
> instead of timestamps, your global header file would still change and
> cause a full rebuild because this is what you explicitly asked for by saying
> that all targets depend on it.

One would md5 the filtered file (the result of the filter), not the
original.  One's filter would be on the features that affect the
current project.  So the filtered file would be unchanged even if you
changed the original and added new features as long as they weren't
ones that the current project (an exe,dll,lib,whatever) cared about.
This would mean that you need not rebuild any of the object files for
the current project.
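
As a rough sketch of that idea in C: checksum only the header lines that mention a feature the current project cares about. Everything here is an assumption made for illustration. filtered_checksum and its helpers are hypothetical names, an FNV-1a-style hash stands in for md5, and the filter is a naive substring match rather than anything that understands the header's syntax.

```c
#include <stddef.h>
#include <string.h>

/* FNV-1a-style hash, used here only as a small stand-in for md5.  */
static unsigned long
hash_update (unsigned long h, const char *p, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      h ^= (unsigned char) p[i];
      h *= 16777619UL;
    }
  return h;
}

/* Return 1 if the line [line, line + len) contains NAME.  */
static int
line_mentions (const char *line, size_t len, const char *name)
{
  size_t n = strlen (name);
  size_t i;
  if (n == 0 || n > len)
    return 0;
  for (i = 0; i + n <= len; i++)
    if (memcmp (line + i, name, n) == 0)
      return 1;
  return 0;
}

/* Hypothetical filter: hash only the lines of HEADER that mention one
   of the FEATURES this project depends on.  */
unsigned long
filtered_checksum (const char *header, const char *const *features,
                   size_t nfeatures)
{
  unsigned long h = 2166136261UL;
  const char *line = header;

  while (*line != '\0')
    {
      const char *end = strchr (line, '\n');
      size_t len = end ? (size_t) (end - line) : strlen (line);
      size_t k;

      for (k = 0; k < nfeatures; k++)
        if (line_mentions (line, len, features[k]))
          {
            h = hash_update (h, line, len);
            h = hash_update (h, "\n", 1);
            break;
          }
      line = end ? end + 1 : line + len;
    }
  return h;
}
```

With features = { "FEATURE_USB" }, adding or changing an unrelated FEATURE_BT line leaves the result unchanged, while changing the FEATURE_USB line alters it. The macro names are made up for the example.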

So this would require a special-feature-filter-just-for-me.  GNU Make
wouldn't provide it but it might provide a way to load it and thus
make it fast enough to be worth using in a lot of places.

Arranging this in make as it stands is complicated and messy because it
would involve creating temporary marker files that contained the md5
in their name.  This would lead to a mess of temporary files which it
would be hard to clean up precisely because by the time you want to
clean them you might not know what their real name is anymore.

We might do this one day without any help from make but I think it's
worthy for make to look beyond timestamps (actually beyond a lot of
stuff) and I am suggesting how.  I am constantly amazed by some of the
great features make has and I just think that a few more amazing
features wouldn't be a bad thing.

Regards,

Tim


-- 
You could help some brave and decent people to have access to
uncensored news by making a donation at:

http://www.thezimbabwean.co.uk/




Re: GNU make to consider files checksum

2009-09-29 Thread Giuseppe Scrivano
Philip Guenther <guent...@gmail.com> writes:

> (Have you measured how often this sort of thing would save
> recompilation and/or relinking and how much time it would save then?
> What's the comparison to how much time would be spent calculating the
> checksums?  If it saves a minute once every 100 compiles but costs a
> second in each of those, then it's a net loss...)

I don't have numbers but I think it can save a lot of time in the
linking phase, which is *really* slow.

Best,
Giuseppe





Re: GNU make to consider files checksum

2009-09-29 Thread Tim Murphy
Hi,

I think that checksumming might benefit some targets.  It would be
nice to be able to implement different methods for different targets
- because not all methods work well in all circumstances.

I have one example where every single file in a huge build includes 1
particular header file.  The file defines macros which are the
features that are enabled or disabled in the build.

We know which features are used by particular components so in theory
we could work out not to rebuild components that are not influenced by
what's happened to the header file.  e.g. we could switch on a feature
or add a new feature without forcing a rebuild of the entire source
base.

This requires something like md5 but also some kind of filter to
determine what kinds of changes are significant to the particular
target that you are testing the dependency for

You can emulate md5 checksum dependencies  in make of course, using
temporary marker files, but it's a bit ugly and complicated..


Regards,

Tim

2009/9/29 Giuseppe Scrivano <gscriv...@gnu.org>:
> Philip Guenther <guent...@gmail.com> writes:
>
>> (Have you measured how often this sort of thing would save
>> recompilation and/or relinking and how much time it would save then?
>> What's the comparison to how much time would be spent calculating the
>> checksums?  If it saves a minute once every 100 compiles but costs a
>> second in each of those, then it's a net loss...)
>
> I don't have numbers but I think it can save a lot of time in the
> linking phase, which is *really* slow.
>
> Best,
> Giuseppe





Re: GNU make to consider files checksum

2009-09-28 Thread Giuseppe Scrivano
Hi Philip,

it looks like a good idea.  Do you think it is worth discussing with
the automake hackers?


Cheers,
Giuseppe


Philip Guenther <guent...@gmail.com> writes:

> On Fri, Apr 11, 2008 at 2:45 PM, Giuseppe Scrivano <gscriv...@gnu.org> wrote:
>> I could find on this ML archives only a thread about this subject: to
>> consider the file checksum instead of the timestamp.
>> Other systems like scons already support this feature and it would be
>> great to have it for GNU Make too.
>
> This is a long dead thread (it's been sitting in my mailbox for a
> year, ouch), but I'll throw in my two cents that a makefile can
> implement this for itself with pattern rules.  Consider:
>
> %.o: %.c
> %.o.new: %.c
> 	$(COMPILE.c) -o $@ $<
> %.o: %.o.new
> 	@{ [ -f $@.md5 ] && md5sum -c --status $@.md5; } || \
> 	{ md5sum $< > $@.md5; cp $< $@; }
>
> .SECONDARY:
>
> Poof, if you touch a .c file without making changes that affect the
> compiler output, the executable will not be relinked.  Indeed, the
> presence of the .SECONDARY target means the only thing that will be
> rerun each time is the md5sum.
>
> Yes, this is non-trivial to use, but it's also completely flexible,
> letting you use whatever checksum comparison you want (need to strip
> comments or RCS/CVS tags from the file before checksumming it?  Sure!)
> and can be used Right Now.
>
> Anyway, we now return you to your originally scheduled mailing list.
>
>
> Philip Guenther




Re: GNU make to consider files checksum

2009-09-28 Thread Philip Guenther
On Mon, Sep 28, 2009 at 11:05 AM, Giuseppe Scrivano <gscriv...@gnu.org> wrote:
> it looks like a good idea.  Do you think it is worth discussing with
> the automake hackers?

I'm not actually convinced that this checksumming is a good idea,
mainly because I'm not convinced this is enough of a problem.  The
point of my message was just that this problem *can* be solved at the
makefile level.  Attacking it by changing automake sounds practical
and probably a faster way to a solution, though I would prefer it to
be optional even there.

(Have you measured how often this sort of thing would save
recompilation and/or relinking and how much time it would save then?
What's the comparison to how much time would be spent calculating the
checksums?  If it saves a minute once every 100 compiles but costs a
second in each of those, then it's a net loss...)
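
The arithmetic behind that example, written as a small C helper. net_saving_seconds is a hypothetical name and the parameters are purely illustrative; no measurements are implied.

```c
/* Net time saved over a run of builds, in seconds.  A positive result
   means checksumming pays off; a negative one means it costs time.  */
long
net_saving_seconds (long builds, long checksum_cost_per_build,
                    long builds_shortened, long saving_per_hit)
{
  return builds_shortened * saving_per_hit
         - builds * checksum_cost_per_build;
}
```

For the figures above, net_saving_seconds (100, 1, 1, 60) is 1 * 60 - 100 * 1 = -40, i.e. 40 seconds lost per 100 compiles.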

Philip Guenther




Re: GNU make to consider files checksum

2009-09-06 Thread Philip Guenther
On Fri, Apr 11, 2008 at 2:45 PM, Giuseppe Scrivano <gscriv...@gnu.org> wrote:
> I could find on this ML archives only a thread about this subject: to
> consider the file checksum instead of the timestamp.
> Other systems like scons already support this feature and it would be
> great to have it for GNU Make too.

This is a long dead thread (it's been sitting in my mailbox for a
year, ouch), but I'll throw in my two cents that a makefile can
implement this for itself with pattern rules.  Consider:


%.o: %.c
%.o.new: %.c
	$(COMPILE.c) -o $@ $<
%.o: %.o.new
	@{ [ -f $@.md5 ] && md5sum -c --status $@.md5; } || \
	{ md5sum $< > $@.md5; cp $< $@; }

.SECONDARY:

Poof, if you touch a .c file without making changes that affect the
compiler output, the executable will not be relinked.  Indeed, the
presence of the .SECONDARY target means the only thing that will be
rerun each time is the md5sum.

Yes, this is non-trivial to use, but it's also completely flexible,
letting you use whatever checksum comparison you want (need to strip
comments or RCS/CVS tags from the file before checksumming it?  Sure!)
and can be used Right Now.

Anyway, we now return you to your originally scheduled mailing list.


Philip Guenther




Re: GNU make to consider files checksum

2008-08-28 Thread Giuseppe Scrivano
Eli Zaretskii wrote:
> Thanks.  (I'm not the head maintainer, so please wait for Paul and
> others to respond.)

I sent a message to this mailing list some months ago but I still haven't
received an answer.  Doesn't GNU Make want to consider file checksums in
addition to mtime?

Giuseppe




Re: GNU make to consider files checksum

2008-08-28 Thread Paul Smith
On Thu, 2008-08-28 at 09:06 +0200, Giuseppe Scrivano wrote:
> I sent a message to this mailing list some months ago but I still
> haven't received an answer.  Doesn't GNU Make want to consider file
> checksums in addition to mtime?

There was a Google SOC project for GNU make which added user-definable
out of date criteria; these could be defined on a per-target basis and,
as per the name, were defined by the user, not hardcoded (as md5sum
would be).  For example, you can short-circuit an expensive md5sum check
by simply comparing the file sizes: most of the time they will be
different and if so you can skip md5sum altogether.
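
A minimal sketch of that short-circuit in C, working on in-memory buffers to keep it self-contained. None of this comes from the SoC project: struct saved_state, content_changed and slow_checksum are hypothetical names, and a trivial hash stands in for md5sum. The call counter exists only to make the skipped work visible.

```c
#include <stddef.h>

struct saved_state
{
  size_t size;              /* size recorded by the previous run */
  unsigned long checksum;   /* checksum recorded by the previous run */
};

/* Counts how often the expensive checksum was actually computed.  */
static unsigned long checksum_calls;

/* Trivial stand-in for md5sum.  */
static unsigned long
slow_checksum (const unsigned char *data, size_t len)
{
  unsigned long h = 0;
  size_t i;
  checksum_calls++;
  for (i = 0; i < len; i++)
    h = 31 * h + data[i];
  return h;
}

/* Return 1 if DATA differs from what SAVED describes, else 0.  */
int
content_changed (const struct saved_state *saved,
                 const unsigned char *data, size_t len)
{
  if (saved->size != len)
    return 1;               /* sizes differ: no checksum needed */
  return slow_checksum (data, len) != saved->checksum;
}
```

Only when the sizes match does the function fall through to the checksum, so a file that grew or shrank is reported as changed without being read.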

The major change this implies is that you must have a stateful make; a
make that stores state from previous invocations, then reads it the next
time.  Normal make is stateless; or at least it uses only the state
provided by the filesystem and not its own state.


The project was successful in that the changes were delivered; however,
the user interface implementation is, in my opinion, too baroque at the
moment.  Its use model confuses me, anyway.  This is not so much the
fault of the student as my fault: I simply did not have enough time to
be a good mentor for the project and provide enough direction.  I knew
this would be an issue (I didn't solicit anyone to do this work but
someone contacted me and really wanted to do it, and I wanted it done)
but I hoped I would find the time.  And, a lot of really good work was
done... it's just the presentation to the user that I think needs more
effort.

-- 
---
 Paul D. Smith [EMAIL PROTECTED]  Find some GNU make tips at:
 http://www.gnu.org  http://make.mad-scientist.us
 Please remain calm...I may be mad, but I am a professional. --Mad Scientist




GNU make to consider files checksum

2008-04-12 Thread Giuseppe Scrivano
Hello,

I could find on this ML archives only a thread about this subject: to
consider the file checksum instead of the timestamp.
Other systems like scons already support this feature and it would be
great to have it for GNU Make too.

I attached a patch against the current CVS to add --use-checksum to
GNU Make, it is just a proof-of-concept but it shows that adding this
feature can really boost a remake.

This way, simply touching a file will not cause it to be recompiled,
as you would expect.  But consider another case: you modify a comment
in the file test.c.  Using the standard make you will have to rebuild:

test.c -> test.o -> test

Using a checksum you will only rebuild:

test.c -> test.o

because the .o file is unchanged.

This scenario is the one that surprised me most, as it is very common
and can save a lot of time at link time.

The biggest problem is how to save the information.  In the patch the
checksum for file a is saved in the file a.checksum, but I don't think
this is a reasonable solution; hiding them in a subdirectory is probably
not such a bad idea.

Concurrent accesses are not a problem when using files; they will be
used in almost the same way as the timestamp information is used now.
In the worst case the hash will be different and the file will be
recompiled.

Besides using a better algorithm to compute a hash for the file (MD5 is
my first thought) and hopefully finding another way to store the data
(though I still think files are the best choice), do you have other
ideas or suggestions?

Regards,
Giuseppe

? checksum_patch.diff
Index: file.c
===
RCS file: /sources/make/make/file.c,v
retrieving revision 1.90
diff -u -r1.90 file.c
--- file.c	4 Nov 2007 21:54:01 -	1.90
+++ file.c	11 Apr 2008 21:20:54 -
@@ -1,6 +1,6 @@
-/* Target file management for GNU Make.
+/* Target file management fo GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -189,8 +189,80 @@
   f->last = new;
 }
 
+  new->last_checksum = read_checksum (new);
+
+
   return new;
 }
+
+
+/* Compute the checksum for the file.  */
+
+int
+compute_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char buffer [4096];
+  
+  f = fopen (new->name, "r");
+  if (f != NULL)
+{
+  size_t nbr;
+  int i;
+  do 
+{
+  nbr = fread (buffer, 4096, 1, f);
+  
+  for (i = 0; i < nbr; i++)
+checksum = 21 * checksum + 23 * buffer[i];
+  
+}
+  while (nbr);
+  fclose (f);
+}
+  return checksum;
+}
+
+int
+read_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+  
+  sprintf (checksum_file, "%s.checksum", new->name);
+  
+  f = fopen (checksum_file, "r");
+  if (f != NULL)
+{
+  fread (&checksum, 4, 1, f);
+  fclose (f);
+}
+  
+  
+  free (checksum_file);
+  return checksum;
+}
+
+void
+write_checksum(struct file *new)
+{
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+  
+  sprintf (checksum_file, "%s.checksum", new->name);
+  
+  f = fopen (checksum_file, "w");
+  if (f != NULL)
+{
+  fwrite (&new->checksum, 4, 1, f);
+  fclose (f);
+}
+
+  free (checksum_file);
+}
+
 
 /* Rehash FILE to NAME.  This is not as simple as resetting
the `hname' member, since it must be put in a new hash bucket,
Index: filedef.h
===
RCS file: /sources/make/make/filedef.h,v
retrieving revision 2.30
diff -u -r2.30 filedef.h
--- filedef.h	4 Jul 2007 19:35:18 -	2.30
+++ filedef.h	11 Apr 2008 21:20:54 -
@@ -1,6 +1,6 @@
 /* Definition of target file data structures for GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -94,6 +94,8 @@
pattern-specific variables.  */
 unsigned int considered:1;  /* equal to 'considered' if file has been
considered on current scan of goal chain */
+int checksum; /* Actual checksum of the file.  */
+int last_checksum; /* Last checksum registered on the file.  */
   };
 
 
@@ -103,6 +105,9 @@
 
 struct file *lookup_file (const char *name);
 struct file *enter_file (const char *name);
+int compute_checksum(struct file *new);
+int read_checksum(struct file *new);
+void write_checksum(struct file *new);
 struct dep *parse_prereqs (char *prereqs);
 void remove_intermediates (int sig);
 void snap_deps (void);
Index: main.c

Re: GNU make to consider files checksum

2008-04-12 Thread Eli Zaretskii
> From: Giuseppe Scrivano [EMAIL PROTECTED]
> Date: Fri, 11 Apr 2008 23:45:02 +0200
>
> Other systems like scons already support this feature and it would be
> great to have it for GNU Make too.
>
> I attached a patch against the current CVS to add --use-checksum to
> GNU Make, it is just a proof-of-concept but it shows that adding this
> feature can really boost a remake.

Thanks.  (I'm not the head maintainer, so please wait for Paul and
others to respond.)

> +int
> +compute_checksum(struct file *new)
> +{
> +  int checksum = 0;
> +  FILE *f;
> +  char buffer [4096];
> +  
> +  f = fopen (new->name, "r");

This needs to use "rb", not "r".

Also, what about directories?  They cannot be fopen'ed and fread, at
least not on all supported systems.
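
A related detail in the posted loop: fread (buffer, 4096, 1, f) returns the number of complete 4096-byte items read (0 or 1), not a byte count, so a short final block is never summed. The sketch below keeps the patch's toy formula but opens the file with "rb", reads with an item size of 1, and uses unsigned arithmetic to avoid signed overflow. checksum_update, checksum_stream and checksum_path are hypothetical names, not part of the patch.

```c
#include <stdio.h>

/* Fold LEN bytes into CHECKSUM using the same toy formula as the patch.  */
unsigned int
checksum_update (unsigned int checksum, const unsigned char *p, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    checksum = 21u * checksum + 23u * p[i];
  return checksum;
}

/* Checksum everything that can be read from F.  */
unsigned int
checksum_stream (FILE *f)
{
  unsigned int checksum = 0;
  unsigned char buffer[4096];
  size_t nbr;

  /* Item size 1, so fread returns the number of bytes actually read.  */
  while ((nbr = fread (buffer, 1, sizeof buffer, f)) > 0)
    checksum = checksum_update (checksum, buffer, nbr);
  return checksum;
}

/* Return 0 and store the checksum in *OUT, or -1 if PATH cannot be
   opened.  A directory may open successfully on some systems and then
   fail to read; this sketch does not distinguish that case.  */
int
checksum_path (const char *path, unsigned int *out)
{
  FILE *f = fopen (path, "rb");
  if (f == NULL)
    return -1;
  *out = checksum_stream (f);
  fclose (f);
  return 0;
}
```

A caller that needs to treat directories specially would have to check the file type before opening, which is left out here.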

