Re: GNU make to consider files checksum
I would say that this thread went a little bit out of focus. Maybe it is allowed to add my $.02?

Traditional Make does a check like

    if timestamp(file1) > timestamp(file2) then
        rebuild file2 from file1

The fundamental flaw in this is obvious: file1 can be replaced at any time by another file with an older timestamp than file2. The result is that file2 is not rebuilt, even though file1 has changed.

As a workaround you have to touch file1 on checkout, _and_ you have to be sure that no builds are running. Both restrictions can become a huge problem when working with automatic builds in a large team with several parallel development branches. You get more builds than necessary (which implies more tests to be run), and everybody has to spend more time waiting.

Maybe you are used to these restrictions and don't consider them a problem, or maybe you are simply not affected, but IMHO this is _highly_ painful.

Of course you can always make it work somehow without touching Make. I certainly do not want you to drop the traditional timestamp check, but an optional checksum feature as suggested by Giuseppe would be a reasonable extension. It doesn't have to be perfect, it just needs to be better.

Regards
Harri
___
Bug-make mailing list
Bug-make@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-make
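To make the flaw concrete, here is a minimal sketch in C of the two checks being contrasted. The struct prereq_info and both function names are hypothetical and exist only for illustration; nothing here is taken from GNU Make's sources.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical record of what a build tool knows about one prerequisite. */
struct prereq_info {
    time_t   mtime;          /* current modification time of the file */
    uint32_t checksum;       /* checksum of the current contents */
    uint32_t built_checksum; /* checksum recorded when the target was last built */
};

/* Traditional rule: rebuild only if the prerequisite is newer than the target. */
bool stale_by_mtime(const struct prereq_info *p, time_t target_mtime)
{
    return p->mtime > target_mtime;
}

/* Checksum rule: rebuild whenever the contents differ from what was built. */
bool stale_by_checksum(const struct prereq_info *p)
{
    return p->checksum != p->built_checksum;
}
```

A prerequisite swapped for a different file with an older timestamp (say, by a checkout) is missed by the first test and caught by the second, which is the case described above.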
RE: GNU make to consider files checksum
Tim Murphy wrote:
> I think that checksumming might benefit some targets. It would be nice to be able to implement different methods for different targets - because not all methods work well in all circumstances.
>
> I have one example where every single file in a huge build includes 1 particular header file. The file defines macros which are the features that are enabled or disabled in the build.
>
> We know which features are used by particular components so in theory we could work out not to rebuild components that are not influenced by what's happened to the header file. e.g. we could switch on a feature or add a new feature without forcing a rebuild of the entire source base.

You can do that already today by simply splitting your global feature header file into smaller pieces and letting targets depend on only the relevant pieces rather than everything...

Of course that means you have to know which targets need which pieces of the feature set to define the correct dependencies but that's the price you pay for properly functioning incremental rebuilds. You can't have your cake and eat it too...

This sort of dependency generation can usually be automated fairly easily though, but it's something that is highly dependent on your software architecture, build structure, programming language, etc... For that reason it belongs in your makefiles and not in GNU make itself...

> This requires something like md5 but also some kind of filter to determine what kinds of changes are significant to the particular target that you are testing the dependency for

IMNSHO, this is not a problem that make can (or should even attempt to) solve for you. This filter as you call it would have to know a lot about the syntax of your header and code files which makes it a bad candidate for a core make feature. This is the classic global.h problem of large software builds...

> You can emulate md5 checksum dependencies in make of course, using temporary marker files, but it's a bit ugly and complicated..

This problem is not strictly related to MD5 summing. With MD5 summing instead of timestamps, your global header file would still change and cause a full rebuild because this is what you explicitly asked for by saying that all targets depend on it.

/Lasse
Re: GNU make to consider files checksum
Hi :-)

2009/10/5 lasse.makh...@nokia.com:
> Tim Murphy wrote:
>> I think that checksumming might benefit some targets. It would be nice to be able to implement different methods for different targets - because not all methods work well in all circumstances.
>>
>> I have one example where every single file in a huge build includes 1 particular header file. The file defines macros which are the features that are enabled or disabled in the build.
>>
>> We know which features are used by particular components so in theory we could work out not to rebuild components that are not influenced by what's happened to the header file. e.g. we could switch on a feature or add a new feature without forcing a rebuild of the entire source base.
>
> You can do that already today by simply splitting your global feature header file into smaller pieces and letting targets depend on only the relevant pieces rather than everything...

Yes, we have thought of that. It's a good answer but it's hard to get changes like that through into the absolutely enormous thing that it's all used in. That's just a human problem but it's more real than any of the technical problems. So it's not a solution we can use tomorrow morning.

On the other hand, what if the header file is stdio.h? You can't really do anything about that. What if the change is just a comment?

So I was searching for an example and that wasn't the greatest one. What I really wanted to say was that for large files, an md5 checksum is potentially a slow way to determine how out of date something is, but for small files it might be much more effective than a timestamp and still be quick. i.e. if you invent a new dependency mechanism then you need to be able to balance where it is used so that it doesn't end up making some tasks worse.

>> This requires something like md5 but also some kind of filter to determine what kinds of changes are significant to the particular target that you are testing the dependency for
>
> IMNSHO, this is not a problem that make can (or should even attempt to) solve for you. This filter as you call it would have to know a lot about the syntax of your header and code files which makes it a bad candidate for a core make feature.

The filter would have to be external (i.e. not part of make) but it could be much faster if it was a loadable plugin.

>> You can emulate md5 checksum dependencies in make of course, using temporary marker files, but it's a bit ugly and complicated..
>
> This problem is not strictly related to MD5 summing. With MD5 summing instead of timestamps, your global header file would still change and cause a full rebuild because this is what you explicitly asked for by saying that all targets depend on it.

One would md5 the filtered file (the result of the filter), not the original. One's filter would be on the features that affect the current project. So the filtered file would be unchanged even if you changed the original and added new features, as long as they weren't ones that the current project (an exe, dll, lib, whatever) cared about. This would mean that you need not rebuild any of the object files for the current project.

So this would require a special-feature-filter-just-for-me. GNU Make wouldn't provide it but it might provide a way to load it and thus make it fast enough to be worth using in a lot of places.

Arranging this in make as it stands is complicated and messy because it would involve creating temporary marker files that contained the md5 in their name. This would lead to a mess of temporary files which would be hard to clean up, precisely because by the time you want to clean them you might not know what their real name is anymore.

We might do this one day without any help from make but I think it's worthwhile for make to look beyond timestamps (actually beyond a lot of stuff) and I am suggesting how.

I am constantly amazed by some of the great features make has and I just think that a few more amazing features wouldn't be a bad thing.

Regards,
Tim

-- 
You could help some brave and decent people to have access to uncensored news by making a donation at: http://www.thezimbabwean.co.uk/
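The "checksum the filtered file" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the filter is a naive substring match that keeps lines mentioning one of the project's feature names, FNV-1a stands in for MD5 to keep the example short, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a over a byte range, continuing from hash h (stand-in for MD5). */
static uint32_t fnv1a(uint32_t h, const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return 1 if the line (not NUL-terminated, length len) contains name. */
static int line_mentions(const char *line, size_t len, const char *name)
{
    size_t n = strlen(name);
    if (n == 0 || n > len)
        return 0;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(line + i, name, n) == 0)
            return 1;
    return 0;
}

/* Hash only the lines of text that mention one of the given feature names. */
uint32_t filtered_checksum(const char *text,
                           const char *const *features, size_t nfeatures)
{
    uint32_t h = 2166136261u; /* FNV-1a offset basis */
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        for (size_t f = 0; f < nfeatures; f++) {
            if (line_mentions(text, len, features[f])) {
                h = fnv1a(h, text, len);
                h = fnv1a(h, "\n", 1);
                break; /* hash each matching line once */
            }
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return h;
}
```

With a project that only cares about FEATURE_A, toggling FEATURE_B or adding FEATURE_C leaves the filtered checksum unchanged, while editing the FEATURE_A line changes it. A substring match would also treat FEATURE_AB as a hit, so a real filter would need to understand tokens, which is the syntax knowledge Lasse objects to putting in make itself.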
Re: GNU make to consider files checksum
Philip Guenther guent...@gmail.com writes:
> (Have you measured how often this sort of thing would save recompilation and/or relinking and how much time it would save then? What's the comparison to how much time would be spent calculating the checksums? If it saves a minute once every 100 compiles but costs a second in each of those, then it's a net loss...)

I don't have numbers but I think it can save a lot of time in the linking phase, which is *really* slow.

Best,
Giuseppe
Re: GNU make to consider files checksum
Hi,

I think that checksumming might benefit some targets. It would be nice to be able to implement different methods for different targets - because not all methods work well in all circumstances.

I have one example where every single file in a huge build includes 1 particular header file. The file defines macros which are the features that are enabled or disabled in the build.

We know which features are used by particular components so in theory we could work out not to rebuild components that are not influenced by what's happened to the header file. e.g. we could switch on a feature or add a new feature without forcing a rebuild of the entire source base.

This requires something like md5 but also some kind of filter to determine what kinds of changes are significant to the particular target that you are testing the dependency for.

You can emulate md5 checksum dependencies in make of course, using temporary marker files, but it's a bit ugly and complicated..

Regards,
Tim

2009/9/29 Giuseppe Scrivano gscriv...@gnu.org:
> Philip Guenther guent...@gmail.com writes:
>> (Have you measured how often this sort of thing would save recompilation and/or relinking and how much time it would save then? What's the comparison to how much time would be spent calculating the checksums? If it saves a minute once every 100 compiles but costs a second in each of those, then it's a net loss...)
>
> I don't have numbers but I think it can save a lot of time in the linking phase, which is *really* slow.
>
> Best,
> Giuseppe
Re: GNU make to consider files checksum
Hi Philip,

it looks like a good idea. Do you think it is worth discussing with the automake hackers?

Cheers,
Giuseppe

Philip Guenther guent...@gmail.com writes:
> On Fri, Apr 11, 2008 at 2:45 PM, Giuseppe Scrivano gscriv...@gnu.org wrote:
>> I could find in this ML's archives only one thread about this subject: considering the file checksum instead of the timestamp. Other systems like scons already support this feature and it would be great to have it for GNU Make too.
>
> This is a long dead thread (it's been sitting in my mailbox for a year, ouch), but I'll throw in my two cents that a makefile can implement this for itself with pattern rules. Consider:
>
> %.o: %.c
> %.o.new: %.c
> 	$(COMPILE.c) -o $@ $<
> %.o: %.o.new
> 	@{ [ -f $@.md5 ] && md5sum -c --status $@.md5; } || \
> 	  { md5sum $< > $@.md5; cp $< $@; }
> .SECONDARY:
>
> Poof, if you touch a .c file without making changes that affect the compiler output, the executable will not be relinked. Indeed, the presence of the .SECONDARY target means the only thing that will be rerun each time is the md5sum.
>
> Yes, this is non-trivial to use, but it's also completely flexible, letting you use whatever checksum comparison you want (need to strip comments or RCS/CVS tags from the file before checksumming it? Sure!) and can be used Right Now.
>
> Anyway, we now return you to your originally scheduled mailing list.
>
> Philip Guenther
Re: GNU make to consider files checksum
On Mon, Sep 28, 2009 at 11:05 AM, Giuseppe Scrivano gscriv...@gnu.org wrote:
> it looks like a good idea. Do you think it is worth discussing with the automake hackers?

I'm not actually convinced that this checksumming is a good idea, mainly because I'm not convinced this is enough of a problem. The point of my message was just that this problem *can* be solved at the makefile level. Attacking it by changing automake sounds practical and probably a faster way to a solution, though I would prefer it to be optional even there.

(Have you measured how often this sort of thing would save recompilation and/or relinking and how much time it would save then? What's the comparison to how much time would be spent calculating the checksums? If it saves a minute once every 100 compiles but costs a second in each of those, then it's a net loss...)

Philip Guenther
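The break-even arithmetic in that parenthesis can be written down as a small helper. The function and parameter names are hypothetical; the figures in the note below are the ones from the message (one avoided rebuild worth 60 seconds per 100 compiles, one second of checksumming per compile).

```c
/* Net seconds saved by checksumming over a series of builds.
   A positive result means the checksums pay for themselves. */
long net_seconds_saved(long builds,
                       long avoided_rebuilds,
                       long seconds_per_avoided_rebuild,
                       long checksum_seconds_per_build)
{
    return avoided_rebuilds * seconds_per_avoided_rebuild
         - builds * checksum_seconds_per_build;
}
```

With those figures, net_seconds_saved(100, 1, 60, 1) gives 60 - 100 = -40, the net loss described above. The balance tips only when rebuilds are avoided more often or are much more expensive than the checksums.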
Re: GNU make to consider files checksum
On Fri, Apr 11, 2008 at 2:45 PM, Giuseppe Scrivano gscriv...@gnu.org wrote:
> I could find in this ML's archives only one thread about this subject: considering the file checksum instead of the timestamp. Other systems like scons already support this feature and it would be great to have it for GNU Make too.

This is a long dead thread (it's been sitting in my mailbox for a year, ouch), but I'll throw in my two cents that a makefile can implement this for itself with pattern rules. Consider:

%.o: %.c
%.o.new: %.c
	$(COMPILE.c) -o $@ $<
%.o: %.o.new
	@{ [ -f $@.md5 ] && md5sum -c --status $@.md5; } || \
	  { md5sum $< > $@.md5; cp $< $@; }
.SECONDARY:

Poof, if you touch a .c file without making changes that affect the compiler output, the executable will not be relinked. Indeed, the presence of the .SECONDARY target means the only thing that will be rerun each time is the md5sum.

Yes, this is non-trivial to use, but it's also completely flexible, letting you use whatever checksum comparison you want (need to strip comments or RCS/CVS tags from the file before checksumming it? Sure!) and can be used Right Now.

Anyway, we now return you to your originally scheduled mailing list.

Philip Guenther
Re: GNU make to consider files checksum
Eli Zaretskii wrote:
> Thanks. (I'm not the head maintainer, so please wait for Paul and others to respond.)

I sent a message to this mailing list some months ago but I still haven't received an answer. Doesn't GNU Make want to consider file checksums in addition to mtime?

Giuseppe
Re: GNU make to consider files checksum
On Thu, 2008-08-28 at 09:06 +0200, Giuseppe Scrivano wrote:
> I sent a message to this mailing list some months ago but I still haven't received an answer. Doesn't GNU Make want to consider file checksums in addition to mtime?

There was a Google SOC project for GNU make which added user-definable out-of-date criteria; these could be defined on a per-target basis and, as per the name, were defined by the user, not hardcoded (as md5sum would be). For example, you can short-circuit an expensive md5sum check by simply comparing the file sizes: most of the time they will be different and if so you can skip md5sum altogether.

The major change this implies is that you must have a stateful make: a make that stores state from previous invocations, then reads it the next time. Normal make is stateless; or at least it uses only the state provided by the filesystem and not its own state.

The project was successful in that the changes were delivered; however, the user interface implementation is, in my opinion, too baroque at the moment. Its use model confuses me, anyway. This is not so much the fault of the student as my fault: I simply did not have enough time to be a good mentor for the project and provide enough direction. I knew this would be an issue (I didn't solicit anyone to do this work but someone contacted me and really wanted to do it, and I wanted it done) but I hoped I would find the time. And, a lot of really good work was done... it's just the presentation to the user that I think needs more effort.

-- 
Paul D. Smith [EMAIL PROTECTED]
Find some GNU make tips at: http://www.gnu.org http://make.mad-scientist.us
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist
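The size short-circuit and the stored state it depends on might look roughly like this in C. This is a sketch under assumptions, not the SOC project's interface: struct saved_state stands for whatever a stateful make would persist between runs, and the two-stage split (a cheap size test, then a checksum only when sizes match) is just one way to arrange it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file state saved by the previous invocation. */
struct saved_state {
    long long size;     /* file size in bytes at the last build */
    uint32_t  checksum; /* content checksum at the last build */
};

enum verdict {
    VERDICT_REBUILD,       /* known to be out of date, no checksum needed */
    VERDICT_NEED_CHECKSUM  /* sizes match, only a checksum can decide */
};

/* Stage 1: the cheap test. No saved state or a different size means rebuild. */
enum verdict size_check(const struct saved_state *saved, long long current_size)
{
    if (saved == NULL || saved->size != current_size)
        return VERDICT_REBUILD;
    return VERDICT_NEED_CHECKSUM;
}

/* Stage 2: only reached when sizes match; compares the expensive checksum. */
bool checksum_differs(const struct saved_state *saved, uint32_t current_checksum)
{
    return saved->checksum != current_checksum;
}
```

A caller would compute the checksum only when size_check returns VERDICT_NEED_CHECKSUM, so the expensive step is skipped whenever the sizes already differ.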
GNU make to consider files checksum
Hello,

I could find in this ML's archives only one thread about this subject: considering the file checksum instead of the timestamp. Other systems like scons already support this feature and it would be great to have it for GNU Make too.

I attached a patch against the current CVS to add --use-checksum to GNU Make; it is just a proof-of-concept but it shows that adding this feature can really boost a remake. With it, simply touching a file will not cause it to be recompiled, as is easy to imagine. But suppose, for example, that you modify a comment in the file test.c. Using the standard make you will have to rebuild:

    test.c -> test.o -> test

Using a checksum you will have only:

    test.c -> test.o

because the .o file is unchanged. This scenario is what surprised me most, as it is a very common one and can save a lot of time at link time.

The biggest problem is how to save the information. In the patch the checksum for file a is saved in the file a.checksum, but I don't think this is a reasonable solution; hiding them in a subdirectory is probably not such a bad idea. Concurrent accesses are not a problem when using files: they will be used in almost the same way as the timestamp information is used now; anyway, in the worst case the hash will be different and the file will be recompiled.

Besides using a better algorithm to compute a hash for the file (MD5 is my first thought) and hopefully finding another way to store the data (though I still think files are the best choice), do you have other ideas or suggestions?

Regards,
Giuseppe

? checksum_patch.diff
Index: file.c
===================================================================
RCS file: /sources/make/make/file.c,v
retrieving revision 1.90
diff -u -r1.90 file.c
--- file.c	4 Nov 2007 21:54:01 -0000	1.90
+++ file.c	11 Apr 2008 21:20:54 -0000
@@ -1,6 +1,6 @@
-/* Target file management for GNU Make.
+/* Target file management fo GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -189,8 +189,80 @@
       f->last = new;
     }
 
+  new->last_checksum = read_checksum (new);
+
+
   return new;
 }
+
+
+/* Compute the checksum for the file.  */
+
+int
+compute_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char buffer [4096];
+
+  f = fopen (new->name, "r");
+  if (f != NULL)
+    {
+      size_t nbr;
+      int i;
+      do
+        {
+          nbr = fread (buffer, 4096, 1, f);
+
+          for (i = 0; i < nbr; i++)
+            checksum = 21 * checksum + 23 * buffer[i];
+
+        }
+      while (nbr);
+      fclose (f);
+    }
+  return checksum;
+}
+
+int
+read_checksum(struct file *new)
+{
+  int checksum = 0;
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+
+  sprintf (checksum_file, "%s.checksum", new->name);
+
+  f = fopen (checksum_file, "r");
+  if (f != NULL)
+    {
+      fread (&checksum, 4, 1, f);
+      fclose (f);
+    }
+
+
+  free (checksum_file);
+  return checksum;
+}
+
+void
+write_checksum(struct file *new)
+{
+  FILE *f;
+  char * checksum_file = (char*) xmalloc (strlen (new->name) + 10);
+
+  sprintf (checksum_file, "%s.checksum", new->name);
+
+  f = fopen (checksum_file, "w");
+  if (f != NULL)
+    {
+      fwrite (&new->checksum, 4, 1, f);
+      fclose (f);
+    }
+
+  free (checksum_file);
+}
+
 
 /* Rehash FILE to NAME.  This is not as simple as resetting
    the `hname' member, since it must be put in a new hash bucket,
Index: filedef.h
===================================================================
RCS file: /sources/make/make/filedef.h,v
retrieving revision 2.30
diff -u -r2.30 filedef.h
--- filedef.h	4 Jul 2007 19:35:18 -0000	2.30
+++ filedef.h	11 Apr 2008 21:20:54 -0000
@@ -1,6 +1,6 @@
 /* Definition of target file data structures for GNU Make.
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software
+1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
 Foundation, Inc.
 This file is part of GNU Make.
 
@@ -94,6 +94,8 @@
                                    pattern-specific variables.  */
     unsigned int considered:1;  /* equal to 'considered' if file has been
                                    considered on current scan of goal chain */
+    int checksum;               /* Actual checksum of the file.  */
+    int last_checksum;          /* Last checksum registered on the file.  */
   };
 
 
@@ -103,6 +105,9 @@
 
 struct file *lookup_file (const char *name);
 struct file *enter_file (const char *name);
+int compute_checksum(struct file *new);
+int read_checksum(struct file *new);
+void write_checksum(struct file *new);
 struct dep *parse_prereqs (char *prereqs);
 void remove_intermediates (int sig);
 void snap_deps (void);
Index: main.c
Re: GNU make to consider files checksum
> From: Giuseppe Scrivano [EMAIL PROTECTED]
> Date: Fri, 11 Apr 2008 23:45:02 +0200
>
> Other systems like scons already support this feature and it would be great to have it for GNU Make too.
>
> I attached a patch against the current CVS to add --use-checksum to GNU Make; it is just a proof-of-concept but it shows that adding this feature can really boost a remake.

Thanks. (I'm not the head maintainer, so please wait for Paul and others to respond.)

> +int
> +compute_checksum(struct file *new)
> +{
> +  int checksum = 0;
> +  FILE *f;
> +  char buffer [4096];
> +
> +  f = fopen (new->name, "r");

This needs to use "rb", not "r". Also, what about directories? They cannot be fopen'ed and fread, at least not on all supported systems.
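Both review points can be illustrated with a standalone variant of the checksum routine. This is a sketch, not a revision of the patch: it assumes a POSIX system (stat and S_ISREG), takes a plain path instead of make's struct file, and the name checksum_regular_file is made up for the example. It keeps the patch's toy 21/23 recurrence but reads byte-wise, so the count returned by fread is the number of bytes actually read.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Store the checksum of a regular file in *out and return 0.
   Return -1 for directories, special files, or read errors. */
int checksum_regular_file(const char *path, uint32_t *out)
{
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1; /* not a regular file: do not try to fopen it */

    FILE *f = fopen(path, "rb"); /* binary mode: no newline translation */
    if (f == NULL)
        return -1;

    uint32_t sum = 0;
    unsigned char buf[4096];
    size_t n;
    /* Element size 1, so n is the number of bytes read in this chunk. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum = 21u * sum + 23u * buf[i];

    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;

    *out = sum;
    return 0;
}
```

Calling it on "." returns -1 instead of attempting to read a directory. On systems without S_ISREG the file-type test would need a platform-specific replacement.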