Re: Unused #include statements

2015-01-19 Thread Zoltan Klinger
Robert, Peff and Junio.

Thank you all for your feedback. It's clear now what sort of analysis I should
aim towards.

Thanks,
Zoltan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unused #include statements

2015-01-15 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 One of our rules is that git-compat-util.h (or one of the well-known
 headers which includes, cache.h or builtin.h) is included first in any
 translation unit.

Perhaps by now we can spell it out a bit more explicitly and update
CodingGuidelines.  We have:

 - The first #include in C files, except in platform specific
   compat/ implementations, should be git-compat-util.h or another
   header file that includes it, such as cache.h or builtin.h.

such as might be making things unnecessarily vague; I do not think
a valid reason why we should say a .c file that includes advice.h
as the first thing satisfies this requirement, for example.

Because:

 - A command that interacts with the object store, config subsystem,
   the index, or the working tree cannot do anything without using
   what is declared in cache.h.

 - A built-in command must be declared in builtin.h, so anything
   in builtin/*.c must include it.

it may be reasonable to say the first *.h file included must be one
of git-compat-util.h, cache.h or builtin.h (and then we make sure
that compat-util is the first thing included in either of the latter
two).

While I very much agree with the principle Robert alluded to [*1*],
we may want to loosen that for .c files that include builtin.h or
cache.h for the sake of brevity.  For example, if you are builtin
(hence you start by #include builtin.h), it may be reasonable to
allow you to take whatever is in cache.h for granted [*2*].

So the rule might be:

 - The first #include in C files, except in platform specific
   compat/ implementations, must be either git-compat-util.h,
   cache.h or builtin.h.

 - A C file must directly include the header files that declare the
   functions and the types it uses, except for the functions and
   types that are made available to it by including one of the
   header files it must include by the previous rule.

Optionally, 

 - A C file must include only one of git-compat-util.h, cache.h
   or builtin.h; e.g. if you include builtin.h, do not include
   the other two, but it can consider what is availble in cache.h
   available to it.

Thoughts?  I am not looking forward to a torrent of patches whose
sole purpose is to make the existing C files conform to any such
rule, though.  Clean-up patches that trickle in at a low rate is
tolerable, but a torrent is too distracting.


[Footnote]

*1* For example, even though diff.h may include tree-walk.h for
its own use, a .c file that includes diff.h without including
tree-walk.h that uses update_tree_entry() or anything that is
declared in the latter is very iffy from semantic point of view.

*2* Because that facility is so widely used inside the codebase,
builtin.h includes strbuf.h, so in addition to what are in
cache.h, we may want to allow builtin implementations to take
strbufs for granted as well.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unused #include statements

2015-01-15 Thread Jeff King
On Thu, Jan 15, 2015 at 10:50:45AM -0800, Junio C Hamano wrote:

 So the rule might be:
 
  - The first #include in C files, except in platform specific
compat/ implementations, must be either git-compat-util.h,
cache.h or builtin.h.
 
  - A C file must directly include the header files that declare the
functions and the types it uses, except for the functions and
types that are made available to it by including one of the
header files it must include by the previous rule.

Yeah, that makes sense (and is what I took away from the existing rule
in CodingGuidelines, but I agree what is there is not very rigorous).

 Optionally, 
 
  - A C file must include only one of git-compat-util.h, cache.h
or builtin.h; e.g. if you include builtin.h, do not include
the other two, but it can consider what is availble in cache.h
available to it.
 
 Thoughts?  I am not looking forward to a torrent of patches whose
 sole purpose is to make the existing C files conform to any such
 rule, though.  Clean-up patches that trickle in at a low rate is
 tolerable, but a torrent is too distracting.

I don't think the optionally one above is that necessary. Not because
I don't agree with it, but because I do not know that we want to get
into the business of laying out every minute detail and implication.
The CodingGuidelines document is meant to be guidelines, and I do not
want to see arguments like well, the guidelines do not explicitly
_disallow_ this, so you must accept it or add something to the
guideline. That is a waste of everybody's time.

A general philosophy + good taste (from the submitter and the
maintainer) should ideally be enough. And hopefully would stop a torrent
of but this file doesn't conform to the letter of CodingGuidelines!.
Maybe it does not, but if there is no tangible benefit besides blindly
following some rules, it is not worth the precious time of developers.

Which isn't to say we shouldn't clarify the document when need be. But I
think what I quoted at the top already is probably a good improvement
over what is there.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unused #include statements

2015-01-15 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Jan 15, 2015 at 10:50:45AM -0800, Junio C Hamano wrote:
  ...
 Thoughts?  I am not looking forward to a torrent of patches whose
 sole purpose is to make the existing C files conform to any such
 rule, though.  Clean-up patches that trickle in at a low rate is
 tolerable, but a torrent is too distracting.

 I don't think the optionally one above is that necessary. Not because
 I don't agree with it, but because I do not know that we want to get
 into the business of laying out every minute detail and implication.
 The CodingGuidelines document is meant to be guidelines, and I do not
 want to see arguments like well, the guidelines do not explicitly
 _disallow_ this, so you must accept it or add something to the
 guideline. That is a waste of everybody's time.

Totally.  I know we do not want to get into that business.

 A general philosophy + good taste (from the submitter and the
 maintainer) should ideally be enough.

Yes, ideally ;-)

 Which isn't to say we shouldn't clarify the document when need be. But I
 think what I quoted at the top already is probably a good improvement
 over what is there.

OK, thanks.  Let's queue something like this for post 2.3 cycle,
then.

-- 8 --
Subject: CodingGuidelines: clarify C #include rules

Even though advice.h includes git-compat-util.h, it is not
sensible to have it as the first #include and indirectly satisify
the You must give git-compat-util.h a clean environment to set up
feature test macros before including any of the system headers are
included, which is the real requirement.

Because:

 - A command that interacts with the object store, config subsystem,
   the index, or the working tree cannot do anything without using
   what is declared in cache.h;

 - A built-in command must be declared in builtin.h, so anything
   in builtin/*.c must include it;

 - These two headers both include git-compat-util.h as the first
   thing; and

 - Almost all our *.c files (outside compat/ and borrowed files in
   xdiff/) need some Git-ness from cache.h to do something
   Git-ish.

let's explicitly specify that one of these three header files must
be the first thing that is included.

Any of our *.c file should include the header file that directly
declares what it uses, instead of relying on the fact that some *.h
file it includes happens to include another *.h file that declares
the necessary function or type.  Spell it out as another guideline
item.

Helped-by: Jeff King p...@peff.net
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/CodingGuidelines | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 894546d..578d07c 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -328,9 +328,14 @@ For C programs:
 
  - When you come up with an API, document it.
 
- - The first #include in C files, except in platform specific
-   compat/ implementations, should be git-compat-util.h or another
-   header file that includes it, such as cache.h or builtin.h.
+ - The first #include in C files, except in platform specific compat/
+   implementations, must be either git-compat-util.h, cache.h or
+   builtin.h.  You do not have to include more than one of these.
+
+ - A C file must directly include the header files that declare the
+   functions and the types it uses, except for the functions and types
+   that are made available to it by including one of the header files
+   it must include by the previous rule.
 
  - If you are planning a new command, consider writing it in shell
or perl first, so that changes in semantics can be easily
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unused #include statements

2015-01-15 Thread Jeff King
On Thu, Jan 15, 2015 at 03:20:09PM -0800, Junio C Hamano wrote:

 OK, thanks.  Let's queue something like this for post 2.3 cycle,
 then.
 
 -- 8 --
 Subject: CodingGuidelines: clarify C #include rules
 [...]

Thanks, this looks good to me.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unused #include statements

2015-01-14 Thread Robert Schiele
Hi Zoltan,

I can't make a statement for the git project but I consider this kind
of brute-force removal a very problematic approach for languages like
C and C++. The reason for that is simple: Often header files include
other header files since their content depends on those other header
files. Let's assume a header file a.h includes b.h. Now consider a
file c.c includes both a.h and b.h since they are actually using stuff
from both of them. Your brute-force approach would now remove b.h
since it is indirectly pulled in through a.h. While the removal would
no lead to technially incorrect code it would no longer reflect the
semantical situation (since c.c still uses stuff from b.h). Even worse
is that if you later modify c.c to no longer use stuff from a.h you
could no longer remove a.h since that way you would break the chain to
b.h.

Thus doing those kind of brute-force removals generally makes the
include structure in a project very fragile. The analysis itself you
did is still useful to identify header files that can potentially be
removed but removing them without further analysis I would consider
problematic.

Robert
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Unused #include statements

2015-01-14 Thread Zoltan Klinger
Hello there,

Since reading a post [1] about removing some unnecessary #include statements
from a git C source file I've been intrigued to see how many more might be
lurking in the code base.

After a bit of digging around, my brute force approach of 'remove as many
#includes as possible while making sure the code still successfully compiles'
has returned the following results:


$ git diff --stat
 alloc.c|  2 --
 archive-zip.c  |  1 -
 archive.c  |  1 -
 argv-array.c   |  1 -
 bisect.c   |  9 -
 block-sha1/sha1.c  |  2 --
 branch.c   |  1 -
 builtin/add.c  |  7 ---
 builtin/annotate.c |  1 -
 builtin/apply.c|  8 
 builtin/archive.c  |  3 ---
 builtin/bisect--helper.c   |  2 --
 builtin/blame.c| 13 -
 builtin/branch.c   |  8 
 builtin/bundle.c   |  1 -
 builtin/cat-file.c |  3 ---
 builtin/check-attr.c   |  2 --
 builtin/check-ignore.c |  2 --
 builtin/check-mailmap.c|  2 --
 builtin/check-ref-format.c |  2 --
 builtin/checkout-index.c   |  2 --
 builtin/checkout.c | 11 ---
 builtin/clean.c|  3 ---
 builtin/clone.c| 10 --
 builtin/column.c   |  3 ---
 builtin/commit-tree.c  |  4 
 builtin/commit.c   | 13 -
 builtin/config.c   |  1 -
 builtin/count-objects.c|  2 --
 builtin/credential.c   |  1 -
 builtin/describe.c |  5 -
 builtin/diff-files.c   |  4 
 builtin/diff-index.c   |  4 
 builtin/diff-tree.c|  4 
 builtin/diff.c |  6 --
 builtin/fast-export.c  |  8 
 builtin/fetch.c|  8 
 builtin/fmt-merge-msg.c|  6 --
 builtin/for-each-ref.c |  6 --
 builtin/fsck.c |  7 ---
 builtin/gc.c   |  3 ---
 builtin/get-tar-commit-id.c|  3 ---
 builtin/grep.c |  6 --
 builtin/hash-object.c  |  2 --
 builtin/help.c |  2 --
 builtin/index-pack.c   |  5 -
 builtin/init-db.c  |  1 -
 builtin/interpret-trailers.c   |  3 ---
 builtin/log.c  | 14 --
 builtin/ls-files.c |  3 ---
 builtin/ls-remote.c|  3 ---
 builtin/ls-tree.c  |  3 ---
 builtin/mailinfo.c |  2 --
 builtin/mailsplit.c|  3 ---
 builtin/merge-base.c   |  5 -
 builtin/merge-file.c   |  2 --
 builtin/merge-index.c  |  1 -
 builtin/merge-ours.c   |  1 -
 builtin/merge-recursive.c  |  3 ---
 builtin/merge-tree.c   |  2 --
 builtin/merge.c| 10 --
 builtin/mktree.c   |  2 --
 builtin/mv.c   |  4 
 builtin/name-rev.c |  3 ---
 builtin/notes.c|  4 
 builtin/pack-objects.c

Re: Unused #include statements

2015-01-14 Thread Jeff King
On Thu, Jan 15, 2015 at 05:14:39AM +0100, Robert Schiele wrote:

 Thus doing those kind of brute-force removals generally makes the
 include structure in a project very fragile. The analysis itself you
 did is still useful to identify header files that can potentially be
 removed but removing them without further analysis I would consider
 problematic.

I would second that. Besides leading to a potentially fragile result,
this analysis was done only for a particular platform with a particular
set of config knobs.

One of our rules is that git-compat-util.h (or one of the well-known
headers which includes, cache.h or builtin.h) is included first in any
translation unit. This gives git-compat-util the cleanest environment
possible for making decisions, and lets macros it defines effect the
rest of the code consistently. I suspect on modern platforms like
Linux/glibc that it is not a huge deal to include git-compat-util a
little late, simply because it does not have all that much to do. But
on Solaris 8? Who knows.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html