Re: git 2.0.0 PROFILE=BUILD check-phase problems with ./t5561-http-backend.sh; GIT_TEST_HTTPD=false problems with t5537-fetch-shallow.sh

2014-05-30 Thread Nix
On 30 May 2014, Jeff King told this:

 On Thu, May 29, 2014 at 11:44:37PM +0100, Nix wrote:

 I observe test failures with git 2.0.0 which are attributable to the
 change to run network tests by default. I'm lumping them both together
 into one report because I'm lazy and I've blown too much time on this
 already.

 Weird. I also see a strange failure on t5310 when building with
 PROFILE=BUILD. We get a segfault when reading jgit-produced bitmaps.

I don't see that, but this may be a compiler difference, or
environmental difference, or something.

 Tracking it down, we're getting inexplicably bogus data from an mmap'd
 file (!). Compiling without PROFILE=BUILD, the test passes fine (even
 with valgrind).

 If I instrument it like this:

 diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
 index f7f700e..8cafacf 100644
 --- a/ewah/ewah_io.c
 +++ b/ewah/ewah_io.c
 @@ -119,6 +119,7 @@ int ewah_read_mmap(struct ewah_bitmap *self, void *map, 
 size_t len)
   ptr += sizeof(uint32_t);
  
   self-buffer_size = self-alloc_size = get_be32(ptr);
 + warning(got buffer_size of %lu from %lu, self-buffer_size, 
 *(uint32_t *)ptr);
   ptr += sizeof(uint32_t);
  
   self-buffer = ewah_realloc(self-buffer,

 a regular test-run reads:

   warning: got buffer_size of 2 from 33554432
   warning: got buffer_size of 2 from 33554432
   warning: got buffer_size of 3 from 50331648
   warning: got buffer_size of 1 from 16777216
   warning: got buffer_size of 2 from 33554432

 and a PROFILE=BUILD one reads:

   warning: got buffer_size of 2 from 33554432
   warning: got buffer_size of 2 from 33554432
   warning: got buffer_size of 3 from 50331648
   warning: got buffer_size of 1 from 16777216
   warning: got buffer_size of 131072 from 512

 I'm willing to believe we're doing something weird that violates the C
 standard there, but I really can't see it.

It's doubly strange because you're not building with -flto at all, all
callers of ewah_read_mmap() are in a separate file, and there are no
conditionals or loops above the warning() locus. Maybe the error is in a
different file, and is causing a nonsense 'len' to get passed in?

But I think this *may* be a false alarm. *(uint32_t *)ptr on a uint8_t
(in your logging line) is an aliasing violation, and the nearly-
uninitialized '512' is pretty much what I'd expect GCC to do now and
then, given that. Just a guess.

(I'm surprised this doesn't cause a problem *always*, though.)

 It appears that the Apache daemon is writing to the log slowly enough
 that its log lines only get there after the testsuite has written its
 separator, so a bunch of log lines appear to be attached to the wrong
 test, and the comparison fails.

 Yeah, that looks to me like what is happening, too. If I put a 'sleep
 1' into log_div, it passes. I would think apache would write the log
 before serving the file, but perhaps not.

It does a straight write() via APR, which should work, I'd have thought.
This is on a tmpfs, not a network filesystem or anything.

(Aside: the flushing semantics are quite tangled, but shouldn't matter,
given that we're just doing a write(). Apache calls apr_file_flush() on
the logfile, but that's just flushing apr's analogue of stdio buffers:
it does nothing if the file is unbuffered -- and the error log file is
unbuffered. It does not call apr_file_sync(), but you shouldn't need a
sync() for one process to see another process's output.)

   And like you, I would expect
 gcov to make things slower, not faster. Could there be something in the
 environment that

I'm more inclined to blame the kernel or the compiler, but I *always*
blame the kernel or the compiler. :P

 I'm not sure what the best fix is. We could check the logfiles after
 each test instead of at the end, but that will just end up with the same
 race: we may check them before apache has written them.

Yeah, that seems less than useful.

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git 2.0.0 PROFILE=BUILD check-phase problems with ./t5561-http-backend.sh; GIT_TEST_HTTPD=false problems with t5537-fetch-shallow.sh

2014-05-29 Thread Nix
I observe test failures with git 2.0.0 which are attributable to the
change to run network tests by default. I'm lumping them both together
into one report because I'm lazy and I've blown too much time on this
already.

I've got Apache 2.2.24 on this box, and t5551-http-fetch-smart.sh fails
in a peculiar fashion during the PROFILE=BUILD stage:

--- exp 2014-05-29 22:42:50.221599297 +
+++ act 2014-05-29 22:42:50.231598452 +
@@ -12,10 +12,10 @@
 GET  /smart/repo.git/objects/info/http-alternates HTTP/1.1 200 -
 GET  /smart/repo.git/objects/41/57d6f47fc8b7cb455fbf6f18d6f47fed49a6a5 
HTTP/1.1 200
 GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.pack 
HTTP/1.1 200
-GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx 
HTTP/1.1 200

 ###  no git-daemon-export-ok
 ###
+GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx 
HTTP/1.1 200
 GET  /smart_noexport/repo.git/HEAD HTTP/1.1 404 -
 GET  /smart_noexport/repo.git/info/refs HTTP/1.1 404 -
 GET  /smart_noexport/repo.git/objects/info/packs HTTP/1.1 404 -
@@ -34,10 +34,10 @@
 GET  /smart_noexport/repo.git/objects/info/http-alternates HTTP/1.1 200 -
 GET  
/smart_noexport/repo.git/objects/41/57d6f47fc8b7cb455fbf6f18d6f47fed49a6a5 
HTTP/1.1 200
 GET  
/smart_noexport/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.pack
 HTTP/1.1 200
-GET  
/smart_noexport/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx
 HTTP/1.1 200

 ###  getanyfile true
 ###
+GET  
/smart_noexport/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx
 HTTP/1.1 200
 GET  /smart/repo.git/HEAD HTTP/1.1 200
 GET  /smart/repo.git/info/refs HTTP/1.1 200
 GET  /smart/repo.git/objects/info/packs HTTP/1.1 200
@@ -45,10 +45,10 @@
 GET  /smart/repo.git/objects/info/http-alternates HTTP/1.1 200 -
 GET  /smart/repo.git/objects/41/57d6f47fc8b7cb455fbf6f18d6f47fed49a6a5 
HTTP/1.1 200
 GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.pack 
HTTP/1.1 200
-GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx 
HTTP/1.1 200

 ###  getanyfile false
 ###
+GET  
/smart/repo.git/objects/pack/pack-2cb3186872e8768852fcb6b22cdf2182e12f7490.idx 
HTTP/1.1 200
 GET  /smart/repo.git/HEAD HTTP/1.1 403 -
 GET  /smart/repo.git/info/refs HTTP/1.1 403 -
 GET  /smart/repo.git/objects/info/packs HTTP/1.1 403 -
not ok 14 - server request log matches test results
#
#   sed -e 
#   s/^.* \//
#   s/\//
#   s/ [1-9][0-9]*\$//
#   s/^GET /GET  /
#act $HTTPD_ROOT_PATH/access.log 
#   test_cmp exp act
#

# failed 1 among 14 test(s)

It appears that the Apache daemon is writing to the log slowly enough
that its log lines only get there after the testsuite has written its
separator, so a bunch of log lines appear to be attached to the wrong
test, and the comparison fails.

Curiously, I can't make this happen in a conventional 'make check', even
though the only relevant components would seem to be bash and httpd: if
anything, you'd expect the gcovvery to slow down git and thus make it
*more* likely that any race between httpd log syncing and testsuite
framework output to the same logfile would be hit...

Attempting to work around this by building with GIT_TEST_HTTPD=false
doesn't work either:

*** t5537-fetch-shallow.sh ***
ok 1 - setup
ok 2 - setup shallow clone
ok 3 - clone from shallow clone
ok 4 - fetch from shallow clone
ok 5 - fetch --depth from shallow clone
ok 6 - fetch --unshallow from shallow clone
ok 7 - fetch something upstream has but hidden by clients shallow boundaries
ok 8 - fetch that requires changes in .git/shallow is filtered
ok 9 - fetch --update-shallow
error: Can't use skip_all after running some tests
Makefile:43: recipe for target 't5537-fetch-shallow.sh' failed
make[3]: *** [t5537-fetch-shallow.sh] Error 1

since this is trying to run the httpd halfway through the test, which
will never work if it's skipping it. Moving the httpd sourcing to the
top of the test isn't going to work either, because that would skip
*everything*, when we want to skip only the httpd bits. Maybe splitting
the httpd bits into a separate test is best here? I'm not sure.

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git 2.0.0 PROFILE=BUILD check-phase problems with ./t5561-http-backend.sh; GIT_TEST_HTTPD=false problems with t5537-fetch-shallow.sh

2014-05-29 Thread Jeff King
On Thu, May 29, 2014 at 11:44:37PM +0100, Nix wrote:

 I observe test failures with git 2.0.0 which are attributable to the
 change to run network tests by default. I'm lumping them both together
 into one report because I'm lazy and I've blown too much time on this
 already.

Weird. I also see a strange failure on t5310 when building with
PROFILE=BUILD. We get a segfault when reading jgit-produced bitmaps.
Tracking it down, we're getting inexplicably bogus data from an mmap'd
file (!). Compiling without PROFILE=BUILD, the test passes fine (even
with valgrind).

If I instrument it like this:

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index f7f700e..8cafacf 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -119,6 +119,7 @@ int ewah_read_mmap(struct ewah_bitmap *self, void *map, 
size_t len)
ptr += sizeof(uint32_t);
 
self-buffer_size = self-alloc_size = get_be32(ptr);
+   warning(got buffer_size of %lu from %lu, self-buffer_size, 
*(uint32_t *)ptr);
ptr += sizeof(uint32_t);
 
self-buffer = ewah_realloc(self-buffer,

a regular test-run reads:

  warning: got buffer_size of 2 from 33554432
  warning: got buffer_size of 2 from 33554432
  warning: got buffer_size of 3 from 50331648
  warning: got buffer_size of 1 from 16777216
  warning: got buffer_size of 2 from 33554432

and a PROFILE=BUILD one reads:

  warning: got buffer_size of 2 from 33554432
  warning: got buffer_size of 2 from 33554432
  warning: got buffer_size of 3 from 50331648
  warning: got buffer_size of 1 from 16777216
  warning: got buffer_size of 131072 from 512

I'm willing to believe we're doing something weird that violates the C
standard there, but I really can't see it.

Anyway, that's a side note to your problem...

 It appears that the Apache daemon is writing to the log slowly enough
 that its log lines only get there after the testsuite has written its
 separator, so a bunch of log lines appear to be attached to the wrong
 test, and the comparison fails.

Yeah, that looks to me like what is happening, too. If I put a 'sleep
1' into log_div, it passes. I would think apache would write the log
before serving the file, but perhaps not. And like you, I would expect
gcov to make things slower, not faster. Could there be something in the
environment that

I'm not sure what the best fix is. We could check the logfiles after
each test instead of at the end, but that will just end up with the same
race: we may check them before apache has written them.

 Attempting to work around this by building with GIT_TEST_HTTPD=false
 doesn't work either:
 
 *** t5537-fetch-shallow.sh ***
 ok 1 - setup
 ok 2 - setup shallow clone
 ok 3 - clone from shallow clone
 ok 4 - fetch from shallow clone
 ok 5 - fetch --depth from shallow clone
 ok 6 - fetch --unshallow from shallow clone
 ok 7 - fetch something upstream has but hidden by clients shallow boundaries
 ok 8 - fetch that requires changes in .git/shallow is filtered
 ok 9 - fetch --update-shallow
 error: Can't use skip_all after running some tests
 Makefile:43: recipe for target 't5537-fetch-shallow.sh' failed
 make[3]: *** [t5537-fetch-shallow.sh] Error 1

Hrm. This already came up, and we dealt with it in 0232852 (t5537: move
http tests out to t5539, 2014-02-13). Which is in v2.0.0.

But somehow the code is back in v2.0.0 Presumably this is the result of a
mis-merge. I'll send a patch.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html