Re: shelf-tests failure on macOS

2018-06-22 Thread Julian Foad
Julian Foad wrote on 2018-06-15:
> Bert Huijben wrote on 2018-06-14:
> > On the testsuite we explicitly disable the sleeps via the environment 
> > variable designed for this (except for a few specific tests). We 
> > compensate for this problem in quite a few places by making sure that we 
> > also change the number of characters in the file.
> 
> Thank you Bert! That was one of the remaining pieces of the puzzle. 
> After adding 'sleep for timestamps' in the unshelve code in r1833532, I 
> have again changed the tests to use files that are all different lengths 
> in r1833565, so the tests can work with the sleep disabled.
> 
> Then I fixed one more timestamp-related issue in r1833577: we can't 
> depend on timestamp-ordered output of 'svn shelves'.

One more follow-up:
http://svn.apache.org/r1834125 : Shelving: re-enable a test, following 
sleep-for-timestamps fix. 

- Julian


Re: shelf-tests failure on macOS

2018-06-15 Thread Julian Foad
Bert Huijben wrote on 2018-06-14:
> On the testsuite we explicitly disable the sleeps via the environment 
> variable designed for this (except for a few specific tests). We 
> compensate for this problem in quite a few places by making sure that we 
> also change the number of characters in the file.

Thank you Bert! That was one of the remaining pieces of the puzzle. After 
adding 'sleep for timestamps' in the unshelve code in r1833532, I have again 
changed the tests to use files that are all different lengths in r1833565, so 
the tests can work with the sleep disabled.

Then I fixed one more timestamp-related issue in r1833577: we can't depend on 
timestamp-ordered output of 'svn shelves'.

- Julian


> > -Original Message-
> > From: Julian Foad [mailto:julianf...@apache.org]
> > Sent: donderdag 14 juni 2018 19:35
> > To: Philip Martin 
> > Cc: dev@subversion.apache.org
> > Subject: Re: shelf-tests failure on macOS
> > 
> > Philip Martin wrote on 2018-06-14:
> > > Philip Martin  writes:
> > > > Julian Foad  writes:
> > > >> The most unusual thing about these failing tests is the data they
> > > >> write. Strings of 5 or 6 low-value bytes, like:
> > > >>
> > > >> '\0\1\2\3\4\5'
> > > >> '\5\4\3\2\1\0'
> > > >
> > > > Those are the same size.  Could it be a filesystem timestamp resolution
> > > > issue?  The file changes but the timestamp does not and some change is
> > > > not detected?
> > >
> > > An HFS+ filesystem has a coarse 1s timestamp resolution so that would
> > > explain why the problem shows up there but not systems with subsecond
> > > resolution.
> > 
> > Sounds like the correct explanation.
> > 
> > > Neither shelve or unshelve appear to use svn_sleep_for_timestamp().
> > 
> > Actually shelve already does, in the relevant code path, which is inside
> > svn_client_shelf_unapply() where it uses svn_client_revert4() which does.
> > 
> > Unshelve sleeps now, since r1833532. Hopefully that's the end of it.
> > 
> > Thank you very much for the insight, Philip!
> > 
> > - Julian
> 


-- 
- Julian


RE: shelf-tests failure on macOS

2018-06-14 Thread Bert Huijben
On the testsuite we explicitly disable the sleeps via the environment variable 
designed for this (except for a few specific tests). We compensate for this 
problem in quite a few places by making sure that we also change the number of 
characters in the file.

Bert
> -Original Message-
> From: Julian Foad [mailto:julianf...@apache.org]
> Sent: donderdag 14 juni 2018 19:35
> To: Philip Martin 
> Cc: dev@subversion.apache.org
> Subject: Re: shelf-tests failure on macOS
> 
> Philip Martin wrote on 2018-06-14:
> > Philip Martin  writes:
> > > Julian Foad  writes:
> > >> The most unusual thing about these failing tests is the data they
> > >> write. Strings of 5 or 6 low-value bytes, like:
> > >>
> > >> '\0\1\2\3\4\5'
> > >> '\5\4\3\2\1\0'
> > >
> > > Those are the same size.  Could it be a filesystem timestamp resolution
> > > issue?  The file changes but the timestamp does not and some change is
> > > not detected?
> >
> > An HFS+ filesystem has a coarse 1s timestamp resolution so that would
> > explain why the problem shows up there but not systems with subsecond
> > resolution.
> 
> Sounds like the correct explanation.
> 
> > Neither shelve or unshelve appear to use svn_sleep_for_timestamp().
> 
> Actually shelve already does, in the relevant code path, which is inside
> svn_client_shelf_unapply() where it uses svn_client_revert4() which does.
> 
> Unshelve sleeps now, since r1833532. Hopefully that's the end of it.
> 
> Thank you very much for the insight, Philip!
> 
> - Julian



Re: shelf-tests failure on macOS

2018-06-14 Thread Julian Foad
Philip Martin wrote on 2018-06-14:
> Philip Martin  writes:
> > Julian Foad  writes:
> >> The most unusual thing about these failing tests is the data they
> >> write. Strings of 5 or 6 low-value bytes, like:
> >>
> >> '\0\1\2\3\4\5'
> >> '\5\4\3\2\1\0'
> >
> > Those are the same size.  Could it be a filesystem timestamp resolution
> > issue?  The file changes but the timestamp does not and some change is
> > not detected?
> 
> An HFS+ filesystem has a coarse 1s timestamp resolution so that would
> explain why the problem shows up there but not systems with subsecond
> resolution.

Sounds like the correct explanation.

> Neither shelve or unshelve appear to use svn_sleep_for_timestamp().

Actually shelve already does, in the relevant code path, which is inside 
svn_client_shelf_unapply() where it uses svn_client_revert4() which does.

Unshelve sleeps now, since r1833532. Hopefully that's the end of it.

Thank you very much for the insight, Philip!

- Julian


Re: shelf-tests failure on macOS

2018-06-14 Thread Branko Čibej
On 14.06.2018 18:09, Philip Martin wrote:
> Philip Martin  writes:
>
>> Julian Foad  writes:
>>
>>> The most unusual thing about these failing tests is the data they
>>> write. Strings of 5 or 6 low-value bytes, like:
>>>
>>> '\0\1\2\3\4\5'
>>> '\5\4\3\2\1\0'
>> Those are the same size.  Could it be a filesystem timestamp resolution
>> issue?  The file changes but the timestamp does not and some change is
>> not detected?
> An HFS+ filesystem has a coarse 1s timestamp resolution so that would
> explain why the problem shows up there but not systems with subsecond
> resolution.
>
> Neither shelve or unshelve appear to use svn_sleep_for_timestamp().

Oh, bravo! This could indeed be the problem.

-- Brane



Re: shelf-tests failure on macOS

2018-06-14 Thread Philip Martin
Philip Martin  writes:

> Julian Foad  writes:
>
>> The most unusual thing about these failing tests is the data they
>> write. Strings of 5 or 6 low-value bytes, like:
>>
>> '\0\1\2\3\4\5'
>> '\5\4\3\2\1\0'
>
> Those are the same size.  Could it be a filesystem timestamp resolution
> issue?  The file changes but the timestamp does not and some change is
> not detected?

An HFS+ filesystem has a coarse 1s timestamp resolution so that would
explain why the problem shows up there but not systems with subsecond
resolution.

Neither shelve or unshelve appear to use svn_sleep_for_timestamp().

-- 
Philip


Re: shelf-tests failure on macOS

2018-06-14 Thread Philip Martin
Julian Foad  writes:

> The most unusual thing about these failing tests is the data they
> write. Strings of 5 or 6 low-value bytes, like:
>
> '\0\1\2\3\4\5'
> '\5\4\3\2\1\0'

Those are the same size.  Could it be a filesystem timestamp resolution
issue?  The file changes but the timestamp does not and some change is
not detected?

-- 
Philip


Re: shelf-tests failure on macOS

2018-06-14 Thread Julian Foad
Daniel Shahaf wrote on 2018-06-14:
> Julian Foad wrote on Thu, 14 Jun 2018 16:06 +0100:
> > Branko Čibej wrote on 2018-06-14:
> > > A sync (which forces a write to disk) should not be necessary in cases
> > > like this one, as long as we close the file in Python — that should
> > > ensure a write to the OS buffers. If this were the problem, Subversion
> > > wouldn't work on macOS at all.
> > > 
> > > I still don't understand why this issue manifests only with the shelving
> > > tests,
> > 
> > The most unusual thing about these failing tests is the data they write. 
> > Strings of 5 or 6 low-value bytes, like:
> > 
> > '\0\1\2\3\4\5'
> > '\5\4\3\2\1\0'
> > '\1\6\3\4\5'
> > 
> > Just in case that gives anybody a clue.
> 
> In the link in the first post, the expected data is '\0\1\2\3\4\5' and
> the actual data is '\5\4\3\2\1\0'.  It looks like a uint48_t written
> with the wrong endianness.

No, no -- these are the literal strings that I put in the Python test code, 
just as written here, in order to test shelving of 'binary' data. For example, 
one test starts with file content '\0\1\2\3\4\5' and changes it to 
'\5\4\3\2\1\0' and shelves that change.

> Note that test 15 didn't fail in today's build, even though it did fail
> for the last four days in a row.  There don't seem to have been any
> relevant commits yesterday.

The exact location of the failures varies from one test run to another. Some 
tests fail at one point in one test run, then fail at a later or earlier point 
in another test run, but all failures seem to be of the same general form.

- Julian


Re: shelf-tests failure on macOS

2018-06-14 Thread Daniel Shahaf
Julian Foad wrote on Thu, 14 Jun 2018 16:06 +0100:
> Branko Čibej wrote on 2018-06-14:
> > A sync (which forces a write to disk) should not be necessary in cases
> > like this one, as long as we close the file in Python — that should
> > ensure a write to the OS buffers. If this were the problem, Subversion
> > wouldn't work on macOS at all.
> > 
> > I still don't understand why this issue manifests only with the shelving
> > tests,
> 
> The most unusual thing about these failing tests is the data they write. 
> Strings of 5 or 6 low-value bytes, like:
> 
> '\0\1\2\3\4\5'
> '\5\4\3\2\1\0'
> '\1\6\3\4\5'
> 
> Just in case that gives anybody a clue.

In the link in the first post, the expected data is '\0\1\2\3\4\5' and
the actual data is '\5\4\3\2\1\0'.  It looks like a uint48_t written
with the wrong endianness.

Note that test 15 didn't fail in today's build, even though it did fail
for the last four days in a row.  There don't seem to have been any
relevant commits yesterday.


Re: shelf-tests failure on macOS

2018-06-14 Thread Julian Foad
Branko Čibej wrote on 2018-06-14:
> A sync (which forces a write to disk) should not be necessary in cases
> like this one, as long as we close the file in Python — that should
> ensure a write to the OS buffers. If this were the problem, Subversion
> wouldn't work on macOS at all.
> 
> I still don't understand why this issue manifests only with the shelving
> tests,

The most unusual thing about these failing tests is the data they write. 
Strings of 5 or 6 low-value bytes, like:

'\0\1\2\3\4\5'
'\5\4\3\2\1\0'
'\1\6\3\4\5'

Just in case that gives anybody a clue.

- Julian


Re: shelf-tests failure on macOS

2018-06-14 Thread Branko Čibej
On 14.06.2018 16:50, Julian Foad wrote:
> Branko Čibej wrote:
>> The object lifetime and implicit close are very well defined indeed in
>> CPython, which we use. We don't use "modern Pythons on other virtual
>> machines."
>> If this were indeed the source of the problem, a number of
>> other tests would be failing randomly on that bot, and elsewhere too,
>> I'm sure. But ...
>>
>>>   with open(path, mode) as f:
>>> f.write(contents)
>> ... on the other hand, making this suggested change won't hurt [...]
> Done ... and it didn't fix the problem.
>
>> It's quite likely a timing problem between cache flush and file open.
>> For a while I was looking at the shelve code to see if there was a
>> missing sync(), but that actually doesn't make sense in the context of
>> the test suite, since 'svn' is invoked as an external process which ends
>> before the tests continue 
> It's the other way around:
> * Python writes to a file,
> * Python calls 'svn' as a subprocess,
> * 'svn' opens the file and doesn't see the change
>
> Could it be that we need to run 'sync' immediately *before* calling 'svn' as 
> a subprocess?

A sync (which forces a write to disk) should not be necessary in cases
like this one, as long as we close the file in Python — that should
ensure a write to the OS buffers. If this were the problem, Subversion
wouldn't work on macOS at all.

I still don't understand why this issue manifests only with the shelving
tests, even though we've used the open().write() pattern in our tests
pretty much since they existed. The bot runs the build and tests on a
RAM disk, but I do the same locally and don't see these issues.

It could be a bug in the system-provided python on that particular
version of macOS ... but again, we should have seen that with other
tests, too.

-- Brane


Re: shelf-tests failure on macOS

2018-06-14 Thread Julian Foad
Branko Čibej wrote:
> The object lifetime and implicit close are very well defined indeed in
> CPython, which we use. We don't use "modern Pythons on other virtual
> machines."
> If this were indeed the source of the problem, a number of
> other tests would be failing randomly on that bot, and elsewhere too,
> I'm sure. But ...
> 
> >   with open(path, mode) as f:
> > f.write(contents)
> 
> ... on the other hand, making this suggested change won't hurt [...]

Done ... and it didn't fix the problem.

> It's quite likely a timing problem between cache flush and file open.
> For a while I was looking at the shelve code to see if there was a
> missing sync(), but that actually doesn't make sense in the context of
> the test suite, since 'svn' is invoked as an external process which ends
> before the tests continue 

It's the other way around:
* Python writes to a file,
* Python calls 'svn' as a subprocess,
* 'svn' opens the file and doesn't see the change

Could it be that we need to run 'sync' immediately *before* calling 'svn' as a 
subprocess?

- Julian


Re: shelf-tests failure on macOS

2018-06-14 Thread Daniel Shahaf
Branko Čibej wrote on Thu, 14 Jun 2018 04:04 +0200:
> ... on the other hand, making this suggested change won't hurt, as long
> as our required Python version for tests supports the 'with' statement,
> which I believe it does. In fact we already use the 'with' statement in
> our test suite.

Since 1.9 we require python>=2.7 which supports the 'with' statement.


Re: shelf-tests failure on macOS

2018-06-13 Thread Branko Čibej
On 14.06.2018 04:04, Branko Čibej wrote:
> So, long story short ... changing all the open().write() statements to
> the 'with' form should at least remove one possible source of bugs.

We also use open().read() a lot, but that should not have timing
problems since it doesn't modify the file.

-- Brane



Re: shelf-tests failure on macOS

2018-06-13 Thread Branko Čibej
On 13.06.2018 16:31, Julian Foad wrote:
> Branko Čibej wrote:
>> On 12.06.2018 13:00, Julian Foad wrote:
>>> Branko Čibej wrote:
 https://ci.apache.org/builders/svn-x64-macosx-apr1.3-nothread/builds/3416/steps/Test%20ra_local%2Bfsfs/logs/faillog

>>> I should have a chance to debug this on a Mac today or tomorrow.
>> The good news is that I can't reproduce this test failure locally. There
>> are two possible reasons I can think of: [...]
> I have been trying to reproduce and debug this with my colleague Konrad and 
> we have a theory. The failure mode seems consistent with Python writing a 
> change to disk and then running 'svn', and 'svn' not seeing that change on 
> disk.
>
> The test uses this code in svntest.main.file_write():
>
>   open(path, mode).write(contents)
>
> When opening and using a file object "in line" in this way, is seems that the 
> timing of the implicit "close" is not specified, according to sources such as:
>
>   
> https://stackoverflow.com/questions/5362901/in-the-inline-open-and-write-file-is-the-close-implicit#5362917
>
> """
> The close() here happens when the file object is deallocated from memory, as 
> part of its deletion logic. Because modern Pythons on other virtual machines 
> — like Java and .NET — cannot control when an object is deallocated from 
> memory, it is no longer considered good Python to open() like this without a 
> close(). The recommendation today is to use a with statement, which 
> explicitly requests a close() when the block is exited ...
> """

The object lifetime and implicit close are very well defined indeed in
CPython, which we use. We don't use "modern Pythons on other virtual
machines." If this were indeed the source of the problem, a number of
other tests would be failing randomly on that bot, and elsewhere too,
I'm sure. But ...

> It seems that we need to change the code to something like:
>
> def file_write(path, contents, mode='w'):
>   [...]
>   with open(path, mode) as f:
> f.write(contents)
>
> to guarantee the file is closed before the test proceeds further. It may be 
> that most of our testing in the past has used Python implementations where 
> the write is never delayed far enough to affect what an invoked 'svn' 
> subprocess sees, but on the Mac it is a little different. We are using a Mac 
> with Python 2. (The 'file_write' method has separate code paths for Python 2 
> and Python 3 but both use the same open().write() construct.)

... on the other hand, making this suggested change won't hurt, as long
as our required Python version for tests supports the 'with' statement,
which I believe it does. In fact we already use the 'with' statement in
our test suite.

> The same construct appears in many other places in the test suite too.
>
> We are trying to test this, and it is taking a long time because we can only 
> currently reproduce the failure after a fresh checkout/build/test cycle. 
> Maybe because the bug is sensitive to the differences in timing of a warm 
> cache vs. a cold cache.

It's quite likely a timing problem between cache flush and file open.
For a while I was looking at the shelve code to see if there was a
missing sync(), but that actually doesn't make sense in the context of
the test suite, since 'svn' is invoked as an external process which ends
before the tests continue — only a bug in the OS filesystem cache could
cause the observed behaviour, and that sort of bug would be ... let's
just say rather extremely visible with other applications, too.

So, long story short ... changing all the open().write() statements to
the 'with' form should at least remove one possible source of bugs.

-- Brane



Re: shelf-tests failure on macOS

2018-06-13 Thread Julian Foad
Branko Čibej wrote:
> On 12.06.2018 13:00, Julian Foad wrote:
>> Branko Čibej wrote:
>>> https://ci.apache.org/builders/svn-x64-macosx-apr1.3-nothread/builds/3416/steps/Test%20ra_local%2Bfsfs/logs/faillog
>>>
>> I should have a chance to debug this on a Mac today or tomorrow.
> 
> The good news is that I can't reproduce this test failure locally. There
> are two possible reasons I can think of: [...]

I have been trying to reproduce and debug this with my colleague Konrad and we 
have a theory. The failure mode seems consistent with Python writing a change 
to disk and then running 'svn', and 'svn' not seeing that change on disk.

The test uses this code in svntest.main.file_write():

  open(path, mode).write(contents)

When opening and using a file object "in line" in this way, is seems that the 
timing of the implicit "close" is not specified, according to sources such as:

  
https://stackoverflow.com/questions/5362901/in-the-inline-open-and-write-file-is-the-close-implicit#5362917

"""
The close() here happens when the file object is deallocated from memory, as 
part of its deletion logic. Because modern Pythons on other virtual machines — 
like Java and .NET — cannot control when an object is deallocated from memory, 
it is no longer considered good Python to open() like this without a close(). 
The recommendation today is to use a with statement, which explicitly requests 
a close() when the block is exited ...
"""

It seems that we need to change the code to something like:

def file_write(path, contents, mode='w'):
  [...]
  with open(path, mode) as f:
f.write(contents)

to guarantee the file is closed before the test proceeds further. It may be 
that most of our testing in the past has used Python implementations where the 
write is never delayed far enough to affect what an invoked 'svn' subprocess 
sees, but on the Mac it is a little different. We are using a Mac with Python 
2. (The 'file_write' method has separate code paths for Python 2 and Python 3 
but both use the same open().write() construct.)

The same construct appears in many other places in the test suite too.

We are trying to test this, and it is taking a long time because we can only 
currently reproduce the failure after a fresh checkout/build/test cycle. Maybe 
because the bug is sensitive to the differences in timing of a warm cache vs. a 
cold cache.

- Julian


Re: shelf-tests failure on macOS

2018-06-12 Thread Branko Čibej
On 12.06.2018 13:00, Julian Foad wrote:
> Branko Čibej wrote:
>> https://ci.apache.org/builders/svn-x64-macosx-apr1.3-nothread/builds/3416/steps/Test%20ra_local%2Bfsfs/logs/faillog
>>
>> With the file contents now being printed in the test log, we can see how
>> the test is failing. What's worrying is that we don't see this failure
>> either on Windows or Linux (at least, I haven't seen it in the buildbot
>> logs).
> I should have a chance to debug this on a Mac today or tomorrow.

The good news is that I can't reproduce this test failure locally. There
are two possible reasons I can think of:

  * something's wrong with the version of APR used by the bot; it's
1.5.x compared to 1.6.x that I use;
  * the filesystem on the bot is broken; but it's a RAMdisk that gets
recreated for each test run and it would be extremely strange if it
caused only that test to fail consistently.


-- Brane


Re: shelf-tests failure on macOS

2018-06-12 Thread Julian Foad
Branko Čibej wrote:
> https://ci.apache.org/builders/svn-x64-macosx-apr1.3-nothread/builds/3416/steps/Test%20ra_local%2Bfsfs/logs/faillog
> 
> With the file contents now being printed in the test log, we can see how
> the test is failing. What's worrying is that we don't see this failure
> either on Windows or Linux (at least, I haven't seen it in the buildbot
> logs).

I should have a chance to debug this on a Mac today or tomorrow.

- Julian