Re: [PATCHv1 3/3] git-p4: auto-size the block

2018-06-05 Thread Eric Sunshine
On Tue, Jun 5, 2018 at 5:14 AM Luke Diamand  wrote:
> git-p4 originally would fetch changes in one query. On large repos this
> could fail because of the limits that Perforce imposes on the number of
> items returned and the number of queries in the database.
>
> To fix this, git-p4 learned to query changes in blocks of 512 changes,
> However, this can be very slow - if you have a few million changes,
> with each chunk taking about a second, it can be an hour or so.
>
> Although it's possible to tune this value manually with the
> "--changes-block-size" option, it's far from obvious to ordinary users
> that this is what needs doing.
>
> This change alters the block size dynamically by looking for the
> specific error messages returned from the Perforce server, and reducing
> the block size if the error is seen, either to the limit reported by the
> server, or to half the current block size.
>
> That means we can start out with a very large block size, and then let
> it automatically drop down to a value that works without error, while
> still failing correctly if some other error occurs.
>
> Signed-off-by: Luke Diamand 
> ---
> diff --git a/git-p4.py b/git-p4.py
> @@ -48,7 +48,8 @@ def __str__(self):
>  # Grab changes in blocks of this many revisions, unless otherwise requested
> -defaultBlockSize = 512
> +# The code should now automatically reduce the block size if it is required
> +defaultBlockSize = 1<<20

As worded, the new comment only has value to someone who knew the old
behavior (before this patch), not for someone reading the code fresh.
However, if reworded, it might be meaningful to all readers (new and
old):

# The block size is reduced automatically if required

> @@ -983,10 +985,24 @@ def p4ChangesForPaths(depotPaths, changeRange, 
> requestedBlockSize):
> +try:
> +result = p4CmdList(cmd, errors_as_exceptions=True)
> +except P4RequestSizeException as e:
> +if not block_size:
> +block_size = e.limit
> +elif block_size > e.limit:
> +block_size = e.limit
> +else:
> +block_size = max(2, block_size // 2)
> +
> +if verbose: print("block size error, retry with block size 
> {0}".format(block_size))

Perhaps: s/retry/retrying/

> diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh
> @@ -129,6 +129,7 @@ test_expect_success 'Create a repo with multiple depot 
> paths' '
> +test_expect_success 'Clone repo with self-sizing block size' '
> +   test_when_finished cleanup_git &&
> +   git p4 clone --changes-block-size=100 //depot@all 
> --destination="$git" &&
> +   (
> +   cd "$git" &&
> +   git log --oneline > log &&
> +   test_line_count \> 10 log
> +   )
> +'

Or, without a subshell (and dropping whitespace after redirection operator):

git -C git log --oneline >log &&
test_line_count \> 10 log

(not worth a re-roll)


[PATCHv1 3/3] git-p4: auto-size the block

2018-06-05 Thread Luke Diamand
git-p4 originally would fetch changes in one query. On large repos this
could fail because of the limits that Perforce imposes on the number of
items returned and the number of queries in the database.

To fix this, git-p4 learned to query changes in blocks of 512 changes,
However, this can be very slow - if you have a few million changes,
with each chunk taking about a second, it can be an hour or so.

Although it's possible to tune this value manually with the
"--changes-block-size" option, it's far from obvious to ordinary users
that this is what needs doing.

This change alters the block size dynamically by looking for the
specific error messages returned from the Perforce server, and reducing
the block size if the error is seen, either to the limit reported by the
server, or to half the current block size.

That means we can start out with a very large block size, and then let
it automatically drop down to a value that works without error, while
still failing correctly if some other error occurs.

Signed-off-by: Luke Diamand 
---
 git-p4.py   | 26 +-
 t/t9818-git-p4-block.sh | 11 +++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index b337562b39..6736f81b60 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -48,7 +48,8 @@ def __str__(self):
 defaultLabelRegexp = r'[a-zA-Z0-9_\-.]+$'
 
 # Grab changes in blocks of this many revisions, unless otherwise requested
-defaultBlockSize = 512
+# The code should now automatically reduce the block size if it is required
+defaultBlockSize = 1<<20
 
 p4_access_checked = False
 
@@ -969,7 +970,8 @@ def p4ChangesForPaths(depotPaths, changeRange, 
requestedBlockSize):
 changes = set()
 
 # Retrieve changes a block at a time, to prevent running
-# into a MaxResults/MaxScanRows error from the server.
+# into a MaxResults/MaxScanRows error from the server. If
+# we _do_ hit one of those errors, turn down the block size
 
 while True:
 cmd = ['changes']
@@ -983,10 +985,24 @@ def p4ChangesForPaths(depotPaths, changeRange, 
requestedBlockSize):
 for p in depotPaths:
 cmd += ["%s...@%s" % (p, revisionRange)]
 
+# fetch the changes
+try:
+result = p4CmdList(cmd, errors_as_exceptions=True)
+except P4RequestSizeException as e:
+if not block_size:
+block_size = e.limit
+elif block_size > e.limit:
+block_size = e.limit
+else:
+block_size = max(2, block_size // 2)
+
+if verbose: print("block size error, retry with block size 
{0}".format(block_size))
+continue
+except P4Exception as e:
+die('Error retrieving changes description 
({0})'.format(e.p4ExitCode))
+
 # Insert changes in chronological order
-for entry in reversed(p4CmdList(cmd)):
-if entry.has_key('p4ExitCode'):
-die('Error retrieving changes descriptions 
({})'.format(entry['p4ExitCode']))
+for entry in reversed(result):
 if not entry.has_key('change'):
 continue
 changes.add(int(entry['change']))
diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh
index 8840a183ac..e5ec9cdec8 100755
--- a/t/t9818-git-p4-block.sh
+++ b/t/t9818-git-p4-block.sh
@@ -129,6 +129,7 @@ test_expect_success 'Create a repo with multiple depot 
paths' '
 '
 
 test_expect_success 'Clone repo with multiple depot paths' '
+   test_when_finished cleanup_git &&
(
cd "$git" &&
git p4 clone --changes-block-size=4 //depot/pathA@all 
//depot/pathB@all \
@@ -138,6 +139,16 @@ test_expect_success 'Clone repo with multiple depot paths' 
'
)
 '
 
+test_expect_success 'Clone repo with self-sizing block size' '
+   test_when_finished cleanup_git &&
+   git p4 clone --changes-block-size=100 //depot@all 
--destination="$git" &&
+   (
+   cd "$git" &&
+   git log --oneline > log &&
+   test_line_count \> 10 log
+   )
+'
+
 test_expect_success 'kill p4d' '
kill_p4d
 '
-- 
2.17.0.392.gdeb1a6e9b7