Re: [PATCH] [git-p4.py] Add --checkpoint-period option to sync/clone

2016-09-13 Thread Luke Diamand
On 12 September 2016 at 23:02, Ori Rawlings  wrote:
> Importing a long history from Perforce into git using the git-p4 tool
> can be especially challenging. The `git p4 clone` operation is based
> on an all-or-nothing transactionality guarantee. Under real-world
> conditions like network unreliability or a busy Perforce server,
> `git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
> user to restart the import process from the beginning. The longer the
> history being imported, the more likely a fault occurs during the
> process. Long enough imports thus become statistically unlikely to ever
> succeed.

That would never happen :-)

The usual thing that I find is that my Perforce login session expires.

>
> The underlying git fast-import protocol supports an explicit checkpoint
> command. The idea here is to optionally allow the user to force an
> explicit checkpoint every  seconds. If the sync/clone operation fails
> branches are left updated at the appropriate commit available during the
> latest checkpoint. This allows a user to resume importing Perforce
> history while only having to repeat at most approximately  seconds
> worth of import activity.

I think this ought to work, and could be quite useful. It would be
good to have some kind of test case for it though, and updated
documentation.

Luke

>
> Signed-off-by: Ori Rawlings 
> ---
>  git-p4.py | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index fd5ca52..40cb64f 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2244,6 +2244,7 @@ class P4Sync(Command, P4UserMap):
>  optparse.make_option("-/", dest="cloneExclude",
>   action="append", type="string",
>   help="exclude depot path"),
> +optparse.make_option("--checkpoint-period", 
> dest="checkpointPeriod", type="int", help="Period in seconds between explict 
> git fast-import checkpoints (by default, no explicit checkpoints are 
> performed)"),
>  ]
>  self.description = """Imports from Perforce into a git repository.\n
>  example:
> @@ -2276,6 +2277,7 @@ class P4Sync(Command, P4UserMap):
>  self.tempBranches = []
>  self.tempBranchLocation = "refs/git-p4-tmp"
>  self.largeFileSystem = None
> +self.checkpointPeriod = -1

Or use None?

>
>  if gitConfig('git-p4.largeFileSystem'):
>  largeFileSystemConstructor = 
> globals()[gitConfig('git-p4.largeFileSystem')]
> @@ -3031,6 +3033,8 @@ class P4Sync(Command, P4UserMap):
>
>  def importChanges(self, changes):
>  cnt = 1
> +if self.checkpointPeriod > -1:
> +self.lastCheckpointTime = time.time()

Could you just always set the lastCheckpointTime?

>  for change in changes:
>  description = p4_describe(change)
>  self.updateOptionDict(description)
> @@ -3107,6 +3111,10 @@ class P4Sync(Command, P4UserMap):
>  self.initialParent)
>  # only needed once, to connect to the previous commit
>  self.initialParent = ""
> +
> +if self.checkpointPeriod > -1 and time.time() - 
> self.lastCheckpointTime > self.checkpointPeriod:
> +self.checkpoint()
> +self.lastCheckpointTime = time.time()

If you use time.time(), then this could fail to work as expected if
the system time goes backwards (e.g. NTP updates). However, Python 2
doesn't have access to clock_gettime() without jumping through hoops,
so perhaps we leave this as a bug until git-p4 gets ported to Python
3.



>  except IOError:
>  print self.gitError.read()
>  sys.exit(1)
> --
> 2.7.4 (Apple Git-66)
>


[PATCH] [git-p4.py] Add --checkpoint-period option to sync/clone

2016-09-12 Thread Ori Rawlings
Importing a long history from Perforce into git using the git-p4 tool
can be especially challenging. The `git p4 clone` operation is based
on an all-or-nothing transactionality guarantee. Under real-world
conditions like network unreliability or a busy Perforce server,
`git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
user to restart the import process from the beginning. The longer the
history being imported, the more likely a fault occurs during the
process. Long enough imports thus become statistically unlikely to ever
succeed.

The underlying git fast-import protocol supports an explicit checkpoint
command. The idea here is to optionally allow the user to force an
explicit checkpoint every  seconds. If the sync/clone operation fails
branches are left updated at the appropriate commit available during the
latest checkpoint. This allows a user to resume importing Perforce
history while only having to repeat at most approximately  seconds
worth of import activity.

Signed-off-by: Ori Rawlings 
---
 git-p4.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index fd5ca52..40cb64f 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2244,6 +2244,7 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option("-/", dest="cloneExclude",
  action="append", type="string",
  help="exclude depot path"),
+optparse.make_option("--checkpoint-period", 
dest="checkpointPeriod", type="int", help="Period in seconds between explict 
git fast-import checkpoints (by default, no explicit checkpoints are 
performed)"),
 ]
 self.description = """Imports from Perforce into a git repository.\n
 example:
@@ -2276,6 +2277,7 @@ class P4Sync(Command, P4UserMap):
 self.tempBranches = []
 self.tempBranchLocation = "refs/git-p4-tmp"
 self.largeFileSystem = None
+self.checkpointPeriod = -1
 
 if gitConfig('git-p4.largeFileSystem'):
 largeFileSystemConstructor = 
globals()[gitConfig('git-p4.largeFileSystem')]
@@ -3031,6 +3033,8 @@ class P4Sync(Command, P4UserMap):
 
 def importChanges(self, changes):
 cnt = 1
+if self.checkpointPeriod > -1:
+self.lastCheckpointTime = time.time()
 for change in changes:
 description = p4_describe(change)
 self.updateOptionDict(description)
@@ -3107,6 +3111,10 @@ class P4Sync(Command, P4UserMap):
 self.initialParent)
 # only needed once, to connect to the previous commit
 self.initialParent = ""
+
+if self.checkpointPeriod > -1 and time.time() - 
self.lastCheckpointTime > self.checkpointPeriod:
+self.checkpoint()
+self.lastCheckpointTime = time.time()
 except IOError:
 print self.gitError.read()
 sys.exit(1)
-- 
2.7.4 (Apple Git-66)



[PATCH] [git-p4.py] Add --checkpoint-period option to sync/clone

2016-09-12 Thread Ori Rawlings
Importing a long history from Perforce into git using the git-p4 tool
can be especially challenging. The `git p4 clone` operation is based
on an all-or-nothing transactionality guarantee. Under real-world
conditions like network unreliability or a busy Perforce server,
`git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
user to restart the import process from the beginning. The longer the
history being imported, the more likely a fault occurs during the
process. Long enough imports thus become statistically unlikely to ever
succeed.

I'm looking for feedback on a potential approach for addressing the
problem. My idea was to leverage the checkpoint feature of git 
fast-import. I've included a patch which exposes a new option to the 
sync/clone commands in the git-p4 tool. The option enables explict 
checkpoints on a periodic basis (approximately every x seconds).

If the sync/clone command fails during processing of Perforce changes, 
the user can craft a new git p4 sync command that will identify 
changes that have already been imported and proceed with importing 
only changes more recent than the last successful checkpoint.

Assuming this approach makes sense, there are a few questions/items I
have:

  1. To add tests for this option, I'm thinking I'd need to simulate a 
 Perforce server or client that exits abnormally after first 
 processing some operations successfully. I'm looking for 
 suggestions on sane approaches for implementing that.
  2. From a usability perspective, I think it makes sense to print 
 out a message upon clone/sync failure if the user has enabled the 
 option. This message would describe how long ago the last 
 successful checkpoint was completed and document what command/s 
 to execute to continue importing Perforce changes. Ideally, the 
 commmand to continue would be exactly the same as the command 
 which failed, but today, clone will ignore any commits already 
 imported to git. There are some lingering TODO comments in 
 git-p4.py suggesting that clone should try to avoid reimporting
 changes. I don't mind taking a stab at addressing the TODO, but 
 am worried I'll quickly encounter edge cases in the clone/sync 
 features I don't understand.
  3. This is my first attempt at a git contribution, so I'm definitely 
 looking for feedback on commit messages, etc.


Cheers!

Ori Rawlings (1):
  [git-p4.py] Add --checkpoint-period option to sync/clone

 git-p4.py | 8 
 1 file changed, 8 insertions(+)

-- 
2.7.4 (Apple Git-66)