Re: [PATCH v2 4/4] git-p4: add support for large file systems

2015-09-04 Thread Jeff King
On Thu, Sep 03, 2015 at 10:49:36PM +0200, Lars Schneider wrote:

> Do you want to test this feature against a real backend? In that case
> you would need an LFS-enabled GitHub account. If you don’t have one,
> maybe Jeff King can help?

You can sign up here:

  https://github.com/early_access/git-lfs

I don't know what the turnaround time is like, but if you (or anybody
doing development work around git-lfs) need it expedited, email me
off-list and I can look into it.

-Peff


Re: [PATCH v2 4/4] git-p4: add support for large file systems

2015-09-03 Thread Lars Schneider

On 03 Sep 2015, at 22:03, Luke Diamand wrote:

> On 03/09/15 17:35, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider 
>> 
>> Perforce repositories can contain large (binary) files. Migrating these
>> repositories to Git generates very large local clones. External storage
>> systems such as LFS [1] or git-annex [2] try to address this problem.
>> 
>> Add a generic mechanism to detect large files based on extension,
>> uncompressed size, and/or compressed size. Add LFS as example
>> implementation.
> 
> Can you split this into "add mechanism for large file support" and then "add
> LFS implementation"?
> 
> It will make it easier to review if nothing else.
Fun fact: I had it that way first and then decided against it. I could split it
this way:

commit 1: generic impl + docs
commit 2: LFS impl + test case

OK?

> 
> Some other comments inline.
> 
> Thanks!
> Luke
> 
>> 
>> [1] https://git-lfs.github.com/
>> [2] http://www.git-annex.org/
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>>  Documentation/git-p4.txt |  21
>>  git-p4.py                | 126 +--
>>  t/t9823-git-p4-lfs.sh    | 263 +++
>>  3 files changed, 401 insertions(+), 9 deletions(-)
>>  create mode 100755 t/t9823-git-p4-lfs.sh
>> 
>> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
>> index 82aa5d6..eac5bad 100644
>> --- a/Documentation/git-p4.txt
>> +++ b/Documentation/git-p4.txt
>> @@ -510,6 +510,27 @@ git-p4.useClientSpec::
>>  option '--use-client-spec'.  See the "CLIENT SPEC" section above.
>>  This variable is a boolean, not the name of a p4 client.
>> 
>> +git-p4.largeFileSystem::
>> +Specify the system that is used for large (binary) files. Only
>> +"LFS" [1] is supported right now. Download and install the Git LFS
>> +command line extension to use this option.
>> +[1] https://git-lfs.github.com/
>> +
>> +git-p4.largeFileExtensions::
>> +All files matching a file extension in the list will be processed
>> +by the large file system. Do not prefix the extensions with '.'.
>> +
>> +git-p4.largeFileThreshold::
>> +All files with an uncompressed size exceeding the threshold will be
>> +processed by the large file system. By default the threshold is
>> +defined in bytes. Add the suffix k, m, or g to change the unit.
>> +
>> +git-p4.largeFileCompressedThreshold::
>> +All files with an compressed size exceeding the threshold will be
> 
> s/an compressed/a compressed/
ok

> 
>> +processed by the large file system. This option might significantly
>> +slow down your clone/sync process. By default the threshold is
>> +defined in bytes. Add the suffix k, m, or g to change the unit.
>> +
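For illustration, the k/m/g suffix handling described above could look like
the sketch below. It assumes binary multiples (1k = 1024 bytes), which is how
Git itself scales integer config values. parse_size is a hypothetical helper
and is not part of this patch.

```python
def parse_size(value):
    """Convert a size string such as '512', '10k', '5m' or '1g' to bytes.

    Assumption: suffixes are binary multiples (1k = 1024 bytes).
    """
    units = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}
    text = value.strip().lower()
    if not text:
        raise ValueError('empty size value')
    if text[-1] in units:
        # Scale the numeric part by the unit named in the suffix.
        return int(text[:-1]) * units[text[-1]]
    # No suffix: the value is already in bytes.
    return int(text)
```

With this, a threshold configured as '10k' would be compared against file
sizes as 10240 bytes.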
>>  Submit variables
>>  
>>  git-p4.detectRenames::
>> diff --git a/git-p4.py b/git-p4.py
>> index 4d78e1c..cde75a5 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -22,6 +22,8 @@ import platform
>>  import re
>>  import shutil
>>  import stat
>> +import zipfile
>> +import zlib
>> 
>>  try:
>>      from subprocess import CalledProcessError
>> @@ -922,6 +924,51 @@ def wildcard_present(path):
>>      m = re.search("[*#@%]", path)
>>      return m is not None
>> 
>> +def largeFileSystem():
>> +    try:
>> +        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
>> +    except AttributeError as e:
>> +        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
>> +
>> +class LFS:
> 
> Docstrings?
OK.

> 
> 
>> +    @staticmethod
>> +    def description():
>> +        return 'LFS (see https://git-lfs.github.com/)'
>> +
>> +    @staticmethod
>> +    def attributeFilter():
>> +        return 'lfs'
>> +
>> +    @staticmethod
>> +    def generatePointer(cloneDestination, relPath, contents):
>> +        # Write P4 content to temp file
>> +        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
> 
> delete=False, doesn't that mean that if anything goes wrong, we leave large 
> files lying around? Perhaps that doesn't matter, I don't know.
delete=True means the file is deleted as soon as it is closed
source: https://docs.python.org/2/library/tempfile.html
I don’t close the file (sounds like a bug?!) so in theory delete=True would 
work. However, I am moving the file a few lines below.
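
To make the delete=False trade-off concrete, here is a minimal sketch (not
the code from the patch) that closes the temp file before moving it and
removes it if anything fails on the way. store_via_temp_file and its
parameters are hypothetical names.

```python
import os
import shutil
import tempfile

def store_via_temp_file(chunks, destination):
    """Write byte chunks to a temp file, then move it to `destination`."""
    # delete=False: closing the file will not remove it, so we clean up ourselves.
    temp = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
    try:
        with temp:  # closes (and thereby flushes) the file when the block exits
            for chunk in chunks:
                temp.write(chunk)
        shutil.move(temp.name, destination)
    except BaseException:
        # Something went wrong before the move finished: do not leave
        # a large file lying around in the temp directory.
        if os.path.exists(temp.name):
            os.remove(temp.name)
        raise
    return destination
```

The explicit cleanup is what addresses the "large files lying around"
concern; delete=True would not help here because the file is meant to
survive the close and be moved.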

> 
>> +        for d in contents:
>> +            p4ContentTempFile.write(d)
>> +        p4ContentTempFile.flush()
> 
> This seems like behaviour that could live in the base class or the main
> git-p4 code. It doesn't look LFS-specific.
ok
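
A rough sketch of what such a split could look like, with the
storage-agnostic temp file handling in a base class. LargeFileSystem and
writeTempFile are hypothetical names chosen for illustration; they are not
in this version of the patch.

```python
import os
import tempfile

class LargeFileSystem(object):
    """Hypothetical base class holding storage-agnostic helpers."""

    def writeTempFile(self, contents):
        """Write the content chunks to a temp file and return its path."""
        temp = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
        with temp:  # close the file so the data is on disk before returning
            for d in contents:
                temp.write(d)
        return temp.name

    def generatePointer(self, cloneDestination, relPath, contents):
        """Return (gitMode, pointerContents); each backend provides this."""
        raise NotImplementedError('implemented by each large file system')

class LFS(LargeFileSystem):
    """Git LFS backend; only the LFS-specific steps would live here."""

    def attributeFilter(self):
        return 'lfs'
```

The caller owns the returned temp file and has to move or remove it.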

> 
>> +
>> +        # Generate LFS pointer file based on P4 content
>> +        lfsProcess = subprocess.Popen(
>> +            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
>> +            stdout=subprocess.PIPE
>> +        )
>> +        lfsPointerFile = lfsProcess.stdout.read()
>> +        if lfsProcess.wait():
>> +            os.remove(p4ContentTempFile.name)
>> +  

Re: [PATCH v2 4/4] git-p4: add support for large file systems

2015-09-03 Thread Luke Diamand

On 03/09/15 17:35, larsxschnei...@gmail.com wrote:

From: Lars Schneider 

Perforce repositories can contain large (binary) files. Migrating these
repositories to Git generates very large local clones. External storage
systems such as LFS [1] or git-annex [2] try to address this problem.

Add a generic mechanism to detect large files based on extension,
uncompressed size, and/or compressed size. Add LFS as example
implementation.
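
As an illustration only, the three detection criteria could be combined
roughly as below. The function and parameter names are made up for this
sketch, and using zlib to estimate the compressed size is an assumption;
the patch may measure it differently.

```python
import os
import zlib

def is_large_file(rel_path, contents, extensions=(), threshold=None,
                  compressed_threshold=None):
    """Return True if any configured criterion marks the file as large.

    `contents` is a list of byte chunks. All names are illustrative.
    """
    # Criterion 1: file extension, configured without the leading '.'.
    ext = os.path.splitext(rel_path)[1].lstrip('.')
    if ext and ext in extensions:
        return True
    # Criterion 2: uncompressed size in bytes.
    size = sum(len(chunk) for chunk in contents)
    if threshold is not None and size > threshold:
        return True
    # Criterion 3: compressed size (assumption: estimated with zlib).
    if compressed_threshold is not None:
        compressor = zlib.compressobj()
        compressed = sum(len(compressor.compress(c)) for c in contents)
        compressed += len(compressor.flush())
        return compressed > compressed_threshold
    return False
```

Compressing every file just to measure it is the expensive part, which is
in line with the documentation's warning that the compressed threshold can
slow down clone/sync.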


Can you split this into "add mechanism for large file support" and then
"add LFS implementation"?


It will make it easier to review if nothing else.

Some other comments inline.

Thanks!
Luke



[1] https://git-lfs.github.com/
[2] http://www.git-annex.org/

Signed-off-by: Lars Schneider 
---
  Documentation/git-p4.txt |  21
  git-p4.py                | 126 +--
  t/t9823-git-p4-lfs.sh    | 263 +++
  3 files changed, 401 insertions(+), 9 deletions(-)
  create mode 100755 t/t9823-git-p4-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..eac5bad 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -510,6 +510,27 @@ git-p4.useClientSpec::
option '--use-client-spec'.  See the "CLIENT SPEC" section above.
This variable is a boolean, not the name of a p4 client.

+git-p4.largeFileSystem::
+   Specify the system that is used for large (binary) files. Only
+   "LFS" [1] is supported right now. Download and install the Git LFS
+   command line extension to use this option.
+   [1] https://git-lfs.github.com/
+
+git-p4.largeFileExtensions::
+   All files matching a file extension in the list will be processed
+   by the large file system. Do not prefix the extensions with '.'.
+
+git-p4.largeFileThreshold::
+   All files with an uncompressed size exceeding the threshold will be
+   processed by the large file system. By default the threshold is
+   defined in bytes. Add the suffix k, m, or g to change the unit.
+
+git-p4.largeFileCompressedThreshold::
+   All files with an compressed size exceeding the threshold will be


s/an compressed/a compressed/


+   processed by the large file system. This option might significantly
+   slow down your clone/sync process. By default the threshold is
+   defined in bytes. Add the suffix k, m, or g to change the unit.
+
  Submit variables
  
  git-p4.detectRenames::
diff --git a/git-p4.py b/git-p4.py
index 4d78e1c..cde75a5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -22,6 +22,8 @@ import platform
  import re
  import shutil
  import stat
+import zipfile
+import zlib

  try:
      from subprocess import CalledProcessError
@@ -922,6 +924,51 @@ def wildcard_present(path):
      m = re.search("[*#@%]", path)
      return m is not None

+def largeFileSystem():
+    try:
+        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
+    except AttributeError as e:
+        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
+
+class LFS:


Docstrings?



+    @staticmethod
+    def description():
+        return 'LFS (see https://git-lfs.github.com/)'
+
+    @staticmethod
+    def attributeFilter():
+        return 'lfs'
+
+    @staticmethod
+    def generatePointer(cloneDestination, relPath, contents):
+        # Write P4 content to temp file
+        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)


delete=False, doesn't that mean that if anything goes wrong, we leave 
large files lying around? Perhaps that doesn't matter, I don't know.



+        for d in contents:
+            p4ContentTempFile.write(d)
+        p4ContentTempFile.flush()


This seems like behaviour that could live in the base class or the main
git-p4 code. It doesn't look LFS-specific.



+
+        # Generate LFS pointer file based on P4 content
+        lfsProcess = subprocess.Popen(
+            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
+            stdout=subprocess.PIPE
+        )
+        lfsPointerFile = lfsProcess.stdout.read()
+        if lfsProcess.wait():
+            os.remove(p4ContentTempFile.name)
+            die('git-lfs command failed. Did you install the extension?')
+        contents = [i+'\n' for i in lfsPointerFile.split('\n')[2:][:-1]]
+
+        # Write P4 content to LFS
+        oid = contents[1].split(' ')[1].split(':')[1][:-1]
+        oidPath = os.path.join(cloneDestination, '.git', 'lfs', 'objects', oid[:2], oid[2:4])
+        if not os.path.isdir(oidPath):
+            os.makedirs(oidPath)
+        shutil.move(p4ContentTempFile.name, os.path.join(oidPath, oid))


This also does not look LFS-specific.
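
For reference, the object path built above follows the layout
objects/<oid[:2]>/<oid[2:4]>/<oid>. The sketch below computes the same
pieces by hand instead of calling 'git lfs pointer' as the patch does.
lfs_pointer_and_path is a hypothetical helper, and the three-line pointer
(version, oid sha256, size) is assumed to follow the Git LFS v1 pointer
format.

```python
import hashlib
import os

def lfs_pointer_and_path(clone_destination, data):
    """Return (pointer_text, object_path) for the given file content (bytes)."""
    # The LFS object id is the SHA-256 of the file content, in hex.
    oid = hashlib.sha256(data).hexdigest()
    pointer = (
        'version https://git-lfs.github.com/spec/v1\n'
        'oid sha256:%s\n'
        'size %d\n' % (oid, len(data))
    )
    # Objects are sharded into two directory levels taken from the oid.
    object_path = os.path.join(
        clone_destination, '.git', 'lfs', 'objects', oid[:2], oid[2:4], oid
    )
    return pointer, object_path
```

Only the pointer text depends on LFS; creating the directory and moving
the content file into place is ordinary file handling.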


+
+        # LFS Spec states that pointer files should not have the executable bit set.
+        gitMode = '100644'
+        return (gitMode, contents)
+
  class Command:
      def __init__(self):
          self.usage = "usage: %

[PATCH v2 4/4] git-p4: add support for large file systems

2015-09-03 Thread larsxschneider
From: Lars Schneider 

Perforce repositories can contain large (binary) files. Migrating these
repositories to Git generates very large local clones. External storage
systems such as LFS [1] or git-annex [2] try to address this problem.

Add a generic mechanism to detect large files based on extension,
uncompressed size, and/or compressed size. Add LFS as example
implementation.

[1] https://git-lfs.github.com/
[2] http://www.git-annex.org/

Signed-off-by: Lars Schneider 
---
 Documentation/git-p4.txt |  21
 git-p4.py                | 126 +--
 t/t9823-git-p4-lfs.sh    | 263 +++
 3 files changed, 401 insertions(+), 9 deletions(-)
 create mode 100755 t/t9823-git-p4-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..eac5bad 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -510,6 +510,27 @@ git-p4.useClientSpec::
option '--use-client-spec'.  See the "CLIENT SPEC" section above.
This variable is a boolean, not the name of a p4 client.
 
+git-p4.largeFileSystem::
+   Specify the system that is used for large (binary) files. Only
+   "LFS" [1] is supported right now. Download and install the Git LFS
+   command line extension to use this option.
+   [1] https://git-lfs.github.com/
+
+git-p4.largeFileExtensions::
+   All files matching a file extension in the list will be processed
+   by the large file system. Do not prefix the extensions with '.'.
+
+git-p4.largeFileThreshold::
+   All files with an uncompressed size exceeding the threshold will be
+   processed by the large file system. By default the threshold is
+   defined in bytes. Add the suffix k, m, or g to change the unit.
+
+git-p4.largeFileCompressedThreshold::
+   All files with an compressed size exceeding the threshold will be
+   processed by the large file system. This option might significantly
+   slow down your clone/sync process. By default the threshold is
+   defined in bytes. Add the suffix k, m, or g to change the unit.
+
 Submit variables
 
 git-p4.detectRenames::
diff --git a/git-p4.py b/git-p4.py
index 4d78e1c..cde75a5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -22,6 +22,8 @@ import platform
 import re
 import shutil
 import stat
+import zipfile
+import zlib
 
 try:
     from subprocess import CalledProcessError
@@ -922,6 +924,51 @@ def wildcard_present(path):
     m = re.search("[*#@%]", path)
     return m is not None
 
+def largeFileSystem():
+    try:
+        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
+    except AttributeError as e:
+        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
+
+class LFS:
+    @staticmethod
+    def description():
+        return 'LFS (see https://git-lfs.github.com/)'
+
+    @staticmethod
+    def attributeFilter():
+        return 'lfs'
+
+    @staticmethod
+    def generatePointer(cloneDestination, relPath, contents):
+        # Write P4 content to temp file
+        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
+        for d in contents:
+            p4ContentTempFile.write(d)
+        p4ContentTempFile.flush()
+
+        # Generate LFS pointer file based on P4 content
+        lfsProcess = subprocess.Popen(
+            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
+            stdout=subprocess.PIPE
+        )
+        lfsPointerFile = lfsProcess.stdout.read()
+        if lfsProcess.wait():
+            os.remove(p4ContentTempFile.name)
+            die('git-lfs command failed. Did you install the extension?')
+        contents = [i+'\n' for i in lfsPointerFile.split('\n')[2:][:-1]]
+
+        # Write P4 content to LFS
+        oid = contents[1].split(' ')[1].split(':')[1][:-1]
+        oidPath = os.path.join(cloneDestination, '.git', 'lfs', 'objects', oid[:2], oid[2:4])
+        if not os.path.isdir(oidPath):
+            os.makedirs(oidPath)
+        shutil.move(p4ContentTempFile.name, os.path.join(oidPath, oid))
+
+        # LFS Spec states that pointer files should not have the executable bit set.
+        gitMode = '100644'
+        return (gitMode, contents)
+
 class Command:
     def __init__(self):
         self.usage = "usage: %prog [options]"
@@ -2038,6 +2085,7 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.largeFiles = []
 
         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2158,6 +2206,59 @@ class P4Sync(Command, P4UserMap):
 
         return branches
 
+    def writeToGitStream(self, gitMode, relPath, contents):
+        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
+        for