Re: [PATCH v2 4/4] git-p4: add support for large file systems
On Thu, Sep 03, 2015 at 10:49:36PM +0200, Lars Schneider wrote:

> Do you want to test this feature against a real backend? In that case
> you would need a LFS enabled GitHub account. If you don’t have one,
> maybe _Jeff King_ can help?

You can sign up here:

  https://github.com/early_access/git-lfs

I don't know what the turnaround time is like, but if you (or anybody
doing development work around git-lfs) needs it expedited, email me
off-list and I can look into it.

-Peff
Re: [PATCH v2 4/4] git-p4: add support for large file systems
On 03 Sep 2015, at 22:03, Luke Diamand wrote:

> On 03/09/15 17:35, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider
>>
>> Perforce repositories can contain large (binary) files. Migrating these
>> repositories to Git generates very large local clones. External storage
>> systems such as LFS [1] or git-annex [2] try to address this problem.
>>
>> Add a generic mechanism to detect large files based on extension,
>> uncompressed size, and/or compressed size. Add LFS as example
>> implementation.
>
> Can you split this into "add mechanism for large file support" and then "add
> LFS implementation"?
>
> It will make it easier to review if nothing else.

Fun fact: I had it that way first and then decided against it. I split it
this way:

commit 1: generic impl + docs
commit 2: lfs impl + TC

OK?

>
> Some other comments inline.
>
> Thanks!
> Luke
>
>>
>> [1] https://git-lfs.github.com/
>> [2] http://www.git-annex.org/
>>
>> Signed-off-by: Lars Schneider
>> ---
>>  Documentation/git-p4.txt |  21
>>  git-p4.py                | 126 +--
>>  t/t9823-git-p4-lfs.sh    | 263 +++
>>  3 files changed, 401 insertions(+), 9 deletions(-)
>>  create mode 100755 t/t9823-git-p4-lfs.sh
>>
>> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
>> index 82aa5d6..eac5bad 100644
>> --- a/Documentation/git-p4.txt
>> +++ b/Documentation/git-p4.txt
>> @@ -510,6 +510,27 @@ git-p4.useClientSpec::
>>      option '--use-client-spec'. See the "CLIENT SPEC" section above.
>>      This variable is a boolean, not the name of a p4 client.
>>
>> +git-p4.largeFileSystem::
>> +    Specify the system that is used used for large (binary) files. Only
>> +    "LFS" [1] is supported right now. Download and install the Git LFS
>> +    command line extension to use this option.
>> +    [1] https://git-lfs.github.com/
>> +
>> +git-p4.largeFileExtensions::
>> +    All files matching a file extension in the list will be processed
>> +    by the large file system. Do not prefix the extensions with '.'.
>> +
>> +git-p4.largeFileThreshold::
>> +    All files with an uncompressed size exceeding the threshold will be
>> +    processed by the large file system. By default the threshold is
>> +    defined in bytes. Add the suffix k, m, or g to change the unit.
>> +
>> +git-p4.largeFileCompressedThreshold::
>> +    All files with an compressed size exceeding the threshold will be
>
> s/an compressed/a compressed/

ok

>
>> +    processed by the large file system. This option might significantly
>> +    slow down your clone/sync process. By default the threshold is
>> +    defined in bytes. Add the suffix k, m, or g to change the unit.
>> +
>>  Submit variables
>>
>>  git-p4.detectRenames::
>> diff --git a/git-p4.py b/git-p4.py
>> index 4d78e1c..cde75a5 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -22,6 +22,8 @@ import platform
>>  import re
>>  import shutil
>>  import stat
>> +import zipfile
>> +import zlib
>>
>>  try:
>>      from subprocess import CalledProcessError
>> @@ -922,6 +924,51 @@ def wildcard_present(path):
>>      m = re.search("[*#@%]", path)
>>      return m is not None
>>
>> +def largeFileSystem():
>> +    try:
>> +        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
>> +    except AttributeError as e:
>> +        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
>> +
>> +class LFS:
>
> Docstrings?

OK.

>
>
>> +    @staticmethod
>> +    def description():
>> +        return 'LFS (see https://git-lfs.github.com/)'
>> +
>> +    @staticmethod
>> +    def attributeFilter():
>> +        return 'lfs'
>> +
>> +    @staticmethod
>> +    def generatePointer(cloneDestination, relPath, contents):
>> +        # Write P4 content to temp file
>> +        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
>
> delete=False, doesn't that mean that if anything goes wrong, we leave large
> files lying around? Perhaps that doesn't matter, I don't know.

delete=True means the file is deleted as soon as it is closed
source: https://docs.python.org/2/library/tempfile.html

I don’t close the file (sounds like a bug?!) so in theory delete=True would
work. However, I am moving the file a few lines below.

>
>> +        for d in contents:
>> +            p4ContentTempFile.write(d)
>> +        p4ContentTempFile.flush()
>
> This seems like behaviour that could live in the base class or the main
> git-p4 code. It doesn't look LFS-specific.

ok

>
>> +
>> +        # Generate LFS pointer file based on P4 content
>> +        lfsProcess = subprocess.Popen(
>> +            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
>> +            stdout=subprocess.PIPE
>> +        )
>> +        lfsPointerFile = lfsProcess.stdout.read()
>> +        if lfsProcess.wait():
>> +            os.remove(p4ContentTempFile.name)
>> +
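For readers following the delete=False exchange above, here is a minimal sketch of one way to keep the temp file movable while still cleaning it up on failure. It assumes Python 3, and write_to_temp_and_move is a hypothetical helper name, not something from git-p4 or the patch.

```python
import os
import shutil
import tempfile


def write_to_temp_and_move(chunks, destination):
    """Write byte chunks to a temp file, then move it to `destination`.

    Hypothetical helper for illustration only. The temp file is created
    with delete=False so that it survives close() and can be moved; the
    except clause removes it if anything fails before the move is done.
    """
    tmp = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
    try:
        # Leaving the with-block flushes and closes the file; because of
        # delete=False the file stays on disk afterwards.
        with tmp:
            for chunk in chunks:
                tmp.write(chunk)
        dest_dir = os.path.dirname(destination)
        if dest_dir and not os.path.isdir(dest_dir):
            os.makedirs(dest_dir)
        shutil.move(tmp.name, destination)
    except BaseException:
        # Do not leave a large temp file lying around on failure.
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
        raise
    return destination
```

With delete=True the file would disappear on close(), so the later move would have nothing to act on; explicit cleanup in the error path is one way to address the "files lying around" concern without giving up the move.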
Re: [PATCH v2 4/4] git-p4: add support for large file systems
On 03/09/15 17:35, larsxschnei...@gmail.com wrote:
> From: Lars Schneider
>
> Perforce repositories can contain large (binary) files. Migrating these
> repositories to Git generates very large local clones. External storage
> systems such as LFS [1] or git-annex [2] try to address this problem.
>
> Add a generic mechanism to detect large files based on extension,
> uncompressed size, and/or compressed size. Add LFS as example
> implementation.

Can you split this into "add mechanism for large file support" and then "add
LFS implementation"?

It will make it easier to review if nothing else.

Some other comments inline.

Thanks!
Luke

>
> [1] https://git-lfs.github.com/
> [2] http://www.git-annex.org/
>
> Signed-off-by: Lars Schneider
> ---
>  Documentation/git-p4.txt |  21
>  git-p4.py                | 126 +--
>  t/t9823-git-p4-lfs.sh    | 263 +++
>  3 files changed, 401 insertions(+), 9 deletions(-)
>  create mode 100755 t/t9823-git-p4-lfs.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index 82aa5d6..eac5bad 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -510,6 +510,27 @@ git-p4.useClientSpec::
>      option '--use-client-spec'. See the "CLIENT SPEC" section above.
>      This variable is a boolean, not the name of a p4 client.
>
> +git-p4.largeFileSystem::
> +    Specify the system that is used used for large (binary) files. Only
> +    "LFS" [1] is supported right now. Download and install the Git LFS
> +    command line extension to use this option.
> +    [1] https://git-lfs.github.com/
> +
> +git-p4.largeFileExtensions::
> +    All files matching a file extension in the list will be processed
> +    by the large file system. Do not prefix the extensions with '.'.
> +
> +git-p4.largeFileThreshold::
> +    All files with an uncompressed size exceeding the threshold will be
> +    processed by the large file system. By default the threshold is
> +    defined in bytes. Add the suffix k, m, or g to change the unit.
> +
> +git-p4.largeFileCompressedThreshold::
> +    All files with an compressed size exceeding the threshold will be

s/an compressed/a compressed/

> +    processed by the large file system. This option might significantly
> +    slow down your clone/sync process. By default the threshold is
> +    defined in bytes. Add the suffix k, m, or g to change the unit.
> +
>  Submit variables
>
>  git-p4.detectRenames::
> diff --git a/git-p4.py b/git-p4.py
> index 4d78e1c..cde75a5 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -22,6 +22,8 @@ import platform
>  import re
>  import shutil
>  import stat
> +import zipfile
> +import zlib
>
>  try:
>      from subprocess import CalledProcessError
> @@ -922,6 +924,51 @@ def wildcard_present(path):
>      m = re.search("[*#@%]", path)
>      return m is not None
>
> +def largeFileSystem():
> +    try:
> +        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
> +    except AttributeError as e:
> +        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
> +
> +class LFS:

Docstrings?

> +    @staticmethod
> +    def description():
> +        return 'LFS (see https://git-lfs.github.com/)'
> +
> +    @staticmethod
> +    def attributeFilter():
> +        return 'lfs'
> +
> +    @staticmethod
> +    def generatePointer(cloneDestination, relPath, contents):
> +        # Write P4 content to temp file
> +        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)

delete=False, doesn't that mean that if anything goes wrong, we leave large
files lying around? Perhaps that doesn't matter, I don't know.

> +        for d in contents:
> +            p4ContentTempFile.write(d)
> +        p4ContentTempFile.flush()

This seems like behaviour that could live in the base class or the main
git-p4 code. It doesn't look LFS-specific.

> +
> +        # Generate LFS pointer file based on P4 content
> +        lfsProcess = subprocess.Popen(
> +            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
> +            stdout=subprocess.PIPE
> +        )
> +        lfsPointerFile = lfsProcess.stdout.read()
> +        if lfsProcess.wait():
> +            os.remove(p4ContentTempFile.name)
> +            die('git-lfs command failed. Did you install the extension?')
> +        contents = [i+'\n' for i in lfsPointerFile.split('\n')[2:][:-1]]
> +
> +        # Write P4 content to LFS
> +        oid = contents[1].split(' ')[1].split(':')[1][:-1]
> +        oidPath = os.path.join(cloneDestination, '.git', 'lfs', 'objects', oid[:2], oid[2:4])
> +        if not os.path.isdir(oidPath):
> +            os.makedirs(oidPath)
> +        shutil.move(p4ContentTempFile.name, os.path.join(oidPath, oid))

This also does not look LFS-specific.

> +
> +        # LFS Spec states that pointer files should not have the executable bit set.
> +        gitMode = '100644'
> +        return (gitMode, contents)
> +
>  class Command:
>      def __init__(self):
>          self.usage = "usage: %
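The oid extraction in generatePointer() depends on fixed line and character positions in the pointer text. As a rough sketch only, the same information could be looked up by key. This assumes the pointer consists of "key value" lines (version, oid sha256:<hex>, size) as described in the Git LFS pointer spec; parse_lfs_pointer, pointer_oid and lfs_object_path are hypothetical names, and the object path layout is copied from the patch.

```python
import os


def parse_lfs_pointer(pointer_text):
    """Return the "key value" lines of an LFS pointer as a dict.

    Hypothetical helper; assumes one "key value" pair per line.
    """
    fields = {}
    for line in pointer_text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(' ')
        fields[key] = value
    return fields


def pointer_oid(pointer_text):
    """Extract the hex digest from the 'oid sha256:<hex>' line."""
    algorithm, _, digest = parse_lfs_pointer(pointer_text)['oid'].partition(':')
    if algorithm != 'sha256' or not digest:
        raise ValueError('unexpected oid field in LFS pointer')
    return digest


def lfs_object_path(clone_destination, oid):
    """Build the local object path using the layout from the patch:
    .git/lfs/objects/<oid[:2]>/<oid[2:4]>/<oid>
    """
    return os.path.join(clone_destination, '.git', 'lfs', 'objects',
                        oid[:2], oid[2:4], oid)
```

Looking fields up by name avoids the [1] and [:-1] indexing, and the path helper has no LFS-specific process calls in it, which may make it easier to share with other backends.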
[PATCH v2 4/4] git-p4: add support for large file systems
From: Lars Schneider

Perforce repositories can contain large (binary) files. Migrating these
repositories to Git generates very large local clones. External storage
systems such as LFS [1] or git-annex [2] try to address this problem.

Add a generic mechanism to detect large files based on extension,
uncompressed size, and/or compressed size. Add LFS as example
implementation.

[1] https://git-lfs.github.com/
[2] http://www.git-annex.org/

Signed-off-by: Lars Schneider
---
 Documentation/git-p4.txt |  21
 git-p4.py                | 126 +--
 t/t9823-git-p4-lfs.sh    | 263 +++
 3 files changed, 401 insertions(+), 9 deletions(-)
 create mode 100755 t/t9823-git-p4-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..eac5bad 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -510,6 +510,27 @@ git-p4.useClientSpec::
     option '--use-client-spec'. See the "CLIENT SPEC" section above.
     This variable is a boolean, not the name of a p4 client.

+git-p4.largeFileSystem::
+    Specify the system that is used used for large (binary) files. Only
+    "LFS" [1] is supported right now. Download and install the Git LFS
+    command line extension to use this option.
+    [1] https://git-lfs.github.com/
+
+git-p4.largeFileExtensions::
+    All files matching a file extension in the list will be processed
+    by the large file system. Do not prefix the extensions with '.'.
+
+git-p4.largeFileThreshold::
+    All files with an uncompressed size exceeding the threshold will be
+    processed by the large file system. By default the threshold is
+    defined in bytes. Add the suffix k, m, or g to change the unit.
+
+git-p4.largeFileCompressedThreshold::
+    All files with an compressed size exceeding the threshold will be
+    processed by the large file system. This option might significantly
+    slow down your clone/sync process. By default the threshold is
+    defined in bytes. Add the suffix k, m, or g to change the unit.
+
 Submit variables

 git-p4.detectRenames::
diff --git a/git-p4.py b/git-p4.py
index 4d78e1c..cde75a5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -22,6 +22,8 @@ import platform
 import re
 import shutil
 import stat
+import zipfile
+import zlib

 try:
     from subprocess import CalledProcessError
@@ -922,6 +924,51 @@ def wildcard_present(path):
     m = re.search("[*#@%]", path)
     return m is not None

+def largeFileSystem():
+    try:
+        return getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
+    except AttributeError as e:
+        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
+
+class LFS:
+    @staticmethod
+    def description():
+        return 'LFS (see https://git-lfs.github.com/)'
+
+    @staticmethod
+    def attributeFilter():
+        return 'lfs'
+
+    @staticmethod
+    def generatePointer(cloneDestination, relPath, contents):
+        # Write P4 content to temp file
+        p4ContentTempFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
+        for d in contents:
+            p4ContentTempFile.write(d)
+        p4ContentTempFile.flush()
+
+        # Generate LFS pointer file based on P4 content
+        lfsProcess = subprocess.Popen(
+            ['git', 'lfs', 'pointer', '--file=' + p4ContentTempFile.name],
+            stdout=subprocess.PIPE
+        )
+        lfsPointerFile = lfsProcess.stdout.read()
+        if lfsProcess.wait():
+            os.remove(p4ContentTempFile.name)
+            die('git-lfs command failed. Did you install the extension?')
+        contents = [i+'\n' for i in lfsPointerFile.split('\n')[2:][:-1]]
+
+        # Write P4 content to LFS
+        oid = contents[1].split(' ')[1].split(':')[1][:-1]
+        oidPath = os.path.join(cloneDestination, '.git', 'lfs', 'objects', oid[:2], oid[2:4])
+        if not os.path.isdir(oidPath):
+            os.makedirs(oidPath)
+        shutil.move(p4ContentTempFile.name, os.path.join(oidPath, oid))
+
+        # LFS Spec states that pointer files should not have the executable bit set.
+        gitMode = '100644'
+        return (gitMode, contents)
+
 class Command:
     def __init__(self):
         self.usage = "usage: %prog [options]"
@@ -2038,6 +2085,7 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.largeFiles = []

         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2158,6 +2206,59 @@ class P4Sync(Command, P4UserMap):

         return branches

+    def writeToGitStream(self, gitMode, relPath, contents):
+        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
+        for
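The three detection criteria documented in the patch (extension, uncompressed size, compressed size) can be illustrated with a small standalone sketch. This is not the code from the patch, whose detection logic is not shown in this excerpt. It assumes that the k, m and g suffixes are powers of 1024, as in Git's usual integer config convention, and that "compressed size" is measured with zlib at its default level; parse_threshold and is_large_file are hypothetical names.

```python
import os
import zlib

# Assumption: suffixes are powers of 1024, following Git's config convention.
_UNITS = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}


def parse_threshold(value):
    """Convert '512', '10k', '5m' or '1g' to a byte count."""
    value = value.strip().lower()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)


def is_large_file(rel_path, contents, extensions=(), threshold=None,
                  compressed_threshold=None):
    """Decide whether a file should go to the large file system.

    `contents` is an iterable of byte chunks. Extensions are given
    without a leading '.', as the documentation in the patch requires.
    """
    ext = os.path.splitext(rel_path)[1].lstrip('.')
    if ext and ext in extensions:
        return True
    data = b''.join(contents)
    if threshold is not None and len(data) > threshold:
        return True
    # Compressing every file is the expensive step, which fits the
    # documentation's warning that this option can slow down clone/sync.
    if (compressed_threshold is not None
            and len(zlib.compress(data)) > compressed_threshold):
        return True
    return False
```

The checks are ordered from cheapest to most expensive, so the compression step only runs for files that the extension and uncompressed-size tests did not already classify.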