Re: EXT :Re: GIT and large files
On Tue, 20 May 2014 18:18:08 +0000, "Stewart, Louis (IS)" wrote:

> From your response, then, there is a method to only obtain the Project,
> Directory and Files (which could hold 80 GBs of data) and not the
> rest of the Repository that contains the full overall Projects?

Please google the phrase "Git shallow cloning". I would also recommend reading up on git-annex [1].

You might also consider using Subversion, as it seems you do not need most of the benefits Git has over it, and you do want certain benefits Subversion has over Git:

* You don't need a distributed VCS (as you don't want each developer to have a full clone).

* You only need a single slice of the repository history at any given revision on a developer's machine, and this is *almost* what Subversion does: it keeps the so-called "base" (or "pristine") versions of the files comprising the revision you check out, plus the checked-out files themselves. So, twice the space of the files comprising a revision.

* Subversion allows you to check out only a single folder out of the entire revision.

* IIRC, Subversion supports locks, whereby a developer can tell the server they're editing a file, which prevents other devs from locking the same file. This can be used to serialize edits of huge and/or unmergeable files. Git can't do that (without non-standard tools deployed on the side, or a centralized "meeting point" repository).

My point is that while Git is fantastic for managing source code projects, and projects of similar types with regard to their contents, your requirements seem mostly unsuited to the use case Git is best tailored for. Your apparent lack of familiarity with Git might well bite you later should you pick it right now. At least please consider reading a book or some other introduction-level material on Git to get a feel for the typical workflows used with it.

1. https://git-annex.branchable.com/

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
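[Editor's sketch: the two suggestions above, shallow cloning (limit how much history is fetched) and checking out only a single folder (which Git approximates with sparse checkout), can be combined. All repository, branch, and directory names below are invented for the demo; a tiny local repository stands in for the real server.]

```shell
set -e
cd "$(mktemp -d)"

# Build a tiny stand-in for the real server repository.
git init -q origin-repo
cd origin-repo
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git config user.email demo@example.com
git config user.name demo
mkdir big-assets docs
echo payload > big-assets/huge.bin
echo text > docs/readme.txt
git add . && git commit -qm "rev 1"
echo payload2 > big-assets/huge2.bin
git add . && git commit -qm "rev 2"
cd ..

# Shallow clone: fetch only the latest revision's history...
git clone -q --depth 1 --no-checkout "file://$PWD/origin-repo" project
cd project
# ...and sparse checkout: materialize only the one needed directory.
git config core.sparseCheckout true
echo "big-assets/" > .git/info/sparse-checkout
git checkout -q master
ls    # only big-assets/ appears; docs/ stays out of the work tree
```

The clone's object store still contains every file of the fetched revision; sparse checkout only limits what lands in the working tree, so this is the "twice the space" trade-off in a different shape.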
Re: EXT :Re: GIT and large files
On Tuesday, 2014-05-20 at 17:24 +0000, Stewart, Louis (IS) wrote:

> Thanks for the reply. I just read the intro to GIT and I am concerned
> about the part that it will copy the whole repository to the developers'
> work area. They really just need the one directory and the files under
> that one directory. The history has TBs of data.
>
> Lou
>
> -----Original Message-----
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Tuesday, May 20, 2014 1:18 PM
> To: Stewart, Louis (IS)
> Cc: git@vger.kernel.org
> Subject: EXT :Re: GIT and large files
>
> "Stewart, Louis (IS)" writes:
>
> > Can GIT handle versioning of large 20+ GB files in a directory?
>
> I think you can "git add" such files, push/fetch histories that
> contain such files over the wire, and "git checkout" such files, but
> naturally reading, processing and writing 20+ GB would take some time.
> In order to run operations that need to see the changes, e.g. "git log
> -p", a real content-level merge, etc., you would also need sufficient
> memory, because we do things in-core.

You can prevent a clone from fetching the whole history with the --depth option of git clone.

The question is what you want to do with these 20 GB files. Just store them in the repo and *very* occasionally change them? For that you need a 64-bit build of Git and enough RAM; 32 GB does the trick here. Everything below is with git 1.9.1.

Doing some tests on my machine with a normal hard disc gives (status output translated from my German locale):

$ time git add file.dat; time git commit -m "add file"; time git status

real    16m17.913s
user    13m3.965s
sys     0m22.461s

[master 15fa953] add file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file.dat

real    15m36.666s
user    13m26.962s
sys     0m16.185s

# On branch master
nothing to commit, working directory clean

real    11m58.936s
user    11m50.300s
sys     0m5.468s

$ ls -lh
-rw-r--r-- 1 thomas thomas 20G May 20 19:01 file.dat

So this works, but it ain't fast. Playing some tricks with --assume-unchanged helps here:

$ git update-index --assume-unchanged file.dat
$ time git status

# On branch master
nothing to commit, working directory clean

real    0m0.003s
user    0m0.000s
sys     0m0.000s

This trick is only safe if you *know* that file.dat does not change. And by the way, I also set

$ cat .gitattributes
*.dat -delta

as delta compression should be skipped in any case. Pushing and pulling these files to and from a server needs some tweaking on the server side, otherwise the occasional git gc might kill the box.

By the way, I happily have files of 1.5 GB in my Git repositories and also change them, and I also work with Git for Windows. So in this region of file sizes, things work quite well.
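[Editor's sketch: the two tricks from this message in copy-pasteable form, with a tiny stand-in file instead of the real 20 GB file.dat.]

```shell
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo

# Skip delta compression for *.dat files; deltifying huge blobs costs
# a lot of CPU and RAM for little gain.
echo '*.dat -delta' >> .gitattributes

# Tiny stand-in for the 20 GB file from the message.
dd if=/dev/zero of=file.dat bs=1024 count=4 2>/dev/null
git add . && git commit -qm "add file"

# Tell the index to assume file.dat is unchanged, so "git status" no
# longer re-hashes it. Only safe if you *know* it does not change;
# undo with: git update-index --no-assume-unchanged file.dat
git update-index --assume-unchanged file.dat
git status --short   # prints nothing, and returns instantly
```

Note the flip side of the trick: once the flag is set, edits to file.dat are silently invisible to `git status` until the flag is cleared again.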
RE: EXT :Re: GIT and large files
From your response, then, there is a method to only obtain the Project, Directory and Files (which could hold 80 GBs of data) and not the rest of the Repository that contains the full overall Projects?

-----Original Message-----
From: Junio C Hamano [mailto:gits...@pobox.com]
Sent: Tuesday, May 20, 2014 2:15 PM
To: Stewart, Louis (IS)
Cc: git@vger.kernel.org
Subject: Re: EXT :Re: GIT and large files

"Stewart, Louis (IS)" writes:

> Thanks for the reply. I just read the intro to GIT and I am concerned
> about the part that it will copy the whole repository to the
> developers' work area. They really just need the one directory and
> the files under that one directory. The history has TBs of data.

Then you will spend time reading, processing and writing TBs of data when you clone, unless your developers do something to limit the history they fetch, e.g. by shallowly cloning.

> Lou
>
> -----Original Message-----
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Tuesday, May 20, 2014 1:18 PM
> To: Stewart, Louis (IS)
> Cc: git@vger.kernel.org
> Subject: EXT :Re: GIT and large files
>
> "Stewart, Louis (IS)" writes:
>
> > Can GIT handle versioning of large 20+ GB files in a directory?
>
> I think you can "git add" such files, push/fetch histories that contain
> such files over the wire, and "git checkout" such files, but naturally
> reading, processing and writing 20+ GB would take some time. In order
> to run operations that need to see the changes, e.g. "git log -p", a
> real content-level merge, etc., you would also need sufficient memory,
> because we do things in-core.
Re: EXT :Re: GIT and large files
"Stewart, Louis (IS)" writes:

> Thanks for the reply. I just read the intro to GIT and I am
> concerned about the part that it will copy the whole repository to
> the developers' work area. They really just need the one directory
> and the files under that one directory. The history has TBs of data.

Then you will spend time reading, processing and writing TBs of data when you clone, unless your developers do something to limit the history they fetch, e.g. by shallowly cloning.

> Lou
>
> -----Original Message-----
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Tuesday, May 20, 2014 1:18 PM
> To: Stewart, Louis (IS)
> Cc: git@vger.kernel.org
> Subject: EXT :Re: GIT and large files
>
> "Stewart, Louis (IS)" writes:
>
> > Can GIT handle versioning of large 20+ GB files in a directory?
>
> I think you can "git add" such files, push/fetch histories that contain
> such files over the wire, and "git checkout" such files, but naturally
> reading, processing and writing 20+ GB would take some time. In order
> to run operations that need to see the changes, e.g. "git log -p", a
> real content-level merge, etc., you would also need sufficient memory,
> because we do things in-core.
RE: EXT :Re: GIT and large files
Thanks for the reply. I just read the intro to GIT and I am concerned about the part that it will copy the whole repository to the developers' work area. They really just need the one directory and the files under that one directory. The history has TBs of data.

Lou

-----Original Message-----
From: Junio C Hamano [mailto:gits...@pobox.com]
Sent: Tuesday, May 20, 2014 1:18 PM
To: Stewart, Louis (IS)
Cc: git@vger.kernel.org
Subject: EXT :Re: GIT and large files

"Stewart, Louis (IS)" writes:

> Can GIT handle versioning of large 20+ GB files in a directory?

I think you can "git add" such files, push/fetch histories that contain such files over the wire, and "git checkout" such files, but naturally reading, processing and writing 20+ GB would take some time. In order to run operations that need to see the changes, e.g. "git log -p", a real content-level merge, etc., you would also need sufficient memory, because we do things in-core.
Re: GIT and large files
"Stewart, Louis (IS)" writes:

> Can GIT handle versioning of large 20+ GB files in a directory?

I think you can "git add" such files, push/fetch histories that contain such files over the wire, and "git checkout" such files, but naturally reading, processing and writing 20+ GB would take some time. In order to run operations that need to see the changes, e.g. "git log -p", a real content-level merge, etc., you would also need sufficient memory, because we do things in-core.
Re: GIT and large files
On 5/20/2014 10:37 AM, Stewart, Louis (IS) wrote:

> Can GIT handle versioning of large 20+ GB files in a directory?

Maybe you're looking for git-annex?

https://git-annex.branchable.com/

--
.marius
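[Editor's sketch: what a minimal git-annex session looks like, assuming git-annex is installed; the file name is a stand-in. git-annex stores big file contents under .git/annex and commits only a small symlink, so each clone fetches only the contents it explicitly asks for.]

```shell
set -e
# Skip gracefully where git-annex is not installed.
command -v git-annex >/dev/null 2>&1 || { echo "git-annex not installed"; exit 0; }

cd "$(mktemp -d)" && git init -q bigrepo && cd bigrepo
git config user.email demo@example.com
git config user.name demo
git annex init "demo repo"

dd if=/dev/zero of=huge.ova bs=1024 count=16 2>/dev/null  # stand-in for a 20 GB image
git annex add huge.ova      # content moves under .git/annex, a symlink is staged
git commit -qm "add huge.ova"
ls -l huge.ova              # -> a symlink into .git/annex/objects

# In another clone, only this one file's content would be fetched with:
#   git annex get huge.ova
```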
RE: GIT and large files
> -----Original Message-----
> From: Stewart, Louis (IS)
> Sent: Tuesday, May 20, 2014 11:38
>
> Can GIT handle versioning of large 20+ GB files in a directory?

Are you asking about 20 files of a GB each, or files of 20 GB each? A what and a why may help with the underlying questions.

v/r,

Jason Pyeron

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                  PD Inc. http://www.pdinc.us     -
- Principal Consultant          10 West 24th Street #100        -
- +1 (443) 269-1555 x333        Baltimore, Maryland 21218       -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

This message is copyright PD Inc, subject to license 20080407P00.
RE: EXT :Re: GIT and large files
The files in question would be in a directory containing many files, some small, others huge (for example: text files, docs and jpgs are MBs, but executables and OVA images are GBs, etc.).

Lou

From: Gary Fixler [mailto:gfix...@gmail.com]
Sent: Tuesday, May 20, 2014 12:09 PM
To: Stewart, Louis (IS)
Cc: git@vger.kernel.org
Subject: EXT :Re: GIT and large files

Technically yes, but from a practical standpoint, not really. Facebook recently revealed that they have a 54 GB git repo [1], but I doubt it has 20+ GB files in it. I've put 18 GB of photos into a git repo, but everything about the process was fairly painful, and I don't plan to do it again. Are your files non-mergeable binaries (e.g. videos)?

The biggest problem here is with branching and merging. Conflict resolution with non-mergeable assets ends up as an us-vs.-them fight. From git's standpoint it's simple: you just have to choose one or the other. From a workflow standpoint, you end up causing trouble if two people have changed an asset and both people consider their change important. Centralized systems get around this problem with locks.

Git could do this, and I've thought about it quite a bit. I work in games; we have code, but also a lot of binaries that I'd like to keep in sync with the code. For a while I considered suggesting some ideas to this group, but I'm pretty sure the locking issue makes it a non-starter.

The basic idea, skipping locking for the moment, would be to allow setting git attributes by file type, file size threshold, folder, etc., to let git know that some files are considered "bigfiles". These could be placed into the objects folder, but I'd actually prefer they go into a .git/bigfile folder. They'd still be saved as contents under their hash, but a normal git transfer wouldn't send them. They'd be in the tree as 'big' or 'bigfile' (instead of 'blob', 'tree', or 'commit' (for submodules)).
Git would warn you on push that there were bigfiles to send, and you could add, say, --with-big to also send them, or send them later with, say, `git push --big`. They'd simply be zipped up and sent over, without any packfile fanciness. When you clone, you wouldn't get the bigfiles unless you specified --with-big; it would warn you that there are also bigfiles and tell you what command to run to also get them (`git fetch --big`, perhaps). Git status would always let you know if you were missing bigfiles. I think hopping around between commits would follow the same strategy: you'd always have to, e.g., `git checkout foo --with-big`, or `git checkout foo` and then `git update big` (or whatever; I'm not married to any of these names).

Resolving conflicts on merge would simply have to be up to you. It would be documented clearly that you're entering weird territory, and that your team has to deal with bigfiles somehow, perhaps with some suggested strategies ("Pass the conch?"). I could imagine some strategies for this. Maybe bigfiles require connecting to a blessed repo to grab the right to make a commit on one. That has many problems, of course, and now I can feel everyone reading this shifting uneasily in their seats :)

-g

[1] https://twitter.com/feross/status/459259593630433280

On Tue, May 20, 2014 at 8:37 AM, Stewart, Louis (IS) wrote:

> Can GIT handle versioning of large 20+ GB files in a directory?
>
> Lou Stewart
> AOCWS Software Configuration Management
> 757-269-2388
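[Editor's sketch: the `--with-big` / `git push --big` commands proposed above are hypothetical and do not exist, but existing plumbing can at least audit which blobs in a history would qualify as "bigfiles". This demo uses an invented repo and a deliberately small 1 MB threshold so it runs quickly; a real audit might use 100 MB or more.]

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com
git config user.name demo

# Demo history: one blob over the 1 MB threshold, one under it.
dd if=/dev/zero of=movie.bin bs=1048576 count=2 2>/dev/null
echo readme > readme.txt
git add . && git commit -qm "demo"

# List every blob in history larger than the threshold, biggest first.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" && $3 > 1048576 { print $3, $4 }' |
  sort -rn
```

Here this prints the 2 MiB movie.bin and skips readme.txt; `%(rest)` carries the path that `rev-list --objects` appends after each object name.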