[git-users] Convert file system with backup into git repo

2014-04-25 Thread George Georgiev
Hi,

I am researching how to convert a file system with backup history into a 
git repository.

I would like to do this in phases. The first phase is to create a shallow 
repo with only the head files. Then I would like to unshallow it step by 
step. The goal is to have a valid git repo to start working with as soon 
as possible.

Could you please point me to some references to start reading: first, to 
determine whether this is possible, and second, how to do it?

Thanks,

George


-- 
You received this message because you are subscribed to the Google Groups Git 
for human beings group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] Convert file system with backup into git repo

2014-04-25 Thread George Georgiev
Thank you Dale,

  The problem is that adding a commit to the *beginning* of the chain 
requires a bit of work, because you have to recreate all of the later 
commits so they reference the first commit. 

Are you certain about this? From a first pass through git's shallow.c 
code, I have the feeling that I can avoid this by creating the objects 
with a shallow flag. Then, when I need to add a parent, I could just 
attach the parent and unregister the object as shallow, without needing 
to recreate it (exactly as --unshallow seems to work).
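
For reference, here is a small sketch of where the shallow marker lives. As far as I understand git's on-disk layout, a shallow repository lists its boundary commits, one hash per line, in the `shallow` file inside the git directory; the commit objects themselves still name their real parents. The function name `shallow_boundaries` is made up for illustration.

```python
from pathlib import Path


def shallow_boundaries(git_dir):
    """Return the commit hashes listed in <git_dir>/shallow.

    An absent file means the repository is not shallow, so the
    result is an empty list.
    """
    path = Path(git_dir) / "shallow"
    if not path.exists():
        return []
    # One hash per line; ignore blank lines.
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


if __name__ == "__main__":
    # Assumes the current directory is the top of a work tree.
    for commit in shallow_boundaries(".git"):
        print("parent(s) not present for", commit)
```

If that understanding is right, the marker only says "the parents of this commit are missing locally"; it does not let the commit be created without knowing its parent hash.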

Thanks,
George






On Friday, April 25, 2014 11:38:20 AM UTC-7, Dale Worley wrote:

  From: George Georgiev george.ge...@gmail.com
  
  I am researching how I can convert file system with backup history into
  a git repository.
  
  I would like to do this in phases. The first phase is to create a
  shallow repo with only the head files. And then I would like to
  unshallow it step by step. The goal is to have a valid git repo to
  start working with asap.

 The details depend on the specifics of your situation.  But as long as 
 you can create copies of the file trees which are the historical 
 snapshots, you can add them to the repository and string them together 
 to form a series of commits.  The problem is that adding a commit to 
 the *beginning* of the chain requires a bit of work, because you have 
 to recreate all of the later commits so they reference the first 
 commit.  I don't think there's a porcelain command to do that; you
 have to use plumbing commands to recreate each commit in the chain
 one by one.
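
The recreation step described above can be sketched in Python as follows. This is only a sketch under stated assumptions: the history is linear (no merges), `new_root` is an already existing commit holding the older snapshot, and the names `git` and `rebuild_on_new_root` are made up for illustration. The git commands it calls (`rev-list`, `rev-parse`, `log`, `commit-tree`) are ordinary git commands.

```python
import os
import subprocess


def git(repo, *args, env=None):
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True, env=env,
    )
    return result.stdout.strip()


def rebuild_on_new_root(repo, new_root, tip="HEAD"):
    """Recreate the linear chain ending at `tip` on top of `new_root`.

    Returns the hash of the recreated tip. Merge commits are not handled.
    """
    parent = new_root
    # Oldest first, so each recreated commit can point at the previous one.
    for old in git(repo, "rev-list", "--reverse", tip).splitlines():
        tree = git(repo, "rev-parse", old + "^{tree}")
        message = git(repo, "log", "-1", "--format=%B", old)
        # Carry over the original author/committer identity and dates.
        an, ae, ad, cn, ce, cd = git(
            repo, "log", "-1", "--date=raw",
            "--format=%an%n%ae%n%ad%n%cn%n%ce%n%cd", old,
        ).splitlines()
        env = dict(
            os.environ,
            GIT_AUTHOR_NAME=an, GIT_AUTHOR_EMAIL=ae, GIT_AUTHOR_DATE=ad,
            GIT_COMMITTER_NAME=cn, GIT_COMMITTER_EMAIL=ce, GIT_COMMITTER_DATE=cd,
        )
        # Same tree and message, new parent: this yields a new commit hash.
        parent = git(repo, "commit-tree", tree, "-p", parent, "-m", message, env=env)
    return parent
```

The function only creates new commit objects; moving a branch to the returned hash (for example with `git update-ref`) is a separate step, which leaves the old chain untouched until you decide to switch.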

 I don't know of any references, but following is a short Perl program 
 I use to prune out some of the commits in a repository if they are too 
 closely spaced together compared to their time in the past.  It shows 
 how you go about recreating a chain of commits. 

 Dale 

 #! /bin/perl 

 use strict; 

 # Process the -d switch, which must have a numeric argument. 
 my($debug) = 0; 
 if ($ARGV[0] =~ /^-d([\d]+)$/) {
     $debug = $1;
     print STDERR "\$debug = $debug\n";
     shift;
 }
 die "Unknown argument(s): ", join(' ', @ARGV) if $#ARGV >= 0;

 # This is the rate at which commits are to be retained: 
 my($rate); 
 # At a time N in the past, commits should be spaced at most N/$rate 
 # apart. 
 # Thus, larger $rate values mean to keep more commits around. 
 # The rate is stored in the Git configuration as time-warp.rate. 
 # If the user entered it on the command line, it would be easier for 
 # the user to fumble-finger a small value and delete much of the 
 # history he wanted to save. 
 my($config_name) = 'time-warp.rate'; 
 my($command) = 'git config ' . $config_name; 
 chomp($rate = `$command`); 
 my($r) = $? >> 8;
 if ($r != 0) {
     warn "Could not obtain Git configuration value '$config_name'.\n";
     die "Error executing '$command': exit code $r\n" if $r;
 } elsif ($rate !~ /^\d+$/ || $rate < 1) {
     die "Rate value '$rate' is syntactically incorrect or less than 1.\n";
 }
 print STDERR "\$rate = $rate\n" if $debug;

 # Get the hashes and times of the commit history. 
 # Note that we are assuming that the current branch of the repository 
 # is the branch to be operated upon. 
 $command = "git log --pretty=tformat:'%H %ct'";
 print STDERR "\$command = $command\n" if $debug >= 3;
 open(GIT, "-|", $command) ||
     die "Error executing '$command' for input: $!\n";
 # Note that git log lists commits going back in time, so @hashes and
 # @times will describe the latest commits first.
 my(@hashes, @times);
 while (<GIT>) {
     chomp;
     my($hash, $time) = split;
     push(@hashes, $hash);
     push(@times, $time);
     print STDERR "\$hashes[", $#hashes, "] = $hash, \$times[", $#times, "] = $time\n"
         if $debug >= 2;
 }
 # Parenthesized so that "||" applies to the result of close.
 close(GIT) || die "Error closing '$command': $!\n";

 # Get the now time, which is the time of the last commit. 
 my($now) = $times[0];
 print STDERR "\$now = $now\n" if $debug;

 # Now, working from oldest to newest, look at each commit and decide
 # whether to recreate it.
 # The last commit we've recreated and its time.
 my($last_commit) = '';
 my($last_commit_time) = 0;
 my($commits_created) = 0;
 print STDERR "\$last_commit = $last_commit, \$last_commit_time = $last_commit_time\n"
     if $debug;
 # Cycle through the commits from the oldest to the newest, recreating
 # the commit chain, retaining the commits we desire.
 for (my $i = $#hashes; $i >= 0; $i--) {
     print STDERR "\$i = $i, \$hashes[$i] = $hashes[$i], \$times[$i] = $times[$i]\n"
         if $debug;
     # Test if commit $i-1 (the next-newer commit than this one) is
     # close enough to $last_commit that we can omit creating a new
     # commit from this one, commit $i.  We always generate a new
     # commit from commit 0, which is the newest.
     if ($i > 0 && $debug) {
         print STDERR "\$times[", $i-1, "] = $times[$i-1], \$last_commit_time

Re: [git-users] Convert file system with backup into git repo

2014-04-28 Thread George Georgiev
Thanks Dale.

 But it's not tremendously difficult to recreate a chain of commits.
My concern is the time it takes to get a usable repo, not the difficulty
of doing it. I would like users to have a git repo as soon as possible. In
some cases creating the full repo might take many hours.

It seems this will be impossible, unless the hash is just an ID rather
than an actual checksum, in which case I could skip updating it, but that
seems very risky.
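
To illustrate why the hash cannot be treated as a plain ID, here is a minimal Python sketch, assuming git's SHA-1 object format (a `commit <size>` header, a NUL byte, then the commit text). The helper name `commit_id` and the fixed identity line are made up for illustration; a real commit carries its real author and committer lines.

```python
import hashlib

# Placeholder identity, used for both the author and committer lines.
IDENT = "A U Thor <author@example.com> 0 +0000"

# Object name of the empty tree: the hash of a tree object with no entries.
EMPTY_TREE = hashlib.sha1(b"tree 0\0").hexdigest()


def commit_id(tree, parent, message, ident=IDENT):
    """Return the SHA-1 name of a commit object with these fields."""
    lines = ["tree " + tree]
    if parent is not None:
        lines.append("parent " + parent)
    lines.append("author " + ident)
    lines.append("committer " + ident)
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # A two-commit chain whose oldest commit has no parent.
    root = commit_id(EMPTY_TREE, None, "head snapshot")
    child = commit_id(EMPTY_TREE, root, "later work")

    # Now give the old root a parent (an older backup snapshot).
    older = commit_id(EMPTY_TREE, None, "older snapshot")
    new_root = commit_id(EMPTY_TREE, older, "head snapshot")
    new_child = commit_id(EMPTY_TREE, new_root, "later work")

    print("root :", root, "->", new_root)
    print("child:", child, "->", new_child)
```

Because the parent line is part of the hashed bytes, giving the old root a parent changes its name, and every descendant then has to be rewritten to point at the new name.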




On Mon, Apr 28, 2014 at 9:16 AM, Dale R. Worley wor...@alum.mit.edu wrote:

  From: George Georgiev george.georgiev...@gmail.com
 
   The problem is that adding a commit to the *beginning* of the chain
   requires a bit of work, because you have to recreate all of the later
   commits so they reference the first commit.
 
  Are you certain about this. At first pass reading through the git
  shallow.c code I am having the feeling that I will be able to avoid
  this with creating the objects with a shallow flag. Then when I need
  to add a parent I could just attach the parent and unregistered the
  object as shallow without a need to recreate it. (exactly as it
  seems --unshallow works)

 I'm not familiar with shallow, but as I understand it, it fetches
 only the latest commits from a repository and inserts them into a new
 repository.  Thus, the oldest commit in the chain has a dangling
 pointer to its parent.

 The difference between that and the situation you describe is that
 adding a new commit to the beginning of the commit chain changes the
 hashes of all the commits, because the information that goes into the
 hash function to get the commit name includes the hash of the parent
 commit.  Shallow repositories work because you already know the
 proper hash for the latest commit.  You even know the proper hash for
 the first commit that you're fetching.

 But if you want to add a parent to the current root commit, that
 changes the information in the current root commit, which changes its
 hash.  And in the *next* commit, you have to update the parent
 pointer, which changes *its* hash, etc.

 But it's not tremendously difficult to recreate a chain of commits.

 Dale


