Re: [git-users] converting timestamped files into Git

2020-02-18 Thread Konstantin Khomoutov
On Tue, Feb 18, 2020 at 05:27:52AM -0800, Steve Cobrin wrote:

> Historically we were really old-school and stored different versions of 
> files with a timestamp appended to the end of the filename, and stashed 
> them into a directory, e.g.
> 
> foo
> .archive/mmdd
> 
> Now we want to put all the instances of the file into a Git repo, is there 
> any easy way to do it?

Depends on what you define as "easy".

Git has a whole lot of low-level commands to synthesize commits, so I'd
say in your case you could write a program in any suitable programming
language which would call out to Git.

The program would enumerate the files under ".archive", sort them by
the date parsed from their "yyyymmdd"-formatted names, and then synthesize
a series of commits referring to these files.

Here is a quick stab at such a task. The script presupposes that
it is run from the directory containing ".archive", and it creates
a Git repository named "dest" (recreating it if it already exists). The
contents of each version of the file are recorded in commits as "foo.txt".

8<
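#!/bin/sh

set -e -u

src=.archive
dest=dest

# Recreate the destination repository from scratch.
test -d "$dest" && rm -rf "$dest"
git init "$dest"

# Make every Git command below operate on the new repository.
GIT_DIR="$dest/.git"
export GIT_DIR

# List the archived files (GNU find), keep only the 8-character names
# and append the date with dashes inserted after the 4th and 6th character.
find "$src" -mindepth 1 -maxdepth 1 -type f -printf '%f\n' |
    sed -nEe 's!^(.{4})(.{2})(.{2})$!& \1-\2-\3!p' | {
        set -e -u
        # Prefix each line with seconds since the epoch (GNU date),
        # then sort the lines oldest first.
        while read fn ymd; do
            sec=`date --date="$ymd" +%s`
            printf '%s\t%s\t%s\n' "$sec" "$ymd" "$fn"
        done | sort -k 1 -n
    } | {
        set -e -u
        parent=''
        while read _ ymd fn; do
            # Store the file as a blob and stage it as "foo.txt".
            sha=`git hash-object -w "$src/$fn"`
            git update-index --add --replace --cacheinfo 0644,$sha,foo.txt
            # Turn the index into a tree object.
            sha=`git write-tree`
            GIT_AUTHOR_DATE="${ymd}T00:00:00"
            export GIT_AUTHOR_DATE
            # The first commit has no parent; later ones chain to the previous.
            if [ "x$parent" = "x" ]; then
                parent=`git commit-tree -m 'xxx' "$sha"`
            else
                parent=`git commit-tree -m 'xxx' -p "$parent" "$sha"`
            fi
            echo "Committed: $parent"
        done
        # Point the current branch at the last commit created.
        git update-ref HEAD "$parent"
    }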
8<
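
The name-parsing half of the script can be tried on its own, without
touching Git. The following is only a sketch: the sample names are
hypothetical, and it assumes GNU date, just as the script does. It runs
the same sed expression and the same sort, and prints the names in the
order in which they would be committed.

```shell
#!/bin/sh
set -e -u

# Hypothetical sample names; "README" is there to show that names which
# are not exactly eight characters long are silently skipped by sed.
names='20200218
20191105
README
20200101'

sorted=$(
    printf '%s\n' "$names" |
    sed -nEe 's!^(.{4})(.{2})(.{2})$!& \1-\2-\3!p' |
    while read -r fn ymd; do
        # Same conversion as in the script: date to seconds since the epoch.
        sec=$(date --date="$ymd" +%s)
        printf '%s\t%s\t%s\n' "$sec" "$ymd" "$fn"
    done | sort -k 1 -n | cut -f 3
)

# Oldest name first.
printf '%s\n' "$sorted"
```

Note that the sed expression accepts any eight characters, not only
digits, so a stray eight-character name would reach `date` and make it
fail. Replacing each `.` with `[0-9]` is one possible way to tighten it.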

How the script rolls:

- The ".archive" directory is searched for files,
  only one level deep.

- The names of the files are filtered through a sed script which
  extracts the first four characters, the following two and then the next
  two from each name, and prints the original string followed by
  the extracted bits joined by dashes, so that "yyyymmdd" gets converted
  to "yyyymmdd yyyy-mm-dd".

- The output of the sed script is fed to a compound shell command
  which converts each received "yyyy-mm-dd" to the number of seconds
  since the UNIX epoch corresponding to that date.

  The result is joined with the source date and the original file name,
  and the lines are then sorted numerically on the number of seconds.

- The sorted output is piped to another compound shell command
  which writes the named file into the destination Git object database
  as a blob, then updates the index with the SHA-1 name of that blob,
  recording it as a regular file (hence mode 0644) named "foo.txt".

  The index is then written out as a new Git tree object (a tree object
  represents the contents of what you'd call a directory), and
  its SHA-1 name is used to record a new commit with the message "xxx".
  Every commit but the first names the previous commit as its parent.

- Once the string of commits is formed, the HEAD reference in the
  destination repository is updated to point to that last commit.
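
One consequence of using only these low-level commands is that nothing
is ever written to the working tree of "dest": the file exists in the
object database and in the index, but not on disk. The sketch below
repeats the same steps for a single version in a throwaway directory and
then checks the result out with `git reset --hard`. The file name
"sample", its contents and the "demo" identity are made up for the
illustration.

```shell
#!/bin/sh
set -e -u

# Throwaway directory holding one hypothetical archived version.
work=$(mktemp -d)
cd "$work"
printf 'version one\n' > sample

git init -q dest
GIT_DIR=dest/.git
export GIT_DIR

# Hypothetical identity, so commit-tree also works on an unconfigured machine.
GIT_AUTHOR_NAME=demo;    GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo; GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# The same plumbing steps as in the script, for a single version.
# 100644 is how Git records a regular, non-executable file.
blob=$(git hash-object -w sample)
git update-index --add --cacheinfo "100644,$blob,foo.txt"
tree=$(git write-tree)
commit=$(git commit-tree -m 'xxx' "$tree")
git update-ref HEAD "$commit"

# From here on, let Git locate the repository the usual way.
unset GIT_DIR

before=$(ls dest)              # empty: foo.txt exists only inside .git so far
git -C dest reset -q --hard    # write the last commit out to the working tree
after=$(cat dest/foo.txt)
printf 'before: [%s]\nafter: [%s]\n' "$before" "$after"
```

After the full script has run, the same `git reset --hard` inside "dest"
should leave the newest version of the file there as "foo.txt".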


The somewhat subtle points are the two exported environment variables.
GIT_DIR makes every Git command called by the script look for the
repository in the specified location, rather than apply the usual
heuristics of searching the current directory and then its parent
directories. GIT_AUTHOR_DATE forces `git commit-tree` to record that
date as the author date in the generated commit object instead of the
current time. The committer date is still the current time unless
GIT_COMMITTER_DATE is set as well.

You might also want to set (and export) other GIT_AUTHOR_* variables in
order to set authorship of the commits.
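
As a rough illustration of what that could look like, the sketch below
creates one commit in a throwaway repository with the author and
committer fields taken from the environment. The name and e-mail address
are placeholders, and setting the committer variables to the same values
is a choice made for this example, not something the script requires.

```shell
#!/bin/sh
set -e -u

# Throwaway repository, used only for this demonstration.
demo=$(mktemp -d)
git init -q "$demo"
GIT_DIR="$demo/.git"
export GIT_DIR

# Placeholder identity; substitute whoever really wrote the archived files.
GIT_AUTHOR_NAME='Jane Archivist'
GIT_AUTHOR_EMAIL='jane@example.com'
GIT_AUTHOR_DATE='2020-02-18T00:00:00'

# The committer fields are separate from the author fields. Without
# GIT_COMMITTER_DATE the commit records the time the script was run.
GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE

export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE

# An empty tree is enough to show where the identity ends up.
tree=$(git write-tree)
commit=$(git commit-tree -m 'xxx' "$tree")

author=$(git log -1 --format='%an <%ae>' "$commit")
adate=$(git log -1 --format='%ad' --date=short "$commit")
echo "$author $adate"
```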

Also note a subtle point about the interpretation of dates: it happens
here in two places. `date --date=...` interprets the date, and then
`git commit-tree` does the same. By default, both of them interpret the
date as referring to your local timezone (well, the timezone as seen
from the shell running the script). This might be perfectly OK, but if
not, you might need to resort to various tricks to tell both
commands which timezone the date is really in.
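
One such trick, sketched here under the assumption that the archived
dates are meant as UTC, is to pin the conversion in `date` and then hand
Git a date which already carries its offset. GNU date is assumed, as in
the script, and the sample date is arbitrary.

```shell
#!/bin/sh
set -e -u

ymd=2020-02-18

# -u makes date read "$ymd" as midnight UTC instead of midnight in the
# local timezone.
sec=$(date -u --date="$ymd" +%s)

# Git's internal date format is "<unix timestamp> <timezone offset>",
# which leaves nothing for Git to guess.
GIT_AUTHOR_DATE="$sec +0000"
export GIT_AUTHOR_DATE

echo "$GIT_AUTHOR_DATE"    # 1581984000 +0000
```

For another timezone, both the way `date` is told about the zone and the
offset given to Git would have to change together.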

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 

[git-users] converting timestamped files into Git

2020-02-18 Thread Steve Cobrin
Historically we were really old-school and stored different versions of 
files with a timestamp appended to the end of the filename, and stashed 
them into a directory, e.g.

foo
.archive/mmdd

Now we want to put all the instances of the file into a Git repo, is there 
any easy way to do it?

Cheers
Steve 



[git-users] Re: converting timestamped files into Git

2020-02-18 Thread Philip Oakley
Hi Steve,

You may need to elaborate a bit more, as the options may be wider than you 
imply, and more information is needed to make the choice.

I'm going to guess that what you had was a series of project snapshots (on 
one of my projects we would 'compressed zip' the super-top level folder, 
and then update the folder name with its new/next version suffix). 

Thus we had a whole load of these snapshots, and I created a short script 
that would extract each zip to a working directory, and then commit that 
directory with the date set to the zip date, then clean the directory and 
do the next zip. 

This created a simplistic Git project structure that captured all the 
release points. It was a small team of 2-3 people, hence the original method 
was pretty effective locally, and adding the zips to the Git repo gives a 
good view of the history.

If your archive is arranged on some other basis, then a bit more info...

Philip


On Tuesday, February 18, 2020 at 1:27:53 PM UTC, Steve Cobrin wrote:
>
> Historically we were really old-school and stored different versions of 
> files with a timestamp appended to the end of the filename, and stashed 
> them into a directory, e.g.
>
> foo
> .archive/mmdd
>
> Now we want to put all the instances of the file into a Git repo, is there 
> any easy way to do it?
>
> Cheers
> Steve 
>
