Re: git svn import failure : write .git/Git_svn_hash_BmjclS: Bad file descriptor

2015-02-16 Thread Eric Wong
Nico Schlömer  wrote:
> I just double-checked and I can only produce this issue on one machine
> (tested on 3). Apparently, this has nothing to do with Git itself
> then.
> 
> Any ideas of what could be wrong?

What's different about that one machine?
e.g. SVN version, 64 vs 32-bit, Perl version, etc. could all be
factors (assuming identical git versions).

Also, any chance git was misinstalled somehow or your PATH was not
pointing to the correct git installation?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Fairuzan Roslan

> On Feb 17, 2015, at 1:34 PM, Torsten Bögershausen  wrote:
> 
> On 02/17/2015 04:22 AM, Fairuzan Roslan wrote:
>>> On Feb 17, 2015, at 3:08 AM, Matthieu Moy  
>>> wrote:
>>> 
>>> [ Please, don't top post on this list ]
>>> 
>>> Fairuzan Roslan  writes:
>>> 
 I don’t see the issue for the owner of his/her own file to have write
 access.
>>> Object and pack files are not meant to be modified. Hence, they are
>>> read-only so that an (accidental) attempt to modify them fails.
>>> 
 Setting tmp idx & pack files to read-only even for the file owner is
 not a safety feature.
>>> Yes it is. If you do not think so, then please give some arguments.
>>> 
 You should at least give the user the option to set the permission in
 the config file and not hardcoded the permission in the binary.
>>> This is the kind of thing I meant by "investigate alternate solutions".
>>> I have no AFP share to test, so it would help if you answered the
>>> question I asked in my previous message:
>>> 
> On Feb 17, 2015, at 2:23 AM, Matthieu Moy  
> wrote:
> 
> Fairuzan Roslan  writes:
> 
>> Hi,
>> 
>> Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
>> causing a lot of issues (unable to unlink/write/rename) to those
>> people who use AFP shares.
> Is it a problem when using Git (like "git gc" failing to remove old
> packs), or when trying to remove files outside Git?
>>> (BTW, why did you try to write/rename pack files?)
>>> 
>>> --
>>> Matthieu Moy
>>> http://www-verimag.imag.fr/~moy/
>> I think it's easier if I just show you…
>> 
>> OS : OS X 10.10.0 - 10.10.2
>> Client :  git version 1.9.3 (Apple Git-50) and git version 2.2.1
>> AFP share : //user@hostname._afpovertcp._tcp.local/installer on 
>> /Volumes/installer (afpfs, nodev, nosuid, mounted by user)
>> 
>> 1. git clone example
>> 
>> $ git clone https://github.com/robbyrussell/oh-my-zsh.git
>> Cloning into 'oh-my-zsh'...
>> remote: Counting objects: 11830, done.
>> remote: Total 11830 (delta 0), reused 0 (delta 0)
>> Receiving objects: 100% (11830/11830), 2.12 MiB | 481.00 KiB/s, done.
>> Resolving deltas: 100% (6510/6510), done.
>> warning: unable to unlink 
>> /Volumes/installer/oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation 
>> not permitted
>> error: unable to write sha1 filename 
>> /Volumes/installer/oh-my-zsh/.git/objects/pack/pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.pack:
>>  Operation not permitted
>> fatal: cannot store pack file
>> fatal: index-pack failed
>> 
>> $ ls -l oh-my-zsh/.git/objects/pack
>> total 5008
>> -rw---  1 user  staff   32 Feb 17 09:59 
>> pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.keep
>> -r--r--r--  1 user  staff   332312 Feb 17 09:59 tmp_idx_oUN1sb
>> -r--r--r--  1 user  staff  2223007 Feb 17 09:59 tmp_pack_zjPxuc
>> 
>> $ rm -rf oh-my-zsh/.git/objects/pack/tmp_*
>> rm: oh-my-zsh/.git/objects/pack/tmp_idx_oUN1sb: Operation not permitted
>> rm: oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation not permitted
>> 
>> Detailed errors:
>> 1. delete_ref_loose (refs.c) -> unlink_or_msg (wrapper.c) -> "unable to 
>> unlink %s: %s"
>> 2. move_temp_to_file (sha1_file.c ) -> “unable to write sha1 filename %s: %s”
>> 
>> 2. git pull example
>> 
>> Textual git:master $ git pull
>> remote: Counting objects: 435, done.
>> remote: Compressing objects: 100% (398/398), done.
>> remote: Total 435 (delta 219), reused 18 (delta 12)
>> Receiving objects: 100% (435/435), 1.22 MiB | 756.00 KiB/s, done.
>> Resolving deltas: 100% (219/219), done.
>> warning: unable to unlink .git/objects/pack/tmp_pack_vDaIZa: Operation not 
>> permitted
>> error: unable to write sha1 filename 
>> .git/objects/pack/pack-977a2dc0f4be3996dc1186e565a30d55d14b5e87.pack: 
>> Operation not permitted
> I'm somewhat unsure how this is connected to 0444?
> 
> It seems as if you don't have write permission for some reason
> (on the parent directory). What does
> ls -ld  .git/objects/pack/
> ls -ld  .git/objects/
> give?
> 
> Can you run
> rm .git/objects/pack/pack-977a2dc0f4be3996dc1186e565a30d55d14b5e87.pack
> 
> on the command line?

No. I have write permission on all of the folders.
drwxr-xr-x  1 user  staff   264 Feb 17 11:05 .
drwxr-xr-x  1 user  staff   264 Jan 30 12:52 ..

It has nothing to do with my folder permissions. Like I said earlier, this only 
happens to people who use AFP shares.

When odb_mkstemp is called it sets the tmp idx & pack files to 0444, and when 
later functions like unlink_or_msg or finish_tmp_packfile try to unlink or 
rename those files, they fail.

It would be much faster and easier if you could try it on an AFP share, or I 
can talk you through it over irc @freenode #git (riaf^)

Regards,
Fairuzan





Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Torsten Bögershausen

On 02/17/2015 04:22 AM, Fairuzan Roslan wrote:

On Feb 17, 2015, at 3:08 AM, Matthieu Moy  wrote:

[ Please, don't top post on this list ]

Fairuzan Roslan  writes:


I don’t see the issue for the owner of his/her own file to have write
access.

Object and pack files are not meant to be modified. Hence, they are
read-only so that an (accidental) attempt to modify them fails.


Setting tmp idx & pack files to read-only even for the file owner is
not a safety feature.

Yes it is. If you do not think so, then please give some arguments.


You should at least give the user the option to set the permission in
the config file and not hardcoded the permission in the binary.

This is the kind of thing I meant by "investigate alternate solutions".
I have no AFP share to test, so it would help if you answered the
question I asked in my previous message:


On Feb 17, 2015, at 2:23 AM, Matthieu Moy  wrote:

Fairuzan Roslan  writes:


Hi,

Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
causing a lot of issues (unable to unlink/write/rename) to those
people who use AFP shares.

Is it a problem when using Git (like "git gc" failing to remove old
packs), or when trying to remove files outside Git?

(BTW, why did you try to write/rename pack files?)

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

I think it's easier if I just show you…

OS : OS X 10.10.0 - 10.10.2
Client :  git version 1.9.3 (Apple Git-50) and git version 2.2.1
AFP share : //user@hostname._afpovertcp._tcp.local/installer on 
/Volumes/installer (afpfs, nodev, nosuid, mounted by user)

1. git clone example

$ git clone https://github.com/robbyrussell/oh-my-zsh.git
Cloning into 'oh-my-zsh'...
remote: Counting objects: 11830, done.
remote: Total 11830 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (11830/11830), 2.12 MiB | 481.00 KiB/s, done.
Resolving deltas: 100% (6510/6510), done.
warning: unable to unlink 
/Volumes/installer/oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation not 
permitted
error: unable to write sha1 filename 
/Volumes/installer/oh-my-zsh/.git/objects/pack/pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.pack:
 Operation not permitted
fatal: cannot store pack file
fatal: index-pack failed

$ ls -l oh-my-zsh/.git/objects/pack
total 5008
-rw---  1 user  staff   32 Feb 17 09:59 
pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.keep
-r--r--r--  1 user  staff   332312 Feb 17 09:59 tmp_idx_oUN1sb
-r--r--r--  1 user  staff  2223007 Feb 17 09:59 tmp_pack_zjPxuc

$ rm -rf oh-my-zsh/.git/objects/pack/tmp_*
rm: oh-my-zsh/.git/objects/pack/tmp_idx_oUN1sb: Operation not permitted
rm: oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation not permitted

Detailed errors:
1. delete_ref_loose (refs.c) -> unlink_or_msg (wrapper.c) -> "unable to unlink %s: 
%s"
2. move_temp_to_file (sha1_file.c ) -> “unable to write sha1 filename %s: %s”

2. git pull example

Textual git:master $ git pull
remote: Counting objects: 435, done.
remote: Compressing objects: 100% (398/398), done.
remote: Total 435 (delta 219), reused 18 (delta 12)
Receiving objects: 100% (435/435), 1.22 MiB | 756.00 KiB/s, done.
Resolving deltas: 100% (219/219), done.
warning: unable to unlink .git/objects/pack/tmp_pack_vDaIZa: Operation not 
permitted
error: unable to write sha1 filename 
.git/objects/pack/pack-977a2dc0f4be3996dc1186e565a30d55d14b5e87.pack: Operation 
not permitted

I'm somewhat unsure how this is connected to 0444?

It seems as if you don't have write permission for some reason
(on the parent directory). What does
ls -ld  .git/objects/pack/
ls -ld  .git/objects/
give?

Can you run
rm .git/objects/pack/pack-977a2dc0f4be3996dc1186e565a30d55d14b5e87.pack

on the command line?




Re: Multi-threaded 'git clone'

2015-02-16 Thread Martin Fick
There currently is a thread on the Gerrit list about how much faster cloning 
can be when using Gerrit/jgit GCed packs with bitmaps versus C git GCed packs 
with bitmaps.

Some differences outlined are that jgit seems to have more bitmaps; it creates 
one for every ref under refs/heads. Is C git doing that?  Another difference 
seems to be that jgit creates two packs, splitting stuff not reachable from 
refs/heads into its own pack.  This makes a clone cost zero CPU server side in 
the pristine case.  In the Gerrit use case, this second "unreachable" packfile 
can be sizeable; I wonder if there are other use cases where this might also be 
the case (and thus slow down clones for C git GCed repos)?

If there is not a lot of parallelism left to squeak out, perhaps a focus with 
better returns is trying to do whatever is possible to make all clones (and 
potentially any fetch use case deemed important on a particular server) have 
zero CPU?  Depending on what a server's primary mission is, I could envision 
certain admins willing to sacrifice significant amounts of disk space to speed 
up their fetches.  Perhaps some more extreme thinking (such as what must have 
led to bitmaps) is worth brainstorming about to improve server use cases?

What if an admin were willing to sacrifice a packfile for every use case he 
deemed important, could git be made to support that easily?  For example, maybe 
the admin considers a clone or a fetch from master to be important, could zero 
percent CPU be achieved regularly for those two use cases?  Cloning is possible 
if the repository were repacked in the jgit style after any push to a head.  Is 
it worth exploring ways of making GC efficient enough to make this feasible?  
Can bitmaps be leveraged to make repacking faster?  I believe that at least 
reachability checking could potentially be improved with bitmaps? Are there 
potentially any ways to make better deltification reuse during repacking (not 
bitmap related), by somehow reversing or translating deltas to new objects that 
were just received, without actually recalculating them, but yet still getting 
most objects deltified against the newest objects (achieving the same packs as 
git GC would achieve today, but faster)? What other pieces need to be improved 
to make repacking faster?
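For context, stock C git can already write reachability bitmaps at repack time; a minimal config sketch (option names as documented in git-config(1); whether they help a given server is exactly the kind of question raised above):

```ini
[repack]
	# write a .bitmap alongside the pack on "git repack -ad" / "git gc"
	writeBitmaps = true
[pack]
	# add a name-hash cache to the bitmap to improve delta reuse when
	# serving fetches from bitmapped packs
	writeBitmapHashCache = true
```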

As for the single branch fetch case, could this somehow be improved by 
allocating one or more packfiles to this use case?  The simplest single branch 
fetch use case is likely someone doing a git init followed by a single branch 
fetch.  I think the android repo tool can be used in this way, so this may 
actually be a common use case?  With a packfile dedicated to this branch, git 
should be able to just stream it out without any CPU.  But I think git would 
need to know this packfile exists to be able to use it.  It would be nice if 
bitmaps could help here, but I believe bitmaps can so far only be used for one 
packfile.  I understand that making bitmaps span multiple packfiles would be 
very complicated, but maybe it would not be so hard to support bitmaps on 
multiple packfiles if each of these were "self contained"?  By self contained I 
mean that all objects referenced by objects in the packfile were contained in 
that packfile.

What other still unimplemented caching techniques could be used to improve 
clone/fetch use cases? 

- Shallow clones (dedicate a special packfile to this, what about another 
bitmap format that only maps objects in a single tree to help this)?

- Small fetches (simple branch FF updates), I suspect these are fast enough, 
but if not, maybe caching some thin packs (that could result in zero CPU 
requests for many clients) would be useful?  Maybe spread these out 
exponentially over time so that many will be available for recent updates and 
fewer for older updates?  I know git normally throws away thin packs after 
receiving them and resolving them, but if it kept them around (maybe in a 
special directory), it seems that they could be useful for updating other 
clients with zero CPU?  A thin pack cache might be something really easy to 
manage based on file timestamps, an admin may simply need to set a max cache 
size.  But how can git know what thin packs it has, and what they would be 
useful for, name them with their start and ending shas?

Sorry for the long-winded rant. I suspect that some variation of all my 
suggestions has already been suggested, but maybe they will rekindle some 
older, now useful thoughts, or inspire some new ones.  And maybe some of these 
are better to pursue than more parallelism?

-Martin

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a 
Linux Foundation Collaborative Project

On Feb 16, 2015 8:47 AM, Jeff King  wrote:
>
> On Mon, Feb 16, 2015 at 07:31:33AM -0800, David Lang wrote: 
>
> > >Then the server streams the data to the client. It might do some light 
> > >work transforming the data as it comes off the disk, but

Potential Bug: git merge overwrites uncommitted changes

2015-02-16 Thread Martin Maas
Hi all,

Teaching a university class using git, we encountered a case of
potentially incorrect behavior when students made changes to a file,
forgot to commit them, and then pulled a new one-line change from our
remote repository. This resulted in the uncommitted changes being
overwritten without a warning.

To our understanding, the expected behavior should have been a warning
that uncommitted files are being overwritten, or an auto-merge that
preserves the changes in the uncommitted files. Instead, git shows the
merge as a single-line change, while in reality discarding potentially
a large number of uncommitted lines.

I have attached a script that replicates this behavior -- we have been
able to replicate the problem with git versions 1.9.1, 1.9.3 and
2.2.2. Please let us know whether this is a bug, or whether this is
the intended behavior.

It appears that it is only this specific sequence of commands that
causes the behavior. Across a range of small modifications to the
sequence of steps, the behavior is as expected:

* using git cp and dropping the add step -> modifications preserved
* not doing commit 2 (or its changes) -> modifications live in new file
* any branch order other than "test first, master second" ->
modifications preserved

(Asking colleagues, one of them pointed me to the following article
which describes a potentially related problem that appears to have
been fixed in 1.7.7:
http://benno.id.au/blog/2011/10/01/git-recursive-merge-broken)

Thanks,
Martin

---

git-test.sh: Run in a clean directory!

#!/bin/bash

# Replicate the two different states of hw2_starter, before and after
# our update.
git clone g...@github.com:cs61c-spring2015/hw2_starter.git
cd hw2_starter
git checkout f8a2e4418b4c370921790d1dfd1b6f9761262d4a
git checkout -b test
cd ..

# Set up the cs61c-xx repository, and fetch "first" hw2_starter.
mkdir cs61c-test
cd cs61c-test
git init
git remote add hw2_starter ../hw2_starter/
git fetch hw2_starter
git merge hw2_starter/test -m "add hw2 changes"

# Perform a series of commits that some students accidentally did.
mkdir hw2
git mv hw1 hw2
git commit -a -m "commit 1"

cp hw2/hw1/* hw2/
git add hw2/*
git commit -a -m "commit 2"

# Now we make changes to the beargit.c file, but don't commit them.
for i in `seq 1 5`; do
echo "beargitcode line $i" >> hw2/beargit.c
done

echo
echo
echo " *** SHOULD HAVE A LINE CONTAINING beargitcode IN beargit.c ***"
echo
echo "CONTENT OF beargit.c | grep beargitcode BEFORE MERGE:"
cat hw2/beargit.c | grep beargitcode
echo "[EOF]"
echo

# Now we fetch the update, as we told students
git fetch hw2_starter
git merge hw2_starter/master -m "add hw2 changes"

# This should fail, because there are uncommitted changes!!!

echo
echo "CONTENT OF beargit.c | grep beargitcode AFTER MERGE:"
cat hw2/beargit.c | grep beargitcode
echo "[EOF]"
echo


Re: Pack v4 again..

2015-02-16 Thread Shawn Pearce
On Sun, Feb 15, 2015 at 10:45 PM, Jeff King  wrote:
> On Sun, Feb 15, 2015 at 11:59:02PM -0500, Nicolas Pitre wrote:
>
>> Yet, I think the biggest problem with pack v4 at the moment is the
>> packing algorithm for tree objects.  We are piggy-backing on the pack v2
>> object delta compression sorting and that produces suboptimal results
>> due to deep recursions.  And it is the recursion that kills us. The pack
>> v4 requires a new packing algorithm for its tree objects.
>>
>> What I imagined is something like this:
>>
>> - Each canonical tree entry is made of a SHA1, mode and path.  Let's
>>   assume this is hashed into a 24-bit value.
>>
>> - Each tree object can therefore be represented as a string of 24-bit
>>   "characters".
>>
>> - Delta-compressing a tree object becomes a substring search where we
>>   try to replace a sequence of "characters" with the longest "string"
>>   possible from another object.  Repeat with the remaining sequences.
>
> Somewhat related to this, I was playing this weekend with the idea of
> generating fast tree diffs from our on-disk deltas. That is, given a
> base tree and a binary delta against it, could I reliably reproduce a
> diff (one way or the other) in O(size of diff), rather than O(size of
> tree)?

Yes, if you always make the tree diff *search* on entry boundaries.

> The conclusion I came to was "no, you cannot do it in the general case
> of byte-wise binary diffs"[2].

This is also why you cannot binary search inside of the canonical tree
format. :(

> If we knew that our deltas were always produced on entry-boundaries (a
> "character" in your description above), this would be much simpler.

Eons ago Nico and I were of the opinion that pack v4 trees could use
the existing byte based delta format on disk, but the delta
search/encoder would always align to fixed width entry boundaries.
That gives you deltas that are understandable by the current decoder,
but are also trivially processed in delta format as insertions and
copy runs always cover complete entries and are never a partial entry.

It was all theory; we never actually wrote a prototype of that.
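The entry-aligned scheme discussed above can be sketched roughly like this (an illustration only, not git's on-disk delta format: entries are assumed to be pre-hashed into the 24-bit symbols Nicolas describes, and the encoder is a naive greedy search rather than anything production-grade):

```c
#include <stddef.h>
#include <stdint.h>

/* One delta op, always covering whole tree entries: either copy a run
 * of entries from the base tree, or insert one new entry symbol. */
struct op {
	int is_copy;
	size_t off, len;	/* valid when is_copy */
	uint32_t sym;		/* valid when !is_copy */
};

/* Naive greedy encoder: for each position in the target, find the
 * longest matching run anywhere in the base and emit a copy op,
 * otherwise insert a single entry.  Ops never split an entry. */
static size_t encode(const uint32_t *base, size_t nb,
		     const uint32_t *tgt, size_t nt, struct op *ops)
{
	size_t i = 0, nops = 0;

	while (i < nt) {
		size_t j, best_j = 0, best = 0;

		for (j = 0; j < nb; j++) {
			size_t n = 0;
			while (j + n < nb && i + n < nt &&
			       base[j + n] == tgt[i + n])
				n++;
			if (n > best) {
				best = n;
				best_j = j;
			}
		}
		if (best) {
			ops[nops].is_copy = 1;
			ops[nops].off = best_j;
			ops[nops].len = best;
			nops++;
			i += best;
		} else {
			ops[nops].is_copy = 0;
			ops[nops].sym = tgt[i];
			nops++;
			i++;
		}
	}
	return nops;
}

/* Decoder: because ops always cover complete entries, the target tree
 * can be consumed entry by entry straight from the delta, which is the
 * property the thread wants for O(size of diff) tree diffing. */
static size_t apply_ops(const uint32_t *base, const struct op *ops,
			size_t nops, uint32_t *out)
{
	size_t n = 0, k, t;

	for (k = 0; k < nops; k++) {
		if (ops[k].is_copy)
			for (t = 0; t < ops[k].len; t++)
				out[n++] = base[ops[k].off + t];
		else
			out[n++] = ops[k].sym;
	}
	return n;
}
```

Reading the ops directly (copy runs = unchanged entries, inserts = added/changed entries) yields the tree diff without re-parsing the canonical format, which is what byte-wise deltas cannot guarantee.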

> [1] Of course there are other reachability checks besides packing, like
> git-prune. But all of those are even better sped up by using
> reachability bitmaps. Packing, as the place where we generate the
> bitmaps, and which is sensitive to things like traversal order,
> remains the place where we will always need to actually walk.

`git log -- $path` isn't trivially improved with reachability bitmaps.
It's something we have been pondering a lot at $DAY_JOB and haven't
found a magic-bullet solution for yet. Until someone comes up with
another chunk of magic, we need tree diffs for a lot of operations.

> [2] One option, of course, is to generate byte-wise deltas, but with a
> promise to always align them on entry boundaries. I'm tempted by
> this, because the result would be readable by existing packv2
> readers. We'd have to set a flag somewhere that indicates the pack
> was written with this property, though.

Yes, that was always something we wanted to look at doing.

> [3] I suspect you could come up with some heuristic that finds tree
> entry boundaries with high probability, and in the low probability
> case does not produce a wrong answer, but instead has to walk all
> the way back to the beginning of the tree. That would be fine here.
> But frankly, this "walk backwards" thing was just the straw that
> broke the camel's back for me in working on this. Handling all the
> possible cases was ending up quite complicated.

No, I tried this in JGit once. You can't do it reliably enough.


Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Fairuzan Roslan

> On Feb 17, 2015, at 3:08 AM, Matthieu Moy  
> wrote:
> 
> [ Please, don't top post on this list ]
> 
> Fairuzan Roslan  writes:
> 
>> I don’t see the issue for the owner of his/her own file to have write
>> access.
> 
> Object and pack files are not meant to be modified. Hence, they are
> read-only so that an (accidental) attempt to modify them fails.
> 
>> Setting tmp idx & pack files to read-only even for the file owner is
>> not a safety feature.
> 
> Yes it is. If you do not think so, then please give some arguments.
> 
>> You should at least give the user the option to set the permission in
>> the config file and not hardcoded the permission in the binary.
> 
> This is the kind of thing I meant by "investigate alternate solutions".
> I have no AFP share to test, so it would help if you answered the
> question I asked in my previous message:
> 
>>> On Feb 17, 2015, at 2:23 AM, Matthieu Moy  
>>> wrote:
>>> 
>>> Fairuzan Roslan  writes:
>>> 
 Hi,
 
 Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
 causing a lot of issues (unable to unlink/write/rename) to those
 people who use AFP shares.
>>> 
>>> Is it a problem when using Git (like "git gc" failing to remove old
>>> packs), or when trying to remove files outside Git?
> 
> (BTW, why did you try to write/rename pack files?)
> 
> --
> Matthieu Moy
> http://www-verimag.imag.fr/~moy/

I think it's easier if I just show you…

OS : OS X 10.10.0 - 10.10.2
Client :  git version 1.9.3 (Apple Git-50) and git version 2.2.1
AFP share : //user@hostname._afpovertcp._tcp.local/installer on 
/Volumes/installer (afpfs, nodev, nosuid, mounted by user)

1. git clone example

$ git clone https://github.com/robbyrussell/oh-my-zsh.git
Cloning into 'oh-my-zsh'...
remote: Counting objects: 11830, done.
remote: Total 11830 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (11830/11830), 2.12 MiB | 481.00 KiB/s, done.
Resolving deltas: 100% (6510/6510), done.
warning: unable to unlink 
/Volumes/installer/oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation not 
permitted
error: unable to write sha1 filename 
/Volumes/installer/oh-my-zsh/.git/objects/pack/pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.pack:
 Operation not permitted
fatal: cannot store pack file
fatal: index-pack failed

$ ls -l oh-my-zsh/.git/objects/pack
total 5008
-rw---  1 user  staff   32 Feb 17 09:59 
pack-cceafdc9ef02bc58844138ba543ec6cc38252bb1.keep
-r--r--r--  1 user  staff   332312 Feb 17 09:59 tmp_idx_oUN1sb
-r--r--r--  1 user  staff  2223007 Feb 17 09:59 tmp_pack_zjPxuc

$ rm -rf oh-my-zsh/.git/objects/pack/tmp_*
rm: oh-my-zsh/.git/objects/pack/tmp_idx_oUN1sb: Operation not permitted
rm: oh-my-zsh/.git/objects/pack/tmp_pack_zjPxuc: Operation not permitted

Detailed errors:
1. delete_ref_loose (refs.c) -> unlink_or_msg (wrapper.c) -> "unable to unlink 
%s: %s"
2. move_temp_to_file (sha1_file.c ) -> “unable to write sha1 filename %s: %s”

2. git pull example

Textual git:master $ git pull
remote: Counting objects: 435, done.
remote: Compressing objects: 100% (398/398), done.
remote: Total 435 (delta 219), reused 18 (delta 12)
Receiving objects: 100% (435/435), 1.22 MiB | 756.00 KiB/s, done.
Resolving deltas: 100% (219/219), done.
warning: unable to unlink .git/objects/pack/tmp_pack_vDaIZa: Operation not 
permitted
error: unable to write sha1 filename 
.git/objects/pack/pack-977a2dc0f4be3996dc1186e565a30d55d14b5e87.pack: Operation 
not permitted
fatal: cannot store pack file
fatal: index-pack failed

Textual git:master $ ls -l .git/objects/pack/tmp_*
-r--r--r--  1 user  staff13252 Feb 17 10:51 .git/objects/pack/tmp_idx_uhnicb
-r--r--r--  1 user  staff  1275487 Feb 17 10:51 
.git/objects/pack/tmp_pack_vDaIZa

= Same explanation as git clone example

3. git gc example

Textual git:master $ git gc
Counting objects: 49691, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11347/11347), done.
fatal: unable to rename temporary pack file: Operation not permitted
error: failed to run repack

Textual git:master $ ls -l .git/objects/pack/tmp_*
-r--r--r--  1 user  staff   1392420 Feb 17 10:58 
.git/objects/pack/tmp_idx_77nr1b
-r--r--r--  1 user  staff  96260304 Feb 17 10:58 
.git/objects/pack/tmp_pack_RlAZc9

Detailed error:
1. finish_tmp_packfile (pack-write.c) -> die_errno(“unable to rename temporary 
pack file”);


If you insist on setting the tmp idx & pack file permissions to 0444, at least 
restore u+w before unlinking or renaming them so those operations won't fail.
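The workaround proposed here could look something like the following hedged sketch (force_unlink is a hypothetical helper for illustration, not a function in git; on local POSIX filesystems the first unlink already succeeds for 0444 files, so the retry path only matters on filesystems like AFP that refuse to remove read-only files):

```c
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: temp pack/idx files are created 0444, and some
 * filesystems (AFP among them) refuse to unlink read-only files.
 * Retry once after restoring the owner's write bit.
 */
static int force_unlink(const char *path)
{
	if (!unlink(path))
		return 0;
	if (errno != EPERM && errno != EACCES)
		return -1;
	if (chmod(path, 0644))	/* restore u+w, keep others read-only */
		return -1;
	return unlink(path);
}
```

The same chmod-before-retry idea would apply to the rename done by finish_tmp_packfile.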

Regards,
Fairuzan









Re: Multi-threaded 'git clone'

2015-02-16 Thread Shawn Pearce
On Mon, Feb 16, 2015 at 10:43 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> ... And the whole output is checksummed by a single sha1
>> over the whole stream that comes at the end.
>>
>> I think the most feasible thing would be to quickly spool it to a
>> server on the LAN, and then use an existing fetch-in-parallel tool
>> to grab it from there over the WAN.
>
> One possibility would be for the server to prepare a static bundle
> file to bootstrap all the "clone" clients with and publish it on a
> CDN.  A protocol extension would tell the client where to download
> the bundle from, the client can then grab the bundle to clone from
> it to become "slightly stale but mostly up to date", and then do a
> usual incremental update with the server after that to be complete.
>
> The server would update the bundle used to bootstrap clone clients
> periodically in order to keep the incrementals to the minimum, and
> would make sure their bitmap is anchored at the tips of bundles to
> minimize the object counting load during the incremental phase.
>
> I think "repo" used by folks who do AOSP does something similar to
> that by scripting around "git clone".  I'd imagine that they would
> be happy to see if "git clone" did all that inside.

Yes, the "repo" tool used by Android uses curl to download a
previously cached $URL/clone.bundle using resumable HTTP. For Android
the file is only updated ~every 6 months at major releases and is
easily cached by CDNs and HTTP proxy servers.

This is spooled to a temporary file on disk then unpacked using `git
fetch $path/clone.bundle refs/heads/*:refs/remotes/origin/*`.
Afterwards a normal git fetch is run to bring the new clone current
with the server, picking up any delta that happened since the bundle
was created and cached.

The Android Git servers at android.googlesource.com just recognize
*/clone.bundle GET requests and issue 302 redirects to the CDN farm
that actually stores and serves the precreated bundle files.

We really want to see this in stock git clone for HTTP transports, as
other projects like Chromium want to use it for their ~3 GiB
repository. Being able to build the bulk of the repo every few months
and serve it out using a CDN to bootstrap new clients would really
help developers on slower or flaky network connections.


Re: Multi-threaded 'git clone'

2015-02-16 Thread Jeff King
On Tue, Feb 17, 2015 at 06:16:39AM +0700, Duy Nguyen wrote:

> On Mon, Feb 16, 2015 at 10:47 PM, Jeff King  wrote:
> > Each clone generates the pack on the fly
> > based on what's on disk and streams it out. It should _usually_ be the
> > same, but there's nothing to guarantee byte-for-byte equality between
> > invocations.
> 
> It's usually _not_ the same. I tried when I wanted to produce stable
> packs. The first condition is single-threaded pack-objects. Otherwise
> thread scheduler could make object order unpredictable.

True. If you keep your server repositories fully packed, that eliminates
the delta search (and/or makes it feasible to turn pack.threads to 1 to
make it deterministic). But any change in the repository (e.g., somebody
else pushing, even to a ref you are not fetching) can cause unexpected
changes in the bytes.
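For reference, the single-threaded condition described above maps to a config knob (a sketch; see pack.threads in git-config(1)):

```ini
[pack]
	# make object ordering in generated packs deterministic,
	# at the cost of slower single-threaded delta search
	threads = 1
```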

-Peff


Re: Should "git log --decorate" indicate whether the HEAD is detached?

2015-02-16 Thread Julien Cretel
On Mon, Feb 16, 2015 at 11:15 PM, Junio C Hamano  wrote:
> Julien Cretel  writes:
>
>> As of Git 2.3.0, the output of "git log --decorate" is ambiguous as to
>> whether the HEAD is detached or not.
>
> It sounds as if you are reporting some regression, but has any
> version of Git ever done so, or is this just a new feature that
> does not exist yet?

Apologies; I should have explained myself better. I'm not reporting a
regression; as far as I can tell, "git log --decorate" has always been
ambiguous in that way.

>
>> More specifically, consider the following output of "git log --decorate":
>>
>> 4d860e9 (HEAD, master, dev) Remove trailing whitespace
>>
>> Whether the HEAD is attached to master or detached, the output is the same.
>> Could/should "git log --decorate" be modified to provide this information?
>> Perhaps something along the lines of
>>
>> 4d860e9 (HEAD -> master, dev) Remove trailing whitespace
>>
>> or
>>
>> 4d860e9 (HEAD = master, dev) Remove trailing whitespace
>>
>
> I personally do not see a need for such a differenciation.  Why does
> one even need to know, and is it worth the cost of computing at the
> runtime?

I believe the "--decorate" flag to be quite popular. I personally like to run
"git log --decorate --graph --oneline --all" to quickly get an idea of the state
of a repo. In my experience, many users do the same, to the point that they
feel the need to define an alias for this command; see the top answers to
http://stackoverflow.com/q/1057564/2541573.

My problem with the current output of "git log --decorate" is the asymmetry,
so to speak. If the HEAD points at a commit that isn't any branch's tip, then
the user can be sure the HEAD is detached; however, if at least one branch
points to the current commit, there is no way to tell.

I must admit I haven't given much thought about the cost involved, but I can't
imagine performance would take a big hit. Would it?

>
> Most of the time when I am on detached HEAD it is either a few
> commits behind a tip, or a few commits ahead of a tip.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] gitk: Remove tcl-format flag from a message that shouldn't have it

2015-02-16 Thread Alex Henrie
2015-02-09 14:55 GMT-07:00 Junio C Hamano :
>
> Alex Henrie  writes:
>
> > This is just a friendly reminder that this patch has been sitting in
> > the mailing list archives for a couple of weeks, and it has not yet
> > been accepted or commented on.
>
> I think that is because the message was not sent to the right
> people, and also because the patch was made against a wrong project
> ;-).
>
> I'll forward it to the gitk maintainer after digging it out of the
> archive and tweaking it.  Thanks.
>
> Paul, comments?

Another week and still no comments on either this patch or the gitk
Catalan translation patch. Is Paul Mackerras still actively involved
in the project?

-Alex
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Multi-threaded 'git clone'

2015-02-16 Thread Duy Nguyen
On Mon, Feb 16, 2015 at 10:47 PM, Jeff King  wrote:
> Each clone generates the pack on the fly
> based on what's on disk and streams it out. It should _usually_ be the
> same, but there's nothing to guarantee byte-for-byte equality between
> invocations.

It's usually _not_ the same. I tried this when I wanted to produce stable
packs. The first condition is single-threaded pack-objects. Otherwise the
thread scheduler could make object order unpredictable.
-- 
Duy
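For reference, the single-threaded condition Duy mentions maps to the
pack.threads setting; a minimal sketch in a throwaway repository (paths and
identities are illustrative, and this only removes the thread-scheduling
source of nondeterminism — it does not guarantee identical packs across
git versions):

```shell
# Repack with a single pack-objects thread so object order does not
# depend on thread scheduling.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m init
git -c pack.threads=1 repack -adq
ls .git/objects/pack/
```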


Re: Should "git log --decorate" indicate whether the HEAD is detached?

2015-02-16 Thread Junio C Hamano
Julien Cretel  writes:

> As of Git 2.3.0, the output of "git log --decorate" is ambiguous as to
> whether the HEAD is detached or not.

It sounds as if you are reporting some regression, but has any
version of Git ever done so, or is this just a new feature that
does not exist yet?

> More specifically, consider the following output of "git log --decorate":
>
> 4d860e9 (HEAD, master, dev) Remove trailing whitespace
>
> Whether the HEAD is attached to master or detached, the output is the same.
> Could/should "git log --decorate" be modified to provide this information?
> Perhaps something along the lines of
>
> 4d860e9 (HEAD -> master, dev) Remove trailing whitespace
>
> or
>
> 4d860e9 (HEAD = master, dev) Remove trailing whitespace
>

I personally do not see a need for such a differentiation.  Why does
one even need to know, and is it worth the cost of computing at
runtime?

Most of the time when I am on detached HEAD it is either a few
commits behind a tip, or a few commits ahead of a tip.


Should "git log --decorate" indicate whether the HEAD is detached?

2015-02-16 Thread Julien Cretel
As of Git 2.3.0, the output of "git log --decorate" is ambiguous as to
whether the HEAD is detached or not.
More specifically, consider the following output of "git log --decorate":

4d860e9 (HEAD, master, dev) Remove trailing whitespace

Whether the HEAD is attached to master or detached, the output is the same.
Could/should "git log --decorate" be modified to provide this information?
Perhaps something along the lines of

4d860e9 (HEAD -> master, dev) Remove trailing whitespace

or

4d860e9 (HEAD = master, dev) Remove trailing whitespace

in case HEAD is attached to master, and the current output if it is
detached? Any thoughts?
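(For what it's worth, the distinction is already observable from a script
today via symbolic-ref, even though --decorate hides it; a minimal sketch,
assuming a POSIX shell and git on PATH, with illustrative names:)

```shell
# Attached vs. detached HEAD, observable via git symbolic-ref.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m init

# Attached: HEAD is a symbolic ref pointing at a branch.
git symbolic-ref -q HEAD              # prints e.g. refs/heads/master

# Detached at the very same commit: symbolic-ref fails, rev-parse still works.
git checkout -q --detach
git symbolic-ref -q HEAD || echo "HEAD is detached"
```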
(For information, I asked a related question on Stack Overflow a few
months ago: http://stackoverflow.com/q/25392580/2541573.)

Also, if such a feature is desirable, would its implementation be a
suitable "microproject" for a prospective GSoC-2015 applicant?

Jubobs


Re: [PATCH 0/7] migrate api-strbuf.txt into strbuf.h

2015-02-16 Thread Junio C Hamano
Jeff King  writes:

>> Is there a general concensus on the direction?
>> 
>> I am inclined to merge this to 'next', if there is a general
>> understanding that we will try to make the headers _the_ single
>> source of truth of the API by (1) not adding to api-*.txt without
>> describing new things in the headers and (2) moving things from
>> api-*.txt to corresponding headers when clarifying, fixing or
>> updating the API.
>
> I'm fine with that (unsurprisingly), but I would like to hear an "OK"
> from Jonathan before going ahead.

OK.  Jonathan?



[PATCH] send-email: ask confirmation if given encoding name is very short

2015-02-16 Thread Junio C Hamano
Sometimes people respond "y" (or "yes") when asked
this question:

Which 8bit encoding should I declare [UTF-8]?

We already have a mechanism to avoid accepting a mistyped e-mail
address (we ask to confirm when the given address lacks "@" in it);
reuse it to trigger the same confirmation when given a very short
answer.  As a typical charset name is probably at least 4 chars long
longer (e.g. "UTF8" spelled without the dash, or "Big5"), this would
prevent such a mistake.

Signed-off-by: Junio C Hamano 
---

 * Will mark to be merged to 'next'.

 git-send-email.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/git-send-email.perl b/git-send-email.perl
index fdb0029..eb32371 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -733,6 +733,7 @@ sub file_declares_8bit_cte {
print "$f\n";
}
 	$auto_8bit_encoding = ask("Which 8bit encoding should I declare [UTF-8]? ",
+				  valid_re => qr/.{4}/, confirm_only => 1,
 				  default => "UTF-8");
 }
 
-- 
2.3.0-282-gf18c841
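The qr/.{4}/ check amounts to "is the answer at least four characters long";
a rough shell analogue of the regex, with the same sample answers:

```shell
# Answers of four or more characters pass qr/.{4}/; shorter ones
# (like a mistyped "y" or "yes") would now prompt for confirmation.
printf '%s\n' y yes UTF8 Big5 UTF-8 | grep -E '.{4}'
# -> UTF8, Big5, UTF-8
```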



Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-16 Thread Junio C Hamano
Karsten Blees  writes:

> However, the Makefile has this to say on the subject:
>
> # Define USE_NSEC below if you want git to care about sub-second file mtimes
> # and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
> # it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
> # randomly break unless your underlying filesystem supports those sub-second
> # times (my ext3 doesn't).
>
> Am I missing something?

I think "it would break" is about show-diff which wanted to use the
cached stat information for freshness.

>foo
git update-index --add foo
sleep 2
>foo
git diff-files ;# modern counterpart of show-diff

would say that "foo" is *different*, because the plumbing commands
like diff-files expect you to refresh the index before you call
them.

And if you did "git update-index --refresh" after touching "foo" the
last time before running "git diff-files" in the above sequence, you
should expect that it does not say "foo" is different, no matter how
much time passes between the time you run that "refresh" and
"diff-files" (or between the time you last touched "foo" and you run
"refresh", for that matter), as long as you do not touch "foo" in
the meantime.  The following should say "foo" is *not* different,
that is:

>foo
git update-index --add foo
sleep 2
>foo
sleep arbitrary
git update-index --refresh
sleep arbitrary
git diff-files ;# modern counterpart of show-diff

If you use NSEC, however, and "refresh" grabbed a subsecond time and
then later "diff-files" learned a truncated/rounded time because the
filesystem later purged the cached inodes and re-read it from the
underlying filesystem with no subsecond time resolution, the times
would not match so you will again see "diff-files" report that "foo"
is now different.

That is what the comment you cited is about.



Re: [PATCH 01/24] dir.c: optionally compute sha-1 of a .gitignore file

2015-02-16 Thread Junio C Hamano
Duy Nguyen  writes:

> On Thu, Feb 12, 2015 at 4:23 AM, Junio C Hamano  wrote:
> ...
>> If you want to detect the content changes across working tree, index
>> and the tree objects by reusing hash_sha1_file(), however, you must
>> not feed the checked out (aka "smudged") representation to it.
>> You'd need to turn it into "cleaned" representation by doing the
>> equivalent of calling index_path().  Some helpers in the callchain
>> that originates from index_path() might directly be reusable for
>> your purpose.
>
> Urgh.. you're right this test would fail when some filters are
> involved. I'm not sure if we want to check the cleaned version though.
> What matters to the exclude machinery is the checked-out version. 

Oh, I wouldn't suggest getting lines from the cleaned version.  It
is just that you must hash the cleaned version if you want to decide
"Ah, the content is different from what the internal cache is based
on, so I need to invalidate my cache" and "Because the version I
have on the filesystem matches what is in the index, which is what
my cache is based on, I would use my cached version".  The latter
would break (i.e. the signature would not match when it should) and
you end up invalidating the cache when you do not have to.
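The smudged/cleaned hash difference can be seen directly with hash-object,
which applies attribute-based conversion by default (file names and the
CRLF/text setup below are illustrative):

```shell
# A .gitignore with CRLF line endings and a "text" attribute: hashing the
# raw working-tree bytes and hashing the cleaned representation differ.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
printf 'pattern\r\n' >.gitignore
echo '.gitignore text' >.gitattributes

smudged=$(git hash-object --no-filters .gitignore)  # raw CRLF bytes
cleaned=$(git hash-object .gitignore)               # CRLF normalized to LF
test "$smudged" != "$cleaned" && echo "smudged and cleaned hashes differ"
```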



Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-16 Thread Karsten Blees
Am 13.02.2015 um 20:28 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Am 13.02.2015 um 00:38 schrieb Junio C Hamano:
>>>
>>> We do have sec/nsec fields in cache_time structure, so I have
>>> nothing against updating the msysGit port to fill that value.
> 
> Having said that, we do not enable the NSEC stuff by default on Unix
> for a reason.  I'd expect those who know Windows filesystems well to
> pick the default there wisely ;-)
> 

Now I'm a bit confused about the discrepancy between racy-git.txt and
the Makefile.

Racy-git.txt explains that the nsec-part may be dropped when an inode
is flushed to disk if the file system doesn't support nsec resolution.
This was supposedly an issue with the Linux kernel fixed back in 2005.

In my understanding, this means that git would see the file as
changed and re-check the content (i.e. it will hurt performance).

IOW: Git may be slow if the file system cache has better file time
resolution than the on-disk file system representation.


However, the Makefile has this to say on the subject:

# Define USE_NSEC below if you want git to care about sub-second file mtimes
# and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
# it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
# randomly break unless your underlying filesystem supports those sub-second
# times (my ext3 doesn't).

Am I missing something? Is there anything in Git that will actually
"break" with USE_NSEC if the OS / file system doesn't support it
(rather than just being slow)?

History:
* The Makefile comment was added in 2005 (bdd4da59), along with a
  comment in read-cache.c explaining the issue (i.e. flushing to disk
  will clear the nsec field).
* The comment in read-cache.c was removed in 2008 (7a51ed66),
  seemingly dropping USE_NSEC support entirely.
* USE_NSEC support was re-added (without the read-cache.c comment) in
  2009 (fba2f38a).


Regarding the Windows situation: I've just verified (on my Win7 x64
box) that file times obtained through a variety of APIs (GetFileTime,
GetFileAttributesEx, GetFileInformationByHandle, FindFirstFile) are
consistent and properly rounded to the file system's resolution (e.g.
10ms / 2s for FAT). This is even if the file is still open and I try
to SetFileTime() to unrounded values.

So I think enabling USE_NSEC should be fine on Windows.



Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Torsten Bögershausen
On 16.02.15 20:06, Junio C Hamano wrote:
> Matthieu Moy  writes:
> 
>> The issue is that having object and pack files read-only on the
>> filesystem is a safety feature to prevent accidental modifications (even
>> though it's actually not that effective, since brute-force "sed -i" or
>> "perl -i" still accept to modify read-only files).
> 
> I did not see it as a "safety feature", and instead saw it as a
> reminder to me that I am not supposed to write into them when I
> check them with "ls -l".
> 
>> So, I'd be a bit reluctant to remove this safety feature for all users
>> if it's only for the benefit of a minority of users. Not that I think
>> the problem shouldn't be fixed, but I'd rather investigate alternate
>> solutions before using this mode = 0644.
> 
> I fully agree with you that this should not be unconditional.
> However, I am not sure if there is an effective workaround to a
> filesystem that pays attention to the mode bits of the file when
> doing an operation on the directory the file is sitting within.  It
> may be OK to introduce a new configuration variable, perhaps call it
> core.brokenFileSystemNeedsWritableFile or something, and probe and
> enable it inside init_db().
> 
> But I suspect that the single "mode = 0444" under discussion may not
> cover all places we create files, as the assumption that we get
> sane semantics from the filesystem permeates throughout the code.
> 
> What other glitches does AFP have?  Does it share Windows' "you
> cannot rename a file you have an open file descriptor on?"  Anything
> else?

May I ask which OS you have on the server side, and which on the client side?

I'm aware that Mac OS "speaks" AFP, but even Linux can, and there is
software which enables AFP on a Windows machine (all this is server side).

As a client we may have Mac OS or Linux (not sure if Windows can use AFP).
What do you use?




Re: [PATCH 0/3] request-pull: do something if $3 is passed

2015-02-16 Thread Junio C Hamano
Paolo Bonzini  writes:

> From: Paolo Bonzini 
>
> After updating to git 2.3.0, "git request-pull" is stubbornly complaining
> that I lack a matching tag on the remote side unless I pass the third
> argument.  But I did prepare and push a signed tag.

A few questions.

 - what old version did you update from?  I think the "correct
   over-eager dwimming" change was from v2.0 days.

 - what exactly do you mean by "stubbornly complain"?  I think we
   say something about HEAD not matching the HEAD over there, which
   I think is bogus (we should instead say things about the branch
   you are on and the branch over there with the same name) and is
   worth fixing.

> This looks like a bug to me; when $3 is not passed git will try to use
> "HEAD" as the default but it cannot be resolved to a tag, neither locally
> (patch 2) nor remotely (patch 3).

An earlier 024d34cb (request-pull: more strictly match local/remote
branches, 2014-01-22) deliberately disabled over-eager DWIMming when
the $3-rd argument _is_ given.  It didn't say much about what should
happen when it is missing.

I am torn about your changes.

One part of me feel that not giving the $3-rd argument should behave
the same way as if you gave the name of the current branch as the
$3-rd argument.  DWIMming from local HEAD to a local branch name
(e.g. 'master') may be OK and necessary (I already said it is worth
fixing above).  But we should not be resurrecting the over-eager
DWIMming from that point---not from a local branch name to a tag
that points at it, which was what 024d34cb wanted to forbid.

On the other hand, I can also understand (not necessarily agree
with) a view that not giving the $3-rd argument is an explicit
user's wish to us to DWIM as much as we want.  But again, that
directly contradicts with the desire of 024d34cb.

So,... I dunno.

I'd be more comfortable if 2/3 and 3/3 were replaced with something
like "do not ask HEAD to be pulled, but always require a specific
ref to be pulled", by dereferencing HEAD locally to a branch name,
and behave as if that name was given to $3 from the command line,
without doing any other changes (like turning that branch name that
was implicitly given into a tag that happens to point at it).

Thanks.

>
> Patch 1 is a simple testcase fix.
>
> Paolo
>
> Paolo Bonzini (3):
>   request-pull: fix expected format in tests
>   request-pull: use "git tag --points-at" to detect local tags
>   request-pull: find matching tag or branch name on remote side
>
>  git-request-pull.sh | 15 +++
>  t/t5150-request-pull.sh |  5 ++---
>  2 files changed, 13 insertions(+), 7 deletions(-)


Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Matthieu Moy
[ Please, don't top post on this list ]

Fairuzan Roslan  writes:

> I don’t see the issue for the owner of his/her own file to have write
> access.

Object and pack files are not meant to be modified. Hence, they are
read-only so that an (accidental) attempt to modify them fails.

> Setting tmp idx & pack files to read-only even for the file owner is
> not a safety feature.

Yes it is. If you do not think so, then please give some arguments.

> You should at least give the user the option to set the permission in
> the config file and not hardcoded the permission in the binary.

This is the kind of thing I meant by "investigate alternate solutions".
I have no AFP share to test, so it would help if you answered the
question I asked in my previous message:

>> On Feb 17, 2015, at 2:23 AM, Matthieu Moy  
>> wrote:
>> 
>> Fairuzan Roslan  writes:
>> 
>>> Hi,
>>> 
>>> Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
>>> causing a lot of issues (unable to unlink/write/rename) to those
>>> people who use AFP shares.
>> 
>> Is it a problem when using Git (like "git gc" failing to remove old
>> packs), or when trying to remove files outside Git?

(BTW, why did you try to write/rename pack files?)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Junio C Hamano
Matthieu Moy  writes:

> The issue is that having object and pack files read-only on the
> filesystem is a safety feature to prevent accidental modifications (even
> though it's actually not that effective, since brute-force "sed -i" or
> "perl -i" still accept to modify read-only files).

I did not see it as a "safety feature", and instead saw it as a
reminder to me that I am not supposed to write into them when I
check them with "ls -l".

> So, I'd be a bit reluctant to remove this safety feature for all users
> if it's only for the benefit of a minority of users. Not that I think
> the problem shouldn't be fixed, but I'd rather investigate alternate
> solutions before using this mode = 0644.

I fully agree with you that this should not be unconditional.
However, I am not sure if there is an effective workaround to a
filesystem that pays attention to the mode bits of the file when
doing an operation on the directory the file is sitting within.  It
may be OK to introduce a new configuration variable, perhaps call it
core.brokenFileSystemNeedsWritableFile or something, and probe and
enable it inside init_db().

But I suspect that the single "mode = 0444" under discussion may not
cover all places we create files, as the assumption that we get
sane semantics from the filesystem permeates throughout the code.

What other glitches does AFP have?  Does it share Windows' "you
cannot rename a file you have an open file descriptor on?"  Anything
else?


Re: Multi-threaded 'git clone'

2015-02-16 Thread Junio C Hamano
Jeff King  writes:

> ... And the whole output is checksummed by a single sha1
> over the whole stream that comes at the end.
>
> I think the most feasible thing would be to quickly spool it to a
> server on the LAN, and then use an existing fetch-in-parallel tool
> to grab it from there over the WAN.

One possibility would be for the server to prepare a static bundle
file to bootstrap all the "clone" clients with and publish it on a
CDN.  A protocol extension would tell the client where to download
the bundle from, the client can then grab the bundle to clone from
it to become "slightly stale but mostly up to date", and then do a
usual incremental update with the server after that to be complete.

The server would update the bundle used to bootstrap clone clients
periodically in order to keep the incrementals to the minimum, and
would make sure their bitmap is anchored at the tips of bundles to
minimize the object counting load during the incremental phase.
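The bootstrap flow described above can be sketched with a local bundle file
standing in for the CDN (all paths and identities below are illustrative):

```shell
# "Server": publish a full bundle; "client": clone from the static file,
# then repoint at the real server for the usual incremental fetch.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/server"
git -C "$tmp/server" -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m init
branch=$(git -C "$tmp/server" symbolic-ref --short HEAD)

# Server side: create the bundle, as if uploading it to a CDN.
git -C "$tmp/server" bundle create "$tmp/seed.bundle" HEAD "$branch"

# Client side: bootstrap the clone from the static bundle...
git clone -q "$tmp/seed.bundle" "$tmp/client"

# ...then switch origin to the real server and fetch incrementally.
git -C "$tmp/client" remote set-url origin "$tmp/server"
git -C "$tmp/client" fetch -q origin
```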

I think "repo" used by folks who do AOSP does something similar to
that by scripting around "git clone".  I'd imagine that they would
be happy to see if "git clone" did all that inside.


Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Fairuzan Roslan
I don’t see the issue with the owner of his/her own file having write access.
Setting tmp idx & pack files to read-only even for the file owner is not a
safety feature.

The real issue here is that on an AFP file system Git can’t even unlink,
rename, or delete the tmp idx and pack files without write access after it
is done, while other file systems like ext4, HFS+, etc. can.

You should at least give the user the option to set the permission in the
config file and not hardcode the permission in the binary.

Regards,
Fairuzan


> On Feb 17, 2015, at 2:23 AM, Matthieu Moy  
> wrote:
> 
> Fairuzan Roslan  writes:
> 
>> Hi,
>> 
>> Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
>> causing a lot of issues (unable to unlink/write/rename) to those
>> people who use AFP shares.
> 
> Is it a problem when using Git (like "git gc" failing to remove old
> packs), or when trying to remove files outside Git?
> 
>> The issue was first introduced in
>> https://github.com/git/git/blob/f80c7ae8fe9c0f3ce93c96a2dccaba34e456e33a/wrapper.c
>> line 284.
> 
> I don't think so. The code before this commit did essentially a chmod
> 444 on the file, so object files were already read-only before.
> 
> The pack files have been read-only since d83c9af5c6a437ddaa9dd27 (Junio,
> Apr 22 2007).
> 
>> To fix these issues the permission need to be adjusted to “int mode =
>> 0644;” in odb_mkstemp (environment.c)
> 
> The issue is that having object and pack files read-only on the
> filesystem is a safety feature to prevent accidental modifications (even
> though it's actually not that effective, since brute-force "sed -i" or
> "perl -i" still accept to modify read-only files).
> 
> So, I'd be a bit reluctant to remove this safety feature for all users
> if it's only for the benefit of a minority of users. Not that I think
> the problem shouldn't be fixed, but I'd rather investigate alternate
> solutions before using this mode = 0644.
> 
> --
> Matthieu Moy
> http://www-verimag.imag.fr/~moy/





Re: odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Matthieu Moy
Fairuzan Roslan  writes:

> Hi,
>
> Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) are
> causing a lot of issues (unable to unlink/write/rename) to those
> people who use AFP shares.

Is it a problem when using Git (like "git gc" failing to remove old
packs), or when trying to remove files outside Git?

> The issue was first introduced in
> https://github.com/git/git/blob/f80c7ae8fe9c0f3ce93c96a2dccaba34e456e33a/wrapper.c
> line 284.

I don't think so. The code before this commit did essentially a chmod
444 on the file, so object files were already read-only before.

The pack files have been read-only since d83c9af5c6a437ddaa9dd27 (Junio,
Apr 22 2007).

> To fix these issues the permission need to be adjusted to “int mode =
> 0644;” in odb_mkstemp (environment.c)

The issue is that having object and pack files read-only on the
filesystem is a safety feature to prevent accidental modifications (even
though it's actually not that effective, since brute-force "sed -i" or
"perl -i" still accept to modify read-only files).

So, I'd be a bit reluctant to remove this safety feature for all users
if it's only for the benefit of a minority of users. Not that I think
the problem shouldn't be fixed, but I'd rather investigate alternate
solutions before using this mode = 0644.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


[PATCH 0/3] request-pull: do something if $3 is passed

2015-02-16 Thread Paolo Bonzini
From: Paolo Bonzini 

After updating to git 2.3.0, "git request-pull" is stubbornly complaining
that I lack a matching tag on the remote side unless I pass the third
argument.  But I did prepare and push a signed tag.

This looks like a bug to me; when $3 is not passed git will try to use
"HEAD" as the default but it cannot be resolved to a tag, neither locally
(patch 2) nor remotely (patch 3).

Patch 1 is a simple testcase fix.

Paolo

Paolo Bonzini (3):
  request-pull: fix expected format in tests
  request-pull: use "git tag --points-at" to detect local tags
  request-pull: find matching tag or branch name on remote side

 git-request-pull.sh | 15 +++
 t/t5150-request-pull.sh |  5 ++---
 2 files changed, 13 insertions(+), 7 deletions(-)

-- 
2.3.0



[PATCH 2/3] request-pull: use "git tag --points-at" to detect local tags

2015-02-16 Thread Paolo Bonzini
From: Paolo Bonzini 

If the third argument is not passed, "git show-ref --tags HEAD" will
never return anything and git-request-pull will never detect a tag
name.

Instead, "git tag --points-at" can find it.  Use it if "git show-ref"
fails.

Signed-off-by: Paolo Bonzini 
---
 git-request-pull.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index d5500fd..a507006 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -58,6 +58,7 @@ pretty_remote=${remote#refs/}
 pretty_remote=${pretty_remote#heads/}
 head=$(git symbolic-ref -q "$local")
 head=${head:-$(git show-ref --heads --tags "$local" | cut -d' ' -f2)}
+head=${head:-$(git tag --points-at "$local" | sed 's,^,refs/tags/,')}
 head=${head:-$(git rev-parse --quiet --verify "$local")}
 
 # None of the above? Bad.
-- 
2.3.0
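The behavioral difference the patch relies on can be seen in a throwaway
repository (tag name and identities illustrative): show-ref matches tags by
ref *name*, while --points-at matches by the commit a tag references.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m init
git tag v1.0

# show-ref looks for a tag literally named "HEAD" and finds nothing...
git show-ref --tags HEAD || echo "show-ref: no tag named HEAD"

# ...while --points-at finds the tag via the commit HEAD resolves to.
git tag --points-at HEAD       # -> v1.0
```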




[PATCH 3/3] request-pull: find matching tag or branch name on remote side

2015-02-16 Thread Paolo Bonzini
From: Paolo Bonzini 

If the third argument is not passed to "git request-pull", the
find_matching_ref script will look for HEAD in the remote side
which does not work.  Instead, default to the ref names found
via "git show-ref" or "git tag".

Signed-off-by: Paolo Bonzini 
---
 git-request-pull.sh | 14 ++
 t/t5150-request-pull.sh |  2 +-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index a507006..fcbe383 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -54,8 +54,6 @@ fi
 local=${3%:*}
 local=${local:-HEAD}
 remote=${3#*:}
-pretty_remote=${remote#refs/}
-pretty_remote=${pretty_remote#heads/}
 head=$(git symbolic-ref -q "$local")
 head=${head:-$(git show-ref --heads --tags "$local" | cut -d' ' -f2)}
 head=${head:-$(git tag --points-at "$local" | sed 's,^,refs/tags/,')}
@@ -64,6 +62,14 @@ head=${head:-$(git rev-parse --quiet --verify "$local")}
 # None of the above? Bad.
 test -z "$head" && die "fatal: Not a valid revision: $local"
 
+#
+# If $3 was not there, the remote name should be the same
+# as the locally detected name
+#
+remote=${remote:-$head}
+pretty_remote=${remote#refs/}
+pretty_remote=${pretty_remote#heads/}
+
 # This also verifies that the resulting head is unique:
 # "git show-ref" could have shown multiple matching refs..
 headrev=$(git rev-parse --verify --quiet "$head"^0)
@@ -111,12 +117,12 @@ find_matching_ref='
}
 '
 
-ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "${remote:-HEAD}" "$headrev")
+ref=$(git ls-remote "$url" | @@PERL@@ -e "$find_matching_ref" "$remote" "$headrev")
 
 if test -z "$ref"
 then
echo "warn: No match for commit $headrev found at $url" >&2
-   echo "warn: Are you sure you pushed '${remote:-HEAD}' there?" >&2
+   echo "warn: Are you sure you pushed '$remote' there?" >&2
status=1
 fi
 
diff --git a/t/t5150-request-pull.sh b/t/t5150-request-pull.sh
index 8b19279..11ba8ff 100755
--- a/t/t5150-request-pull.sh
+++ b/t/t5150-request-pull.sh
@@ -178,7 +178,7 @@ test_expect_success 'request asks HEAD to be pulled' '
read repository &&
read branch
}


[PATCH 1/3] request-pull: fix expected format in tests

2015-02-16 Thread Paolo Bonzini
From: Paolo Bonzini 

"tag foo" in requests has been replaced with "tags/foo" (commit f032d66,
request-pull: do not emit "tag" before the tagname, 2011-12-19).  Adjust
the parsing script to match; since the new format does not have spaces,
doing nothing is fine.

Signed-off-by: Paolo Bonzini 
---
 t/t5150-request-pull.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/t5150-request-pull.sh b/t/t5150-request-pull.sh
index 82c33b8..8b19279 100755
--- a/t/t5150-request-pull.sh
+++ b/t/t5150-request-pull.sh
@@ -67,11 +67,10 @@ test_expect_success 'setup: two scripts for reading pull requests' '
 
cat <<-\EOT >read-request.sed &&
#!/bin/sed -nf
-   # Note that a request could ask for "tag $tagname"
+   # Note that a request could ask for "tags/$tagname"
/ in the git repository at:$/!d
n
/^$/ n
-   s/ tag \([^ ]*\)$/ tag--\1/
s/^[]*\(.*\) \([^ ]*\)/please pull\
\1\
\2/p
-- 
2.3.0


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


odb_mkstemp's 0444 permission broke write/delete access on AFP

2015-02-16 Thread Fairuzan Roslan
Hi,

Somehow the “int mode = 0444;” in odb_mkstemp (environment.c) is causing a lot
of issues (unable to unlink/write/rename) for people who use AFP shares.

In order to be able to write/unlink/delete/rename a file on an AFP filesystem,
the owner of the file must have at least u+w access to it.

The issue was first introduced in 
https://github.com/git/git/blob/f80c7ae8fe9c0f3ce93c96a2dccaba34e456e33a/wrapper.c
 line 284.

To fix these issues, the permission needs to be adjusted to “int mode = 0644;”
in odb_mkstemp (environment.c).

Please let me know if you need further detail.

Regards,
Fairuzan


Re: Multi-threaded 'git clone'

2015-02-16 Thread Jeff King
On Mon, Feb 16, 2015 at 07:31:33AM -0800, David Lang wrote:

> >Then the server streams the data to the client. It might do some light
> >work transforming the data as it comes off the disk, but most of it is
> >just blitted straight from disk, and the network is the bottleneck.
> 
> Depending on how close to full the WAN link is, it may be possible to
> improve this with multiple connections (again, referencing bbcp), but
> there's also the question of if it's worth trying to use the entire WAN for
> a single user. The vast majority of the time the server is doing more than
> one thing and would rather let any individual user wait a bit and service
> the other users.

Yeah, I have seen clients that make multiple TCP connections to each
request a chunk of a file in parallel. The short answer is that this is
going to be very hard with git. Each clone generates the pack on the fly
based on what's on disk and streams it out. It should _usually_ be the
same, but there's nothing to guarantee byte-for-byte equality between
invocations. So you'd have to multiplex all of the connections into the
same server process. And even then it's hard; that process knows its
going to send you the bytes for object X, but it doesn't know at
exactly which offset until it gets there, which makes sending things out
of order tricky. And the whole output is checksummed by a single sha1
over the whole stream that comes at the end.

I think the most feasible thing would be to quickly spool it to a server
on the LAN, and then use an existing fetch-in-parallel tool to grab it
from there over the WAN.

-Peff


Re: Multi-threaded 'git clone'

2015-02-16 Thread David Lang

On Mon, 16 Feb 2015, Jeff King wrote:


On Mon, Feb 16, 2015 at 05:31:13AM -0800, David Lang wrote:


I think it's an interesting question to look at, but before you start
looking at changing the architecture of the current code, I would suggest
doing a bit more analysis of the problem to see if the bottleneck is really
where you think it is.

First measure, then optimize :-)


Yes, very much so. Fortunately some people have already done some of
this work. :)


nice summary


Then the server streams the data to the client. It might do some light
work transforming the data as it comes off the disk, but most of it is
just blitted straight from disk, and the network is the bottleneck.


Depending on how close to full the WAN link is, it may be possible to improve 
this with multiple connections (again, referencing bbcp), but there's also the 
question of if it's worth trying to use the entire WAN for a single user. The 
vast majority of the time the server is doing more than one thing and would 
rather let any individual user wait a bit and service the other users.


David Lang


Re: git svn import failure : write .git/Git_svn_hash_BmjclS: Bad file descriptor

2015-02-16 Thread Nico Schlömer
I just double-checked and I can only produce this issue on one machine
(tested on 3). Apparently, this has nothing to do with Git itself
then.

Any ideas of what could be wrong?

Cheers,
Nico

On Thu, Feb 12, 2015 at 8:18 PM, Eric Wong  wrote:
> Valery Yundin  wrote:
>> On 31 January 2015 at 13:51, Nico Schlömer  wrote:
>> > I tried the patch and I still get
>> > ```
>> > [...]
>> > r100 = e2a9b5baa2cebb18591ecb04ff350410d52f36de (refs/remotes/git-svn)
>> > Unexpected result returned from git cat-file at
>> > /home/nschloe/share/perl/5.18.2/Git/SVN/Fetcher.pm line 335.
>> > Failed to read object 619f9d1d857fb287d06a70c9dac6b8b534d0de6a at
>> > /home/nschloe/share/perl/5.18.2/Git/SVN/Fetcher.pm line 336, 
>> > line 757.
>> >
>> > error closing pipe: Bad file descriptor at
>> > /home/nschloe/libexec/git-core/git-svn line 0.
>> > error closing pipe: Bad file descriptor at
>> > /home/nschloe/libexec/git-core/git-svn line 0.
>> > ```
>> > when
>> > ```
>> > git svn clone https://geuz.org/svn/gmsh/trunk
>>
>> It seems that the same commit dfa72fdb96 is responsible for the error
>> in "git svn clone https://geuz.org/svn/gmsh/trunk";. But unlike in my
>> case, the patch doesn't fix it.
>
> (top-posting corrected)
>
> Odd, I managed to clone that without issues, but I couldn't reproduce
> this problem with or without the tempfile clearing patch applied.
>
>git svn clone --username gmsh https://geuz.org/svn/gmsh/trunk
>
> Anybody else?


Re: Multi-threaded 'git clone'

2015-02-16 Thread Jeff King
On Mon, Feb 16, 2015 at 05:31:13AM -0800, David Lang wrote:

> I think it's an interesting question to look at, but before you start
> looking at changing the architecture of the current code, I would suggest
> doing a bit more analysis of the problem to see if the bottleneck is really
> where you think it is.
> 
> First measure, then optimize :-)

Yes, very much so. Fortunately some people have already done some of
this work. :)

On the server side of a clone, the things that must be done before
sending any data are:

  1. Count up all of the objects that must be sent by traversing the
 object graph.

  2. Find any pairs for delta compression (this is the "Compressing
 objects" phase of the progress reporting).

Step (1) naively takes 30-45 seconds for a kernel repo. However, with
reachability bitmaps, it's instant-ish. I just did a clone from
kernel.org, and it looks like they've turned on bitmaps.

For step (2), git will reuse deltas that already exist in the on-disk
packfile, and will not consider new deltas between objects that are
already in the same pack (because we would already have considered them
when packing in the first place). So the key for servers is to keep
things pretty packed. My kernel.org clone shows that they could probably
stand to repack torvalds/linux.git, but it's not too terrible.

This part is multithreaded, so what work we do happens in parallel. But
note that some servers may turn pack.threads down to 1 (since their many
CPUs are kept busy by multiple requests, rather than trying to finish a
single one).

Then the server streams the data to the client. It might do some light
work transforming the data as it comes off the disk, but most of it is
just blitted straight from disk, and the network is the bottleneck.

On the client side, the incoming data streams into an index-pack
process. For each full object it sees, it hashes and records the name of
the object as it comes in. For deltas, it queues them for resolution
after the complete pack arrives.

Once the full pack arrives, then it resolves all of the deltas. This
part is also multithreaded. If you check out "top" during the "resolving
deltas" phase of the clone, you should see multiple cores in use.

So I don't think there is any room for "just multithread it" in this
process. The CPU intensive bits are already multithreaded. There may be
room for optimizing that, though (e.g., reducing lock contention or
similar).

It would also be possible to resolve deltas while the pack is streaming
in, rather than waiting until the whole thing arrives. That's not
possible in all cases (an object may be a delta against a base that
comes later in the pack), but in practice git puts bases before their
deltas. However, it's overall less efficient, because you may end up
walking through the same parts of the delta chain more than once. For
example, imagine you see a stream of objects A, B, C, D. You get B and
see that it's a delta against A. So you resolve it, hash the object, and
are good. Now you see C, which is a delta against B. To generate C, you
have to compute B again. Now you get to D, which is another delta
against B. So now we compute B again.

You can get around this somewhat with a cache of intermediate object
contents, but of course there may be hundreds or thousands of chains
like this in use at once, so you're going to end up with some cache
misses.

What index-pack does instead is to wait until it has all of the objects,
then finds A and says "what objects use A as a base?". Then it computes
B, hashes it, and says "what objects use B as a base?". And finds C and
D, after which it knows it can drop the intermediate result B.

So that's less work over all, though in some workloads it may finish
faster if you were to stream it (because your many processors are
sitting idle while we are blocked on network bandwidth). So that's a
potential area of exploration.

-Peff


Re: Multi-threaded 'git clone'

2015-02-16 Thread David Lang

On Mon, 16 Feb 2015, Koosha Khajehmoogahi wrote:


Cloning huge repositories like Linux kernel takes considerable amount
of time. Is it possible to incorporate a multi-threaded simultaneous
connections functionality for cloning? To what extent do we need to
change the architecture of the current code and how large would be the
scope of the work? That just seems an interesting idea to me and I would
like to share it with the community.


The key question is what it is that takes the time in cloning and whether that
can be multi-threaded.


If it's the network traffic that takes the most time, where is the bottleneck?

Is it in the server software assembling what will be sent? Is it in the 
receiving software processing it? If so, multiple threads could help.


Is it in network bandwidth? If so doing multiple connections won't help much. 
TCP connections favour a few connections passing a lot of data rather than many 
connections passing a little. The one place where multiple connections can help 
is when you have non-congestion induced packet loss as a lost packet on a 
connection will cause the throughput of that connection to drop (if the drop is 
due to congestion, this is TCP working as designed, throttling back to match the 
available bandwidth). This can be a significant effect if you have a very high 
bandwidth, high latency connection (think multiple Gb on international 
connections), but for lower bandwidth connections it's much less of a factor. 
You can look at projects like bbcp.


I think it's an interesting question to look at, but before you start looking at 
changing the architecture of the current code, I would suggest doing a bit more 
analysis of the problem to see if the bottleneck is really where you think it
is.


First measure, then optimize :-)

David Lang


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Armin Ronacher

Hi,

On 16/02/15 13:09, Ævar Arnfjörð Bjarmason wrote:

We should definitely make recovery like this harder, but is there a
reason for why you don't use "git reset --keep" instead of --hard?
This was only the second time in years of git usage that the reset was 
incorrectly done.  I suppose at this point I might try to retrain my 
muscle memory to type something else :)



If we created such hooks for "git reset --hard" we'd just need to
expose some other thing as that low-level operation (and break scripts
that already rely on it doing the minimal "yes I want to change the
tree no matter what" thing), and then we'd just be back to square one
in a few years when users started using "git reset --really-hard" (or
whatever the flag would be).
I don't think that's necessary; I don't think it would make the
operation much slower to just make a dangling commit and write out a few
blobs.  The garbage collector will take care of that data soon enough
anyway.  But I guess that would need testing on large trees to see how
badly that goes.


I might look into the git undo thing that was mentioned.


Regards,
Armin


Multi-threaded 'git clone'

2015-02-16 Thread Koosha Khajehmoogahi
Greetings,

Cloning huge repositories like Linux kernel takes considerable amount
of time. Is it possible to incorporate a multi-threaded simultaneous
connections functionality for cloning? To what extent do we need to
change the architecture of the current code and how large would be the
scope of the work? That just seems an interesting idea to me and I would
like to share it with the community.

Regards

-- 
Koosha


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Duy Nguyen
On Mon, Feb 16, 2015 at 7:10 PM, Ævar Arnfjörð Bjarmason
 wrote:
>> We should definitely make recovery like this harder, but is there a
>> reason for why you don't use "git reset --keep" instead of --hard?
>> It'll keep any local changes to your index/staging area, and reset the
>> files that don't conflict, if there's any conflicts the operation will
>> be aborted.
>
> "Recovery like this easier", i.e. make it easier to get back
> previously staged commits / blobs.

I started with git-undo (or what's its name) a while back (*). The
idea is for dangerous commands like this we could save some data back,
which would be pruned after some time. Saving stuff in index is quite
easy because they are already in object database. For this reset
--hard, we may need to hash/store some more blobs. I think it's worth
the safety. Not sure if anyone's interested in continuing that work.

(*) found it: 
http://thread.gmane.org/gmane.comp.version-control.git/231621/focus=231879
-- 
Duy


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 16, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher
>  wrote:
>> Long story short: I failed big time yesterday with accidentally executing
>> git reset hard in the wrong terminal window but managed to recover my
>> changes from the staging area by manually examining blobs touched recently.
>>
>> After that however I figured I might want to add a precaution for myself
>> that would have helped there.  git fsck is quite nice, but unfortunately it
>> does not help if you do not have a commit.  So I figured it might be nice to
>> create a dangling backup commit before a reset which would have helped me.
>> Unfortunately there is currently no good way to hook into git reset.
>>
>> Things I noticed in the process:
>>
>> *   for recovering blobs, going through the objects itself was more
>> useful because they were all recent changes and as such I could
>> order by timestamp.  git fsck will not provide any timestamps
>> (which generally makes sense, but made it quite useless for me)
>> *   Recovering from blobs is painful, it would be nice if git reset
>> --hard made a dangling dummy commit before :)
>> *   There is no pre-commit hook which could be used to implement the
>> previous suggestion.
>>
>> Would it make sense to introduce a `pre-commit` hook for this sort of thing
>> or even create a dummy commit by default?  I did a quick googling around and
>> it looks like I was not the first person who made this mistake.  Github's
>> windows client even creates dangling backup commits in what appears to be
>> fixed time intervals.
>>
>> I understand that ultimately this was a user error on my part, but it seems
>> like a small change that could save a lot of frustration.
>
> Something like "can we have a hook for every change in the working
> tree" has come up in the past, but has been defeated by performance
> concerns. "git reset --hard" is a low-level-ish operation, and it's
> really useful to be able to quickly reset the working tree to some
> state no matter what, and without creating extra commits or whatever.
>
> We should definitely make recovery like this harder, but is there a
> reason for why you don't use "git reset --keep" instead of --hard?
> It'll keep any local changes to your index/staging area, and reset the
> files that don't conflict, if there's any conflicts the operation will
> be aborted.

"Recovery like this easier", i.e. make it easier to get back
previously staged commits / blobs.

> If we created such hooks for "git reset --hard" we'd just need to
> expose some other thing as that low-level operation (and break scripts
> that already rely on it doing the minimal "yes I want to change the
> tree no matter what" thing), and then we'd just be back to square one
> in a few years when users started using "git reset --really-hard" (or
> whatever the flag would be).


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher
 wrote:
> Long story short: I failed big time yesterday with accidentally executing
> git reset hard in the wrong terminal window but managed to recover my
> changes from the staging area by manually examining blobs touched recently.
>
> After that however I figured I might want to add a precaution for myself
> that would have helped there.  git fsck is quite nice, but unfortunately it
> does not help if you do not have a commit.  So I figured it might be nice to
> create a dangling backup commit before a reset which would have helped me.
> Unfortunately there is currently no good way to hook into git reset.
>
> Things I noticed in the process:
>
> *   for recovering blobs, going through the objects itself was more
> useful because they were all recent changes and as such I could
> order by timestamp.  git fsck will not provide any timestamps
> (which generally makes sense, but made it quite useless for me)
> *   Recovering from blobs is painful, it would be nice if git reset
> --hard made a dangling dummy commit before :)
> *   There is no pre-commit hook which could be used to implement the
> previous suggestion.
>
> Would it make sense to introduce a `pre-commit` hook for this sort of thing
> or even create a dummy commit by default?  I did a quick googling around and
> it looks like I was not the first person who made this mistake.  Github's
> windows client even creates dangling backup commits in what appears to be
> fixed time intervals.
>
> I understand that ultimately this was a user error on my part, but it seems
> like a small change that could save a lot of frustration.

Something like "can we have a hook for every change in the working
tree" has come up in the past, but has been defeated by performance
concerns. "git reset --hard" is a low-level-ish operation, and it's
really useful to be able to quickly reset the working tree to some
state no matter what, and without creating extra commits or whatever.

We should definitely make recovery like this harder, but is there a
reason for why you don't use "git reset --keep" instead of --hard?
It'll keep any local changes to your index/staging area, and reset the
files that don't conflict, if there's any conflicts the operation will
be aborted.

If we created such hooks for "git reset --hard" we'd just need to
expose some other thing as that low-level operation (and break scripts
that already rely on it doing the minimal "yes I want to change the
tree no matter what" thing), and then we'd just be back to square one
in a few years when users started using "git reset --really-hard" (or
whatever the flag would be).


Re: Pack v4 again..

2015-02-16 Thread Duy Nguyen
On Mon, Feb 16, 2015 at 1:45 PM, Jeff King  wrote:
> Somewhat related to this, I was playing this weekend with the idea of
> generating fast tree diffs from our on-disk deltas. That is, given a
> base tree and a binary delta against it, could I reliably reproduce a
> diff (one way or the other) in O(size of diff), rather than O(size of
> tree)?

If you add a "delta" base cache for v4 trees to avoid the recursion
issue Nico mentioned, you effectively have a "diff" that aligns at
tree entry boundaries. The v4 tree encoding will become just another
delta format from this perspective. I'm very tempted to just go with
this to take advantage of v4 SHA-1 encoding, ident and path encoding..
-- 
Duy


Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Armin Ronacher

Hi,

Long story short: I failed big time yesterday with accidentally 
executing git reset hard in the wrong terminal window but managed to 
recover my changes from the staging area by manually examining blobs 
touched recently.


After that however I figured I might want to add a precaution for myself 
that would have helped there.  git fsck is quite nice, but unfortunately 
it does not help if you do not have a commit.  So I figured it might be 
nice to create a dangling backup commit before a reset which would have 
helped me.  Unfortunately there is currently no good way to hook into 
git reset.


Things I noticed in the process:

*   for recovering blobs, going through the objects itself was more
useful because they were all recent changes and as such I could
order by timestamp.  git fsck will not provide any timestamps
(which generally makes sense, but made it quite useless for me)
*   Recovering from blobs is painful, it would be nice if git reset
--hard made a dangling dummy commit before :)
*   There is no pre-commit hook which could be used to implement the
previous suggestion.

Would it make sense to introduce a `pre-commit` hook for this sort of 
thing or even create a dummy commit by default?  I did a quick googling 
around and it looks like I was not the first person who made this 
mistake.  Github's windows client even creates dangling backup commits 
in what appears to be fixed time intervals.


I understand that ultimately this was a user error on my part, but it 
seems like a small change that could save a lot of frustration.



Regards,
Armin


Re: [ANNOUNCE] Git Merge, April 8-9, Paris

2015-02-16 Thread Matthieu Moy
Hi,

- Original Message -
> GitHub is organizing a Git-related conference to be held April 8-9,
> 2015, in Paris.  Details here:
> 
>   http://git-merge.com/
> 
> The exact schedule is still being worked out, but there is going to be
> some dedicated time/space for Git (and libgit2 and JGit) developers to
> meet and talk to each other.

A naive question: will there be subscription fees? If so, can you give an order 
of magnitude?

Thanks,

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: Pack v4 again..

2015-02-16 Thread Duy Nguyen
On Mon, Feb 16, 2015 at 11:59 AM, Nicolas Pitre  wrote:
>> I think pack v4 does not deliver its best promise that walking a tree
>> is simply following pointers and jumping from place to place. When we
>> want to copy from the middle of another tree, we need to scan from the
>> beginning of the tree. Tree offset cache helps, but the problem
>> remains. What do you think about an alternative format that each
>> "copy" instruction includes both index of the tree entry to copy from
>> (i.e. what we store now)  _and_ the byte offset from the beginning of
>> the tree? With this byte offset, we know exactly where to start
>> copying without scanning from the beginning. It will be a bit(?)
>> bigger, but it's also faster.
>
> That would make the format inflexible.  If we want to do partial
> repacking by, say, copying some objects and repacking others (some
> objects might need repacking because the objects they refer to are
> omitted from the repack) then if those repacked objects are themselves
> referred to by byte offset then we lose as the offset is no longer
> valid.

When generating new packs, we could rip out these offsets, but it
depends on whether we can reuse v4 trees unchanged like we do with v2
deltas. If we can, ripping is not an option because then we need to
parse more.

>> I imagine this is an optimization that can be done locally. The pack
>> transferred over network does not have these byte offsets. After the
>> pack is stored and verified by index-pack, we can rewrite it and add
>> this info. The simplest way is use a fixed size for this offset (e.g.
>> uint16_t or even uint8_t), add the place holder in copy instructions
>> of all v4 trees. After that object offsets will not change again and
>> we can start filling real offsets to placeholders.
>
> Having a local extra index is fine.  Just like the pack index which is
> always created locally and can be recreated at any time.  Some tree
> offset cache might be beneficial, but I'd avoid making it into the pack
> file itself.

Hm.. right.

> Yet, I think the biggest problem with pack v4 at the moment is the
> packing algorithm for tree objects.  We are piggy-backing on the pack v2
> object delta compression sorting and that produces suboptimal results
> due to deep recursions.  And it is the recursion that kills us. The pack
> v4 requires a new packing algorithm for its tree objects.

Yep. I made a conversion tool a few days ago to "flatten" v4 trees. So
if tree A copies some entries from B, but then B copies from C, tree A
could copy directly from C. Performance improves significantly (close
to v2 with rev-list, but still slower). But pack size doubles because
copy sequences are fragmented and we're duplicating same copy patterns
over and over again. All because we follow the single delta chain
decided since v2 time so we only have one tree with no copy sequences
(best to copy from).

> What I imagined is something like this:
>
> - Each canonical tree entry is made of a SHA1, mode and path.  Let's
>   assume this is hashed into a 24-bit value.
>
> - Each tree object can therefore be represented as a string of 24-bit
>   "characters".
>
> - Delta-compressing a tree object becomes a substring search where we
>   try to replace a sequence of "characters" with the longest "string"
>   possible from another object.  Repeat with the remaining sequences.
>
> Having a 24-bit hash value is totally arbitrary.  It could be 16 bits
> with more collisions but much faster search and less memory usage.  The
> optimal value would need to be determined after some experimentation.
>
> Algorithms for the longest common substring problem already exist.  So
> one of the classical algorithms could probably be adapted here.
>
> This would allow for exploiting the provision in pack v4 to copy from
> more than one tree object.  And this would also favor shallower
> recursions and even smaller packs.  Imposing a minimum substring length
> (rather than a maximum delta depth) would determine the runtime
> performance when using the pack afterwards.
>
> If you have enough free cycles to work on this, that's what I'd suggest
> you explore at this point. I wish I could myself as I think this ought
> to be rather cool work.

I'm getting there. I'm writing an alternative implementation to your
pv4_encode_tree() that takes multiple base trees instead of just one,
finding all copy sequences from these bases, then somehow pick the
best ones. After trees are sorted by similarity in pack-objects, we
preserve n trees as base trees (no copy sequences). There is a window
to feed some of these base trees to this encode function, like how we
do it in try_delta().

Your encoding tree entries as strings would be faster, but that's not
the immediate problem. Using the longest common substring is kinda
greedy (exactly what I'm thinking to do), but it probably produces
suboptimal copy sequences. Maybe using two shorter copy sequences
would reduce fragmentation more than one large copy sequence.

Re: [PATCH 0/2] Getopt::Long workaround in send-email

2015-02-16 Thread Tom G. Christensen

On 13/02/15 21:19, Junio C Hamano wrote:

I am inclined to squash these into one commit before starting to
merge them down to 'next' and then to 'master', after getting
Tested-by: from those with older Getopt::Long (prior to 2.32).

Junio C Hamano (1):
   SQUASH??? t9001: turn --no$option workarounds to --no-$option

Kyle J. McKay (1):
   git-send-email.perl: support no- prefix with older GetOptions

  git-send-email.perl   | 10 ++
  t/t9001-send-email.sh | 10 +-
  2 files changed, 15 insertions(+), 5 deletions(-)



Tested-by: Tom G. Christensen 

I replaced my original patch with this series on top of 2.3.0 and then 
did a build on RHEL3 (perl 5.8.0) and RHEL4 (perl 5.8.5).

On both platforms t9001 passes.

-tgc


Re: [PATCH 01/24] dir.c: optionally compute sha-1 of a .gitignore file

2015-02-16 Thread Duy Nguyen
On Thu, Feb 12, 2015 at 4:23 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> -int add_excludes_from_file_to_list(const char *fname,
>> -const char *base,
>> -int baselen,
>> -struct exclude_list *el,
>> -int check_index)
>> +/*
>> + * Given a file with name "fname", read it (either from disk, or from
>> + * the index if "check_index" is non-zero), parse it and store the
>> + * exclude rules in "el".
>> + *
>> + * If "ss" is not NULL, compute SHA-1 of the exclude file and fill
>> + * stat data from disk (only valid if add_excludes returns zero). If
>> + * ss_valid is non-zero, "ss" must contain good value as input.
>> + */
>> +static int add_excludes(const char *fname, const char *base, int baselen,
>> + struct exclude_list *el, int check_index,
>> + struct sha1_stat *sha1_stat)
>> ...
>> @@ -571,6 +588,21 @@ int add_excludes_from_file_to_list(const char *fname,
>>   }
>>   buf[size++] = '\n';
>>   close(fd);
>> + if (sha1_stat) {
>> + int pos;
>> + if (sha1_stat->valid &&
>> + !match_stat_data(&sha1_stat->stat, &st))
>> + ; /* no content change, ss->sha1 still good */
>> + else if (check_index &&
>> +  (pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
>> +  !ce_stage(active_cache[pos]) &&
>> +  ce_uptodate(active_cache[pos]))
>> + hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
>> + else
>> + hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
>
> I do not think this would work well on DOS.
>
> This helper function originally is meant to work *only* on the
> checked out representation of the file and that is what is read by
> read_in_full(), and that is the reason why it handles the case where
> the contents of buf[] happens to be CRLF terminated in the function.
>
> If you want to detect the content changes across working tree, index
> and the tree objects by reusing hash_sha1_file(), however, you must
> not feed the checked out (aka "smudged") representation to it.
> You'd need to turn it into "cleaned" representation by doing the
> equivalent of calling index_path().  Some helpers in the callchain
> that originates from index_path() might directly be reusable for
> your purpose.

Urgh.. you're right, this test would fail when filters are involved.
I'm not sure we want to check the cleaned version though: what matters
to the exclude machinery is the checked-out version. I think for now we
should fall back to hashing the .gitignore content. Perhaps later we
could make an exception for CR/LF conversion (and just that, not generic
filters; doing content conversion here sounds like a bad idea).
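For reference, hash_sha1_file(buf, size, "blob", ...) hashes a
"blob <size>\0" header followed by the content. A short Python
equivalent (illustrative; `blob_sha1` is a name made up here) shows why
a CRLF-smudged checkout hashes differently from the clean LF blob:

```python
import hashlib

def blob_sha1(data: bytes) -> str:
    # git object ID of a blob: SHA-1 over "blob <size>\0" + content
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

# The same logical file smudged to CRLF on checkout yields a different
# SHA-1 than the LF version stored in the index/tree, which is why
# hashing worktree bytes directly can disagree when filters apply.
```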
-- 
Duy


[PATCH v2] add a flag to suppress errors in git_config_parse_key()

2015-02-16 Thread Tanay Abhra

`git_config_parse_key()` is used to sanitize the input key.
Some callers of the function, like `git_config_set_multivar_in_file()`,
receive the key directly from the user before it has been sanitized, so
it is necessary to raise an error specifying what went wrong when the
entered key is syntactically malformed.

Other callers, like `configset_find_element()`, get their keys from
Git itself, so a return value signifying an error is enough. The error
output shown to the user is useless and confusing in that case, so add
a flag to suppress errors in such cases.

Helped-by: Junio C Hamano 
Helped-by: Jeff King 
Signed-off-by: Tanay Abhra 
---
Hi Jeff,

I went through Junio's config guideline patch series
and the whole thread of the underscore bug report, and I also think
that pager.*.command is the right path to take.

If you want to relax the syntactic requirement (such as adding '_' to
the current set of allowed characters), I can work on it, but most of
the comments indicate that moving towards pager.*.command would be
better.

P.S.: I hope I got Junio's unsigned-flag suggestion right.

-Tanay
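For context, here is a simplified Python sketch (hypothetical, not
git's code; it ignores quoted three-part subsection keys and section
validation) of the checks `git_config_parse_key()` performs, with the
quiet flag proposed in this patch:

```python
import re

def parse_config_key(key, quiet=False):
    """Return the normalized (lowercased) key, or None if malformed.
    With quiet=True, suppress the error messages, as the patch does
    for internal callers via CONFIG_ERROR_QUIET."""
    dot = key.rfind(".")
    if dot <= 0:  # no dot, or key starts with one: no section
        if not quiet:
            print("error: key does not contain a section: %s" % key)
        return None
    name = key[dot + 1:]
    if not name:
        if not quiet:
            print("error: key does not contain variable name: %s" % key)
        return None
    # variable name: leading letter, then letters, digits or '-'
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9-]*", name):
        if not quiet:
            print("error: invalid key: %s" % key)
        return None
    return key[:dot].lower() + "." + name.lower()
```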

 builtin/config.c |  2 +-
 cache.h  |  4 +++-
 config.c | 20 +---
 t/t7006-pager.sh |  9 +
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index d32c532..326d3d3 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -200,7 +200,7 @@ static int get_value(const char *key_, const char *regex_)
goto free_strings;
}
} else {
-   if (git_config_parse_key(key_, &key, NULL)) {
+   if (git_config_parse_key(key_, &key, NULL, 0)) {
ret = CONFIG_INVALID_KEY;
goto free_strings;
}
diff --git a/cache.h b/cache.h
index f704af5..9073ee2 100644
--- a/cache.h
+++ b/cache.h
@@ -1329,6 +1329,8 @@ extern int update_server_info(int);

 #define CONFIG_REGEX_NONE ((void *)1)

+#define CONFIG_ERROR_QUIET 0x0001
+
 struct git_config_source {
unsigned int use_stdin:1;
const char *file;
@@ -1358,7 +1360,7 @@ extern int git_config_string(const char **, const char *, const char *);
 extern int git_config_pathname(const char **, const char *, const char *);
 extern int git_config_set_in_file(const char *, const char *, const char *);
 extern int git_config_set(const char *, const char *);
-extern int git_config_parse_key(const char *, char **, int *);
+extern int git_config_parse_key(const char *, char **, int *, unsigned int);
 extern int git_config_set_multivar(const char *, const char *, const char *, int);
 extern int git_config_set_multivar_in_file(const char *, const char *, const char *, const char *, int);
 extern int git_config_rename_section(const char *, const char *);
diff --git a/config.c b/config.c
index e5e64dc..7e23bb9 100644
--- a/config.c
+++ b/config.c
@@ -1309,7 +1309,7 @@ static struct config_set_element *configset_find_element(struct config_set *cs,
 * `key` may come from the user, so normalize it before using it
 * for querying entries from the hashmap.
 */
-   ret = git_config_parse_key(key, &normalized_key, NULL);
+   ret = git_config_parse_key(key, &normalized_key, NULL, CONFIG_ERROR_QUIET);

if (ret)
return NULL;
@@ -1842,8 +1842,10 @@ int git_config_set(const char *key, const char *value)
  * lowercase section and variable name
  * baselen - pointer to int which will hold the length of the
  *   section + subsection part, can be NULL
+ * flags - toggle whether the function raises an error on a syntactically
+ * malformed key
  */
-int git_config_parse_key(const char *key, char **store_key, int *baselen_)
+int git_config_parse_key(const char *key, char **store_key, int *baselen_, unsigned int flags)
 {
int i, dot, baselen;
const char *last_dot = strrchr(key, '.');
@@ -1854,12 +1856,14 @@ int git_config_parse_key(const char *key, char **store_key, int *baselen_)
 */

if (last_dot == NULL || last_dot == key) {
-   error("key does not contain a section: %s", key);
+   if (!flags)
+   error("key does not contain a section: %s", key);
return -CONFIG_NO_SECTION_OR_NAME;
}

if (!last_dot[1]) {
-   error("key does not contain variable name: %s", key);
+   if (!flags)
+   error("key does not contain variable name: %s", key);
return -CONFIG_NO_SECTION_OR_NAME;
}

@@ -1881,12 +1885,14 @@ int git_config_parse_key(const char *key, char **store_key, int *baselen_)
if (!dot || i > baselen) {
if (!iskeychar(c) ||
(i == baselen + 1 && !isalpha(c))) {
-   error("invalid key: %s", key);
+   if (!flags)
+