Re: [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Torsten Bögershausen
On Sat, Dec 03, 2016 at 10:00:47PM +0100, Beat Bolli wrote:
> Checking just for the unicode data files' existence is not sufficient;
> we should also download them if a newer version exists on the Unicode
> consortium's servers. Option -N of wget does this nicely for us.
> 
> Reviewed-by: Torsten Boegershausen 

Minor remark (not sure if this motivates a v5, maybe Junio can fix it locally?)
s/oe/ö/

Besides this: thanks again (and I learned about the -N option of wget)
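
For anyone else meeting it for the first time: -N (--timestamping) makes
wget compare timestamps and skip the download unless the server's copy is
newer than the local file, e.g.

    # safe to re-run; fetches only when the server has a newer version
    wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt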


Re: Git v2.11.0 breaks max depth nested alternates

2016-12-03 Thread Jeff King
On Sat, Dec 03, 2016 at 04:24:02PM -0800, Kyle J. McKay wrote:

> When the incoming quarantine takes place the current objects directory  
> is demoted to an alternate thereby increasing its depth (and any  
> alternates it references) by one and causing any object store that was  
> previously at the maximum nesting depth to be ignored courtesy of the  
> above hard-coded maximum depth.
> 
> If the incoming push happens to need access to some of those objects  
> to perhaps "--fix-thin" its pack it will crash and burn.

Yep, that makes sense. I didn't really worry about this because the
existing "5" is totally arbitrary, and meant to be so high that nobody
reaches it (it's just there to break cycles).

So I do think this is worth dealing with, but I'm also curious why
you're hitting the depth-5 limit. I'm guessing it has to do with hosting
a hierarchy of related repos. But is your system then always in danger
of busting the 5-limit if people create too deep a repository hierarchy?

Specifically, I'm wondering if it would be sufficient to just bump it to
6. Or 100.

Of course any static bump runs into the funny case where a repo
_usually_ works, but fails when pushed to. Which is kind of nasty and
unintuitive. And your patch fixes that, and we can leave the idea of
bumping the static depth number as an orthogonal issue (that personally,
I do not care much about either way).

> diff --git a/common-main.c b/common-main.c
> index c654f955..9f747491 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -37,5 +37,8 @@ int main(int argc, const char **argv)
>  
>   restore_sigpipe_to_default();
>  
> + if (getenv(GIT_QUARANTINE_ENVIRONMENT))
> + alt_odb_max_depth++;
> +
>   return cmd_main(argc, argv);

After reading your problem description, my initial thought was to
increment the counter when we allocate the tmp-objdir, and decrement
when it is destroyed. Because the parent receive-pack process adds it to
its alternates, too. But:

  1. Receive-pack doesn't care; it adds the tmp-objdir as an alternate,
 rather than adding it as its main object dir and bumping down the
 main one.

  2. There would have to be some way of communicating to sub-processes
 that they should bump their max-depth by one.

You've basically used the quarantine-path variable as the
inter-process flag for (2). Which feels a little funny, because its
value is unrelated to the alt-odb setup. But it is a reliable signal, so
there's a certain elegance. It's probably the best option, given that
the alternative is a specific variable to say "hey, bump your
max-alt-odb-depth by one". That's pretty ugly, too. :)
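
For what it's worth, since the variable (GIT_QUARANTINE_PATH, which is
what GIT_QUARANTINE_ENVIRONMENT expands to) is exported to the
sub-processes receive-pack spawns, a hook could do the same kind of
check in shell -- a minimal sketch:

    # quarantine in effect? (the variable is only set during the push)
    if test -n "$GIT_QUARANTINE_PATH"
    then
        echo >&2 "objects quarantined in $GIT_QUARANTINE_PATH"
    fi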

-Peff


[PATCH v2] tag, branch, for-each-ref: add --ignore-case for sorting and filtering

2016-12-03 Thread Nguyễn Thái Ngọc Duy
This option makes sorting ignore case, which is great when you have
branches named bug-12-do-something, Bug-12-do-some-more and
BUG-12-do-what and want to group them together. Sorting externally may
not be an option because we lose coloring and column layout from
git-branch and git-tag.

The same could be said for filtering, but it's probably less important
because you can always go with the ugly pattern [bB][uU][gG]-* if you're
desperate.

You can't have case-sensitive filtering and case-insensitive sorting (or
the other way around) with this though. For branch and tag, that should
be no problem. for-each-ref, as a plumbing command, might want finer control.
But we can always add --{filter,sort}-ignore-case when there is a need
for it.
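
To illustrate with hypothetical branch names (output trimmed):

    $ git branch --list "bug-12-*"
    bug-12-do-something
    $ git branch --list --ignore-case "bug-12-*"
    Bug-12-do-some-more
    bug-12-do-something
    BUG-12-do-what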

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Changes are in tests only:

diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index fad79e8..52283df 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -208,6 +208,13 @@ test_expect_success 'sort branches, ignore case' '
test_commit initial &&
git branch branch-one &&
git branch BRANCH-two &&
+   git branch --list | awk "{print \$NF}" >actual &&
+   cat >expected <<-\EOF &&
+   BRANCH-two
+   branch-one
+   master
+   EOF
+   test_cmp expected actual &&
git branch --list -i | awk "{print \$NF}" >actual &&
cat >expected <<-\EOF &&
branch-one
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 2d9cae3..07869b0 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -34,6 +34,13 @@ test_expect_success 'sort tags, ignore case' '
test_commit initial &&
git tag tag-one &&
git tag TAG-two &&
+   git tag -l >actual &&
+   cat >expected <<-\EOF &&
+   TAG-two
+   initial
+   tag-one
+   EOF
+   test_cmp expected actual &&
git tag -l -i >actual &&
cat >expected <<-\EOF &&
initial
@@ -98,8 +105,8 @@ test_expect_success 'listing all tags if one exists should output that tag' '
 test_expect_success 'listing a tag using a matching pattern should succeed' \
	'git tag -l mytag'
 
-test_expect_success 'listing a tag using a matching pattern should succeed' \
-	'git tag -l --ignore-case MYTAG'
+test_expect_success 'listing a tag with --ignore-case' \
+	'test $(git tag -l --ignore-case MYTAG) = mytag'
 
 test_expect_success \
'listing a tag using a matching pattern should output that tag' \

 Documentation/git-branch.txt   |  4 
 Documentation/git-for-each-ref.txt |  3 +++
 Documentation/git-tag.txt  |  4 
 builtin/branch.c   | 23 ++-
 builtin/for-each-ref.c |  5 -
 builtin/tag.c  |  4 
 ref-filter.c   | 28 +---
 ref-filter.h   |  2 ++
 t/t3203-branch-output.sh   | 29 +
 t/t7004-tag.sh | 27 +++
 10 files changed, 112 insertions(+), 17 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 1fe7344..5516a47 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -118,6 +118,10 @@ OPTIONS
default to color output.
Same as `--color=never`.
 
+-i::
+--ignore-case::
+   Sorting and filtering branches are case insensitive.
+
 --column[=<options>]::
 --no-column::
Display branch listing in columns. See configuration variable
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index f57e69b..6d22974 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -79,6 +79,9 @@ OPTIONS
Only list refs which contain the specified commit (HEAD if not
specified).
 
+--ignore-case::
+   Sorting and filtering refs are case insensitive.
+
 FIELD NAMES
-----------
 
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index 80019c5..76cfe40 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -108,6 +108,10 @@ OPTIONS
variable if it exists, or lexicographic order otherwise. See
linkgit:git-config[1].
 
+-i::
+--ignore-case::
+   Sorting and filtering tags are case insensitive.
+
 --column[=<options>]::
 --no-column::
Display tag listing in columns. See configuration variable
diff --git a/builtin/branch.c b/builtin/branch.c
index 60cc5c8..36e0a21 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -512,15 +512,6 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sorting)
if (filter->verbose)
 

Re: git reset --hard should not irretrievably destroy new files

2016-12-03 Thread Julian de Bhal
On Sat, Dec 3, 2016 at 6:11 PM, Christian Couder wrote:
> On Sat, Dec 3, 2016 at 6:04 AM, Julian de Bhal wrote:
>> but I'd be nearly as happy if a
>> commit was added to the reflog when the reset happens (I can probably make
>> that happen with some configuration now that I've been bitten).
>
> Not sure if this has been proposed. Perhaps it would be simpler to
> just output the sha1, and maybe the filenames too, of the blobs that
> are no longer referenced from the trees, somewhere (in a bloblog?).

Yeah, after doing a bit more reading around the issue, this seems like
a smaller part of destroying local changes with a hard reset, and I'm
one of the lucky ones where it is recoverable.

Has anyone discussed having `git reset --hard` create objects for the
current state of anything it's about to destroy, specifically so they
end up in the --lost-found?

I think this is what you're suggesting, only without checking for
references, so that tree & blob objects exist that make any hard reset
reversible.

Cheers

Jules

P.s. Thank you for such a warm welcome while I blunder through
unfamiliar protocols.


Git v2.11.0 breaks max depth nested alternates

2016-12-03 Thread Kyle J. McKay
The recent addition of pre-receive quarantining breaks nested  
alternates that are already at the maximum alternates nesting depth.

In the file sha1_file.c in the function link_alt_odb_entries we have  
this:

 > if (depth > 5) {
 > error("%s: ignoring alternate object stores, nesting too deep.",
 > relative_base);
 > return;
 > }

When the incoming quarantine takes place the current objects directory  
is demoted to an alternate thereby increasing its depth (and any  
alternates it references) by one and causing any object store that was  
previously at the maximum nesting depth to be ignored courtesy of the  
above hard-coded maximum depth.

If the incoming push happens to need access to some of those objects  
to perhaps "--fix-thin" its pack it will crash and burn.

Originally I was not going to include a patch to fix this, but simply  
suggest that the expeditious fix is to just allow one additional  
alternates nesting depth level during quarantine operations.

However, it was so simple, I have included the patch below :)

I have verified, in a case where a push with Git v2.10.2 succeeds and a
push with Git v2.11.0 to the same repository fails because of this
problem, that the patch below does indeed correct the issue and allow
the push to succeed.

Cheers,

Kyle

-- 8< --
Subject: [PATCH] receive-pack: increase max alternates depth during quarantine

Ever since 722ff7f876 (receive-pack: quarantine objects until
pre-receive accepts, 2016-10-03, v2.11.0), Git has been quarantining
objects and packs received during an incoming push into a separate
objects directory and using the alternates mechanism to make them
available until they are either accepted and moved into the main
objects directory or rejected and discarded.

Unfortunately this has the side effect of increasing the alternates
nesting depth level by one for all pre-existing alternates.

If a repository is already at the maximum alternates nesting depth,
then this quarantining operation can temporarily push it over making
the incoming push fail.

To prevent the failure we simply increase the allowed alternates
nesting depth by one whenever a quarantine operation is in effect.

Signed-off-by: Kyle J. McKay 
---

Notes:
Some alternates nesting depth background:

If base/fork0/fork1/fork2/fork3/fork4/fork5 represents
seven git repositories where base.git has no alternates,
fork0.git has base.git as an alternate, fork1.git has
fork0.git as an alternate and so on where fork5.git has
only fork4.git as an alternate, then fork5.git is at
the maximum allowed depth of 5.  git fsck --strict --full
works without complaint on fork5.git.

However, in base/fork0/fork1/fork2/fork3/fork4/fork5/fork6,
an fsck --strict --full of fork6.git will generate complaints
and any objects/packs present in base.git will be ignored.
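
The same layout can be reproduced with a short shell sketch
(absolute paths in info/alternates so they resolve from anywhere):

    git init --bare base.git
    prev=base.git
    for i in 0 1 2 3 4 5
    do
        git init --bare fork$i.git &&
        echo "$PWD/$prev/objects" >fork$i.git/objects/info/alternates &&
        prev=fork$i.git
    done
    # fork5.git is now at the maximum allowed depth of 5; one more
    # level (or the quarantine's temporary extra level) breaks it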

 cache.h   | 1 +
 common-main.c | 3 +++
 environment.c | 1 +
 sha1_file.c   | 2 +-
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index a50a61a1..25c17c29 100644
--- a/cache.h
+++ b/cache.h
@@ -676,6 +676,7 @@ extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
 extern unsigned long big_file_threshold;
 extern unsigned long pack_size_limit_cfg;
+extern int alt_odb_max_depth;
 
 /*
  * Accessors for the core.sharedrepository config which lazy-load the value
diff --git a/common-main.c b/common-main.c
index c654f955..9f747491 100644
--- a/common-main.c
+++ b/common-main.c
@@ -37,5 +37,8 @@ int main(int argc, const char **argv)
 
restore_sigpipe_to_default();
 
+   if (getenv(GIT_QUARANTINE_ENVIRONMENT))
+   alt_odb_max_depth++;
+
return cmd_main(argc, argv);
 }
diff --git a/environment.c b/environment.c
index 0935ec69..32e11f70 100644
--- a/environment.c
+++ b/environment.c
@@ -64,6 +64,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
+int alt_odb_max_depth = 5;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/sha1_file.c b/sha1_file.c
index 9c86d192..15b8432e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -337,7 +337,7 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
int i;
struct strbuf objdirbuf = STRBUF_INIT;
 
-   if (depth > 5) {
+   if (depth > alt_odb_max_depth) {
error("%s: ignoring alternate object stores, nesting too deep.",
relative_base);
return;
---


Re: git reset --hard should not irretrievably destroy new files

2016-12-03 Thread Julian de Bhal
On Sat, Dec 3, 2016 at 5:49 PM, Johannes Sixt  wrote:
> Am 03.12.2016 um 06:04 schrieb Julian de Bhal:
>>
>> If you `git add new_file; git reset --hard`, new_file is gone forever.
>
> AFAIC, this is a feature ;-) I occasionally use it to remove a file when I
> already have git-gui in front of me. Then it's often less convenient to type
> the path in a shell, or to pointy-click around in a file browser.

Yeah, I'm conscious that it would be a change in behaviour and would
almost certainly break things in the wild.

On the other hand, `rm` deletes perfectly well, but there's no good
way to recover the lost files after the fact. You can take some
precautions after you've been bitten, but git usually means never
saying "you should have".

>> git add new_file
>> [...]
>> git reset --hard # decided copy from backed up diff
>> # boom. new_file is gone forever
>
> ... it is not. The file is still among the dangling blobs in the repository
> until you clean it up with 'git gc'. Use 'git fsck --lost-found':

Thank you so much! Super glad to be wrong here.

Cheers,

Jules

On Sat, Dec 3, 2016 at 5:49 PM, Johannes Sixt  wrote:
> Am 03.12.2016 um 06:04 schrieb Julian de Bhal:
>>
>> If you `git add new_file; git reset --hard`, new_file is gone forever.
>
>
> AFAIC, this is a feature ;-) I occasionally use it to remove a file when I
> already have git-gui in front of me. Then it's often less convenient to type
> the path in a shell, or to pointy-click around in a file browser.
>
>> git add new_file
>
>
> Because of this ...
>
>> git add -p   # also not necessary, but distracting
>> git reset --hard # decided copy from backed up diff
>> # boom. new_file is gone forever
>
>
> ... it is not. The file is still among the dangling blobs in the repository
> until you clean it up with 'git gc'. Use 'git fsck --lost-found':
>
> --lost-found
>
> Write dangling objects into .git/lost-found/commit/ or
> .git/lost-found/other/, depending on type. If the object is a blob, the
> contents are written into the file, rather than its object name.
>
> -- Hannes
>
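
In minimal form, the recovery Hannes describes (object ids are of
course repository-specific):

    git fsck --lost-found
    ls .git/lost-found/other/    # one file per dangling blob, named by
                                 # object id, holding its contents
    cp .git/lost-found/other/<id> new_file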


[PATCH v4 3/3] unicode_width.h: update the tables to Unicode 9.0

2016-12-03 Thread Beat Bolli
Rerunning update-unicode.sh that we fixed in the two previous commits
produces these new tables.

Signed-off-by: Beat Bolli 
---
 unicode_width.h | 131 +---
 1 file changed, 107 insertions(+), 24 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 47cdd23..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -25,7 +25,7 @@ static const struct interval zero_width[] = {
 { 0x0825, 0x0827 },
 { 0x0829, 0x082D },
 { 0x0859, 0x085B },
-{ 0x08E4, 0x0902 },
+{ 0x08D4, 0x0902 },
 { 0x093A, 0x093A },
 { 0x093C, 0x093C },
 { 0x0941, 0x0948 },
@@ -120,6 +120,7 @@ static const struct interval zero_width[] = {
 { 0x17C9, 0x17D3 },
 { 0x17DD, 0x17DD },
 { 0x180B, 0x180E },
+{ 0x1885, 0x1886 },
 { 0x18A9, 0x18A9 },
 { 0x1920, 0x1922 },
 { 0x1927, 0x1928 },
@@ -158,7 +159,7 @@ static const struct interval zero_width[] = {
 { 0x1CF4, 0x1CF4 },
 { 0x1CF8, 0x1CF9 },
 { 0x1DC0, 0x1DF5 },
-{ 0x1DFC, 0x1DFF },
+{ 0x1DFB, 0x1DFF },
 { 0x200B, 0x200F },
 { 0x202A, 0x202E },
 { 0x2060, 0x2064 },
@@ -171,13 +172,13 @@ static const struct interval zero_width[] = {
 { 0x3099, 0x309A },
 { 0xA66F, 0xA672 },
 { 0xA674, 0xA67D },
-{ 0xA69F, 0xA69F },
+{ 0xA69E, 0xA69F },
 { 0xA6F0, 0xA6F1 },
 { 0xA802, 0xA802 },
 { 0xA806, 0xA806 },
 { 0xA80B, 0xA80B },
 { 0xA825, 0xA826 },
-{ 0xA8C4, 0xA8C4 },
+{ 0xA8C4, 0xA8C5 },
 { 0xA8E0, 0xA8F1 },
 { 0xA926, 0xA92D },
 { 0xA947, 0xA951 },
@@ -204,7 +205,7 @@ static const struct interval zero_width[] = {
 { 0xABED, 0xABED },
 { 0xFB1E, 0xFB1E },
 { 0xFE00, 0xFE0F },
-{ 0xFE20, 0xFE2D },
+{ 0xFE20, 0xFE2F },
 { 0xFEFF, 0xFEFF },
 { 0xFFF9, 0xFFFB },
 { 0x101FD, 0x101FD },
@@ -228,16 +229,21 @@ static const struct interval zero_width[] = {
 { 0x11173, 0x11173 },
 { 0x11180, 0x11181 },
 { 0x111B6, 0x111BE },
+{ 0x111CA, 0x111CC },
 { 0x1122F, 0x11231 },
 { 0x11234, 0x11234 },
 { 0x11236, 0x11237 },
+{ 0x1123E, 0x1123E },
 { 0x112DF, 0x112DF },
 { 0x112E3, 0x112EA },
-{ 0x11301, 0x11301 },
+{ 0x11300, 0x11301 },
 { 0x1133C, 0x1133C },
 { 0x11340, 0x11340 },
 { 0x11366, 0x1136C },
 { 0x11370, 0x11374 },
+{ 0x11438, 0x1143F },
+{ 0x11442, 0x11444 },
+{ 0x11446, 0x11446 },
 { 0x114B3, 0x114B8 },
 { 0x114BA, 0x114BA },
 { 0x114BF, 0x114C0 },
@@ -245,6 +251,7 @@ static const struct interval zero_width[] = {
 { 0x115B2, 0x115B5 },
 { 0x115BC, 0x115BD },
 { 0x115BF, 0x115C0 },
+{ 0x115DC, 0x115DD },
 { 0x11633, 0x1163A },
 { 0x1163D, 0x1163D },
 { 0x1163F, 0x11640 },
@@ -252,6 +259,16 @@ static const struct interval zero_width[] = {
 { 0x116AD, 0x116AD },
 { 0x116B0, 0x116B5 },
 { 0x116B7, 0x116B7 },
+{ 0x1171D, 0x1171F },
+{ 0x11722, 0x11725 },
+{ 0x11727, 0x1172B },
+{ 0x11C30, 0x11C36 },
+{ 0x11C38, 0x11C3D },
+{ 0x11C3F, 0x11C3F },
+{ 0x11C92, 0x11CA7 },
+{ 0x11CAA, 0x11CB0 },
+{ 0x11CB2, 0x11CB3 },
+{ 0x11CB5, 0x11CB6 },
 { 0x16AF0, 0x16AF4 },
 { 0x16B30, 0x16B36 },
 { 0x16F8F, 0x16F92 },
@@ -262,31 +279,59 @@ static const struct interval zero_width[] = {
 { 0x1D185, 0x1D18B },
 { 0x1D1AA, 0x1D1AD },
 { 0x1D242, 0x1D244 },
+{ 0x1DA00, 0x1DA36 },
+{ 0x1DA3B, 0x1DA6C },
+{ 0x1DA75, 0x1DA75 },
+{ 0x1DA84, 0x1DA84 },
+{ 0x1DA9B, 0x1DA9F },
+{ 0x1DAA1, 0x1DAAF },
+{ 0x1E000, 0x1E006 },
+{ 0x1E008, 0x1E018 },
+{ 0x1E01B, 0x1E021 },
+{ 0x1E023, 0x1E024 },
+{ 0x1E026, 0x1E02A },
 { 0x1E8D0, 0x1E8D6 },
+{ 0x1E944, 0x1E94A },
 { 0xE0001, 0xE0001 },
 { 0xE0020, 0xE007F },
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x1C },
-{ /* plane */ 0x1C, 0x21 },
-{ /* plane */ 0x21, 0x22 },
-{ /* plane */ 0x22, 0x23 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
+{ 0x231A, 0x231B },
 { 0x2329, 0x232A },
+{ 0x23E9, 0x23EC },
+{ 0x23F0, 0x23F0 },
+{ 0x23F3, 0x23F3 },
+{ 0x25FD, 0x25FE },
+{ 0x2614, 0x2615 },
+{ 0x2648, 0x2653 },
+{ 0x267F, 0x267F },
+{ 0x2693, 0x2693 },
+{ 0x26A1, 0x26A1 },
+{ 0x26AA, 0x26AB },
+{ 0x26BD, 0x26BE },
+{ 0x26C4, 0x26C5 },
+{ 0x26CE, 0x26CE },
+{ 0x26D4, 0x26D4 },
+{ 0x26EA, 0x26EA },
+{ 0x26F2, 0x26F3 },
+{ 0x26F5, 0x26F5 },
+{ 0x26FA, 0x26FA },
+{ 0x26FD, 0x26FD },
+{ 0x2705, 0x2705 },
+{ 0x270A, 0x270B },
+{ 0x2728, 0x2728 },
+{ 0x274C, 0x274C },
+{ 0x274E, 0x274E },
+{ 0x2753, 0x2755 },
+{ 0x2757, 0x2757 },
+{ 0x2795, 0x2797 },
+{ 0x27B0, 0x27B0 },
+{ 0x27BF, 0x27BF },
+{ 0x2B1B, 0x2B1C },
+{ 0x2B50, 0x2B50 },
+{ 0x2B55, 0x2B55 },
 { 0x2E80, 0x2E99 },
 { 0x2E9B, 0x2EF3 },
 { 0x2F00, 0x2FD5 },
@@ -313,11 +358,49 @@ static const struct interval double_width[] = {
 { 0xFE68, 0xFE6B },
 { 0xFF01, 0xFF60 },
 { 0xFFE0, 0xFFE6 },
+{ 0x16FE0, 0x16FE0 },
+{ 0x17000, 0x187EC },
+{ 0x18800, 0x18AF2 },
 { 0x1B000, 0x1B001 },
+{ 0x1F004, 

[PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table

2016-12-03 Thread Beat Bolli
The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh.

Reviewed-by: Torsten Bögershausen 
Signed-off-by: Beat Bolli 
---
 update_unicode.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
  grep -v plane)
};
static const struct interval double_width[] = {
-   $(uniset/uniset --32 eaw:F,W)
+   $(uniset/uniset --32 eaw:F,W | grep -v plane)
};
EOF
 )
-- 
2.7.2


[PATCH v4 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Beat Bolli
Checking just for the unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Reviewed-by: Torsten Boegershausen 
Signed-off-by: Beat Bolli 
---
Diff to v3:
  - change the Cc: into Reviewed-by: on Torsten's request
  - include the old reroll diffs

Diff to v2:
  - reorder the commits: fix all of update-unicode.sh first, then
regenerate unicode_width.h only once

Diff to v1:
  - reword the commit message
  - add Torsten's Cc:

 update_unicode.sh | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
if ! test -d uniset; then
git clone https://github.com/depp/uniset.git
fi &&
-- 
2.7.2


[PATCH] docs: warn about possible '=' in clean/smudge filter process values

2016-12-03 Thread larsxschneider
From: Lars Schneider 

A pathname value in a clean/smudge filter process "key=value" pair can
contain the '=' character (introduced in edcc858). Make the user aware
of this issue in the docs, add a corresponding test case, and fix the
issue in filter process value parser of the example implementation in
contrib.
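
The parsing rule boils down to "split at the first '=' only"; a small
shell illustration with a hypothetical payload:

    line='pathname=path/test=file.dat'
    key=${line%%=*}      # -> pathname
    value=${line#*=}     # -> path/test=file.dat (embedded '=' preserved)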

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt|  4 +++-
 contrib/long-running-filter/example.pl |  8 ++--
 t/t0021-conversion.sh  | 20 ++--
 t/t0021/rot13-filter.pl|  8 ++--
 4 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 976243a63e..e0b66c1220 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -435,7 +435,9 @@ to filter relative to the repository root. Right after the flush packet
 Git sends the content split in zero or more pkt-line packets and a
 flush packet to terminate content. Please note, that the filter
 must not send any response before it received the content and the
-final flush packet.
+final flush packet. Also note that the "value" of a "key=value" pair
+can contain the "=" character whereas the key would never contain
+that character.
 
 packet:  git> command=smudge
 packet:  git> pathname=path/testfile.dat
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
index 39457055a5..a677569ddd 100755
--- a/contrib/long-running-filter/example.pl
+++ b/contrib/long-running-filter/example.pl
@@ -81,8 +81,12 @@ packet_txt_write("capability=smudge");
 packet_flush();

 while (1) {
-   my ($command)  = packet_txt_read() =~ /^command=([^=]+)$/;
-   my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/;
+   my ($command)  = packet_txt_read() =~ /^command=(.+)$/;
+   my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/;
+
+   if ( $pathname eq "" ) {
+   die "bad pathname '$pathname'";
+   }

packet_bin_read();

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 4ea534e9fa..f3a0df2add 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -93,7 +93,7 @@ test_expect_success setup '
git checkout -- test test.t test.i &&

echo "content-test2" >test2.o &&
-   echo "content-test3 - filename with special characters" >"test3 
'\''sq'\'',\$x.o"
+   echo "content-test3 - filename with special characters" >"test3 
'\''sq'\'',\$x=.o"
 '

 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -359,12 +359,12 @@ test_expect_success PERL 'required process filter should filter data' '
 	cp "$TEST_ROOT/test.o" test.r &&
 	cp "$TEST_ROOT/test2.o" test2.r &&
 	mkdir testsubdir &&
-	cp "$TEST_ROOT/test3 '\''sq'\'',\$x.o" "testsubdir/test3 '\''sq'\'',\$x.r" &&
+	cp "$TEST_ROOT/test3 '\''sq'\'',\$x=.o" "testsubdir/test3 '\''sq'\'',\$x=.r" &&
>test4-empty.r &&

S=$(file_size test.r) &&
S2=$(file_size test2.r) &&
-   S3=$(file_size "testsubdir/test3 '\''sq'\'',\$x.r") &&
+   S3=$(file_size "testsubdir/test3 '\''sq'\'',\$x=.r") &&

filter_git add . &&
cat >expected.log <<-EOF &&
@@ -373,7 +373,7 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
 		IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
-		IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+		IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
STOP
EOF
test_cmp_count expected.log rot13-filter.log &&
@@ -385,23 +385,23 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
 		IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
-		IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+		IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
 		IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
-		IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+		IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
STOP
EOF
  

Re: [RFC/PATCH v3 00/16] Add initial experimental external ODB support

2016-12-03 Thread Lars Schneider

> On 30 Nov 2016, at 22:04, Christian Couder  wrote:
> 
> Goal
> 
> 
> Git can store its objects only in the form of loose objects in
> separate files or packed objects in a pack file.
> 
> To be able to better handle some kind of objects, for example big
> blobs, it would be nice if Git could store its objects in other object
> databases (ODB).

This is a great goal. I really hope we can use that to solve the
pain points in the current Git <--> GitLFS integration!
Thanks for working on this!

Minor nit: I feel the term "other" could be more expressive. Plus
"database" might confuse people. What do you think about
"External Object Storage" or something?


> Design
> ~~~~~~
> 
>  - " have": the command should output the sha1, size and
> type of all the objects the external ODB contains, one object per
> line.

This looks impractical. If a repo has 10k external files with
100 versions each then you need to read/transfer 1m hashes (this is
not made up - I am working with Git repos that contain >>10k files
in GitLFS).

Wouldn't it be better if Git collected all the hashes it currently
needs and then asked the external ODBs whether they have them?


>  - " get ": the command should then read from the
> external ODB the content of the object corresponding to  and
> output it on stdout.
> 
>  - " put   ": the command should then read
> from stdin an object and store it in the external ODB.
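
To make the proposed interface concrete, a toy helper in shell (the
store path is made up, and it cheats by assuming every object is a
blob); note that each request below costs a full fork/exec:

    #!/bin/sh
    store=/var/tmp/ext-odb
    case "$1" in
    have)
        # one "<sha1> <size> <type>" line per stored object
        for f in "$store"/*
        do
            test -f "$f" || continue
            printf '%s %s blob\n' "$(basename "$f")" "$(wc -c <"$f")"
        done ;;
    get)
        cat "$store/$2" ;;
    put)
        cat >"$store/$2" ;;
    esac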

Based on my experience with Git clean/smudge filters I think this kind 
of single-shot protocol will be a performance bottleneck as soon as 
people store more than 1000 files in the external ODB.
Maybe you can reuse my "filter process protocol" (edcc858) here?


> * Transfer
> 
> To transfer information about the blobs stored in an external ODB, some
> special refs, called "odb refs", similar to replace refs, are used.
> 
> For now there should be one odb ref per blob. Each ref name should be
> refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob stored
> in the external odb named <odbname>.
> 
> These odb refs should all point to a blob that should be stored in the
> Git repository and contain information about the blob stored in the
> external odb. This information can be specific to the external odb.
> The repos can then share this information using commands like:
> 
> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`

The "odbref" would point to a blob and the blob could contain anything,
right? E.g. it could contain an existing GitLFS pointer, right?

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345


> Design discussion about performance
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Yeah, it is not efficient to fork/exec a command to just read or write
> one object to or from the external ODB. Batch calls and/or using a
> daemon and/or RPC should be used instead to be able to store regular
> objects in an external ODB. But for now the external ODB would be all
> about really big files, where the cost of a fork+exec should not
> matter much. If we later want to extend usage of external ODBs, yeah
> we will probably need to design other mechanisms.

I think we should leverage the learnings from GitLFS as much as possible.
My learnings are:

(1) Fork/exec per object won't work. People have lots and lots of content
that is not suited for Git (e.g. integration test data, images, ...).

(2) We need a good UI. I think it would be great if the average user would 
not even need to know about the ODB. Moving files explicitly with a "put"
command seems impractical to me. GitLFS tracks files via filename and
that has a number of drawbacks, too. Do you see a way to define a 
customizable metric such as "move all files to ODB X whose gzip-compressed 
size is larger than Y"?


> Future work
> ~~~~~~~~~~~
> 
> I think that the odb refs don't prevent a regular fetch or push from
> wanting to send the objects that are managed by an external odb. So I
> am interested in suggestions about this problem. I will take a look at
> previous discussions and how other mechanisms (shallow clone, bundle
> v3, ...) handle this.

If the ODB configuration is stored in the Git repo similar to
.gitmodules then every client that clones ODB references would be able
to resolve them, right?

Cheers,
Lars



Re: [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Beat Bolli
On 03.12.16 17:40, Torsten Bögershausen wrote:
> On Sat, Dec 03, 2016 at 02:19:31PM +0100, Beat Bolli wrote:
>> Checking just for the unicode data files' existence is not sufficient;
>> we should also download them if a newer version exists on the Unicode
>> consortium's servers. Option -N of wget does this nicely for us.
>>
>> Cc: Torsten Bögershausen 
> 
> The V3 series makes perfect sense, thanks for cleaning up my mess.
Yeah, it took me three tries, too :-)

> (And can we remove the Cc: line, or replace it with Reviewed-by?)
If you prefer, sure.

Do you have any other comments?

Beat


Re: [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Torsten Bögershausen
On Sat, Dec 03, 2016 at 02:19:31PM +0100, Beat Bolli wrote:
> Checking just for the unicode data files' existence is not sufficient;
> we should also download them if a newer version exists on the Unicode
> consortium's servers. Option -N of wget does this nicely for us.
> 
> Cc: Torsten Bögershausen 

The V3 series makes perfect sense, thanks for cleaning up my mess.
(And can we remove the Cc: line, or replace it with Reviewed-by?)


Re: [PATCH] commit: make --only --allow-empty work without paths

2016-12-03 Thread Jeff King
On Sat, Dec 03, 2016 at 07:59:49AM +0100, Andreas Krey wrote:

> > OK. I'm not sure why you would want to create an empty commit in such a
> > case.
> 
> User: Ok tool, make me a pullreq.
> 
> Tool: But you haven't mentioned any issue
>   in your commit messages. Which are they?
> 
> User: Ok, that would be A-123.
> 
> Tool: git commit --allow-empty -m 'FIX: A-123'

OK. I think "tool" is slightly funny here, but I get that is part of the
real world works. Thanks for illustrating.

> > Yes, I think --run is a misfeature (I actually had to look it up, as I
> ...
> > implicit. If a single test script is annoyingly long to run, I'd argue
> 
> It wasn't about runtime but about output. I would have
> liked to see only the output of my still-failing test;
> a 'stop after test X' would be helpful there.

You can do --verbose-only=<pattern>, but if the test is failing, I typically
use "-v -i". That makes everything verbose, and then stops at the
failing test, so you can see the output easily.
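
For example, from the t/ directory (script name just an example):

    ./t7501-commit.sh -v -i   # everything verbose, stop at first failure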

-Peff


[PATCH v3 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Beat Bolli
Checking just for the unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Cc: Torsten Bögershausen 
Signed-off-by: Beat Bolli 
---
Diff to v2:
  - reorder the commits: fix all of update-unicode.sh first, then
regenerate unicode_width.h only once

 update_unicode.sh | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
if ! test -d uniset; then
git clone https://github.com/depp/uniset.git
fi &&
-- 
2.7.2


[PATCH v3 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table

2016-12-03 Thread Beat Bolli
The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh.

Cc: Torsten Bögershausen 
Signed-off-by: Beat Bolli 
---
 update_unicode.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
  grep -v plane)
};
static const struct interval double_width[] = {
-   $(uniset/uniset --32 eaw:F,W)
+   $(uniset/uniset --32 eaw:F,W | grep -v plane)
};
EOF
 )
-- 
2.7.2


[PATCH v3 3/3] unicode_width.h: update the tables to Unicode 9.0

2016-12-03 Thread Beat Bolli
Rerunning update-unicode.sh that we fixed in the two previous commits
produces these new tables.

Signed-off-by: Beat Bolli 
---
 unicode_width.h | 131 +---
 1 file changed, 107 insertions(+), 24 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 47cdd23..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -25,7 +25,7 @@ static const struct interval zero_width[] = {
 { 0x0825, 0x0827 },
 { 0x0829, 0x082D },
 { 0x0859, 0x085B },
-{ 0x08E4, 0x0902 },
+{ 0x08D4, 0x0902 },
 { 0x093A, 0x093A },
 { 0x093C, 0x093C },
 { 0x0941, 0x0948 },
@@ -120,6 +120,7 @@ static const struct interval zero_width[] = {
 { 0x17C9, 0x17D3 },
 { 0x17DD, 0x17DD },
 { 0x180B, 0x180E },
+{ 0x1885, 0x1886 },
 { 0x18A9, 0x18A9 },
 { 0x1920, 0x1922 },
 { 0x1927, 0x1928 },
@@ -158,7 +159,7 @@ static const struct interval zero_width[] = {
 { 0x1CF4, 0x1CF4 },
 { 0x1CF8, 0x1CF9 },
 { 0x1DC0, 0x1DF5 },
-{ 0x1DFC, 0x1DFF },
+{ 0x1DFB, 0x1DFF },
 { 0x200B, 0x200F },
 { 0x202A, 0x202E },
 { 0x2060, 0x2064 },
@@ -171,13 +172,13 @@ static const struct interval zero_width[] = {
 { 0x3099, 0x309A },
 { 0xA66F, 0xA672 },
 { 0xA674, 0xA67D },
-{ 0xA69F, 0xA69F },
+{ 0xA69E, 0xA69F },
 { 0xA6F0, 0xA6F1 },
 { 0xA802, 0xA802 },
 { 0xA806, 0xA806 },
 { 0xA80B, 0xA80B },
 { 0xA825, 0xA826 },
-{ 0xA8C4, 0xA8C4 },
+{ 0xA8C4, 0xA8C5 },
 { 0xA8E0, 0xA8F1 },
 { 0xA926, 0xA92D },
 { 0xA947, 0xA951 },
@@ -204,7 +205,7 @@ static const struct interval zero_width[] = {
 { 0xABED, 0xABED },
 { 0xFB1E, 0xFB1E },
 { 0xFE00, 0xFE0F },
-{ 0xFE20, 0xFE2D },
+{ 0xFE20, 0xFE2F },
 { 0xFEFF, 0xFEFF },
 { 0xFFF9, 0xFFFB },
 { 0x101FD, 0x101FD },
@@ -228,16 +229,21 @@ static const struct interval zero_width[] = {
 { 0x11173, 0x11173 },
 { 0x11180, 0x11181 },
 { 0x111B6, 0x111BE },
+{ 0x111CA, 0x111CC },
 { 0x1122F, 0x11231 },
 { 0x11234, 0x11234 },
 { 0x11236, 0x11237 },
+{ 0x1123E, 0x1123E },
 { 0x112DF, 0x112DF },
 { 0x112E3, 0x112EA },
-{ 0x11301, 0x11301 },
+{ 0x11300, 0x11301 },
 { 0x1133C, 0x1133C },
 { 0x11340, 0x11340 },
 { 0x11366, 0x1136C },
 { 0x11370, 0x11374 },
+{ 0x11438, 0x1143F },
+{ 0x11442, 0x11444 },
+{ 0x11446, 0x11446 },
 { 0x114B3, 0x114B8 },
 { 0x114BA, 0x114BA },
 { 0x114BF, 0x114C0 },
@@ -245,6 +251,7 @@ static const struct interval zero_width[] = {
 { 0x115B2, 0x115B5 },
 { 0x115BC, 0x115BD },
 { 0x115BF, 0x115C0 },
+{ 0x115DC, 0x115DD },
 { 0x11633, 0x1163A },
 { 0x1163D, 0x1163D },
 { 0x1163F, 0x11640 },
@@ -252,6 +259,16 @@ static const struct interval zero_width[] = {
 { 0x116AD, 0x116AD },
 { 0x116B0, 0x116B5 },
 { 0x116B7, 0x116B7 },
+{ 0x1171D, 0x1171F },
+{ 0x11722, 0x11725 },
+{ 0x11727, 0x1172B },
+{ 0x11C30, 0x11C36 },
+{ 0x11C38, 0x11C3D },
+{ 0x11C3F, 0x11C3F },
+{ 0x11C92, 0x11CA7 },
+{ 0x11CAA, 0x11CB0 },
+{ 0x11CB2, 0x11CB3 },
+{ 0x11CB5, 0x11CB6 },
 { 0x16AF0, 0x16AF4 },
 { 0x16B30, 0x16B36 },
 { 0x16F8F, 0x16F92 },
@@ -262,31 +279,59 @@ static const struct interval zero_width[] = {
 { 0x1D185, 0x1D18B },
 { 0x1D1AA, 0x1D1AD },
 { 0x1D242, 0x1D244 },
+{ 0x1DA00, 0x1DA36 },
+{ 0x1DA3B, 0x1DA6C },
+{ 0x1DA75, 0x1DA75 },
+{ 0x1DA84, 0x1DA84 },
+{ 0x1DA9B, 0x1DA9F },
+{ 0x1DAA1, 0x1DAAF },
+{ 0x1E000, 0x1E006 },
+{ 0x1E008, 0x1E018 },
+{ 0x1E01B, 0x1E021 },
+{ 0x1E023, 0x1E024 },
+{ 0x1E026, 0x1E02A },
 { 0x1E8D0, 0x1E8D6 },
+{ 0x1E944, 0x1E94A },
 { 0xE0001, 0xE0001 },
 { 0xE0020, 0xE007F },
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x1C },
-{ /* plane */ 0x1C, 0x21 },
-{ /* plane */ 0x21, 0x22 },
-{ /* plane */ 0x22, 0x23 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
+{ 0x231A, 0x231B },
 { 0x2329, 0x232A },
+{ 0x23E9, 0x23EC },
+{ 0x23F0, 0x23F0 },
+{ 0x23F3, 0x23F3 },
+{ 0x25FD, 0x25FE },
+{ 0x2614, 0x2615 },
+{ 0x2648, 0x2653 },
+{ 0x267F, 0x267F },
+{ 0x2693, 0x2693 },
+{ 0x26A1, 0x26A1 },
+{ 0x26AA, 0x26AB },
+{ 0x26BD, 0x26BE },
+{ 0x26C4, 0x26C5 },
+{ 0x26CE, 0x26CE },
+{ 0x26D4, 0x26D4 },
+{ 0x26EA, 0x26EA },
+{ 0x26F2, 0x26F3 },
+{ 0x26F5, 0x26F5 },
+{ 0x26FA, 0x26FA },
+{ 0x26FD, 0x26FD },
+{ 0x2705, 0x2705 },
+{ 0x270A, 0x270B },
+{ 0x2728, 0x2728 },
+{ 0x274C, 0x274C },
+{ 0x274E, 0x274E },
+{ 0x2753, 0x2755 },
+{ 0x2757, 0x2757 },
+{ 0x2795, 0x2797 },
+{ 0x27B0, 0x27B0 },
+{ 0x27BF, 0x27BF },
+{ 0x2B1B, 0x2B1C },
+{ 0x2B50, 0x2B50 },
+{ 0x2B55, 0x2B55 },
 { 0x2E80, 0x2E99 },
 { 0x2E9B, 0x2EF3 },
 { 0x2F00, 0x2FD5 },
@@ -313,11 +358,49 @@ static const struct interval double_width[] = {
 { 0xFE68, 0xFE6B },
 { 0xFF01, 0xFF60 },
 { 0xFFE0, 0xFFE6 },
+{ 0x16FE0, 0x16FE0 },
+{ 0x17000, 0x187EC },
+{ 0x18800, 0x18AF2 },
 { 0x1B000, 0x1B001 },
+{ 0x1F004, 

[PATCH v2 1/3] update-unicode.sh: automatically download newer definition files

2016-12-03 Thread Beat Bolli
Checking just for the unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Cc: Torsten Bögershausen 
Signed-off-by: Beat Bolli 
---
Diff to v1:
  - reword the commit message
  - add Torsten's Cc:

 update_unicode.sh | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
if ! test -d uniset; then
git clone https://github.com/depp/uniset.git
fi &&
-- 
2.7.2


[PATCH v2 3/3] unicode_width.h: fix the double_width[] table

2016-12-03 Thread Beat Bolli
The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh and regenerate the
table.

Cc: Torsten Bögershausen 
Signed-off-by: Beat Bolli 
---
Diff to v1:
  - add Torsten's Cc:

 unicode_width.h   | 17 -
 update_unicode.sh |  2 +-
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 73b5fd6..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -297,23 +297,6 @@ static const struct interval zero_width[] = {
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x3D },
-{ /* plane */ 0x3D, 0x68 },
-{ /* plane */ 0x68, 0x69 },
-{ /* plane */ 0x69, 0x6A },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
 { 0x231A, 0x231B },
 { 0x2329, 0x232A },
diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
  grep -v plane)
};
static const struct interval double_width[] = {
-   $(uniset/uniset --32 eaw:F,W)
+   $(uniset/uniset --32 eaw:F,W | grep -v plane)
};
EOF
 )
-- 
2.7.2


[PATCH 3/3] unicode_width.h: fix the double_width[] table

2016-12-03 Thread Beat Bolli
The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh and regenerate
the table.

Signed-off-by: Beat Bolli 
---
 unicode_width.h   | 17 -
 update_unicode.sh |  2 +-
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 73b5fd6..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -297,23 +297,6 @@ static const struct interval zero_width[] = {
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x3D },
-{ /* plane */ 0x3D, 0x68 },
-{ /* plane */ 0x68, 0x69 },
-{ /* plane */ 0x69, 0x6A },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
 { 0x231A, 0x231B },
 { 0x2329, 0x232A },
diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
  grep -v plane)
};
static const struct interval double_width[] = {
-   $(uniset/uniset --32 eaw:F,W)
+   $(uniset/uniset --32 eaw:F,W | grep -v plane)
};
EOF
 )
-- 
2.7.2


Re: git reset --hard should not irretrievably destroy new files

2016-12-03 Thread Christian Couder
On Sat, Dec 3, 2016 at 6:04 AM, Julian de Bhal  wrote:
> If you `git add new_file; git reset --hard`, new_file is gone forever.
>
> This is totally what git says it will do on the box, but it caught me out.

Yeah, unfortunately you are not the first one to be caught by it, and
probably not the last. See for example the last discussion about it:

https://public-inbox.org/git/loom.20160523t023140-...@post.gmane.org/

which itself refers to this previous discussion:

https://public-inbox.org/git/CANWD=rx-meis4cnzdwr2wwkshz2zu8-l31urkwbzrjsbcjx...@mail.gmail.com/

> It might seem a little less stupid if I explain what I was doing: I was
> breaking apart a chunk of work into smaller changes:
>
> git commit -a -m 'tmp'   # You feel pretty safe now, right?
> git checkout -b backup/my-stuff  # Not necessary, just a convenience
> git checkout -
> git reset HEAD^  # mixed
> git add new_file
> git add -p   # also not necessary, but distracting
> git reset --hard # decided copy from backed up diff
> # boom. new_file is gone forever
>
>
> Now, again, this is totally what git says it's going to do, and that was
> pretty stupid, but that file is gone for good, and it feels bad.

Yeah, I agree that it feels bad even if there are often ways to get
back your data as you can see from the links in Yotam's email above.

> Everything that was committed is safe, and the other untracked files in
> my local directory are also fine, but that particular file is
> permanently destroyed. This is the first time I've lost something since I
> discovered the reflog a year or two ago.
>
> The behaviour that would make the most sense to me (personally) would be
> for a hard reset to unstage new files,

This has already been proposed last time...

> but I'd be nearly as happy if a
> commit was added to the reflog when the reset happens (I can probably make
> that happen with some configuration now that I've been bitten).

Not sure if this has been proposed. Perhaps it would be simpler to
just output the sha1, and maybe the filenames too, of the blobs that
are no longer referenced from the trees, somewhere (in a bloblog?).

> If there's support for this idea but no-one is keen to write the code, let
> me know and I could have a crack at it.

Not sure if your report and your offer will make us more likely to
agree to do something, but thanks for trying!