Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
On 22/07/2017 at 02:43, Jiang Xin wrote:
>
> Benefit of using the tweak version of gettext:
>
> 1. `make pot` can be run in a tar extract directory (without git controlled).

This issue is real for package maintainers, who can patch the original source and run their own set of utilities outside of a git repo. Junio's proposal can handle this too, by writing the files to a temporary directory before running xgettext and then removing the temporary directory. Please note that with respect to this issue, the patched xgettext approach is completely disruptive.

> 2. do not need to run `git reset --hard`.

Same as before.

> 3. it's quick (nobody cares).
>

Requiring patched tools really breaks collaboration. Git made a great case of relying on standard tools (not even GNU versions), so that would really be a step backward.

Plus, I hope that some day, instead of translators finding out afterwards that a change broke i18n capabilities, developers will have some kind of sanity check. Requiring special versions of i18n tooling puts an end to this hope.
Re: [L10N] Kickoff of translation for Git 2.14.0 round 1
On 22/07/2017 at 19:02, Kaartic Sivaraam wrote:
> On Sat, 2017-07-15 at 21:30 +0200, Jean-Noël Avila wrote:
>> * commit 4ddb1354e8 ("status: contextually notify user about an initial
>> commit") plays sentence lego while introducing colorization which again
>> does not play well with i18n.
>>
> What, if anything, should be done about this?
>

I only spotted it because the string is new for translation. But the previous version was already playing sentence lego. So this is not a regression ;-)

If I understand correctly, getting an i18n-friendly string would require being able to "color_sprintf" the branch name, and then "color_fprintf" the output with a %s formatting string. None of this is available yet, and it would introduce cumbersome logic in the code.

More generally, i18n certainly puts some pressure on coding style, and it gets worse with multi-platform support and coloring... How can we ease the burden on developers on this front without resorting to ad hoc patches?
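To make the idea concrete, here is a minimal C sketch of the approach described above: color the branch name into a buffer first, then substitute it into one complete, translatable format string. `colorize_into()`, `format_status_line()` and the `_()` macro are hypothetical stand-ins, none of them is Git's actual API, and the message text is only illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a gettext lookup; a real build would call gettext(). */
#define _(msgid) (msgid)

#define COLOR_GREEN "\033[32m"
#define COLOR_RESET "\033[m"

/*
 * Hypothetical helper: write `text` wrapped in `color` and a reset
 * sequence into `buf`. Returns the length snprintf() reports.
 */
static int colorize_into(char *buf, size_t size, const char *color,
			 const char *text)
{
	assert(buf && size > 0);
	return snprintf(buf, size, "%s%s%s", color, text, COLOR_RESET);
}

/*
 * The translator sees one full sentence with a single %s placeholder,
 * so the word order can change freely in other languages.
 */
static int format_status_line(char *out, size_t size, const char *branch)
{
	char colored[256];

	colorize_into(colored, sizeof(colored), COLOR_GREEN, branch);
	return snprintf(out, size, _("No commits yet on %s"), colored);
}
```

The cost mentioned in the message is visible even in this small sketch: an extra buffer and an extra formatting pass exist only to keep the sentence whole for translators.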
Re: reftable [v3]: new ref storage format
3rd iteration of the reftable storage format.

You can read a rendered version of this here:
https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md

Significant changes from v2:
- efficient lookup by SHA-1 for allow-tip-sha1-in-want.
- type 0x4 for FETCH_HEAD, MERGE_HEAD.
- file size up (27.7 M in v1, 34.4 M in v3)

The file size increase is due to lookup by SHA-1 support. By using unique abbreviations it's adding about 7 MiB to the file size for 865,258 objects behind 866,456 refs. Average entry for this direction costs 8 bytes, using a 6 byte/12 hex unique abbreviation.

## Overview

### Problem statement

Some repositories contain a lot of references (e.g. android at 866k, rails at 31k). The existing packed-refs format takes up a lot of space (e.g. 62M), and does not scale with additional references. Lookup of a single reference requires linearly scanning the file.

Atomic pushes modifying multiple references require copying the entire packed-refs file, which can be a considerable amount of data moved (e.g. 62M in, 62M out) for even small transactions (2 refs modified).

Repositories with many loose references occupy a large number of disk blocks from the local file system, as each reference is its own file storing 41 bytes (and another file for the corresponding reflog). This negatively affects the number of inodes available when a large number of repositories are stored on the same filesystem. Readers can be penalized due to the larger number of syscalls required to traverse and read the `$GIT_DIR/refs` directory.

### Objectives

- Near constant time lookup for any single reference, even when the repository is cold and not in process or kernel cache.
- Near constant time verification that a SHA-1 is referred to by at least one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.
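The single-reference lookup objective contrasts with the linear scan that packed-refs requires. As a rough illustration only (this is not the reftable block or index layout), keeping reference names sorted lets a reader locate one name with binary search. `ref_entry`, `cmp_name()` and `find_ref()` are names invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory record: a reference name and its hex object id. */
struct ref_entry {
	const char *name;
	const char *hex_id;
};

/* bsearch() comparator: the key is a C string, elements are ref_entry. */
static int cmp_name(const void *key, const void *elem)
{
	const struct ref_entry *e = elem;
	return strcmp((const char *)key, e->name);
}

/*
 * Look up `name` in `refs`, which must be sorted in strcmp() order.
 * Returns NULL when the reference is absent. Needs O(log n) comparisons
 * instead of the O(n) of a linear scan.
 */
static const struct ref_entry *find_ref(const struct ref_entry *refs,
					size_t nr, const char *name)
{
	assert(refs && name);
	return bsearch(name, refs, nr, sizeof(*refs), cmp_name);
}
```

Binary search over an in-memory array is logarithmic, not "near constant"; it only shows why sorted storage removes the need to scan every entry.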
### Description

A reftable file is a portable binary file format customized for reference storage. References are sorted, enabling linear scans, binary search lookup, and range scans.

Storage in the file is organized into blocks. Prefix compression is used within a single block to reduce disk space. Block size is tunable by the writer.

### Performance

Space used, packed-refs vs. reftable:

repository  | packed-refs | reftable | % original | avg ref  | avg obj
------------|------------:|---------:|-----------:|---------:|--------:
android     |      62.2 M |   34.4 M |      55.2% | 33 bytes | 8 bytes
rails       |       1.8 M |    1.1 M |      57.7% | 29 bytes | 6 bytes
git         |      78.7 K |   44.0 K |      60.0% | 50 bytes | 6 bytes
git (heads) |       332 b |    239 b |      72.0% | 31 bytes | 0 bytes

Scan (read 866k refs), by reference name lookup (single ref from 866k refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):

format      |   scan |        by name |       by SHA-1
------------|-------:|---------------:|---------------:
packed-refs | 402 ms | 409,660.1 usec | 412,535.8 usec
reftable    | 112 ms |      42.7 usec |     340.8 usec

Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:

format        |  size |    avg log
--------------|------:|-----------:
$GIT_DIR/logs | 173 M | 1209 bytes
reftable      |   4 M |   30 bytes

## Details

### Peeling

References in a reftable are always peeled.

### Reference name encoding

Reference names should be encoded with UTF-8.

### Network byte order

All multi-byte, fixed width fields are in network byte order.

### Ordering

Blocks are lexicographically ordered by their first reference.

### Directory/file conflicts

The reftable format accepts both `refs/heads/foo` and `refs/heads/foo/bar` as distinct references.

This property is useful for retaining log records in reftable, but may confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain references. Users of reftable may choose to continue to reject `foo` and `foo/bar` type conflicts to prevent problems for peers.

## File format

### Structure

A reftable file has the following high-level structure:

    first_block {
      header
      first_ref_block
    }
    ref_blocks*
    ref_index?
    obj_blocks*
    obj_index?
    log_blocks*
    log_index?
    footer

### Block size

The `block_size` is arbitrarily determined by the writer, and does not have to be a power of 2. The block size must be larger than the longest reference name or deflated log entry used in the repository, as references cannot span blocks.

Powers of two that are friendly to the virtual memory system or filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can yield better compression, with a possible increased cost incurred by readers during access.

The largest block size is `16777215` bytes (15.99 MiB).

### Header

An 8-byte header appears at the beginning of the file:

    '\1REF'
    uint8(
Re: [L10N] Kickoff of translation for Git 2.14.0 round 1
On Sat, 2017-07-15 at 21:30 +0200, Jean-Noël Avila wrote:
> * commit 4ddb1354e8 ("status: contextually notify user about an initial
> commit") plays sentence lego while introducing colorization which again
> does not play well with i18n.
>

What, if anything, should be done about this?

-- 
Kaartic
Re: [PATCH] sha1_file: use access(), not lstat(), if possible
Johannes Schindelin writes:

> But this whole thread taps into a gripe I have with parts of Git's code
> base: part of the code is not clear at all in its intent by virtue of
> calling whatever POSIX function may seem to give the answer for the
> intended question, instead of implementing a function whose name says
> precisely what question is asked.
>
> In this instance, we do not call a helper get_file_size(). Oh no. That
> would make it too obvious. We call lstat() instead.

I agree with you for this case and a case like this in general.

In codepaths at a lot lower level (they tend to be the ancient and quite fundamental ones) in our codebase, lstat() is often directly used by the caller because it is interested not in a single aspect of a path but in many fields of struct stat.

When the code is interested in existence or size or whatever single aspect of a path and nothing else, however, the code would become easier to read if a helper function with a more specific name is used. And it may even help individual platforms that do not want to use the full lstat() emulation, by telling them that other fields in struct stat are not needed.

Of course, then the issue becomes what to do when we are interested in not just one but a selected few attributes. Perhaps we create a helper "get_A_B_and_C_attributes_for_path()", which may use lstat() on POSIX and the most efficient way to get only A, B and C attributes on non-POSIX platforms. The implementation would be OK, but the naming becomes a bit hard; we need to give it a good name.

Things get even more interesting when the set of attributes we are interested in grows by one and we need to rename the function to "get_A_B_C_and_D_attributes_for_path()". When it is a lot easier to fall back to the full lstat() emulation on non-POSIX platforms, the temptation to just use it, even though it would grab attributes that are not needed in that function, grows; those who do the actual implementation for a particular platform need to resist it.
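The naming problem can be sketched in C. The following is one possible shape, not something proposed in this thread: instead of encoding the attribute set in the function name, the caller passes a bitmask saying which fields it needs, so a platform port could skip the rest. `path_attr`, the `PATH_ATTR_*` flags and `get_path_attributes()` are hypothetical names, and this POSIX version simply calls lstat().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical flags naming the attributes a caller is interested in. */
#define PATH_ATTR_SIZE  (1u << 0)
#define PATH_ATTR_MTIME (1u << 1)
#define PATH_ATTR_MODE  (1u << 2)

struct path_attr {
	off_t size;
	time_t mtime;
	mode_t mode;
};

/*
 * Fill in only the fields requested in `want`; other fields are left
 * untouched. Returns 0 on success, -1 (with errno set by lstat()) on
 * failure. A non-POSIX port could inspect `want` to avoid fetching
 * more than is needed.
 */
static int get_path_attributes(const char *path, unsigned want,
			       struct path_attr *out)
{
	struct stat st;

	assert(path && out);
	if (lstat(path, &st) < 0)
		return -1;
	if (want & PATH_ATTR_SIZE)
		out->size = st.st_size;
	if (want & PATH_ATTR_MTIME)
		out->mtime = st.st_mtime;
	if (want & PATH_ATTR_MODE)
		out->mode = st.st_mode;
	return 0;
}
```

Adding a fourth attribute then means adding a flag and a field rather than renaming the function, at the price of a less self-describing call site.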
Re: [PATCH] make get_be64() compile on pu with NO_UNALIGNED_LOADS
Martin Ågren writes:

> Applies to pu and passes the tests. I think this should be squashed in
> somewhere. Perhaps a mismerge in commit d553324d ("Merge branch
> 'bp/fsmonitor' into pu", 2017-07-21).

Yes, you spotted a mistaken evil-merge. Thanks.

>
>  compat/bswap.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/compat/bswap.h b/compat/bswap.h
> index 133da1d2b..f86110a72 100644
> --- a/compat/bswap.h
> +++ b/compat/bswap.h
> @@ -188,11 +188,11 @@ static inline void put_be32(void *ptr, uint32_t value)
>  	p[3] = value >> 0;
>  }
>
> -static inline unit64_t get_be64(const void *ptr)
> +static inline uint64_t get_be64(const void *ptr)
>  {
> -	unsigned char *p = ptr;
> +	const unsigned char *p = ptr;
>  	return ((uint64_t)get_be32(p) << 32) |
> -		((uint64_t)get_be32(p + 4);
> +		((uint64_t)get_be32(p + 4));
>  }
>
>  #endif
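For readers who want to try the corrected helper outside Git, here is a self-contained sketch of the byte-by-byte path. The body of `get_be32()` is not shown in the quoted hunk, so the version below is an assumption written to mirror the visible tail of `put_be32()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counterpart of put_be32(): assemble four big-endian bytes. */
static inline uint32_t get_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return	(uint32_t)p[0] << 24 |
		(uint32_t)p[1] << 16 |
		(uint32_t)p[2] <<  8 |
		(uint32_t)p[3] <<  0;
}

/* As in the fixed patch: the high 32 bits first, then the low 32 bits. */
static inline uint64_t get_be64(const void *ptr)
{
	const unsigned char *p = ptr;

	assert(p);
	return	((uint64_t)get_be32(p) << 32) |
		((uint64_t)get_be32(p + 4));
}
```

Reading the bytes 01 02 03 04 05 06 07 08 this way yields 0x0102030405060708 regardless of host endianness. The `const` on `p` matters because `ptr` is a `const void *`; assigning it to a plain `unsigned char *` discards the qualifier.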
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
Johannes Schindelin writes:

> On Fri, 21 Jul 2017, Junio C Hamano wrote:
>
>> Jean-Noël Avila writes:
>>
>> > On 20/07/2017 at 20:57, Junio C Hamano wrote:
>> >>
>> >> +	git diff --quiet HEAD && git diff --quiet --cached
>> >> +
>> >> +	@for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
>> >
>> > Does PRIuMAX make sense for perl and sh files?
>>
>> Not really; I did this primarily because I would prefer to keep
>> things consistent, anticipating there may be some other things we
>> need to replace before running gettext(1) for other reasons later.
>
> It would add unnecessary churn, too, to add those specific exclusions and
> make things inconsistent: the use of PRItime in Perl or shell scripts
> would already make those scripts barf. And if it is unnecessary churn...
> let's not do it?

Sorry, but I cannot quite tell if you are in favor of limiting the set of source files that go through the sed substitution (because we know PRIuMAX is just as nonsensical as PRItime in perl and shell source), or if you are in favor of keeping the patch as-is (because changing the set of source files is a churn and substitutions would not hurt)?

I am actually OK to change the above loop to process only the C sources; I am not OK to change it to process only date.c which happens to be the only source that has PRItime that matters in this context, of course.

Thanks.
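The substitution under discussion is done with sed in the proposed Makefile rule. Purely to illustrate what that step does to a source line before xgettext reads it, here is an equivalent sketch in C; `replace_pritime()` is a hypothetical helper, not part of the patch. It relies on `PRItime` and `PRIuMAX` having the same length, so the text can be patched in place.

```c
#include <assert.h>
#include <string.h>

/*
 * Replace every occurrence of "PRItime" in `line` with "PRIuMAX".
 * Both tokens are seven characters long, so the buffer never grows.
 * Returns the number of replacements made.
 */
static int replace_pritime(char *line)
{
	static const char from[] = "PRItime";
	static const char to[] = "PRIuMAX";
	const size_t len = sizeof(from) - 1;
	int count = 0;
	char *p = line;

	assert(line);
	while ((p = strstr(p, from)) != NULL) {
		memcpy(p, to, len);
		p += len;
		count++;
	}
	return count;
}
```

Running it over a Perl or shell source that contains no `PRItime` is a no-op, which is the "substitutions would not hurt" side of the question above.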
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
Johannes Schindelin writes:

>> >> A very small hack on gettext.
>
> I am 100% opposed to this hack. It is already cumbersome enough to find
> out what is involved in i18n (it took *me* five minutes to find out that
> much of the information is in po/README, with a lot of information stored
> *on an external site*, and I still managed to miss the `make pot` target).
>
> If at all, we need to make things easier instead of harder.
>
> Requiring potential volunteers to waste their time to compile an
> unnecessary fork of gettext? Not so great an idea.
>
> Plus, each and every Git build would now have to compile their own
> gettext, too, as the vanilla one would not handle the .po files containing
> %<PRItime>!!!
>
> And that requirement would impact instantaneously people like me, and even
> worse: some other packagers might be unaware of the new requirement which
> would not be caught during the build, and neither by the test suite.
> Double bad idea.

If I understand correctly, the patch hacks the input processing of xgettext (which reads our source code and generates po/git.pot) so that when it sees PRItime, it pretends that it saw PRIuMAX, causing it to output %<PRIuMAX> in its output.

In our workflow,

 * The po/git.pot file is updated only by the l10n coordinator, and then the result is committed to our tree.

 * Translators build on that commit by (1) running msgmerge, which takes po/git.pot and wiggles its entries into their existing po/$lang.po file so that the po/$lang.po file has new entries from po/git.pot, and (2) editing the po/$lang.po file. The result is committed to our tree.

 * The build procedure that builders use runs the resulting po/$lang.po files through msgfmt to produce po/$lang.mo files, which will be installed.

As long as the first step results in %<PRIuMAX> (not %<PRItime> or anything that plain vanilla msgmerge and msgfmt do not understand), the second step and third step do not require any hacked version of gettext tools.

Even though I tend to agree with your conclusion that pre-processing our source before passing it to xgettext is probably a better solution in the longer term, I think most of the objections in your message come from your misunderstanding of what Jiang's patch does and are not based on facts. My understanding is that translators do not need to compile a custom msgmerge and builders do not need a custom msgfmt.
Re: [PATCH v6 00/10] The final building block for a faster rebase -i
Hi Junio,

On Thu, 20 Jul 2017, Junio C Hamano wrote:

> Johannes Schindelin writes:
>
> > Changes since v5:
> >
> > - replaced a get_sha1() call by a get_oid() call already.
> >
> > - adjusted to hashmap API changes
>
> Applying this to the tip of 'master' yields exactly the same result
> as merging the previous round js/rebase-i-final to the tip of
> 'master' and then applying merge-fix/js/rebase-i-final to adjust to
> the codebase, so the net effect of this reroll is none. Which is a
> good sign, as it means there wasn't any rebase mistake and the evil
> merge we've been carrying was a good one.

Good.

> But at the same time, I prefer to avoid rebasing to newer 'master'
> until the codebase starts drifting too far apart, or until a new
> feature release is made out of newer 'master'. This is primarily
> because I want dates on commits to mean something---namely, "this
> change hasn't seen a need to be updated for 'oops, that was wrong'
> since this date". This use of commit dates as 'priority date'
> matters much less for a topic not in 'next', but as a general
> principle, my workflow tries to preserve commit dates for all
> topics.

By that token, commit message updates would also be inappropriate, in particular when they came from somebody other than the patch author ;-P

As to avoiding a rebase: we can add that to the growing list of things on which we disagree.

If the author dates really meant anything, we would also have to avoid v2, v3, v4, ... v226 of patch series. So that flies in the face of trying to keep the meaning of author dates.

In addition, the development flow I prefer is one that is in harmony with the modern Continuous Integration style, where topic branches are merged into a single, always-ready-to-release integration branch.

That means that I always work off of `master`, unless there is a good reason to base off of `next` or even `pu`. That's to avoid merge conflicts, to see what really gets applied.

I am *especially* adamant about rebasing to a newer upstream commit when there are merge conflicts. Such as is the case here.

> For the above reason, I may hold onto this patch series in my inbox
> without actually updating js/rebase-i-final topic until the current
> cycle is over; please do not mistake it as this new reroll being
> ignored.

You do as you want, of course.

But please note that I will not rebase my topic branches to an ancient revision, especially if that would cause merge conflicts with the current `master`.

And if there should be another iteration of this wallflower patch series, I will rebase it to the then-current `master` again [*1*].

Ciao,
Dscho

Footnote *1*: in general, I try to abide by the wishes of maintainers when contributing code, unless those wishes are contrary to what I consider correct software development. Like, when in Rome, I will do as the Romans do. Except when I see them looting a parking meter.
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
Hi,

On Sat, 22 Jul 2017, Jiang Xin wrote:

> 2017-07-22 7:34 GMT+08:00 Junio C Hamano:
> > Jiang Xin writes:
> >
> >> A very small hack on gettext.

I am 100% opposed to this hack. It is already cumbersome enough to find out what is involved in i18n (it took *me* five minutes to find out that much of the information is in po/README, with a lot of information stored *on an external site*, and I still managed to miss the `make pot` target).

If at all, we need to make things easier instead of harder.

Requiring potential volunteers to waste their time to compile an unnecessary fork of gettext? Not so great an idea.

Plus, each and every Git build would now have to compile their own gettext, too, as the vanilla one would not handle the .po files containing %<PRItime>!!!

And that requirement would impact instantaneously people like me, and even worse: some other packagers might be unaware of the new requirement which would not be caught during the build, and neither by the test suite. Double bad idea.

So let's go with Junio's patch.

Ciao,
Dscho
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
Hi,

On Fri, 21 Jul 2017, Junio C Hamano wrote:

> Jean-Noël Avila writes:
>
> > On 20/07/2017 at 20:57, Junio C Hamano wrote:
> >>
> >> +	git diff --quiet HEAD && git diff --quiet --cached
> >> +
> >> +	@for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
> >
> > Does PRIuMAX make sense for perl and sh files?
>
> Not really; I did this primarily because I would prefer to keep
> things consistent, anticipating there may be some other things we
> need to replace before running gettext(1) for other reasons later.

It would add unnecessary churn, too, to add those specific exclusions and make things inconsistent: the use of PRItime in Perl or shell scripts would already make those scripts barf. And if it is unnecessary churn... let's not do it?

Ciao,
Dscho
Re: [PATCH] sha1_file: use access(), not lstat(), if possible
Hi,

On Thu, 20 Jul 2017, Junio C Hamano wrote:

> Jonathan Tan writes:
>
> > In sha1_loose_object_info(), use access() (indirectly invoked through
> > has_loose_object()) instead of lstat() if we do not need the on-disk
> > size, as it should be faster on Windows [1].
>
> That sounds as if Windows is the only thing that matters. "It is
> faster in general, and is much faster on Windows" would have been
> more convincing, and "It isn't slower, and is much faster on
> Windows" would also have been OK. Do we have any measurement, or
> this patch does not yield any measurable gain?
>
> By the way, the special casing of disk_sizep (which is only used by
> the batch-check feature of cat-file) is somewhat annoying with or
> without this patch, but this change makes it even more so by adding
> an extra indentation level. I do not think of a way to make it less
> annoying offhand, and I do not think this change needs to address it
> in any way, but I am mentioning this as a hint to bystanders who may
> want to find something small that can be cleaned up ;-)

I actually found a separate piece of information in the meantime:

https://blogs.msdn.microsoft.com/oldnewthing/20071023-00/?p=24713#comment-562083

i.e. _waccess() is implemented in the same way our mingw_lstat() implementation is: by calling the very same GetFileAttributes() code path.

So it has exactly the same performance characteristics, and I was wrong.

But this whole thread taps into a gripe I have with parts of Git's code base: part of the code is not clear at all in its intent by virtue of calling whatever POSIX function may seem to give the answer for the intended question, instead of implementing a function whose name says precisely what question is asked.

In this instance, we do not call a helper get_file_size(). Oh no. That would make it too obvious. We call lstat() instead -- under the assumption that the whole world runs on Linux, really, because let's be honest about it: lstat() implementations all differ in subtle ways and we really only test on Linux.

The obviousness of something like get_file_size() would be so refreshing to these tired eyes. Oh, and it would make it much easier to maintain ports to other Operating Systems, most notably Windows.

Ciao,
Dscho
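A helper of the kind described above might look like the following. This is only a sketch: `get_file_size()` is the name floated in this message, but the signature and error convention are assumptions, and this POSIX version still calls lstat() underneath; a Windows port could answer the same question another way.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Store the size of `path` in `*size` without following symlinks.
 * Returns 0 on success, -1 on failure (errno is set by lstat()).
 * The name states the question being asked; how it is answered is
 * left to each platform.
 */
static int get_file_size(const char *path, off_t *size)
{
	struct stat st;

	assert(path && size);
	if (lstat(path, &st) < 0)
		return -1;
	*size = st.st_size;
	return 0;
}
```

A caller then reads as `if (get_file_size(path, &size) < 0) ...`, which makes it plain that no other field of struct stat is of interest.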
[PATCH] make get_be64() compile on pu with NO_UNALIGNED_LOADS
1. s/unit64_t/uint64_t/
2. add const-qualifier to *p
3. add missing closing ')'

Signed-off-by: Martin Ågren
---
Applies to pu and passes the tests. I think this should be squashed in
somewhere. Perhaps a mismerge in commit d553324d ("Merge branch
'bp/fsmonitor' into pu", 2017-07-21).

 compat/bswap.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 133da1d2b..f86110a72 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -188,11 +188,11 @@ static inline void put_be32(void *ptr, uint32_t value)
 	p[3] = value >> 0;
 }

-static inline unit64_t get_be64(const void *ptr)
+static inline uint64_t get_be64(const void *ptr)
 {
-	unsigned char *p = ptr;
+	const unsigned char *p = ptr;
 	return ((uint64_t)get_be32(p) << 32) |
-		((uint64_t)get_be32(p + 4);
+		((uint64_t)get_be32(p + 4));
 }

 #endif
-- 
2.14.0.rc0.14.g12cc05b53