Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
Torsten Bögershausen writes:

> The only problematic system is Win64, where "unsigned long" is 32 bit,
> and therefore we must use size_t to address data in memory.
> This is not to be confused with off_t, which is used for "data on disk"
> (and nothing else) or timestamp_t which is used for timestamps (and nothing
> else).
>
> I haven't followed the "coccinelle script" development at all; if someone
> makes a patch to replace "unsigned long" with size_t, that could replace
> my whole patch. (Some of them may be downgraded to "unsigned int"?)

This paragraph makes it sound as if this patch is s/ulong/size_t/, but
that contradicts the previous paragraph, no?

It is much better to leave a ulong that is not about the size of a
memory region as-is, to be turned into the appropriate and correct type
later, than to change it into another wrong type (size_t).

In short, we could do ulong to size_t with Coccinelle, but I do not
think we want to blindly rewrite all.
Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
On 19.11.2018 at 06:33, Torsten Bögershausen wrote:
> The archive-tar.c is actually a good example of why a step-by-step update
> is not ideal (the code would not work any more on Win64).
>
> If we look here:
>
> static int stream_blocked(const struct object_id *oid)
> {
>         struct git_istream *st;
>         enum object_type type;
>         size_t sz;
>         char buf[BLOCKSIZE];
>         ssize_t readlen;
>
>         st = open_istream(oid, &type, &sz, NULL);
>                                       ^
>         if (!st)
>                 return error(_("cannot stream blob %s"), oid_to_hex(oid));
>         for (;;) {
>
> The sz variable must follow whatever open_istream() uses, so if we start
> with archive-tar.c, we must use either size_t or ulong, whatever
> open_istream() needs. Otherwise things will break:
> archive-tar.c uses ulong, open_istream() size_t, but we are passing pointers
> around, and here &ulong != &size_t.
>
> If we only update open_istream(), but not archive-tar.c, then
> things are not better:
> &size_t != &ulong.
>
> I don't have a good idea how to split the patch.

sz is not actually used later in that function; this change can be done
independently of any other ulong/size_t conversion in that file.

Hmm, looking at that call I wonder why open_istream() doesn't return
type and size in the struct git_istream. To make it match
read_object_file(), I suppose. Perhaps it's an opportunity to improve
its interface?

René
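René's closing question points to a different shape for the streaming API. The stream object would carry the object type and size itself, so a caller such as stream_blocked(), which never looks at sz, would not need a correctly typed out-parameter at all. The sketch below is a toy model built on that assumption. Every name in it (sketch_open_istream(), struct sketch_istream, and so on) is hypothetical. It is not Git's streaming.h API, and the "backend" is just an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's enum object_type. */
enum sketch_object_type { SKETCH_OBJ_BLOB };

/* Hypothetical stream: type and size live in the struct, not in
 * out-parameters, so callers never pass &ulong or &size_t. */
struct sketch_istream {
	enum sketch_object_type type;
	size_t size;      /* memory-sized quantity, hence size_t */
	const char *buf;  /* toy backend: an in-memory buffer */
	size_t pos;
};

static struct sketch_istream *sketch_open_istream(const char *data, size_t len)
{
	struct sketch_istream *st = malloc(sizeof(*st));

	if (!st)
		return NULL;
	st->type = SKETCH_OBJ_BLOB;
	st->size = len;
	st->buf = data;
	st->pos = 0;
	return st;
}

/* Copy up to sz bytes into out; returns the count, 0 at end of stream. */
static size_t sketch_read_istream(struct sketch_istream *st, char *out, size_t sz)
{
	size_t left, n;

	assert(st->pos <= st->size);
	left = st->size - st->pos;
	n = sz < left ? sz : left;
	memcpy(out, st->buf + st->pos, n);
	st->pos += n;
	return n;
}

static void sketch_close_istream(struct sketch_istream *st)
{
	free(st);
}
```

A caller that only streams data can ignore st->size entirely. A caller that needs the size reads it from the struct as a size_t, so it no longer matters whether that caller's own code has been converted yet.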
Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
On Sun, Nov 18, 2018 at 03:18:52PM -0500, Derrick Stolee wrote:
> On 11/17/2018 10:11 AM, tbo...@web.de wrote:
> > From: Torsten Bögershausen
> >
> > Currently Git users cannot commit files >4GiB under 64 bit Windows,
> > where "long" is 32 bit but size_t is 64 bit.
> > Improve the code base in small steps, as small as possible.
> > What started with a small patch to replace "unsigned long" with size_t
> > in one file (convert.c) ended up with a change in many files.
> >
> > Signed-off-by: Torsten Bögershausen
> > ---
> >
> > This needs to go on top of pu, to cover all the good stuff
> > cooking here.
>
> Better to work on top of 'master', as the work in 'pu' will be rewritten
> several times, probably.
>
> > I have started this series on November 1st; since then 2 or 3 rebases
> > have been done to catch up, and now it is on pu from November 15.
> >
> > I couldn't find a reason why changing "unsigned long"
> > into "size_t" may break anything, any thoughts, please?
>
> IIRC, the blocker for why we haven't done this already is that "size_t",
> "timestamp_t" and "off_t" are all 64-bit types that give more implied
> meaning, so we should swap types specifically to these as we want. One
> example series does a specific, small change [1].
>
> If we wanted to do a single swap that replaces 'unsigned long' with an
> unambiguously 64-bit type, I would recommend "uint64_t". Later replacements
> to size_t, off_t, and timestamp_t could happen on a case-by-case basis for
> type clarity.
>
> It may benefit reviewers to split this change into multiple patches,
> starting at the lowest levels of the call stack (so higher 'unsigned long's
> can up-cast to the 64-bit types when calling a function) and focusing the
> changes on one or two files at a time (X.c and X.h, preferably).
>
> Since you are talking about the benefits for Git for Windows to accept 4GB
> files, I wonder if we can add a test that verifies such a file will work.
> If you have such a test, then I could help verify that the test fails before
> the change and succeeds afterward.
>
> Finally, it may be good to add a coccinelle script that replaces 'unsigned
> long' with 'uint64_t' so we can easily fix any new introductions that happen
> in the future.

The plan was never to change "unsigned long" to a 64 bit value in general.
The usage of "unsigned long" (instead of size_t) was (and is) still good for
32 bit systems, where both are 32 bit (at least on all systems I am aware of).
On 64 bit systems like Linux or Mac OS it is the same: both are 64 bit.
The only problematic system is Win64, where "unsigned long" is 32 bit,
and therefore we must use size_t to address data in memory.
This is not to be confused with off_t, which is used for "data on disk"
(and nothing else) or timestamp_t which is used for timestamps (and nothing
else).

I haven't followed the "coccinelle script" development at all; if someone
makes a patch to replace "unsigned long" with size_t, that could replace
my whole patch. (Some of them may be downgraded to "unsigned int"?)

However, we first need to let tb/print-size-t-with-uintmax-format make it
to master; otherwise we are not able to print the variables in a portable way.

> Thanks! I do think we should make this change, but we must be careful. It
> may be disruptive to topics in flight.
>
> -Stolee
>
> [1] https://public-inbox.org/git/20181112084031.11769-1-care...@gmail.com/
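Torsten's data-model argument can be made concrete. 64-bit Linux and macOS are LP64, where long and size_t are both 64 bits. 64-bit Windows is LLP64, where long stays 32 bits. The sketch below simulates the Win64 "unsigned long" with uint32_t so the truncation shows up on any platform. It also shows the portable way to print a size_t, by casting to uintmax_t and using PRIuMAX. That is presumably the approach the tb/print-size-t-with-uintmax-format topic takes, judging only by its name. The helper names are illustrative, not Git functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Would a byte count survive being stored in a 32-bit "unsigned long",
 * as on Win64 (LLP64)? uint32_t stands in for that type so the check
 * behaves the same on LP64 systems.
 */
static int truncates_in_32bit_ulong(uint64_t nbytes)
{
	uint32_t as_llp64_ulong = (uint32_t)nbytes; /* keeps the low 32 bits */

	return (uint64_t)as_llp64_ulong != nbytes;
}

/*
 * Print a size_t portably: "%lu" has the wrong width on Win64, so cast
 * to uintmax_t and use the PRIuMAX format macro from <inttypes.h>.
 */
static int format_size(char *out, size_t outlen, size_t sz)
{
	return snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)sz);
}
```

For example, a 5 GiB blob (5368709120 bytes) truncates to exactly 1 GiB in a 32-bit unsigned long, while 4294967295 bytes still fits.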
Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
On 2018-11-19 00:40, Junio C Hamano wrote:
> Derrick Stolee writes:
>
>>> This needs to go on top of pu, to cover all the good stuff
>>> cooking here.
>>
>> Better to work on top of 'master', as the work in 'pu' will be
>> rewritten several times, probably.
>
> We may not be able to find any good moment to update some codepaths
> with deep callchains that reach a basic API function that takes
> ulong that way, as things are always in motion, but hopefully a lot
> of areas that need changes are rather isolated.
>
> For example, the changes I see around "offset" (which is "ulong" and
> the patch wants to change it to "size_t") in archive-tar.c in the
> patch do not have any interaction with the changes in this patch
> outside that single file, and I do not think any topic in-flight
> would interact with this change badly, either. I didn't carefully
> look at the remainder of the patches, but I have a feeling that many
> can be separated out into independent and focused sets of smaller
> patches that can be evaluated on their own.

The archive-tar.c is actually a good example of why a step-by-step update
is not ideal (the code would not work any more on Win64).

If we look here:

static int stream_blocked(const struct object_id *oid)
{
        struct git_istream *st;
        enum object_type type;
        size_t sz;
        char buf[BLOCKSIZE];
        ssize_t readlen;

        st = open_istream(oid, &type, &sz, NULL);
                                      ^
        if (!st)
                return error(_("cannot stream blob %s"), oid_to_hex(oid));
        for (;;) {

The sz variable must follow whatever open_istream() uses, so if we start
with archive-tar.c, we must use either size_t or ulong, whatever
open_istream() needs. Otherwise things will break:
archive-tar.c uses ulong, open_istream() size_t, but we are passing pointers
around, and here &ulong != &size_t.

If we only update open_istream(), but not archive-tar.c, then
things are not better:
&size_t != &ulong.

I don't have a good idea how to split the patch.
However, "add a coccinelle script" may be a solution.
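Torsten's objection is that an unsigned long * and a size_t * are incompatible pointer types wherever the two types differ, so a callee and its callers seem to have to change in one go. One possible way around this is not proposed in the thread; treat it as an assumption of this sketch. Convert the callee first. In each caller that still uses unsigned long, pass the address of a size_t temporary and narrow it with an explicit range check. Each file can then be converted in its own patch. Both function names below are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical callee that has already been converted: it reports a
 * size through a size_t out-parameter. `actual` stands in for whatever
 * a real object lookup would find.
 */
static int converted_get_size(size_t actual, size_t *sizep)
{
	*sizep = actual;
	return 0;
}

/*
 * Hypothetical caller that still stores sizes in unsigned long. Passing
 * &ul directly would draw an incompatible-pointer diagnostic where the
 * types differ (Win64), so go through a size_t temporary and reject
 * values that would be silently truncated by the narrowing.
 */
static int legacy_get_size(size_t actual, unsigned long *ulp)
{
	size_t tmp;

	if (converted_get_size(actual, &tmp) < 0)
		return -1;
	if (tmp > ULONG_MAX) /* only reachable where unsigned long is narrower */
		return -1;
	*ulp = (unsigned long)tmp;
	return 0;
}
```

On LP64 systems the range check can never fire. On Win64 it turns a silent truncation into an error until that caller is converted as well.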
Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
Derrick Stolee writes:

>> This needs to go on top of pu, to cover all the good stuff
>> cooking here.
>
> Better to work on top of 'master', as the work in 'pu' will be
> rewritten several times, probably.

We may not be able to find any good moment to update some codepaths
with deep callchains that reach a basic API function that takes
ulong that way, as things are always in motion, but hopefully a lot
of areas that need changes are rather isolated.

For example, the changes I see around "offset" (which is "ulong" and
the patch wants to change it to "size_t") in archive-tar.c in the
patch do not have any interaction with the changes in this patch
outside that single file, and I do not think any topic in-flight
would interact with this change badly, either. I didn't carefully
look at the remainder of the patches, but I have a feeling that many
can be separated out into independent and focused sets of smaller
patches that can be evaluated on their own.
Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long
On 11/17/2018 10:11 AM, tbo...@web.de wrote:
> From: Torsten Bögershausen
>
> Currently Git users cannot commit files >4GiB under 64 bit Windows,
> where "long" is 32 bit but size_t is 64 bit.
> Improve the code base in small steps, as small as possible.
> What started with a small patch to replace "unsigned long" with size_t
> in one file (convert.c) ended up with a change in many files.
>
> Signed-off-by: Torsten Bögershausen
> ---
>
> This needs to go on top of pu, to cover all the good stuff
> cooking here.

Better to work on top of 'master', as the work in 'pu' will be rewritten
several times, probably.

> I have started this series on November 1st; since then 2 or 3 rebases
> have been done to catch up, and now it is on pu from November 15.
>
> I couldn't find a reason why changing "unsigned long"
> into "size_t" may break anything, any thoughts, please?

IIRC, the blocker for why we haven't done this already is that "size_t",
"timestamp_t" and "off_t" are all 64-bit types that give more implied
meaning, so we should swap types specifically to these as we want. One
example series does a specific, small change [1].

If we wanted to do a single swap that replaces 'unsigned long' with an
unambiguously 64-bit type, I would recommend "uint64_t". Later replacements
to size_t, off_t, and timestamp_t could happen on a case-by-case basis for
type clarity.

It may benefit reviewers to split this change into multiple patches,
starting at the lowest levels of the call stack (so higher 'unsigned long's
can up-cast to the 64-bit types when calling a function) and focusing the
changes on one or two files at a time (X.c and X.h, preferably).

Since you are talking about the benefits for Git for Windows to accept 4GB
files, I wonder if we can add a test that verifies such a file will work. If
you have such a test, then I could help verify that the test fails before
the change and succeeds afterward.
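Stolee suggests starting at the bottom of the call stack. This works for by-value parameters: once a low-level helper takes size_t, an unconverted caller can pass its unsigned long unchanged, because the value converts implicitly to an equal or wider unsigned type. Out-parameters behave differently, as Torsten's open_istream() example in this thread shows. The helpers below are hypothetical and only illustrate the direction of the conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Low-level helper, already converted: its length parameter is size_t. */
static size_t count_lines(const char *buf, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++)
		if (buf[i] == '\n')
			n++;
	return n;
}

/*
 * Higher-level caller, not yet converted. Passing `len` by value needs
 * no cast: unsigned long converts to size_t without loss on Linux, macOS
 * and Win64. Narrowing the result back is safe because the count can
 * never exceed len, which already fits in unsigned long.
 */
static unsigned long legacy_count_lines(const char *buf, unsigned long len)
{
	return (unsigned long)count_lines(buf, len);
}
```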
Finally, it may be good to add a coccinelle script that replaces 'unsigned
long' with 'uint64_t' so we can easily fix any new introductions that happen
in the future.

Thanks! I do think we should make this change, but we must be careful. It
may be disruptive to topics in flight.

-Stolee

[1] https://public-inbox.org/git/20181112084031.11769-1-care...@gmail.com/
[PATCH/RFC v1 1/1] Use size_t instead of unsigned long
From: Torsten Bögershausen

Currently Git users cannot commit files >4GiB under 64 bit Windows,
where "long" is 32 bit but size_t is 64 bit.
Improve the code base in small steps, as small as possible.
What started with a small patch to replace "unsigned long" with size_t
in one file (convert.c) ended up with a change in many files.

Signed-off-by: Torsten Bögershausen
---

This needs to go on top of pu, to cover all the good stuff
cooking here.

I have started this series on November 1st; since then 2 or 3 rebases
have been done to catch up, and now it is on pu from November 15.

I couldn't find a reason why changing "unsigned long" into "size_t"
may break anything, any thoughts, please?

Side question:
One thing I wondered about is why Git creates a conflict like this,
using git cherry-pick:

<<<<<<< HEAD
        unsigned long size;
        void *data = read_object_file(oid, &type, &size);
=======
        size_t size;
        void *data = repo_read_object_file(the_repository, oid, &type, &size);
>>>>>>> 3ee0abef4c... Use size_t instead of unsigned long

One commit changed "unsigned long size" into "size_t size", the other
commit swapped repo_read_object_file() with read_object_file().
Both changes are on different lines, but Git sees a conflict here.
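The side question has a short answer. Git's three-way merge treats changes from the two sides as conflicting when they overlap or merely touch. Two edits can only be applied independently when at least one unchanged base line separates them. Here the declaration and the call sit on consecutive lines, so the two hunks abut. The C function below is a simplified model of that rule, not Git's xdiff merge code. struct change and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One side's change to the merge base: `count` base lines replaced,
 * starting at 0-based base line `start` (count == 0 for a pure insert). */
struct change {
	int start;
	int count;
};

/*
 * Simplified rule: the changes merge cleanly only if an untouched base
 * line lies between them. Overlapping changes, and changes where one
 * ends exactly where the other begins, are reported as a conflict.
 */
static bool changes_conflict(struct change a, struct change b)
{
	if (a.start > b.start) {
		struct change tmp = a;

		a = b;
		b = tmp;
	}
	return a.start + a.count >= b.start;
}
```

With the declaration on base line 10 and the read_object_file() call on line 11, the two changes abut and are reported as a conflict. If one untouched line sat between them, they would merge cleanly.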
 apply.c                  | 14 -
 archive-tar.c            | 18 +--
 archive-zip.c            |  2 +-
 archive.c                |  2 +-
 archive.h                |  2 +-
 bisect.c                 |  2 +-
 blame.c                  |  6 ++--
 blame.h                  |  2 +-
 builtin/cat-file.c       | 10 +++---
 builtin/difftool.c       |  3 +-
 builtin/fast-export.c    |  6 ++--
 builtin/fmt-merge-msg.c  |  4 ++-
 builtin/fsck.c           |  6 ++--
 builtin/grep.c           |  8 ++---
 builtin/index-pack.c     | 27
 builtin/log.c            |  4 +--
 builtin/ls-tree.c        |  2 +-
 builtin/merge-tree.c     |  6 ++--
 builtin/mktag.c          |  5 +--
 builtin/notes.c          |  6 ++--
 builtin/pack-objects.c   | 56 +
 builtin/reflog.c         |  2 +-
 builtin/replace.c        |  2 +-
 builtin/tag.c            |  4 +--
 builtin/unpack-file.c    |  2 +-
 builtin/unpack-objects.c | 35 ++---
 builtin/verify-commit.c  |  4 +--
 bundle.c                 |  2 +-
 cache.h                  | 10 +++---
 combine-diff.c           | 11 ---
 commit.c                 | 22 +++---
 commit.h                 | 10 +++---
 config.c                 |  2 +-
 convert.c                | 18 +--
 delta.h                  | 20 ++--
 diff-delta.c             |  4 +--
 diff.c                   | 30 +-
 diff.h                   |  2 +-
 diffcore-pickaxe.c       |  4 +--
 diffcore.h               |  2 +-
 dir.c                    |  6 ++--
 dir.h                    |  2 +-
 entry.c                  |  4 +--
 fast-import.c            | 26
 fsck.c                   | 12
 fsck.h                   |  2 +-
 fuzz-pack-headers.c      |  4 +--
 grep.h                   |  2 +-
 http-push.c              |  2 +-
 list-objects-filter.c    |  2 +-
 mailmap.c                |  2 +-
 match-trees.c            |  4 +--
 merge-blobs.c            |  6 ++--
 merge-blobs.h            |  2 +-
 merge-recursive.c        |  4 +--
 notes-cache.c            |  2 +-
 notes-merge.c            |  4 +--
 notes.c                  |  6 ++--
 object-store.h           | 20 ++--
 object.c                 |  4 +--
 object.h                 |  2 +-
 pack-check.c             |  2 +-
 pack-objects.h           | 14 -
 pack.h                   |  2 +-
 packfile.c               | 40
 packfile.h               |  8 ++---
 patch-delta.c            |  8 ++---
 range-diff.c             |  2 +-
 read-cache.c             | 48 ++---
 ref-filter.c             | 30 +-
 remote-testsvn.c         |  4 +--
 rerere.c                 |  2 +-
 sha1-file.c              | 66
 sha1dc_git.c             |  2 +-
 sha1dc_git.h             |  2 +-
 streaming.c              | 12
 streaming.h              |  2 +-
 submodule-config.c       |  2 +-
 t/helper/test-delta.c    |  2 +-
 tag.c                    |  6 ++--
 tag.h                    |  2 +-
 tree-walk.c              | 14 -
 tree.c                   |  2 +-
 xdiff-interface.c        |  4 +--
 xdiff-interface.h        |  4 +--
 85 files changed, 391 insertions(+), 384 deletions(-)

diff --git a/apply.c b/apply.c
index 3703bfc8d0..5e11b85d17 100644
--- a/apply.c
+++ b/apply.c
@@ -3096,7 +3096,7 @@ static int apply_binary_fragment(struct apply_state *state,
                                  struct patch *patch)
 {
         struct fragment *fragment = patch->fragments;
-        unsigned long len;
+        size_t len;