> On 13 Dec 2025, at 13:34, Even Rouault <[email protected]> wrote:
> 
> Hi Roger,
> 
>> The reason these stand out is that there are an awful lot of static analysis 
>> warnings (over 400) relating to unnecessary (and potentially buggy) sign 
>> conversions, and most of these are due to the use of signed integers where 
>> unsigned would typically be used.
> With which tool? The library component itself is clean with cppcheck, 
> Coverity Scan, the Clang static analyzer and CodeQL as we use them in GDAL 
> with a vendored copy of libtiff.

clang-tidy and also just clang.  I’ve been progressively adding more compiler 
warnings and looking at the scope required to fix each of them.  The extra 
warnings are actually already in the source tree, in 
cmake/CompilerChecks.cmake.  In the CI builds we only enable "-Wall -Wextra 
-Werror", and those are by no means comprehensive.  Turning on the extra 
warnings shows up a lot of basic problems, including implicit narrowing 
conversions, cast-alignment problems, etc.  I’m not sure why the static 
analysers aren’t also picking these up, unless we need to increase their 
stringency.
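
To give a flavour of what the extra warnings catch, here is a minimal made-up 
illustration (the type and function names are invented, and this is not actual 
libtiff code), compiled with something like
"cc -Wall -Wextra -Wconversion -Wsign-conversion -c":

    #include <stdint.h>
    #include <stdlib.h>

    typedef int64_t my_tmsize_t;   /* stand-in for libtiff's signed tmsize_t */

    void *alloc_buffer(my_tmsize_t nbytes)
    {
        /* -Wsign-conversion: the signed size is implicitly converted to
         * size_t; a negative nbytes becomes an enormous allocation request. */
        return malloc(nbytes);
    }

    uint16_t to_u16(my_tmsize_t n)
    {
        /* -Wconversion: implicit narrowing from 64-bit signed to 16-bit
         * unsigned; anything above 65535 is silently truncated. */
        return n;
    }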

I have also been exploring the use of AI tools like Claude Code over the last 
few weeks.  While I’m not yet sold on their abilities to write new code, I’ve 
found them quite useful for detailed code review and for systematic 
refactoring, in many cases with perfect accuracy.  I’ve used it to fix some 
long-standing defects which had been known but unresolved for 15 years; it 
only took a few minutes to find and fix issues that human reviewers had never 
managed to get their heads around.  So I’ll also be trying it out on libtiff, 
asking it to review portions of the codebase and identify defects which the 
static analysis would not pick up.  It’s good at finding inconsistencies in 
the API and design which the static analysers would never notice.

The other aspect to this is that I’ve been looking at what changes would be 
needed to make the libtiff C source code compile with a C++ compiler, and 
scoping out what it would take for it to compile cleanly with both.
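
On that front, the churn is mostly mechanical; the classic case is the 
implicit conversion from void * that C allows but C++ does not.  A made-up 
illustration (not actual libtiff code):

    #include <stdint.h>
    #include <stdlib.h>

    uint32_t *make_row(size_t npixels)
    {
        /* Valid C, rejected by a C++ compiler: C++ has no implicit
         * conversion from void * to uint32_t *. */
        uint32_t *row = malloc(npixels * sizeof(uint32_t));

        /* The form that compiles cleanly as both C and C++:
         * uint32_t *row = (uint32_t *)malloc(npixels * sizeof(uint32_t)); */
        return row;
    }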

>>  And for the standard memory allocation functions, these all use size_t, not 
>> a signed size type like ssize_t.
>> 
>> The same consideration may also apply to other functions where we are using 
>> the type to represent sizes which can not be negative, where unsigned would 
>> be more natural to use and in many cases all of the calculations are done 
>> with unsigned inputs.
>> 
>> I was wondering if anyone could provide some historical context for why the 
>> API is this way.  And as a followup, if it would be something we could 
>> consider changing to make the API and the internal implementation safer and 
>> cleaner.
> I've no historical context on that either. I'm wondering how practical a 
> change is at this point in history given that it impacts the API as well. I 
> suspect that there might be parts of the code base where the signedness is 
> relied on, typically for loops (e.g. "tmsize_t i = stride; for (; i < cc - 3; 
> i += 4)" at line 371 of tif_predict.c, which would not run correctly on an 
> unsigned type if cc < 3), and switching to unsigned could introduce bugs. 
> Not a strong opinion though, just a remark that the benefit vs risk at this 
> point in time isn't immediately obvious to me.

There are almost certainly places where the signedness is needed, but of all 
the sites I’ve manually reviewed, the vast majority could safely have been 
made unsigned.  Agreed that the risk of changing it likely isn’t worth the 
benefit, but I would like to scope out exactly what would need changing to fix 
it properly.
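
To illustrate what I mean by making such a loop unsigned-safe, here is a 
sketch with invented names (not the actual tif_predict.c code); the comparison 
is rearranged so the subtraction that would underflow is avoided:

    #include <stddef.h>
    #include <stdint.h>

    void accumulate4(uint8_t *cp, size_t stride, size_t cc)
    {
        size_t i;

        /* A literal translation of "for (i = stride; i < cc - 3; i += 4)"
         * would be unsafe: if cc < 3, the unsigned cc - 3 wraps around and
         * the loop runs far past the buffer.  Writing the condition as
         * i + 3 < cc expresses the same bound without the subtraction, so
         * it never underflows. */
        for (i = stride; i + 3 < cc; i += 4)
        {
            cp[i + 0] = (uint8_t)(cp[i + 0] + cp[i - stride + 0]);
            cp[i + 1] = (uint8_t)(cp[i + 1] + cp[i - stride + 1]);
            cp[i + 2] = (uint8_t)(cp[i + 2] + cp[i - stride + 2]);
            cp[i + 3] = (uint8_t)(cp[i + 3] + cp[i - stride + 3]);
        }
    }

Each such site needs that kind of individual attention, which is exactly what 
I’d like to enumerate.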

Kind regards,
Roger