Should Inflater & Deflater implement Closeable?
Hello, java.util.zip.Inflater and java.util.zip.Deflater both have an `end()` method to release native memory. However, neither class implements Closeable. This can prevent IDEs and other tooling from indicating that users should release the resources, and it prevents usage in try-with-resources.

Are the reasons for this only historical? (Inflater and Deflater appear to be older than Closeable.) If so, would it be reasonable to adjust these classes to implement Closeable? The implementation could then delegate to `end()` (and omit `throws IOException`), though to satisfy the `close()` contract it might be necessary to track whether the method has already been called before.

I did not find anything regarding this on the bug tracker or this mailing list. Kind regards
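To illustrate the idea, here is a hypothetical sketch (the wrapper class name AutoInflater and its methods are made up for illustration; until Inflater itself implements Closeable, a thin wrapper like this enables try-with-resources today):

```java
import java.util.zip.Inflater;

// Hypothetical wrapper sketch: enables try-with-resources for Inflater
// and shows the close() delegation described above.
class AutoInflater implements AutoCloseable {
    private final Inflater inflater = new Inflater();
    private boolean closed = false;

    Inflater inflater() {
        return inflater;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // The Closeable contract requires that closing an already-closed
        // resource has no effect, hence the tracking flag before
        // delegating to end()
        if (!closed) {
            closed = true;
            inflater.end();
        }
    }
}
```

Usage would then be `try (AutoInflater inf = new AutoInflater()) { ... }`, releasing the native memory automatically at the end of the block.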
Add String & Character ASCII case conversion methods
Hello, could you please add String & Character ASCII case conversion methods, that is, methods which only perform case conversion on ASCII characters in the input and leave any other characters unchanged. The conversion should not depend on the default locale. For example:

- String:
  - toAsciiLowerCase
  - toAsciiUpperCase
  - equalsAsciiIgnoreCase (or a better name)
  - compareToAsciiIgnoreCase (or a better name)
- Character:
  - toAsciiLowerCase
  - toAsciiUpperCase

This would give the following advantages:

- Increased performance (and no vulnerability to denial-of-service attacks)
- Reduced number of bugs in applications

Please read on for a detailed explanation. I assume for historic reasons (Applets) the current case conversion methods use the Unicode conversion rules, and even worse, String.toLowerCase() and String.toUpperCase() use the default locale. While this might have been a reasonable choice back then because Applets ran locally in the browser and localization was a nice-to-have feature (or even a requirement), nowadays Java is largely used for back-end systems, and case conversion is quite often done for technical strings rather than display text. In this context applications mostly process ASCII strings. However, because Java does not offer any specific case conversion methods for these cases, users still use the standard String & Character methods. This causes the following problems [1]:

- String.toLowerCase() & String.toUpperCase() using the default locale: depending on the OS locale your application might behave differently or fail [2]. For the scale of this, simply look in the OpenJDK database: https://bugs.openjdk.org/issues/?jql=text ~ "turkish locale" At this point you probably have to add a disclaimer to any Java program that running it on systems with Turkish (and possibly other) locales is not supported, because either your own code or the libraries you are using might be calling toLowerCase() or toUpperCase() [3].
- Bad performance for Unicode-aware case conversions: compared to simply performing ASCII case conversion, applying Unicode case conversion has worse performance. In some cases it can even have extremely bad performance (JDK-8292573). This could have security implications.
- Bugs due to case conversion changing string length: Unicode case conversion for certain strings can change the length, either increasing or decreasing the size of the string (or, when both effects combine, shifting the position of characters in the string while keeping the length the same). If an application assumes that the length of the string remains the same and uses data derived from the original string (e.g. character indices or length) on the converted string, this can lead to exceptions or potentially even security issues.
- Unicode characters mapping to ASCII chars: when performing case conversion on certain non-ASCII Unicode characters, the results are ASCII characters. For example `Character.toLowerCase('\u212A') == 'k'`. This could have security implications.
- Updates to Unicode data changing application behavior: Unicode evolves over time, and the JDK regularly updates the Unicode data it is using. Even if an application is not affected by the issues mentioned above, it could become affected when the Unicode data is updated in a newer JDK version.

My main point here is that (I assume) in many cases Java applications don't need Unicode case conversion, let alone Unicode case conversion using the default locale. If Java offered ASCII-only case conversion methods, then hopefully users would (where applicable) switch to these methods over time and avoid all the issues mentioned above. And even if they accidentally used the ASCII-only methods for display text, the result would merely be a minor inconvenience for users seeing that text, compared to application bugs and security vulnerabilities in the other cases.
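To make this concrete, here is a minimal sketch of what the proposed ASCII-only conversion could do (the class and method names follow the suggestions above and are not existing JDK API), together with runnable demonstrations of the pitfalls described above:

```java
import java.util.Locale;

// Sketch of the proposed ASCII-only case conversion (hypothetical API,
// names taken from the suggestion above), plus demonstrations of the
// Unicode pitfalls it would avoid.
public class AsciiCase {
    // Converts only 'A'-'Z'; all other characters are left unchanged
    static char toAsciiLowerCase(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    static String toAsciiLowerCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toAsciiLowerCase(chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Locale-independent and length-preserving by construction
        System.out.println(toAsciiLowerCase("HTTP/1.1 OK")); // http/1.1 ok

        // Pitfall 1: under a Turkish locale, 'I' lower-cases to dotless U+0131
        System.out.println("I".toLowerCase(Locale.forLanguageTag("tr")).equals("i")); // false

        // Pitfall 2: Unicode conversion can change the length; U+0130
        // lower-cases to 'i' followed by a combining dot above
        System.out.println("\u0130".toLowerCase(Locale.ROOT).length()); // 2

        // Pitfall 3: a non-ASCII character mapping to an ASCII one;
        // U+212A (Kelvin sign) lower-cases to plain 'k'
        System.out.println(Character.toLowerCase('\u212A') == 'k'); // true
    }
}
```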
Related information about other programming languages:

- Rust: has dedicated methods for ASCII case conversion, e.g. https://doc.rust-lang.org/std/string/struct.String.html#method.to_ascii_lowercase
- Kotlin: functions which implicitly use the default locale were deprecated, see https://youtrack.jetbrains.com/issue/KT-43023

Risks:

- ASCII case conversion could lead to undesired results in some cases, see the example for the word "café" on https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html (though that specific example is about a display string, for which these ASCII-only methods are not intended)
- When applications start to mix ASCII-only and the existing Unicode conversion methods, this could lead to bugs and security issues as well; though it might also indicate a flaw in the application if it performs case conversion on the same value in different places

I hope you consider this suggestion. Feedback is appreciated! Kind regards

[1] I am not saying though that Java is the only affected language
Re: Add String & Character ASCII case conversion methods
Thanks for your effort on deprecating the error-prone toLowerCase() and toUpperCase() methods! Though that only addresses one aspect of my original proposal. Do you or anyone else have any opinion or feedback regarding the proposed ASCII case conversion methods? Even if you think these methods are not needed, such feedback would already be useful (assuming you explain why you think that is the case, and why the issues I presented are not a problem). Maybe I will also submit this as a "Request for Enhancement" to the bug tracker. Kind regards

> Hi,
> We discussed this issue on this mailing list [1] earlier this year.
>
> I investigated the usage of these two methods and found that all use cases
> within the JDK are suspicious, resulting in many imperceptible bugs.
>
> I hope to create a PR for this issue, deprecate these two methods, and create
> alternative methods for them. But I don't have the experience of making such
> changes; maybe I need some guidance or have more experienced people do these
> things.
>
> Glavo
>
> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-January/099375.html
RE: [BUG] Collections.unmodifiableMap(map).entrySet().iterator().forEachRemaining is missing null check
Hi, could you please report this at https://bugreport.java.com/bugreport/ (please also first search at https://bugs.openjdk.org/issues?jql=project%3DJDK%20AND%20issuetype%3DBug whether it has already been reported). This also seems to affect the `CheckedEntrySet`; it would be good if you could mention that in your report as well. I think the easiest reproducers for this can be created with an empty map:

```
Collections.unmodifiableMap(Map.of()).entrySet().iterator().forEachRemaining(null)
Collections.checkedMap(Map.of(), String.class, String.class).entrySet().iterator().forEachRemaining(null)
```

Per specification both calls should throw a NullPointerException, but they don't. It might also be worth recommending to the JDK authors to perform the null check in the internal `entryConsumer` method to avoid having it repeated 4 times, once for each caller. The `CheckedEntrySet` would still need a separate null check though. Kind regards
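For illustration, here is a runnable sketch contrasting a spec-compliant iterator (ArrayList's, which null-checks the action up front even when empty) with the map-based calls above; whether the second part throws depends on the JDK version in use:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Iterator.forEachRemaining specifies a NullPointerException for a null
// action. ArrayList's iterator throws it as specified; per the report
// above, the unmodifiable-map entry-set iterator did not.
public class ForEachRemainingNullCheck {
    public static void main(String[] args) {
        Iterator<String> compliant = new ArrayList<String>().iterator();
        try {
            compliant.forEachRemaining(null);
            System.out.println("no NPE (unexpected)");
        } catch (NullPointerException expected) {
            System.out.println("NPE as specified");
        }

        Iterator<Map.Entry<String, String>> it =
            Collections.unmodifiableMap(Map.<String, String>of()).entrySet().iterator();
        try {
            it.forEachRemaining(null);
            System.out.println("no NPE (the reported bug)");
        } catch (NullPointerException e) {
            System.out.println("NPE as specified (bug not present in this JDK)");
        }
    }
}
```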
Reduce / avoid ConditionalSpecialCasing.lookUpTable temporary memory allocation
Hello, originally I reported this on the bug tracker, but was asked to first post this topic to this mailing list. I was told that afterwards the bug report will be created.

The internal method `java.lang.ConditionalSpecialCasing#lookUpTable` is used for special case conversion rules, and is called when either the specified locale has special casing rules (e.g. Turkish) or the string to convert contains characters with special casing rules, for example U+0130 (Latin Capital Letter I with Dot Above). The problem with this method is that it creates temporary objects. Given that the method is in the worst case called for every character (possibly even twice per character), this can cause a lot of temporary memory allocation for large strings.

Below is the original bug report description (slightly modified), with a proposal for how it can (at least in part) be implemented without allocating any temporary objects; feedback is appreciated. I am not a JDK member and therefore cannot submit a pull request for this. Kind regards

--

There are two issues with the method `lookUpTable` of the internal class java.lang.ConditionalSpecialCasing which is used for special case conversion:

- It uses the int code point as key for a Map to look up the case conversion; this wraps the int as an Integer
- The special case conversion entries are stored in a HashSet
  - First of all, usage of a Set seems redundant because Entry does not even override `equals`, and it looks like always distinct Entry instances are added to the Set
  - Usage of a Set means a new Iterator object is created whenever case conversion entries are found for a code point

It looks like both of these can be fixed, for example in the following way:

1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
2. Remove the static field ConditionalSpecialCasing.entry
3. For every existing entry add a static final field `entry<codepoint>` storing an Entry[] (`<codepoint>` being a placeholder for the codepoint hex string)
4. In ConditionalSpecialCasing.lookUpTable use a `switch` to access the corresponding `entry<codepoint>` field

Here is a short example snippet showing that:

```
private static final Entry[] entry0069 = {
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0)  // # LATIN SMALL LETTER I
};

...

private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
    Entry[] entries = switch (src.codePointAt(index)) {
        case 0x0069 -> entry0069;
        ...
        default -> null;
    };

    char[] ret = null;
    if (entries != null) {
        String currentLang = locale.getLanguage();
        for (Entry entry : entries) {
            String conditionLang = entry.getLanguage();
            ...
        }
    }
    return ret;
}
```

Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite problematic because it creates a new StringCharacterIterator and a RuleBasedBreakIterator for each call. Unfortunately I don't know of an easy way to avoid this; it would be great if you could investigate solving this nonetheless, in the worst case with ThreadLocal or similar.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM:
Profile the object allocations of the `toLowerCase` calls of the following code snippets, for example with VisualVM:

1. Snippet:

```
String s = "\u0130".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

2. Snippet:

```
String s = "\u03A3".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

ACTUAL -

1. Snippet:
- 2000 Integer objects created
- 2000 HashMap$KeyIterator objects created

2. Snippet:
- 1000 Integer objects created
- 1000 HashMap$KeyIterator objects created
- 1000 StringCharacterIterator objects created
- 1000 RuleBasedBreakIterator objects created
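The ThreadLocal idea mentioned above could be sketched as follows (using the public BreakIterator.getWordInstance() as a stand-in, since the RuleBasedBreakIterator used by isFinalCased is JDK-internal; BreakIterator instances are not thread-safe, but a per-thread instance can be reused via setText):

```java
import java.text.BreakIterator;

// Sketch: reuse one BreakIterator per thread instead of constructing a
// new one on every call, avoiding the per-call allocations described above.
final class ReusableBreakIterator {
    private static final ThreadLocal<BreakIterator> WORD_ITERATOR =
        ThreadLocal.withInitial(BreakIterator::getWordInstance);

    static int countWordBoundaries(String text) {
        BreakIterator bi = WORD_ITERATOR.get();
        bi.setText(text); // resets the shared per-thread instance for this call
        int count = 0;
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            count++;
        }
        return count;
    }
}
```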
Why does CompletableFuture use ThreadPerTaskExecutor?
Hello, when ForkJoinPool.getCommonPoolParallelism() is < 2, CompletableFuture uses a ThreadPerTaskExecutor. As the name implies, that executor creates a new thread per task. My question is: why is it implemented this way?

This approach can cause drastic performance decreases for applications making heavy use of CompletableFuture. For example, when I ran a program on a machine with 2 CPU cores, the program took ~40 minutes. After manually specifying an executor for CompletableFuture which simply runs the tasks in the current thread, the program took ~5 minutes. I assume this was caused by the extensive context switching for hundreds of threads running concurrently. A similar extreme case occurred for Minecraft Java Edition in the past, where this caused the game to run out of memory because it created too many threads (https://bugs.mojang.com/browse/MC-137353): OutOfMemoryError: unable to create new native thread

This is already tracked by https://bugs.openjdk.org/browse/JDK-8213115, but unfortunately it looks like that report has received little attention so far. I am a bit afraid that a lot of applications are negatively affected by this (also on GitHub-hosted workflow runners, which run with 2 CPU cores), but in most cases application developers dismiss this as a general performance issue (possibly assuming the hardware is to blame) without drawing the connection to CompletableFuture.

After digging a bit in the Git history, it appears this behavior was introduced by b1a10b8ed7bedb27ae25341602319a11a1225ee7 to fix test failures for CompletableFuture/Basic.java (JDK-8020435; private report, so I don't know the details). However, wouldn't it be more reasonable to simply use a thread pool with 2 worker threads? That would match the behavior of a machine with 3 CPU cores and is known to work without issues (otherwise there would be special casing for this scenario as well in the CompletableFuture implementation). Sorry if bumping JDK bug reports here is not appreciated.
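A minimal sketch of such a workaround: passing an explicit executor to the *Async methods so CompletableFuture never falls back to the common pool (and thus never to ThreadPerTaskExecutor). The pool size of 2 here matches the suggestion above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Workaround sketch: supply an explicit executor to every *Async call
// so the common-pool fallback behavior is irrelevant.
public class ExplicitExecutorExample {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> f = CompletableFuture
                .supplyAsync(() -> 21, pool)
                .thenApplyAsync(x -> x * 2, pool);
            System.out.println(f.join()); // 42
        } finally {
            pool.shutdown();
        }
    }
}
```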
In the end I solved this issue by simply not using commonPool(), but I am afraid others are negatively affected by this as well without being aware of it. Kind regards