Should Inflater & Deflater implement Closeable?

2022-11-15 Thread some-java-user-99206970363698485155
Hello,
java.util.zip.Inflater and java.util.zip.Deflater both have an `end()` method 
to release native memory. However, both classes
do not implement Closeable. This might prevent IDEs and other tooling from 
indicating that users should release the
resources, and prevents usage in try-with-resources.

Are the reasons for this only historical? (Inflater and Deflater seem to be 
older than Closeable)
If so, would it be reasonable to adjust these classes to implement Closeable? 
The implementation could then delegate to
`end()` (and omit `throws IOException`), though to satisfy the `close()` 
contract it might be necessary to track whether the
method has already been before.

I did not find anything regarding this on the bug tracker or this mailing list.

Kind regards



Add String & Character ASCII case conversion methods

2023-04-09 Thread some-java-user-99206970363698485155
Hello,
could you please add String & Character ASCII case conversion methods, that is, 
methods which only perform case conversion on ASCII characters in the input and 
leave any other characters unchanged. The conversion should not depend on the 
default locale. For example:
- String:
  - toAsciiLowerCase
  - toAsciiUpperCase
  - equalsAsciiIgnoreCase (or a better name)
  - compareToAsciiIgnoreCase (or a better name)
- Character:
  - toAsciiLowerCase
  - toAsciiUpperCase

This would give the following advantages:
- Increased performance (+ not be vulnerable to denial of service attacks)
- Reduced number of bugs in applications


Please read on for a detailed explanation.

I assume for historic reasons (Applets) the current case conversion methods use 
the Unicode conversion rules, and even worse String.toLowerCase() and 
String.toUpperCase() use the default locale. While this might back then have 
been a reasonable choice because Applets ran locally in the browser and 
localization was a nice to have feature (or even a requirement), nowadays Java 
is largely used for back-end systems and case conversion is pretty often done 
for technical strings and not display text anymore. In this context 
applications mostly process ASCII strings.
However, because Java does not offer any specific case conversion methods for 
these cases, users still use the standard String & Character methods. This 
causes the following problems [1]:

- String.toLowerCase() & String.toUpperCase() using default locale
  What this means is that depending on the OS locale your application might 
behave differently or fail [2]. For the scale of this, simply look in the 
OpenJDK database: https://bugs.openjdk.org/issues/?jql=text ~ "turkish locale"
  At this point you probably have to add a disclaimer to any Java program that 
running it on systems with Turkish (and possibly others) as locale is not 
supported, because either your own code or the libraries you are using might be 
calling toLowerCase() or toUpperCase() [3].

- Bad performance for Unicode aware case conversions
  Compared to simply performing ASCII case conversion, applying Unicode case 
conversion has worse performance. In some cases it can even have extremely bad 
performance (JDK-8292573). This could have security implications.

- Bugs due to case conversion changing string length
  Unicode case conversion for certain strings can change the length, either 
increasing or decreasing the size of the string (or when combining both, 
shifting position of characters in the string while keeping the length the 
same). If an application assumes that the length of the string remains the same 
and uses data derived from the original string (e.g. character indices or 
length) on the converted string this can lead to exceptions or potentially even 
security issues.

- Unicode characters mapping to ASCII chars
  When performing case conversion on certain non-ASCII Unicode characters, the 
results are ASCII characters. For example `Character.toLowerCase('\u212A') == 
'k'`. This could have security implications.

- Update to Unicode data changing application behavior
  Unicode evolves over time, and the JDK regularly updates the Unicode data it 
is using. Even if an application is not affected by the issues mentioned above, 
it could become affected by them when the Unicode data is updated in a newer 
JDK version.

My main point here is that (I assume) in many cases Java applications don't 
need Unicode case conversion, let alone Unicode case conversion using the 
default locale. If Java offered ASCII-only case conversion methods, then 
hopefully users would (where applicable) switch to these methods over time and 
avoid all the issues mentioned above. And even if they accidentally use the 
ASCII-only methods for display text, the result might be a minor inconvenience 
for users seeing the display text, compared to in the other cases application 
bugs and security vulnerabilities.

Related information about other programming languages:
- Rust: Has dedicated methods for ASCII case conversion, e.g. 
https://doc.rust-lang.org/std/string/struct.String.html#method.to_ascii_lowercase
- Kotlin: Functions which implicitly use the default locale were deprecated, 
see https://youtrack.jetbrains.com/issue/KT-43023

Risks:
- ASCII case conversion could lead to undesired results in some cases, see the 
example for the word "café" on 
https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html (though that specific 
example is about a display string, for which these ASCII-only methods are not 
intended)
- When applications start to mix ASCII-only and the existing Unicode conversion 
methods this could lead to bugs and security issues as well; though it might 
also indicate a flaw in the application if it performs case conversion on the 
same value in different places

I hope you consider this suggestion. Feedback is appreciated!

Kind regards



[1] I am not saying though that Java is the only affecte

Re: Add String & Character ASCII case conversion methods

2023-04-16 Thread some-java-user-99206970363698485155
Thanks for your effort on deprecating the error-prone toLowerCase() and 
toUpperCase() methods!

Though that only addresses one aspect of my original proposal. Do you or anyone 
else have any opinion or feedback regarding the proposed ASCII case conversion 
methods? Even if you think these methods are not needed, such feedback would 
already be useful (assuming you explain why you think that is the case, and why 
the issues I presented are not a problem).

Maybe I will also submit this as "Request for Enhancement" to the bug tracker.

Kind regards

> Hi,
> We discussed this issue on this mailing list[1] earlier this year.
> 
> I investigated the usage of these two methods and found that all use cases 
> within
> JDK are suspicious, resulting in many imperceptible bugs.
> 
> I hope to create a PR for this issue, deprecate these two methods, and create
> alternative methods for them. But I don't have the experience of making such
> changes, maybe I need some guidance or have more experienced people do these 
> things.
> 
> Glavo
> [1] https://mail.openjdk.org/pipermail/core-libs-dev/2023-January/099375.html




RE: [BUG]Collections.unmodifiableMap(map).entrySet().iterator().forEachRemaining ismissing null check

2022-08-10 Thread some-java-user-99206970363698485155
Hi,
could you please report this at https://bugreport.java.com/bugreport/
(please also first search at 
https://bugs.openjdk.org/issues?jql=project%3DJDK%20AND%20issuetype%3DBug
whether it has already been reported).

This also seems to affect the `CheckedEntrySet`; would be good if you could 
mention that
in your report as well.

I think the easiest reproducers for this can be created with an empty map:
```
Collections.unmodifiableMap(Map.of()).entrySet().iterator().forEachRemaining(null)
Collections.checkedMap(Map.of(), String.class, 
String.class).entrySet().iterator().forEachRemaining(null)
```
Per specification both calls should throw a NullPointerException, but they 
don't throw it.

Might also be worth recommending to the JDK authors to perform the null check 
in the internal
`entryConsumer` method to avoid having it repeated 4 times, once for each 
caller. The `CheckedEntrySet`
would still need a separate null check though.

Kind regards



Reduce / avoid ConditionalSpecialCasing.lookUpTable temporary memory allocation

2022-08-17 Thread some-java-user-99206970363698485155
Hello,
originally I reported this on the bug tracker, but was asked to first post this 
topic to this mailing list. I was told that afterwards the bug report will be 
created.

The internal method `java.lang.ConditionalSpecialCasing#lookUpTable` is used 
for special case conversion rules, and is called when either the specified 
locale has special casing rules (e.g. Turkish) or the string to convert 
contains characters with special casing rules, for example U+0130 (Latin 
Capital Letter I with Dot Above). The problem with this method is that it 
creates temporary objects. Given that the method is in the worst case called 
for every character (possibly even twice per character), this can cause a lot 
of temporary memory allocation for large strings.

Below is the original bug report description (slightly modified), with a 
proposal how it can (at least in parts) be implemented without allocating any 
temporary objects; feedback is appreciated. I am not a JDK member and therefore 
cannot submit a pull request for this.

Kind regards

--

There are two issues with the method `lookUpTable` of the internal class 
java.lang.ConditionalSpecialCasing which is used for special case conversion:
- It uses the int codepoint as key for a Map to look up the case 
conversion; therefore this wraps the int as an Integer
- The special case conversion entries are stored in a HashSet
  - First of all usage of a Set seems redundant because Entry does not even 
override `equals` and it look like always distinct Entry instances are added to 
the Set
  - Usage of a Set means a new Iterator object is created whenever case 
conversion entries are found for a code point

It looks like both of this can be fixed, for example in the following way:
1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
2. Remove the static field ConditionalSpecialCasing.entry
3. For every existing entry add a static final field `entry` storing 
a Entry[]  ( being a placeholder for the codepoint hex string)
4. In ConditionalSpecialCasing.lookUpTable use a `switch` to access the 
corresponding `entry...`

Here is a short example snippet showing that:
```
private static final Entry[] entry0069 = {
new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN 
SMALL LETTER I
new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0) // # LATIN SMALL 
LETTER I
};
...

private static char[] lookUpTable(String src, int index, Locale locale, boolean 
bLowerCasing) {
Entry[] entries = switch (src.codePointAt(index)) {
case 0x0069 -> entry0069;
...
default -> null;
};
char[] ret = null;

if (entries != null) {
String currentLang = locale.getLanguage();
for (Entry entry : entries) {
String conditionLang = entry.getLanguage();
...
}
}

return ret;
}
```


Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite 
problematic because it creates a new StringCharacterIterator and a 
RuleBasedBreakIterator for each call.
Unfortunately I don't know of an easy way how this can be avoided; it would be 
great if you could investigate solving this nonetheless, in the worst case with 
ThreadLocal or simiar.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Profile the object allocations of the `toLowerCase` calls of the following code 
snippets, for example with VisualVM:

1. Snippet:
```
String s = "\u0130".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

2. Snippet:
```
String s = "\u03A3".repeat(1000);
s.toLowerCase(Locale.ROOT);
```


ACTUAL -
1. Snippet:
2000 Integer objects created
2000 HashMap$KeyIterator objects created

2. Snippet:
1000 Integer objects created
1000 HashMap$KeyIterator objects created
1000 StringCharacterIterator objects created
1000 RuleBasedBreakIterator objects created



Why does CompletableFuture use ThreadPerTaskExecutor?

2022-08-17 Thread some-java-user-99206970363698485155
Hello,
when ForkJoinPool.getCommonPoolParallelism() is < 2, CompletableFuture uses a 
ThreadPerTaskExecutor. As the name implies,
that executor creates a new thread per task.
My question is, why is it implemented this way?

This approach can cause drastic performance decreases for applications making 
heavy use of CompletableFuture. For example when
I ran a program on a machine with 2 CPU cores, the program took ~40 minutes. 
After manually specifying an executor for CompletableFuture
which simply runs the tasks in the current thread, the program took ~5 minutes. 
I assume this was caused by the extensive context
switching for hundreds of threads running concurrently.

A similar extreme case occurred for Minecraft Java Edition in the past where 
this caused the game to run out of memory because it
created too many threads (https://bugs.mojang.com/browse/MC-137353): 
OutOfMemoryError: unable to create new native thread

This is tracked already by https://bugs.openjdk.org/browse/JDK-8213115, but 
unfortunately it looks like that report received little attention
so far. I am a bit afraid that a lot of applications are negatively affected by 
this (also on GitHub hosted workflow runners which run with
2 CPU cores), but in most cases application developers dismiss this as general 
performance issues (possibly assuming the hardware
is to blame) without drawing the connection to CompletableFuture.

After digging a bit in the Git history, it appears this behavior was introduced 
by b1a10b8ed7bedb27ae25341602319a11a1225ee7 to
fix test failures for CompletableFuture/Basic.java (JDK-8020435; private report 
so I don't know the details). However, wouldn't it be more
reasonable to simply use a thread pool with 2 worker threads? That would match 
the behavior of a machine with 3 CPU cores and is
known to work without issues (otherwise you would have special casing for this 
scenario as well in the CompletableFuture implementation).

Sorry if bumping JDK bug reports here is not appreciated. After all I solved 
this issue by simply not using commonPool(), but I am
afraid others are negatively affected by this as well without being aware of 
this.

Kind regards