Thanks David -

Yes there are situations where hs_err fails, and few people are sadder than me when that happens 8-) , so I was thinking about how scared to be by the comment.

With the safety net of the error handler for the steps of the hs_err file
(which works well, we see it invoked frequently), it looks reasonable to use
%f as we might do other slightly questionable things for a signal handler.

Corrupting locale information or floating point state might possibly cause
problems, but if I cause a fake crash in print_date_and_time the error
handler recovers and the report continues.


Thinking about printing with two ints, seconds and fractions:
I don't see anything already that returns such a time in two components in the
JVM, so we might implement a new form of os::javaTimeNanos() or similar that
returns the two parts, and do that on each platform.

I didn't yet come up with anything to do in os::print_date_and_time()
which will take the fractional part of the double, and print just the fraction
as an int, without using any library / %f facilities.

If you're still concerned I could revisit these or some other idea.

Genuine laugh out loud moment for me, I backported the elapsed time logging from
6u4 to 5u19  (https://bugs.openjdk.java.net/browse/JDK-6447157) (2007).
(I said before jdk5 was created, I should have said before it was in mercurial.)

Thanks
Kevin


On 03/03/2020 01:11, David Holmes wrote:
Hi Kevin,

On 2/03/2020 8:48 pm, Kevin Walls wrote:
Oops, and with the bug ID in the title and JBS link:
https://bugs.openjdk.java.net/browse/JDK-8240295


On 02/03/2020 10:47, Kevin Walls wrote:
Hi,

(s11y and runtime opinions both relevant)

A few times in the last month I've really wanted to compare the Events logged in the hs_err file, and the time of the JVM's crash.

"elapsed time" in hs_err is only accurate to one second, and has been since before jdk5 was created.

The diff below changes the format string and uses the non-rounded time value (I don't see a need to change the other integer arithmetic here), and we can enjoy hs_errs with detail like:

...
Time: Mon Mar  2 09:57:13 2020 UTC elapsed time: 5.490135 seconds (0d 0h 0m 5s)
...

Thanks
Kevin


/jdk/open$ hg diff
diff --git a/src/hotspot/share/runtime/os.cpp b/src/hotspot/share/runtime/os.cpp
--- a/src/hotspot/share/runtime/os.cpp
+++ b/src/hotspot/share/runtime/os.cpp
@@ -1016,9 +1016,8 @@
   }

   double t = os::elapsedTime();
-  // NOTE: It tends to crash after a SEGV if we want to printf("%f",...) in -  //       Linux. Must be a bug in glibc ? Workaround is to round "t" to int
-  //       before printf. We lost some precision, but who cares?
+  // NOTE: a crash using printf("%f",...) on Linux was historically noted here
+  //       (before the jdk5 repo was created).

Just because it is old doesn't mean it no longer applies. printf is not async-signal-safe - we know that but we try to use it anyway. Maybe %f is even less async-signal-safe?

This may get through testing okay but cause problems with real crashes in the field.

What about breaking the time up into two ints: seconds and nanos?

Cheers,
David
-----

   int eltime = (int)t;  // elapsed time in seconds

   // print elapsed time in a human-readable format:
@@ -1029,7 +1028,7 @@
   int elmins = (eltime - day_secs - hour_secs) / secs_per_min;
   int minute_secs = elmins * secs_per_min;
   int elsecs = (eltime - day_secs - hour_secs - minute_secs);
-  st->print_cr(" elapsed time: %d seconds (%dd %dh %dm %ds)", eltime, eldays, elhours, elmins, elsecs); +  st->print_cr(" elapsed time: %f seconds (%dd %dh %dm %ds)", t, eldays, elhours, elmins, elsecs);
 }


Reply via email to