Great, thanks!

/Robbin

On 2019-10-07 18:41, Daniil Titov wrote:
Hi Robbin,

Yes, I ran my benchmark [1]. Please see below the output showing that the
table grew.

../jdk/build/linux-x64-release/images/jdk/bin/java -cp . -Xlog:thread+table=info ThreadStartupTest
Starting the test
[0.185s][info][thread,table] Grown to size:512
The test finished.
Execution time:15673 ms


[1] https://cr.openjdk.java.net/~dtitov/tests/ThreadStartupTest.java

Thanks!
Daniil


On 10/7/19, 12:34 AM, "Robbin Ehn" <robbin....@oracle.com> wrote:

     Hi Daniil,

     Yes, good, but:

     >> >> Testing: Mach5 tier1, tier2, and tier3 tests successfully passed.
     >>
     >>      And if you have not done so, you should test this with the
     >>      benchmark you have as a stress test and see that this does what
     >>      we think.

     Can you please test it with your benchmark, if you have not done so?

     /Robbin

     >> Thanks, Robbin
     >>      >>
     >>      >> Thank you!
     >>      >>
     >>      >> Best regards,
     >>      >> Daniil
     >>      >>
     >>      >> On 10/2/19, 3:26 PM, "David Holmes" <david.hol...@oracle.com> wrote:
     >>      >>
     >>      >>      Hi Daniil,
     >>      >>      On 3/10/2019 2:21 am, Daniil Titov wrote:
     >>      >>      > Hi David and Robbin,
     >>      >>      >
     >>      >>      > Could we consider making the ServiceThread responsible
     >>      >>      > for the ThreadIdTable resizing, in a similar way to how
     >>      >>      > it works for StringTable and ResolvedMethodTable, rather
     >>      >>      > than having the ThreadIdTable::add() method call
     >>      >>      > ThreadIdTable::grow()?
     >>      >>      > As I understand it, that should solve the current issue
     >>      >>      > and address the concern that the resizing could be
     >>      >>      > relatively long, and that doing it without polling for
     >>      >>      > safepoints, or while holding the Threads_lock, is not
     >>      >>      > desirable.
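     >>      >>      >
     >>      >>      > Roughly this pattern, as a simplified self-contained
     >>      >>      > sketch (stand-in names and types, not the real
     >>      >>      > StringTable/ServiceThread code): add() only flags that
     >>      >>      > growth is needed, and the service thread does the grow.
     >>      >>      >
     >>      >>      >   #include <atomic>
     >>      >>      >   #include <cstddef>
     >>      >>      >
     >>      >>      >   struct IdTable {
     >>      >>      >     std::atomic<bool> has_work{false};
     >>      >>      >     std::size_t items = 0;
     >>      >>      >     std::size_t buckets = 256;
     >>      >>      >
     >>      >>      >     // Called with Threads_lock held: never grows here,
     >>      >>      >     // it only records that growth is needed.
     >>      >>      >     void add() {
     >>      >>      >       ++items;
     >>      >>      >       if ((double)items / buckets > 2.0) {
     >>      >>      >         has_work.store(true, std::memory_order_release);
     >>      >>      >       }
     >>      >>      >     }
     >>      >>      >
     >>      >>      >     // Run from the service thread, which holds no
     >>      >>      >     // critical locks and can block for safepoints.
     >>      >>      >     void do_concurrent_work() {
     >>      >>      >       if (has_work.exchange(false)) {
     >>      >>      >         grow();  // rehash, yielding to safepoints
     >>      >>      >       }
     >>      >>      >     }
     >>      >>      >
     >>      >>      >     void grow() { buckets *= 2; }
     >>      >>      >   };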
     >>      >>      I originally rejected copying that part of the code from
     >>      >>      the other tables as it seems to introduce unnecessary
     >>      >>      complexity. Having a separate thread trying to grow the
     >>      >>      table when it could be concurrently having threads added
     >>      >>      and removed seems like it could introduce hard-to-diagnose
     >>      >>      performance pathologies. It also adds what we know to be a
     >>      >>      potentially long-running action to the workload of the
     >>      >>      service thread, which means it may also impact the other
     >>      >>      tasks the service thread is doing, thus potentially
     >>      >>      introducing even more hard-to-diagnose performance
     >>      >>      pathologies.
     >>      >>      So this change does concern me. But go ahead and trial it.
     >>      >>      Thanks,
     >>      >>      David
     >>      >>      > Thank you,
     >>      >>      > Daniil
     >>      >>      >
     >>      >>      >
     >>      >>      > On 10/2/19, 6:25 AM, "David Holmes" <david.hol...@oracle.com> wrote:
     >>      >>      >
     >>      >>      >      Hi Robbin,
     >>      >>      >
     >>      >>      >      On 2/10/2019 7:58 pm, Robbin Ehn wrote:
     >>      >>      >      > Hi David,
     >>      >>      >      >
     >>      >>      >      >> What if the table is full and must be grown?
     >>      >>      >      >
     >>      >>      >      > The table uses chaining, so it just means the
     >>      >>      >      > load factor tips over what is considered a good
     >>      >>      >      > backing-array size.
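     >>      >>      >      > In sketch form (a toy stand-in, not the real
     >>      >>      >      > table), "full" never happens; a colliding entry
     >>      >>      >      > just extends its chain:
     >>      >>      >      >
     >>      >>      >      >   #include <cstddef>
     >>      >>      >      >   #include <forward_list>
     >>      >>      >      >   #include <vector>
     >>      >>      >      >
     >>      >>      >      >   struct ChainedTable {
     >>      >>      >      >     std::vector<std::forward_list<long>> buckets;
     >>      >>      >      >     std::size_t count = 0;
     >>      >>      >      >     ChainedTable() : buckets(256) {}
     >>      >>      >      >
     >>      >>      >      >     void insert(long tid) {
     >>      >>      >      >       // Inserts never fail; chains just grow.
     >>      >>      >      >       buckets[(std::size_t)tid % buckets.size()]
     >>      >>      >      >           .push_front(tid);
     >>      >>      >      >       ++count;
     >>      >>      >      >     }
     >>      >>      >      >
     >>      >>      >      >     // Average chain length; growing the backing
     >>      >>      >      >     // array only restores this ratio.
     >>      >>      >      >     double load_factor() const {
     >>      >>      >      >       return (double)count / buckets.size();
     >>      >>      >      >     }
     >>      >>      >      >   };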
     >>      >>      >
     >>      >>      >      Coleen raised a good question in a separate
     >>      >>      >      discussion, which made me realize that once the
     >>      >>      >      table has been initially populated, all subsequent
     >>      >>      >      additions, and hence all subsequent calls to
     >>      >>      >      grow(), always happen with the Threads_lock held.
     >>      >>      >      So we can't just defer the grow().
     >>      >>      >
     >>      >>      >      >> That aside, I'd like to know how expensive it
     >>      >>      >      >> is to grow this table. What are we talking
     >>      >>      >      >> about here?
     >>      >>      >      >
     >>      >>      >      > We use a global counter which, on
     >>      >>      >      > write_synchronize, must scan all threads to make
     >>      >>      >      > sure they have seen the update (there is some
     >>      >>      >      > optimization to avoid this if there are no
     >>      >>      >      > readers at all). Since this table contains the
     >>      >>      >      > threads, we get doubly penalized: each new thread
     >>      >>      >      > increases the synchronization cost AND the number
     >>      >>      >      > of items.
     >>      >>      >      >
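     >>      >>      >      > In sketch form (a simplified epoch counter, not
     >>      >>      >      > HotSpot's real GlobalCounter): the writer bumps
     >>      >>      >      > a global epoch and then waits until every
     >>      >>      >      > registered thread has observed it, so the wait
     >>      >>      >      > is linear in the number of threads:
     >>      >>      >      >
     >>      >>      >      >   #include <atomic>
     >>      >>      >      >   #include <vector>
     >>      >>      >      >
     >>      >>      >      >   std::atomic<unsigned long> g_epoch{0};
     >>      >>      >      >   // One published counter per thread; a thread
     >>      >>      >      >   // copies g_epoch into its slot whenever it is
     >>      >>      >      >   // outside a read-side critical section.
     >>      >>      >      >   std::vector<std::atomic<unsigned long>*> g_slots;
     >>      >>      >      >
     >>      >>      >      >   void write_synchronize() {
     >>      >>      >      >     unsigned long e = g_epoch.fetch_add(1) + 1;
     >>      >>      >      >     for (auto* slot : g_slots) {  // O(#threads)
     >>      >>      >      >       while (slot->load(std::memory_order_acquire) < e) {
     >>      >>      >      >         // spin: that thread is still in an older epoch
     >>      >>      >      >       }
     >>      >>      >      >     }
     >>      >>      >      >   }
     >>      >>      >      >
     >>      >>      >      > Each new thread both lengthens that scan and adds
     >>      >>      >      > an item to rehash, hence the double penalty.
     >>      >>      >      >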
     >>      >>      >      > With concurrent reads you still need many
     >>      >>      >      > thousands of threads, but I think I saw someone
     >>      >>      >      > mentioning 100k threads; assuming concurrent
     >>      >>      >      > queries, the resize can take hundreds of ms to
     >>      >>      >      > finish. Note that reads and inserts still operate
     >>      >>      >      > at roughly the same speed while resizing. So a
     >>      >>      >      > longer resize is only problematic if we do not
     >>      >>      >      > respect safepoints.
     >>      >>      >      I think if anything were capable of running 100K
     >>      >>      >      threads we would be hitting far worse scalability
     >>      >>      >      bottlenecks than this. But this does seem
     >>      >>      >      problematic.
     >>      >>      >
     >>      >>      >      Thanks,
     >>      >>      >      David
     >>      >>      >      -----
     >>      >>      >
     >>      >>      >      > Thanks, Robbin
     >>      >>      >      >
     >>      >>      >      >>
     >>      >>      >      >> David
     >>      >>      >      >>
     >>      >>      >      >>> /Robbin
     >>      >>      >      >>>
     >>      >>      >      >>> On 2019-10-02 08:46, David Holmes wrote:
     >>      >>      >      >>>> Hi Daniil,
     >>      >>      >      >>>>
     >>      >>      >      >>>> On 2/10/2019 4:13 pm, Daniil Titov wrote:
     >>      >>      >      >>>>> Please review a change that fixes the issue.
     >>      >>      >      >>>>> The problem here is that the thread is added
     >>      >>      >      >>>>> to the ThreadIdTable (introduced in [3])
     >>      >>      >      >>>>> while the Threads_lock is held by
     >>      >>      >      >>>>> JVM_StartThread. When a new thread is added
     >>      >>      >      >>>>> to the thread table, the table checks whether
     >>      >>      >      >>>>> its load factor is greater than required,
     >>      >>      >      >>>>> and if so it grows itself while polling for
     >>      >>      >      >>>>> safepoints. After the changes in [4], an
     >>      >>      >      >>>>> attempt to block the thread while holding
     >>      >>      >      >>>>> the Threads_lock results in an assertion in
     >>      >>      >      >>>>> Thread::check_possible_safepoint().
     >>      >>      >      >>>>>
     >>      >>      >      >>>>> The fix proposed by David Holmes (thank you,
     >>      >>      >      >>>>> David!) is to skip the ThreadBlockInVM inside
     >>      >>      >      >>>>> the ThreadIdTable::grow() method if the
     >>      >>      >      >>>>> current thread owns the Threads_lock.
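     >>      >>      >      >>>>>
     >>      >>      >      >>>>> The failing path as a call-chain sketch (a
     >>      >>      >      >>>>> summary of the description above, not a code
     >>      >>      >      >>>>> excerpt):
     >>      >>      >      >>>>>
     >>      >>      >      >>>>>   // JVM_StartThread()           takes Threads_lock
     >>      >>      >      >>>>>   //  -> ThreadIdTable::add()
     >>      >>      >      >>>>>   //   -> ThreadIdTable::grow()  load factor exceeded
     >>      >>      >      >>>>>   //    -> ThreadBlockInVM       would block for a safepoint
     >>      >>      >      >>>>>   //     -> assert in Thread::check_possible_safepoint()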
     >>      >>      >      >>>>
     >>      >>      >      >>>> Sorry but looking at the fix in context now I
     >>      >>      >      >>>> think it would be better to do this:
     >>      >>      >      >>>>
     >>      >>      >      >>>>      while (gt.do_task(jt)) {
     >>      >>      >      >>>>        if (Threads_lock->owner() != jt) {
     >>      >>      >      >>>>          gt.pause(jt);
     >>      >>      >      >>>>          ThreadBlockInVM tbivm(jt);
     >>      >>      >      >>>>          gt.cont(jt);
     >>      >>      >      >>>>        }
     >>      >>      >      >>>>      }
     >>      >>      >      >>>>
     >>      >>      >      >>>> This way we don't waste time with the
     >>      >>      >      >>>> pause/cont when there's no safepoint pause
     >>      >>      >      >>>> going to happen - and the owner() check is
     >>      >>      >      >>>> quicker than owned_by_self(). That partially
     >>      >>      >      >>>> addresses a general concern I have about how
     >>      >>      >      >>>> long it may take to grow the table, as we are
     >>      >>      >      >>>> deferring safepoints until it is complete in
     >>      >>      >      >>>> this JVM_StartThread usecase.
     >>      >>      >      >>>>
     >>      >>      >      >>>> In the test you don't need all of:
     >>      >>      >      >>>>
     >>      >>      >      >>>>    32  * @run clean ThreadStartTest
     >>      >>      >      >>>>    33  * @run build ThreadStartTest
     >>      >>      >      >>>>    34  * @run main ThreadStartTest
     >>      >>      >      >>>>
     >>      >>      >      >>>> just the last @run suffices to build and run
     >>      >>      >      >>>> the test.
     >>      >>      >      >>>>
     >>      >>      >      >>>> Thanks,
     >>      >>      >      >>>> David
     >>      >>      >      >>>> -----
     >>      >>      >      >>>>
     >>      >>      >      >>>>> Testing: Mach5 tier1 and tier2 completed
     >>      >>      >      >>>>> successfully, tier3 is in progress.
     >>      >>      >      >>>>>
     >>      >>      >      >>>>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8231666/webrev.01/
     >>      >>      >      >>>>> [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8231666
     >>      >>      >      >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8185005
     >>      >>      >      >>>>> [4] https://bugs.openjdk.java.net/browse/JDK-8184732
     >>      >>      >      >>>>>
     >>      >>      >      >>>>> Best regards,
     >>      >>      >      >>>>> Daniil
