Re: [IronPython] Ironclad problem, with which someone here may be able to help

William Reade Fri, 07 Nov 2008 02:22:38 -0800

Thanks Dino -- you've given me plenty to be going on with until you hearback from the CLR dev.


Dino Viehland wrote:

In MTA mode we'll create a new thread which is thread 3 - thread 1 will just be 
hanging around waiting for it to exit.  Unfortunately w/o a separate EXE we 
can't avoid the separate thread.


I've pinged the CLR dev who owns interop these days to see if he's ever seen 
anything like this.  Unfortunately he's OOF until the 10th.  My best suggestion 
for the time being is to avoid using Monitor for this lock so that you don't 
pump messages.  That's probably going to mean writing your own lock and then 
P/Invoking to WaitForSingleObject when you experience contention and need to 
block the thread.  Using an AutoResetEvent or ManualResetEvent directly won't 
work because we'll pump messages there too!

I would also suggest looking at what CPython does here.  Dumpbin /imports 
python.exe shows no imports from ole32 so I'm guessing they don't do anything 
to spin up COM - which makes sense.  I would therefore assume when acquiring 
the GIL CPython will never pump messages and therefore programs don't expect 
possible re-entrancy - it'd probably be a good idea to mimic that behavior :).  
Unfortunately there are other places where we use Monitor in IronPython - 
list.append comes to mind.  It'll be interesting to see if that turns out to be 
a problem eventually as well.

Anyway, I'll let you know when I hear back.


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of William Reade
Sent: Thursday, November 06, 2008 7:29 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Ironclad problem, with which someone here may be able 
to help

Finalization still happens on thread 2, and I see no evidence of thread
1 any more; my main thread now appears to have id 3.

Curt Hagenlocher wrote:

When you run with -X:MTA, is the object now being finalized on a
thread other than thread #2?

On Thu, Nov 6, 2008 at 7:01 AM, William Reade
<[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:

    Thanks Dino -- it seems that if I use ipy with -X:MTA, I can no
    longer reproduce the problem. However, this bothers me a bit: I'm
    not competent to follow all the consequences here, but this
    situation seems to imply that Ironclad won't be usable safely from
    any STA thread. Is this an intended restriction or a bug?

    Incidentally, tracking down the call stacks proved to be hard
    work: the timing changed enough that I only got a single failure
    over dozens of runs, and it turned out I'd got the logging wrong,
    so I didn't find out anything useful :(.


    Dino Viehland wrote:

        I would suggest getting a snap shot of the call stacks when
        this is happening if that's possible.  I can't pin anything
        down but I wonder if you could have an STA object or something
        that otherwise requires message pumping.  That message pumping
        could happen while you're doing a Monitor.Enter call.  If that
        was happening maybe there is some subtle CLR bug or a surprise
        feature?  It is surprising that Monitor.Enter can go
        re-entrant but it can...

        So it'd be interesting to get thread snapshots and see if



                          EnsureGIL (443) 2
                          EnsureGIL (443) 1      <- omg, wtf, bbq, etc.



        Could be happening because Thread 1 experiences contention,
        blocks and pumps messages, causing the finalizer thread (2) to
        run, the lock is acquired and ... ?

        Only other thing I could think of is does it repro on other
        machines?  Maybe it's a hardware bug as unlikely as that seems?

        -----Original Message-----
        From: [EMAIL PROTECTED]
        <mailto:[EMAIL PROTECTED]>
        [mailto:[EMAIL PROTECTED]
        <mailto:[EMAIL PROTECTED]>] On Behalf Of
        William Reade
        Sent: Wednesday, November 05, 2008 10:01 AM
        To: Discussion of IronPython
        Subject: Re: [IronPython] Ironclad problem, with which someone
        here may be able to help

        The log starts in the middle (after many lock/unlocks, some
        from each
        thread); I'm running on x86; and I have no additional AppDomains.

        I don't think it would be safe for me to entirely avoid
        locking during
        finalization, but I could probably cut it down to a quick
        lock, on a
        separate object, to enqueue pointers for cleanup and
        deallocation on the
        main thread. However, I'm reluctant to do that until I'm sure
        that the
        problem is specifically related to GC, otherwise it'll just
        come back as
        soon as anyone tries any serious multithreading :).

        Curt Hagenlocher wrote:


            Locking during finalization is often considered to be a
            bad idea.  In
            particular, locking without a timeout introduces the
            possibility that
            you will hang the finalization thread, preventing further
            objects from
            being finalized.  But clearly, that's not what's happening
            here.

            Other questions that probably don't matter but might be
            interesting to
            know:

            Can we assume that the finalization thread isn't the first
            place where
            this lock is required?  That your log starts somewhere in
            the middle?

            Is this under x86 or x64 or both?

            Are you creating any additional AppDomains in the process?


            On Wed, Nov 5, 2008 at 10:15 AM, William Reade
            <[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>
            <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>>> wrote:

               Hi Curt

               I am indeed; that's how I know thread 2 is the GC
            thread. Is
               locking during GC forbidden?

               William

               Curt Hagenlocher wrote:

                   ...or, for that matter, any __del__ methods from
            within Python
                   -- which ultimately are handled by finalization.

                   On Wed, Nov 5, 2008 at 9:37 AM, Curt Hagenlocher
                   <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
            <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
                   <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]> <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>>>>
                   wrote:

                      So, the obvious question for me is whether or
            not you're
                   using any
                      finalizers.


                      On Wed, Nov 5, 2008 at 5:57 AM, William Reade
                      <[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>
                   <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>>
                   <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>
                   <mailto:[EMAIL PROTECTED]
            <mailto:[EMAIL PROTECTED]>>>>

                      wrote:

                          Hi all

                          While running the numpy tests, I've come
            across a situation
                          which, to the best of my knowledge, is simply
                   impossible. I'm
                          hoping that one of the local .NET gurus will
            be able to
                   tell
                          me what I'm missing, or point me somewhere I
            can get
                   more insight.

                          The 4 methods involved are as follows:
                          -----------------------
                                public int GetThreadId()
                                {
                                    return
            Thread.CurrentThread.ManagedThreadId;
                                }

                                public void WriteFlush(string info)
                                {
                                    Console.WriteLine(info);
                                    Console.Out.Flush();
                                }

                                public void EnsureGIL()
                                {
                                    Monitor.Enter(this.dispatcherLock);
                                    this.WriteFlush(String.Format(
                                        "EnsureGIL ({1}) {0}",
            this.GetThreadId(),
                          Builtin.id(this.dispatcherLock)));
                                }

                                public void ReleaseGIL()
                                {
                                    this.WriteFlush(String.Format(
                                        "ReleaseGIL ({1}) {0}\n",
            this.GetThreadId(),
                          Builtin.id(this.dispatcherLock)));
                                    Monitor.Exit(this.dispatcherLock);
                                }
                          -----------------------
                          ...and they can, and do, occasionally
            produce output as
                   follows:
                          -----------------------
                          EnsureGIL (443) 2
                          EnsureGIL (443) 1      <- omg, wtf, bbq, etc.
                          ReleaseGIL (443) 2

                          EnsureGIL (443) 2
                          ReleaseGIL (443) 1

                          ReleaseGIL (443) 2
                          -----------------------
                          When this happens, the process continues
            happily for a
                   short
                          time and then falls over in a later call to
            ReleaseGIL
                   (after
                          successfully calling it several times). The
            error is "
                   Object
                          synchronization method was called from an
                   unsynchronized block
                          of code", which I understand to mean "you
            can't release
                   this
                          lock because you don't hold it".

                          It doesn't happen very often, but I can usually
                   reproduce it
                          by running
                   test_multiarray.TestFromToFile.test_malformed a few
                          hundred times. It may be relevant to note
            that thread 2
                   is the
                          GC thread, and thread 1 is the main thread.
            I have
                   considered
                          the following possibilities:

                          (1) That I'm locking on the wrong object. I
            believe
                   that isn't
                          the case, because it's constructed only
            once, as a "new
                          Object()" (ie, a reference type), and is
            only subsequently
                          used for locking; and, because it keeps the
            same ipy id
                          throughout.

                          (2) That Monitor.Enter occasionally allows
            two different
                          threads to acquire the same lock. I consider
            this extremely
                          unlikely, because... well, how many
            multithreaded .NET apps
                          already exist? If Monitor really were
            broken, I think we'd
                          probably know about it by now.

                          (3) That calling Flush() on a SyncTextWriter
            (the type of
                          Console.Out) doesn't actually do anything,
            and the
                   output is
                          somehow wrongly ordered (although I can't
            imagine how this
                          could actually be: if the locking is really
            working,
                   then my
                          console writes are strictly sequential). I
            don't have
                   access
                          to the code, so I have no idea how it's
            implemented,
                   but even
                          if this is the case it doesn't help much
            with the
                   fundamental
                          problem (the synchronisation error which
            follows).

                          Apart from the above, I'm out of ideas. Can
            anyone suggest
                          what I've missed?

                          William
                          _______________________________________________
                          Users mailing list
                          Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>
                   <mailto:Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>>
                   <mailto:Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>
                   <mailto:Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>>>



            http://lists.ironpython.com/listinfo.cgi/users-ironpython.com




             
------------------------------------------------------------------------



                   _______________________________________________
                   Users mailing list
                   Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>
            <mailto:Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>>

             http://lists.ironpython.com/listinfo.cgi/users-ironpython.com



               _______________________________________________
               Users mailing list
               Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>
            <mailto:Users@lists.ironpython.com
            <mailto:Users@lists.ironpython.com>>

             http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


            
------------------------------------------------------------------------

            _______________________________________________
            Users mailing list
            Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
            http://lists.ironpython.com/listinfo.cgi/users-ironpython.com




        _______________________________________________
        Users mailing list
        Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
        http://lists.ironpython.com/listinfo.cgi/users-ironpython.com

        _______________________________________________
        Users mailing list
        Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
        http://lists.ironpython.com/listinfo.cgi/users-ironpython.com




    _______________________________________________
    Users mailing list
    Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
    http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


------------------------------------------------------------------------

_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com

Re: [IronPython] Ironclad problem, with which someone here may be able to help

Reply via email to