pth_event() problems, and undocumented features

David Flynn Thu, 24 May 2001 08:13:57 -0700
Hello,
The following applies to pth-1.3 and pth-1.4 running on Linux 2.4.4,
built with no additional ./configure arguments.

Firstly, may I ask how out of date the documentation supplied with pth is?
I have two places in my current project where I must wait for events while
conducting a select() query. However, when I had a casual browse through
the pth source, I found the following undocumented features:
pth_event(PTH_EVENT_SELECT, ...) and pth_event(PTH_EVENT_COND, ...). I am
using both of these in the two main loops (each in a different thread) of
the program.

However, I have discovered what I can only call a lock-out condition, which
can be seen clearly in the following extracts from my program's log file:

The following extract shows what _should_ happen, and occasionally does.
put_client_in_queue() is called to put an fd plus some info into a linked
list, and then calls pth_cond_notify(&refresh_condition, TRUE) to notify the
waiting thread (request_dispatcher) that it needs to update the FDs in its
select query (as select won't pick up new FDs automatically).


#[Thu May 24 15:03:49 2001, 50] pipeline_request.c:put_client_in_queue[863]
#  Entering put_client_in_queue (of keepalives)...
#         The client_fd is 9
#
#[Thu May 24 15:03:49 2001, 45] pipeline_request.c:request_dispatcher[187]
#  Recieved event of (unknown) type ... processing...
#Our worker function has detected that we should update the fdlist

This is the point where the request_dispatcher thread has received the
condition, and now restarts its main loop to update the select FD_SETs...

#
#[Thu May 24 15:03:49 2001, 50] pipeline_request.c:request_dispatcher[101]
#  Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
#
#[Thu May 24 15:03:49 2001, 100] pipeline_request.c:request_dispatcher[113]
#  Displaying fd_l_cli_st3s list ...
#         0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
#        Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
#        Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
#        Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
#        Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
#        Concating the events
#        Reconstruction the events, waiting for them to occur

At this point all the events have been concatenated into one pth event
ring, and pth_wait() is called. You may be wondering what event_timeout is
for; the reason for its existence will be made clear later (it is a
temporary workaround for the problem).

#
#[Thu May 24 15:03:49 2001, 45] pipeline_request.c:request_dispatcher[187]
#  Recieved event of (unknown) type ... processing...
#         it was 1 client(s) in the keep_alive pool bleeting
#

The select event is immediately triggered, as data has been waiting to be
processed while the lists were being updated. Execution continues in a
well-ordered manner... (well, possibly until this happens again).


The above shows the desired flow of execution; however, the following
occasionally happens (quite often, in fact!).

As before, put_client_in_queue() is called to put an fd plus some info into
a linked list, and then calls pth_cond_notify(&refresh_condition, TRUE) to
notify the waiting thread (request_dispatcher) that it needs to update the
FDs in its select query...

#[Thu May 24 15:03:28 2001, 50] pipeline_request.c:put_client_in_queue[863]
#  Entering put_client_in_queue (of keepalives)...
#         The client_fd is 9
#
#[Thu May 24 15:03:28 2001, 45] pipeline_request.c:request_dispatcher[187]
#  Recieved event of (unknown) type ... processing...
#Our worker function has detected that we should update the fdlist

As before, (request_dispatcher) has received the notice to update the
select FDs, and does so:

#
#[Thu May 24 15:03:28 2001, 50] pipeline_request.c:request_dispatcher[101]
#  Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
#
#[Thu May 24 15:03:28 2001, 100] pipeline_request.c:request_dispatcher[113]
#  Displaying fd_l_cli_st3s list ...
#         0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
#        Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
#        Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
#        Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
#        Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
#        Concating the events
#        Reconstruction the events, waiting for them to occur

That was all done nicely, and pth_wait() is called to wait for the
events... Now, in this test situation, I _KNOW_ that there is data on FD 9,
as the test client is just sending request after request. So either it is
the split-second timing of something, or the scheduler has decided it is
more worthwhile to run the (download_manager) thread first; whatever it is,
the (download_manager) thread resumes execution and reconstructs all its
FD lists etc.:


#[Thu May 24 15:03:28 2001, 50] pipeline_download.c:download_manager[40]
#  Reconstructing FD_SET ...Reconstructed with maxrdfd = 0, maxwrfd = 0

Now, as the two FD_SETs it uses are empty, (download_manager) does not
recreate the pth_event(PTH_EVENT_SELECT, ...) calls to handle them.

#[Thu May 24 15:03:28 2001, 100] pipeline_download.c:download_manager[60]
#  Displaying fd_l_cli_st3s list ...
#        Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
#        Not Concating the events
#        Reconstruction the events, waiting for them to occur

It has now reconstructed its events and has only one to wait for, so it
doesn't concatenate them; it then executes pth_wait(event_msg), waiting for
a message...

Now, I definitely know there is some data waiting on the socket... and
what happens? We wait 5 seconds, as shown by our ticker appearing, and
nothing happens. Surely the select query has returned?

#ticker: time: Thu May 24 15:03:29 2001,
#  total threads: 10, waiting: 9, ready: 0, suspended: 0, dead: 0,
#  average load: 1.043408, #new_connected_requests: 1,
#  #handled_requests: 0, #requests_pending_from_keepalive: 1

Well, evidently not! After a further 5 seconds, the ticker comes along
again...


#ticker: time: Thu May 24 15:03:34 2001,
#  total threads: 10, waiting: 9, ready: 0, suspended: 0, dead: 0,
#  average load: 1.032556, #new_connected_requests: 1,
#  #handled_requests: 0, #requests_pending_from_keepalive: 1
#

Now we have waited 10 seconds (remember the event_timeout from earlier?),
and 10 seconds after our initial pth_wait(), the timeout event fires...

#[Thu May 24 15:03:38 2001, 45] pipeline_request.c:request_dispatcher[187]
#  Recieved event of (unknown) type ... processing...
#Our 10 second timeout occured, processing lists for dead things
#        Testing 9, Found one that was waiting

What this does is use poll() to check for dead sockets and clean them out;
the loop then rebuilds the FD_SETs and _RE_creates the events...
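The poll() sweep described above could look roughly like this for a single socket. It is a sketch, not the program's actual code: check_socket and the CLIENT_* states are hypothetical names. A zero poll() timeout makes the check non-blocking, and a MSG_PEEK read tells a closed peer (0 bytes) apart from real data without consuming that data. MSG_DONTWAIT is a Linux/BSD flag rather than strict POSIX.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

enum client_health { CLIENT_IDLE, CLIENT_READABLE, CLIENT_DEAD };

/* Non-blocking health check for one connected socket. */
enum client_health check_socket(int fd)
{
    struct pollfd p;
    char byte;
    ssize_t n;
    int r;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    r = poll(&p, 1, 0);                  /* timeout 0: never blocks */
    if (r < 0)
        return CLIENT_IDLE;              /* poll() failed; retry next sweep */
    if (r == 0)
        return CLIENT_IDLE;              /* nothing pending */
    if (p.revents & (POLLERR | POLLNVAL))
        return CLIENT_DEAD;

    /* POLLIN or POLLHUP: peek so any real data stays in the buffer */
    n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return CLIENT_READABLE;          /* data waiting, like fd 9 above */
    if (n == 0)
        return CLIENT_DEAD;              /* peer closed the connection */
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? CLIENT_IDLE
                                                     : CLIENT_DEAD;
}
```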


#[Thu May 24 15:03:38 2001, 50] pipeline_request.c:request_dispatcher[101]
#  Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
#

That was the FD_SET; here are the events...

#[Thu May 24 15:03:38 2001, 100] pipeline_request.c:request_dispatcher[113]
#  Displaying fd_l_cli_st3s list ...
#         0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
#        Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
#        Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
#        Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
#        Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
#        Concating the events
#        Reconstruction the events, waiting for them to occur
#

OK, we have recreated the events and are now pth_wait()'ing for them to
happen, and, surprise surprise, immediately:

#[Thu May 24 15:03:38 2001, 45] pipeline_request.c:request_dispatcher[187]
#  Recieved event of (unknown) type ... processing...
#         it was 1 client(s) in the keep_alive pool bleeting

we get our select returning... more than a full 10 seconds after data
became available on the socket we were waiting on.


Now, what's the point of all this? Simple: everything works fine as long
as the (download_manager) thread does not resume execution, recreate its
events and wait for them.

Here is some possibly good news: in the 20 minutes I have spent typing
this, I have (in the last 30 seconds) thought of why the problem may
actually be happening...

First, a question: when you call pth_event(PTH_EVENT_SELECT, ...), say,
twice in different threads, do the FD_SETs from the two completely
different queries get merged into one? If so, would calling
pth_event_free(concacted_events, PTH_FREE_ALL); in one of the threads
destroy all the FDs waiting in the main select call?

This may illustrate what I mean a bit better... However, it may not, as
the 80-character line limit imposed by SMTP is likely to screw it up!
(Sorry!)

Thread1---pth_event(PTH_EVENT_SELECT)--->---\
                                            |
                                            +-->(a single select request)
                                            |   (made by the pth library)
Thread2---pth_event(PTH_EVENT_SELECT)--->---/


Now, say Thread2 frees its event list: its FD_SET must then be removed from
the main FD_SET being waited on by the scheduler. If the scheduler's FD_SET
were simply emptied, Thread1 would never get the notification that there
is something waiting, because its query is no longer being made. That is,
it only notices there is something when it _RE_creates the FD_SET for its
pth_event(PTH_EVENT_SELECT, ...) query...
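The hypothesis can be made concrete with a toy model. This is a hypothetical model of the merge being asked about, not Pth's actual scheduler code: each waiting thread contributes one read set, and the "scheduler" ORs the pending ones together for its single select(). The names struct waiter and merge_waiters are illustrative only.

```c
#include <sys/select.h>

/* Hypothetical model of the merged select; NOT Pth's scheduler code. */
struct waiter {
    int active;        /* 1 while the thread's SELECT event is pending */
    int nfd;           /* highest fd + 1, as passed to pth_event() */
    fd_set rd;         /* the thread's own read set */
};

/* Rebuild the shared read set from every still-active waiter and return
   the nfds value to hand to select(). */
int merge_waiters(const struct waiter *w, int count, fd_set *out)
{
    int i, fd, nfds = 0;

    FD_ZERO(out);
    for (i = 0; i < count; i++) {
        if (!w[i].active)
            continue;          /* freed events simply drop out */
        for (fd = 0; fd < w[i].nfd; fd++)
            if (FD_ISSET(fd, &w[i].rd))
                FD_SET(fd, out);
        if (w[i].nfd > nfds)
            nfds = w[i].nfd;
    }
    return nfds;
}
```

Marking Thread2's entry inactive and calling merge_waiters() again leaves Thread1's fd 9 in the shared set. A scheduler that instead just emptied the shared set would lose it, which is exactly the failure hypothesized above.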

If the answer to my question is no, then my hypothesis about why all this
happens is rather wrong. Either way, any ideas? (You know slightly more
about pth than me, having written it, while I have only browsed through it
after a quick 'grep -nir select' :-)

Any information, advice, solutions, whatever, would be extremely
appreciated!

Many Thanks in advance,

David Flynn

PS: I am terribly sorry for the length of this e-mail, and for dumping on
you the log file of a program you have never seen before and know nothing
about!
---------------------------------------
The information in this e-mail and any files sent with it is confidential to
the ordinary user of the e-mail address to which it was addressed and may
also be legally privileged. It is not to be relied upon by any person other
than the addressee except with the sender's prior written approval. If no
such approval is given, the sender will not accept liability (in negligence
or otherwise) arising from any third party acting, or refraining from
acting, on such information. If you are not the intended recipient of this
e-mail you may not copy, forward, disclose or otherwise use it or any part
of it in any form whatsoever. If you have received this e-mail in error
please notify the sender immediately, destroy any copies and delete it from
your computer system. Have a nice Day !
---------------------------------------------

______________________________________________________________________
GNU Portable Threads (Pth)            http://www.gnu.org/software/pth/
User Support Mailing List                            [EMAIL PROTECTED]
Automated List Manager (Majordomo)           [EMAIL PROTECTED]