Hi guys,
This is a follow-up to my earlier e-mail, which is quoted below
(please read that one first to make sense of what I am saying here).
I've written a little debugging function, and modified part of the
pth_sched_eventmanager() function to call it and print debug information.
This is the function:
void print_fdset_members(fd_set *fdset, int maxfd)
{
    int i;

    printf("members of fdset are: ");
    for (i = 0; i < maxfd; i++)
    {
        if (FD_ISSET(i, fdset))
            printf("%d, ", i);
    }
    printf("\n");
}
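For anyone who wants to try the helper outside pth, here is a minimal
standalone sketch of the same loop that writes into a buffer instead of
calling printf(), which makes it easy to check. The name
format_fdset_members() and the buffer handling are illustrative assumptions,
not part of pth or of the patch shown here.

```c
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical variant of print_fdset_members(): writes the members of
 * fdset that are below maxfd into buf as "5, 7, 9" and returns how many
 * members were written. */
int format_fdset_members(fd_set *fdset, int maxfd, char *buf, size_t len)
{
    int i, n, count = 0;
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < maxfd && i < FD_SETSIZE; i++) {
        if (!FD_ISSET(i, fdset))
            continue;
        n = snprintf(buf + used, len - used, "%s%d", count ? ", " : "", i);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* out of space: drop the partial entry */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With a set holding 5, 7 and 9 and a maxfd of 10, this fills buf with
"5, 7, 9" and returns 3.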
And the modified lines in pth_sched.c
Line 614:
if (!(dopoll && fdmax == -1))
{
    /* Modification by David Flynn */
    pth_debug2("pth_sched_eventmanager: doing a select on rdfs with fdmax of %d", fdmax);
    print_fdset_members(&rfds, fdmax + 1);
    while ((rc = pth_sc(select)(fdmax+1, &rfds, &wfds, &efds, pdelay)) < 0
           && errno == EINTR) ;
}
OK, with the new and modified functions shown, here is some more detail on
the sequence of events where it all goes wrong.
This is the beginning of the loop in one thread, where the FD_SET list is
recreated:
#[Fri May 25 13:43:04 2001, 50] pipeline_request.c:request_dispatcher[101]
# Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
#
#[Fri May 25 13:43:04 2001, 100] pipeline_request.c:request_dispatcher[113]
# Displaying fd_l_cli_st3s list ...
# 0x810cad0{fd = 9, listentry = 0, cli_state = 0x810ca10, next = 0,
prev = 0}
# REconstructing event_msg = pth_event(PTH_EVENT_MSG |
PTH_MODE_REUSE, event_msg, msgport)
# REconstructing event_refresh = pth_event(PTH_EVENT_COND |
PTH_MODE_REUSE, event_refresh, &refresh_condition)
# Constructing Semi-original event_select =
pth_event(PTH_EVENT_SELECT | PTH_MODE_REUSE, event_select, &nocfds, 10,
&rdset, NULL, NULL)
# Concating the events
# Reconstruction the events, waiting for them to occur
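The "Reconstructing fd_list" step logged above amounts to clearing the read
set, adding each client fd, and tracking the highest one. A minimal sketch of
that step, assuming the fds are held in a plain array (the real program walks
a linked list, and build_rdset() is a made-up name):

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: fill rdset from an array of descriptors and return
 * the nfd value for a select-style call, i.e. highest fd + 1, or 0 when
 * the resulting set is empty. */
int build_rdset(const int *fds, size_t count, fd_set *rdset)
{
    int maxfd = -1;
    size_t i;

    FD_ZERO(rdset);
    for (i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            continue;           /* FD_SET() is undefined outside this range */
        FD_SET(fds[i], rdset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

The return value is what would be passed as the nfd argument of the select
event: 10 for a set holding only fd 9, and 0 for an empty set.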
OK, we have recreated our events. (Note: as there is _no_ documentation about
PTH_MODE_REUSE, I am assuming that I have used it correctly; I guessed the
usage from one of the files in the test suite.)
And now we call pth_wait() ...
#31606:pth_event.c:0388: pth_wait: enter from thread "request_dispatcher"
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c408
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c438
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c468
#31606:pth_lib.c:0444: pth_yield: enter from thread "request_dispatcher"
#31606:pth_lib.c:0466: pth_yield: give up control to scheduler
#==== THREAD CONTEXT SWITCH ===========================================
#31606:pth_sched.c:0248: pth_scheduler: cameback from thread 0x808a6e8
("request_dispatcher")
#31606:pth_sched.c:0257: pth_scheduler: thread "request_dispatcher" ran
0.589000
#31606:pth_sched.c:0339: pth_scheduler: moving thread "request_dispatcher"
to waiting queue
#31606:pth_sched.c:0400: pth_sched_eventmanager: enter in polling mode
#31606:pth_sched.c:0617: pth_sched_eventmanager: doing a select on rdfs with
fdmax of 9
#members of fdset are: 5, 7, 9,
OK, we are in the event manager, and it is doing a select on 5, 7 and 9 (I
don't know what 5 and 7 are, but 9 is mine!). Now, I KNOW there is no data
on the socket (I've made the test program guarantee it).
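One way to back up a claim like "there is no data on the socket"
independently of pth is a non-blocking poll() on the descriptor. A minimal
sketch, assuming plain POSIX poll() (fd_has_pending_data() is a hypothetical
name, not something from the program or from pth):

```c
#include <poll.h>

/* Returns 1 if fd has data to read right now, 0 if not, -1 on error.
 * A timeout of 0 makes poll() return immediately instead of blocking. */
int fd_has_pending_data(int fd)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Calling this on fd 9 just before pth_wait() would show whether the select
event ought to fire straight away or not.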
#31606:pth_sched.c:0780: pth_sched_eventmanager: leaving
#31606:pth_sched.c:0207: pth_scheduler: thread "download_manager" selected
(prio=0, qprio=0)
#31606:pth_sched.c:0232: pth_scheduler: switching to thread 0x80aab78
("download_manager")
#==== THREAD CONTEXT SWITCH ===========================================
#31606:pth_lib.c:0468: pth_yield: got back control from scheduler
#31606:pth_lib.c:0470: pth_yield: leave to thread "download_manager"
We have now switched to thread 2, which reconstructs its own fd sets and events:
#
#[Fri May 25 13:43:04 2001, 50] pipeline_download.c:download_manager[40]
# Reconstructing FD_SET ...Reconstructed with maxrdfd = 0, maxwrfd = 0
#
#[Fri May 25 13:43:04 2001, 100] pipeline_download.c:download_manager[60]
# Displaying fd_l_cli_st3s list ...
# REconstructing event_msg = pth_event(PTH_EVENT_MSG |
PTH_MODE_REUSE, event_msg, msgport)
# Constructing Semi-original event_select =
pth_event(PTH_EVENT_SELECT | PTH_MODE_REUSE, event_select, &nosfds, 0,
&rdset, NULL, NULL)
# Constructing Semi-original event_select =
pth_event(PTH_EVENT_SELECT | PTH_MODE_REUSE, event_wrselect, &nocfds, 0,
&wrset, NULL, NULL)
# Concating the events
# Reconstruction the events, waiting for them to occur
The events have been reconstructed, and neither of the fd_sets here has
anything in it, so maxfd = 0. We then call pth_wait():
#31606:pth_event.c:0388: pth_wait: enter from thread "download_manager"
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c960
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c990
#31606:pth_event.c:0394: pth_wait: waiting on event 0x810c9c0
#31606:pth_lib.c:0444: pth_yield: enter from thread "download_manager"
#31606:pth_lib.c:0466: pth_yield: give up control to scheduler
#==== THREAD CONTEXT SWITCH ===========================================
#31606:pth_sched.c:0248: pth_scheduler: cameback from thread 0x80aab78
("download_manager")
#31606:pth_sched.c:0257: pth_scheduler: thread "download_manager" ran
0.394300
#31606:pth_sched.c:0339: pth_scheduler: moving thread "download_manager" to
waiting queue
#31606:pth_sched.c:0400: pth_sched_eventmanager: enter in waiting mode
#31606:pth_sched.c:0617: pth_sched_eventmanager: doing a select on rdfs with
fdmax of 9
OK, we are back in the scheduler and the event manager. The fdmax of the
select query in the event manager is 9, which is correct, but LO AND BEHOLD ...
my FD is missing from the fd_set!
#members of fdset are: 5, 7,
Now, all I can conclude from this is that there is something wrong inside
pth, and that it is losing the fds for some unknown reason (remember, it
only loses them after thread 2 creates its events and waits on them).
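To make the expectation concrete, here is a toy model of how a scheduler
could build one combined read set from several waiting select events. It is
only a sketch of the expected behaviour, NOT pth's actual code; merge_fdset()
and its arguments are made up for illustration.

```c
#include <sys/select.h>

/* Toy model only, not pth's implementation: OR every descriptor of src
 * that is below nfd into dst, and return the highest descriptor seen so
 * far (fdmax is passed in and carried from call to call). */
int merge_fdset(fd_set *dst, fd_set *src, int nfd, int fdmax)
{
    int fd;

    for (fd = 0; fd < nfd && fd < FD_SETSIZE; fd++) {
        if (FD_ISSET(fd, src)) {
            FD_SET(fd, dst);
            if (fd > fdmax)
                fdmax = fd;
        }
    }
    return fdmax;
}
```

Merging a set holding fd 9 (nfd 10) and then an empty set (nfd 0) leaves 9
in the combined set with fdmax still 9. The log above shows fdmax at 9 but
the member gone, which is what looks wrong.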
#31606:pth_sched.c:0643: pth_sched_eventmanager: [timeout] event occurred
for thread "ticker"
#31606:pth_sched.c:0770: pth_sched_eventmanager: thread "ticker" moved from
waiting to ready queue
#31606:pth_sched.c:0780: pth_sched_eventmanager: leaving
#31606:pth_sched.c:0207: pth_scheduler: thread "ticker" selected (prio=0,
qprio=0)
#31606:pth_sched.c:0232: pth_scheduler: switching to thread 0x807a4a0
("ticker")
#==== THREAD CONTEXT SWITCH ===========================================
and we continue to wait forever ...
Well, I hope this extra info is useful. Thanks in advance,
Dave
> Hello,
> The following applies to pth-1.3 and pth-1.4 running on Linux 2.4.4 with
> no additional ./configure arguments.
>
> Firstly, may I ask how out of date the documentation supplied with pth is?
> I have two areas in my current project where I must wait for events while
> conducting a select() query. However, when I had a casual browse through
> the pth source I found the following undocumented:
> pth_event(PTH_EVENT_SELECT, ...) and pth_event(PTH_EVENT_COND, ...). I am
> using both of these in the two main loops (each in a different thread) of
> the program.
>
> However, I have discovered what I can only call a lock-out condition, which
> can be clearly seen from the following log file extract of my code.
>
> The following extract shows what _should_ happen, and occasionally does.
> put_client_in_queue() is called to put an fd + some info in a linked list,
> and then calls pth_cond_notify(&refresh_condition, TRUE) to notify the
> waiting thread (request_dispatcher) that it needs to update the FDs in its
> select query (as select won't do it automatically).
>
>
> #[Thu May 24 15:03:49 2001, 50] pipeline_request.c:put_client_in_queue[863]
> # Entering put_client_in_queue (of keepalives)...
> # The client_fd is 9
> #
> #[Thu May 24 15:03:49 2001, 45] pipeline_request.c:request_dispatcher[187]
> # Recieved event of (unknown) type ... processing...
> #Our worker function has detected that we should update the fdlist
>
> This is the point where the request_dispatcher thread has received the
> condition, and now restarts its main loop to update the select FD sets:
>
> #
> #[Thu May 24 15:03:49 2001, 50] pipeline_request.c:request_dispatcher[101]
> # Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
> #
> #[Thu May 24 15:03:49 2001, 100] pipeline_request.c:request_dispatcher[113]
> # Displaying fd_l_cli_st3s list ...
> # 0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
> # Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
> # Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
> # Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
> # Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
> # Concating the events
> # Reconstruction the events, waiting for them to occur
>
> At this point all the events have been concatenated into one
> pth-event-loop, and pth_wait() is called. You may be wondering what the
> event_timeout is for; the reason for its existence will be made clear later
> (it is a temporary workaround for the problem).
>
> #
> #[Thu May 24 15:03:49 2001, 45] pipeline_request.c:request_dispatcher[187]
> # Recieved event of (unknown) type ... processing...
> # it was 1 client(s) in the keep_alive pool bleeting
> #
>
> The select event is immediately triggered, as there has been data waiting to
> be processed while the lists were being updated. Execution continues in a
> well-ordered manner (well, possibly until this bit happens again).
>
>
> The above showed the desired flow of execution; however, the following
> occasionally happens (quite often, in fact!).
>
> As before, put_client_in_queue() is called to put an fd + some info in a
> linked list, and then calls pth_cond_notify(&refresh_condition, TRUE) to
> notify the waiting thread (request_dispatcher) that it needs to update the
> FD's in its select query...
>
> #[Thu May 24 15:03:28 2001, 50] pipeline_request.c:put_client_in_queue[863]
> # Entering put_client_in_queue (of keepalives)...
> # The client_fd is 9
> #
> #[Thu May 24 15:03:28 2001, 45] pipeline_request.c:request_dispatcher[187]
> # Recieved event of (unknown) type ... processing...
> #Our worker function has detected that we should update the fdlist
>
> (request_dispatcher), as before, has received the notice to update the select
> FDs and does so:
>
> #
> #[Thu May 24 15:03:28 2001, 50] pipeline_request.c:request_dispatcher[101]
> # Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
> #
> #[Thu May 24 15:03:28 2001, 100] pipeline_request.c:request_dispatcher[113]
> # Displaying fd_l_cli_st3s list ...
> # 0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
> # Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
> # Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
> # Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
> # Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
> # Concating the events
> # Reconstruction the events, waiting for them to occur
>
> That was all nicely done, and pth_wait() is called to wait for the events.
> Now, in this test situation, I _KNOW_ that there is data on FD 9, as the
> test client is just sending request after request. So either it is the
> split-second timing of something, or the scheduler has thought it more
> worthwhile to run the (download_manager) thread first; whatever it is,
> the (download_manager) thread resumes execution and reconstructs all
> its FD lists etc.
>
>
> #[Thu May 24 15:03:28 2001, 50] pipeline_download.c:download_manager[40]
> # Reconstructing FD_SET ...Reconstructed with maxrdfd = 0, maxwrfd = 0
>
> Now, as the two FD_SETs it uses are empty, the (download_manager) does not
> recreate the pth_event(PTH_EVENT_SELECT, ...) calls to handle them.
>
> #[Thu May 24 15:03:28 2001, 100] pipeline_download.c:download_manager[60]
> # Displaying fd_l_cli_st3s list ...
> # Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
> # Not Concating the events
> # Reconstruction the events, waiting for them to occur
>
> It has now reconstructed the events, and has only one to wait for, so it
> doesn't concat them; it then executes pth_wait(event_msg), waiting for a
> message...
>
> Now, I definitely know there is some data waiting on the socket, and what
> happens? We wait 5 seconds, demonstrated by our ticker appearing, and
> nothing happens. Surely the select query has returned???
>
> #ticker: time: Thu May 24 15:03:29 2001,
> # total threads: 10, waiting: 9, ready: 0, suspended: 0, dead: 0,
> # average load: 1.043408, #new_connected_requests: 1,
> # #handled_requests: 0, #requests_pending_from_keepalive: 1
>
> Well, evidently not! After a further 5 seconds, the ticker comes along
> again ...
>
>
> #ticker: time: Thu May 24 15:03:34 2001,
> # total threads: 10, waiting: 9, ready: 0, suspended: 0, dead: 0,
> # average load: 1.032556, #new_connected_requests: 1,
> # #handled_requests: 0, #requests_pending_from_keepalive: 1
> #
>
> Now we have waited 10 seconds (remember the event_timeout from earlier?).
> Well, 10 seconds after our initial pth_wait(), the timeout event happens
> ...
>
> #[Thu May 24 15:03:38 2001, 45] pipeline_request.c:request_dispatcher[187]
> # Recieved event of (unknown) type ... processing...
> #Our 10 second timeout occured, processing lists for dead things
> # Testing 9, Found one that was waiting
>
> What this does is use poll() to check for dead sockets and clean them
> out; the loop then recreates the FD_SETs and _RE_creates the events ...
>
>
> #[Thu May 24 15:03:38 2001, 50] pipeline_request.c:request_dispatcher[101]
> # Reconstructing fd_list ... Adding: 9, ... Reconstructed with maxfd = 9
> #
>
> That was the FD_SET, here are the events ...
>
> #[Thu May 24 15:03:38 2001, 100] pipeline_request.c:request_dispatcher[113]
> # Displaying fd_l_cli_st3s list ...
> # 0x80fb678{fd = 9, listentry = 0, cli_state = 0x80fb5b8, next = 0, prev = 0}
> # Constructing original event_msg = pth_event(PTH_EVENT_MSG, msgport)
> # Constructing original event_refresh = pth_event(PTH_EVENT_COND, &refresh_condition)
> # Constructing original event_select = pth_event(PTH_EVENT_SELECT, &nocfds, maxfd + 1, &rdset, NULL, NULL)
> # Constructing original event_timeout = pth_event(PTH_EVENT_TIME, pth_timeout(10,0))
> # Concating the events
> # Reconstruction the events, waiting for them to occur
> #
>
> OK, we have recreated the events, and are now pth_wait()'ing for them to
> happen, and surprise surprise, immediately,
>
> #[Thu May 24 15:03:38 2001, 45] pipeline_request.c:request_dispatcher[187]
> # Recieved event of (unknown) type ... processing...
> # it was 1 client(s) in the keep_alive pool bleeting
>
> we get our select returning ... more than a full 10 seconds after data was
> available on the socket which we were waiting for ...
>
>
> Now, what's the point of all this? Simple: everything works fine as long
> as the (download_manager) thread does not resume execution, recreate its
> events, and wait for them.
>
> Here is some possibly good news: in the last 20 minutes while I have been
> typing this, I have (in the last 30 seconds) thought of why the problem may
> actually be happening ....
>
> First a question: when you call pth_event(PTH_EVENT_SELECT, ...), say twice,
> in different threads, do the FD_SETs from the two completely different
> queries get merged into one? If so, then would calling
> pth_event_free(concacted_events, PTH_FREE_ALL); in one of the threads
> destroy all the FDs waiting in the main select call?
>
> This may illustrate what I mean a bit better. However, it may not, as the
> 80-char line wrapping of e-mail is likely to screw it up! (sorry!)
>
> Thread1---pth_event(PTH_EVENT_SELECT)--->---\
> |
> +-->(a single select request)
> | (made by the pth library)
> Thread2---pth_event(PTH_EVENT_SELECT)--->---/
>
>
> Now, say Thread2 frees its event list; its FD_SET must be removed from the
> main FD_SET being waited on by the scheduler. If the FD_SET in the
> scheduler were simply emptied, then Thread1 would never get the notification
> that there is something waiting, as the query is no longer being made. I.e.,
> it only notices there is something when it _RE_creates the FD_SET for the
> pth_event(PTH_EVENT_SELECT, ...) query ...
>
> If the answer to my question is no, then my hypothesis as to why all
> this happens is rather wrong. Either way, any ideas? (You know slightly
> more than me about pth, you having written it, and myself having only browsed
> through it after a quick 'grep -nir select' :-)
>
> Any information, advice, solutions, whatever would be extremely appreciated
> !!!
>
> Many Thanks in advance,
>
> David Flynn
>
> PS: I am terribly sorry for the length of this e-mail, and for the fact that
> I have dumped on you a log file of a program you have never seen before and
> know nothing about!