Re: select() efficiency / epoll

2005-08-24 Thread Davy Durham

Davide Libenzi wrote:



There is no known problem in using epoll_ctl() in one thread while 
another does epoll_wait().
I suggest you ask Valgrind to take a look at your binary. Since I 
have no clue what your software does, please create the *minimal* 
code snippet that triggers the possible problem, and post it.


Yes, I have pretty much confirmed this. And unfortunately I tried to 
make a minimal code snippet which demonstrates the problem, but wasn't 
able to do that before I figured out a work-around.  I may still try to 
create something for you to test against so you can fix it.  But I'm 
going to have to continue to work with the existing implementation since 
I'm going to be running this code on some production servers where 
updating the kernel might not be an option.


The work-around is as follows:

1) I create a queue that can hold operations to perform on the epoll 
structure and I protect it with a mutex.


2) Other threads (when they need to modify the epoll set) lock the mutex 
and enqueue the operation into the operation queue instead of calling 
epoll_ctl() themselves (i.e. add this socket for reading, add this socket 
for writing, remove this socket, etc.) *and* then cancel the epoll_wait().
  I implemented the cancel by having a pipe() always watched for 
reading, and writing a byte to it when I want to cancel (is there a better way?)
  There are several operations that could be supported 
(add/remove/modify/change userdata/etc.), but I only need two myself.


3) There's only one thread that actually does the epoll_wait().  When 
epoll_wait() returns, I first drain the cancel pipe (so it never fills 
up), handle whatever events need handling, and then lock the operation 
queue mutex, perform all the operations in the queue, and clear the queue.
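The three steps above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not Davy's actual code: the names (struct loop, loop_request(), loop_wait(), loop_apply_pending()) are hypothetical, the queue is a plain mutex-protected linked list, and the "cancel" is a non-blocking self-pipe that the epoll_wait() thread always watches.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* One deferred epoll_ctl() request (hypothetical structure). */
struct pending_op {
    int op;                     /* EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL */
    int fd;
    uint32_t events;
    struct pending_op *next;
};

struct loop {
    int epfd;
    int wake[2];                /* self-pipe: wake[0] is watched, wake[1] is written */
    pthread_mutex_t lock;       /* protects head and tail */
    struct pending_op *head, *tail;
};

/* Step 1: create the epoll set, the queue and the always-watched cancel pipe. */
int loop_init(struct loop *l) {
    l->head = l->tail = NULL;
    if (pthread_mutex_init(&l->lock, NULL) != 0) return -1;
    l->epfd = epoll_create(16);             /* size is only a hint, but must be > 0 */
    if (l->epfd < 0 || pipe(l->wake) < 0) return -1;
    fcntl(l->wake[0], F_SETFL, O_NONBLOCK);
    fcntl(l->wake[1], F_SETFL, O_NONBLOCK);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = l->wake[0] };
    return epoll_ctl(l->epfd, EPOLL_CTL_ADD, l->wake[0], &ev);
}

/* Step 2: called from any thread instead of epoll_ctl(). */
int loop_request(struct loop *l, int op, int fd, uint32_t events) {
    struct pending_op *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    p->op = op;
    p->fd = fd;
    p->events = events;
    p->next = NULL;
    pthread_mutex_lock(&l->lock);
    if (l->tail) l->tail->next = p; else l->head = p;
    l->tail = p;
    pthread_mutex_unlock(&l->lock);
    /* Cancel the epoll_wait(). EAGAIN means the pipe is full, so a wakeup is already pending. */
    char byte = 1;
    if (write(l->wake[1], &byte, 1) < 0 && errno != EAGAIN) return -1;
    return 0;
}

/* Step 3a: only the loop thread calls this. Returns the number of user events
   left in evs[] after removing the wakeup pipe's entry, or -1 on error. */
int loop_wait(struct loop *l, struct epoll_event *evs, int max, int timeout_ms) {
    int n = epoll_wait(l->epfd, evs, max, timeout_ms);
    if (n < 0) return errno == EINTR ? 0 : -1;
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (evs[i].data.fd == l->wake[0]) {
            char buf[256];
            while (read(l->wake[0], buf, sizeof buf) > 0) {}   /* drain so it never fills */
        } else {
            evs[count++] = evs[i];
        }
    }
    return count;
}

/* Step 3b: after handling events, apply and clear the queued operations. */
void loop_apply_pending(struct loop *l) {
    pthread_mutex_lock(&l->lock);
    struct pending_op *p = l->head;
    l->head = l->tail = NULL;               /* detach the whole list under the lock */
    pthread_mutex_unlock(&l->lock);
    while (p != NULL) {
        struct epoll_event ev = { .events = p->events, .data.fd = p->fd };
        epoll_ctl(l->epfd, p->op, p->fd, &ev);  /* a real server would check errors */
        struct pending_op *next = p->next;
        free(p);
        p = next;
    }
}
```

Other threads call only loop_request(); the single epoll thread calls loop_wait(), handles the returned events, then calls loop_apply_pending(). One small variation from the description: the list is detached under the lock and the epoll_ctl() calls run after it is released, so requesting threads are not blocked meanwhile. On kernels from 2.6.22 on, an eventfd() could replace the pipe as the wakeup object.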




So, this works for me now.

Thanks for all the info, guys.

-- Davy

P.S.   Davide, I still might get you that snippet, but it's not a 
trivial snippet, as you can imagine... timing is everything with this 
problem :( .. and there's also the question of WHERE it corrupts memory; 
that has seemed unpredictable so far.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

Jari Sundell wrote:


On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:

 


I was hoping you would mention in your reply that you knew
epoll_data_t was a union and you didn't touch epoll_data::fd, so I
wouldn't have to say it explicitly. ;)

 

Oh!.. unless epoll_data_t is a union just for convenience, in that it 
already has an 'int fd' member if you want to use it, but you don't have 
to.. that at least makes the void *ptr useful.  The example in 'man epoll' 
sort of made it look necessary to set the 'fd' member of the union.


That still doesn't fix the issue, of course.. but it's good to know.
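As Jari points out, epoll_data_t is a union, so a program stores either an fd or a pointer in it, not both. A minimal sketch, assuming a hypothetical per-connection struct conn, of keeping the pointer in data.ptr and getting it back from epoll_wait(). The comment notes why also writing data.fd would clobber the pointer, which is one way a pointer can end up holding a small integer.

```c
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical per-connection state; the name is illustrative only. */
struct conn {
    int fd;
    int bytes_seen;
};

/* Register c->fd, storing the struct pointer in the union instead of the fd.
   Only one union member may be used: assigning ev.data.fd afterwards would
   overwrite the pointer (all of it on 32-bit x86, its low 4 bytes on a
   64-bit little-endian machine). */
int watch_conn(int epfd, struct conn *c) {
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Wait once and return the conn whose fd became readable, or NULL. */
struct conn *next_ready(int epfd, int timeout_ms) {
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
        return NULL;
    return ev.data.ptr;     /* the same pointer that was passed to watch_conn() */
}
```

The fd is still reachable as conn->fd, so nothing is lost by choosing the pointer member.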


Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

Jari Sundell wrote:


On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:
 


I was hoping you would mention in your reply that you knew
epoll_data_t was a union and you didn't touch epoll_data::fd, so I
wouldn't have to say it explicitly. ;)

 

No, I saw that epoll_data_t was a union (although, it kind of makes the 
ptr useless as a user data pointer.. but I'm not using it for that)


When I say that pointers are getting corrupted, I just mean in other 
parts of the code (actually it's some C++ STL container's data and is 
completely unrelated to the epoll-specific code).  Something, somewhere 
seems to be writing to memory that it's not supposed to be writing to.  
And as far as I can tell, it happens when I use epoll and doesn't when I 
use select()  :-/


-- Davy






Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

Davide Libenzi wrote:



I should mention that the 2.4 patch is old WRT mainline epoll in 2.6 
(I stopped maintaining it when 2.6 went "stable"). I'd definitely 
suggest using 2.6 if you are looking at epoll.


I am using linux-2.6.11 and glibc-2.3.4, and using select() in its 
place seems to work fine.  Are there any known issues with, say, one 
thread doing epoll_wait()s while other threads may be doing epoll_ctl()s?


Is there someone else I should be asking this question?

Thanks,
 Davy
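Current epoll_wait(2) manual pages document this case: while one thread is blocked in epoll_wait(), another thread may add a file descriptor to the same epoll instance, and if that descriptor becomes ready the blocked call returns. A minimal two-thread sketch of the pattern; the helper names are hypothetical, and the 100 ms sleep only makes it likely, not certain, that the waiter is already blocked when the add happens.

```c
#include <pthread.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical argument bundle for the second thread. */
struct adder_arg {
    int epfd;   /* epoll instance the main thread is waiting on */
    int rfd;    /* read end of a pipe, to be added to epfd */
    int wfd;    /* write end, used to make rfd readable */
};

/* Second thread: sleep briefly so the main thread is probably already
   blocked in epoll_wait(), then add the fd and make it readable. */
static void *adder(void *p) {
    struct adder_arg *a = p;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    nanosleep(&delay, NULL);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = a->rfd };
    epoll_ctl(a->epfd, EPOLL_CTL_ADD, a->rfd, &ev);
    ssize_t w = write(a->wfd, "x", 1);
    (void)w;
    return NULL;
}

/* Returns 1 if an epoll_wait() started on an empty set was woken by the fd
   that the other thread added, 0 if not, -1 on setup failure. */
int wait_while_other_thread_adds(void) {
    int p[2], result = 0;
    int epfd = epoll_create(4);
    if (epfd < 0) return -1;
    if (pipe(p) < 0) { close(epfd); return -1; }
    struct adder_arg a = { epfd, p[0], p[1] };
    pthread_t t;
    if (pthread_create(&t, NULL, adder, &a) != 0) {
        result = -1;
    } else {
        struct epoll_event ev;
        if (epoll_wait(epfd, &ev, 1, 5000) == 1 && ev.data.fd == p[0])
            result = 1;
        pthread_join(t, NULL);
    }
    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

This only exercises the kernel side. The application still has to make sure that any user-space object referenced from data.ptr is not freed or changed by one thread while another thread is handling its event.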


Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham
Thanks for the info.. I did find this thread and was wondering whether 
this patch ever got merged:


http://www.ussg.iu.edu/hypermail/linux/kernel/0303.3/1139.html



Willy Tarreau wrote:


On Tue, Aug 23, 2005 at 06:24:42AM -0500, Davy Durham wrote:
 

That's probably a good idea.  Where would I find out what other projects 
use it?
   



I use it in my load-balancer (haproxy), and it could somewhat match your
needs, because I ported the select()-based earlier version to epoll() with
the smallest possible changes. Indeed, the new epoll() loop still uses the
FD_ISSET() to determine what to do with epoll_ctl(). If you have changed
your code to use select(), you may find similarities. But I want to tell
you up front that my code is NOT multi-threaded. It could be a bug in the
epoll implementation, because I don't think that there are so many
applications using epoll on MT models. Bert says that the epoll implementation
is heavily benchmarked, which is true, but which does not guarantee that it
is tested under every condition.

You can download it from here:

 http://w.ods.org/tools/haproxy/src/devel/

Use version 1.2.6. I added epoll in 1.2.5, so the diff between 1.2.4 and
1.2.5 could help you too.

Good luck !
Willy
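Willy's description of reusing the select()-era FD_ISSET() bookkeeping to drive epoll_ctl() can be sketched as below. This is not haproxy's code: sync_fd() and the four fd_set parameters are hypothetical. The idea is to compare what the old select() code wants for an fd with what is currently registered, and issue exactly one EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL when the two differ.

```c
#include <sys/epoll.h>
#include <sys/select.h>

/* Hypothetical helper, not haproxy's code.  want_r/want_w: what the old
   select() code would have put in its read/write sets for this pass.
   reg_r/reg_w: what is currently registered with epoll (kept up to date
   here).  Issues at most one epoll_ctl() call for fd; returns its result,
   or 0 when nothing changed. */
int sync_fd(int epfd, int fd, fd_set *want_r, fd_set *want_w,
            fd_set *reg_r, fd_set *reg_w) {
    unsigned int now = 0, before = 0;
    if (FD_ISSET(fd, want_r)) now |= EPOLLIN;
    if (FD_ISSET(fd, want_w)) now |= EPOLLOUT;
    if (FD_ISSET(fd, reg_r)) before |= EPOLLIN;
    if (FD_ISSET(fd, reg_w)) before |= EPOLLOUT;
    if (now == before)
        return 0;

    int op = before == 0 ? EPOLL_CTL_ADD
           : now == 0    ? EPOLL_CTL_DEL
                         : EPOLL_CTL_MOD;
    struct epoll_event ev = { .events = now, .data.fd = fd };
    int rc = epoll_ctl(epfd, op, fd, &ev);
    if (rc == 0) {
        if (now & EPOLLIN) FD_SET(fd, reg_r); else FD_CLR(fd, reg_r);
        if (now & EPOLLOUT) FD_SET(fd, reg_w); else FD_CLR(fd, reg_w);
    }
    return rc;
}
```

Because the bookkeeping still lives in fd_sets, this approach inherits the FD_SETSIZE limit on descriptor numbers.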



Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

Jari Sundell wrote:


On 8/23/05, Davy Durham <[EMAIL PROTECTED]> wrote:
 


However, I'm getting segfaults because some pointers in places are
getting set to low integer values (values they never used to have).
   



Is it possible that you are overwriting the pointers with file
descriptors, as those would have low integer values?

 

Yes, that is what I was thinking, and it's why I mentioned that.  But I'm 
apparently not overwriting the pointers with FDs.. it seems that epoll 
is the cause at this point (unless I'm misusing the epoll API).  I've 
changed the code to use select() instead of epoll and things work 
flawlessly (although it obviously won't be as efficient once I really 
connect a lot of clients to this server).






Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham
That's probably a good idea.  Where would I find out what other projects 
use it?


Willy Tarreau wrote:


Hi,

On Tue, Aug 23, 2005 at 06:01:15AM -0500, Davy Durham wrote:
 

I just mean that when I debug and catch the segv, it dies because 
some pointers now have corrupted values (usually because something is 
overwriting some memory somewhere).


I'm currently rewriting some code to make it use select() instead of 
epoll_wait() to see if everything is suddenly fixed.  If so, then I 
will suspect that epoll has a problem.  But I still haven't ruled out 
that it's my fault, since it could be a timing issue that makes the 
crash show up.
   



Just out of curiosity, have you had the opportunity to read some other
code which uses epoll ? Maybe reading others code could enlighten you
on potential bugs in your code, potential races, etc...

Regards,
Willy
 





Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

Davy Durham wrote:



I'm currently rewriting some code to make it use select() instead of 
epoll_wait() to see if everything is suddenly fixed.  If so, then I 
will suspect that epoll has a problem.  But I still haven't ruled out 
that it's my fault, since it could be a timing issue that makes the 
crash show up.



Well, the select() replacement works fine... so hrmm..




Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

bert hubert wrote:


On Tue, Aug 23, 2005 at 04:49:14AM -0500, Davy Durham wrote:

 

However, I'm getting segfaults because some pointers in places are 
getting set to low integer values (values they never used to have).
   



epoll is pretty heavily benchmarked and hence tested. I don't entirely
understand the remark above and suggest looking at the generated core dumps.

 

I just mean that when I debug and catch the segv, it dies because 
some pointers now have corrupted values (usually because something is 
overwriting some memory somewhere).


I'm currently rewriting some code to make it use select() instead of 
epoll_wait() to see if everything is suddenly fixed.  If so, then I 
will suspect that epoll has a problem.  But I still haven't ruled out 
that it's my fault, since it could be a timing issue that makes the 
crash show up.





Re: select() efficiency / epoll

2005-08-23 Thread Davy Durham

So, I've been trying to use epoll.. on linux-2.6.11-6mdk


However, I'm getting segfaults because some pointers in places are 
getting set to low integer values (values they never used to have).


The deal is that my application is multi-threaded, and I was wondering 
whether epoll has issues if you call epoll_ctl() while an epoll_wait() is 
waiting, or something like that.  I'm also compiling with -D_MULTI_THREADED.  
I'm not new to threading, but am stumped at this point.


I'm not ruling out that it's my code, but wanted to ask about epoll since 
it's so new.


Any ideas?

Thanks,
 Davy


bert hubert wrote:


On Fri, Jul 22, 2005 at 04:18:46PM -0500, Davy Durham wrote:
 

Please forgive and redirect me if this is not the right place to ask 
this question:


I'm looking to write a sort of messaging system that would take input 
from any number of entities that "register" with it.. it would then 
route the messages to outputs and so forth..
   



Look at epoll, or libevent, which uses epoll to be quick in this scenario.


 





Re: /proc question

2005-08-04 Thread Davy Durham

Jan Engelhardt wrote:


I have a zombie process which has apparently died for some unknown reason.. I
know it was terminated by a signal (found that from the 9th field (scheduler
flags) in /proc/pid/stat)
   



Start the process under the observation of strace.

 


However, I'm trying to figure out what signal killed it.
   




Jan Engelhardt
 

Wish I could.. but it's already happened (to a lot of processes, for the 
same reason)


It's an intermittent problem and I can't really reproduce it at will.

I've redeployed the binary now so I can hopefully attach to it with gdb 
to figure out some things next time it does happen.



/proc question

2005-08-04 Thread Davy Durham

After much research.. I have a question regarding /proc

I have a zombie process which has apparently died for some unknown 
reason.. I know it was terminated by a signal (found that from the 9th 
field (scheduler flags) in /proc/pid/stat)


However, I'm trying to figure out what signal killed it.

Also, it would be nice if /proc could show the exit status of a 
dead process.. it seems strange that it doesn't contain that information 
(or am I just not seeing it in there?).



Any info would be helpful.. thanks,
 Davy
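A zombie's termination signal is not lost: the kernel keeps the wait status until the parent reaps the child, and the parent can decode it from the waitpid() status with WIFSIGNALED() and WTERMSIG(). A minimal sketch with a hypothetical helper, reap_child(); only the parent can do this, and reaping removes the zombie. (Much later kernels, 3.5 and up, also expose that wait status as the exit_code field, field 52, of /proc/[pid]/stat.)

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper for the parent process: reap child `pid` and report
   how it ended.  Returns the terminating signal number if the child was
   killed by a signal, 0 if it exited normally (its exit code is stored in
   *exit_code when exit_code is non-NULL), or -1 if waitpid() failed. */
int reap_child(pid_t pid, int *exit_code) {
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && exit_code != NULL)
        *exit_code = WEXITSTATUS(status);
    return 0;
}
```

A parent would typically call this from its main loop or after a SIGCHLD; as long as nobody reaps the child it stays a zombie, which also means its status is still there to be collected.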




select() efficiency

2005-07-22 Thread Davy Durham
Please forgive and redirect me if this is not the right place to ask 
this question:


I'm looking to write a sort of messaging system that would take input 
from any number of entities that "register" with it.. it would then 
route the messages to outputs and so forth..


I'm guessing that the messaging system would be a single process on the 
machine..


So, I'm considering making the means of input to the system a unix 
socket.  An entity would connect to the socket as its means of 
inputting messages into the system. 

However, let's suppose that 1000+ entities connect to that socket.. this 
would require the message system's loop to add 1000+ file descriptors to 
an fd_set and call select() every time it loops around to check for any 
messages.


So, my question is: how efficient would things be, calling select() very 
often on 1000+ file descriptors?  I'm not aware of a maximum size for an 
fd_set.. (I do know that NT is limited to 64 handles.. but that's really 
beside the point unless I look at porting someday)


Should I go another route?

The system is meant to route messages as quickly as possible.. so it would 
be a bad idea to, say, write them to a file and poll the file or something 
like that...


Another thought was to use a system-wide mutex and write to a named 
pipe, but the socket method seems more appealing to me in design... and 
I didn't know whether it would be pretty much equivalent either way, since 
either I do the work of dealing with 1000+ things or the kernel does.


Thanks,
 Davy


Suspend/Resume

2005-04-21 Thread Davy Durham
Hi,
 I've been trying for the last few days to get my D810 to suspend and 
resume in Linux.

I'm doing it from klaptop in KDE using Fedora Core 3, but I've now 
compiled my own linux-2.6.12-rc2-mm3 kernel since I've seen some ACPI 
changes going in.

At 2.6.11 it would seem to suspend ok, but when doing the resume it 
would come back and have I/O errors.. causing the computer to freeze for 
a few seconds, then run for a second, then freeze again, etc.. the HDD 
light would stay on solid, and on tty1 I saw something like "ata1: 
command 0xc8 timeout... I/O error..."  So apparently something isn't 
getting started back up.  Thinking it might be the HDD not spinning, I 
powered off, but DID hear it spin down.

Running what I compiled,  2.6.12-rc2-mm3, the suspend happens a little 
faster but the resume comes to a blank screen, then immediately reboots 
without any messages that I can see.

I'm very interested in getting this to work and will do whatever someone 
needs to gather information.

I may need to ask basic kernel questions when asked to do something, 
as I haven't done much troubleshooting at this low a level before, but 
I'm game.  From googling around, this is a problem for many and I would 
like to help resolve it.

If I need to take this message to another mailing list or another 
individual working on ACPI stuff or something just let me know.

Any ideas?
Thanks,
 Davy


Dell D810 Laptop Suspend/Resume

2005-04-20 Thread Davy Durham
Hi,
  I've been trying for the last few days to get my D810 to suspend and 
resume in Linux.

I'm doing it from klaptop in KDE using Fedora Core 3, but I've now 
compiled my own linux-2.6.12-rc2-mm3 kernel since I've seen some ACPI 
changes going in.

At 2.6.11 it would seem to suspend ok, but when doing the resume it 
would come back and have I/O errors.. causing the computer to freeze for 
a few seconds, then run for a second, then freeze again, etc.. the HDD 
light would stay on solid, and on tty1 I saw something like "ata1: 
command 0xc8 timeout... I/O error..."  So apparently something isn't 
getting started back up.  Thinking it might be the HDD not spinning, I 
powered off, but DID hear it spin down.

Running what I compiled,  2.6.12-rc2-mm3, the suspend happens a little 
faster but the resume comes to a blank screen, then immediately reboots 
without any messages that I can see.

I'm very interested in getting this to work and will do whatever someone 
needs to gather information.

I may need to ask basic kernel questions when asked to do something, 
as I haven't done much troubleshooting at this low a level before, but 
I'm game.  From googling around, this is a problem for many and I would 
like to help resolve it.

Any ideas?
Thanks,
  Davy