IPv4 established connection paradox

2015-03-25 Thread Aaron Wiebe
I apologize for bringing this to the list, but after several hours of
googling and testing I'm coming up blank.  I've been unable to
reproduce this state in targeted tests; however, the application
itself gets into it on a semi-regular basis.

I currently have an application in the following state:

(from netstat):
tcp        0      0 127.0.0.1:51115         127.0.0.1:51115         ESTABLISHED 46965/python2.6
(from lsof):
python2.6 46965 root   14u  IPv4 11218239  0t0 TCP
localhost:51115->localhost:51115 (ESTABLISHED)

The application is blocked in recvfrom() on that socket.

I'm not looking for any specific assistance beyond the basic
question: how is it possible to get into this state?  In my tests,
binding a listener to a port already in use by an outgoing connection
isn't possible (no surprise), and if that's the case, how is it
possible to ever have an established connection to... the same
socket?  This is effectively blocking binds to the port (which is
actually used by another application most of the time).  The
application would normally connect to this port to status-check the
running service.  In this case, the service is unable to start
because of this state.

This is an older RHEL 6.6 kernel (2.6.32-431).  If this is a bug, I
can't find any mention of it anywhere; and if it's not, I'm totally
confused and hoping someone can explain this to me.

(Please cc me in response)

-Aaron
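
For reference, one known way to reach this state is TCP
self-connection through simultaneous open: if nothing is listening on
the port and the kernel happens to choose the destination port as the
ephemeral source port, the SYN is delivered to the connecting socket
itself and the handshake completes.  A minimal C sketch that can
eventually reproduce it, assuming 51115 lies inside
net.ipv4.ip_local_port_range:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(51115);                 /* no listener on this port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    for (long i = 0; i < 1000000; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        /* Normally this fails with ECONNREFUSED, but if the kernel picks
         * 51115 as the ephemeral source port, the SYN is delivered to this
         * very socket and TCP simultaneous open completes:
         * 127.0.0.1:51115 -> 127.0.0.1:51115, ESTABLISHED. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            printf("self-connected after %ld attempts\n", i + 1);
            return 0;
        }
        close(fd);
    }
    return 1;
}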


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

Actually, let's see if I can summarize this more generically... I
realize I'm suggesting something that would probably be a massive
undertaking, but...

Regular files are the only interface that requires an application to
wait.  In every other case, the nonblocking interfaces are fairly
complete and easy to work with.  If userspace could treat regular
files in the same fashion as sockets, life would be good.

I admittedly do not understand the internal kernel differences
between a socket and a regular file.  Why couldn't we just have a
different 'socket type' like PF_FILE or something along those lines?

Abstracting all IO through the existing interfaces provided for
sockets would be ideal from my perspective.  The code required to use
a file through these interfaces would be more complex in userspace,
but the current blocking open() could simply be implemented as an
aggregate of these interfaces, without a nonblocking flag.

It would, however, fix the problems event-based applications have
handling events from both disk and sockets.  I can't trigger disk
read/write events in the same event handlers I use for sockets (i.e.,
poll or epoll).  I end up having two separate event handlers - one
for disk (currently using glibc's aio thread kludge), and one for
sockets.

I'm sure this isn't a new idea.  Coming from my own development
background that had little to do with disk, I was actually surprised
when I first discovered that I couldn't edge-trigger disk IO through
poll().

Thoughts, comments?

-Aaron

On 6/4/07, Aaron Wiebe <[EMAIL PROTECTED]> wrote:

On 6/4/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> So exactly how would you expect a nonblocking open to work? Should it be
> starting I/O? What if that involves blocking? How would you know when to
> try again?

Well, there's a bunch of options - some have been suggested in the
thread already.  The idea of an open with O_NONBLOCK (or a different
flag) returning a handle immediately, and subsequent calls returning
EAGAIN if the open is incomplete, or ESTALE if it fails (with some
auxiliary method of getting the reason why it failed) are not too far
a stretch from my perspective.

The other option that comes to mind would be to add an interface that
behaves like sockets - get a handle from one system call, set it
nonblocking using fcntl, and use another call to attach it to a
regular file.  This method would make the most sense to me - but it's
also because I've worked with sockets in the past far far more than
with regular files.

The one that would take the least amount of work from the application
perspective would be to simply reply to the nonblocking open call with
EAGAIN (or something), and when an open on the same file is performed,
the kernel could have performed its work in the background.  I can
understand, given the fact that there is no handle provided to the
application, that this idea could be sloppy.

I'm still getting caught up on some of the other suggestions (I'm
currently reading about the syslets work that Zach and Ingo are
doing), and it sounds like this is a common complaint that is being
addressed through a number of initiatives.  I'm looking forward to
seeing where that work goes.

-Aaron




Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

On 6/4/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:


So exactly how would you expect a nonblocking open to work? Should it be
starting I/O? What if that involves blocking? How would you know when to
try again?


Well, there's a bunch of options - some have been suggested in the
thread already.  The idea of an open with O_NONBLOCK (or a different
flag) returning a handle immediately, and subsequent calls returning
EAGAIN if the open is incomplete, or ESTALE if it fails (with some
auxiliary method of getting the reason why it failed) are not too far
a stretch from my perspective.

The other option that comes to mind would be to add an interface that
behaves like sockets - get a handle from one system call, set it
nonblocking using fcntl, and use another call to attach it to a
regular file.  This method would make the most sense to me - but it's
also because I've worked with sockets in the past far far more than
with regular files.
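
Purely for illustration, a sketch of what that socket-style sequence
might look like.  filehandle() and fileattach() are invented names
for system calls that do not exist; only fcntl() and poll() here are
real:

#include <errno.h>
#include <fcntl.h>
#include <poll.h>

/* HYPOTHETICAL: filehandle() and fileattach() do not exist. */
int fd = filehandle();                  /* invented: allocate a bare handle */
fcntl(fd, F_SETFL, O_NONBLOCK);         /* real: the same call used for sockets */

if (fileattach(fd, "/some/path", O_WRONLY) < 0 && errno == EINPROGRESS) {
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    poll(&p, 1, -1);                    /* readiness would signal the open completed */
}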

The one that would take the least amount of work from the application
perspective would be to simply reply to the nonblocking open call with
EAGAIN (or something), and when an open on the same file is performed,
the kernel could have performed its work in the background.  I can
understand, given the fact that there is no handle provided to the
application, that this idea could be sloppy.

I'm still getting caught up on some of the other suggestions (I'm
currently reading about the syslets work that Zach and Ingo are
doing), and it sounds like this is a common complaint that is being
addressed through a number of initiatives.  I'm looking forward to
seeing where that work goes.

-Aaron


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

Sorry for the unthreaded responses - I wasn't cc'd here, so I'm
replying based on the mailing list archives.

Al Viro wrote:


BTW, why close these suckers all the time? It's not that the kernel would
be unable to hold thousands of open descriptors for your process...
Hash descriptors by pathname and be done with that; don't bother with
close unless you decide that you've got too many of them (e.g. when you
get a hash conflict).


A valid point - I currently keep a pool of 4000 descriptors open and
cycle them out based on inactivity.  I hadn't seriously considered
just keeping them all open, because I simply wasn't sure how well
things would go with 100,000 files open.  Would my backend storage
keep up... would the kernel mind maintaining 100,000 files open over
NFS?

The majority of the files would simply be idle - I would be keeping
file handles open for no reason.  Pooling lets me substantially drop
the number of opens I require, but I am hesitant to push the pool
size substantially higher.  Can anyone shed light on any issues that
may come up with a massive pool size, such as 128k?
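
For concreteness, a minimal sketch of the pathname-hashed descriptor
cache suggested above; the slot count and the djb2 hash are
illustrative choices, not from the thread:

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SLOTS 4096            /* illustrative size */

struct fd_slot { char *path; int fd; };
static struct fd_slot cache[FD_CACHE_SLOTS];

static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return a cached descriptor for path, opening on a miss and closing
 * only when a hash collision evicts an older entry. */
int cached_open(const char *path, int flags, mode_t mode)
{
    struct fd_slot *slot = &cache[djb2(path) % FD_CACHE_SLOTS];

    if (slot->path && strcmp(slot->path, path) == 0)
        return slot->fd;               /* hit: reuse the open descriptor */

    if (slot->path) {                  /* collision: evict the old entry */
        close(slot->fd);
        free(slot->path);
    }
    slot->fd = open(path, flags, mode);
    slot->path = (slot->fd >= 0) ? strdup(path) : NULL;
    return slot->fd;
}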

-Aaron


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

On 6/4/07, John Stoffel <[EMAIL PROTECTED]> wrote:


So how many files are in the directory where you're seeing the delays?
And what's the average size of the files in there?


The directories themselves will have a maximum of 160 files, and the
files are maybe a few megs each - the delays are (as you pointed out
earlier) due to the RAM restrictions and our filesystem design of
very deep directory structures, which Netapps suck at.

My point is more generic though - I will come up with ways to handle
this problem in my application (probably with threads), but I'm
griping more about the lack of a kernel interface that would have
allowed me to avoid this.

-Aaron


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

Replying to David Schwartz here... (David, good to hear from you
again - haven't seen you around since the IRC days :))

David Schwartz wrote:


There is no way you can re-try the request. The open must either succeed or
not return a handle. It is not like a 'read' operation that has an "I didn't
do anything, and you can retry this request" option.

If 'open' returns a file handle, you can't retry it (since it must succeed
in order to do that, and failure must not return a handle). If your 'open'
doesn't return a file handle, you can't retry it (because, without a handle,
there is no way to associate a future request with this one; if it creates a
file, the file must not be created if you don't call 'open' again).


I understand, but this is exactly the situation that I'm complaining
about.  There is no functionality to provide a nonblocking open - no
ability to come back around and retry a given open call.


You need either threads or a working asynchronous system call interface.
Short of that, you need your own NFS client code.


This is exactly my point - there is no asynchronous system call to do
this work, to my knowledge.  I will likely fix this in my own code
using threads, but I see using threads in this case as working around
the lack of a system interface.  Threads, IMHO, should be limited to
cases where I'm using them to distribute load across multiple
processors, not used because the kernel interfaces for IO cannot
support nonblocking calls.
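
For concreteness, a minimal sketch of that threaded workaround (not
from the thread itself): a worker performs the blocking open() and
hands the descriptor back over a pipe that the main poll()/epoll loop
can watch like any socket.  The path is invented; compile with
-lpthread:

#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int notify[2];                  /* worker writes, event loop reads */

static void *open_worker(void *arg)
{
    /* The blocking open() happens here, off the main thread. */
    int fd = open((const char *)arg, O_WRONLY | O_CREAT, 0644);
    write(notify[1], &fd, sizeof(fd)); /* hand the result to the event loop */
    return NULL;
}

int main(void)
{
    static char path[] = "/tmp/example.log";   /* invented path */
    pthread_t t;

    if (pipe(notify) < 0) return 1;
    pthread_create(&t, NULL, open_worker, path);

    /* The main loop keeps servicing sockets; the pipe end is just one
     * more pollable descriptor. */
    struct pollfd p = { .fd = notify[0], .events = POLLIN };
    poll(&p, 1, -1);

    int fd;
    read(notify[0], &fd, sizeof(fd));
    printf("open completed, fd=%d\n", fd);
    pthread_join(t, NULL);
    return fd >= 0 ? 0 : 1;
}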

I'm speaking to my ideal world view - but any application I write
should not have to wait for the kernel if I don't want it to.  I
should be able to submit my request and come back to it later as I
see fit.

(And I did actually consider writing my own NFS client for about 5 minutes.)

Thanks for the response!
-Aaron


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

On 6/4/07, Alan Cox <[EMAIL PROTECTED]> wrote:


> Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
> call with a nonblocking flag return EAGAIN if it's going to take
> anywhere near 415ms?

Violation of causality. We don't know it will block for 415ms until 415ms
have elapsed.


Understood - but what I'm getting at is that there doesn't appear to
be any real implementation of a nonblocking open().  On the socket
side of the fence, I would consider a regular
file open() to be equivalent to a connect() call - the difference
obviously being that we already have a handle for the socket.

The end result, however, is roughly the same.  We have a file
descriptor with the endpoint established.  In the socket world, we
assume that a nonblocking request will always return immediately and
the application is expected to come back around and see if the request
has completed.  Regular files have no equivalent.
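
For comparison, the socket-world pattern being described, as a
minimal sketch: connect() on a nonblocking socket returns immediately
with EINPROGRESS, and poll() later reports completion.  The address
and port are illustrative:

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(fd, F_SETFL, O_NONBLOCK);

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(80);                       /* illustrative endpoint */
    inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);

    /* The call returns at once; EINPROGRESS means "come back later". */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 &&
        errno != EINPROGRESS) {
        perror("connect");
        return 1;
    }

    struct pollfd p = { .fd = fd, .events = POLLOUT };
    poll(&p, 1, -1);                   /* writable once the handshake settles */

    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);  /* 0 on success */
    printf("connect result: %s\n", err ? strerror(err) : "ok");
    close(fd);
    return err ? 1 : 0;
}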

-Aaron


Re: slow open() calls and o_nonblock

2007-06-04 Thread Aaron Wiebe

On 6/3/07, Neil Brown <[EMAIL PROTECTED]> wrote:


Have you tried the "nocto" mount option for your NFS filesystems?

The cache-coherency rules of NFS require the client to check with the
server at each open.  If you are the sole client on this filesystem,
then you don't need the same cache-coherency, and "nocto" will tell
the NFS client not to bother checking with the server if the
information is available in cache.


No I haven't - I will research this a little further today.  While
we're not the only client using these filesystems, this process is
(currently) the only process that writes to these files.  Thanks for
the suggestion.

-Aaron




Re: slow open() calls and o_nonblock

2007-06-03 Thread Aaron Wiebe

Hi John, thanks for responding.  I'm using kernel 2.6.20 on a
home-grown distro.

I've responded to a few specific points inline - but as a whole,
Davide directed me to work that is being done specifically to address
these issues in the kernel, as well as a userspace implementation that
would allow me to sidestep this failing for the time being.


On 6/3/07, John Stoffel <[EMAIL PROTECTED]> wrote:


How large are these files?  Are they all in a single directory?  How
many files are in the directory?

Ugh. Why don't you just write to a DB instead?  It sounds like you're
writing small records, with one record to a file.  It can work, but
when you're doing thousands per-minute, the open/close overhead is
starting to dominate.  Can you just amortize that overhead across a
bunch of writes instead by writing to a single file which is more
structured for your needs?


In short, I'm distributing logs in realtime for about 600,000
websites.  The sources of the logs (http, ftp, realmedia, etc) are
flexible, however the base framework was built around a large cluster
of webservers.  The output can be to several hundred thousand files
across about two dozen filers for user consumption - some can be very
active, some can be completely inactive.


Netapps usually scream for NFS writes and such, so it sounds to me
that you've blown out the NVRAM cache on the box.  Can you elaborate
more on your hardware & network & Netapp setup?


You're totally correct here - Netapp has told us as much about our
filesystem design; we use too much RAM on the filer itself.  It's
true that the application would handle things just fine if our
filesystem structure were redesigned - but I am approaching this from
an application perspective.  These units are capable of the raw IO;
it's simply that open calls are taking a while.  If I were to thread
off the application (and Davide has been kind enough to provide some
libraries which will make that substantially easier), the problem
wouldn't exist.


The problem is that O_NONBLOCK on files open doesn't make sense.  You
either open it, or you don't.  How long it takes to complete isn't part
of the spec.


You can certainly open the file, but not block on the call to do it.
What confuses me is why the kernel would "block" for 415ms on an open
call.  That's an eternity to suspend a process that has to distribute
data such as this.


But in this case, I think you're doing something hokey with your data
design.  You should be opening just a handful of files and then
streaming your writes to those files.   You'll get much more
performance.


Except I can't very well keep 600,000 files open over NFS.  :)  Pool
and queue, and cycle through the pool.  I've managed to achieve a
balance in my production deployment with this method - my email was
more of a rant after months of trying to work around a problem
(caused by a limitation in system calls), only to have it show up an
order of magnitude worse than I expected.  Sorry for not giving more
information up front - and thanks for your time.

-Aaron


slow open() calls and o_nonblock

2007-06-03 Thread Aaron Wiebe

Greetings all.  I'm not on this list, so I apologize if this subject
has been covered before.  (Also, please cc me in the response.)

I've spent the last several months trying to work around the lack of a
decent disk AIO interface.  I'm starting to wonder if one exists
anywhere.  The short version:

I have written a daemon that needs to open several thousand files a
minute and write a small amount of data to each file.  After extensive
research, I ended up going with the kludgy POSIX AIO pthreads wrapper
in glibc to handle my writes, due to the time constraints of writing
my own pthreads handling into the application.

The problem with this equation is that opens, closes and
non-read/write operations (fchmod, fcntl, etc.) have no interface in
POSIX AIO.  Now I was under the assumption that, given open and close
operations are comparatively less common than the write operations,
this wouldn't be a huge problem.  My tests seemed to reflect that.

I went to production with this yesterday to discover that under
production load, our filesystems (NFS on Netapps) were substantially
slower than I was expecting.  open() calls are taking upwards of 2
seconds on occasion, and usually ~20ms.

Now, Netapp speed aside, O_NONBLOCK and O_DIRECT seem to make zero
difference to my open times.  Example:

open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>

Now, I'm a userspace guy so I can be pretty dense, but shouldn't a
call with a nonblocking flag return EAGAIN if it's going to take
anywhere near 415ms?  Is there a way I can force opens to EAGAIN if
they take more than 10ms?
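
A trivial sketch of measuring that same call from userspace, using
the placeholder path from the strace above; over NFS the elapsed time
can indeed reach hundreds of milliseconds with or without O_NONBLOCK:

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    int fd = open("/somefile", O_WRONLY | O_NONBLOCK | O_CREAT, 0644);
    gettimeofday(&t1, NULL);

    /* Wall-clock time spent inside the single open() call. */
    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                (t1.tv_usec - t0.tv_usec) / 1000.0;
    printf("open() returned fd %d after %.3f ms\n", fd, ms);

    if (fd >= 0)
        close(fd);
    return 0;
}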

Thanks for any help you folks can offer.

-Aaron Wiebe

(PS: having come from the socket side of the fence, it's incredibly
frustrating to be unable to poll() or epoll regular file FDs --
especially knowing that the kernel is translating them into a TCP
socket to do NFS anyway.  Please add regular files to epoll and give
me a way to do the opens in the same fashion as connects!)


Fwd: uninterruptable fcntl calls

2007-02-02 Thread Aaron Wiebe

Greetings,

I've run into a situation where fcntl F_SETLKW calls hang, nearly
uninterruptibly.  I've tried several approaches to handle this case,
and have yet to come up with a method that works.  I've never really
ventured outside userspace, so I'm turning to this list to try and
get a handle on it.

Over NFSv3 on UDP, this situation arises VERY rarely; however, with
the volume I do, it's creating a problem.

In short, I am attempting to take a read or write lock, and the call
hangs to the point where a SIGKILL is not captured - no signal is.
I've tried alarming out and I've tried switching the descriptor to
nonblocking - nothing I can think of prevents the hang or even lets
me handle it.  I understand NFS locking can be rather sketchy at
times - but all I need is the ability to handle the case.

I can force the process to die by sending a SIGKILL and then stracing
it.  The strace reports the process as stopped (SIGSTOP), and then
the kill signal is processed.

All I need here is a method of capturing this case.  I can "repair"
the stuck lock by regenerating the file, but I can't capture the case
in order to handle this in code.

Any help would be useful - I am currently running 2.6.15.6 compiled
with the NFS patches from linux-nfs.org, but this case was happening
before applying those patches.  I'd be happy to provide any more
information necessary.  I've been struggling with this one for a few
months now.

Thanks,
-Aaron


Straces:

rt_sigaction(SIGALRM, {0xb7f56640, [ALRM], 0}, {SIG_DFL}, 8) = 0
alarm(120)  = 0
fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}
[hangs]

Or:

fcntl64(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}



Code used for locking:

#include <fcntl.h>
#include <time.h>
#include <unistd.h>

static int db_lock(int fd, int type)
{
    struct flock fl;
    struct timespec tv;
    int ret, c = 0;

    if (fd <= 0)
        return -1;

#ifdef SIGALRM_HACK
    /* after two minutes, wig out */
    sigalrm_set();
    alarm(120);
#endif

    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    fl.l_type = type;

#ifdef NONBLOCKING_HACK
    set_nonblocking(fd);
#endif

    while ((ret = fcntl(fd, F_SETLKW, &fl)) < 0)
    {
        c++;
        if (c > 600)
        {
            /* we've been waiting for 60 seconds... */
            my_error("stuck on fcntl request, aborting");
            return -1;
        }
        tv.tv_nsec = 100000000;   /* 10th of a second wait */
        tv.tv_sec = 0;
        nanosleep(&tv, NULL);
    }
#ifdef SIGALRM_HACK
    sigalrm_unset();
#endif
#ifdef NONBLOCKING_HACK
    unset_nonblocking(fd);
#endif
    return ret;
}
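
Not from the thread, but for comparison: F_SETLK is the non-waiting
variant of F_SETLKW.  It fails immediately with EACCES or EAGAIN
while the lock is held, which keeps the wait (and any timeout) in
userspace instead of parking the process uninterruptibly in the
kernel.  A minimal sketch:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int db_trylock(int fd, int type)
{
    /* l_start and l_len default to 0: lock the whole file, as above. */
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET };

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;                      /* lock acquired */
    if (errno == EACCES || errno == EAGAIN)
        return 1;                      /* held elsewhere - retry or time out in userspace */
    return -1;                         /* real error */
}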