[Bug libstdc++/45574] cin.getline() is extremely slow
--- Comment #24 from tstarling at wikimedia dot org 2010-09-10 15:25 ---
Created an attachment (id=21766)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21766&action=view)
dynamic_cast hack

The attached patch uses a dynamic_cast hack to avoid the need to break the ABI.

I added *_unlocked functions to cstdio. I'm not sure if this is necessary, but it's easy enough to remove that part if not. I also added some lightly-tested autoconf stuff. I'm an autoconf newbie, so that part should probably be reviewed carefully.

stdio_sync_filebuf<wchar_t>::_M_getline() is currently unreachable, since I only edited basic_istream<char>::getline() and not basic_istream<wchar_t>::getline(). It would be easy enough to fix that. I haven't used getwc_unlocked() because it's a GNU extension; POSIX only has non-wide unlocked I/O.

The timings for 1M lines with 500 bytes per line, user time only, are:

Old library: 26.7s
New library: 1.65s
fgets:       0.280s

So it's better, but not perfect.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] cin.getline() is extremely slow
--- Comment #18 from tstarling at wikimedia dot org 2010-09-09 14:12 ---
Created an attachment (id=21752)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21752&action=view)
gprof output

I haven't managed to get libstdc++ to compile with -pg, but compiling the test program with -static at least gives you a function breakdown. gprof output attached for 1 million lines, 500 bytes per line. To summarise:

fgetc:                         36.13%
istream::getline:              18.01%
ungetc:                        16.70%
_IO_sputbackc:                  9.54%
stdio_sync_filebuf::underflow:  5.66%
stdio_sync_filebuf::uflow:      4.93%

I should have spotted it from reading the code: it's not a loop of getc(), it's a loop of ungetc(getc()) getc().

It really demonstrates how poorly suited the streambuf interface is to unbuffered input. The virtual functions called by istream::getline() don't give much flexibility. So I still have no other ideas apart from breaking the ABI.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
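To make the ungetc(getc()) getc() pattern from comment #18 concrete, here is a minimal sketch of an unbuffered streambuf over a FILE* that counts its stdio calls. It is an illustration written for this note, not the libstdc++ source: CountingStdioBuf and drain() are hypothetical names, and the assumption is that peeking is done with getc() followed by ungetc(), as the profile suggests.

```cpp
#include <cstdio>
#include <streambuf>

// Hypothetical unbuffered streambuf; it never sets up a get area.
class CountingStdioBuf : public std::streambuf {
public:
    explicit CountingStdioBuf(std::FILE* fp) : fp_(fp) {}
    long getc_calls = 0;
    long ungetc_calls = 0;

protected:
    // Peek: read one character and push it straight back.
    int_type underflow() override {
        ++getc_calls;
        int c = std::getc(fp_);
        if (c == EOF) return traits_type::eof();
        ++ungetc_calls;
        std::ungetc(c, fp_);
        return traits_type::to_int_type(static_cast<char>(c));
    }

    // Consume: read one character and keep it.
    int_type uflow() override {
        ++getc_calls;
        int c = std::getc(fp_);
        return c == EOF ? traits_type::eof()
                        : traits_type::to_int_type(static_cast<char>(c));
    }

private:
    std::FILE* fp_;
};

// Walk the stream the way a getline-style loop does: sgetc() once, then snextc().
long drain(std::streambuf& sb) {
    long n = 0;
    const auto eof = std::streambuf::traits_type::eof();
    for (auto c = sb.sgetc(); c != eof; c = sb.snextc()) ++n;
    return n;
}
```

Because snextc() is sbumpc() followed by sgetc(), each character after the first costs one uflow() and one underflow(), that is, two getc() calls and one ungetc() call, which is in line with the fgetc and ungetc shares in the profile.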
[Bug libstdc++/45574] cin.getline() is extremely slow
--- Comment #19 from tstarling at wikimedia dot org 2010-09-09 14:28 ---
(In reply to comment #16)
> The *_unlocked versions are faster a lot actually, at least for the one
> character ops, because no locking is performed and the calls are inlined.
> But the question is whether libstdc++ can use them, unless there is some
> restriction that would disallow several threads from using the same FILE *
> (including using STL APIs in one thread and C stdio APIs in another thread).

My current idea is to do:

    flockfile(stdin);
    while (!eof) {
        c = getc_unlocked(stdin);
        ...
    }
    funlockfile(stdin);

This is not only much faster, it's an improvement to the current behaviour in terms of locking and thread safety. The current behaviour, as I said in comment #4, could cause data to be badly mangled if one thread uses stdio while another uses cin.getline(). Using getc() in preference to getc_unlocked() does not help.

And unlike getdelim(), the unlocked I/O functions are in POSIX.1-2001, says the man page, so it's relatively portable.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] cin.getline() is extremely slow
--- Comment #23 from tstarling at wikimedia dot org 2010-09-10 00:17 ---
(In reply to comment #21)
> Anyway, not sure which STL getline we are talking about here, because e.g.
> src/istream.cc getline seems to access the stdio buffer directly:
>
>   streamsize __size = std::min(streamsize(__sb->egptr() - __sb->gptr()),
>                                streamsize(__n - _M_gcount - 1));

__sb->gptr() and __sb->egptr() are always null for this kind of streambuf, so __size is always zero, and so the loop just calls snextc() on every iteration.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
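The point in comment #23 can be checked with a tiny streambuf that, like the synced stdio buffer as described there, never installs a get area. NoGetAreaBuf and direct_chunk() are hypothetical names used only for this sketch; the expression inside direct_chunk() mirrors the shape of the quoted line, with the second argument simplified to a plain count.

```cpp
#include <algorithm>
#include <ios>
#include <streambuf>

// Hypothetical streambuf that never calls setg(), so gptr() == egptr() == nullptr.
class NoGetAreaBuf : public std::streambuf {
public:
    // Bytes that could be copied directly from the get area, capped at `wanted`.
    std::streamsize direct_chunk(std::streamsize wanted) const {
        return std::min(std::streamsize(egptr() - gptr()), wanted);
    }

    bool get_area_is_null() const {
        return gptr() == nullptr && egptr() == nullptr;
    }
};
```

With no get area the bulk-copy size is always zero, so a getline loop written this way degrades to one virtual call sequence per character.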
[Bug libstdc++/45574] cin.getline() is extremely slow
--- Comment #14 from tstarling at wikimedia dot org 2010-09-09 02:31 ---
(In reply to comment #11)
> So? We are not changing glibc here. The C++ library does *not* use buffering
> in the synced mode, and it does otherwise, for fstreams in particular. Where
> do you think the performance difference is essentially coming from?

Sure, buffering would help, because the interface between the C++ library and the buffer in the C library is slow. I just meant that the lack of a buffer in C++ isn't an excuse for slowness since it should theoretically be possible for C++ to access the buffer in the C library without much overhead.

At another level, your question is unsolved and interesting, because while(getc(stdin)!=EOF); is much faster than cin.getline(), taking only 0.632s of user time for the attached test case. And a loop of getc_unlocked() only takes 0.188s of user time. So there may be opportunity for optimisation here without resorting to fgets() or getdelim(), which as you say, suck in various ways.

I'll see if I have time for some more testing. If I wrote a patch involving a new virtual method or two, would it be looked at?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] ifstream::getline() is extremely slow
--- Comment #2 from tstarling at wikimedia dot org 2010-09-07 10:46 ---
(In reply to comment #1)
> If the problem is in the stdio sync code, then file a glibc PR.

I mean the stdio sync code as in the code in libstdc++ which synchronises with glibc, not actual code within glibc. If there was a problem with glibc, glibc would be slow, but it isn't.

--
tstarling at wikimedia dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] ifstream::getline() is extremely slow
--- Comment #4 from tstarling at wikimedia dot org 2010-09-07 17:18 ---
Benchmarking on Solaris indicates that cin.getline() takes only 1us per iteration there, but I don't think the source code is available, so it's hard to provide details.

However, I think that a huge speedup could be achieved by making basic_istream<char>::getline() into a simple wrapper around a GNU-specific virtual function in basic_streambuf. This would allow it to be specialised in stdio_sync_filebuf, where it could be implemented using fgets() or getdelim() instead of getc().

This would have the additional positive impact of making it atomic. Currently, cin.getline() does not properly lock the underlying libc stream with flockfile(). This means that if one thread is calling cin.getline(), and another thread is calling getc(), then cin.getline() may return mangled partial lines due to interleaved calls to getc() from the other thread.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
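For readers unfamiliar with getdelim(), which comment #4 proposes as a building block, the following sketch reads one delimited record per call from a FILE*. It only illustrates the C library call; it is not a proposed stdio_sync_filebuf implementation. read_record() is a hypothetical name, and returning the record as a std::string is a choice made for this example.

```cpp
#include <stdio.h>      // getdelim (POSIX.1-2008)
#include <stdlib.h>     // free
#include <sys/types.h>  // ssize_t
#include <string>

// Hypothetical helper: reads one record and strips the trailing delimiter.
// Returns false when nothing could be read (end of file or error).
bool read_record(FILE* fp, std::string& out, char delim = '\n') {
    char* line = nullptr;  // getdelim allocates and grows this buffer
    size_t cap = 0;
    ssize_t len = getdelim(&line, &cap, delim, fp);
    if (len < 0) {
        free(line);
        return false;
    }
    if (len > 0 && line[len - 1] == delim) --len;  // drop the delimiter
    out.assign(line, static_cast<size_t>(len));
    free(line);
    return true;
}
```

The whole record is fetched by a single library call, so the per-character call overhead described in the report does not arise.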
[Bug libstdc++/45574] ifstream::getline() is extremely slow
--- Comment #8 from tstarling at wikimedia dot org 2010-09-08 01:34 ---
(In reply to comment #5)
> For sure we cannot add virtual functions to basic_streambuf without breaking
> the ABI.

I'm mostly looking for a long-term fix, to improve the speed of libstdc++ applications generally, especially those that don't have developers who would go to the trouble to track down the source of slowness in their programs. The short-term fix is to call ios::sync_with_stdio(false). So it's fine for me to wait for the next major version.

> Also, getline certainly isn't just fgets, takes a delim char, uses traits,
> etc.

The delim char can be taken care of with getdelim(). I don't think it's unreasonable to specialise for default traits; that would take care of 99% of use cases.

> Sure, anyway, in principle you can often speed-up special cases, but also
> given that in ~5-7 years nobody else reported anything about the performance
> of the synced getline, I don't think anything is going to happen anytime
> soon, I could keep this open, but it would be futile, we have a lot of work
> to do, for C++0x, in particular.

OK, let's keep it open.

(In reply to comment #6)
> By the way, I don't know anything about your testcase (it would be a good
> idea attaching something here, just in case), but on my machines, i7 mostly,
> I don't see anything similar to your performance gap, I see something more
> similar to 9-10x, which, considering that a real synced mode must be
> unbuffered, seems completely reasonable to me.

Probably the main difference is the number of bytes per line in the input file. I'm using a file with 1M lines and an average of 429 bytes per line. Using fewer bytes per line would put more pressure on the constant per-line overhead, and less on the inner loop.

But a 9-10x difference doesn't sound reasonable to me. The synced mode is not unbuffered, before or after my suggested change; it uses the internal buffer in glibc.

(In reply to comment #7)
> It's well known (though maybe not well enough) that you should use
> sync_with_stdio(false) to get good performance, unless you specifically need
> the synchronisation.

Maybe you should tell that to Paolo Carlini, who closed bug 15002 as resolved fixed in 2004, or to Loren Rittle, who closed bug 5001 as resolved fixed in 2003, declaring "This issue was addressed by gcc 3.2.X such that sync_with_stdio was no longer required for reasonable performance."

--
tstarling at wikimedia dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] ifstream::getline() is extremely slow
--- Comment #9 from tstarling at wikimedia dot org 2010-09-08 02:36 ---
Created an attachment (id=21732)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21732&action=view)
10 lines, 500 bytes per line

Test file attached as requested, compressed with gzip. Test code follows.

getline-test.cpp:

    #include <iostream>

    int main(int argc, char** argv) {
        char buffer[65536];
        while (std::cin.getline(buffer, sizeof(buffer), '\n'));
        return 0;
    }

fgets-test.cpp:

    #include <stdio.h>

    int main(int argc, char** argv) {
        char buffer[65536];
        while (fgets(buffer, sizeof(buffer), stdin));
        return 0;
    }

$ time ./fgets-test < 500x100k.txt

real    0m0.076s
user    0m0.040s
sys     0m0.032s

$ time ./getline-test < 500x100k.txt

real    0m2.727s
user    0m2.672s
sys     0m0.028s

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574
[Bug libstdc++/45574] New: ifstream::getline() is extremely slow
In libstdc++ 4.4.3-4ubuntu5, getline() is extremely slow, taking around 23us of user time per line on an Intel T9300 processor. By contrast, fgets() takes about 0.3us per line.

Calling ios::sync_with_stdio(false) before the loop starts reduces the time per line to around 0.3us, on par with fgets(). This suggests that the problem is with the stdio synchronisation code.

--
           Summary: ifstream::getline() is extremely slow
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tstarling at wikimedia dot org
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45574