[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #16 from Jonathan Wakely --- Author: redi Date: Wed Jun 26 15:12:15 2019 New Revision: 272697 URL: https://gcc.gnu.org/viewcvs?rev=272697&root=gcc&view=rev Log: PR libstdc++/90634 reduce allocations in filesystem::path construction Backport from mainline 2019-05-28 Jonathan Wakely PR libstdc++/90634 * include/experimental/bits/fs_path.h (path::path(path&&)): Only call _M_split_cmpts() for a path with multiple components. (path::_S_is_dir_sep()): Add missing 'static' keyword to function. * src/filesystem/path.cc (path::_M_split_cmpts()): Count number of components and reserve space in vector. Return early when there is only one component. * testsuite/experimental/filesystem/path/construct/90634.cc: New test. Added: branches/gcc-7-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc Modified: branches/gcc-7-branch/libstdc++-v3/ChangeLog branches/gcc-7-branch/libstdc++-v3/include/experimental/bits/fs_path.h branches/gcc-7-branch/libstdc++-v3/src/filesystem/path.cc
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #15 from baltic <1000hz.radiowave at gmail dot com> --- (In reply to Jonathan Wakely from comment #14) > It's true that the standard does not require path::iterator to be usable in > generic algorithms that expect forward iterators or bidirectional iterators. Indeed! Ability to run a modifying algorithm on a path's elements is extremely useful. Sorting path's components makes a lot of practical sense. So does std::random_shuffle them. And ability to heapify them (std::make_heap) is basically unavoidable in any app, which has to deal with filesystem::path. But what weirdo would want to store hundreds of thousands of paths without x10 memory overhead, right? So i agree. Just screw the C++ standard and do what you think is best for ppl out there. Thanks for your great work!
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #14 from Jonathan Wakely --- (In reply to baltic from comment #9) > (In reply to Jonathan Wakely from comment #8) > > > The libc++ implementation also fails this test: > > As i've shown before, neither of those are failures. By the current C++ > standard at least. It's true that the standard does not require path::iterator to be usable in generic algorithms that expect forward iterators or bidirectional iterators. I think the standard is wrong. > So, long story short: "I am not going to fix the x10 overhead, because I > believe the standard is wrong." Yes. The vector reallocations during path construction are gone for gcc 8.4 (and also in experimental::filesystem::path in all branches except gcc-7-branch).
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #13 from Jonathan Wakely --- Author: redi Date: Tue May 28 20:48:31 2019 New Revision: 271719 URL: https://gcc.gnu.org/viewcvs?rev=271719&root=gcc&view=rev Log: PR libstdc++/90634 reduce allocations in filesystem::path construction Backport from mainline 2019-05-28 Jonathan Wakely PR libstdc++/90634 * include/experimental/bits/fs_path.h (path::path(path&&)): Only call _M_split_cmpts() for a path with multiple components. (path::_S_is_dir_sep()): Add missing 'static' keyword to function. * src/filesystem/path.cc (path::_M_split_cmpts()): Count number of components and reserve space in vector. Return early when there is only one component. * testsuite/27_io/filesystem/path/construct/90634.cc: New test. * testsuite/experimental/filesystem/path/construct/90634.cc: New test. Added: branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc branches/gcc-9-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc Modified: branches/gcc-9-branch/libstdc++-v3/ChangeLog branches/gcc-9-branch/libstdc++-v3/include/experimental/bits/fs_path.h branches/gcc-9-branch/libstdc++-v3/src/filesystem/path.cc
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #12 from Jonathan Wakely --- Author: redi Date: Tue May 28 19:39:48 2019 New Revision: 271717 URL: https://gcc.gnu.org/viewcvs?rev=271717&root=gcc&view=rev Log: PR libstdc++/90634 reduce allocations in filesystem::path construction PR libstdc++/90634 * include/experimental/bits/fs_path.h (path::path(path&&)): Only call _M_split_cmpts() for a path with multiple components. (path::_S_is_dir_sep()): Add missing 'static' keyword to function. * src/filesystem/path.cc (path::_M_split_cmpts()): Count number of components and reserve space in vector. Return early when there is only one component. * testsuite/27_io/filesystem/path/construct/90634.cc: New test. * testsuite/experimental/filesystem/path/construct/90634.cc: New test. Added: trunk/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc trunk/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/experimental/bits/fs_path.h trunk/libstdc++-v3/src/filesystem/path.cc
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #11 from Jonathan Wakely --- Author: redi Date: Tue May 28 16:08:51 2019 New Revision: 271712 URL: https://gcc.gnu.org/viewcvs?rev=271712&root=gcc&view=rev Log: Fix check for root-directory path and add tests PR libstdc++/90634 * src/filesystem/path.cc (path::_M_split_cmpts()): Fix check for "/". * testsuite/27_io/filesystem/path/construct/90634.cc: New test. * testsuite/experimental/filesystem/path/construct/90634.cc: New test. Added: branches/gcc-8-branch/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc branches/gcc-8-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc Modified: branches/gcc-8-branch/libstdc++-v3/ChangeLog branches/gcc-8-branch/libstdc++-v3/src/filesystem/path.cc
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #10 from Jonathan Wakely --- Author: redi Date: Tue May 28 14:57:35 2019 New Revision: 271710 URL: https://gcc.gnu.org/viewcvs?rev=271710&root=gcc&view=rev Log: PR libstdc++/90634 reduce allocations in filesystem::path construction PR libstdc++/90634 * include/bits/fs_path.h (path::path(path&&)): Only call _M_split_cmpts() for a path with multiple components. (path::_S_is_dir_sep()): Add missing 'static' keyword to function. * include/experimental/bits/fs_path.h: Likewise. * src/filesystem/path.cc (path::_M_split_cmpts()): Count number of components and reserve space in vector. Return early when there is only one component. * src/filesystem/std-path.cc (path::_M_split_cmpts()): Likewise. Modified: branches/gcc-8-branch/libstdc++-v3/ChangeLog branches/gcc-8-branch/libstdc++-v3/include/bits/fs_path.h branches/gcc-8-branch/libstdc++-v3/include/experimental/bits/fs_path.h branches/gcc-8-branch/libstdc++-v3/src/filesystem/path.cc branches/gcc-8-branch/libstdc++-v3/src/filesystem/std-path.cc
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #9 from baltic <1000hz.radiowave at gmail dot com> --- (In reply to Jonathan Wakely from comment #8) > The libc++ implementation also fails this test: As i've shown before, neither of those are failures. By the current C++ standard at least. So, long story short: "I am not going to fix the x10 overhead, because I believe the standard is wrong." Ok. But damn! how good clang looks on the same test: about to quit. total allocated 0 bytes
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #8 from Jonathan Wakely --- Yes, I'm well aware what the standard says, I reported https://wg21.link/lwg2674 I consider it a mistake for the standard to say "these are bidirectional iterators, except they're not, but it's OK because at some point in the future a subset of algorithms won't care about that requirement any more". GCC's implementation provides real bidirectional iterators that meet the C++17 requirements and the C++2a BidirectionalIterator requirement that: "Pointers and references obtained from a forward iterator into a range [i, s) shall remain valid while [i, s) continues to denote a range." The libc++ implementation also fails this test: https://wandbox.org/permlink/RwiFphMn1Hh2iis4 It's OK though, because you don't pay for what you don't use! The design trade off was discussed in Bug 71044 comment 6. Populating the range lazily would still be possible (as long as it was thread-safe) but I'm not convinced it would be an improvement. In theory the range could also be shared by copies of the same path (again, as long as it was thread-safe). Those optimizations can be considered later, but for now the implementation aims to be correct and not produce dangling pointers. As I said, I'll reduce the number of allocations done by the GCC 8.x implementation, but I'm not planning on any other changes for now.
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #7 from baltic <1000hz.radiowave at gmail dot com> --- (In reply to Jonathan Wakely from comment #5) > So don't store them as filesystem::path objects then, store them as strings > and create a path as needed. Type system is here for a reason. And it has advantages to type something, which is naturally a path, as such, not as an arbitrary string. > GCC's implementation creates the path components eagerly, so that > path::iterator meets the requirements of a forward iterator, whereas the > libc++ implementation creates them lazily during iteration, and so is not a > valid forward iterator. It doesn't have to be. See standard 30.11.7.5: A path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (27.2.6) except that, for dereferenceable iterators a and b of type path::iterator with a == b, there is no requirement that *a and *b are bound to the same object. Its value_type is path. > The implementations have different trade-offs. That is not a bug. Poorly considered trade-offs are called "bad design" ;) Besides, your implementation breaks the fundamental C++ design principle "you don't pay for what you don't use". now everyone has to pay x10 memory overhead even if they are not ever going to iterate over a path object. I agree, that is not a bug, in C++ world, its a pure crime ;)
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 Jonathan Wakely changed: What|Removed |Added Target Milestone|9.0 |8.4 --- Comment #6 from Jonathan Wakely --- (In reply to Jonathan Wakely from comment #5) > The correct results with 9.1.0 are: > > allocating 21 bytes > allocating 248 bytes > about to quit. total allocated 269 bytes > freed 248 bytes > freed 21 bytes Proof: https://wandbox.org/permlink/zJVe39jAZtL1p6je
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #5 from Jonathan Wakely --- (In reply to baltic from comment #3) > (In reply to Jonathan Wakely from comment #2) > > > I'm not sure where the repeated 72 allocations come from, and can't check > > the code right now, but the new code doesn't do it anyway. > > It comes from std::string creation, which then is passed to path constructor. No it doesn't, that's the 21 bytes. With GCC 9.x the 72 bytes is sizeof(path), which comes from allocating a path. It comes from each component of the path creating a vector during parsing, and then if there's only one component it empties the vector again. Those transient allocations could be avoided, so I'll change that for GCC 8.4.0, but there's nothing to do for GCC 9.x as it's already avoiding unnecessary vector operations. > > I think this can be considered fixed. > > hardly. the gcc 9.1 is even worse in that regard, as it allocates 2kB of > memory for the same case > see the line: > > about to quit. total allocated 2131 bytes > > in here: > https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34 You should try using a real compiler and not only rely on wandbox. Maybe there's something wrong with the wandbox compiler, but that result only happens when using Boost (so you should take it up with Boost instead). The correct results with 9.1.0 are: allocating 21 bytes allocating 248 bytes about to quit. total allocated 269 bytes freed 248 bytes freed 21 bytes (In reply to baltic from comment #4) > besides the 269 bytes you have mentioned, is still x10 overhead for a 20 > chars string. and massively adds up, if you store a lot objects. The overhead is linear in the number of components in the path, not the string length. If you have a path with a single filename and no directories then the overhead is nothing like 10x. > for example, when i go around home folder on my machine and save all the > found paths to vector, it takes 2.7GB, while should take ~150MB!! quite an > overhead on a simple task, for a language which strives for efficiency. So don't store them as filesystem::path objects then, store them as strings and create a path as needed. > check out clang, for example: > https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34 > it's smart enough to allocate such short strings inplace: > about to quit. total allocated 0 bytes GCC will also create short strings in place, but with a different limit for what is considered "short". GCC's implementation creates the path components eagerly, so that path::iterator meets the requirements of a forward iterator, whereas the libc++ implementation creates them lazily during iteration, and so is not a valid forward iterator. This fails with libc++: https://wandbox.org/permlink/eas3ZqA2CojecrQJ The implementations have different trade-offs. That is not a bug.
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #4 from baltic <1000hz.radiowave at gmail dot com> --- besides the 269 bytes you have mentioned, is still x10 overhead for a 20 chars string. and massively adds up, if you store a lot objects. for example, when i go around home folder on my machine and save all the found paths to vector, it takes 2.7GB, while should take ~150MB!! quite an overhead on a simple task, for a language which strives for efficiency. check out clang, for example: https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34 it's smart enough to allocate such short strings inplace: about to quit. total allocated 0 bytes
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 baltic <1000hz.radiowave at gmail dot com> changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|FIXED |--- --- Comment #3 from baltic <1000hz.radiowave at gmail dot com> --- (In reply to Jonathan Wakely from comment #2) > I'm not sure where the repeated 72 allocations come from, and can't check > the code right now, but the new code doesn't do it anyway. It comes from std::string creation, which then is passed to path constructor. > I think this can be considered fixed. hardly. the gcc 9.1 is even worse in that regard, as it allocates 2kB of memory for the same case see the line: about to quit. total allocated 2131 bytes in here: https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED Target Milestone|--- |9.0 --- Comment #2 from Jonathan Wakely --- The 813 bytes seen with GCC 8.x is due to reallocations within std::vector, as the sequence of path objects in [begin(),end()) is populated. The new code counts the number of components first, so there's no need to keep growing as the path is parsed. The sequence no longer uses std::vector at all, which also reduces sizeof(path). I'm not sure where the repeated 72 allocations come from, and can't check the code right now, but the new code doesn't do it anyway. I think this can be considered fixed.
[Bug libstdc++/90634] filesystem::path insane memory allocations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634 --- Comment #1 from Jonathan Wakely --- The path type was rewritten for GCC 9, and now prints: allocating 21 bytes allocating 248 bytes about to quit. total allocated 269 bytes freed 248 bytes freed 21 bytes The 21 bytes is for the native() string, and 248 bytes is the sequence of path objects in the range [begin(),end()).