Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
On Thu, 29 Feb 2024 at 09:20, Philip Lorenz wrote:
> > One possibility is teaching it to mass-import pre-computed entries
> > into its index, so that sweeping file tree scans with archive
> > extractions can be avoided altogether. Or doing incremental index
> > imports directly from do_package.
> Producing this data is exactly what this RFC is about. Using the
> extracted build ID information to optimize the import into debuginfod is
> one of the possible use cases but I'd also suggest to keep the extracted
> data agnostic of any concrete tooling (e.g. pkgdata).

This is fair enough. But you need to think upfront about how producing this data should be tested with just oe-core/poky and what use cases it could have. Simple sanity check is ok, but improving debuginfod to import the pre-computed values is much better. Maybe something else too?

This also allows you to develop and publish the alternative service on its own schedule and terms, if the code is not mature or BMW legal is having a hard time signing off on making it public etc. We don't need to see it, if there's a use case in core.

Alex

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#196421): https://lists.openembedded.org/g/openembedded-core/message/196421
Mute This Topic: https://lists.openembedded.org/mt/104619206/21656
Group Owner: openembedded-core+ow...@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
Hi Alex,

On 28.02.24 18:40, Alexander Kanavin wrote:
> On Wed, 28 Feb 2024 at 16:41, Philip Lorenz wrote:
>>> I'm assuming this data wouldn't be that large or that expensive to compute so I'd prefer not to hide it behind extra configuration options if we can help it. That does depend on the overheads/costs though.
>>
>> I just executed build ID extraction on the debug packages of our medium sized kirkstone based distro (see my reply to Alex for more details). Sequentially extracting build IDs from around 8000 files took around 1:30 minutes on my machine. While I wouldn't call this excessive, I am also not sure whether this is too much overhead given that I only expect this data to be used in some deployments.
>
> I have to object to the numbers because they were done with a sequential shell loop. Debuginfod does it in threads and is able to complete the scans much faster. So you need to check how quickly it completes its job when started with oe-debuginfod rather. There might be an improvement coming from what you are proposing, but it's most likely not going to be as drastic.

I think there's some misunderstanding that I'd like to sort out first: This is in no way about deprecating or not using debuginfod. It is, however, an optimization of how build IDs are extracted, which can be used by a variety of tools (such as debuginfod). As such, a sequential scan should give a rough idea of how much time it takes to extract the build IDs during do_package (wall clock time is bound to differ). Based on this we can see that it's not free but also not extremely expensive, although I'd like to leave the judgement call on whether this is something that should be enabled on all builds to someone else.

> There's also something else I noticed just now: there seems to be an alternative implementation of debuginfod you want to introduce? Why? If the original from elfutils isn't working well enough, shouldn't we make it better?

Let me try to give you some insight into how we are planning to use it; I hope this clarifies things:

In our case we are dealing with hundreds of bitbake builds whose artifacts (including package feeds) are published to some storage accessible via HTTP. We would now like to offer a service that gives developers access to the debug files in a seamless way (i.e. we want to eliminate the process of manually having to download the debug packages matching a particular build).

To accomplish this, our setup is based around a lightweight "gateway" daemon that translates a debuginfo HTTP request into a fetch of the corresponding package from the matching repository, extracting the debug symbol file and then serving that to the requesting client.

This is quite different to the way debuginfod works (which seems to be built around the idea of having the debug symbol files readily available via the file system) and I also see advantages in that approach when one has a fairly static set of debug symbol files one wants to serve. There are also some other non-functional requirements that would make deployment of debuginfod in our case quite difficult.

This is in no way meant to be a fully fledged debuginfod reimplementation but a simple gateway between the debuginfod protocol and a backing package repository. I am not sure whether such an extension is in scope of the elfutils package.

> One possibility is teaching it to mass-import pre-computed entries into its index, so that sweeping file tree scans with archive extractions can be avoided altogether. Or doing incremental index imports directly from do_package.

Producing this data is exactly what this RFC is about. Using the extracted build ID information to optimize the import into debuginfod is one of the possible use cases but I'd also suggest to keep the extracted data agnostic of any concrete tooling (e.g. pkgdata).
Br,
Philip

--
Philip Lorenz
BMW Car IT GmbH, Software-Plattform, -Integration Connected Company,
Lise-Meitner-Straße 14, 89081 Ulm
-
BMW Car IT GmbH
Management: Chris Brandt and Michael Böttrich
Domicile and Court of Registry: München HRB 134810
-

View/Reply Online (#196419): https://lists.openembedded.org/g/openembedded-core/message/196419
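
To illustrate the gateway idea described in this message, here is a minimal Python sketch of the lookup step only. It assumes the debuginfod HTTP path layout `/buildid/<BUILDID>/debuginfo`. The `DebugEntry` type, the `resolve` function and the shape of the index (build ID mapped to a package and a file path inside it) are hypothetical names chosen for illustration; they are not taken from the prototype discussed in the thread. Fetching the package from the feed and extracting the file are left out.

```python
import re
from typing import Dict, NamedTuple, Optional

# debuginfod clients request debug files at /buildid/<hex build ID>/debuginfo.
_REQUEST_RE = re.compile(r"^/buildid/([0-9a-f]+)/debuginfo$")


class DebugEntry(NamedTuple):
    """Hypothetical index record: where the debug file for a build ID lives."""
    package: str  # package file name inside the feed (assumed layout)
    path: str     # path of the debug file inside that package (assumed layout)


def resolve(request_path: str, index: Dict[str, DebugEntry]) -> Optional[DebugEntry]:
    """Map a debuginfod request path to an index entry, or None if unknown."""
    match = _REQUEST_RE.match(request_path)
    if match is None:
        return None  # not a debuginfo request this sketch understands
    return index.get(match.group(1))
```

A real gateway would go on to fetch `entry.package` from the repository that matches the build and stream `entry.path` back to the client; how the repository is selected is not specified in the thread.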
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
On Wed, 28 Feb 2024 at 16:41, Philip Lorenz wrote:
> > I'm assuming this data wouldn't be that large or that expensive to
> > compute so I'd prefer not to hide it behind extra configuration options
> > if we can help it. That does depend on the overheads/costs though.
> >
> I just executed build ID extraction on the debug packages of our medium
> sized kirkstone based distro (see my reply to Alex for more details).
> Sequentially extracting build IDs from around 8000 files took around
> 1:30 minutes on my machine. While I wouldn't call this excessive, I am
> also not sure whether this is too much overhead given that I only expect
> this data to be used in some deployments.

I have to object to the numbers because they were done with a sequential shell loop. Debuginfod does it in threads and is able to complete the scans much faster. So you need to check how quickly it completes its job when started with oe-debuginfod rather. There might be an improvement coming from what you are proposing, but it's most likely not going to be as drastic.

From debuginfod manpage:

    -c NUM --concurrency=NUM
        Set the concurrency limit for the scanning queue threads, which
        work together to process archives & files located by the traversal
        thread. This important for controlling CPU-intensive operations
        like parsing an ELF file and especially decompressing archives.
        The default is the number of processors on the system; the
        minimum is 1.

https://manpages.debian.org/testing/debuginfod/debuginfod.8.en.html

There's also something else I noticed just now: there seems to be an alternative implementation of debuginfod you want to introduce? Why? If the original from elfutils isn't working well enough, shouldn't we make it better?

One possibility is teaching it to mass-import pre-computed entries into its index, so that sweeping file tree scans with archive extractions can be avoided altogether. Or doing incremental index imports directly from do_package.

Alex

View/Reply Online (#196398): https://lists.openembedded.org/g/openembedded-core/message/196398
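
The sequential-versus-threaded point raised in this message can be illustrated with a small Python sketch. It is not how debuginfod is implemented; it only mirrors the documented default of the `-c` option (one worker per processor, minimum 1). The function name `scan_parallel` and the `extract` callable are hypothetical. Note that Python threads only bring a speed-up when `extract` spends its time in I/O or in a subprocess; pure-Python CPU-bound work would need processes instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def scan_parallel(paths: Iterable[str],
                  extract: Callable[[str], Optional[str]],
                  concurrency: Optional[int] = None) -> Dict[str, Optional[str]]:
    """Apply extract() to every path using a pool of worker threads."""
    path_list = list(paths)
    # Default to one worker per processor and never go below one worker.
    workers = max(1, concurrency or os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        return dict(zip(path_list, pool.map(extract, path_list)))
```

Timing such a helper against the plain loop on the same package set would give a fairer comparison with a threaded scanner than the sequential figures alone.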
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
Hi Richard,

On 28.02.24 10:14, Richard Purdie wrote:
> On Wed, 2024-02-28 at 07:21 +0100, Philip Lorenz wrote:
>> With the introduction of debuginfod ([1]), providing debug symbols to developers has been greatly simplified. Initial support for spawning a debuginfod server is already available as part of poky.
>>
>> However, this relies on debuginfod scraping the debug packages for their build IDs. This is not only inefficient (as all packages need to be extracted again), but it also does not scale well when covering a large number of builds.
>>
>> To mitigate this, we are currently working on an approach to extract the metadata needed to provide debug symbols as part of the bitbake build. This metadata includes the mapping of the GNU build ID to the package holding the debug symbol. The metadata will be treated as another build artifact and can be consumed by a daemon implementing the debuginfod HTTP API to serve debug symbol file requests from the package feed produced by the bitbake build.
>>
>> Initially, we considered implementing the generation of debug metadata directly as part of emit_pkgdata() in package.bbclass (disabled by default). However, we discarded this idea as introducing a configuration option would increase maintenance effort for a feature that would potentially only be enabled in very few builds. Instead, we opted to extend package.bbclass to expose the minimal information needed to reliably identify debug symbol files, which can then be consumed by a packaging hook.
>>
>> Is this extension something that is viable to be merged? We are considering open-sourcing the other parts needed to implement the setup described above, but as those parts are still in the prototyping phase, it will require some more time.
>>
>> [1] https://sourceware.org/elfutils/Debuginfod.html
>
> I think this is the kind of direction we've wanted to go in. I'm not sure the patch as it stands is that useful as it just lists files which you could just as easily obtain with an os.walk on the filesystem but in principle I'd be fine with writing some extra data during do_package or do_packagedata which saves the buildid mappings.

In one of my first iterations I placed the build ID to file mapping into the "extended" section of "pkgdata". We'd then consume this data after the build has finished to produce the debug info metadata database which contains the mapping from build ID to debug symbol file and the package containing the file. If this sounds sane to you I can clean up that version and share it here.

> So yes, in principle the idea sounds good but obviously the final decision would depend upon the patches.
>
> I'm assuming this data wouldn't be that large or that expensive to compute so I'd prefer not to hide it behind extra configuration options if we can help it. That does depend on the overheads/costs though.

I just executed build ID extraction on the debug packages of our medium sized kirkstone based distro (see my reply to Alex for more details). Sequentially extracting build IDs from around 8000 files took around 1:30 minutes on my machine. While I wouldn't call this excessive, I am also not sure whether this is too much overhead given that I only expect this data to be used in some deployments.

Br,
Philip

View/Reply Online (#196392): https://lists.openembedded.org/g/openembedded-core/message/196392
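
The post-build step described in this message (turning per-package build ID records into one metadata database) could look roughly like the following Python sketch. Everything about the data shape is an assumption made for illustration: the `(build ID, package, debug file)` record tuple, the `build_index` and `dump_index` names, the "first record wins" policy for repeated build IDs, and JSON as the storage format. The actual layout of the "extended" pkgdata section is not shown in the thread.

```python
import json
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (build ID, package name, debug file path).
Record = Tuple[str, str, str]


def build_index(records: Iterable[Record]) -> Dict[str, Dict[str, str]]:
    """Merge per-package records into one build ID -> location mapping."""
    index: Dict[str, Dict[str, str]] = {}
    for build_id, package, debug_file in records:
        # Assumed policy: keep the first location seen for a build ID.
        index.setdefault(build_id, {"package": package, "file": debug_file})
    return index


def dump_index(index: Dict[str, Dict[str, str]]) -> str:
    """Serialize the index deterministically so it can be stored as an artifact."""
    return json.dumps(index, indent=2, sort_keys=True)
```

Sorting the keys keeps the output stable between runs, which matters if the database is published alongside the package feed and compared across builds.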
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
Hi Alex,

On 28.02.24 08:41, Alexander Kanavin wrote:
> On Wed, 28 Feb 2024 at 07:22, Philip Lorenz wrote:
>> However, this relies on debuginfod scraping the debug packages for their build IDs. This is not only inefficient (as all packages need to be extracted again), but it also does not scale well when covering a large number of builds.
>
> Is it possible to see numbers behind this claim? When there is a proposal to increase code complexity, that needs to be justified in a way that can be locally observed.

Let me provide some numbers based on both an internal medium-sized build based on kirkstone as well as a core-image-minimal build based on master.

Kirkstone:

    find -name "*.ipk" | wc
       8415    8415  615076

    du -h -c
    3.5G    total

    time /bin/sh -c 'for f in */*.ipk; do ar p $f data.tar.xz | tar -tJ > /dev/null; done'

    real    5m13.629s
    user    4m56.653s
    sys     1m41.578s

master (core-image-minimal):

    find -name "*.ipk" | wc
       4553    4553  287890

    du -h -c
    2.1G    total

    time /bin/sh -c 'for f in */*.ipk; do ar p $f data.tar.zst | tar --zstd -t > /dev/null; done'

    real    1m2.521s
    user    0m40.876s
    sys     1m8.232s

Exact figures of course vary and this can be further optimized by introducing parallelism. However, the artifacts are available uncompressed during packaging, and the packaging step is also the one responsible for splitting out the debug symbols, so limiting build ID extraction to the files that are known to contain debug symbols is also an efficiency win (and one also avoids implementing any kind of heuristics to determine which files actually contain the debug symbols).

>> Is this extension something that is viable to be merged? We are considering open-sourcing the other parts needed to implement the setup described above, but as those parts are still in the prototyping phase, it will require some more time.
>
> The patch looks okay, but it's not useful without those other parts, so you need to get them ready and submit the whole set.

I'll answer this as part of my reply to Richard. I'd be more than happy to share the tooling we use to produce the build ID metadata; this was more an issue of where to actually place it. The only thing that is not yet in a state ready for public consumption is our daemon that consumes this metadata and then transparently fulfills any incoming debuginfo requests by retrieving the debug file from the corresponding package.

Br,
Philip

View/Reply Online (#196389): https://lists.openembedded.org/g/openembedded-core/message/196389
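
For readers unfamiliar with what "extracting a build ID" involves, the sketch below reads the GNU build ID note straight from an ELF image in Python. It is an illustration only, not the tooling mentioned in this message: the function name `read_build_id` is hypothetical, and the code assumes a well-formed ELF file that still has its section header table. In practice existing tools such as `readelf -n` print the same note.

```python
import struct
from typing import Optional

SHT_NOTE = 7          # section type for note sections
NT_GNU_BUILD_ID = 3   # note type used for the GNU build ID


def read_build_id(data: bytes) -> Optional[str]:
    """Return the GNU build ID of an ELF image as lowercase hex, or None."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return None
    is64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little endian
    try:
        # Locate the section header table from the ELF header.
        if is64:
            shoff, = struct.unpack_from(end + "Q", data, 0x28)
            shentsize, shnum = struct.unpack_from(end + "HH", data, 0x3A)
        else:
            shoff, = struct.unpack_from(end + "I", data, 0x20)
            shentsize, shnum = struct.unpack_from(end + "HH", data, 0x2E)
        for i in range(shnum):
            base = shoff + i * shentsize
            sh_type, = struct.unpack_from(end + "I", data, base + 4)
            if sh_type != SHT_NOTE:
                continue
            if is64:
                off, size = struct.unpack_from(end + "QQ", data, base + 24)
            else:
                off, size = struct.unpack_from(end + "II", data, base + 16)
            pos, stop = off, off + size
            # Walk the notes: header (3 words), name and desc padded to 4 bytes.
            while pos + 12 <= stop:
                namesz, descsz, ntype = struct.unpack_from(end + "III", data, pos)
                name_start = pos + 12
                desc_start = name_start + (namesz + 3) // 4 * 4
                name = data[name_start:name_start + namesz]
                if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
                    return data[desc_start:desc_start + descsz].hex()
                pos = desc_start + (descsz + 3) // 4 * 4
    except struct.error:
        return None  # truncated or malformed file
    return None
```

During packaging the files are already unpacked on disk, so a call such as `read_build_id(pathlib.Path(debug_file).read_bytes())` would be enough per file, with no archive extraction involved.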
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
On Wed, 2024-02-28 at 07:21 +0100, Philip Lorenz wrote:
> With the introduction of debuginfod ([1]), providing debug symbols to
> developers has been greatly simplified. Initial support for spawning a
> debuginfod server is already available as part of poky.
>
> However, this relies on debuginfod scraping the debug packages for their
> build IDs. This is not only inefficient (as all packages need to be
> extracted again), but it also does not scale well when covering a large
> number of builds.
>
> To mitigate this, we are currently working on an approach to extract the
> metadata needed to provide debug symbols as part of the bitbake build.
> This metadata includes the mapping of the GNU build ID to the package
> holding the debug symbol. The metadata will be treated as another build
> artifact and can be consumed by a daemon implementing the debuginfod
> HTTP API to serve debug symbol file requests from the package feed
> produced by the bitbake build.
>
> Initially, we considered implementing the generation of debug metadata
> directly as part of emit_pkgdata() in package.bbclass (disabled by
> default). However, we discarded this idea as introducing a configuration
> option would increase maintenance effort for a feature that would
> potentially only be enabled in very few builds. Instead, we opted to
> extend package.bbclass to expose the minimal information needed to
> reliably identify debug symbol files, which can then be consumed by a
> packaging hook.
>
> Is this extension something that is viable to be merged? We are
> considering open-sourcing the other parts needed to implement the setup
> described above, but as those parts are still in the prototyping phase,
> it will require some more time.
>
> [1] https://sourceware.org/elfutils/Debuginfod.html

I think this is the kind of direction we've wanted to go in. I'm not sure the patch as it stands is that useful as it just lists files which you could just as easily obtain with an os.walk on the filesystem but in principle I'd be fine with writing some extra data during do_package or do_packagedata which saves the buildid mappings.

So yes, in principle the idea sounds good but obviously the final decision would depend upon the patches.

I'm assuming this data wouldn't be that large or that expensive to compute so I'd prefer not to hide it behind extra configuration options if we can help it. That does depend on the overheads/costs though.

Cheers,

Richard

View/Reply Online (#196377): https://lists.openembedded.org/g/openembedded-core/message/196377
Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
On Wed, 28 Feb 2024 at 07:22, Philip Lorenz wrote:
> However, this relies on debuginfod scraping the debug packages for their
> build IDs. This is not only inefficient (as all packages need to be
> extracted again), but it also does not scale well when covering a large
> number of builds.

Is it possible to see numbers behind this claim? When there is a proposal to increase code complexity, that needs to be justified in a way that can be locally observed.

> Is this extension something that is viable to be merged? We are
> considering open-sourcing the other parts needed to implement the setup
> described above, but as those parts are still in the prototyping phase,
> it will require some more time.

The patch looks okay, but it's not useful without those other parts, so you need to get them ready and submit the whole set.

Alex

View/Reply Online (#196376): https://lists.openembedded.org/g/openembedded-core/message/196376