Re: Inverted index to accelerate guix package search
Hi simoun, Guix, On +2020-01-13 18:54:18 +0100, zimoun wrote: > Hi Arun, > > For sure, it is a good direction... but it is not that simple. :-) > > > On Mon, 13 Jan 2020 at 16:08, Arun Isaac wrote: > > > > Those measures don't seem precise enough to draw a good conclusion. > > > Could you increase the sample size (or maybe just loop?) so that all > > > times reach over a second or so? > > > > Indeed, my bad! Here are better timing results. I have repeated each of > > the searches a 1000 times, and I'm getting around 90x speedup! :-) > > > > Building index > > clock utime stime cutime cstime gctime > > 9.76 12.50 0.18 0.00 0.00 6.16 > > > > Brute force search (repeated 1000 times) > > clock utime stime cutime cstime gctime > > 195.95 264.13 0.90 0.00 0.00 142.62 > > `fold-packages` traverses all the packages so yes it is "slow". > > > > Inverted index search (repeated 1000 times) > > clock utime stime cutime cstime gctime > > 2.18 2.48 0.00 0.00 0.00 0.60 > > Yes, looking up into an hash table is really really faster. :-) > If you want to instrument particular calls, as you know, you can probably get sub-microsecond resolution (try the following snippet on your platform to see how long it takes to get a gettimeofday value -- even on this Acer Swift with its Intel(R) Pentium(R) CPU N4200 @ 1.10GHz I get about 10 calls per microsecond). At that kind of resolution, timing at various milestones in your program will show interesting regularities and outliers. You may even detect timing subsets caused by such things as the CPU taking a shortcut when multiplying by zero. If you time a thousand calls of A in a loop followed by the same for B, and then do a loop that calls them in succession inside the loop, you will most likely discover cache or other resource competitions between A and B. My point is that benchmarking meaningfully is subtle, and it is easy to be fooled by statistical summary measures of distributions that are really combinations of separate distributions (that are easily, sometimes beautifully, seen in scattergrams). --8<---cut here---start->8--- #!/usr/bin/guile -s !# ttime -- see how fast gettimeofday is 2017-08-24 20:44:21 -- bokr AT bokr DOT com (use-modules (ice-9 pretty-print)) (use-modules (ice-9 format)) (define loop-count 0) (catch #t (lambda () (set! loop-count (string->number (cadr (command-line) (lambda ( k . rest ) (begin (display "\nFirst arg should be integer loop count\n") (display "Suggested usage from bash: ./ttime 500|uniq -c\n") (display "to see calls/useq\n\n") (exit (define start-loop (gettimeofday)) (define end-loop start-loop) (display (let lp ((x loop-count) (tl `())) (if (positive? x) (lp (- x 1) (cons (gettimeofday) tl)) (begin (set! end-loop (gettimeofday)) (format #f "~y" tl) (format #t "start loop: ~a\n" start-loop) (format #t " end loop: ~a\n" end-loop) (format #t " end disp: ~a\n" (gettimeofday)) (newline) --8<---cut here---end--->8--- (pls excuse the odd use of catch -- I was playing around, meaning later to catch other errors like bad count etc., but never got a round tuit) So just chmod 755 ttime and then ./ttime 500|uniq -c as it will suggest if you run it without args. > > > However, the core issue about "search" is not 'find quickly an item' > but 'find the relevant item'. > For example, Bloom filter [1] or B-tree could be interesting data > structure if we are talking about fast retrieval. Note that this part > should even be delegated to SQLite, IMHO. > > > [1] https://en.wikipedia.org/wiki/Bloom_filter > [2] https://en.wikipedia.org/wiki/B-tree > > > Well, the issue is 'scoring' a query. For example, try: > > guix search youtube | recsel -p name,relevance > > --8<---cut here---start->8--- > name: youtube-dl-gui > relevance: 15 > > name: youtube-viewer > relevance: 11 > > name: youtube-dl > relevance: 11 > > name: mps-youtube > relevance: 11 > > name: emacs-youtube-dl > relevance: 11 > --8<---cut here---end--->8--- > > Why the package youtube-dl-gui is more relevant? Because the function > that computes the relevance score not enough "smart". The function > guix/ui.scm(relevance) is too simple: the score is the number of > ocurences of the pattern (weighted by the field where the pattern > matches). Therefore, if you compare "guix show" the packages > "youtube-dl-gui" and "youtube-dl", you will see that the term > "youtube" (case-insentive) appears 5 times for youtube-dl-gui against > only 3 times for youtube-dl. Does the difference (5 vs 3) provide more > information? I do not think so... The issue is: 1. the description is > not well-written (for the current scoring system) and 2. the scoring > function is too
Re: Package request: OpenRA
Hi Pierre, > We don't have OpenRA (https://www.openra.net/). > Has anyone worked on it? > Otherwise I'll go ahead. > > Any tip, any special wish? I have a working package (last tested a while ago, some third party hashes would need updaring, and there may be new build failures, but i played it in the past) in my repository at https://gitlab.com/pkill-9/guix-packages-free/tree/master/pkill9/packages/openra, however the third party .Net libraries aren't built from source because I couldn't get mono to build them from source. I believe Mono is too old on Guix to build them. Another issue to note is that i had to wrap xdg data dirs to point to generated mono certificates. Another issue is that maxmind geoip database is used during build, but the url constantly updates and they dont provide previous versions. best solution would be patch openra upstream to provide option not to use geoip database, or to grab it at runtime. I don't believe there are any licensing issues with using maxmind's geoip database in guix, can't remember though.
Re: staging branch open
Danny Milosavljevic writes: > Hi Marius, > > On Sat, 11 Jan 2020 23:34:19 +0100 > Marius Bakke wrote: > >> There is a brand new staging branch on Savannah: >> >> https://git.savannah.gnu.org/cgit/guix.git/log/?h=staging >> >> Please submit your changes by Monday, January 19th. > > That's not a Monday :-) Derp, I meant 20th. :-) > Should I update mrustc on staging? That would be terrific. I look forward to reducing the chain of Rust compilers. Let's see if it makes porting to other architectures than x86_64 easier. signature.asc Description: PGP signature
Re: Package file indexing
Pierre Neidhardt writes: > Christopher Baines writes: > >>> Hmm, but this data relates to items in the store and online, they are >>> global. Per-user would mean redundant packages and redundant (remote) >>> queries. >> >> Yeah, maybe there's some way of optimising things for systems with >> multiple users. > > Note: I meant "redundant database", not "packages". > > Can you think of good reasons not to store it globally in /var? Not specifically, it just becomes more complex as you have to consider more issues. Like the mundane "what if one user is using a file, and another deletes it" to the more unusual "what if you can modify the file to crash or exploit processes run by other users". signature.asc Description: PGP signature
Re: [Proposal] The Formal Methods in GNU Guix Working Group
Hello, Ludovic Courtès writes: > Hello! > > (Cc: maintainers.) > > Brett Gilio skribis: > >> Dec 30, 2019 3:34:22 PM Ludovic Courtès : >> >>> Guix-HPC is “institutional”, that’s part of the reason behind this. >>> Regarding gitlab.inria.fr, that’s because it used to be hosted at Inria. >>> Also, is a channel developed >>> by colleagues at Inria, so it’s more convenient to have it there. >> >> >> Hey Ludo, thanks for the explanation. >> >> It makes sense why Guix-HPC lives somewhere else. Given this, what >> do you propose for initiating the conversation on where the formal >> methods haunt page should live with the other maintainers? I >> personally think the repository should live on Savannah, but the >> address needs to be discussed. > > It’s fine to host the repo on Savannah: we can ask for a new repo under > the Guix umbrella, the downside being that access control will be the > same as for the other repos (we can only grant access to all the repos > or none of them.) If you plan to open it more to formal methods people > that do not yet contribute to Guix, it might be easier to use a separate > repo. You tell us! > > As for the domain name: I think it would be fine to use > formal-methods.guix.gnu.org as long as the web site follows GNU and Guix > policy, which mostly means referring only to free software, avoiding the > phrase “open source” to describe it, and probably avoiding institution > logos and such (I don’t think there’s any written policy but I would > personally find it out of place on gnu.org.) Anyway, the two of you are > webmasters so you probably know this better than I do. IOW, if you want > to flatter your employers and labs, you might want to opt for a separate > web site. :-) > > Maintainers, what do you think? I haven't followed the discussion of a formal methods work group associated with Guix as closely as I should have, but I'm thrilled that such an initiative is taking off! > Anyway, step #1 is to get a web page ready. :-) Agreed. Maxim
Re: qtwenengine anybody?
> 2. Send all other patches by using the "--in-reply-to=$ABOVE_MESSAGE_ID" > option. Make sure the patches are in the right order on the command > line. Example: > > git send-email --to=38...@debbugs.gnu.org \ > --in-reply-to='<20200113103304.9093-1-mike.ros...@gmail.com>' \ > 0002-patch-foo.diff 0003-patch-bar.diff > > Hope that helps! Ah I appreciate this feedback. Hopefully I can improve my ML patch workflow. Mike
Re: Package file indexing
Pierre Neidhardt writes: > Christopher Baines writes: > >> Maybe sqlite is one to try initially. There's guile-sqlite3 for reading >> and writing, and it can contain multiple tables as well as indexes for >> fast searching. >> >>> Where would we store this database? In /var? >> >> Per user is probably most flexible, so in the home directory somewhere. > > Hmm, but this data relates to items in the store and online, they are > global. Per-user would mean redundant packages and redundant (remote) > queries. Yeah, maybe there's some way of optimising things for systems with multiple users. signature.asc Description: PGP signature
Re: Inverted index to accelerate guix package search
Hi Arun, For sure, it is a good direction... but it is not that simple. :-) On Mon, 13 Jan 2020 at 16:08, Arun Isaac wrote: > > Those measures don't seem precise enough to draw a good conclusion. > > Could you increase the sample size (or maybe just loop?) so that all > > times reach over a second or so? > > Indeed, my bad! Here are better timing results. I have repeated each of > the searches a 1000 times, and I'm getting around 90x speedup! :-) > > Building index > clock utime stime cutime cstime gctime > 9.76 12.50 0.18 0.00 0.00 6.16 > > Brute force search (repeated 1000 times) > clock utime stime cutime cstime gctime > 195.95 264.13 0.90 0.00 0.00 142.62 `fold-packages` traverses all the packages so yes it is "slow". > Inverted index search (repeated 1000 times) > clock utime stime cutime cstime gctime > 2.18 2.48 0.00 0.00 0.00 0.60 Yes, looking up into an hash table is really really faster. :-) However, the core issue about "search" is not 'find quickly an item' but 'find the relevant item'. For example, Bloom filter [1] or B-tree could be interesting data structure if we are talking about fast retrieval. Note that this part should even be delegated to SQLite, IMHO. [1] https://en.wikipedia.org/wiki/Bloom_filter [2] https://en.wikipedia.org/wiki/B-tree Well, the issue is 'scoring' a query. For example, try: guix search youtube | recsel -p name,relevance --8<---cut here---start->8--- name: youtube-dl-gui relevance: 15 name: youtube-viewer relevance: 11 name: youtube-dl relevance: 11 name: mps-youtube relevance: 11 name: emacs-youtube-dl relevance: 11 --8<---cut here---end--->8--- Why the package youtube-dl-gui is more relevant? Because the function that computes the relevance score not enough "smart". The function guix/ui.scm(relevance) is too simple: the score is the number of ocurences of the pattern (weighted by the field where the pattern matches). Therefore, if you compare "guix show" the packages "youtube-dl-gui" and "youtube-dl", you will see that the term "youtube" (case-insentive) appears 5 times for youtube-dl-gui against only 3 times for youtube-dl. Does the difference (5 vs 3) provide more information? I do not think so... The issue is: 1. the description is not well-written (for the current scoring system) and 2. the scoring function is too simple. I have tried to described such issues/views here [3] [4] and since I have not read so much about recommender systems: state-of-the-art, NLP tools in Guile, etc. [3] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html [4] https://lists.gnu.org/archive/html/guix-devel/2019-12/msg00160.html Last, I have not read the 2 links you pointed but I think you have a bug in your code. The retrieval "game" using the index returns the package "r-mgcv" which does not contain the term "game" in its description. Well, the query "game" using the brute force does not return this very package. Then, in the function "guix/scripts/package.scm(find-packages-by-description)", you could try to replace the `fold-packages' by something using the inverted index. (Note that the name `find-packages-by-descripton' is odd because it uses synopsis, name, etc. too.). Hope that helps. Chhers, simon
Re: Store channel specification in profile
zimoun writes: > To me, the aim is to have something compliant between > /manifest and --manifest. And compliant does not mean that > /manifest is the entry point for the user specifications. > What I find a bit odd is: today, --manifest accepts a DSL and Guix > outputs to /manifest another DSL. Both are restricted tiny > DSL. And from all the recent discussions about manifests and so on, I > find appealing to extend the DSL of --manifest and use a subset to > write /manifest. I do not know. As I see it, /manifest or even a subset of it is not convenient for users to write. If we want to expose a convenient format, I can't see a way around using a new file. -- Pierre Neidhardt https://ambrevar.xyz/ signature.asc Description: PGP signature
Re: Store channel specification in profile
On Mon, 13 Jan 2020 at 15:59, Pierre Neidhardt wrote: > > If I understand correctly, it's because of the manifest files need > information like the store path and the propagated inputs, which are too > inconvenient for a user-facing "specification file." Hum? I am not convinced yet. :-) For example, the record contains the field "outputs" but some packages do not use it; e.g., "xmag". Same for "native-inputs", etc. To me, the aim is to have something compliant between /manifest and --manifest. And compliant does not mean that /manifest is the entry point for the user specifications. What I find a bit odd is: today, --manifest accepts a DSL and Guix outputs to /manifest another DSL. Both are restricted tiny DSL. And from all the recent discussions about manifests and so on, I find appealing to extend the DSL of --manifest and use a subset to write /manifest. I do not know. Cheers, simon
Re: Inverted index to accelerate guix package search
> Those measures don't seem precise enough to draw a good conclusion. > Could you increase the sample size (or maybe just loop?) so that all > times reach over a second or so? Indeed, my bad! Here are better timing results. I have repeated each of the searches a 1000 times, and I'm getting around 90x speedup! :-) Building index clock utime stime cutime cstime gctime 9.76 12.50 0.18 0.00 0.00 6.16 Brute force search (repeated 1000 times) clock utime stime cutime cstime gctime 195.95 264.13 0.90 0.00 0.00 142.62 Inverted index search (repeated 1000 times) clock utime stime cutime cstime gctime 2.18 2.48 0.00 0.00 0.00 0.60 Also, find attached the updated proof of concept script. Cheers! (use-modules (gnu packages) (guix packages) (ice-9 match) (ice-9 time) (srfi srfi-1) (srfi srfi-26)) ;;; Implementation of sets using hash tables (define make-set make-hash-table) (define set-element-of? hash-ref) (define set-add! (cut hash-set! <> <> #t)) (define set->list (cut hash-map->list (lambda (element _) element) <>)) (define (set-intersection set1 set2) "Return the intersection of SET1 with SET2" (let ((result (make-set))) (hash-for-each (lambda (element _) (when (set-element-of? set2 element) (set-add! result element))) set1) result)) (define (trigrams str) "Return all trigrams in STR." (if (< (string-length str) 3) (list) (map (lambda (start) (substring str start (+ start 3))) (iota (- (string-length str) 2) ;; Inverted index (define index (make-hash-table)) (define (build-index!) "Return an inverted index of all trigrams in the descriptions of all packages." (fold-packages (lambda (package _) (for-each (lambda (trigram) (let ((set (hash-ref index trigram (make-set (set-add! set (package-name package)) (hash-set! index trigram set))) (trigrams (package-description package #f) index) (define (brute-force-retrieve query) "Return names of all packages whose descriptions contain the string QUERY. Search brute force by folding over all packages." (fold-packages (lambda (package result) (if (string-contains (package-description package) query) (cons (package-name package) result) result)) (list))) (define (inverted-index-retrieve query) "Return names of all packages whose descriptions contain the string QUERY. Search using the pre-built inverted index." (match (trigrams query) ((first-trigram . remaining-trigrams) (set->list (fold (lambda (trigram result) (set-intersection result (hash-ref index trigram))) (hash-ref index first-trigram) remaining-trigrams))) (() (brute-force-retrieve query (define (repeat thunk times) (unless (zero? times) (thunk) (repeat thunk (1- times (display "Building index") (newline) (time (build-index!)) (newline) (display "Brute force search (repeated 1000 times)") (newline) (time (repeat (cut brute-force-retrieve "strategy") 1000)) (newline) (display "Inverted index search (repeated 1000 times)") (newline) (time (repeat (cut inverted-index-retrieve "strategy") 1000)) (newline) signature.asc Description: PGP signature
Re: Store channel specification in profile
If I understand correctly, it's because of the manifest files need information like the store path and the propagated inputs, which are too inconvenient for a user-facing "specification file." -- Pierre Neidhardt https://ambrevar.xyz/ signature.asc Description: PGP signature
Re: Store channel specification in profile
Hi Pierre, On Mon, 13 Jan 2020 at 15:02, Pierre Neidhardt wrote: > So what's the take-away of this thread? > > 1. Simon suggested to add options to convert the manifest to the > user-friendly specification file (i.e. something compatible with the > --manifest option). > > What about this instead: systematically generate this "specification" > file in every profile? This way no need for extra command line options, > the work is already done for the user. "To reproduce a profile" would > boil down to passing around this specification file. Sounds good to me. ;-) > 2. On December 2, Simon mentioned Ludo's suggestion (from > https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00285.html) > that we added a "guix channel" subcommand for channel management. You mean Konrad's suggestions. :-) > Shall we open 2 bugs for these? The UI is not clear to me. And before adding some new CLI, we need to agree (reach consensus) on what it is expected. Because then, it will be hard to change/remove this CLI. For example, Konrad also suggested "guix profile" doing part of "guix package". And this same name "guix profile" showed up again in the discussion about "guix environment". Maybe "guix channel" could supersede "guix describe". etc. All the best, simon
Re: Store channel specification in profile
Hi Ludo, On Sun, 12 Jan 2020 at 00:48, Ludovic Courtès wrote: > Note that it was never designed to be human-friendly. :-) The Scheme > code passed to ‘--manifest’ is friendlier. Why do not change a bit both /manifest and code accepted by '--manifest' to have something consistent and human friendly? (Even if successive transactions will produce an unavoidable mess to /manifest.) Well, I do not see why there is 2 different syntax/semantic; one for /manifest and another one for the code accepted by '--manifest'. Except, it is some pieces of work to modify the actual code. :-) Cheers, simon
Re: qtwenengine anybody?
Hi Mike, > I've have sent some patches to an open bug #38148 in regards to > qutebrowser being outdated. I'm hoping this will help test out > qtwebengine and close that bug. Two birds with one stone :) Qutebrowser 1.9.0 works now, great job! I've reviewed those patches; fix the few nits and I'll merge. > On second thought maybe I should have emailed them to > guix-patc...@gnu.org. Let me know if that's better for reviewing. If you think the bug title is relevant, then it's the right thing to do. That said, it seems that you've sent the first patch twice. To clarify, this is how you should proceed to send patches to debbugs: 1. Send the first patch only. If you send with "git send-email", you should see the message ID in the output. 2. Send all other patches by using the "--in-reply-to=$ABOVE_MESSAGE_ID" option. Make sure the patches are in the right order on the command line. Example: git send-email --to=38...@debbugs.gnu.org \ --in-reply-to='<20200113103304.9093-1-mike.ros...@gmail.com>' \ 0002-patch-foo.diff 0003-patch-bar.diff Hope that helps! -- Pierre Neidhardt https://ambrevar.xyz/ signature.asc Description: PGP signature
Re: Store channel specification in profile
Hi, >> Questions: >> >> - Do manifests really need the store path? >> - Same question about propagated-inputs. Aren't they already encoded in >> the package definition? Why repeating them here? > > This ‘manifest’ file exists mostly for one purpose: to allow incremental > operations on a profile with ‘guix upgrade’, ‘guix install’, and so on. > If ‘--manifest’ were the only way to build a profile, this ‘manifest’ > file would (almost) not be needed. (Actually it’s also needed for > ‘--list-installed’.) Makes sense. So what's the take-away of this thread? 1. Simon suggested to add options to convert the manifest to the user-friendly specification file (i.e. something compatible with the --manifest option). What about this instead: systematically generate this "specification" file in every profile? This way no need for extra command line options, the work is already done for the user. "To reproduce a profile" would boil down to passing around this specification file. 2. On December 2, Simon mentioned Ludo's suggestion (from https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00285.html) that we added a "guix channel" subcommand for channel management. Shall we open 2 bugs for these? -- Pierre Neidhardt https://ambrevar.xyz/ signature.asc Description: PGP signature
Outreachy 2020 summer blogpost
Hello, we are happy to announce a new post detailing the Guix participation on the following round. See https://guix.gnu.org/blog/2020/join-gnu-guix-through-outreachy/ Please spread the word! Prospective mentors, please feel free to contact me should you have any other proposals. Mentor applications are open, and proposals can be uploaded already. Proposal upload deadline is Feb. 25, perferably before Feb. 11, so that we can promote them on the Outreachy Twitter chat on Feb. 11 at 4pm UTC. Best regards, g_bor -- OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21
Re: qtwenengine anybody?
Pierre Neidhardt writes: > Hooray! > > Thank you Marius, and thank you Mike for the tremendous effort! > >> Now we just need some packages using it! :-) > > We can get started with Qutebrowser. Hello Pierre and Marius, I've have sent some patches to an open bug #38148 in regards to qutebrowser being outdated. I'm hoping this will help test out qtwebengine and close that bug. Two birds with one stone :) The patch includes python-pyqtwebengine and upgrade of qutebrowser to 1.9.0 and it now uses qtwebengine. If anyone can help review and make suggestions that would be great. On second thought maybe I should have emailed them to guix-patc...@gnu.org. Let me know if that's better for reviewing. Mike
Re: staging branch open
Hi Marius, On Sat, 11 Jan 2020 23:34:19 +0100 Marius Bakke wrote: > There is a brand new staging branch on Savannah: > > https://git.savannah.gnu.org/cgit/guix.git/log/?h=staging > > Please submit your changes by Monday, January 19th. That's not a Monday :-) Should I update mrustc on staging? pgp1yBX88Xric.pgp Description: OpenPGP digital signature
Re: Proposal for a blog contribution on reproducible computations
Hello! Konrad Hinsen skribis: >> Minor comments: >> >> • You write “Build systems are packages as well”. This could be >> slightly misleading: build systems are (1) a set of packages, and >> (2) a build procedure. Dunno if it makes sense to clarify that. > > Maybe I got something wrong, but I think I described this as you say > (please check!). Quote: > > Build systems are pieces of Guile code that are part of Guix. But this > Guile code is only a shallow layer orchestrating invocations of other > software, such as =gcc= or =make=. And that software is defined by > packages. > > The build procedure is that "shallow layer orchestrating invocations". > Does this sound right? Oh yes, that’s entirely correct! It’s just the section title that I thought could be misleading, but maybe not given this explanation. >> • Regarding ‘--container’, you write that namespaces “may not be >> present on your system, or may be disabled by default”, which is a >> bit strong; “may be present on your system, but perhaps disabled by >> default” would be more accurate. :-) > > Fixed. I don't know anything about the implementation techniques of > –container, so I'll blindly write what you say :-) It relies on “unprivileged user namespaces”, a Linux feature that’s been around for some time, and is almost always compiled in, but is disabled by default on some major distros. Thanks! Ludo’.