Re: Mechanism for helping in multi-channels configuration (and Xapian index)

2024-05-06 Thread Simon Tournier
Hi,

Sorry for the long delay.

On Mon, 18 Mar 2024 at 16:05, Christina O'Donnell wrote:

>> 2: https://issues.guix.gnu.org/issue/39258

> As I said above, [2] is a fairly long thread, but I think I get the 
> general idea. It seems that Xapian was implemented but didn't have the 
> desired speedup. Am I getting the right impression there?

Not really.  As far as I remember, the first blocker for implementing
search with Xapian was adding Xapian as a dependency of Guix.  This
addition would be a bad idea, IMHO.  However…

At the time Xapian-based “guix search” was discussed,
GUIX_EXTENSIONS_PATH was in its infancy.  Therefore, it was not really
on the table.

… Xapian-based “package search” seems to me an option if it is turned
into a Guix extension.  This way, Xapian is a dependency not for
everyone but only for those who want more features. :-)

I have on my TODO list to resume the work:

 1. Benchmark Xapian-based search
 2. Benchmark Xapian index building

Then, depending on the results, that will set the direction.


Cheers,
simon




Re: Mechanism for helping in multi-channels configuration (and Xapian index)

2024-03-18 Thread Christina O'Donnell

Hi Simon,

Sorry for the really long delay. I meant to reply after I'd had a good
read through the conversation you linked. I haven't had a chance to
really get into it yet, but I have read enough to get a surface idea of
the project. The project looks fun, and looks like it will help Guix
users and developers, so I'd be on board in principle.


On 15/02/2024 15:05, Simon Tournier wrote:

...

> I think you mean ’fold-packages’.

>>   2. Have a script that determines the symbols needed by each file. (Macros
>>      make this more difficult, but.)

> Well, this would be difficult, IMHO.  Somehow, it is what the compiler
> does. :-)

I asked on guile-devel and Maxime suggested using `module-map` or
`module-for-each`, which map over all symbols in a module. Presumably
this would know what's quoted literally and what's a proper symbol.

>>   3. Have both scripts have an incremental version that runs on diffs (for
>>      performance).
>>   4. Run this for every commit on every branch on every channel caching the
>>      result.
>>   5. Have a CI script keep this updated for new commits.
>>   6. Have a server track incompatibilities.

> Here, I think the issue is that one server needs to track all the
> channels.  And that’s too strong an assumption, IMHO.
>
> I think the design should be something on channel maintainer side.
> Somehow, the main Guix channel could be seen as a Git submodule from the
> channel side and the issue is that information is not tracked.
>
> There is this ’.guix-channel’ file which allows describing channel
> dependencies.  And the improvements could be to add more there.  The
> question is what to add and how to add it, while keeping it simple and
> free of maintenance burden. :-)


Okay, this makes sense. I'm thinking that you could have something like 
a sqlite index that can be generated by running a script on the code. 
The index could exist on the server separate to the channel repo, 
pointed at by the .guix-channel file. The commit hook could: (1) update 
the local index to include the latest commit; (2) update the hash inside 
.guix-channel. Then a push hook could also push the index to the server.
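
To make the idea concrete, here is a minimal sketch of such an index and of the hash a commit hook could record. Everything in it is an assumption for illustration: the `definitions` table, the function names, and the use of SHA-256 over the index file are hypothetical, and producing the (file, symbol) pairs from the channel's source is the hard part and is not shown. How the hash would be stored in .guix-channel is left open.

```python
import hashlib
import os
import sqlite3
import tempfile


def build_index(db_path, commit, definitions):
    """Record which symbols each file defines at COMMIT.

    DEFINITIONS is an iterable of (file, symbol) pairs (hypothetical input;
    extracting it from the channel's code is not shown here).
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS definitions ("
            " commit_id TEXT NOT NULL,"
            " file TEXT NOT NULL,"
            " symbol TEXT NOT NULL,"
            " PRIMARY KEY (commit_id, file, symbol))"
        )
        conn.executemany(
            "INSERT OR IGNORE INTO definitions VALUES (?, ?, ?)",
            [(commit, f, s) for f, s in definitions],
        )
    conn.close()


def index_hash(db_path):
    """SHA-256 of the index file: the value a commit hook could record."""
    digest = hashlib.sha256()
    with open(db_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_defining(db_path, commit, symbol):
    """Return the files that define SYMBOL at COMMIT, sorted by name."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT file FROM definitions WHERE commit_id = ? AND symbol = ?"
        " ORDER BY file",
        (commit, symbol),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Placeholder commit id and file names, not real channel data.
    path = os.path.join(tempfile.mkdtemp(), "index.sqlite")
    build_index(path, "commit-1", [("my/packages/foo.scm", "foo")])
    print(files_defining(path, "commit-1", "foo"))
    print(index_hash(path))
```

Hashing the file bytes keeps the hook trivial, but it ties the hash to the exact on-disk layout of the database; hashing a sorted dump of the rows would be an alternative if reproducibility across SQLite versions matters.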


It's a bit clunky because you've got this binary blob that you have to 
synchronize with the channel, and it's easy to get this wrong. Putting 
the index in the channel repo would bloat the channel with old versions 
of the index. Forcing users to generate the index from scratch is 
undesirable too.


As an alternative to having the index referenced in the .guix-channel,
we could use git-annex. This would take care of fetching the index,
uploading the new index on push, and updating the hash. No extra steps
would be /required/ of developers, as it won't be necessary to have the
index 100% up to date, but developers could choose to regenerate the
index and call `git annex sync`. I suspect that adding git-annex as a
dependency would be resisted, but that's the way I think would work
best, and it could apply to existing indexes too. It depends on how long
it takes to generate the index from scratch.

There was some talk of data.guix.gnu.org using PostgreSQL to index
packages. I suppose it'd be worth figuring out what they do, to see if
they have any SQL or code that might be portable to sqlite.



>> Full disclosure: I've got nothing lined up for the summer yet, so I'm on the
>> prowl for GSoC projects :)

> Cool!
>
> In that spirit, one tool that is missing is: search packages in all the
> history. Somehow the need is described by this message [1]: how to find
> which Guix revision provides which version of Foo?
>
> In addition, “guix search” is slow [2].
>
> Well, I have started the embryo of an extension based on Guile-Xapian
> for indexing and improving the search.  Really an embryo. :-)
>
> I think this would fit some GSoC. ;-)


As I said above, [2] is a fairly long thread, but I think I get the 
general idea. It seems that Xapian was implemented but didn't have the 
desired speedup. Am I getting the right impression there?


It's certainly an interesting problem. I'll keep thinking about it.
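
For the history question ("which Guix revision provides which version of Foo?"), the lookup itself could be as small as the sketch below. This is only an illustration under assumptions: the `provides` table and the function names are hypothetical, SQLite stands in for whatever backend (Xapian or otherwise) ends up being used, and the revision ids and package names in the usage part are placeholders rather than real data.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a store of (revision, name, version) triples."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provides ("
        " revision TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " PRIMARY KEY (revision, name, version))"
    )
    return conn


def record(conn, revision, packages):
    """Store the (name, version) pairs available at REVISION."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO provides VALUES (?, ?, ?)",
            [(revision, name, version) for name, version in packages],
        )


def revisions_providing(conn, name, version=None):
    """Return (revision, version) pairs for NAME, optionally only VERSION."""
    query = "SELECT revision, version FROM provides WHERE name = ?"
    params = [name]
    if version is not None:
        query += " AND version = ?"
        params.append(version)
    query += " ORDER BY revision, version"
    return conn.execute(query, params).fetchall()


if __name__ == "__main__":
    conn = open_history()
    record(conn, "rev-a", [("foo", "1.0"), ("bar", "2.1")])
    record(conn, "rev-b", [("foo", "1.1"), ("bar", "2.1")])
    print(revisions_providing(conn, "foo", "1.1"))
```

Filling the table for every revision is where the real cost lies, which is why the incremental, diff-based approach matters.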

Kind regards,

Christina


1: Re: List available versions of package.
Philippe Veber 
Tue, 11 Jun 2019 09:43:08 +0200
id:CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com
https://lists.gnu.org/archive/html/help-guix/2019-06
https://yhetil.org/guix/CAOOOohSzUezKvm=ro0bxrgh3m0eo2x0cotvd--varxwoqtc...@mail.gmail.com

2: https://issues.guix.gnu.org/issue/39258