Re: Presentation of traversi framework via graph recommendations

2016-09-09 Thread Amirouche Boubekki

I updated the article with a schema and the code.

http://hyperdev.fr/notes/a-graph-based-movie-recommender-engine-using-guile-scheme.html

Delete the content of /tmp/wt and reload the database.

Also, have a look at the movielens-step01-*.scm files. The algorithm is very
simple, hence the recommendations are not very good...


On 2016-09-09 16:05, Amirouche Boubekki wrote:

On 2016-09-09 09:49, Neil Jerram wrote:

I got lost at the point of looking up the genres for Toy Story; why
does that involve graph traversal? 



Because genres are connected to movies using an edge. It's possible to
store genre information in a movie vertex assoc as a list value, but
then it will be difficult to fetch all movies for a given genre.

With this graph layout, you can for instance fetch the "fantasy" genre
and ask the question "what are all the movies of fantasy genre" simply
using 'outgoings' proc... See below.




Probably it would help to add a bit into the blog to explain how the
movie information is mapped into a graph. 



I should probably add a drawing too.

While trying to write down an explanation of how the graph is built,
I realized there is a mistake in how it is built. Movie and genre are
connected by a "genre" edge, which doesn't make much sense. It should be
some kind of relation like "movie is instance of genre". That will make
more sense and be more explicit.

I will rewrite the load script to avoid this mistake and rework
the article.

I will keep you posted. Thanks for your interest.




  Original Message  
From: Amirouche Boubekki
Sent: Friday, 9 September 2016 07:32
To: Guile User
Subject: Presentation of traversi framework via graph recommendations

Héllo,

I published an article on my blog about how to use `grf3`,
the graph database library built on top of wiredtiger [0].

[0]
http://hyperdev.fr/notes/a-graph-based-movie-recommender-engine-using-guile-scheme.html

This introduces the traversi framework for graph traversal.
traversi is inspired by Tinkerpop's Gremlin. It is a custom
stream library which is faster than srfi-41 and supports
backtracking.

I think that building traversi on top of streams makes
graph traversal much more approachable.
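
To make the idea of a stream-based traversal with backtracking concrete, here is a minimal sketch in Python. It is not traversi's API: Python generators stand in for the stream library, and every name (`Traverser`, `start`, `scatter`, `keep`, `backtrack`, the `RATED` data) is hypothetical. Each item in the stream remembers the item it was derived from, so a later step can return to it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Traverser:
    """A value flowing through the stream plus the traverser it came from."""
    value: Any
    parent: Optional["Traverser"] = None


def start(values):
    """Turn plain values into a lazy stream of traversers without history."""
    for value in values:
        yield Traverser(value)


def scatter(step, stream):
    """Apply `step` (value -> iterable) to each traverser, recording the origin."""
    for traverser in stream:
        for value in step(traverser.value):
            yield Traverser(value, parent=traverser)


def keep(predicate, stream):
    """Keep only the traversers whose value satisfies `predicate`."""
    for traverser in stream:
        if predicate(traverser.value):
            yield traverser


def backtrack(stream):
    """Go one step back in each traverser's history."""
    for traverser in stream:
        yield traverser.parent


# Hypothetical data: which user rated which movie.
RATED = {"alice": ["Toy Story", "Jumanji"], "bob": ["Heat"], "carol": ["Toy Story"]}

# "Which users rated Toy Story?": walk users -> movies, filter, then step back.
stream = start(RATED)
stream = scatter(lambda user: RATED[user], stream)
stream = keep(lambda title: title == "Toy Story", stream)
stream = backtrack(stream)
fans = [traverser.value for traverser in stream]
```

Nothing is computed until the final list comprehension consumes the stream, which is the property that makes chaining steps cheap.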

This article is inspired by a *graph-based recommender engine* [1]


[1]
https://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/


Have fun!


--
Amirouche ~ amz3 ~ http://www.hyperdev.fr



Re: [HELP] a search engine in GNU Guile

2016-09-09 Thread Amirouche Boubekki

On 2016-09-09 16:05, Ralf Mattes wrote:
On Fri, Sep 09, 2016 at 09:39:24AM -0500, Christopher Allan Webber wrote:

Amirouche Boubekki writes:

> - port whoosh/lucene to guile to improve text search


Sorry, but I don't see the point of this.


I meant "to improve the text search of my previous attempt at writing a
search engine". The previous iteration of this project does not support
boolean search.
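
For readers unfamiliar with the term, boolean search over an inverted index can be sketched as follows. This is an illustration in Python, not wsh.scm's code; the function names, the whitespace tokenizer and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(documents):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, all_ids, must=(), should=(), must_not=()):
    """AND over `must`, OR over `should`, NOT over `must_not`."""
    results = set(all_ids)
    for term in must:
        results &= index.get(term, set())
    if should:
        results &= set().union(*(index.get(term, set()) for term in should))
    for term in must_not:
        results -= index.get(term, set())
    return results


# Hypothetical documents.
DOCS = {
    1: "Guile is a Scheme implementation",
    2: "wiredtiger is a storage engine",
    3: "a search engine written in Guile Scheme",
}
index = build_index(DOCS)

guile_not_search = search(index, DOCS, must=["guile"], must_not=["search"])
either = search(index, DOCS, should=["wiredtiger", "search"])
both = search(index, DOCS, must=["engine", "guile"])
```

Each boolean operator maps directly onto a set operation on posting sets, which is why supporting it mostly depends on how the index is stored.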



At least Lucene has a http-based
interface that can be accessed by any kind of client language.


That is trivial to do with guile too.


Why reinvent the wheel


Because it's a hobby.


(and, in the case of Lucene, a rather well working,


It's not possible to use a custom storage engine with Lucene.


extremely mature


My theory is that some search engine businesses like algolia forked
Lucene to build it on top of something similar to wiredtiger and can now
claim impressive performance.


Basically, what I mean to say is that wsh.scm is innovation. I read here
and there that big players are actually using storage engines similar to
wiredtiger to build search engines...

So it's not a bad idea, it's just an idea that is not common.
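
As a rough illustration of what building a search index on top of a storage engine can look like, here is a sketch in Python. It assumes the engine offers an ordered key/value table with range scans; the thread does not describe how wsh.scm lays out its tables, so `OrderedStore`, `index_document` and `postings` are hypothetical stand-ins, not wiredtiger's API.

```python
import bisect


class OrderedStore:
    """Stand-in for an ordered table: keys are kept sorted, scans go by prefix."""

    def __init__(self):
        self._keys = []

    def add(self, key):
        """Insert `key` at its sorted position, ignoring duplicates."""
        position = bisect.bisect_left(self._keys, key)
        if position == len(self._keys) or self._keys[position] != key:
            self._keys.insert(position, key)

    def scan_prefix(self, prefix):
        """Yield every key tuple that starts with `prefix`, in key order."""
        position = bisect.bisect_left(self._keys, prefix)
        while (position < len(self._keys)
               and self._keys[position][:len(prefix)] == prefix):
            yield self._keys[position]
            position += 1


def index_document(store, doc_id, text):
    """Store one (term, doc_id) key per distinct term of the document."""
    for term in set(text.lower().split()):
        store.add((term, doc_id))


def postings(store, term):
    """All document ids for `term`, found with a single prefix scan."""
    return [doc_id for _term, doc_id in store.scan_prefix((term,))]


store = OrderedStore()
index_document(store, 2, "wiredtiger storage engine")
index_document(store, 1, "guile search engine")

engine_docs = postings(store, "engine")
search_docs = postings(store, "search")
```

Because the keys are sorted, all postings of a term sit next to each other, so a lookup is one seek followed by a sequential read.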


and complex wheel)?


How complex? That's what I am trying to understand. AFAIK it's not as
complex as opencog, since I can rewrite more features.





This is something I'd love to see generally.  It would be nice to have
an indexing library, either by writing bindings to Xapian (which
unfortunately couldn't use the FFI since it's C++),


But almost all of Xapian's bindings are Swig-generated (and that seems
to be the preferred way of generating bindings). IIRC I used the Swig
Guile bindings years ago (I'm pretty sure that code got lost in a hard
disk crash, but I'm too lazy to google it up ...).


or natively porting
something like Whoosh, for Guile.


I've seen similar approaches for Common Lisp (search for montezuma) but
in the end it seems to be way too much work - remember that not a small
part of Lucene's success is based on the existing ecosystem (Solr,
excellent language parsers et al.)


If you are thinking about stemming, it's not supported by wsh at all
yet. It's an area I'd like to improve.

I agree that if someone wants to create a business using Guile, they
would be up and running faster using ES or Solr. Still, it will be a
good contribution to the Guile ecosystem. I am not building a business,
I'm studying the free software zoo. wsh is basically a set of notes in
the form of code on the road to what I actually want to reach, which is
concept search, cf. https://en.wikipedia.org/wiki/Concept_search




Re: [HELP] a search engine in GNU Guile

2016-09-09 Thread Ralf Mattes
On Fri, Sep 09, 2016 at 09:39:24AM -0500, Christopher Allan Webber wrote:
> Amirouche Boubekki writes:
> 
> > - port whoosh/lucene to guile to improve text search

Sorry, but I don't see the point of this. At least Lucene has a http-based
interface that can be accessed by any kind of client language. Why reinvent the
wheel (and, in the case of Lucene, a rather well working, extremely mature and
complex wheel)?
 
> This is something I'd love to see generally.  It would be nice to have
> an indexing library, either by writing bindings to Xapian (which
> unfortunately couldn't use the FFI since it's C++),

But almost all of Xapian's bindings are Swig-generated (and that seems to be
the preferred way of generating bindings). IIRC I used the Swig Guile bindings
years ago (I'm pretty sure that code got lost in a hard disk crash, but I'm too
lazy to google it up ...).
 
> or natively porting
> something like Whoosh, for Guile.

I've seen similar approaches for Common Lisp (search for montezuma) but in the
end it seems to be way too much work - remember that not a small part of
Lucene's success is based on the existing ecosystem (Solr, excellent language
parsers et al.)

Cheers, Ralf Mattes


> If you write this as an independent library, let me know.  I'm a likely
> user.
> 
>  - Chris
> 



Re: [HELP] a search engine in GNU Guile

2016-09-09 Thread Christopher Allan Webber
Amirouche Boubekki writes:

> Héllo,
>
>
> I'd like to share with you a mini-project on the road of Culturia 0.1
> [0] which is a boolean keyword search engine (similar in principle to
> xapian, lucene and whoosh (with less polish and features)).

... and I didn't read this until after I wrote my last message.  Very
cool! :)  I hope to look more soon.



Re: [HELP] a search engine in GNU Guile

2016-09-09 Thread Christopher Allan Webber
Amirouche Boubekki writes:

> - port whoosh/lucene to guile to improve text search

This is something I'd love to see generally.  It would be nice to have
an indexing library, either by writing bindings to Xapian (which
unfortunately couldn't use the FFI since it's C++), or natively porting
something like Whoosh, for Guile.

If you write this as an independent library, let me know.  I'm a likely
user.

 - Chris



Re: Presentation of traversi framework via graph recommendations

2016-09-09 Thread Amirouche Boubekki

On 2016-09-09 09:49, Neil Jerram wrote:

I got lost at the point of looking up the genres for Toy Story; why
does that involve graph traversal? 



Because genres are connected to movies using an edge. It's possible to
store genre information in a movie vertex assoc as a list value but then
it will be difficult to fetch all movies for a given genre.

With this graph layout, you can for instance fetch the "fantasy" genre
and ask the question "what are all the movies of fantasy genre" simply
using 'outgoings' proc... See below.
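
The layout described above can be sketched in a few lines of Python. This is not grf3's API: the `Graph` class, its method names and the vertex identifiers are assumptions for illustration. Only the idea comes from the message, namely that genres are vertices linked to movies by edges, so following the edges that leave a genre lists its movies.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph: vertices hold an assoc (dict), edges carry a label."""

    def __init__(self):
        self.vertices = {}                   # uid -> properties
        self.out_edges = defaultdict(list)   # uid -> [(label, end uid)]
        self.in_edges = defaultdict(list)    # uid -> [(label, start uid)]

    def add_vertex(self, uid, **properties):
        self.vertices[uid] = properties
        return uid

    def add_edge(self, start, label, end):
        self.out_edges[start].append((label, end))
        self.in_edges[end].append((label, start))

    def outgoings(self, uid):
        """Vertices reached by the edges leaving `uid`."""
        return [end for _label, end in self.out_edges[uid]]

    def incomings(self, uid):
        """Vertices from which an edge arrives at `uid`."""
        return [start for _label, start in self.in_edges[uid]]


graph = Graph()
fantasy = graph.add_vertex("genre/fantasy", name="Fantasy")
animation = graph.add_vertex("genre/animation", name="Animation")
toy_story = graph.add_vertex("movie/1", title="Toy Story")
jumanji = graph.add_vertex("movie/2", title="Jumanji")

# Assumed direction: the edge starts at the genre and ends at the movie.
graph.add_edge(animation, "genre", toy_story)
graph.add_edge(fantasy, "genre", toy_story)
graph.add_edge(fantasy, "genre", jumanji)

# "What are all the movies of the fantasy genre?"
fantasy_movies = [graph.vertices[uid]["title"] for uid in graph.outgoings(fantasy)]
# "What are the genres of Toy Story?" is the same walk in the other direction.
toy_story_genres = [graph.vertices[uid]["name"] for uid in graph.incomings(toy_story)]
```

Had the genres been stored as a list inside each movie's assoc, the first question would require scanning every movie.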




Probably it would help to add a bit into the blog to explain how the
movie information is mapped into a graph. 



I should probably add a drawing too.

While trying to write down an explanation of how the graph is built,
I realized there is a mistake in how it is built. Movie and genre are
connected by a "genre" edge, which doesn't make much sense. It should be
some kind of relation like "movie is instance of genre". That will make
more sense and be more explicit.

I will rewrite the load script to avoid this mistake and rework
the article.

I will keep you posted. Thanks for your interest.








Re: Presentation of traversi framework via graph recommendations

2016-09-09 Thread Neil Jerram
I got lost at the point of looking up the genres for Toy Story; why does that 
involve graph traversal? 

Probably it would help to add a bit into the blog to explain how the movie 
information is mapped into a graph. 






Re: Presentation of traversi framework via graph recommendations

2016-09-09 Thread Neil Jerram
Your blog includes:

"Mind the fact that -ref procedures have no ! at the end which means they
return a new record."

I think that should be -set instead of -ref
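
For context, the Scheme convention is that procedures ending in `!` mutate their argument, so a setter without `!` is expected to return a fresh record. A rough Python analogue follows, with a hypothetical `Vertex` record and `vertex_set_title` helper that are not taken from the blog's code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Vertex:
    """Immutable record: fields cannot be assigned after construction."""
    uid: int
    title: str


def vertex_set_title(vertex, title):
    """Non-destructive setter: returns a new record, leaves `vertex` untouched."""
    return replace(vertex, title=title)


original = Vertex(uid=1, title="Toy Story")
updated = vertex_set_title(original, "Toy Story (1995)")
```

After the call, `original` still holds the old title, so the caller has to keep using the returned value.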

   Neil 

