Re: database setup in language learning

Alexander Burger Thu, 17 Jul 2014 00:19:31 -0700

Hello marmorine,

> (note: a bit long being a first post)


No problem :)


> post a first attempt to check form and convention and company, so here
> the latest variant (working):
> ...

Fine. That looks good.


A few minor notes:

1. As "___ " is a local transient symbol, the single quote is not really
   needed.


2. (and M (<> (setq G (cadr (assoc (key) *Art))) " ") )

   The symbols 'G' and 'N' are free variables. That probably does no harm,
   but I would recommend putting a

      (use (G N)

   somewhere before the 'loop'.
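
   To illustrate where the 'use' goes, here is a minimal sketch. The
   function 'countdown' and its loop body are made up for the example and
   are not your code; only the placement of 'use' matters.

```picolisp
# Hypothetical example: G and N are bound locally by 'use',
# so assignments inside the loop do not touch global values.
(de countdown ()
   (use (G N)
      (setq N 3)
      (loop
         (setq G (* N N))           # G is local to this body
         (println N G)
         (T (=0 (dec 'N)) 'done) ) ) )   # leave the loop when N reaches 0
```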



> I've noticed the pil db is considerably larger than the original
> sqlite, even with only part of the data. If that is normal that is also
> OK with me, just wanted to check.

You may be able to tune that a bit. If you call (pool "words02d.db") the
default is a single-file database, with a block size of 256.

Now it depends on the average sizes of the word lists, but if most of
them are rather short, you waste some space in each block. You can take
a look at the sizes if you create a (perhaps smaller) database with
typical data, and then do

   (mapcar size (collect 'wrd '+Wrds))

Also, it might be better to put the indexes into one or more separate files,
both for space and speed considerations. Especially the 'wrd' index has
rather long keys. Assuming that a block size for the words of 128 is
enough, I would do

   (dbs
      (1 +Wrds)               # 128 Byte/block
      (4 (+Wrds wid wrd)) )   # 1024 Byte/block

to put both indexes into the second DB file, and then

   (pool "words02d.db" *Dbs)
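
The first number in each 'dbs' entry is a scale factor, not a byte count.
As far as I know the block size is 64 bytes shifted left by that factor,
which matches the default of 256 for a factor of 2. A quick sketch to
print the sizes for a few factors:

```picolisp
# Block size for scale factor N, assuming the rule "64 shifted left by N"
(for N (0 1 2 4 6)
   (prinl N " -> " (>> (- N) 64) " bytes") )   # negative count = left shift
```

So a factor of 1 gives 128-byte blocks and a factor of 4 gives 1024-byte
blocks; 4096-byte blocks would need a factor of 6.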


> Mostly I am wondering about what to index and what not, also whether an
> idx is of any benefit to me.

I think the rule is simply: Index what you want to search for.

An 'idx' is only useful for an in-memory structure. You could build the
whole system as a large 'idx' tree, if it fits into memory. But you lose
persistence then.
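
For comparison, a minimal in-memory sketch with 'idx'. The global '*Tree'
and the sample words are made up for the example:

```picolisp
(off *Tree)                        # hypothetical global holding the tree
(for W '("haus" "baum" "apfel")
   (idx '*Tree W T) )              # T as third argument inserts the key
(println (idx '*Tree "baum"))      # lookup: non-NIL if the key is present
(println (idx '*Tree "birne"))     # lookup of a missing key gives NIL
(println (idx '*Tree))             # all keys as a sorted list
```

Everything here lives in the heap only, so it is gone when the process
exits.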


> The wid key (see below) seems necessary because I want to be able to
> random sample first of all.

Hmm, I've never done this, but there might be a very efficient solution:
If you have all words in a single file (and no objects of other types),
you can access them directly, without index, with the 'id' function. You
first determine the ID of the last object in the file:

   (setq *Max (id (for (S (seq (db: +Wrds)) S (seq S)) S)))

And then later try (as in your case)

   (until (ext? (setq W (id (db: +Wrds) (rand 1 *Max)))))

because there may be holes in the file due to deleted objects.

Keep in mind that this would hang forever (like your version too), if
the DB is empty!
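
A guarded variant could look like the sketch below. 'randomWord' is a
hypothetical helper name, and it assumes that *Max was set as above and
that 'ext?' returns NIL for IDs of deleted objects. It returns NIL instead
of looping when *Max is not a positive number, but it would still loop
forever if every object in the file had been deleted.

```picolisp
(de randomWord ()
   (when (gt0 *Max)                 # NIL or 0: nothing to sample
      (use W
         (until
            (ext?                   # skip holes left by deleted objects
               (setq W (id (db: +Wrds) (rand 1 *Max))) ) )
         W ) ) )
```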


> deletions). I am writing it to run on a Nanonote among other things
> (picolisp already compiled and running!), so memory and speed could

Cool!


> Its funny, I was dead set on programming lisp, kind of a principle
> thing, but now that I have gotten my feet wet a bit I am starting to
> think that picolisp really IS the best choice for what I have in mind.
> Because of the db, the gui, its "all just there". A little gem.

I'm glad to hear that :)

♪♫ Alex