Re: a passionate guy who wants to join in as a developer

2012-08-12 Thread rushan chen
Hi Mark,

Thank you very much for your reply.

I see you mentioned that it would be useful to implement a larger library
of efficient data structures, and I'm very interested in that. I have
worked on projects involving complicated but very interesting data
structures; implementing them can be challenging, but once they are done
I feel a great sense of achievement.

One such project is implementing a language model (LM), which is a core
component of speech recognition and machine translation. I don't know if
you have heard of it before. Unfortunately, I can't go into much detail
here; that would complicate things too much.

Basically, one of the key operations an LM supports is returning the
probability associated with any given id sequence. All id sequences have
the same length, and there is a huge number of them (a typical LM may
contain billions). So the LM has to be stored compactly while still
allowing each id sequence to be looked up very quickly.

A trie was eventually chosen as the data structure for the LM (many
papers have discussed this issue). All id sequences with the same prefix
share the same internal nodes; for example, for 1, 2, 3, 4 and 1, 2, 3, 5,
only one copy of 1, 2, 3 is stored in the LM, and looking up an id
sequence is done by a sequence of binary searches, one per level, until a
leaf is reached. One more thing worth mentioning is that I store the whole
trie in a single large block of memory (usually around 2 gigabytes), which
makes it convenient to write out to disk and to load back into memory
simply by using mmap. I think it also makes the system faster than
allocating memory every time it's needed.

There are some other projects I have worked on or am working on, such as
a spell corrector, which also involve complicated data structures, but
due to privacy policies I can't say much about them.

All in all, I'm very interested in this, and I really, really hope I can help.

Looking forward to your reply. Thanks in advance.

Have fun!

Rushan Chen


New guildhall repository

2012-08-12 Thread Ian Price

Hi,

For those of you using guildhall, I've set up a new repository on my
website. There isn't much there now, but I'll be looking into packaging
and uploading more code as time goes on.

To get started with guildhall, please follow the instructions in
https://gist.github.com/3327296

If you already have a guildhall install, add
 (repository shift-reset http://shift-reset.com/doro/;)
to your config.scm
and run `guild update' to enable access.

Naturally, finding and packaging Guile code could be sped up if you
are willing to help out :) As I said in the other thread, you can email
me links if you want me to package them, and you can check out the
guildhall documentation for the pkg-list.scm format if you want to give
it a try yourself.

There is no interface for user submissions at the moment, but I am
looking into that. Hopefully there will be one soon.

-- 
Ian Price -- shift-reset.com

Programming is like pinball. The reward for doing it well is
the opportunity to do it again - from The Wizardry Compiled




wip-rtl extra space at allocation of frame at calls

2012-08-12 Thread Stefan Israelsson Tampe
Hi,

Currently, if we want to compile to native code in the rtl branch, the
call instruction is very heavy to inline directly. When the number of
arguments is less than 20, I would like to jump to a global code segment
where all the necessary heavy lifting is done; otherwise the whole call
instruction can be inlined. So the issue here is that I would like to
move the alloc-frame part of the call instruction out into the shared
code, so that the inlined part simply puts the arguments in their places
and then jumps to this global code. The problem is that we need to check
the stack space before jumping, and this check is quite wordy, while I
would like to keep the size of the inlined code small.

Therefore I suggest that whenever we allocate stack space, we check for
20 extra slots, meaning that we do not need to check for these slots at
a call.

At least that's what I plan to do in the native compilation.

Any other ideas?

/Stefan