I've written a Vapi for a library implementing the Aho–Corasick string matching algorithm. The library (multifast) can be downloaded from:
http://sourceforge.net/projects/multifast/files/

Its Makefile needs a few modifications to build a shared library.

Add this line at the top:
SONAME := libahocorasick.so.$(ACVERSION)

Add to CFLAGS:
-fPIC

Add this section:
so: aho_corasick.o node.o
	gcc -shared -Wl,-soname,$(SONAME) -o $(SONAME) aho_corasick.o node.o
	ln -s -f $(SONAME) libahocorasick.so

Then run "make so".
What this library is useful for:
Finding which of many substrings occur in a given string (for example:
tags in a text, domains/keywords in a URI, etc.)
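
To illustrate this kind of multi-substring lookup, here is a minimal, self-contained Aho–Corasick sketch in Python. It is not the multifast implementation and does not use its API; the names build_automaton and find_all are made up for this illustration, and the patterns and text are sample data.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links and output lists for the patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # ids of the patterns that end at each state
    for pid, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pid)
    # Breadth-first pass: states directly below the root fail to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pid in out[state]:
            matches.append((i - len(patterns[pid]) + 1, patterns[pid]))
    return matches


if __name__ == "__main__":
    # Sample data: keywords looked up in a URI.
    print(find_all("http://example.org/vala/test", ["test", "vala", "tes"]))
```

The text is scanned once regardless of how many patterns were indexed, which is why the build step is expensive and the lookups are cheap. Overlapping matches such as "tes" and "test" are both reported.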
My benchmarks on an i7:
Indexing 500,000 substrings (average key length 32 chars): 15 sec,
1.7 GB in memory.
Checking 10,000 keys: overall time 0.048 sec.
Example of use (compile with: --pkg ahocorasick -X -lahocorasick):

using AC;

/* Called by the automaton for every position where patterns match. */
public static int match_handler (Match m, void * param)
{
    uint j;
    for (j = 0; j < m.match_num; j++)
    {
        stdout.printf ("%ld ", m.position);
        stdout.printf ("%ld ", m.matched_strings[j].id);
        stdout.printf ("%s ", m.matched_strings[j].str);
        stdout.printf ("\n");
    }
    return 0; /* 0 = continue and find all matches */
}

public void main (string[] args)
{
    var aca = AC.Automata (match_handler);

    /* Pattern to index. */
    var pattern = AC.String () {
        id = 1,
        str = "test",
        length = 4 // must be the length of str, i.e. "test".length
    };
    aca.add_string (pattern);

    // Build the index; a search can't be executed before this is done.
    aca.build ();

    /* Text to search in. */
    var text = AC.String () {
        str = "tes",
        length = "tes".length
    };
    aca.search (text, null);

    aca.reset (); // reset the AC instance
}
Attachment: ahocorasick.vapi
