[Pharo-users] Hapax/CodeFu changes and example script

Hernán Morales Durand Thu, 17 Jan 2013 08:03:19 -0800


Hi guys,

For those working in information retrieval, for example doing tf-idf ranking, you can find adapted packages, "Hapax" and "CodeFu", in the BioSmalltalk repository http://ss3.gemstone.com/ss/BioSmalltalk.html . I have translated some VW-specific code to Pharo 1.4 (under Windows it requires the ProcessWrapper package) and adapted some Hapax methods to work with corpora in different languages.


This is an example script for a corpus in Spanish:

| corpus tdm documents |

corpus := HXSpanishCorpus new.

documents := 'el río Danubio pasa por Viena, su color es azul
el caudal de un río asciende en Invierno
el río Rhin y el río Danubio tienen mucho caudal
si un río es navegable, es porque tiene mucho caudal'.

documents lines doWithIndex: [:doc :index |
        corpus
                addDocument: index asString
                with: (Terms new
                        addString: doc
                        using: CamelcaseScanner;
                        yourself)].
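"Drop stopwords, then reduce the remaining terms to their stems."
corpus removeStopwords.
corpus stemAll.
"Build the term-document matrix from the processed corpus."
tdm := TermDocumentMatrix on: corpus.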
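
For readers new to tf-idf, here is a minimal workspace sketch of the weighting itself. It uses only standard Pharo collections, not the Hapax API, and it is not necessarily how Hapax computes its weights. The token lists are hand-simplified versions of the four documents above (stopwords removed by hand, no stemming), and the names docs, termCounts, docFreq and tfIdf are made up for this illustration. The formula assumed is tf * ln(N / df).

```smalltalk
| docs termCounts docFreq tfIdf |

"Hand-simplified documents: one string of tokens per document (an assumption)."
docs := #('río danubio viena color azul'
          'caudal río invierno'
          'río rhin río danubio caudal'
          'río navegable caudal').

"Term frequencies: one Bag of tokens per document."
termCounts := docs collect: [:doc | doc substrings asBag].

"Document frequency: in how many documents each term occurs."
docFreq := Dictionary new.
termCounts do: [:bag |
    bag asSet do: [:term |
        docFreq at: term put: (docFreq at: term ifAbsent: [0]) + 1]].

"tf-idf weight of a term in the document at docIndex: tf * ln(N / df)."
tfIdf := [:term :docIndex |
    ((termCounts at: docIndex) occurrencesOf: term)
        * (docs size / (docFreq at: term)) ln].

tfIdf value: 'río' value: 3.      "0.0, because 'río' occurs in every document"
tfIdf value: 'danubio' value: 1.  "1 * (4 / 2) ln, about 0.693"
```

A term that appears in every document gets weight zero, which is why 'río' does not help to rank these documents while 'danubio' does.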

Feel free to integrate this into any repository. If you want to add a language, just look at the methods whose selectors include "spanish".

Cheers,

Hernán
