Have you considered Apache Solr/Lucene? This may be a standard text classification problem for that technology.
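To make "text classification" concrete, here is a rough sketch of the bag-of-words idea from your first link, applied to URLs. It is plain Python rather than Solr/Lucene, it assumes the URLs have already been extracted from the RDF files, and the example URLs, category names, and the `tokenize` and `NaiveBayes` names are all made up for illustration. It uses a small multinomial Naive Bayes with add-one smoothing, which is only one of several classifiers you could try.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase word tokens (its bag of words)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of URLs
        self.vocab = set()                       # all tokens seen in training

    def train(self, url, category):
        tokens = tokenize(url)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, url):
        tokens = tokenize(url)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.word_counts[category].values())
            denom = total_tokens + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[category][t] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training data: (URL, category) pairs invented for this sketch.
clf = NaiveBayes()
clf.train("http://example.org/sports/football-scores", "sports")
clf.train("http://example.org/sports/tennis-results", "sports")
clf.train("http://example.org/tech/python-tutorial", "tech")
clf.train("http://example.org/tech/java-compiler-news", "tech")

print(clf.classify("http://example.com/football/results"))
print(clf.classify("http://example.com/java/tutorial"))
```

With real data you would train on many more labelled URLs and probably also use the text behind each URL, which is the kind of indexing and tokenizing that Lucene is built for.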

Sent from my iPhone

> On 28 Dec 2015, at 6:04 PM, Jonathan Camilleri <[email protected]> wrote:
>
> I am trying to come up with an algorithm that parses and creates a machine
> learning algorithm e.g. classifying URLs read from RDF files into categories.
>
> The examples I have found so far were a bit limiting so I am asking if there
> is any project that is worth mimicking. I have done some experiments with
> Eclipse but they were not very complete so far, I am now stuck at trying to
> understand what syntax to use to read particular parts of a UDF file.
>
> I have read tutorials at W3C as well, they appear to provide information on
> the file formats.
>
> Further reading
> 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> 2. http://nlp.stanford.edu/software/CRF-NER.shtml
>
> See attachments.
>
> --
> Jonathan Camilleri
>
> Mobile (MT): ++356 7982 7113
> E-mail: [email protected]
> Please consider your environmental responsibility before printing this e-mail.
>
> I usually reply to emails within 2 business days. If it's urgent, give me a
> call.
>
> <ics_5111_dataset.zip>
> <assignment-reading the udf.docx>
