Can you provide more information on BoW? Is there some quick start guide? :)
Jon

On 28 December 2015 at 12:20, Bruno P. Kinoshita <[email protected]> wrote:

> Hi Jonathan;
>
> I have built Jena on Windows a few times, but never ran Jena there
> (though I think I once started Fuseki 1 on Windows). But I believe it
> should work. I don't know if you have a deadline for your project, but
> even then you may find it useful to spend some time going through
> Jena's documentation - http://jena.apache.org/
>
> Maybe what you are looking for is Fuseki? It provides a web layer and
> SPARQL endpoint using Jena (on the downloads page, click on the link
> for Fuseki, not for Jena). Then take a look at
> http://jena.apache.org/documentation/fuseki2/index.html
>
> Spend some time reading about RDF, Reification, etc., even if you
> already know about the topics, as these notes may explain more about
> how Jena works and uses these concepts.
>
> Finally, on parsing UDFs, if I understand correctly, you are trying to
> apply an NLP algorithm to the URLs used to identify resources in RDF.
>
> If that's the case, and if you want to use BoW or NER (CRF, MaxEnt,
> etc.), you would probably be considering only a part of the URL? For
> example, for http://niwa.co.nz/tax#galaxias_aff_divergens_northern you
> would extract just galaxias_aff_divergens_northern, getting "galaxias
> aff divergens northern" (which comes from Galaxias aff. divergens
> 'northern' -
> https://tad.niwa.co.nz/trs#trs/1727994/Galaxias aff. divergens 'northern'/summary,
> FWIW). Or you could implement a simple tokenizer that includes the
> domain as well...
>
> If you used BoW, you could apply, for instance, cosine distance and
> find URLs that look similar. If you decide to use a NER classifier, you
> may need a bigger corpus (or many different corpora, depending on your
> data) to correctly classify the URLs. Not sure if that would work well
> for your assignment; BoW is probably the simplest approach.
>
> Hope that helps.
> Bruno
>
> ------------------------------
> *From:* Jonathan Camilleri <[email protected]>
> *To:* [email protected]
> *Sent:* Monday, 28 December 2015 9:05 PM
> *Subject:* Re: Parsing UDFs...
>
> I also need help figuring out whether Apache Jena can be installed on
> Windows. I have not yet managed to find a suitable installation guide
> that explains how to start and stop the service, or something of the
> sort. I just downloaded the bunch of files and realized that they can
> be uncompressed.
>
> On 28 December 2015 at 09:04, Jonathan Camilleri <[email protected]>
> wrote:
>
> > I am trying to come up with an algorithm that parses RDF files and
> > applies machine learning to them, e.g. classifying URLs read from RDF
> > files into categories.
> >
> > The examples I have found so far were a bit limiting, so I am asking
> > if there is any project that is worth mimicking. I have done some
> > experiments with Eclipse, but they were not very complete so far. I am
> > now stuck at trying to understand what syntax to use to read
> > particular parts of a UDF file.
> >
> > I have read tutorials at W3C as well; they appear to provide
> > information on the file formats.
> >
> > Further reading
> > 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> > 2. http://nlp.stanford.edu/software/CRF-NER.shtml
> >
> > See attachments.
> >
> > --
> > Jonathan Camilleri
> >
> > Mobile (MT): ++356 7982 7113
> > E-mail: [email protected]
> > Please consider your environmental responsibility before printing this
> > e-mail.
> >
> > I usually reply to emails within 2 business days. If it's urgent, give
> > me a call.
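
On the BoW question: a minimal sketch of the approach described in the quoted reply, assuming Python rather than Jena's Java API. It takes the fragment of a resource URL, splits it into word tokens, counts them as a bag of words, and compares two bags with cosine similarity (cosine distance is 1 minus this value). The function names are hypothetical, and apart from the niwa.co.nz "northern" URL from the thread, the example URLs are made up for illustration.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize(url, include_domain=False):
    """Split a resource URL into lowercase word tokens.

    Uses the fragment after '#', or the last path segment if there is no
    fragment. With include_domain=True the host name is tokenized as well.
    """
    parts = urlparse(url)
    local = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    text = f"{parts.netloc} {local}" if include_domain else local
    # Split on anything that is not a letter or digit ('_', '.', '-', ...).
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def bag_of_words(tokens):
    """Count how often each token occurs (the bag-of-words vector)."""
    return Counter(tokens)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two bags; 0.0 if either bag is empty."""
    dot = sum(bag_a[tok] * bag_b[tok] for tok in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # The first URL is from the thread; the other two are hypothetical.
    northern = "http://niwa.co.nz/tax#galaxias_aff_divergens_northern"
    southern = "http://niwa.co.nz/tax#galaxias_aff_divergens_southern"
    trout = "http://example.org/tax#salmo_trutta"

    print(tokenize(northern))
    # ['galaxias', 'aff', 'divergens', 'northern']

    bags = {url: bag_of_words(tokenize(url)) for url in (northern, southern, trout)}
    print(cosine_similarity(bags[northern], bags[southern]))  # 0.75
    print(cosine_similarity(bags[northern], bags[trout]))     # 0.0
```

Raw counts are the simplest weighting; with many URLs, a TF-IDF weighting would down-weight tokens such as "aff" that appear almost everywhere, but that goes beyond what the thread discusses.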
