Hi Jonathan, I have built Jena on Windows a few times, but have never actually run Jena there (though I think I once started Fuseki 1 on Windows). I believe it should work, though. I don't know if you have a deadline for your project, but even so you may find it useful to spend some time going through Jena's documentation: http://jena.apache.org/
Maybe what you are looking for is Fuseki? It provides a web layer and a SPARQL endpoint on top of Jena (on the downloads page, click the link for Fuseki, not the one for Jena). Then take a look at http://jena.apache.org/documentation/fuseki2/index.html

Spend some time reading about RDF, reification, etc., even if you already know the topics, as these notes may explain more about how Jena works and uses these concepts.

Finally, on parsing UDFs: if I understand correctly, you are trying to apply an NLP algorithm to the URLs used to identify resources in RDF. If that's the case, and if you want to use BoW or NER (CRF, MaxEnt, etc.), you would probably consider only part of the URL. For example, for http://niwa.co.nz/tax#galaxias_aff_divergens_northern, you would extract just galaxias_aff_divergens_northern, getting "galaxias aff divergens northern" (which comes from Galaxias aff. divergens 'northern' - https://tad.niwa.co.nz/trs#trs/1727994/Galaxias aff. divergens 'northern'/summary, FWIW). Or you could implement a simple tokenizer that includes the domain as well...

If you used BoW, you could apply, for instance, cosine distance to find URLs that look similar. If you decide to use a NER classifier, you may need a bigger corpus (or many different corpora, depending on your data) to classify the URLs correctly. I'm not sure that would work well for your assignment; BoW is probably the simplest approach.

Hope that helps.
Bruno

From: Jonathan Camilleri <[email protected]>
To: [email protected]
Sent: Monday, 28 December 2015 9:05 PM
Subject: Re: Parsing UDFs...

I also need help figuring out whether Apache Jena can be installed on Windows. I have not yet managed to find a suitable installation guide that explains how to start up and stop the service, or something of the sort; I just downloaded the bunch of files and realized that they can be uncompressed.
On 28 December 2015 at 09:04, Jonathan Camilleri <[email protected]> wrote:
> I am trying to come up with an algorithm that parses and creates a machine
> learning algorithm e.g. classifying URLs read from RDF files into
> categories.
>
> The examples I have found so far were a bit limiting so I am asking if
> there is any project that is worth mimicking. I have done some experiments
> with Eclipse but they were not very complete so far, I am now stuck at
> trying to understand what syntax to use to read particular parts of a UDF
> file.
>
> I have read tutorials at W3C as well, they appear to provide information
> on the file formats.
>
> Further reading
> 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> 2. http://nlp.stanford.edu/software/CRF-NER.shtml
>
> See attachments.
>
> --
> Jonathan Camilleri
>
> Mobile (MT): ++356 7982 7113
> E-mail: [email protected]
> Please consider your environmental responsibility before printing this
> e-mail.
>
> I usually reply to emails within 2 business days. If it's urgent, give me
> a call.
>
>

--
Jonathan Camilleri

Mobile (MT): ++356 7982 7113
E-mail: [email protected]
Please consider your environmental responsibility before printing this e-mail.

I usually reply to emails within 2 business days. If it's urgent, give me a call.
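
For what it's worth, the BoW suggestion in the reply above (tokenize the local part of each URL, then compare the resulting bags of words with a cosine measure) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Jena or from anyone in this thread: the function names `url_tokens` and `cosine_similarity` are made up, the code uses only the Python standard library, and the second URL in the usage lines is hypothetical. It computes cosine similarity; the cosine distance mentioned above is simply one minus that value.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def url_tokens(url, include_domain=False):
    """Split the local part of a URL into lowercase word tokens.

    The local part is the fragment after '#', or the last path segment
    when there is no fragment. Both names here are illustrative only.
    """
    parsed = urlparse(url)
    local = parsed.fragment or parsed.path.rstrip("/").rsplit("/", 1)[-1]
    # Break on anything that is not a letter or digit (underscores, dots, ...).
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", local.lower()) if t]
    if include_domain:
        # Optionally add the host name parts, e.g. "niwa", "co", "nz".
        tokens += [t for t in parsed.netloc.lower().split(".") if t]
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (token count vectors)."""
    bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = "http://niwa.co.nz/tax#galaxias_aff_divergens_northern"
    second = "http://niwa.co.nz/tax#galaxias_divergens"  # hypothetical URL
    print(url_tokens(first))
    print(cosine_similarity(url_tokens(first), url_tokens(second)))
```

Passing `include_domain=True` corresponds to the "tokenizer that includes the domain" variant; whether that helps or just makes every URL from the same site look alike would depend on the data.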
