Hi Jonathan,

I have built Jena on Windows a few times, but never used Windows to run Jena 
(though I think I once started Fuseki 1 on Windows). I believe it should work, 
though. I don't know if you have a deadline for your project, but even so you 
may find it useful to spend some time going through Jena's documentation - 
http://jena.apache.org/

Maybe what you are looking for is Fuseki? It provides a web layer and a SPARQL 
endpoint on top of Jena (on the downloads page, click the link for Fuseki, not 
the one for Jena). Then take a look at 
http://jena.apache.org/documentation/fuseki2/index.html

Spend some time reading about RDF, reification, etc., even if you already know 
these topics, as the documentation may explain more about how Jena works and 
uses these concepts.

Finally, on parsing UDFs: if I understand correctly, you are trying to apply 
an NLP algorithm to the URLs used to identify resources in RDF.

If that's the case, and if you want to use BoW or NER (CRF, MaxEnt, etc.), you 
would probably consider only a part of the URL. For example, for 
http://niwa.co.nz/tax#galaxias_aff_divergens_northern, you would extract just 
galaxias_aff_divergens_northern, getting "galaxias aff divergens northern" 
(which comes from Galaxias aff. divergens 'northern' - 
https://tad.niwa.co.nz/trs#trs/1727994/Galaxias aff. divergens 
'northern'/summary, FWIW). Or you could implement a simple tokenizer that 
includes the domain as well...
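A minimal sketch of that kind of tokenizer (the URL is the one from your 
example; the function name and the splitting rules are just my assumptions - 
here I simply take the fragment after '#', or the last path segment, and split 
on underscores):

```python
from urllib.parse import urlparse

def tokenize_url(url, include_domain=False):
    """Split an RDF resource URL into lowercase word tokens."""
    parsed = urlparse(url)
    # The fragment (after '#') usually carries the human-readable name;
    # fall back to the last path segment when there is no fragment.
    text = parsed.fragment or parsed.path.rsplit("/", 1)[-1]
    tokens = [t for t in text.replace("-", "_").split("_") if t]
    if include_domain:
        tokens = parsed.netloc.split(".") + tokens
    return [t.lower() for t in tokens]

print(tokenize_url("http://niwa.co.nz/tax#galaxias_aff_divergens_northern"))
# ['galaxias', 'aff', 'divergens', 'northern']
```

With include_domain=True the same call would also prepend ['niwa', 'co', 'nz'].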

If you used BoW, you could apply, for instance, cosine distance to find URLs 
that look similar. If you decide to use an NER classifier, you may need a 
bigger corpus (or several different corpora, depending on your data) to 
correctly classify the URLs. I'm not sure how well that would work for your 
assignment; BoW is probably the simplest approach.
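To make the BoW idea concrete, here is a small sketch of cosine similarity 
over two bags of words, assuming the URLs have already been turned into token 
lists (the function name is mine, not from any library):

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the shared vocabulary.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity(["galaxias", "aff", "divergens", "northern"],
                        ["galaxias", "divergens", "southern"]))
```

Identical token lists score 1.0 and disjoint ones 0.0, so ranking pairs of 
URLs by this score gives you the "URLs that look similar".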

Hope that helps.

Bruno
      From: Jonathan Camilleri <[email protected]>
 To: [email protected] 
 Sent: Monday, 28 December 2015 9:05 PM
 Subject: Re: Parsing UDFs...
  
I also need help figuring out whether Apache Jena can be installed on
Windows. I have not quite managed to find a suitable installation guide
that explains how to start up and stop the service, or something of the
sort; I just downloaded the bundle of files and realized that they can be
uncompressed.

On 28 December 2015 at 09:04, Jonathan Camilleri <[email protected]>
wrote:

> I am trying to come up with an algorithm that parses RDF files and feeds a
> machine learning algorithm, e.g. one classifying the URLs read from those
> files into categories.
>
> The examples I have found so far were a bit limiting, so I am asking if
> there is any project worth mimicking.  I have done some experiments with
> Eclipse, but they were not very complete so far; I am now stuck trying to
> understand what syntax to use to read particular parts of a UDF file.
>
> I have read tutorials at W3C as well, they appear to provide information
> on the file formats.
>
> Further reading
> 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> 2. http://nlp.stanford.edu/software/CRF-NER.shtml
>
> See attachments.
>
> --
> Jonathan Camilleri
>
> Mobile (MT): ++356 7982 7113
> E-mail: [email protected]
> Please consider your environmental responsibility before printing this
> e-mail.
>
> I usually reply to emails within 2 business days.  If it's urgent, give me
> a call.
>
>


