Can you provide more information on BoW?
Is there a quick start guide? :)

Jon

On 28 December 2015 at 12:20, Bruno P. Kinoshita <[email protected]> wrote:

> Hi Jonathan,
>
>
> I have built Jena on Windows a few times, but have never used Windows to
> run Jena (though I think I once started Fuseki 1 on Windows). Still, I
> believe it should work. I don't know whether you have a deadline for your
> project, but even so you may find it useful to spend some time going
> through Jena's documentation - http://jena.apache.org/
>
>
> Maybe what you are looking for is Fuseki? It provides a web layer and a
> SPARQL endpoint using Jena (on the downloads page, click the link for
> Fuseki, not the one for Jena). Then take a look at
> http://jena.apache.org/documentation/fuseki2/index.html
>
>
> Spend some time reading about RDF, reification, etc., even if you already
> know about these topics, as Jena's documentation on them may explain more
> about how Jena works and how it uses these concepts.
>
>
> Finally, on parsing UDFs: if I understand correctly, you are trying to
> apply an NLP algorithm to the URLs used to identify resources in RDF.
>
>
> If that's the case, and if you want to use BoW (bag of words) or NER
> (named-entity recognition: CRF, MaxEnt, etc.), you would probably consider
> only part of the URL. For example, for
> http://niwa.co.nz/tax#galaxias_aff_divergens_northern, you would
> extract just galaxias_aff_divergens_northern, getting "galaxias aff
> divergens northern" (which comes from Galaxias aff. divergens 'northern' -
> https://tad.niwa.co.nz/trs#trs/1727994/Galaxias aff. divergens 'northern'/summary,
> FWIW). Alternatively, you could implement a simple tokenizer that
> includes the domain as well...
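The extraction described above could look roughly like the following Python sketch, which uses only the standard library. It assumes the useful text is the URL fragment (the part after `#`) with words joined by underscores, hyphens or dots; `fragment_tokens` and its `include_domain` flag are hypothetical names for illustration, not part of Jena or any NLP library.

```python
import re
from typing import List
from urllib.parse import urlparse


def fragment_tokens(url: str, include_domain: bool = False) -> List[str]:
    """Split the local name of a URL into lower-case words.

    Hypothetical helper. Assumes words are separated by '_', '-' or '.'
    (camelCase names are NOT split). The local name is the fragment after
    '#', or the last path segment when the URL has no fragment.
    """
    parts = urlparse(url)
    local_name = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    tokens = [t.lower() for t in re.split(r"[_\-.]+", local_name) if t]
    if include_domain and parts.hostname:
        # Optionally prepend the host name, e.g. 'niwa.co.nz' -> niwa, co, nz.
        tokens = parts.hostname.split(".") + tokens
    return tokens


print(fragment_tokens("http://niwa.co.nz/tax#galaxias_aff_divergens_northern"))
# ['galaxias', 'aff', 'divergens', 'northern']
print(fragment_tokens("http://niwa.co.nz/tax#galaxias_aff_divergens_northern",
                      include_domain=True))
# ['niwa', 'co', 'nz', 'galaxias', 'aff', 'divergens', 'northern']
```

Whether to keep the domain tokens is a design choice: they help when URLs from the same source should group together, but they add noise if only the local names carry meaning.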
>
>
> If you used BoW, you could, for instance, apply cosine distance to find
> URLs that look similar. If you decide to use an NER classifier, you may
> need a bigger corpus (or several different corpora, depending on your data)
> to classify the URLs correctly. I'm not sure whether that would work well
> for your assignment; BoW is probably the simplest approach.
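To make the BoW idea concrete, here is a minimal Python sketch (standard library only). It assumes each URL is reduced to counts of its lower-case alphanumeric tokens, with no stemming, stop words or TF-IDF weighting; the helper names `bag_of_words`, `cosine_similarity` and `most_similar`, and the example URLs, are hypothetical. Cosine distance is simply 1 minus the similarity computed here.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(url: str) -> Counter:
    """Term counts for a URL (simplified: every run of letters/digits is a token,
    so scheme and domain words such as 'http', 'niwa', 'co', 'nz' count too)."""
    return Counter(re.findall(r"[a-z0-9]+", url.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (1.0 = same direction)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Rank candidate URLs by cosine similarity to the query URL, best first."""
    q = bag_of_words(query)
    scored = [(url, cosine_similarity(q, bag_of_words(url))) for url in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    query = "http://niwa.co.nz/tax#galaxias_aff_divergens_northern"
    candidates = ["http://example.org/fish#salmo_trutta",
                  "http://niwa.co.nz/tax#galaxias_divergens"]
    for url, score in most_similar(query, candidates):
        print(f"{score:.3f}  {url}")
    # 0.882  http://niwa.co.nz/tax#galaxias_divergens
    # 0.136  http://example.org/fish#salmo_trutta
```

Because the whole URL is tokenized, shared scheme and domain tokens also raise the score; restricting the bag to the fragment, as suggested above, would focus the comparison on the local names.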
>
>
> Hope that helps.
> Bruno
>
>
> ------------------------------
> *From:* Jonathan Camilleri <[email protected]>
> *To:* [email protected]
> *Sent:* Monday, 28 December 2015 9:05 PM
> *Subject:* Re: Parsing UDFs...
>
> I also need help figuring out whether Apache Jena can be installed on
> Windows. I have not yet managed to find a suitable installation guide that
> explains how to start up and stop the service, or something of the sort; I
> just downloaded the files and realized that they can be uncompressed.
>
> On 28 December 2015 at 09:04, Jonathan Camilleri <[email protected]>
> wrote:
>
> > I am trying to come up with an algorithm that parses files and builds a
> > machine learning model, e.g. one that classifies URLs read from RDF files
> > into categories.
> >
> > The examples I have found so far are a bit limited, so I am asking
> > whether there is any project worth emulating. I have done some
> > experiments with Eclipse, but they are not very complete so far; I am now
> > stuck trying to understand what syntax to use to read particular parts of
> > a UDF file.
> >
> > I have also read the W3C tutorials; they appear to provide information
> > on the file formats.
> >
> > Further reading
> > 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> > 2. http://nlp.stanford.edu/software/CRF-NER.shtml
> >
> > See attachments.
> >
> > --
> > Jonathan Camilleri
> >
> > Mobile (MT): ++356 7982 7113
> > E-mail: [email protected]
> > Please consider your environmental responsibility before printing this
> > e-mail.
> >
> > I usually reply to emails within 2 business days.  If it's urgent, give me
> > a call.
>

