Actually, at my company we do a lot of NLP work, and we've ended up
using bespoke formats: formerly a FeatureStructure serialized to JSON,
and most recently protobufs. Possibly not the answer you were
looking for, Otis, but at least it's a datapoint.
Michael Della Bitta
Otis,
If you are doing Named Entity Recognition, you may want to look at the
research area concerned with Named Entity Recognition. :-) In general,
there is inline markup and standoff markup. You seem to be going for
standoff/stand-alone markup. I am not clear though whether it is just
'discovery'
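To make the inline vs. standoff distinction concrete, here is a minimal sketch in Python. The sample sentence, the labels, and the helper names (`span`, `to_inline`) are all made up for illustration; nothing here comes from a particular standard. Standoff markup keeps the text untouched and stores annotations as character offsets next to it, while inline markup embeds the tags in the text itself.

```python
text = "Otis Gospodnetic posted to solr-user in 2012."

def span(text, phrase, label):
    """Build one standoff annotation: (start, end, label) offsets into text."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

# Standoff form: the text is left as-is, annotations live beside it.
annotations = [
    span(text, "Otis Gospodnetic", "PERSON"),
    span(text, "solr-user", "MAILING_LIST"),
]

def to_inline(text, annotations):
    """Render non-overlapping standoff annotations as inline XML-like tags."""
    out = []
    pos = 0
    for start, end, label in sorted(annotations):
        out.append(text[pos:start])                      # untouched text
        out.append(f"<{label}>{text[start:end]}</{label}>")  # tagged span
        pos = end
    out.append(text[pos:])
    return "".join(out)

inline = to_inline(text, annotations)
print(annotations)
print(inline)
```

Standoff annotations can overlap and can be layered by different tools without touching the source text, which is usually why extraction pipelines prefer them; this sketch's `to_inline` only handles the non-overlapping case.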
> From: Otis Gospodnetic
> Sent: Tuesday, September 11, 2012 11:51 AM
> To: solr-user@lucene.apache.org
> Subject: Semantic document format... standards?
>
> Hello,
>
> If I'm extracting named entities, topics, key phrases/tags, etc. from
> documents and I want to have a representation
As Michael hinted, I believe RDF would be the de facto answer.
Within it, things such as OWL or SKOS certainly represent classical formats.
Processors such as OWLAPI can go pretty far there.
As Péter hinted, schema.org might provide a way to complement an existing XML
with semantic information.
Hi,
I guess the most common approach today is to use the schema.org vocabulary.
It provides a set of type and property definitions, and it is supported by
big players such as Google, Yahoo, and Microsoft. See http://schema.org/.
Hope it helps,
Péter
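As a rough illustration of the schema.org suggestion, extracted entities and tags could be attached to a document along these lines. This is only a sketch: the JSON-LD serialization, the choice of `Article` as the type, the example URL, and the `to_schema_org` helper are assumptions of mine, not something the thread specifies. The `mentions` and `keywords` properties are regular schema.org properties of CreativeWork.

```python
import json

def to_schema_org(url, title, entities, keywords):
    """Describe a document and its extracted entities with schema.org terms.

    entities is a list of (name, schema.org type) pairs, e.g. ("Google", "Organization").
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "url": url,
        "headline": title,
        "keywords": ", ".join(keywords),
        "mentions": [
            {"@type": entity_type, "name": name}
            for name, entity_type in entities
        ],
    }

doc = to_schema_org(
    "http://example.org/docs/1",  # hypothetical document URL
    "Semantic document formats",
    [("Otis Gospodnetic", "Person"), ("Google", "Organization")],
    ["semantic web", "named entities"],
)
jsonld = json.dumps(doc, indent=2)
print(jsonld)
```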
I'm probably a little unclear about the breadth of what you want to
do, but I would recommend Dublin Core (DC) at the extremely lightweight
end, and TEI at the very heavyweight end. Perhaps you could come up with
a mashup of DC and your own fields in RDF as well.
Michael Della Bitta
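A mashup of DC and custom fields in RDF might look roughly like the sketch below, which writes N-Triples by hand so that it needs no RDF library. The `http://example.org/nlp#` namespace, the `mentionsEntity` property, and the helper names are hypothetical; only the Dublin Core element namespace and its `title` and `subject` terms are real.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
EX = "http://example.org/nlp#"            # hypothetical namespace for our own fields

def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def to_ntriples(doc_uri, title, subjects, entities):
    """Return one N-Triples line per statement about the document."""
    lines = [f"<{doc_uri}> <{DC}title> {literal(title)} ."]
    for subject in subjects:
        lines.append(f"<{doc_uri}> <{DC}subject> {literal(subject)} .")
    for name in entities:
        # Custom field: an entity name found by the extractor.
        lines.append(f"<{doc_uri}> <{EX}mentionsEntity> {literal(name)} .")
    return lines

lines = to_ntriples(
    "http://example.org/docs/1",  # hypothetical document URI
    "Semantic document formats",
    ["semantic web"],
    ["Otis Gospodnetic"],
)
print("\n".join(lines))
```

In practice an RDF library would handle serialization and escaping; the hand-rolled version is only meant to show how DC terms and private fields can sit side by side on the same document URI.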
---