Re: Semantic document format... standards?

2012-09-12 Thread Michael Della Bitta
Actually, at my company we do a lot of NLP work, and we've ended up using bespoke formats: formerly a FeatureStructure serialized to JSON, and most recently protobufs. Possibly not the answer you were looking for, Otis, but at least it's a data point. Michael Della Bitta

Re: Semantic document format... standards?

2012-09-12 Thread Alexandre Rafalovitch
Otis, If you are doing Named Entity Recognition, you may want to look at the research area concerned with Named Entity Recognition. :-) In general, there is inline markup and standoff markup. You seem to be going for standoff/stand-alone markup. I am not clear though whether it is just 'discovery'
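
To make the inline versus standoff distinction concrete, here is a minimal Python sketch. The sample sentence, the entity labels, and the names `Annotation` and `to_inline` are hypothetical and chosen only for illustration; they are not from any particular NER toolkit. Standoff annotations are kept as character offsets apart from the untouched text, and the inline form is derived from them on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: offsets into the text, stored separately from it."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical entity type, e.g. "PERSON"


def to_inline(text, annotations):
    """Render standoff annotations as inline XML-like tags.

    This sketch does not escape markup characters in the text and
    rejects overlapping spans, which inline markup cannot express directly.
    """
    out = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < pos:
            raise ValueError("overlapping annotations cannot be inlined by this sketch")
        out.append(text[pos:ann.start])
        out.append(f"<{ann.label}>{text[ann.start:ann.end]}</{ann.label}>")
        pos = ann.end
    out.append(text[pos:])
    return "".join(out)


# Hypothetical example document and annotations.
text = "Otis works on Solr in New York."
standoff = [
    Annotation(0, 4, "PERSON"),
    Annotation(14, 18, "PRODUCT"),
    Annotation(22, 30, "LOCATION"),
]

inline = to_inline(text, standoff)
print(inline)
```

One reason to prefer the standoff form is that the source text stays byte-for-byte unchanged, so several annotation layers (entities, topics, key phrases) can point into the same document, including spans that overlap.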

Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
is Gospodnetic > Sent: Tuesday, September 11, 2012 11:51 AM > To: solr-user@lucene.apache.org > Subject: Semantic document format... standards? > > Hello, > > If I'm extracting named entities, topics, key phrases/tags, etc. from > documents and I want to have a representation

Re: Semantic document format... standards?

2012-09-11 Thread Jack Krupansky
ssage- From: Otis Gospodnetic Sent: Tuesday, September 11, 2012 11:51 AM To: solr-user@lucene.apache.org Subject: Semantic document format... standards? Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this docum

Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
As Michael hinted, I believe RDF would be the de facto answer. Within it, things such as OWL or SKOS certainly represent classical formats. Processors such as OWLAPI can go pretty far there. As Péter hinted, schema.org might provide a way to complement an existing XML document with semantic information.

Re: Semantic document format... standards?

2012-09-11 Thread Péter Király
Hi, I guess the most common format today is schema.org's ontologies. It provides a couple of definitions, and it is supported by big players such as Google, Yahoo, and Microsoft. See http://schema.org/. Hope it helps, Péter wrote: > > Hello, > > > > If I'm extracting named entities, top

Re: Semantic document format... standards?

2012-09-11 Thread Michael Della Bitta
I'm probably a little unclear about the breadth of what you want to do, but I would recommend Dublin Core (DC) at the extremely lightweight end, and TEI (Text Encoding Initiative) at the very heavyweight end. Perhaps you could come up with a mashup of DC and your own fields in RDF as well. Michael Della Bitta
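
As a rough illustration of the "DC plus your own fields in RDF" idea, the Python sketch below writes a small Turtle document by hand. The `dcterms:` prefix is the standard Dublin Core terms namespace; everything else is an assumption made for the example: the `ex:` namespace, the property names `namedEntity` and `keyPhrase`, the document URI, and the helper names. A real project would more likely use an RDF library than string formatting.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace
EX = "http://example.org/nlp#"         # hypothetical namespace for custom fields


def turtle_literal(value):
    """Quote a plain string as a Turtle literal, escaping the characters that need it."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_turtle(doc_uri, dc_fields, custom_fields):
    """Serialize (name, value) pairs as Turtle statements about one document.

    dc_fields use the dcterms: prefix; custom_fields use the hypothetical ex: prefix.
    Repeated names are allowed, which gives multi-valued properties.
    """
    pairs = [(f"dcterms:{name}", value) for name, value in dc_fields]
    pairs += [(f"ex:{name}", value) for name, value in custom_fields]
    if not pairs:
        raise ValueError("at least one field is required")
    body = " ;\n".join(f"    {pred} {turtle_literal(value)}" for pred, value in pairs)
    return "\n".join(
        [
            f"@prefix dcterms: <{DCTERMS}> .",
            f"@prefix ex: <{EX}> .",
            "",
            f"<{doc_uri}>",
            f"{body} .",
        ]
    )


# Hypothetical document with two Dublin Core fields and three extracted features.
turtle = to_turtle(
    "http://example.org/docs/1",
    dc_fields=[("title", "Semantic formats"), ("creator", "Otis")],
    custom_fields=[
        ("namedEntity", "Solr"),
        ("namedEntity", "New York"),
        ("keyPhrase", "semantic document format"),
    ],
)
print(turtle)
```

Keeping the descriptive metadata in a shared vocabulary and the extracted features in a private namespace means generic RDF tools can still read the title and creator, while the custom fields remain free to change.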