As Irene said, http://esw.w3.org/topic/ConverterToRdf is the best place
to start, but I thought I'd ramble a bit about some of the broader issues.
If the data to convert is in a file, as opposed to being delivered from
a server with an interface that you can write to (as D2RQ and OpenLink
do for relational data), then the first step is to parse the input, so
tools will be built around parsers for each input format.
Any modern programming language can parse CSV easily, and most tools
that advertise the ability to convert spreadsheets to RDF actually
expect CSV input. (TopQuadrant's tools can read binary Excel files. Full
disclosure: I work for them.)
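To make the CSV case concrete, here is a minimal sketch using only Python's standard library. It assumes one column holds a unique key for each row, and it uses a made-up base namespace (http://example.org/) to mint subject and predicate URIs from that key and from the column headers. Each non-empty cell becomes a literal object in N-Triples output. Real converters let you map columns to existing vocabularies instead of inventing predicates like this.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base namespace; substitute URIs you actually control.
BASE = "http://example.org/"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str, key_column: str) -> list[str]:
    """Map each CSV row to triples: the key cell names the subject,
    each other column header becomes a predicate, and each non-empty
    cell becomes a literal object."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{quote(row[key_column])}>"
        for column, value in row.items():
            # Skip extra cells beyond the header (key None), the key
            # column itself, and missing or empty cells.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}prop/{quote(column)}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


if __name__ == "__main__":
    sample = "id,name,city\n1,Alice,Boston\n2,Bob,\n"
    for triple in csv_to_ntriples(sample, "id"):
        print(triple)
```

Bob's empty city cell produces no triple. Whether a blank cell should mean "no statement" is one of the application-specific decisions a real mapping has to make.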
When your input is XML (which can include HTML if you use TagSoup or
Tidy to clean it up), XSLT is a popular way to create triples. This is
the principle behind GRDDL
(http://www.w3.org/2004/01/rdxh/spec).
TopQuadrant also has a more general-purpose XML-to-RDF converter that
takes the structure of the input document into account so that it can
round-trip the RDF back to XML.
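The kind of element-to-triple mapping that an XSLT stylesheet would encode can be sketched in Python with the standard library's ElementTree. This is not GRDDL and not XSLT, just an illustration of the same idea. The sketch makes several assumptions: each record element carries an `id` attribute, each child element's tag becomes a predicate, and the base namespace is made up. Namespaced tags are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical base namespace; substitute URIs you actually control.
BASE = "http://example.org/"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def xml_to_ntriples(xml_text: str, record_tag: str, id_attr: str = "id") -> list[str]:
    """Treat each <record_tag> element that has an id attribute as a
    subject, and each child element with text as a predicate/literal pair."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter(record_tag):
        record_id = record.get(id_attr)
        if record_id is None:
            continue  # no way to mint a subject URI for this record
        subject = f"<{BASE}{record_tag}/{quote(record_id)}>"
        for child in record:
            text = (child.text or "").strip()
            if text:
                predicate = f"<{BASE}prop/{quote(child.tag)}>"
                triples.append(f"{subject} {predicate} {literal(text)} .")
    return triples


if __name__ == "__main__":
    doc = """<catalog>
      <book id="b1"><title>Sample Title</title><year>2009</year></book>
      <book><title>No id, so skipped</title></book>
    </catalog>"""
    for triple in xml_to_ntriples(doc, "book"):
        print(triple)
```

The sketch keeps none of the document's ordering or nesting, so unlike a converter built for round-tripping, it couldn't regenerate the original XML from its output.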
With plain text, something needs to identify structure within the text
so that it can work out what the subjects, predicates, and objects are,
and that structure depends on the needs of the application. (That
actually applies to CSV and XML as well, but commas and tags give you
more to go on if you understand the purpose of the input data.) Semweb
meetups are seeing more interest from the Natural Language Processing
community--I think the NYC semweb meetup actually has a subgroup of
people dedicated to NLP issues--so there could be more interesting work
coming from them in the future. The best-known tool I can think of that
takes plain text as input and returns it with embedded RDF is Thomson
Reuters Calais.
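To show what "identifying structure" in plain text means at the smallest possible scale, here is a toy, pattern-based extractor. It is far cruder than NLP tools like Calais and is not how they work. It assumes a single hypothetical sentence pattern ("X works for Y"), treats runs of capitalized words as entity names, and uses a made-up namespace and `worksFor` predicate.

```python
import re

# Hypothetical base namespace; substitute URIs you actually control.
BASE = "http://example.org/"

# Toy pattern for sentences like "Alice Smith works for Example Corp".
# Runs of capitalized words stand in for real entity recognition.
WORKS_FOR = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) works for ([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def to_uri(kind: str, name: str) -> str:
    """Mint a URI from an entity name by replacing spaces with underscores."""
    return f"<{BASE}{kind}/{name.replace(' ', '_')}>"


def extract_triples(text: str) -> list[str]:
    """Emit one worksFor triple for each pattern match in the text."""
    triples = []
    for person, org in WORKS_FOR.findall(text):
        triples.append(
            f"{to_uri('person', person)} <{BASE}prop/worksFor> {to_uri('org', org)} ."
        )
    return triples


if __name__ == "__main__":
    text = "Alice Smith works for Example Corp. The weather was nice."
    for triple in extract_triples(text):
        print(triple)
```

Every decision here is a guess about what the application cares about: which relationships to look for, what counts as an entity, and how to name it. That's the point of the paragraph above. Statistical NLP replaces the hand-written pattern, but those decisions don't go away.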
Bob
Alasdair Logan wrote:
Hey all,
I was wondering if anyone is familiar with tools to convert data into RDF triples and Linked Data. They can be for any data format, e.g. XML, CSV, plain text, etc.
I'm doing this as part of a pilot study for my Master's project, so I'm just
trying to get a general view of the tools in use.
Thanks in advance
Ally