Workflow for loading RDF graph data into Giraph
-----------------------------------------------

                 Key: GIRAPH-170
                 URL: https://issues.apache.org/jira/browse/GIRAPH-170
             Project: Giraph
          Issue Type: New Feature
            Reporter: Dan Brickley
            Priority: Minor


W3C RDF provides a family of Web standards for exchanging graph-based data. RDF 
uses sets of simple binary relationships, labeling nodes and links with Web 
identifiers (URIs). Many public datasets are available as RDF, including the 
"Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many such 
datasets are listed at http://thedatahub.org/

RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
language is Turtle. Apache Jena and Any23 provide software to handle all of 
these: http://incubator.apache.org/jena/ and http://incubator.apache.org/any23/
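
To illustrate why N-Triples is the easiest of these syntaxes to feed into a 
line-oriented pipeline, here is a minimal sketch of a per-line parser. It is an 
assumption-laden simplification, not the full N-Triples grammar: escape 
sequences are not decoded, and the function name parse_ntriples_line is 
hypothetical. A real loader would more likely lean on Jena or Any23.

```python
import re

# Simplified term patterns (NOT the complete N-Triples grammar):
#   subject   : <URI> or _:blankNode
#   predicate : <URI>
#   object    : <URI>, _:blankNode, or "literal" with optional @lang / ^^<datatype>
_SUBJECT = r'(<[^>]*>|_:\S+)'
_PREDICATE = r'(<[^>]*>)'
_OBJECT = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
_TRIPLE = re.compile(
    r'^\s*' + _SUBJECT + r'\s+' + _PREDICATE + r'\s+' + _OBJECT + r'\s*\.\s*$'
)


def parse_ntriples_line(line):
    """Return (subject, predicate, object) as raw strings, or None.

    None is returned for blank lines, comment lines, and anything this
    simplified pattern does not recognise.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = _TRIPLE.match(stripped)
    if match is None:
        return None
    return match.groups()
```

Because each triple sits on its own line, a file in this format can be split 
across workers without any cross-line parsing state.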

This JIRA leaves open the strategy for loading RDF data into Giraph. There are 
various possibilities, including exploitation of intermediate Hadoop-friendly 
stores, or pre-processing with e.g. Pig-based tools into a more Giraph-friendly 
form, or writing custom loaders. Even a HOWTO document or implementor notes 
here would be an advance on the current state of the art. The BluePrints Graph 
API (Gremlin etc.) has also been aligned with various RDF datasources.
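
As one possible reading of the "pre-processing into a more Giraph-friendly 
form" option, the sketch below maps node identifiers to integers and emits one 
tab-separated adjacency line per vertex. The output layout (vertex id followed 
by its target ids) and the function name triples_to_adjacency_lines are 
assumptions made for illustration; this is not a format Giraph is known to 
read out of the box, and edge labels are deliberately dropped here.

```python
def triples_to_adjacency_lines(triples):
    """Turn (subject, predicate, object) string triples into adjacency lines.

    Returns (ids, lines) where ids maps each node term to an integer and
    lines holds "vertex<TAB>target<TAB>target..." strings, one per vertex.
    Triples whose object is a literal (starts with a double quote) are
    skipped, since a literal is a value rather than a graph node.
    """
    ids = {}

    def node_id(term):
        # Assign ids in order of first appearance.
        if term not in ids:
            ids[term] = len(ids)
        return ids[term]

    edges = {}
    for subject, _predicate, obj in triples:
        if obj.startswith('"'):
            continue
        source = node_id(subject)
        target = node_id(obj)
        edges.setdefault(source, []).append(target)
        # Make sure vertices with no outgoing edges still get a line.
        edges.setdefault(target, [])

    lines = []
    for vertex in sorted(edges):
        fields = [vertex] + edges[vertex]
        lines.append('\t'.join(str(field) for field in fields))
    return ids, lines
```

On a large dataset the id assignment would itself need to be a distributed 
step (for example in Pig or MapReduce); the in-memory dictionary here only 
shows the shape of the transformation.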

Related topics: the multigraph issue, 
https://issues.apache.org/jira/browse/GIRAPH-141, is relevant here. Giraph 
cannot currently represent fully general RDF graphs easily, because two nodes 
might be connected by more than one typed edge. Even without multigraphs it 
ought to be possible to bring RDF-sourced data into Giraph; for example, an 
application may only be interested in the Movies + People subset of a big RDF 
collection.
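
One way to work around the missing multigraph support, sketched under stated 
assumptions: keep only the predicates an application cares about, then merge 
parallel typed edges between the same pair of nodes into a single edge whose 
value is the list of predicate labels. The function name collapse_typed_edges 
and the choice of a label list as the edge value are hypothetical, not 
something GIRAPH-141 prescribes.

```python
def collapse_typed_edges(triples, keep_predicates):
    """Filter triples to a predicate subset and merge parallel edges.

    Returns a dict mapping (subject, object) to a sorted list of the
    predicates that connect that pair, so each node pair appears once
    even when the RDF data links it with several typed edges.
    """
    merged = {}
    for subject, predicate, obj in triples:
        if predicate not in keep_predicates:
            continue
        merged.setdefault((subject, obj), set()).add(predicate)
    return {pair: sorted(labels) for pair, labels in merged.items()}
```

For instance, a person who both acted in and directed the same film ends up 
with one edge to that film carrying two labels, rather than two edges.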

From Avery in email: "a helper VertexInputFormat (and maybe 
VertexOutputFormat) would certainly [despite GIRAPH-141] still help"



