Re: Strange behaviour of XMLLiterals in RDF/XML
On Mon, Jun 25, 2012 at 12:57 PM, Martynas Jusevičius marty...@graphity.org wrote: Both <br/> and <br></br> are well-formed and equivalent in the XML context, so why the difference in serialization? I'm using Jena 2.6.4 and ARQ 2.8.7. Martynas graphity.org

Back in the bad old days <br/> was not parsed correctly by some parsers and had to be written as <br /> (note the space). If you try that substitution, does it work? Claude -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com Identity: https://www.identify.nu/user.php?cla...@xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
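In code, Claude's substitution can be tried with a minimal Jena sketch like the following (the subject and property URIs are illustrative, not from the original thread):

    import com.hp.hpl.jena.rdf.model.*;
    import com.hp.hpl.jena.datatypes.xsd.impl.XMLLiteralType;

    public class XmlLiteralCheck {
        public static void main(String[] args) {
            Model m = ModelFactory.createDefaultModel();
            Resource s = m.createResource("http://example.org/s");
            Property p = m.createProperty("http://example.org/p");
            // an rdf:XMLLiteral-typed literal; swap "<br />" for "<br/>" to compare serializations
            m.add(s, p, m.createTypedLiteral("<br />", XMLLiteralType.theXMLLiteralType));
            m.write(System.out, "RDF/XML-ABBREV");
        }
    }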
Re: Can SDB 1.3.4 be used with Jena 2.7.1?
Have you tried TDB? I think it's currently more actively developed and more performant. On Jun 26, 2012 1:55 AM, Holger Knublauch hol...@knublauch.com wrote: Yes this is my hope, assuming SDB still works for us. Holger On 6/26/2012 9:46, Martynas Jusevičius wrote: Holger, does that also mean a new release of SPIN API which will be packaged with the latest Jena? Martynas On Tue, Jun 26, 2012 at 1:33 AM, Holger Knublauch hol...@knublauch.com wrote: We are now starting the process of upgrading our platform to the latest Jena version(s). I noticed that SDB has not been released yet as an Apache module. Question: is it safe to use SDB 1.3.4 in conjunction with Jena 2.7.1? Apologies if this has been asked before. Thanks Holger
Re: Can SDB 1.3.4 be used with Jena 2.7.1?
Yes sure, but this doesn't help with our existing customer base of SDB users. A good reason for using SDB is the use of standard tools to back-up your data etc. Holger On 6/26/2012 17:41, Martynas Jusevičius wrote: Have you tried TDB? I think it's currently more actively developed and more performant. On Jun 26, 2012 1:55 AM, Holger Knublauch hol...@knublauch.com wrote: Yes this is my hope, assuming SDB still works for us. Holger On 6/26/2012 9:46, Martynas Jusevičius wrote: Holger, does that also mean a new release of SPIN API which will be packaged with the latest Jena? Martynas On Tue, Jun 26, 2012 at 1:33 AM, Holger Knublauch hol...@knublauch.com wrote: We are now starting the process of upgrading our platform to the latest Jena version(s). I noticed that SDB has not been released yet as an Apache module. Question: is it safe to use SDB 1.3.4 in conjunction with Jena 2.7.1? Apologies if this has been asked before. Thanks Holger
Re: Reading JSON from Virtuoso OpenSource output
#include everything Rob says about CONSTRUCT queries.

1/ But also the JSON result set parser has a bug in it - it is reading the link field as a string, but it should be an array. This is now fixed in SVN. The development snapshot build has the fix in it. https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena/

2/ But I think the data has problems as well. For example:

    "s": { "type": "uri", "value": "_:vb43419" }

That looks like it is meant to be a bNode, not a URI. It is illegal as a URI because there is no "_" URI scheme name, and scheme names are only letters, digits, plus ("+"), period ("."), or hyphen ("-"). It should be:

    "s": { "type": "bnode", "value": "_:vb43419" }

Andy

PS For testing, ARQ has a command line tool arq.rset for reading and writing result sets.

On 26/06/12 00:35, Rob Vesse wrote: Hi Lorena JenaReaderRdfJson is for reading a JSON serialization of RDF. The serialization you are trying to read is the JSON serialization of SPARQL results, which is completely different. I notice you say that you use a CONSTRUCT query, but the results you show are in the SPARQL Results JSON format, which should only be used for ASK/SELECT queries. If Virtuoso is replying with that to your CONSTRUCT query then it is behaving incorrectly and you should report a bug to them. If you genuinely expect SPARQL results instead then use ResultSetFactory.fromJSON() which will give you a ResultSet object. Rob

On 6/25/12 3:14 PM, lorena lore...@fing.edu.uy wrote: Hi: I'm trying to process the results of performing a CONSTRUCT query on Virtuoso using apache-jena-2.7.0-incubating. [1] shows the JSON string I would like to read (schemaStr). Here is an extract of my code:

    SysRIOT.wireIntoJena();
    Model modelSchema = ModelFactory.createDefaultModel();
    RDFReader schemaReader = new JenaReaderRdfJson();
    StringReader s = new StringReader(schemaStr);
    schemaReader.read(modelSchema, s, "");

And I receive the following exception, caused in the line that executes the read:

    com.hp.hpl.jena.shared.JenaException: org.openjena.riot.RiotException: [line: 2, col: 3 ] Relative IRI: head
    at org.openjena.riot.system.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:150)
    at org.openjena.riot.system.JenaReaderRIOT.read(JenaReaderRIOT.java:54)

It seems to have trouble reading the head section. My questions: Is Virtuoso JSON output compatible with what JenaReaderRdfJson expects to read? Am I missing something else? I'm using the empty string ("") as the base URI in the read method, but I don't understand what the read method is expecting in this field.
Thanks in advance Lorena

[1]
    { "head": { "link": [], "vars": [ "s", "p", "o" ] },
      "results": { "distinct": false, "ordered": true,
        "bindings": [
          { "s": { "type": "uri", "value": "_:vb43419" },
            "p": { "type": "uri", "value": "http://purl.org/olap#hasAggregateFunction" },
            "o": { "type": "uri", "value": "http://purl.org/olap#sum" } },
          { "s": { "type": "uri", "value": "_:vb43418" },
            "p": { "type": "uri", "value": "http://purl.org/olap#level" },
            "o": { "type": "uri", "value": "http://example.org/householdCS#year" } },
          { "s": { "type": "uri", "value": "http://example.org/householdCS#household_withoutGeo" },
            "p": { "type": "uri", "value": "http://purl.org/linked-data/cube#component" },
            "o": { "type": "uri", "value": "_:vb43418" } },
          { "s": { "type": "uri", "value": "http://example.org/householdCS#household_withoutGeo" },
            "p": { "type": "uri", "value": "http://purl.org/linked-data/cube#component" },
            "o": { "type": "uri", "value": "_:vb43419" } },
          { "s": { "type": "uri", "value": "_:vb43419" },
            "p": { "type": "uri", "value": "http://purl.org/linked-data/cube#measure" },
            "o": { "type": "uri", "value": "http://example.org/householdCS#household" } },
          { "s": { "type": "uri", "value": "http://example.org/householdCS#householdCS" },
            "p": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" },
            "o": { "type": "uri", "value": "http://purl.org/linked-data/cube#DataStructureDefinition" } }
        ] } }
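A minimal sketch of Rob's suggestion - parsing the payload in [1] as SPARQL results JSON via ResultSetFactory.fromJSON(). The wrapper class is illustrative, and per Andy's point the "type": "uri" entries for bNodes would need correcting to "bnode" first:

    import java.io.ByteArrayInputStream;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.query.ResultSetFactory;
    import com.hp.hpl.jena.query.ResultSetFormatter;

    public class ReadVirtuosoResults {
        public static void main(String[] args) throws Exception {
            String schemaStr = "..."; // the SPARQL results JSON from [1]
            ResultSet results = ResultSetFactory.fromJSON(
                    new ByteArrayInputStream(schemaStr.getBytes("UTF-8")));
            ResultSetFormatter.out(System.out, results); // print as a text table
        }
    }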
Re: Can SDB 1.3.4 be used with Jena 2.7.1?
Yes (it's 1.3.4-SNAPSHOT). SDB is being built against Jena each night. You can check the POM for the version - it says 2.7.2-SNAPSHOT but there are no changes from 2.7.1. Andy

PS Your next question will be about a release. We need a way to test SDB on all, or at least most, of the databases supported. Can you help?

On 26/06/12 08:46, Holger Knublauch wrote: Yes sure, but this doesn't help with our existing customer base of SDB users. A good reason for using SDB is the use of standard tools to back up your data etc. Holger On 6/26/2012 17:41, Martynas Jusevičius wrote: Have you tried TDB? I think it's currently more actively developed and more performant. On Jun 26, 2012 1:55 AM, Holger Knublauch hol...@knublauch.com wrote: Yes this is my hope, assuming SDB still works for us. Holger On 6/26/2012 9:46, Martynas Jusevičius wrote: Holger, does that also mean a new release of SPIN API which will be packaged with the latest Jena? Martynas On Tue, Jun 26, 2012 at 1:33 AM, Holger Knublauch hol...@knublauch.com wrote: We are now starting the process of upgrading our platform to the latest Jena version(s). I noticed that SDB has not been released yet as an Apache module. Question: is it safe to use SDB 1.3.4 in conjunction with Jena 2.7.1? Apologies if this has been asked before. Thanks Holger
How to convert assign URL to blank node?
How can I assign a URI to a blank node? The Resource class only provides getURI() or getId() methods, but the URI can't be set. Do I have to create a new Resource, copy all properties and delete the original node?
Re: How to convert assign URL to blank node?
On 26/06/12 01:30, franswors...@googlemail.com wrote: How can I assign a URI to a blank node? The Resource class only provides getURI() or getId() methods, but the URI can't be set. Do I have to create a new Resource, copy all properties and delete the original node?

Yes, you create a new resource. Resources are immutable - you can't modify them after creation. Andy
Re: Property not removed?
Note that the statement seems to point to a bNode so a simple remove is probably not enough anyway. Dave, can you elaborate on this a little? What I'm trying to do is to replace such an ORDER BY expression sp:orderBy ([ a sp:Desc ; sp:expression :TriplesVar ]) with my own -- say, change :TriplesVar into another node, or sp:Desc to sp:Asc. Martynas
Re: Property not removed?
On 26/06/12 09:56, Martynas Jusevičius wrote: Note that the statement seems to point to a bNode so a simple remove is probably not enough anyway. Dave, can you elaborate on this a little? I just meant that the contents of the bNode would remain. For some purposes that might be a problem. I guess the likelihood is that SPIN won't care and a few bits of no-longer-connected bNodes lying around in the model would do no harm. Dave
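For Martynas's case, a rough sketch of removing the orderBy statement together with the bNode's own statements. The variable names and the SP_ORDER_BY property are hypothetical stand-ins, and since SPIN stores the expression inside an RDF list, a complete solution would also walk the list nodes:

    import com.hp.hpl.jena.rdf.model.*;

    // "query" and SP_ORDER_BY are illustrative stand-ins for the SPIN query
    // resource and the sp:orderBy property.
    Statement orderBy = query.getProperty(SP_ORDER_BY);
    if (orderBy != null && orderBy.getObject().isAnon()) {
        Resource bnode = orderBy.getObject().asResource();
        model.remove(orderBy);                          // detach the statement
        model.removeAll(bnode, null, (RDFNode) null);   // drop the bNode's contents
    }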
Re: How to convert assign URL to blank node?
On 26/06/12 09:20, Andy Seaborne wrote: On 26/06/12 01:30, franswors...@googlemail.com wrote: How can I assign a URI to a blank node? The Resource class only provides getURI() or getId() methods, but the URI can't be set. Do I have to create a new Resource, copy all properties and delete the original node? Yes, you create a new resource. Resources are immutable - you can't modify them after creation.

You can use ResourceUtils.renameResource(oldResource, uri) [1] to achieve the same effect. Behind the scenes this removes old statements using oldResource and makes new ones with uri. Damian

[1] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/util/ResourceUtils.html
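In code, the rename is a one-liner (a sketch; the target URI is illustrative):

    import com.hp.hpl.jena.rdf.model.Resource;
    import com.hp.hpl.jena.util.ResourceUtils;

    // oldResource may be a bNode; all statements mentioning it are rewritten
    // to use the returned URI resource.
    Resource renamed = ResourceUtils.renameResource(oldResource, "http://example.org/thing/42");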
Re: memory issues using TDB from servlet
a few questions and then a suggestion:

How much physical RAM does the machine have? Which version of the Jena software is this? Is this running on MS Windows? If you are on 64 bit hardware, then TDB uses out-of-heap memory as well as heap memory.

But what I am most suspicious of is

    Dataset dataset = getDataset();
    ...
    dataset.close();

which seems to be opening the database on every call, and may be the cause of your problems. You may have many copies of the in-RAM data structures (especially on 64-bit Windows, which does not release memory-mapped segments during the lifetime of the JVM - (in)famous Java bug). You should open the database once at start up; do not close it when a request is finished. With transactions, you can get away with not closing it at all, but to be neat, close at shutdown if you like.

Otherwise, could you turn this into a standalone test case that simulates your set up but runs outside Spring so we can debug it? Andy

On 25/06/12 21:55, Stephan Zednik wrote: I have been having memory issues using TDB from a Java servlet. Memory usage by Tomcat increases until the service becomes unresponsive and must be restarted. The service operations appear to be completing successfully until the service becomes unresponsive. The memory usage will rapidly rise to whatever my heap max size (CATALINA_OPTS=-Xms512m -Xmx4096m) or my available RAM can hold before the service becomes unresponsive. Generally during testing that has been 1.5-1.6 GB before my RAM is full.

I have a fairly simple set of unit tests; it does not have full coverage, but what tests I do have all pass. I am using Spring Web. Below is my Spring controller class; it asks the application context for a Dataset, which causes Spring to invoke DatasetFactoryBean.getObject(). The DatasetFactoryBean is a singleton that has been initialized with the location of my TDB dataset.

The controller method is fairly simple. A POST request contains an XML payload. The payload is passed to a service method that parses the XML and generates an RDF representation of the input data, encoded as RDF and stored in an in-memory Jena model. AnalysisSettings is a class that acts as a proxy to the Jena Model with methods for manipulating/accessing the encoded RDF.

I have commented out the TDB-related code and tested both the XML parsing and XML parsing + in-memory RDF. Service memory usage slowly grows to a level I am unhappy with (~1GB according to ActivityMonitor.app and VisualVM), but does stabilize. Since it stabilizes and grows slowly, I do not think it is the main culprit of my current memory problem. If I test the TDB Dataset creation code, but leave all queries run against the TDB dataset commented out, memory usage grows much more quickly, to the 1.5 GB range, before my RAM is full and the service becomes unresponsive.

My tests against the deployed servlet make 1000 requests against the service. I check the response of each request to ensure it succeeded and wait 10 ms before sending the next request. The wait between runs of the test suite is around 6 seconds. When TDB Dataset connections are made (but no queries are run), the service will become unresponsive within the 3rd or 4th run of the test suite, so somewhere in the 4k-5k request range. Is this an unreasonable test suite? Perhaps I need to adjust my Tomcat configuration? I am using the default except for -Xms and -Xmx.

Here are the relevant methods from my controller class:

    public class AnalysisSettingsController implements ApplicationContextAware {
        // private vars ...
        private Dataset getDataset() {
            return (Dataset) context.getBean("dataset");
        }

        @RequestMapping(value = "/test", method = RequestMethod.POST, consumes = {"application/xml", "text/xml"})
        public void test(HttpServletRequest request, HttpServletResponse response) throws IOException {
            logger.info("in create(...)");
            OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
            try {
                // creates rdf representation of input, stores in in-memory model (m)
                AnalysisSettings settings = service.load(m, request.getInputStream());
                try {
                    String id = settings.getIdentifier();
                    String location = request.getRequestURL().toString() + "/report/" + id;
                    response.setHeader("Location", location);
                    Dataset dataset = getDataset();
                    logger.info("dataset connection opened");
                    try {
                        /* commented out during testing
                        if (service.has(dataset, id)) {
                            response.setStatus(HttpServletResponse.SC_FOUND);
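For reference, the open-once pattern Andy describes might look like this sketch (the holder class and the "DB1" location are made up, assuming TDB-backed storage):

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class DatasetHolder {
        // opened once per JVM and shared across requests
        private static final Dataset DATASET = TDBFactory.createDataset("DB1");

        public static Dataset get() { return DATASET; }

        // call once at shutdown, e.g. from ServletContextListener.contextDestroyed()
        public static void shutdown() { DATASET.close(); }
    }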
Re: Queries with Multiple Aggregates in Select
On 25/06/12 23:01, Stephen Allen wrote: All, I have a question about what the expected results are of a query with multiple aggregates when there are no matching solutions, specifically if one of them is COUNT. Take the following query for example:

    PREFIX books: <http://example.org/book/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    select (count(?b) as ?bookCount) (min(?title) as ?firstBook)
    where { ?b dc:title ?title }

If you run it against the books database on sparql.org you get:

    (?bookCount ?firstBook) { ("7"^^<http://www.w3.org/2001/XMLSchema#integer> "Harry Potter and the Chamber of Secrets") }

However, running it against an empty triple store (or s/dc:title/dc:title2/) brings back a result set consisting of a single row with both variables unbound. Intuitively, I would expect that you should instead get back a single binding like:

    (?bookCount ?firstBook) { ("0"^^<http://www.w3.org/2001/XMLSchema#integer> UNDEF) }

Does anyone know if this behavior is expected? I'm running against Fuseki 0.2.2.

There's a bug when the second aggregate evals to an error on zero rows. If you reverse the select expressions you'll see a difference:

    select (min(?title) as ?firstBook) (count(?b) as ?bookCount)

Then you get what I expect - a zero and an unbound variable (min is undefined). So you get one row (for the aggregates) and a bound variable (the count). Fixed in SVN. Andy -Stephen
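To reproduce, a minimal sketch of running the query remotely with ARQ (the sparql.org books endpoint URL is an assumption here):

    import com.hp.hpl.jena.query.*;

    public class AggregateCheck {
        public static void main(String[] args) {
            String q = "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
                     + "SELECT (count(?b) AS ?bookCount) (min(?title) AS ?firstBook)\n"
                     + "WHERE { ?b dc:title ?title }";
            QueryExecution qe = QueryExecutionFactory.sparqlService("http://sparql.org/books/sparql", q);
            try {
                ResultSetFormatter.out(System.out, qe.execSelect()); // one row even on zero matches
            } finally {
                qe.close();
            }
        }
    }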
Re: Want to run SPARQL Query with Hadoop Map Reduce Framework
Right now I am only using DBPedia, GeoNames and NYTimes from the LOD cloud, and later on I want to extend my dataset. By the way, yes, I can use SPARQL directly to collect my required statistics, but my assumption is that using Hadoop could give me a boost in collecting those statistics. Sincerely Md Mizanur

Hello Md, The Revelytix Spinner product supports SPARQL in Hadoop if you're interested (SPARQL is translated to map/reduce jobs). To fully use the parallelism of Hadoop you would need to import all of the data. You might also find that just using Spinner outside of Hadoop, with simple federation via the SERVICE extension, is sufficient; that is also supported. http://www.revelytix.com/content/download-spinner Alex Miller
Re: memory issues using TDB from servlet
On Jun 26, 2012, at 4:24 AM, Andy Seaborne wrote: a few questions and then a suggestion: How much physical RAM does the machine have?

4 GB

Which version of the Jena software is this?

0.9.0-incubating (set via Maven)

Is this running on MS Windows?

Mac OS X 10.7.4

If you are on 64 bit hardware, then TDB uses out-of-heap memory as well as heap memory.

I am on 64-bit software.

But what I am most suspicious of is Dataset dataset = getDataset(); ... dataset.close();

Ah. I thought opening a Dataset was like opening a JDBC connection, and that I could consequently open and close Datasets as needed.

which seems to be opening the database on every call, and may be the cause of your problems. You may have many copies of the in-RAM data structures (especially on 64-bit Windows, which does not release memory-mapped segments during the lifetime of the JVM - (in)famous Java bug). You should open the database once at start up; do not close it when a request is finished. With transactions, you can get away with not closing it at all, but to be neat, close at shutdown if you like.

OK, I will modify DatasetFactoryBean to return only one instance of Dataset and add logic to shut down the Dataset at servlet close.

Otherwise, could you turn this into a standalone test case that simulates your set up but runs outside Spring so we can debug it?

Taking it outside of Spring would require a great deal of refactoring; it would be easier to send my full project (built via Maven). First though, I will make the change suggested above and report back to the list. --Stephan

Andy

On 25/06/12 21:55, Stephan Zednik wrote: I have been having memory issues using TDB from a Java servlet. Memory usage by Tomcat increases until the service becomes unresponsive and must be restarted. The service operations appear to be completing successfully until the service becomes unresponsive. The memory usage will rapidly rise to whatever my heap max size (CATALINA_OPTS=-Xms512m -Xmx4096m) or my available RAM can hold before the service becomes unresponsive. Generally during testing that has been 1.5-1.6 GB before my RAM is full. I have a fairly simple set of unit tests; it does not have full coverage, but what tests I do have all pass. I am using Spring Web. Below is my Spring controller class; it asks the application context for a Dataset, which causes Spring to invoke DatasetFactoryBean.getObject(). The DatasetFactoryBean is a singleton that has been initialized with the location of my TDB dataset. The controller method is fairly simple. A POST request contains an XML payload. The payload is passed to a service method that parses the XML and generates an RDF representation of the input data, encoded as RDF and stored in an in-memory Jena model. AnalysisSettings is a class that acts as a proxy to the Jena Model with methods for manipulating/accessing the encoded RDF. I have commented out the TDB-related code and tested both the XML parsing and XML parsing + in-memory RDF. Service memory usage slowly grows to a level I am unhappy with (~1GB according to ActivityMonitor.app and VisualVM), but does stabilize. Since it stabilizes and grows slowly, I do not think it is the main culprit of my current memory problem. If I test the TDB Dataset creation code, but leave all queries run against the TDB dataset commented out, memory usage grows much more quickly, to the 1.5 GB range, before my RAM is full and the service becomes unresponsive. My tests against the deployed servlet make 1000 requests against the service.
I check the response of each request to ensure it succeeded and wait 10 ms before sending the next request. The wait between runs of the test suite is around 6 seconds. When TDB Dataset connections are made (but no queries are run), the service will become unresponsive within the 3rd or 4th run of the test suite, so somewhere in the 4k-5k request range. Is this an unreasonable test suite? Perhaps I need to adjust my Tomcat configuration? I am using the default except for -Xms and -Xmx.

Here are the relevant methods from my controller class:

    public class AnalysisSettingsController implements ApplicationContextAware {
        // private vars ...

        private Dataset getDataset() {
            return (Dataset) context.getBean("dataset");
        }

        @RequestMapping(value = "/test", method = RequestMethod.POST, consumes = {"application/xml", "text/xml"})
        public void test(HttpServletRequest request, HttpServletResponse response) throws IOException {
            logger.info("in create(...)");
            OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
            try {
                // creates rdf representation of input, stores in in-memory model (m)
                AnalysisSettings settings = service.load(m, request.getInputStream());
                try {
Correct SPARQL query for all information for particular Individual
Hi Everyone, Having produced a subset of a rather large ontology, I'm now attempting to write a SPARQL query to retrieve all attributes of any given individual if the name matches. An example of one of my NamedIndividuals is below

    <!-- http://www.buildingsmart-tech.org/ifcXML/IFC2x3/FINAL/IFC2X3_subset.owl#IfcBeam -->
    <NamedIndividual rdf:about="&IFC2X3_subset;IfcBeam">
        <rdf:type>
            <Restriction>
                <onProperty rdf:resource="&IFC2X3_subset;hasSubstitutionGroup"/>
                <allValuesFrom rdf:resource="&IFC2X3_subset;IfcBuildingElement"/>
            </Restriction>
        </rdf:type>
        <rdf:type>
            <Restriction>
                <onProperty rdf:resource="&IFC2X3_subset;hasNillableValue"/>
                <allValuesFrom rdf:resource="&xsd;boolean"/>
            </Restriction>
        </rdf:type>
        <IFC2X3_subset:hasComplexTypeName rdf:datatype="&xsd;Name">IfcBeam</IFC2X3_subset:hasComplexTypeName>
        <IFC2X3_subset:hasExtensionBase rdf:datatype="&rdfs;Literal">ifc:IfcBuildingElement</IFC2X3_subset:hasExtensionBase>
        <IFC2X3_subset:hasNillableValue rdf:datatype="&xsd;boolean">true</IFC2X3_subset:hasNillableValue>
        <IFC2X3_subset:hasName rdf:resource="&IFC2X3_subset;IfcBeam"/>
        <IFC2X3_subset:isOfType rdf:resource="&IFC2X3_subset;IfcBeam"/>
    </NamedIndividual>

Now say my query matched the rdf:about and the <IFC2X3_subset:hasName rdf:resource="&IFC2X3_subset;IfcBeam"/> out of all of the individuals I have persisted; I would like all the information above to be returned within the response... Any help would be greatly appreciated. Thank you very much in advance Lewis -- Lewis
Re: Correct SPARQL query for all information for particular Individual
I don't do RDF/XML :-) but this may achieve what you want:

    PREFIX IFC2X3_subset: <http://www.buildingsmart-tech.org/ifcXML/IFC2x3/FINAL/IFC2X3_subset.owl#>
    DESCRIBE ?x { ?x IFC2X3_subset:hasName IFC2X3_subset:IfcBeam }

The default for DESCRIBE is the bNode closure. If you know the structure, then you can use CONSTRUCT or extract into variables with SELECT. Andy

On 26/06/12 18:14, Lewis John Mcgibbney wrote: Hi Everyone, Having produced a subset of a rather large ontology, I'm now attempting to write a SPARQL query to retrieve all attributes of any given individual if the name matches. An example of one of my NamedIndividuals is below

    <!-- http://www.buildingsmart-tech.org/ifcXML/IFC2x3/FINAL/IFC2X3_subset.owl#IfcBeam -->
    <NamedIndividual rdf:about="&IFC2X3_subset;IfcBeam">
        <rdf:type>
            <Restriction>
                <onProperty rdf:resource="&IFC2X3_subset;hasSubstitutionGroup"/>
                <allValuesFrom rdf:resource="&IFC2X3_subset;IfcBuildingElement"/>
            </Restriction>
        </rdf:type>
        <rdf:type>
            <Restriction>
                <onProperty rdf:resource="&IFC2X3_subset;hasNillableValue"/>
                <allValuesFrom rdf:resource="&xsd;boolean"/>
            </Restriction>
        </rdf:type>
        <IFC2X3_subset:hasComplexTypeName rdf:datatype="&xsd;Name">IfcBeam</IFC2X3_subset:hasComplexTypeName>
        <IFC2X3_subset:hasExtensionBase rdf:datatype="&rdfs;Literal">ifc:IfcBuildingElement</IFC2X3_subset:hasExtensionBase>
        <IFC2X3_subset:hasNillableValue rdf:datatype="&xsd;boolean">true</IFC2X3_subset:hasNillableValue>
        <IFC2X3_subset:hasName rdf:resource="&IFC2X3_subset;IfcBeam"/>
        <IFC2X3_subset:isOfType rdf:resource="&IFC2X3_subset;IfcBeam"/>
    </NamedIndividual>

Now say my query matched the rdf:about and the <IFC2X3_subset:hasName rdf:resource="&IFC2X3_subset;IfcBeam"/> out of all of the individuals I have persisted; I would like all the information above to be returned within the response... Any help would be greatly appreciated. Thank you very much in advance Lewis
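In Jena, a sketch of running that DESCRIBE (the file name is illustrative; the prefix URI is taken from the ontology comment above):

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.util.FileManager;

    public class DescribeBeam {
        public static void main(String[] args) {
            Model model = FileManager.get().loadModel("IFC2X3_subset.owl");
            String q = "PREFIX IFC2X3_subset: <http://www.buildingsmart-tech.org/ifcXML/IFC2x3/FINAL/IFC2X3_subset.owl#>\n"
                     + "DESCRIBE ?x WHERE { ?x IFC2X3_subset:hasName IFC2X3_subset:IfcBeam }";
            QueryExecution qe = QueryExecutionFactory.create(q, model);
            Model description = qe.execDescribe();
            description.write(System.out, "TURTLE"); // the matched individual plus its bNode closure
            qe.close();
        }
    }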
Re: Want to run SPARQL Query with Hadoop Map Reduce Framework
Md. Mizanur Rahoman wrote: Hi Paolo, Thanks for your reply. Right now I am only using DBPedia, GeoNames and NYTimes from the LOD cloud, and later on I want to extend my dataset.

Ok, so it's big, but not huge! ;-) If you have enough RAM you can do everything on a single machine.

By the way, yes, I can use SPARQL directly to collect my required statistics, but my assumption is that using Hadoop could give me a boost in collecting those statistics.

Well, it all depends on whether you already have a Hadoop cluster you can use. If not, a single machine with a lot of RAM might be easier/faster/better.

I will get back to you after going through your links.

Sure, let me know how it goes. Paolo

- Sincerely Md Mizanur On Tue, Jun 26, 2012 at 12:50 AM, Paolo Castagna castagna.li...@googlemail.com wrote: Hi Mizanur, when you have big RDF datasets, it might make sense to use MapReduce (but only if you already have a Hadoop cluster at hand. Is this your case?). You say that your data is 'huge'; just for the sake of curiosity... how many triples/quads is 'huge'? ;-) Most of the use cases I've seen related to statistics on RDF datasets were trivial MapReduce jobs. For a couple of examples on using MapReduce with RDF datasets have a look here: https://github.com/castagna/jena-grande https://github.com/castagna/tdbloader4 This, for example, is certainly not exactly what you need, but I am sure that with little changes you can get what you want: https://github.com/castagna/tdbloader4/blob/master/src/main/java/org/apache/jena/tdbloader4/StatsDriver.java Last but not least, you'll need to dump your RDF data out onto HDFS. I suggest you use the N-Triples/N-Quads serialization formats. Running SPARQL queries on top of a Hadoop cluster is another (long and not easy) story. But it might be possible to translate part of the SPARQL algebra into Pig Latin scripts and use Pig. In my opinion, however, it makes more sense to use MapReduce to filter/slice massive datasets, load the result into a triple store and refine your data analysis using SPARQL there. My 2 cents, Paolo

Md. Mizanur Rahoman wrote: Dear All, I want to collect some statistics over RDF data. My triple store is Virtuoso and I am using Jena for executing my query. I want to get some statistics like i) how many resources are in my dataset, ii) in which positions of the dataset resources occur (i.e., sub/prd/obj), etc. As my data is huge, I want to use Hadoop MapReduce to calculate such statistics. Can you please suggest?
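As one concrete (hypothetical) shape for the trivial statistics jobs mentioned above: a Hadoop mapper that counts, for each term in an N-Triples dump on HDFS, how often it occurs as subject, predicate or object. Pair it with the stock LongSumReducer; the tokenization is naive and would miscount literals containing whitespace:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TermPositionMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // crude N-Triples split: subject, predicate, rest-of-line
            String[] parts = value.toString().trim().split("\\s+", 3);
            if (parts.length < 3) return;
            context.write(new Text(parts[0] + "|SUBJ"), ONE);
            context.write(new Text(parts[1] + "|PRED"), ONE);
            // strip the trailing " ." from the object term
            String obj = parts[2].replaceAll("\\s*\\.\\s*$", "");
            context.write(new Text(obj + "|OBJ"), ONE);
        }
    }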