[ https://issues.apache.org/jira/browse/JENA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-275.
------------------------------

    
> different query results for tdbloader and tdbloader3
> ----------------------------------------------------
>
>                 Key: JENA-275
>                 URL: https://issues.apache.org/jira/browse/JENA-275
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB
>    Affects Versions: TDB 0.9.2
>            Reporter: Jon Phillips
>            Assignee: Andy Seaborne
>
> I had intended to use tdbloader3 rather than tdbloader to load some large data 
> sets (> 100 million triples) because I was seeing higher sustained 
> triples-per-second load rates.  However, I am running into immediate 
> issues running basic queries on the resulting models, even on small toy test 
> sets.  In one simple case, a SPARQL query with a fixed predicate but unbound 
> subject and object (excuse my novice grasp of terminology) fails to return 
> any results for the model loaded with tdbloader3. 
> Here is the sequence of steps that I ran:
> cat dbpedia.nt  (list of 9 triples from dbpedia)
> <http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
> <http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
> <http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
> <http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
> <http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
> <http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
> <http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
> <http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
> <http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
> build the model with tdbloader
> tdbloader --loc=dbpedia_tdbl1 dbpedia.nt 
> 23:18:29 INFO  loader               :: -- Start triples data phase
> 23:18:29 INFO  loader               :: ** Load empty triples table
> 23:18:29 INFO  loader               :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
> 23:18:29 INFO  loader               :: -- Finish triples data phase
> 23:18:29 INFO  loader               :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
> 23:18:29 INFO  loader               :: -- Start triples index phase
> 23:18:29 INFO  loader               :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO  loader               :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO  loader               :: -- Finish triples index phase
> 23:18:29 INFO  loader               :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
> 23:18:29 INFO  loader               :: -- Finish triples load
> 23:18:29 INFO  loader               :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]
> now build the same model with tdbloader3
> tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt 
> 23:18:38 INFO  tdbloader3           :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
> 23:18:38 INFO  tdbloader3           :: Node Table (1/3): building nodes.dat and sorting hash|id ...
> 23:18:38 INFO  tdbloader3           :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO  tdbloader3           :: Node Table (2/3): generating input data using node ids...
> 23:18:38 INFO  tdbloader3           :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO  tdbloader3           :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
> 23:18:39 INFO  tdbloader3           :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating SPO index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GSPO index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for POS index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating POS index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for OSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating OSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for GPOS index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GPOS index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for GOSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating GOSP index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for POSG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating POSG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for OSPG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating OSPG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: sorting data for SPOG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Index: creating SPOG index...
> 23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]
> two simple queries that return the entire result set return the same set of 
> triples:
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> same result for the model built with tdbloader3
> ./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> a different query, run on the model built with tdbloader, that fixes the 
> predicate:
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
> ----------------------------------------------------------------------------------------------------------
> | x                                                            | y | z                                   |
> ==========================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
> ----------------------------------------------------------------------------------------------------------
> I expected the model loaded with tdbloader3 to return the same result, but 
> the query returned an empty result:
> tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
> -------------
> | x | y | z |
> =============
> -------------
> Any help would be much appreciated.
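
The two full-scan result tables in the report contain the same nine rows in a different order, so comparing them by eye is error-prone. The following is a minimal sketch, not part of Jena, of an order-insensitive comparison of two tdbquery text tables. The function names parse_table and same_solutions are hypothetical, and the parser assumes that each table row sits on one unwrapped line and that no cell value contains a '|' character.

```python
def parse_table(text):
    """Parse a tdbquery-style text table into a list of row tuples.

    Assumption: every row is a single line of the form '| a | b | c |',
    and no cell value contains a '|' character.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and data rows start and end with '|';
        # rule lines consist only of '-' or '=' and are skipped.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = tuple(cell.strip() for cell in line[1:-1].split("|"))
        rows.append(cells)
    # The first '|' line is the header with the variable names.
    return rows[1:]


def same_solutions(output_a, output_b):
    """Return True if both tables hold the same rows, ignoring row order.

    Sorting (rather than using a set) keeps duplicate rows significant.
    """
    return sorted(parse_table(output_a)) == sorted(parse_table(output_b))
```

Under those assumptions, passing the captured output of the two "SELECT ?x ?y ?z WHERE { ?x ?y ?z }" runs to same_solutions should report a match, while the two runs of the fixed-predicate query should not, because the tdbloader3 table has no data rows.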


        
