Hello,

Thank you very much for your comment. I have now gathered all the facts: in November we did use tdbloader2 for our import, and in April I used tdbloader.

Could you please give me some more information on updates? If I use the tdbupdate tool after loading with tdbloader2, is the benefit of the smaller (in theory faster) index lost? Can I do incremental updates some other way without losing it? Our requirement is that we apply updates to the store after the initial load.

Ewa
---------- Forwarded message ----------
From: bwm-epimorphics <[email protected]>
Date: 2014-05-19 11:41 GMT+01:00
Subject: Re: Freebase data on Jena TDB
To: [email protected]

On 19/05/14 11:26, Ewa Szwed wrote:
> Hi Brian, I was using tdbloader for both the November and April imports. I
> tested it beforehand, and for the Freebase data set it works better than
> tdbloader2.
> tdbloader2 had a faster data-loading phase but a much slower indexing
> phase, so the total import time was longer than with tdbloader in my
> case.
>

Yes. For some of mine too.

The reason I asked is that, as Andy mentioned, tdbloader2 tends to generate a significantly more compact set of files, and as a result TDB can run a bit faster. That advantage goes away if you then update the database.

If you are loading a TDB image and then not updating it, it might be worth the wait for tdbloader2.

Brian

>
> 2014-05-14 10:00 GMT+01:00 bwm-epimorphics <[email protected]>:
>
>> How did you load the TDB store? Is it possible you used tdbloader2 for
>> the first load and tdbloader for the second?
>>
>> Brian
>>
>>
>> On 13/05/14 14:13, Ewa Szwed wrote:
>>
>>> I have the following problem with my Jena TDB instance.
>>> Last November I loaded the Freebase dump into Jena TDB and was able to
>>> work with it reasonably well, with quite good performance for most of
>>> my queries.
>>> Recently I updated my Jena TDB store with a dump from April.
>>> Here are some numbers showing the difference between these two instances:
>>>
>>>                          November 2013            April 2014
>>> Full time of import      262,052 s / 3.03 days    716,121 s / 8.29 days
>>> Number of triples        1,826,551,456            2,489,221,915
>>> Index size (whole dir)   174 GB                   333 GB
>>>
>>> My problem is that my new instance is not performing at all.
>>> Queries that previously ran for a couple of minutes now take a couple
>>> of hours, which is not acceptable for my business. :(
>>> So I would like to ask whether there is a practical index size limit for
>>> Jena TDB. Is there anything I can do to improve its performance?
>>> Is this significant drop in performance something to be expected, or do
>>> I have something fundamentally wrong in my setup that I need to track
>>> down and fix?
>>> Please advise.
>>> Regards,
>>> Ewa Szwed
>>
>> --
>> Epimorphics Ltd (http://www.epimorphics.com)
>>
>> Epimorphics Ltd. is a limited company registered in England (number 7016688)
>> Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT, UK
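The figures in Ewa's 13 May table can be put on a per-triple basis, which makes the size difference easier to compare than the raw totals. The sketch below is only arithmetic on those reported numbers, not a diagnosis. It assumes that "GB" means 10^9 bytes and that "3,03 days" uses a decimal comma. The names `loads` and `summarize` are hypothetical and exist only for illustration.

```python
# Figures copied from the table in Ewa's 13 May 2014 message.
# Assumption: "GB" means 10**9 bytes. If the sizes were reported in GiB,
# every bytes-per-triple figure below scales by the same constant factor.
GB = 10**9
SECONDS_PER_DAY = 86_400

loads = {
    "November 2013": {"seconds": 262_052, "triples": 1_826_551_456, "index_gb": 174},
    "April 2014": {"seconds": 716_121, "triples": 2_489_221_915, "index_gb": 333},
}


def summarize(load):
    """Return (days, triples loaded per second, index bytes per triple)."""
    days = load["seconds"] / SECONDS_PER_DAY
    rate = load["triples"] / load["seconds"]
    bytes_per_triple = load["index_gb"] * GB / load["triples"]
    return days, rate, bytes_per_triple


stats = {name: summarize(load) for name, load in loads.items()}

for name, (days, rate, bpt) in stats.items():
    print(f"{name}: {days:.2f} days, {rate:,.0f} triples/s, {bpt:.1f} bytes/triple")

nov, apr = loads["November 2013"], loads["April 2014"]
# Growth factors between the two loads.
print(
    f"triples x{apr['triples'] / nov['triples']:.2f}, "
    f"index x{apr['index_gb'] / nov['index_gb']:.2f}, "
    f"load time x{apr['seconds'] / nov['seconds']:.2f}"
)
```

Run as-is, this prints roughly 95.3 bytes per triple for November against 133.8 for April: about 1.36 times as many triples, but an index about 1.91 times larger, with a load time about 2.73 times longer. A per-triple growth of that size is consistent with, though not proof of, Brian's point that the tdbloader2 load in November produced a more compact set of files than the tdbloader load in April.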
