Thank you, Andy and Adam, for the help. Actually, I am only indexing the quads 
whose object is either a literal or a foreign URI (i.e. an object belonging to a 
different dataset than the subject). I am using NXParser to parse the dataset 
(as Jena is giving various parsing errors) and then storing the quads in TDB2 
in the following manner.



public void SetQuadsList(String sub, String pred, String obj, String context) {
    Node subjects = NodeFactory.createURI(sub);
    Node objects = NodeFactory.createURI(obj);
    Node contexts = NodeFactory.createURI(context);
    //Node rdfSeeAlso = RDFS.seeAlso.asNode();

    Node predicates = NodeFactory.createURI(pred);

    //Quad quads = Quad.create(contexts, objects, rdfSeeAlso, subjects);

    Quad quads = Quad.create(contexts, subjects, predicates, objects);

    QuadList.add(quads);

    //System.out.println("Number of backlinks:" + QuadList.size());
    //System.out.println("quad written");
    //System.out.println("Quad"+quads.toString());
}

public List<Quad> GetQuadsList() {
    return QuadList;
}
public void QuadsToTDB(List<Quad> quadList) {
    final String DATASET_DIR_NAME = "DyLDO1000K_Index";
    Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);

    dataset.begin(ReadWrite.WRITE);
    try {
        DatasetGraph dsg = dataset.asDatasetGraph();
        Iterator<Quad> quads = quadList.iterator();
        System.out.println("Size of Quad List: " + quadList.size());
        while (quads.hasNext()) {
            //System.out.println("here");
            Quad quad = quads.next();
            dsg.add(quad);
            //System.out.println(quad.toString()+ "added");
            //RDFDataMgr.writeQuads(System.out, quads);
            //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        }
        System.out.println("dsg created of size " + dsg.size());
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        System.out.println("written dsg using datamgr.");

        //System.out.println(dataset.isEmpty());
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        dataset.commit();

        System.out.println("committed dataset.");
    } catch (Exception e) {
        e.printStackTrace(System.err);
        //dataset.abort();
    } finally {
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        dataset.end();
    }
    System.out.println("end method.");
}
}
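
For context, the selection rule described at the top of this message (keep a 
quad only when its object is a literal or a URI from a different dataset than 
the subject) could be sketched roughly as below. This is only an illustration 
using the Java standard library: the class and method names are hypothetical, 
and it assumes that "same dataset" can be approximated by comparing URI hosts 
and that anything which is not an absolute URI with a host is a literal. 
Neither assumption necessarily matches the real code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Jena or NXParser.
final class BacklinkFilter {

    /** Returns the lower-cased host of an absolute URI, or null if the text is not one. */
    static String hostOf(String text) {
        try {
            URI uri = new URI(text);
            String host = uri.getHost();
            return (uri.isAbsolute() && host != null) ? host.toLowerCase() : null;
        } catch (URISyntaxException e) {
            return null; // not parseable as a URI, so it is treated as a literal
        }
    }

    /** True when the object is a literal or a URI whose host differs from the subject's. */
    static boolean shouldIndex(String subject, String object) {
        String objectHost = hostOf(object);
        if (objectHost == null) {
            return true; // literal (assumption: not an absolute URI with a host)
        }
        String subjectHost = hostOf(subject);
        return !objectHost.equals(subjectHost); // foreign URI when the hosts differ
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex("http://a.example/s", "http://b.example/o"));
        System.out.println(shouldIndex("http://a.example/s", "http://a.example/o"));
        System.out.println(shouldIndex("http://a.example/s", "plain literal"));
    }
}
```

Note that SetQuadsList above passes every object to NodeFactory.createURI, so 
a literal object would be stored as a URI node; a check of this kind is one 
place where the two cases could be told apart.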


I have indexed 40,000 files so far (I split the dataset into files by 
context) and the index has already grown to 120 GB. In total I have 135,600 
files, whose combined size is only 19.8 GB.
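
To put those numbers side by side, here is a rough back-of-the-envelope 
calculation. It assumes, purely for illustration, that the files are of 
similar size and that the index grows linearly with the number of files; 
neither is confirmed, and the class and method names are made up.

```java
// Hypothetical calculator for illustration only.
final class IndexSizeEstimate {

    /** Linear extrapolation of the index size from a partial load. */
    static double projectedIndexGb(double indexGbSoFar, long filesIndexed, long totalFiles) {
        return indexGbSoFar * totalFiles / filesIndexed;
    }

    /** Approximate raw size of the files loaded so far (assumes files of similar size). */
    static double rawGbSoFar(double totalRawGb, long filesIndexed, long totalFiles) {
        return totalRawGb * filesIndexed / totalFiles;
    }

    public static void main(String[] args) {
        long filesIndexed = 40_000;   // files loaded so far
        long totalFiles = 135_600;    // files in the whole dataset
        double indexGb = 120.0;       // index size after the partial load
        double rawGb = 19.8;          // raw size of the whole dataset

        double projected = projectedIndexGb(indexGb, filesIndexed, totalFiles);
        double rawSoFar = rawGbSoFar(rawGb, filesIndexed, totalFiles);
        System.out.printf("projected index size: %.1f GB%n", projected);
        System.out.printf("raw data loaded so far: %.2f GB%n", rawSoFar);
        System.out.printf("index / raw ratio: %.1fx%n", indexGb / rawSoFar);
    }
}
```

Under those assumptions the full load would come to roughly 407 GB, about 20 
times the size of the raw data.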


Why is TDB producing such a big index? I am confused :( Is there any 
problem in my code?


Please suggest any improvements that could be made.



Regards,

Samita Bai






________________________________
From: ajs6f <aj...@apache.org>
Sent: 15 April 2018 03:07:59
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

42 million quads is nowhere near enough to give either TDB version any 
trouble with normal indexing, even on very modest hardware (I ingest datasets 
like that on my laptop all the time).

Do you have some extraordinary hardware limitations?

Adam

> On Apr 14, 2018, at 11:42 AM, Andy Seaborne <a...@apache.org> wrote:
>
> Hi Samita,
>
> Firstly - as Adam points out - if there are no indexes then access to the 
> data will be very slow.  For a GSPO index, that means queries must be 
> "GRAPH <uri> { ... }" and probably "GRAPH <uri> { <fixedSubject>.. }".
>
> GSPO means lookup by G, then by S within that G, and likewise for P and then O.
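
As an aside, the way a single GSPO ordering constrains queries can be sketched 
with a sorted set: entries sorted by (G, S, P, O) can be range-scanned 
efficiently only when a leading prefix (G, or G and S) is fixed. This is a toy 
model using only the Java standard library, not TDB's actual index code; the 
class name, method names and tab-joined string keys are assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of one GSPO index; not how TDB stores data internally.
final class GspoIndexSketch {
    private static final char SEP = '\t';  // assumes terms never contain a tab
    private final TreeSet<String> entries = new TreeSet<>();

    void add(String g, String s, String p, String o) {
        entries.add(g + SEP + s + SEP + p + SEP + o);
    }

    /** Range scan over quads whose first one to three terms equal the given values. */
    List<String> byPrefix(String... leadingTerms) {
        String prefix = String.join(String.valueOf(SEP), leadingTerms) + SEP;
        List<String> result = new ArrayList<>();
        // tailSet starts at the first key >= prefix; stop once keys leave the prefix.
        for (String key : entries.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    /** Without a fixed G there is no usable prefix, so every entry is inspected. */
    List<String> byObjectFullScan(String o) {
        List<String> result = new ArrayList<>();
        for (String key : entries) {
            if (key.endsWith(SEP + o)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

In this model byPrefix("g1") or byPrefix("g1", "s1") touches only the matching 
range, while a lookup by object alone has to walk the whole set.
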
>
> I looked at the data and it seems to be about 42 million quads.
>
> Using TDB1 (the loader is faster at this scale currently) is likely to be a 
> better choice.
>
> Looking at StoreParams in TDB2:
>
> The code below creates the database at TDB2Factory.connectDataset so any 
> StoreParams after that do not affect indexing.
>
> I tried to make it work in the release but the code ignores provided 
> StoreParams - sorry.  Even if it did work, it hits a test to make sure there 
> is basic indexing (Adam's point).
>
>    Andy
>
>
> On 13/04/18 13:42, Samita Bai  / PhD CS Scholar @ City Campus wrote:
>> I wrote the following code to build only one type of triple and quad index 
>> but it is still creating all indexes 😞
>> package ldbqPack;
>>
>> import org.apache.jena.query.Dataset;
>> import org.apache.jena.tdb2.TDB2Factory;
>> import org.apache.jena.tdb2.setup.StoreParams;
>> import org.apache.jena.tdb2.sys.DatabaseConnection;
>> import org.apache.jena.dboe.base.block.FileMode;
>> import org.apache.jena.dboe.base.file.Location;
>> import org.apache.jena.tdb2.setup.StoreParamsFactory;
>>
>> public class StrPrms {
>>     static String[] tindexes = {"SPO"};
>>     static String[] qindexes = {"GSPO"};
>>     static String[] pindexes = {"GPU"};
>>
>>     static final StoreParams pApp = StoreParams.builder()
>>         .blockSize(12)              // Not dynamic
>>         .nodeMissCacheSize(12)      // Dynamic
>>         .build();
>>
>>     static final StoreParams pLoc = StoreParams.builder()
>>         .blockSize(0)
>>         .nodeMissCacheSize(0).build();
>>
>>     static final StoreParams pDft = StoreParams.builder()
>>         .fileMode(FileMode.mapped)
>>         .blockSize(8192)
>>         .blockReadCacheSize(5000)
>>         .blockWriteCacheSize(1000)
>>         .node2NodeIdCacheSize(200000)
>>         .nodeId2NodeCacheSize(750000)
>>         .nodeMissCacheSize(1000)
>>         .nodeTableBaseName("nodes")
>>         .primaryIndexTriples("SPO")
>>         .tripleIndexes(tindexes)
>>         .primaryIndexQuads("GSPO")
>>         .quadIndexes(qindexes)
>>         .prefixTableBaseName("prefixes")
>>         .primaryIndexPrefix("GPU")
>>         .prefixIndexes(pindexes)
>>         .build();
>>
>>     public static void main(String[] args) {
>>         // TODO Auto-generated method stub
>>         final String DATASET_DIR_NAME = "DyLDO100";
>>         Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);
>>         Location location = Location.create(DATASET_DIR_NAME);
>>         StoreParams custom_params =
>>             StoreParamsFactory.decideStoreParams(location, true, pApp, pLoc, pDft);
>>         DatabaseConnection.connectCreate(location, custom_params);
>>         StoreParams params = StoreParams.getSmallStoreParams();
>>         System.out.println(params);
>>     }
>> }
>> Please help.
>> Regards,
>> Samita Bai
>> ________________________________
>> P : Please consider the environment before printing this e-mail
>> ________________________________
>> CONFIDENTIALITY / DISCLAIMER NOTICE: This e-mail and any attachments may 
>> contain confidential and privileged information. If you are not the intended 
>> recipient, please notify the sender immediately by return e-mail, delete 
>> this e-mail and destroy any copies. Any dissemination or use of this 
>> information by a person other than the intended recipient is unauthorized 
>> and may be illegal.
>> ________________________________

