Dear Adam,

I am using the CLI utility now 😊


Regards,

Samita Bai

________________________________
From: ajs6f <[email protected]>
Sent: 17 April 2018 19:39:34
To: [email protected]
Subject: Re: TDB 2 Store Parameters

I'm glad you got what you wanted, but you should also be aware that if you're 
just trying to load RDF into a TDB instance, there is no need at all to write 
Java code. The tdbloader and tdbloader2 CLI utilities work very well for
that.

ajs6f
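
For anyone who still wants a quick look at a file before handing it to
tdbloader, here is a minimal sketch, in plain Java with no Jena dependency, of
the kind of check behind the "Illegal character in IRI" error quoted further
down. The class and method names (IriPrecheck, firstIllegalIriChar) are made up
for illustration and the example.org IRIs are placeholders. The scan only
covers the characters the N-Quads grammar excludes from <...> terms (control
characters, space, and < " { } | ^ `); backslash escapes are not checked, so
this is not a substitute for running riot over the data.

```java
import java.util.OptionalInt;

public class IriPrecheck {

    // Characters (besides control characters and space) that the N-Quads
    // grammar does not allow inside <...>. Backslash escapes are not checked.
    private static final String FORBIDDEN = "<\"{}|^`";

    /** Returns the first illegal character found inside a <...> term, if any. */
    public static OptionalInt firstIllegalIriChar(String line) {
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"') {
                // Skip a quoted literal, honouring backslash escapes.
                i++;
                while (i < line.length() && line.charAt(i) != '"') {
                    if (line.charAt(i) == '\\') {
                        i++;
                    }
                    i++;
                }
                i++; // step over the closing quote
            } else if (c == '<') {
                // Scan the IRI up to its closing '>'.
                i++;
                while (i < line.length() && line.charAt(i) != '>') {
                    char ch = line.charAt(i);
                    if (ch <= 0x20 || FORBIDDEN.indexOf(ch) >= 0) {
                        return OptionalInt.of(ch);
                    }
                    i++;
                }
                i++; // step over the closing '>'
            } else if (c == '#') {
                break; // rest of the line is a comment
            } else {
                i++;
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the example.org IRIs are placeholders.
        String bad = "<http://example.org/s> <http://example.org/p> "
                + "<http://example.org/css?family=A|B> <http://example.org/g> .";
        String ok = "<http://example.org/s> <http://example.org/p> "
                + "\"a <b|c> literal\" <http://example.org/g> .";
        System.out.println(firstIllegalIriChar(bad).isPresent());
        System.out.println(firstIllegalIriChar(ok).isPresent());
    }
}
```

Calling firstIllegalIriChar on each line of an N-Quads file and reporting the
line number of any hit gives a rough idea of where a strict parser is likely to
stop. Text inside quoted literals is deliberately skipped, since "|" is legal
there.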

> On Apr 17, 2018, at 1:03 AM, Samita Bai / PhD CS Scholar @ City Campus 
> <[email protected]> wrote:
>
> Dear Andy & Adam,
>
>
> Thanks a lot for the help; I finally got my code running. I just had to catch
> the RiotException, and that was all it needed. Feeling so happy.
>
>
> I really appreciate your time and effort :)
>
>
> Best regards,
>
> Samita Bai
>
> ________________________________
> From: Samita Bai / PhD CS Scholar @ City Campus <[email protected]>
> Sent: 17 April 2018 02:13:32
> To: [email protected]
> Subject: Re: TDB 2 Store Parameters
>
> Dear Andy,
>
>
> I downloaded the same dataset from the link you gave, i.e.
>
>
> http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz
>
>
> Then I extracted it and ran the following code:
>
>
> public class ReadQuadInJena {
>
>     public static void main(String[] args) {
>         TDBLoader tlobj = new TDBLoader();
>         String Ds = "/home/samita/data.nq";   // extracted N-Quads file
>         Location location = Location.create("/home/samita/Load_TDB");
>         DatasetGraphTDB dgtdb = DatasetBuilderStd.create(location);
>         try {
>             InputStream is = new FileInputStream(Ds);
>             tlobj.loadDataset(dgtdb, is);
>         } catch (FileNotFoundException e) {
>             // Report a missing input file instead of silently ignoring it.
>             e.printStackTrace();
>         }
>     }
> }
>
> It ended up with this error.
>
> Exception in thread "main" org.apache.jena.riot.RiotException: [line: 30506, col: 232] Illegal character in IRI (codepoint 0x7C, '|'): <http://fonts.googleapis.com/css?family=Nunito[|]...>
> at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
> at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
> at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
> at org.apache.jena.riot.lang.LangNQuads.parseOne(LangNQuads.java:67)
> at org.apache.jena.riot.lang.LangNQuads.runParser(LangNQuads.java:54)
> at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:41)
> at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:195)
> at org.apache.jena.riot.RDFParser.read(RDFParser.java:334)
> at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:324)
> at org.apache.jena.riot.RDFParser.parse(RDFParser.java:273)
> at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
> at org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:870)
> at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:693)
> at org.apache.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:152)
> at org.apache.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:115)
> at org.apache.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:256)
> at org.apache.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:191)
> at ldbqPack.ReadQuadInJena.main(ReadQuadInJena.java:47)
>
> If it ran fine at your end, what is wrong with my code? Please help me.
>
>
>
>
>
> ________________________________
> From: Andy Seaborne <[email protected]>
> Sent: 16 April 2018 22:13:36
> To: [email protected]
> Subject: Re: TDB 2 Store Parameters
>
> I downloaded
>
> http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz
>
> (the latest I could find)
>
> and used tdbloader.
>
> Is that the data you are using?
>
>     Andy
>
> On 16/04/18 17:32, ajs6f wrote:
>> You should be able to check the validity of any of your files just by 
>> running them through Jena's `riot` command.
>>
>> You can try loading them into a TDB1 or TDB2 db by using the `tdbloader` or 
>> `tdb2.tdbloader` commands.
>>
>> ajs6f
>>
>>> On Apr 16, 2018, at 12:28 PM, Samita Bai / PhD CS Scholar @ City Campus 
>>> <[email protected]> wrote:
>>>
>>> OK Andy I got your point. Can you please share the code that you used to 
>>> read the Dynamic Linked Data Observatory dataset?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Samita Bai
>>>
>>> ________________________________
>>> From: Andy Seaborne <[email protected]>
>>> Sent: 16 April 2018 15:34:07
>>> To: [email protected]
>>> Subject: Re: TDB 2 Store Parameters
>>>
>>> If you wish to process the data as it is parsed, then see StreamRDF and
>>> either
>>>
>>> NxParser, which is not part of Jena, is not a validating parser.
>>>
>>> If the data is not valid, then you will have problems at some point,
>>> either loading, querying or outputting later.
>>>
>>> Adam has explained that TDB2 indexes heavily so that querying is well
>>> served.
>>>
>>> We can't help with the parser errors without knowing what they are.
>>>
>>> Which files from Dynamic Linked Data Observatory are you processing?
>>> Don't the later ones replace the earlier ones?
>>>
>>> I found that the last N-Quads file was 42 million quads and all valid.
>>>
>>>     Andy
>>>
>>> On 16/04/18 11:05, ajs6f wrote:
>>>> If there are syntax errors in your RDF (and it sounds like that is why 
>>>> Jena will not read it directly) you are doing yourself no service by 
>>>> taking unusual pains to force TDB to ingest your data.
>>>>
>>>> Please show us the errors that Jena is throwing trying to read your data 
>>>> and an appropriate sample of the data in question.
>>>>
>>>>
>>>> ajs6f
>>>>
>>>>> On Apr 16, 2018, at 4:42 AM, Samita Bai / PhD CS Scholar @ City Campus 
>>>>> <[email protected]> wrote:
>>>>>
>>>>> In addition to my previous query: it is taking a lot of time to first
>>>>> parse the dataset using NxParser, then check each object, and then create
>>>>> the quad again and store it in TDB. It would be much simpler if we could
>>>>> take each quad, check its object, and insert it in TDB.
>>>>>
>>>>>
>>>>> But Jena is not helping me with this 😞
>>>>>
>>>>>
>>>>> So I have to create the quads again and store them in TDB.
>>>>>
>>>>>
>>>>> Any help is surely appreciated.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Samita Bai
>>>>>
>>>>> ________________________________
>>>>> From: Samita Bai / PhD CS Scholar @ City Campus
>>>>> Sent: 16 April 2018 13:33:51
>>>>> To: [email protected]
>>>>> Subject: Re: TDB 2 Store Parameters
>>>>>
>>>>>
>>>>> Thank you Andy and Adam for the help. Actually, I am only indexing the
>>>>> quads whose object is either a literal or a foreign URI (i.e. an object
>>>>> belonging to a different dataset than the subject). I am using NxParser
>>>>> to parse the dataset (as Jena is giving various parsing errors), and then
>>>>> I am storing the quads in TDB2 in the following manner.
>>>>>
>>>>>
>>>>>
>>>>> public void SetQuadsList(String sub, String pred, String obj,
>>>>>         String context) {
>>>>>     Node subjects = NodeFactory.createURI(sub);
>>>>>     Node predicates = NodeFactory.createURI(pred);
>>>>>     // Every object becomes a URI node here, including literal values.
>>>>>     Node objects = NodeFactory.createURI(obj);
>>>>>     Node contexts = NodeFactory.createURI(context);
>>>>>
>>>>>     // Quad.create takes the graph (context) first, then S, P, O.
>>>>>     Quad quads = Quad.create(contexts, subjects, predicates, objects);
>>>>>     QuadList.add(quads);
>>>>> }
>>>>>
>>>>> public List<Quad> GetQuadsList() {
>>>>>     return QuadList;
>>>>> }
>>>>>
>>>>> public void QuadsToTDB(List<Quad> quadList) {
>>>>>     final String DATASET_DIR_NAME = "DyLDO1000K_Index";
>>>>>     Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);
>>>>>
>>>>>     dataset.begin(ReadWrite.WRITE);
>>>>>     try {
>>>>>         DatasetGraph dsg = dataset.asDatasetGraph();
>>>>>         System.out.println("Size of Quad List: " + quadList.size());
>>>>>         // Add every collected quad inside the single write transaction.
>>>>>         for (Quad quad : quadList) {
>>>>>             dsg.add(quad);
>>>>>         }
>>>>>         System.out.println("dsg created of size " + dsg.size());
>>>>>         System.out.println("written dsg using datamgr.");
>>>>>         dataset.commit();
>>>>>         System.out.println("committed dataset.");
>>>>>     } catch (Exception e) {
>>>>>         e.printStackTrace(System.err);
>>>>>     } finally {
>>>>>         // end() closes the transaction; an uncommitted write is aborted.
>>>>>         dataset.end();
>>>>>     }
>>>>>     System.out.println("end method.");
>>>>> }}
>>>>>
>>>>>
>>>>> I have indexed 40,000 files (I have split the dataset into files
>>>>> according to context) and the index size has become 120 GB. I have a
>>>>> total of 135,600 files whose total size is only 19.8 GB.
>>>>>
>>>>>
>>>>> Why is TDB making such a big index? I am confused :( Is there any
>>>>> problem in my code?
>>>>>
>>>>>
>>>>> Please suggest any improvements.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Samita Bai
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: ajs6f <[email protected]>
>>>>> Sent: 15 April 2018 03:07:59
>>>>> To: [email protected]
>>>>> Subject: Re: TDB 2 Store Parameters
>>>>>
>>>>> 42 million quads is nowhere near enough for either TDB version to have
>>>>> any problem doing normal indexing, even on very modest hardware (I ingest
>>>>> datasets like that on my laptop all the time).
>>>>>
>>>>> Do you have some extraordinary hardware limitations?
>>>>>
>>>>> Adam
>>>>>
>>>>>> On Apr 14, 2018, at 11:42 AM, Andy Seaborne <[email protected]> wrote:
>>>>>>
>>>>>> Hi Samita,
>>>>>>
>>>>>> Firstly - as Adam points out - if there are no indexes then access to 
>>>>>> the data will be very slow.  For a GSPO index, that means queries must 
>>>>>> be "GRAPH <uri> { ... }" and probably "GRAPH <uri> { <fixedSubject>.. }".
>>>>>>
>>>>>> GSPO means lookup by G then S within those G and the same for P then O.
>>>>>>
>>>>>> I looked at the data and it seems to be about 42 million quads.
>>>>>>
>>>>>> Using TDB1 (the loader is faster at this scale currently) is likely to 
>>>>>> be a better choice.
>>>>>>
>>>>>> Looking at StoreParams in TDB2:
>>>>>>
>>>>>> The code below creates the database at TDB2Factory.connectDataset so any 
>>>>>> StoreParams after that do not affect indexing.
>>>>>>
>>>>>> I tried to make it work in the release but the code ignores provided 
>>>>>> StoreParams - sorry.  Even if it did work, it hits a test to make sure 
>>>>>> there is basic indexing (Adam's point).
>>>>>>
>>>>>>   Andy
>>>>>>
>>>>>>
>>>>>> On 13/04/18 13:42, Samita Bai  / PhD CS Scholar @ City Campus wrote:
>>>>>>> I wrote the following code to build only one type of triple and quad 
>>>>>>> index but it is still creating all indexes 😞
>>>>>>> package ldbqPack;
>>>>>>>
>>>>>>> import org.apache.jena.query.Dataset;
>>>>>>> import org.apache.jena.tdb2.TDB2Factory;
>>>>>>> import org.apache.jena.tdb2.setup.StoreParams;
>>>>>>> import org.apache.jena.tdb2.sys.DatabaseConnection;
>>>>>>> import org.apache.jena.dboe.base.block.FileMode;
>>>>>>> import org.apache.jena.dboe.base.file.Location;
>>>>>>> import org.apache.jena.tdb2.setup.StoreParamsFactory;
>>>>>>>
>>>>>>> public class StrPrms {
>>>>>>>     static String[] tindexes = {"SPO"};
>>>>>>>     static String[] qindexes = {"GSPO"};
>>>>>>>     static String[] pindexes = {"GPU"};
>>>>>>>
>>>>>>>     static final StoreParams pApp = StoreParams.builder()
>>>>>>>         .blockSize(12)              // Not dynamic
>>>>>>>         .nodeMissCacheSize(12)      // Dynamic
>>>>>>>         .build();
>>>>>>>
>>>>>>>     static final StoreParams pLoc = StoreParams.builder()
>>>>>>>         .blockSize(0)
>>>>>>>         .nodeMissCacheSize(0)
>>>>>>>         .build();
>>>>>>>
>>>>>>>     static final StoreParams pDft = StoreParams.builder()
>>>>>>>         .fileMode(FileMode.mapped)
>>>>>>>         .blockSize(8192)
>>>>>>>         .blockReadCacheSize(5000)
>>>>>>>         .blockWriteCacheSize(1000)
>>>>>>>         .node2NodeIdCacheSize(200000)
>>>>>>>         .nodeId2NodeCacheSize(750000)
>>>>>>>         .nodeMissCacheSize(1000)
>>>>>>>         .nodeTableBaseName("nodes")
>>>>>>>         .primaryIndexTriples("SPO")
>>>>>>>         .tripleIndexes(tindexes)
>>>>>>>         .primaryIndexQuads("GSPO")
>>>>>>>         .quadIndexes(qindexes)
>>>>>>>         .prefixTableBaseName("prefixes")
>>>>>>>         .primaryIndexPrefix("GPU")
>>>>>>>         .prefixIndexes(pindexes)
>>>>>>>         .build();
>>>>>>>
>>>>>>>     public static void main(String[] args) {
>>>>>>>         final String DATASET_DIR_NAME = "DyLDO100";
>>>>>>>         Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);
>>>>>>>         Location location = Location.create(DATASET_DIR_NAME);
>>>>>>>         StoreParams custom_params =
>>>>>>>             StoreParamsFactory.decideStoreParams(location, true, pApp, pLoc, pDft);
>>>>>>>         DatabaseConnection.connectCreate(location, custom_params);
>>>>>>>         StoreParams params = StoreParams.getSmallStoreParams();
>>>>>>>         System.out.println(params);
>>>>>>>     }
>>>>>>> }
>>>>>>> Please help.
>>>>>>> Regards,
>>>>>>> Samita Bai
>>>>>>> ________________________________
>>>>>>> P : Please consider the environment before printing this e-mail
>>>>>>> ________________________________
>>>>>>> CONFIDENTIALITY / DISCLAIMER NOTICE: This e-mail and any attachments 
>>>>>>> may contain confidential and privileged information. If you are not the 
>>>>>>> intended recipient, please notify the sender immediately by return 
>>>>>>> e-mail, delete this e-mail and destroy any copies. Any dissemination or 
>>>>>>> use of this information by a person other than the intended recipient 
>>>>>>> is unauthorized and may be illegal.
>>>>>>> ________________________________
>>>>
>>>
>>
>
