Re: TDB 2 Store Parameters

2018-04-17 Thread Samita Bai / PhD CS Scholar @ City Campus
Dear Adam,


I am using the CLI utility now 😊


Regards,

Samita Bai


From: ajs6f 
Sent: 17 April 2018 19:39:34
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

I'm glad you got what you wanted, but you should also be aware that if you're 
just trying to load RDF into a TDB instance, there is no need at all to write 
Java code. The tdbloader and tdbloader2 CLI utilities work very very well for 
that.

ajs6f

> On Apr 17, 2018, at 1:03 AM, Samita Bai / PhD CS Scholar @ City Campus 
>  wrote:
>
> Dear Andy & Adam,
>
>
> Thanks a lot for the help, I finally got my code running. I just caught the 
> RiotException; that was all that was needed. Feeling so happy.
>
>
> I really appreciate your time and effort :)
>
>
> Best regards,
>
> Samita Bai

Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Dear Andy,


I downloaded the same dataset from the link you gave, i.e.


http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz


Then I extracted it and ran the following code:


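import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.jena.tdb.TDBLoader;
import org.apache.jena.tdb.base.file.Location;
import org.apache.jena.tdb.setup.DatasetBuilderStd;
import org.apache.jena.tdb.store.DatasetGraphTDB;

public class ReadQuadInJena {

    public static void main(String[] args) {
        TDBLoader tlobj = new TDBLoader();
        String Ds = "/home/samita/data.nq";
        Location location = Location.create("/home/samita/Load_TDB");
        DatasetGraphTDB dgtdb = DatasetBuilderStd.create(location);
        // Stream the extracted N-Quads file into the TDB store; the stream is closed afterwards
        try (InputStream is = new FileInputStream(Ds)) {
            tlobj.loadDataset(dgtdb, is);
        } catch (IOException e) {
            e.printStackTrace();   // report a missing or unreadable file instead of ignoring it
        }
    }
}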

It ended up with this error.

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 30506, 
col: 232] Illegal character in IRI (codepoint 0x7C, '|'): 
<http://fonts.googleapis.com/css?family=Nunito[|]...>
at 
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
at org.apache.jena.riot.lang.LangNQuads.parseOne(LangNQuads.java:67)
at org.apache.jena.riot.lang.LangNQuads.runParser(LangNQuads.java:54)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:41)
at 
org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:195)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:334)
at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:324)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:273)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
at org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:870)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:693)
at 
org.apache.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:152)
at 
org.apache.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:115)
at org.apache.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:256)
at org.apache.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:191)
at ldbqPack.ReadQuadInJena.main(ReadQuadInJena.java:47)

If it ran fine at your end, what's wrong with my code? Please help me.
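The RiotException above reports a position ([line: 30506, col: 232]). Before contacting the data providers or posting a sample to the list, it can help to look at exactly that line. The following is a minimal JDK-only sketch, not part of Jena: the class name ShowErrorPosition is hypothetical, it assumes the uncompressed .nq file is UTF-8, and it assumes the reported column counts characters starting from 1, which has not been checked against RIOT itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jena: show the line a parse error points at,
// with a caret under the reported column (assumed to be 1-based).
final class ShowErrorPosition {

    // Returns the requested line plus a caret line, or null if the input is shorter.
    static String context(Reader source, long lineNo, int col) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            long current = 0;
            while ((line = in.readLine()) != null) {
                if (++current == lineNo) {
                    StringBuilder caret = new StringBuilder();
                    for (int i = 1; i < col; i++) {
                        caret.append(' ');
                    }
                    caret.append('^');
                    return line + System.lineSeparator() + caret;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShowErrorPosition <file.nq> <line> <col>
        Reader file = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(context(file, Long.parseLong(args[1]), Integer.parseInt(args[2])));
    }
}
```

For example, `java ShowErrorPosition /home/samita/data.nq 30506 232` should print the offending statement with a caret under column 232, provided the assumptions above hold.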






From: Andy Seaborne 
Sent: 16 April 2018 22:13:36
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

I downloaded

http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz

(the latest I could find)

and used tdbloader.

Is that the data you are using?

 Andy

On 16/04/18 17:32, ajs6f wrote:
> You should be able to check the validity of any of your files just by running 
> them through Jena's `riot` command.
>
> You can try loading them into a TDB1 or TDB2 db by using the `tdbloader` or 
> `tdb2.tdbloader` commands.
>
> ajs6f
>
>> On Apr 16, 2018, at 12:28 PM, Samita Bai / PhD CS Scholar @ City Campus 
>>  wrote:
>>
>> OK Andy, I got your point. Can you please share the code that you used to 
>> read the Dynamic Linked Data Observatory dataset?
>>
>>
>>
>> Regards,
>>
>> Samita Bai
>>
>> 
>> From: Andy Seaborne 
>> Sent: 16 April 2018 15:34:07
>> To: users@jena.apache.org
>> Subject: Re: TDB 2 Store Parameters
>>
>> If you wish to process the data as it is parsed, then see StreamRDF and
>> either
>>
>> NxParser, which is not part of Jena, is not a validating parser.
>>
>> If the data is not valid, then you will have problems at some point,
>> either loading, querying or outputting later.
>>
>> Adam has explained that TDB2 indexes heavily so that querying is well
>> served.
>>
>> We can't help with the parser errors without knowing what they are.
>>
>> Which files from Dynamic Linked Data Observatory are you processing?
>> Don't the later ones replace the earlier ones?
>>
>> I found that the last n-quads file was 42 million triples and all valid.
>>
>>  Andy
>>
>> On 16/04/18 11:05, ajs6f wrote:
>>> If there are syntax errors in your RDF (and it sounds like that is why Jena 
>>> will not read it directly), you are doing yourself no service by taking 
>>> unusual pains to force TDB to ingest your data.
>>>
>>> Please show us the errors that Jena throws when trying to read your data 
>>> and an appropriate sample of the data in question.
>>>
>>>
>>> ajs6f
>>>
>>>> On Apr 16, 2018, at 4:42 AM, Samita Bai / PhD CS Scholar @ City Campus 
>>>>  wrote:
>>>>
>>>> In addition to my previous query: it is taking a lot of time to first parse 
>>>> the dataset using NxParser, then check for the object and create quad 
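Andy's pointer to StreamRDF is about handling each statement as the parser produces it, instead of first building a whole dataset in memory or re-parsing with NxParser. In Jena that means implementing org.apache.jena.riot.system.StreamRDF (for example by extending StreamRDFBase and overriding quad(Quad)) and passing it to RDFDataMgr.parse(sink, fileName); check the Javadoc for your Jena version. The sketch below only illustrates the same push-style pattern with the JDK: LineSink, LineStreamer and CountingSink are hypothetical names, and the streamer merely splits lines, so unlike RIOT it neither parses nor validates N-Quads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical push-style callback, loosely modelled on the idea behind Jena's StreamRDF.
interface LineSink {
    void start();
    void statement(long lineNo, String line);
    void finish();
}

final class LineStreamer {

    // Push each non-blank, non-comment line to the sink as soon as it is read;
    // nothing is accumulated here, so memory use does not grow with file size.
    static void stream(Reader source, LineSink sink) {
        sink.start();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;   // blank line or N-Quads comment line
                }
                sink.statement(lineNo, trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        sink.finish();
    }
}

// Example sink: counts statements and records the lines that mention one IRI.
final class CountingSink implements LineSink {
    private final String needle;
    long count;
    final List<Long> matchingLines = new ArrayList<>();

    CountingSink(String needle) {
        this.needle = needle;
    }

    public void start() {
        count = 0;
        matchingLines.clear();
    }

    public void statement(long lineNo, String line) {
        count++;
        if (line.contains(needle)) {
            matchingLines.add(lineNo);
        }
    }

    public void finish() {
        System.out.println(count + " statements, " + matchingLines.size() + " matches");
    }
}
```

A call such as LineStreamer.stream(Files.newBufferedReader(Paths.get("/home/samita/data.nq")), new CountingSink("<http://example.org/g>")) would walk the file once with flat memory use; the IRI given to CountingSink is a made-up example.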

Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
I think I have to download the file first, as it is in .nq.gz format. I wrote the 
following code, which gave me an exception:


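import org.apache.jena.tdb.TDBLoader;
import org.apache.jena.tdb.base.file.Location;
import org.apache.jena.tdb.setup.DatasetBuilderStd;
import org.apache.jena.tdb.store.DatasetGraphTDB;

public class ReadQuadInJena {

    public static void main(String[] args) {
        // Load the gzipped N-Quads dump directly from its URL into a TDB1 store
        String URL = "http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz";
        Location location = Location.create("/home/samita/TDBLoaded");
        DatasetGraphTDB dgtdb = DatasetBuilderStd.create(location);
        TDBLoader.load(dgtdb, URL);
    }
}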


The URL is not valid because it points to a .nq.gz file.
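A .nq.gz file is gzip-compressed binary, so a reader that expects UTF-8 text will fail on it unless something decompresses it first; that is one plausible reading of the MalformedInputException and the "not legal UTF-8" diagnosis in this thread. Jena's RIOT documentation describes treating local files whose names end in .gz as gzip-compressed, but whether that applies to a given HTTP URL is worth checking for your Jena version; downloading and decompressing first is the simplest route. Below is a minimal JDK-only sketch using java.util.zip.GZIPInputStream; the class name Gunzip and the file names are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

final class Gunzip {

    // Decompress a gzip stream into `out`; returns the number of uncompressed bytes written.
    static long gunzip(InputStream compressed, OutputStream out) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buffer = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Gunzip data.nq.gz data.nq
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            System.out.println("Wrote " + gunzip(in, out) + " bytes to " + args[1]);
        }
    }
}
```

The decompressed data.nq can then be given to the file-based loading code or to the tdbloader command.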


From: Andy Seaborne 
Sent: 17 April 2018 01:49:06
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

Corrupt input file or the input file has binary in it.

(it means the input is not legal UTF-8)


Re: TDB 2 Store Parameters

2018-04-16 Thread Andy Seaborne

Corrupt input file or the input file has binary in it.

(it means the input is not legal UTF-8)
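The "not legal UTF-8" reading of MalformedInputException can be checked directly on the bytes. This JDK-only sketch reports whether the data starts with the gzip magic bytes 0x1f 0x8b and the byte offset of the first sequence that is not valid UTF-8, by running a CharsetDecoder configured to REPORT errors rather than replace them. Utf8Check is a hypothetical name; its main reads the whole file into memory, so for a multi-gigabyte dump you would feed it a sample or adapt it to read in chunks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Utf8Check {

    // gzip data always starts with the two magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    // Byte offset of the first sequence that is not valid UTF-8, or -1 if the input is clean.
    static long firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            // endOfInput = true: a truncated multi-byte sequence at the end counts as malformed
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position();   // position is at the start of the bad sequence
            }
            if (result.isOverflow()) {
                out.clear();            // the decoded text is not needed, only its validity
                continue;
            }
            return -1;                  // underflow: all input was decoded
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory: fine for samples, not for multi-GB dumps.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("gzip magic bytes: " + looksGzipped(data));
        System.out.println("first invalid UTF-8 byte offset: " + firstInvalidOffset(data));
    }
}
```

A gzip file fails almost immediately: its second byte, 0x8b, cannot start a UTF-8 sequence, so the reported offset is 1.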

On 16/04/18 21:20, Samita Bai  / PhD CS Scholar @ City Campus wrote:


Dear Andy,



I got the following exception with TDBLoader for the latest dataset as you said.


Exception in thread "main" org.apache.jena.atlas.RuntimeIOException: 
java.nio.charset.MalformedInputException: Input length = 1
at org.apache.jena.atlas.io.IO.exception(IO.java:233)
at 
org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
at 
org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
at 
org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:235)
at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:229)
at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:151)
at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
at 
org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:48)
at org.apache.jena.riot.lang.RiotParsers.createParser(RiotParsers.java:57)
at 
org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:194)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:334)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:303)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:277)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:890)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:680)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:649)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:637)
at 
org.apache.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:143)
at 
org.apache.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:109)
at org.apache.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:252)
at org.apache.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:184)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:74)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:53)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:44)
at ldbqPack.ReadQuadInJena.main(ReadQuadInJena.java:42)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
at java.base/java.io.Reader.read(Reader.java:140)
... 26 more




From: ajs6f 
Sent: 17 April 2018 00:24:12
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

This appears to be a plain problem in the data. The character "|" should be 
%-escaped. Have you talked with the data providers to figure out why the data is invalid? 
You don't show where this triple comes from, but since Andy had no problem loading a more 
recent data set from the same provider, perhaps you can just try that.
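As ajs6f says, "|" is not allowed unescaped inside an IRI in N-Quads: the RDF 1.1 N-Triples/N-Quads grammar excludes U+0000 to U+0020 and the characters < > " { } | ^ ` \ from IRIREF. If you do clean the data yourself before loading, here is a minimal sketch of percent-encoding those characters in an IRI's text. IriFix is a hypothetical name; it assumes the IRI contains no \u escapes (a backslash is encoded like any other forbidden character), and the sample IRI in main is made up, since the real one is elided in the error message.

```java
// Hypothetical cleaning helper, not part of Jena.
final class IriFix {

    // Characters (besides U+0000..U+0020) that N-Triples/N-Quads IRIREF does not allow unescaped.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    // Percent-encode forbidden characters. All of them are ASCII, so %XX of the
    // character code equals its single UTF-8 byte. Other characters are left as-is.
    static String percentEncodeForbidden(String iri) {
        StringBuilder sb = new StringBuilder(iri.length());
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up example in the spirit of the failing IRI
        System.out.println(percentEncodeForbidden("http://fonts.googleapis.com/css?family=Nunito|Lato"));
    }
}
```

Encoding rather than deleting keeps the intended URI (Nunito%7CLato is how a URI has to spell Nunito|Lato), but in RDF the encoded and unencoded strings are different IRIs, so queries must use the encoded form.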

Parsing in Jena intentionally defaults to rejecting invalid RDF. That's by far 
the safest approach for a library system like Jena. You can catch an exception 
and ignore the invalid data, and if that works for your application, good, or 
you can try to take some more sophisticated approach. But in any event you'll 
generally be well-advised to clean up the data _before_ it goes into your 
application. For one thing, Jena's tools (e.g. tdbloader) expect valid data.
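On catching the exception: once the parser throws, that parse call is over, so a try/catch around a whole-file load keeps the program alive but does not bring back the statements after the bad one. Because N-Quads is line-based (one statement per line), an alternative is to check each line separately and skip the ones that fail. This JDK-only sketch contrasts the two control flows; SkipBadLines is a hypothetical name, and the per-line check is passed in as a Consumer so that a real per-line parse could replace the crude check used below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Jena API) contrasting two ways of handling a parse error.
final class SkipBadLines {

    // One try/catch around the whole loop: the first bad line ends processing,
    // so every line after it is lost even though the program keeps running.
    static List<String> stopAtFirstError(List<String> lines, Consumer<String> checkLine) {
        List<String> accepted = new ArrayList<>();
        try {
            for (String line : lines) {
                checkLine.accept(line);   // throws on invalid input
                accepted.add(line);
            }
        } catch (RuntimeException e) {
            System.err.println("Stopped: " + e.getMessage());
        }
        return accepted;
    }

    // A try/catch per line: bad lines are reported and skipped, the rest are kept.
    static List<String> skipBadLines(List<String> lines, Consumer<String> checkLine) {
        List<String> accepted = new ArrayList<>();
        long lineNo = 0;
        for (String line : lines) {
            lineNo++;
            try {
                checkLine.accept(line);
                accepted.add(line);
            } catch (RuntimeException e) {
                System.err.println("Skipping line " + lineNo + ": " + e.getMessage());
            }
        }
        return accepted;
    }
}
```

With a check such as `line -> { if (line.contains("|")) throw new IllegalArgumentException("illegal '|'"); }` (far too crude for real data), stopAtFirstError keeps only the lines before the first "|", while skipBadLines keeps every line except those containing it.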

As for your code terminating, you don't show your code with a try-catch, so we 
can't help you very well.

Adam


On Apr 16, 2018, at 1:50 PM, Samita Bai / PhD CS Scholar @ City Campus 
 wrote:

Even though I am using try-catch to catch RiotException, my code still gets 
terminated on this exception 😞


Regards,

Samita Bai


From: Samita Bai / PhD CS Scholar @ City Campus
Sent: 16 April 2018 22:32:26
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters


Yes, I am using the same data, but from February 2018, as I started experimenting at 
that time. For example, for the following piece of code I get the error shown 
below.


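import java.util.Iterator;

import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;

public class ReadQuadInJena {

    public static void main(String[] args) {
        String FileName = "/home/samita/Dyldo_DS_4Feb2018/data.nq";
        // Parse the whole N-Quads file into an in-memory dataset
        DatasetGraph dsg = RDFDataMgr.loadDatasetGraph(FileName);
        // Print every quad in the dataset
        Iterator<Quad> iterQuad = dsg.find();
        while (iterQuad.hasNext()) {
            System.out.println(iterQuad.next());
        }
    }
}
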
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 89841, col: 
232] Illegal character in IRI (codepoint 0x7C, '|'): 
<http://fonts.googleapis.com/css?family=Nunito[|]...>
at 
org.apache.jena.

Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Dear Andy,

I received this exception from TDBLoader with the dataset you pointed me to.


Exception in thread "main" org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
at org.apache.jena.atlas.io.IO.exception(IO.java:233)
at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:235)
at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:229)
at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:151)
at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:48)
at org.apache.jena.riot.lang.RiotParsers.createParser(RiotParsers.java:57)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:194)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:334)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:303)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:277)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:890)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:680)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:649)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:637)
at org.apache.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:143)
at org.apache.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:109)
at org.apache.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:252)
at org.apache.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:184)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:74)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:53)
at org.apache.jena.tdb.TDBLoader.load(TDBLoader.java:44)
at ldbqPack.ReadQuadInJena.main(ReadQuadInJena.java:42)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
at java.base/java.io.Reader.read(Reader.java:140)
... 26 more
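`MalformedInputException: Input length = 1` is raised while the input is being decoded as UTF-8 (note `PeekReader.makeUTF8` in the trace), which means the bytes are not valid UTF-8 text. One plausible cause, not confirmed in this thread, is that the loader was reading the still-compressed `.nq.gz` bytes from the URL. Gzip data always starts with the magic bytes `0x1f 0x8b` (RFC 1952), so a quick check like the hypothetical `GzipSniffer` below can tell whether a file you are about to parse is actually compressed. It uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks for the gzip magic number (RFC 1952: ID1 = 0x1f, ID2 = 0x8b). */
final class GzipSniffer {

    static boolean isGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int b1 = in.read();
            int b2 = in.read();
            // read() returns -1 at end of file, so empty or one-byte files report false.
            return b1 == 0x1f && b2 == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (isGzip(file) ? " is gzip-compressed" : " is not gzip-compressed"));
    }
}
```

If the check says the file is compressed, decompress it (or give a loader a local file it knows to decompress) before parsing.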





Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
I have updated the code as follows:


import org.apache.jena.tdb.TDBLoader;
import org.apache.jena.tdb.base.file.Location;
import org.apache.jena.tdb.setup.DatasetBuilderStd;
import org.apache.jena.tdb.store.DatasetGraphTDB;

public class ReadQuadInJena {

    public static void main(String[] args) {
        String url = "http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz";
        Location location = Location.create("/home/samita/TDBLoaded");
        DatasetGraphTDB dgtdb = DatasetBuilderStd.create(location);
        TDBLoader.load(dgtdb, url);
    }
}


The code is running, but I really don't know whether it will read the data from 
the .nq.gz file or not.
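On whether the `.nq.gz` URL will be read directly: Andy's approach was to download the file and run `tdbloader` on it. A conservative route that does not depend on how compressed content fetched over HTTP is handled is to download and decompress to a plain `.nq` file first, then load that local file. The sketch below uses only the JDK (`URI.toURL().openStream()`, `GZIPInputStream`, `Files.copy`); the class name `FetchAndGunzip` and its two arguments are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: download a .gz file and store the decompressed bytes locally. */
final class FetchAndGunzip {

    /** Decompresses a gzip stream into target (overwriting it) and returns the number of bytes written. */
    static long gunzip(InputStream compressed, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed)) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. args[0] = "http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz", args[1] = "data.nq"
        Path target = Paths.get(args[1]);
        try (InputStream raw = URI.create(args[0]).toURL().openStream()) {
            long bytes = gunzip(raw, target);
            System.out.println("Wrote " + bytes + " bytes to " + target);
        }
    }
}
```

The resulting `data.nq` is an ordinary local file, which can then be loaded with `tdbloader` as Andy did.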





Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Dear Adam,


I simply put the code in a try-catch block like this:


try {
    Iterator<Quad> iterQuad = dsg.find();
    while (iterQuad.hasNext()) {
        System.out.println(iterQuad.next());
    }
} catch (RiotException e) {
    System.out.println("caught");
}

}
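A likely reason the catch does not fire: in the `ReadQuadInJena` program posted on 16 April, the stack trace shows the `RiotException` coming out of `RDFDataMgr.loadDatasetGraph(FileName)` (`ReadQuadInJena.java:19`). Assuming `dsg` in this snippet comes from that same call, parsing has already failed before execution reaches this `try`, which only wraps `dsg.find()`; the load call itself has to sit inside the `try`. Even then, the load stops at the first bad statement. If skipping bad statements is acceptable for your application, one option for a line-based format like N-Quads is to handle each line in its own `try`/`catch`. The plain-Java sketch below shows only that pattern: `LenientLineLoader` is a hypothetical name, and the toy handler (which rejects any line containing `|`) stands in for whatever per-line parsing and storing you would really do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: isolate failures per line so one bad statement is skipped instead of ending the run. */
final class LenientLineLoader {

    /**
     * Passes each line to handler inside its own try/catch.
     * Returns the 1-based numbers of the lines whose handler threw a RuntimeException.
     */
    static List<Integer> loadLeniently(List<String> lines, Consumer<String> handler) {
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            try {
                handler.accept(lines.get(i));   // e.g. parse this one statement and add it to a store
            } catch (RuntimeException e) {      // Jena's RiotException is an unchecked RuntimeException
                rejected.add(i + 1);
                System.err.println("Skipping line " + (i + 1) + ": " + e.getMessage());
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "<http://ex.org/s> <http://ex.org/p> <http://ex.org/o> <http://ex.org/g> .",
                "<http://ex.org/s> <http://ex.org/p> <http://ex.org/a|b> <http://ex.org/g> .",
                "<http://ex.org/s> <http://ex.org/p> \"ok\" <http://ex.org/g> .");
        List<String> kept = new ArrayList<>();
        // Toy handler standing in for a real parser: it only rejects a '|' anywhere in the line.
        List<Integer> rejected = loadLeniently(lines, line -> {
            if (line.contains("|")) throw new IllegalArgumentException("illegal character '|'");
            kept.add(line);
        });
        System.out.println("kept " + kept.size() + ", rejected lines " + rejected);
    }
}
```

The trade-off is that bad statements are silently dropped (apart from the log line), so this only makes sense when losing them is acceptable.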

OK, I can work with the newer data using TDBLoader. Should I download and 
extract the data first, since it is in .nq.gz format?


Or should I give TDBLoader the URL of this file? Will it work with the .gz format?


Also, if you have a code snippet that uses TDBLoader, please share it with me; I 
am trying to create a DatasetGraphTDB but could not 😞


Regards,

Samita Bai



Re: TDB 2 Store Parameters

2018-04-16 Thread ajs6f
This appears to be a plain problem in the data. The character "|" should be 
%-escaped. Have you talked with the data providers to figure out why the data 
is invalid? You don't show where this triple comes from, but since Andy had no 
problem loading a more recent data set from the same provider, perhaps you can 
just try that.

Parsing in Jena intentionally defaults to rejecting invalid RDF. That's by far 
the safest approach for a library system like Jena. You can catch an exception 
and ignore the invalid data, and if that works for your application, good, or 
you can try to take some more sophisticated approach. But in any event you'll 
generally be well-advised to clean up the data _before_ it goes into your 
application. For one thing, Jena's tools (e.g. tdbloader) expect valid data.
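To make cleaning up the data before it reaches Jena concrete for this particular error, here is a rough sketch in plain Java (no Jena dependency) that percent-escapes characters N-Quads does not allow inside `<...>` IRIs, so `|` becomes `%7C`. The class name `NQuadsIriEscaper` is made up; it assumes one statement per line with no trailing `#` comments, leaves literals untouched, and fixes only this one class of error, so it is still worth checking the output with Jena's `riot` command afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: percent-escapes characters that are illegal inside N-Quads IRIs. */
final class NQuadsIriEscaper {

    // Characters forbidden inside an IRIREF (controls and space are checked separately;
    // '>' ends the IRI, and backslash is left alone so UCHAR escapes survive).
    private static final String ILLEGAL_IN_IRI = "<\"{}|^`";

    static String escapeLine(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inIri = false;
        boolean inLiteral = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inLiteral) {
                out.append(c);
                if (c == '\\' && i + 1 < line.length()) {
                    out.append(line.charAt(++i));   // copy the escaped character verbatim
                } else if (c == '"') {
                    inLiteral = false;              // end of the quoted literal
                }
            } else if (inIri) {
                if (c == '>') {
                    inIri = false;
                    out.append(c);
                } else if (c <= 0x20 || ILLEGAL_IN_IRI.indexOf(c) >= 0) {
                    out.append(String.format("%%%02X", (int) c)); // e.g. '|' -> %7C
                } else {
                    out.append(c);
                }
            } else {
                if (c == '<') {
                    inIri = true;
                } else if (c == '"') {
                    inLiteral = true;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reads N-Quads on stdin and writes the escaped lines to stdout. */
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintStream out = new PrintStream(System.out, false, "UTF-8");
        String line;
        while ((line = in.readLine()) != null) {
            out.println(escapeLine(line));
        }
        out.flush();
    }
}
```

On Java 11 or later this could be run as, for example, `zcat data.nq.gz | java NQuadsIriEscaper.java > data-fixed.nq`.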

As for your code terminating, you don't show your code with a try-catch, so we 
can't help you very well.

Adam


Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Even though I am using try-catch to catch RiotException, my code still gets 
terminated on this exception 😞


Regards,

Samita Bai



Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Yes, I am using the same data, but from February 2018, as that is when I 
started experimenting. For example, with the following piece of code I get the 
error shown below.


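import java.util.Iterator;

import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;

public class ReadQuadInJena {

    public static void main(String[] args) {
        String fileName = "/home/samita/Dyldo_DS_4Feb2018/data.nq";
        // Parses the whole file into an in-memory dataset; invalid data raises a RiotException here.
        DatasetGraph dsg = RDFDataMgr.loadDatasetGraph(fileName);
        Iterator<Quad> iterQuad = dsg.find();
        while (iterQuad.hasNext()) {
            System.out.println(iterQuad.next());
        }
    }
}
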
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 89841, col: 232] Illegal character in IRI (codepoint 0x7C, '|'): <http://fonts.googleapis.com/css?family=Nunito[|]...>
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
at org.apache.jena.riot.lang.LangNQuads.parseOne(LangNQuads.java:67)
at org.apache.jena.riot.lang.LangNQuads.runParser(LangNQuads.java:54)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:41)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:195)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:334)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:303)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:277)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:890)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:519)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:486)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:439)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:419)
at org.apache.jena.riot.RDFDataMgr.loadDatasetGraph(RDFDataMgr.java:392)
at ldbqPack.ReadQuadInJena.main(ReadQuadInJena.java:19)
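ajs6f asked for the errors together with an appropriate sample of the data, and the error position `[line: 89841, col: 232]` points straight at the offending statement. A minimal way to print one line of a large file without loading it all into memory is sketched below; `ShowLine` is a made-up name, and it assumes the file is already decompressed UTF-8 text (`Files.lines` decodes UTF-8 and fails on other encodings).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: fetch one line (1-based) of a large UTF-8 text file by streaming it. */
final class ShowLine {

    static Optional<String> lineAt(Path file, long lineNumber) throws IOException {
        if (lineNumber < 1) throw new IllegalArgumentException("line numbers start at 1");
        try (Stream<String> lines = Files.lines(file)) {   // lazy: lines are read on demand
            return lines.skip(lineNumber - 1).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java ShowLine.java data.nq 89841
        Path file = Paths.get(args[0]);
        long n = Long.parseLong(args[1]);
        System.out.println(lineAt(file, n).orElse("<no such line>"));
    }
}
```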





Re: TDB 2 Store Parameters

2018-04-16 Thread Andy Seaborne

I downloaded

http://swse.deri.org/dyldo/data/2016-03-27/data.nq.gz

(the latest I could find)

and used tdbloader.

Is that the data you are using?

Andy

On 16/04/18 17:32, ajs6f wrote:

You should be able to check the validity of any of your files just by running 
them through Jena's `riot` command.

You can try loading them into a TDB1 or TDB2 db by using the `tdbloader` or 
`tdb2.tdbloader` commands.

ajs6f


On Apr 16, 2018, at 12:28 PM, Samita Bai / PhD CS Scholar @ City Campus 
 wrote:

OK Andy, I got your point. Can you please share the code that you used to read 
the Dynamic Linked Data Observatory dataset?



Regards,

Samita Bai


From: Andy Seaborne 
Sent: 16 April 2018 15:34:07
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters

If you wish to process the data as it is parsed, then see StreamRDF and
either

NxParser, which is not part of Jena, is not a validating parser.

If the data is not valid, then you will have problems at some point,
either loading, querying or outputting later.

Adam has explained that TDB2 indexes heavily so that querying is well
served.

We can't help with the parser errors without knowing what they are.

Which files from Dynamic Linked Data Observatory are you processing?
Don't the later ones replace the earlier ones?

I found that the last n-quads file was 42 million triples and all valid.

 Andy

On 16/04/18 11:05, ajs6f wrote:

If there are syntax errors in your RDF (and it sounds like that is why Jena 
will not read it directly), you are doing yourself no service by taking unusual 
pains to force TDB to ingest your data.

Please show us the errors that Jena is throwing trying to read your data and an 
appropriate sample of the data in question.


ajs6f


On Apr 16, 2018, at 4:42 AM, Samita Bai / PhD CS Scholar @ City Campus 
 wrote:

In addition to my previous query: it takes a lot of time to first parse the 
dataset using NxParser, then check the object, create the quad again and store 
it in TDB. It would be much simpler if we could take each quad, check its 
object and insert it into TDB.


But Jena is not helping me with this 😞


So I have to create the quads again and store them in TDB.


Any help is surely appreciated.


Regards,

Samita Bai


From: Samita Bai / PhD CS Scholar @ City Campus
Sent: 16 April 2018 13:33:51
To: users@jena.apache.org
Subject: Re: TDB 2 Store Parameters


Thank you, Andy and Adam, for the help. Actually, I am only indexing the quads 
whose object is either a literal or a foreign URI (i.e. an object belonging to a 
different dataset than the subject). I am using NxParser (as Jena gives various 
parsing errors) to parse the dataset, and then I store the quads in TDB2 in the 
following manner.



import java.util.ArrayList;
import java.util.List;

import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;
import org.apache.jena.tdb2.TDB2Factory;

public class QuadIndexer { // hypothetical name: the class header was not included in the message

    private final List<Quad> QuadList = new ArrayList<>();

    // Builds one quad from the four terms and buffers it in memory.
    // Note: the object is always created as an IRI, so literal objects end up stored
    // as IRIs; they would need NodeFactory.createLiteral(...) instead.
    public void SetQuadsList(String sub, String pred, String obj, String context) {
        Node subjects = NodeFactory.createURI(sub);
        Node predicates = NodeFactory.createURI(pred);
        Node objects = NodeFactory.createURI(obj);
        Node contexts = NodeFactory.createURI(context);
        QuadList.add(Quad.create(contexts, subjects, predicates, objects));
    }

    public List<Quad> GetQuadsList() {
        return QuadList;
    }

    // Adds all buffered quads to the TDB2 database in a single write transaction.
    public void QuadsToTDB(List<Quad> quadList) {
        final String DATASET_DIR_NAME = "DyLDO1000K_Index";
        Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);
        dataset.begin(ReadWrite.WRITE);
        try {
            DatasetGraph dsg = dataset.asDatasetGraph();
            System.out.println("Size of Quad List: " + quadList.size());
            for (Quad quad : quadList) {
                dsg.add(quad);
            }
            // DatasetGraph.size() reports the number of named graphs, not the number of quads.
            System.out.println("Named graphs in dataset: " + dsg.size());
            dataset.commit();
            System.out.println("committed dataset.");
        } catch (Exception e) {
            e.printStackTrace(System.err);
            dataset.abort(); // roll back the partial write on failure
        } finally {
            dataset.end();
        }
        System.out.println("end method.");
    }
}


I 

Re: TDB 2 Store Parameters

2018-04-16 Thread ajs6f
You should be able to check the validity of any of your files just by running 
them through Jena's `riot` command.

You can try loading them into a TDB1 or TDB2 db by using the `tdbloader` or 
`tdb2.tdbloader` commands.

ajs6f


Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
OK Andy I got your point. Can you please share the code that you used to read 
the Dynamic Linked Data Observatory dataset?



Regards,

Samita Bai



Re: TDB 2 Store Parameters

2018-04-16 Thread Andy Seaborne
If you wish to process the data as it is parsed, then see StreamRDF and 
either
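Here is a minimal sketch of the StreamRDF approach for the filtering Samita describes: keep only quads whose object is a literal or a "foreign" URI. It assumes Jena 3.x with TDB2 on the classpath. A `StreamRDFWrapper` passes on only the wanted quads, and `StreamRDFLib.dataset` writes them into the TDB2 database inside one write transaction. The rule "foreign means a different host from the subject" is an assumption for illustration, and the names `FilteredLoad`, `keep` and `host` are hypothetical. Substitute whatever actually defines a different dataset.

```java
import java.net.URI;

import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.query.Dataset;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFLib;
import org.apache.jena.riot.system.StreamRDFWrapper;
import org.apache.jena.sparql.core.Quad;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class FilteredLoad {

    // Hypothetical "foreign object" rule: keep literals, and keep IRIs whose
    // host differs from the subject's host. Adjust to your own definition.
    static boolean keep(Node subject, Node object) {
        if (object.isLiteral()) {
            return true;
        }
        if (!subject.isURI() || !object.isURI()) {
            return false;
        }
        String subjectHost = host(subject.getURI());
        String objectHost = host(object.getURI());
        return subjectHost != null && objectHost != null
            && !subjectHost.equalsIgnoreCase(objectHost);
    }

    // Host part of an IRI, or null if java.net.URI cannot parse it or it has no host.
    static String host(String iri) {
        try {
            return URI.create(iri).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String inputFile = args[0]; // e.g. an N-Quads file
        String dbDir = args[1];     // TDB2 database directory
        Dataset dataset = TDB2Factory.connectDataset(dbDir);

        // One write transaction: commits on success, aborts if parsing throws.
        Txn.executeWrite(dataset, () -> {
            StreamRDF toDatabase = StreamRDFLib.dataset(dataset.asDatasetGraph());
            StreamRDF filter = new StreamRDFWrapper(toDatabase) {
                @Override
                public void quad(Quad quad) {
                    if (keep(quad.getSubject(), quad.getObject())) {
                        super.quad(quad);
                    }
                }

                @Override
                public void triple(Triple triple) {
                    if (keep(triple.getSubject(), triple.getObject())) {
                        super.triple(triple);
                    }
                }
            };
            RDFDataMgr.parse(filter, inputFile);
        });
    }
}
```

The whole file is parsed inside a single transaction. A RiotException on a bad line therefore propagates out of `Txn.executeWrite` and aborts the load instead of leaving it half-written. Running the file through `riot` first shows where any such errors are.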


NxParser, which is not part of Jena, is not a validating parser.

If the data is not valid, then you will have problems at some point, 
either loading, querying or outputting later.


Adam has explained that TDB2 indexes heavily so that querying is well 
served.
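To put "indexes heavily" into rough numbers, here is a back-of-envelope sketch. It assumes TDB2's default six quad indexes (GSPO, GPOS, GOSP, SPOG, POSG, OSPG) and 8-byte node ids, so each index record for one quad is 4 x 8 = 32 bytes. It counts only raw index records. It leaves out B+tree structure, partly filled blocks, the node table and the triple indexes, so treat it as a lower bound rather than a prediction of on-disk size. The class and method names are hypothetical.

```java
public class IndexSizeEstimate {

    // Assumption: TDB2 default quad indexes and 8-byte NodeIds.
    static final String[] DEFAULT_QUAD_INDEXES = {"GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG"};
    static final int NODE_ID_BYTES = 8;

    // Raw record bytes across the given number of quad indexes: each index
    // stores one record of four NodeIds (G, S, P, O) per quad.
    static long quadIndexRecordBytes(long quads, int indexCount) {
        long bytesPerRecord = 4L * NODE_ID_BYTES;
        return quads * bytesPerRecord * indexCount;
    }

    public static void main(String[] args) {
        long quads = 42_000_000L;
        long allIndexes = quadIndexRecordBytes(quads, DEFAULT_QUAD_INDEXES.length);
        long gspoOnly = quadIndexRecordBytes(quads, 1);
        System.out.println("six quad indexes: " + allIndexes + " bytes");
        System.out.println("GSPO only: " + gspoOnly + " bytes");
    }
}
```

For 42 million quads this gives 8,064,000,000 bytes (about 8 GB) across six indexes and 1,344,000,000 bytes for GSPO alone, before any of the overheads listed above. Keeping only one index saves space, but as noted elsewhere in the thread, it also leaves most query patterns without an index.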


We can't help with the parser errors without knowing what they are.

Which files from Dynamic Linked Data Observatory are you processing?
Don't the later ones replace the earlier ones?

I found that the last n-quads file was 42 million triples and all valid.

Andy


Re: TDB 2 Store Parameters

2018-04-16 Thread ajs6f
If there are syntax errors in your RDF (and it sounds like that is why Jena 
will not read it directly), you are doing yourself no service by taking unusual 
pains to force TDB to ingest your data.

Please show us the errors that Jena is throwing trying to read your data and an 
appropriate sample of the data in question.


ajs6f


Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
In addition to my previous query: it takes a lot of time to first parse the 
dataset using NxParser, then check the object, create the quad again and store 
it in TDB. It would be much simpler if I could take each quad, check its object 
and insert it into TDB.


But Jena is not helping me with this 😞


So I have to create the quads again and store them in TDB.


Any help is surely appreciated.


Regards,

Samita Bai



Re: TDB 2 Store Parameters

2018-04-16 Thread Samita Bai / PhD CS Scholar @ City Campus
Thank you Andy and Adam for the help. Actually, I am only indexing the quads 
whose object is either a literal or a foreign URI (i.e. the object belongs to a 
different dataset than the subject). I am using NxParser (as Jena gives various 
parsing errors) to parse the dataset, and then I store the quads in TDB2 in the 
following manner.



public void SetQuadsList(String sub, String pred, String obj, String context) {
    Node subjects = NodeFactory.createURI(sub);
    Node objects = NodeFactory.createURI(obj);
    Node contexts = NodeFactory.createURI(context);
    //Node rdfSeeAlso = RDFS.seeAlso.asNode();

    Node predicates = NodeFactory.createURI(pred);

    //Quad quads = Quad.create(contexts, objects, rdfSeeAlso, subjects);

    Quad quads = Quad.create(contexts, subjects, predicates, objects);

    QuadList.add(quads);

    //System.out.println("Number of backlinks:" + QuadList.size());
    //System.out.println("quad written");
    //System.out.println("Quad"+quads.toString());
}

public List<Quad> GetQuadsList() {
    return QuadList;
}

public void QuadsToTDB(List<Quad> quadList) {
    final String DATASET_DIR_NAME = "DyLDO1000K_Index";
    Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);

    dataset.begin(ReadWrite.WRITE);
    try {
        DatasetGraph dsg = dataset.asDatasetGraph();
        Iterator<Quad> quads = quadList.iterator();
        System.out.println("Size of Quad List: " + quadList.size());
        while (quads.hasNext()) {
            //System.out.println("here");
            Quad quad = quads.next();
            dsg.add(quad);
            //System.out.println(quad.toString()+ "added");
            //RDFDataMgr.writeQuads(System.out, quads);
            //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        }
        System.out.println("dsg created of size " + dsg.size());
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        System.out.println("written dsg using datamgr.");

        //System.out.println(dataset.isEmpty());
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        dataset.commit();

        System.out.println("committed dataset.");
    } catch (Exception e) {
        e.printStackTrace(System.err);
        //dataset.abort();
    } finally {
        //RDFDataMgr.write(System.out, dsg, Lang.NQUADS);
        dataset.end();
    }
    System.out.println("end method.");
}
}


I have indexed 40,000 files (I have split the dataset into files according 
to context) and the index size has become 120 GB. I have a total of 135,600 
files, whose total size is only 19.8 GB.


Why is TDB making such a big index? I am confused :( Is there a problem in 
my code?


Please suggest any improvements.



Regards,

Samita Bai
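The following is a hedged sketch, not code from the thread. It expresses the same write loop with `Txn.executeWrite` from `org.apache.jena.system.Txn`, assuming Jena 3.x. The transaction commits if the lambda returns normally and aborts (rethrowing the exception) if it throws, which replaces the explicit begin/commit/end calls and the swallowed exception. The class name `QuadsToTdbSketch` is hypothetical. The `Dataset` is passed in so that one connection can serve every file.

```java
import java.util.List;

import org.apache.jena.query.Dataset;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;
import org.apache.jena.system.Txn;

public class QuadsToTdbSketch {

    // Adds every quad in one write transaction. Txn.executeWrite commits when
    // the lambda returns normally; if it throws, the transaction is aborted
    // and the exception is rethrown, so errors are not silently swallowed.
    static void quadsToTdb(Dataset dataset, List<Quad> quadList) {
        Txn.executeWrite(dataset, () -> {
            DatasetGraph dsg = dataset.asDatasetGraph();
            for (Quad quad : quadList) {
                dsg.add(quad);
            }
        });
    }
}
```

Typical use: call `TDB2Factory.connectDataset("DyLDO1000K_Index")` once, then call `QuadsToTdbSketch.quadsToTdb(dataset, quadList)` for each file's list of quads.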







Re: TDB 2 Store Parameters

2018-04-14 Thread ajs6f
42 million quads is nothing like so many that either TDB version should have 
any problem doing normal indexing (assuming very little in the way of 
hardware; I ingest datasets like that on my laptop all the time).

Do you have some extraordinary hardware limitations?

Adam




Re: TDB 2 Store Parameters

2018-04-14 Thread Andy Seaborne

Hi Samita,

Firstly, as Adam points out, if there are no indexes then access to 
the data will be very slow. For a GSPO index, that means queries must 
be "GRAPH  { ... }" and probably "GRAPH  { .. }".


GSPO means lookup by G, then by S within that G, and likewise for P and then O.
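The following is a small, purely illustrative sketch of that point, not TDB code. A B+tree index sorted in GSPO order can answer a pattern with a single range scan only when the pattern's bound positions are the leading positions of the index order. Any other pattern means scanning a larger range and filtering. The names `IndexPrefix` and `servesByPrefix` are hypothetical, and the sketch assumes each position letter appears at most once in the pattern.

```java
public class IndexPrefix {

    // True if a pattern whose bound positions are listed in boundSlots
    // (e.g. "GS" = graph and subject fixed) can be answered by one range scan
    // on an index with the given order (e.g. "GSPO"): the bound positions
    // must be exactly the leading positions of the index order.
    // Assumes boundSlots contains each letter at most once.
    static boolean servesByPrefix(String indexOrder, String boundSlots) {
        int n = boundSlots.length();
        if (n > indexOrder.length()) {
            return false;
        }
        String prefix = indexOrder.substring(0, n);
        for (char slot : boundSlots.toCharArray()) {
            if (prefix.indexOf(slot) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] patterns = {"G", "GS", "GSP", "GP", "S", "P", "O"};
        for (String bound : patterns) {
            System.out.println(bound + " bound: GSPO range scan? "
                + servesByPrefix("GSPO", bound));
        }
    }
}
```

With only a GSPO index, this prints true for G, GS and GSP and false for GP, S, P and O. That is why dropping the other indexes makes most queries that do not name the graph slow.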

I looked at the data and it seems to be about 42 million quads.

Using TDB1 (the loader is faster at this scale currently) is likely to 
be a better choice.


Looking at StoreParams in TDB2:

The code below creates the database at TDB2Factory.connectDataset, so any 
StoreParams supplied after that do not affect indexing.


I tried to make it work in the release, but the code ignores the provided 
StoreParams, sorry. Even if it did work, it hits a test to make sure there 
is basic indexing (Adam's point).


Andy


On 13/04/18 13:42, Samita Bai  / PhD CS Scholar @ City Campus wrote:

I wrote the following code to build only one type of triple and quad index but 
it is still creating all indexes 😞


package ldbqPack;

import org.apache.jena.query.Dataset;

import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.tdb2.setup.StoreParams;
import org.apache.jena.tdb2.sys.DatabaseConnection;
import org.apache.jena.dboe.base.block.FileMode;
import org.apache.jena.dboe.base.file.Location;
import org.apache.jena.tdb2.setup.StoreParamsFactory;


public class StrPrms {
    static String[] tindexes = {"SPO"};
    static String[] qindexes = {"GSPO"};
    static String[] pindexes = {"GPU"};
    static final StoreParams pApp = StoreParams.builder()
        .blockSize(12)  // Not dynamic
        .nodeMissCacheSize(12)  // Dynamic
        .build();
    static final StoreParams pLoc = StoreParams.builder()
        .blockSize(0)
        .nodeMissCacheSize(0)
        .build();

    static final StoreParams pDft = StoreParams.builder()
        .fileMode(FileMode.mapped)
        .blockSize(8192)
        .blockReadCacheSize(5000)
        .blockWriteCacheSize(1000)
        .node2NodeIdCacheSize(20)
        .nodeId2NodeCacheSize(75)
        .nodeMissCacheSize(1000)
        .nodeTableBaseName("nodes")
        .primaryIndexTriples("SPO")
        .tripleIndexes(tindexes)
        .primaryIndexQuads("GSPO")
        .quadIndexes(qindexes)
        .prefixTableBaseName("prefixes")
        .primaryIndexPrefix("GPU")
        .prefixIndexes(pindexes)
        .build();


    public static void main(String[] args) {
        // TODO Auto-generated method stub
        final String DATASET_DIR_NAME = "DyLDO100";
        Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);

        Location location = Location.create(DATASET_DIR_NAME);

        StoreParams custom_params =
            StoreParamsFactory.decideStoreParams(location, true, pApp, pLoc, pDft);

        DatabaseConnection.connectCreate(location, custom_params);

        StoreParams params = StoreParams.getSmallStoreParams();

        System.out.println(params);
    }

}

Please help.

Regards,
Samita Bai




Please consider the environment before printing this e-mail



CONFIDENTIALITY / DISCLAIMER NOTICE: This e-mail and any attachments may 
contain confidential and privileged information. If you are not the intended 
recipient, please notify the sender immediately by return e-mail, delete this 
e-mail and destroy any copies. Any dissemination or use of this information by 
a person other than the intended recipient is unauthorized and may be illegal.





TDB 2 Store Parameters

2018-04-13 Thread Samita Bai / PhD CS Scholar @ City Campus
I wrote the following code to build only one triple index and one quad index, 
but it is still creating all the indexes 😞


package ldbqPack;

import org.apache.jena.query.Dataset;

import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.tdb2.setup.StoreParams;
import org.apache.jena.tdb2.sys.DatabaseConnection;
import org.apache.jena.dboe.base.block.FileMode;
import org.apache.jena.dboe.base.file.Location;
import org.apache.jena.tdb2.setup.StoreParamsFactory;


public class StrPrms {
    static String[] tindexes = {"SPO"};
    static String[] qindexes = {"GSPO"};
    static String[] pindexes = {"GPU"};

    static final StoreParams pApp = StoreParams.builder()
            .blockSize(12)          // Not dynamic
            .nodeMissCacheSize(12)  // Dynamic
            .build();

    static final StoreParams pLoc = StoreParams.builder()
            .blockSize(0)
            .nodeMissCacheSize(0)
            .build();

    static final StoreParams pDft = StoreParams.builder()
            .fileMode(FileMode.mapped)
            .blockSize(8192)
            .blockReadCacheSize(5000)
            .blockWriteCacheSize(1000)
            .node2NodeIdCacheSize(20)
            .nodeId2NodeCacheSize(75)
            .nodeMissCacheSize(1000)
            .nodeTableBaseName("nodes")
            .primaryIndexTriples("SPO")
            .tripleIndexes(tindexes)
            .primaryIndexQuads("GSPO")
            .quadIndexes(qindexes)
            .prefixTableBaseName("prefixes")
            .primaryIndexPrefix("GPU")
            .prefixIndexes(pindexes)
            .build();

    public static void main(String[] args) {
        final String DATASET_DIR_NAME = "DyLDO100";
        Dataset dataset = TDB2Factory.connectDataset(DATASET_DIR_NAME);

        Location location = Location.create(DATASET_DIR_NAME);

        StoreParams custom_params =
                StoreParamsFactory.decideStoreParams(location, true, pApp, pLoc, pDft);

        DatabaseConnection.connectCreate(location, custom_params);

        StoreParams params = StoreParams.getSmallStoreParams();

        System.out.println(params);
    }
}

Please help.

Regards,
Samita Bai



