Updating large amounts of data

2018-09-12 Thread Markus Neumann
Hi, we are running a Fuseki server that will hold about 2.2 * 10^9 triples of meteorological data eventually. I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 on a 900GB SSD. Now I face several performance issues: 1. Inserting data: It takes more than one hour

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Vincent Ventresque
> Just to be sure, you can try to execute some very generic queries (e.g. "*a*") and count the results. Thanks, I'll do that when I have a moment. > The downside of using a high limit (and the reason the default is "only" 1) is that jena-text/Lucene allocates an array of that size to hold

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Osma Suominen
Hi Vincent! Vincent Ventresque wrote on 12.09.2018 at 15:53: What do you think about this solution : ?uriBnF text:query ( foaf:givenName "*J*" 200 ) . ?uriBnF text:query ( foaf:familyName "roussea*" ) . ?uriBnF foaf:familyName ?nom . ?uriBnF foaf:givenName ?prenom It returns all the

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Vincent Ventresque
Hi Osma, Thanks again, it's very helpful. > Either you get less results than expected or the query will take a long time, or both What do you think about this solution : ?uriBnF text:query ( foaf:givenName "*J*" 200 ) . ?uriBnF text:query ( foaf:familyName "roussea*" ) . ?uriBnF

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Osma Suominen
Hi Vincent! Jena-text with the Lucene backend indexes each triple as a separate Lucene document. This means that you cannot combine givenName and familyName in the same query - from the Lucene perspective, the givenName appears in one document where familyName appears in another document,

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Vincent Ventresque
Hello Rob, Thank you for all these elements. > there is a limit on the results returned from each text search so when these are *separately executed and joined together* you may only get a subset of the full results Could you please explain what would be a 'non-separate' query? Do you mean

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Rob Vesse
Well, the order of triple patterns shouldn't matter too much when you have a pure BGP (albeit the optimiser might pick a bad order in some cases). But we aren't talking about pure BGPs here: having the text:query triples results in the BGP being broken up into joins of several property functions

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Vincent Ventresque
Hi Lorenz, Thanks for your reply. > for me it sounds more like you've found a bug I'm not able to tell, just beginning to use Fuseki + Lucene. > I'm just referring to "Order of triple patterns in a BGP" here Could you please give a raw text URL for "Order of triple patterns in a BGP" (seems

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Lorenz B.
Hi "VV", well, for me it sounds more like you've found a bug and are now doing a workaround. Or at least something is strange and I'm just referring to "Order of triple patterns in a BGP" here. The order of triple patterns in a BGP shouldn't matter - as far as I know it's always a good old join