Does Fuseki have clustering or auto-scaling support?

2021-02-13 Thread Soundararajan C
Hi Team, I got your support mail ID from https://jena.apache.org/help_and_support/. We were using Apache Jena Fuseki (2.4) and thought of checking with your team whether Apache Jena Fuseki has auto-scaling, clustering, or high-availability options. Purpose: We have made all our services

Re: Server workload finder

2021-02-13 Thread Andy Seaborne
In Fuseki, there is a metrics endpoint: https://jena.apache.org/documentation/fuseki2/fuseki-server-info.html which includes counts and also JVM information for Prometheus/Grafana. (You can also attach directly to the server, e.g. with VisualVM.) Andy On 13/02/2021 17:24, Hashim Khan wrote:
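
Prometheus-style metrics endpoints serve plain text, one sample per line. The following is a minimal sketch, not Fuseki's own tooling, of reading such text from a script rather than through Grafana. The `SAMPLE` payload, the metric names in it, and the URL in the comment are assumptions for illustration only; check the linked documentation for the actual path and metric names on your server.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {sample_name: float}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Any label
    set is kept as part of the key. This sketch assumes samples carry no
    trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # not a numeric sample; ignore it
    return samples


def fetch_metrics(url):
    """Fetch and parse metrics from a running server.

    The URL is supplied by the caller, for example (an assumption, verify
    against your deployment): http://localhost:3030/$/metrics
    """
    with urlopen(url, timeout=10) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical payload, made up for this example.
SAMPLE = """\
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 1.5e+08
example_requests_total 42
"""

metrics = parse_prometheus_text(SAMPLE)
for name, value in sorted(metrics.items()):
    print(name, value)
```

Polling `fetch_metrics` at a fixed interval and differencing a request counter would give a rough queries-per-second figure, which is the kind of workload number asked about in this thread.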

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
BTW - the very long time may be the garbage collector on a nearly exhausted heap. Actually, this couples with the way the data is being written. The RDFFormat.RDFXML_PLAIN does not have expensive corner cases. Andy On 12/02/2021 11:45, emri mbiemri wrote: Dear all, Do you know how I

Server workload finder

2021-02-13 Thread Hashim Khan
Hi, I want to ask if there is any function in Jena that can return information about a SPARQL server, such as its current RAM usage, its throughput in QpS, and other workload-related parameters. Or is there any SPARQL query like that? Best Regards, -- *Hashim Khan*

Re: Understanding the output of Jena TDB Loader

2021-02-13 Thread Daniel Hernandez
Hi, Andy Seaborne writes: > How much data are you loading? I am loading a billion triples. > Heap is only used for the node table cache and not index work which is > out of heap in memory mapped files mapped by the virtual memory of the > OS process so caching is done by the OS filesystem

Re: Understanding the output of Jena TDB Loader

2021-02-13 Thread Andy Seaborne
On 13/02/2021 13:21, Daniel Hernandez wrote: Hi, Thanks Lorenz for your answer. Regarding a possible spill to disk, my machine has 256 GB of RAM, and the Java process is taking only 20 G. I am not sure if changing the -Xmx Java argument would speed up the loading process. I see that

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
On 13/02/2021 13:53, Alexis Armin Huf wrote: Thanks for clarifying, Andy. I hadn't followed the code in RDFDataMgr.read/write(). The "In my experience" bit comes from two cases: - iterating a Model - Iterating the QuerySolution's of a ResultSet Right - that's when the query work is

Re: Merging a massive amount of RDFs

2021-02-13 Thread Alexis Armin Huf
Thanks for clarifying, Andy. I hadn't followed the code in RDFDataMgr.read/write(). The "In my experience" bit comes from two cases: - iterating a Model - Iterating the QuerySolution's of a ResultSet In both cases the culprit turned out to be just the GC overhead. The Model instances in that

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
On 12/02/2021 13:43, Alexis Armin Huf wrote: Hi, emri. In my experience with Jena I have observed that Graphs are more efficient than Models when there is too much data being iterated. The actual parsing should go straight into the graph, having picked it out of the model. What can

Re: Understanding the output of Jena TDB Loader

2021-02-13 Thread Daniel Hernandez
Hi, Thanks Lorenz for your answer. Regarding a possible spill to disk, my machine has 256 GB of RAM, and the Java process is taking only 20 G. I am not sure if changing the -Xmx Java argument would speed up the loading process. I see that tdbloader2 started the process using with the

Re: Find out in data if resources are connected

2021-02-13 Thread Andy Seaborne
Minor but "*" -> "+" On 12/02/2021 20:13, Martynas Jusevičius wrote: SPARQL is based on pattern matching, so path traversal is not its strong point. You might want to try a different language like Gremlin. On Fri, 12 Feb 2021 at 15.05, Mikael Pesonen wrote: Sorry meant of course to find
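
The `*` to `+` correction matters for connectivity checks: in SPARQL 1.1 property paths, `*` matches zero or more steps, so every resource is trivially "connected" to itself, while `+` requires at least one step. A small sketch of that difference in plain Python follows; the graph, the node names, and the `reachable` helper are made up for illustration and are not part of Jena.

```python
def reachable(edges, start, zero_steps_allowed):
    """Return the set of nodes reachable from `start` along directed edges.

    zero_steps_allowed=True mimics a `*` path (zero or more steps), so the
    start node always matches. False mimics `+` (one or more steps), so the
    start node matches only if a cycle leads back to it.
    """
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    if zero_steps_allowed:
        seen.add(start)
    return seen


# Hypothetical graph: a -> b -> c, and d with no outgoing links.
edges = {"a": ["b"], "b": ["c"], "d": []}

star_from_a = reachable(edges, "a", zero_steps_allowed=True)
plus_from_a = reachable(edges, "a", zero_steps_allowed=False)
star_from_d = reachable(edges, "d", zero_steps_allowed=True)
plus_from_d = reachable(edges, "d", zero_steps_allowed=False)

print(sorted(star_from_a), sorted(plus_from_a))
print(sorted(star_from_d), sorted(plus_from_d))
```

With `*`, the isolated node `d` still produces a match (itself), which can look like a connection when there is none; with `+` it produces no match.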

Re: Understanding the output of Jena TDB Loader

2021-02-13 Thread Lorenz Buehmann
On 13.02.21 12:00, Daniel Hernandez wrote: > Hi, > > I am loading an n-triples file using tdbloader2. I am curious about > what is the meaning of the numbers in the loader output. The loading > output started as follows: > > 09:54:15 INFO -- TDB Bulk Loader Start > 09:54:15 INFO Data Load

Understanding the output of Jena TDB Loader

2021-02-13 Thread Daniel Hernandez
Hi, I am loading an n-triples file using tdbloader2. I am curious about what is the meaning of the numbers in the loader output. The loading output started as follows: 09:54:15 INFO -- TDB Bulk Loader Start 09:54:15 INFO Data Load Phase 09:54:15 INFO Got 1 data files to load 09:54:15

Re: Merging a massive amount of RDFs

2021-02-13 Thread Lorenz Buehmann
wrong thread and the question is too vague ... 1) open another thread and 2) what is "cleaning" in your context, i.e. which kind of data quality issues? Also, Jena is pretty much not a data cleansing tool, you can use its RDF capabilities to write your own algorithms though. On 12.02.21 16:52,