On 21/11/2021 22:50, Jakub Jałowiec wrote:
> Hi Andy,
> sorry for the late reply.
...
> I'm curious how I can get involved in this. Are the abstract public class
> PathEngine
> <https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngine.java>
> and
> its implementations (e.g. PathEngineSPARQL
> <https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngineSPARQL.java>)
> a good place to start?

Yes.

more inline

On 23/11/2021 13:58, aj...@apache.org wrote:
There are a number of points here, but I'm only qualified to speak to one
of them. :)


On Sun, Nov 21, 2021, 5:51 PM Jakub Jałowiec <j.jalow...@student.uw.edu.pl>
wrote:

Hi Andy,
sorry for the late reply.


<snipped>

Neither the in-memory nor TDB data storage is
designed for that. Mid-scale LPG systems are, typically (adjacency
layout). That design favours one set of use cases over others.

I found it particularly interesting that a non-LPG in-memory version could
not be suitable here. Could you elaborate on that? Do you mean that the
limit here is the available RAM, or something else?


Andy can correct me, but what I understood him to be saying was not that a
non-LPG in-memory data layout _could not_ be designed to meet this use
case, but that our particular current in-memory data layouts _do not_.

yes - (over generalizing ...)

An LPG or RDF graph built with adjacency lists and memory pointers is fast for some things but not others.

There have been experiments running Wikidata over a commercial LPG, and the WD-style queries are reported to perform very poorly.

A simple example: with adjacency lists and memory pointers, you can't go backwards along a pointer. Without additional indexing, ?s P O (lookup by key) is going to be bad.
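
A minimal sketch of that point, using hypothetical Node/Graph classes (not
Jena code): following an out-pointer is cheap, but answering ?s P O means
scanning every node unless a reverse index is also maintained.

    import java.util.*;

    class Node {
        final String id;
        // Forward adjacency: predicate -> target nodes. S P ?o is just a
        // map lookup plus a list walk.
        final Map<String, List<Node>> out = new HashMap<>();
        Node(String id) { this.id = id; }
    }

    class Graph {
        final List<Node> nodes = new ArrayList<>();

        // ?s P O with no reverse index: scan every node and its out-edges.
        List<Node> subjectsOf(String predicate, Node object) {
            List<Node> subjects = new ArrayList<>();
            for (Node n : nodes)
                for (Node target : n.out.getOrDefault(predicate, List.of()))
                    if (target == object)
                        subjects.add(n);
            return subjects;
        }
    }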

    Andy


In particular, I can speak to the default "TIM" (transactional in-memory)
layout. It adopts a straightforward Java translation of functional data
structures from Scala that make assumptions: that they are running in a
single JVM, with a single contiguous heap, accessed at uniform speed by a
single processor with many threads. One could implement a similar design
with other structures that support partitioning across a network, plus new
logic to take that into account, but it would be far from trivial and it's
not obvious to me that you would end up with reasonable performance.

I'm not familiar myself with any open source distributed triplestores that
aren't academic products, but there are a few of those, e.g. TriAD or
Trinity.RDF. I found it useful to read papers about such systems to learn
about the design questions involved.

There has been some discussion of efforts from within the Jena community to
develop a distribution mechanism on top of the extant store designs, but we
haven't been able to get a finished contribution committed.

Adam

Also could you point me to any resources on memory management in Apache
Jena? Is it possible to run TDB or TDB2 fully in an in-memory mode?

I'm sure that path evaluation could be improved.
A good first step is to check that there aren't any basic, silly errors
in the existing code (it wouldn't be the first time). If none, then
there are specific algorithms like Floyd–Warshall (and I think there are
parallel versions of that).


I'm curious how I can get involved in this. Are the abstract public class
PathEngine
<https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngine.java>
and its implementations (e.g. PathEngineSPARQL
<https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngineSPARQL.java>)
a good place to start?

Thanks for your suggestions.

Best regards,
Jakub

Sat, 6 Nov 2021 at 11:34 Andy Seaborne <a...@apache.org> wrote:

Hi Jakub,

For this data, what does it look like?
    How deep are the foaf:knows chains?
    How many foaf:knows triples overall are there?
    Are there cycles?
    (presumably there are shared
     subtrees p1->p2->p3 and p1->p4->p3
     as well)

If designing a system bottom-to-top for :p+ and other path-like
queries, the data can be arranged in data structures suitable for
partitioning across CPUs. Neither the in-memory nor TDB data storage is
designed for that. Mid-scale LPG systems are, typically (adjacency
layout). That design favours one set of use cases over others.

The actual work per step per node is small, so if there is poor
co-locality to a CPU, then there is a good chance that inter-thread
costs are going to dominate. Project Loom (Java virtual threads) will
help but not solve that, because thread switching will be in the JVM, not
the OS kernel.

There are also the compute synchronization costs and the related impact on
CPU caches. Stalling to read memory (or competing for L2 cache) or to
synchronize between threads is expensive because it breaks up the
single-thread execution pipeline on a core.

So the layout across cores benefits from clustering.

If it is done during query execution, rather than ahead of time, that is
more cost.

It's a tradeoff - the other use of multiple cores is serving multiple
requests concurrently - the Fuseki case.

At the moment:

I'm sure that path evaluation could be improved.

A good first step is to check that there aren't any basic, silly errors
in the existing code (it wouldn't be the first time). If none, then
there are specific algorithms like Floyd–Warshall (and I think there are
parallel versions of that).
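
For reference, a sketch of the transitive-closure form of Floyd–Warshall
(Warshall's algorithm) over a plain boolean adjacency matrix; mapping
foaf:knows triples onto matrix indexes, and any parallel variant, is left
out here.

    import java.util.Arrays;

    class Reachability {
        // knows[i][j] == true iff person i foaf:knows person j.
        static boolean[][] transitiveClosure(boolean[][] knows) {
            int n = knows.length;
            boolean[][] reach = new boolean[n][n];
            for (int i = 0; i < n; i++)
                reach[i] = Arrays.copyOf(knows[i], n);
            // Standard triple loop: allow paths through intermediate node k.
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    if (reach[i][k])
                        for (int j = 0; j < n; j++)
                            if (reach[k][j])
                                reach[i][j] = true;
            return reach;
        }
    }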

      Andy


On 05/11/2021 12:25, Jakub Jałowiec wrote:
I've noticed that the figure from my last message was not sent
properly; here is a link to it: https://kubajal.github.io/covidepid/

Best regards,
Jakub

Fri, 5 Nov 2021 at 13:00 Jakub Jałowiec <j.jalow...@student.uw.edu.pl> wrote:

     Hi Andy,
     thanks for your input.
     Let me drill into it a little more so I understand it better. Let's
     say I have the following query:

         prefix foaf: <http://xmlns.com/foaf/0.1/>
         prefix ex: <http://example.net/>
         SELECT ?p1 (count(?label) as ?labelCounter)
         WHERE {
            ?p1 foaf:knows+ ?p2 .
            ?p2 ex:hasLabel ?label .
         }
         GROUP BY ?p1


     Let's also say I have multiple cores on my machine and I want to
     speed up a single query just by utilizing their parallelism. The
     bottleneck here is the complex property path (foaf:knows+).
     To speed things up I want to split the search space into chunks
     processed by each core. Intuitively, the best solution to me would be
     to split by ?p1 equally between all the cores (e.g. if I have N
     persons and k cores, then each core receives N/k persons to
     evaluate as ?p1). The figure below shows the workload distribution I am
     trying to achieve (green circles are persons, brown arcs are
     instances of the foaf:knows relationship; instances of
     ex:hasLabel have been left out).
     [attached figure: cores.png]
     The query engine would start the evaluation at a "person" node
     (?p1 in the query) and then just compute the closure of the foaf:knows
     relationship (?p1 foaf:knows+ ?p2). This would require shared
     memory between all the threads.
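
     To make the split concrete, here is a rough client-side sketch using
     Jena's standard query API (Txn.executeRead, QueryExecutionFactory);
     the dataset variable, the chunk size and the VALUES injection are
     placeholders for illustration, not existing Jena machinery:

     import java.util.*;
     import java.util.concurrent.*;
     import java.util.stream.Collectors;
     import org.apache.jena.query.*;
     import org.apache.jena.rdf.model.RDFNode;
     import org.apache.jena.system.Txn;

     public class PartitionedPaths {
         static final String PREFIXES =
             "PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.net/> ";

         public static void run(Dataset dataset, int k) throws InterruptedException {
             // 1. Collect the candidate ?p1 nodes in one read transaction.
             List<RDFNode> persons = new ArrayList<>();
             Txn.executeRead(dataset, () -> {
                 String q = PREFIXES + "SELECT DISTINCT ?p1 WHERE { ?p1 foaf:knows ?x }";
                 try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                     qe.execSelect().forEachRemaining(row -> persons.add(row.get("p1")));
                 }
             });

             // 2. Split them into k chunks; each worker evaluates the path
             //    query for its own slice of ?p1 inside a read transaction.
             ExecutorService pool = Executors.newFixedThreadPool(k);
             int chunk = Math.max(1, (persons.size() + k - 1) / k);
             for (int i = 0; i < persons.size(); i += chunk) {
                 List<RDFNode> slice = persons.subList(i, Math.min(i + chunk, persons.size()));
                 pool.submit(() -> Txn.executeRead(dataset, () -> {
                     // Assumes the ?p1 values are URIs (no blank nodes).
                     String values = slice.stream()
                             .map(n -> "<" + n.asResource().getURI() + ">")
                             .collect(Collectors.joining(" "));
                     String q = PREFIXES
                         + "SELECT ?p1 (count(?label) AS ?labelCounter) WHERE { "
                         + "  VALUES ?p1 { " + values + " } "
                         + "  ?p1 foaf:knows+ ?p2 . ?p2 ex:hasLabel ?label } "
                         + "GROUP BY ?p1";
                     try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                         ResultSetFormatter.consume(qe.execSelect());
                     }
                 }));
             }
             pool.shutdown();
             pool.awaitTermination(1, TimeUnit.HOURS);
         }
     }
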
     I have three questions:

      1. How would the SPARQL query engine know that it needs to split
         the workload in the 'per root of the pattern' manner and not in
         a different way? Is there a mechanism in the SPARQL interpreter
         for that?
      2. Can a single transaction be shared between multiple threads
         (cores)?
      3. Do I need a transaction if the threads I am running are
         guaranteed to not modify anything? (the query is a SELECT so it
         is 'read-only')

     Best regards,
     Jakub

     Sun, 31 Oct 2021 at 11:29 Andy Seaborne <a...@apache.org> wrote:

         Hi Jakub,

         The preferred way to have parallel actions on a dataset is via
         transactions.

         The concurrency-howto covers threading within a transaction,
         possibly with further MRSW (multiple reader or single writer)
         locking.

         This is how Fuseki executes multiple requests. Each HTTP
         request that is executing in true parallel is executed on a
         separate thread and inside a transaction.

         So have each thread start a transaction, execute as many
         sequential queries as it needs, and end the transaction.

         In fact, only TDB2 enforces this; TDB1 only enforces it if it
         has already been used transactionally. Other datasets are
         multiple-reader safe anyway. But placing the work inside a
         transaction is the correct way.
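
         A minimal sketch of that pattern, assuming an existing Dataset
         named dataset (for example one opened with
         TDB2Factory.connectDataset): each worker thread wraps its
         read-only queries in its own read transaction.

             import org.apache.jena.query.QueryExecution;
             import org.apache.jena.query.QueryExecutionFactory;
             import org.apache.jena.query.ResultSetFormatter;
             import org.apache.jena.system.Txn;

             Runnable worker = () -> Txn.executeRead(dataset, () -> {
                 // One read transaction per thread; run as many queries
                 // inside it as needed, then the transaction ends.
                 try (QueryExecution qExec = QueryExecutionFactory.create(
                         "SELECT * { ?s ?p ?o } LIMIT 10", dataset)) {
                     ResultSetFormatter.consume(qExec.execSelect());
                 }
             });
             new Thread(worker).start();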

                Andy

         On 30/10/2021 15:44, Jakub Jałowiec wrote:
          > Dear community,
          > is there any high-level user interface to execute parallel
         SELECT queries in Apache Fuseki or the CLI of Apache Jena?
           > I've found a short note on parallelism in Apache Jena here:
          https://jena.apache.org/documentation/notes/concurrency-howto.html.
          But that is not really what I am looking for as it is a general
          note on how to implement low-level parallelism in Apache Jena.
          > I am interested in analytic benchmarking of Apache Jena.
         Ideally, I am looking for something that works out-of-the-box
         just for SELECT queries (no need to modify the model in a
         parallel fashion or synchronize state etc.).
          >
           > I'd appreciate any suggestions or pointers to any
          resources, as I am new to Apache Jena. I couldn't find a lot in
         the archives of the list using the keywords "parallelism" and
         "concurrent".
          >
          > Best regards,
          > Jakub
          >



