Re: Query regarding Jena parallelism

Andy Seaborne Fri, 29 Jan 2016 13:17:05 -0800

On 29/01/16 14:48, Maria Jackson wrote:

1. I am using Jena 2.13.0
2.  I tried increasing the heap size with export JVM_ARGS="-Xmx32768m". Still

Try 2G: a very large heap slows TDB down because it uses RAM outside the heap, via the OS.

I am getting 1700% CPU usage. I'll be really grateful if you can suggest a
way by which I can run Jena without parallelization.

As we've said, the query shown has a single-threaded execution, no parallelism. It's a streaming query with a simple execution plan of one join.

You should see the "virtual" size of the Java process grow as the query executes. I can't say how big it will get, as it depends on the system limits of your machine, but compare it to the resident size.
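On Linux, one way to make that comparison is to read the `VmSize` (virtual) and `VmRSS` (resident) lines from `/proc/<pid>/status`. A minimal sketch, assuming a Linux /proc filesystem; `memory_sizes_kb` is a hypothetical helper, and you would pass the Java process id (for example from top(1) or the JDK's `jps`) instead of the script's own pid:

```python
import os

def memory_sizes_kb(pid):
    """Return (virtual_kb, resident_kb) for a Linux process, read from /proc/<pid>/status."""
    sizes = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                sizes[key] = int(value.split()[0])  # value looks like "  123456 kB"
    return sizes["VmSize"], sizes["VmRSS"]

# Hypothetical usage: substitute the Java process id for os.getpid().
virtual_kb, resident_kb = memory_sizes_kb(os.getpid())
print(f"virtual {virtual_kb // 1024} MiB, resident {resident_kb // 1024} MiB")
```

Running it a few times while the query executes shows whether it is the virtual size or the resident size that keeps growing.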

3.  I tried using --results=json ... > myResults.srj; my CPU usage is still
above 1700%.


Java at 1700%, or the machine at 1700%?  Those are different.
Is the machine responsive to a shell in a terminal?

Using top(1), see what other processes are running. The OS runs background processes to manage the file system cache. These are beyond Jena's control.
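In its default (Irix) mode, top(1) reports a process's CPU as a share of one core, so a process keeping several cores busy can legitimately show well over 100%, while the whole machine's ceiling is 100% times the number of CPUs. A minimal sketch of that per-process figure, assuming Linux and its `/proc/<pid>/stat` layout (utime and stime are fields 14 and 15, counted in clock ticks); `cpu_ticks` and `cpu_percent` are hypothetical helpers, not part of Jena:

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second

def cpu_ticks(pid):
    """Return utime + stime, in clock ticks, for a Linux process from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as stat:
        data = stat.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the last ')': the remaining list starts at field 3.
    rest = data.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])  # fields 14 (utime) and 15 (stime)

def cpu_percent(pid, interval=1.0):
    """CPU usage over `interval` seconds, top(1)-style: 100% means one core fully busy."""
    t0, c0 = time.monotonic(), cpu_ticks(pid)
    time.sleep(interval)
    t1, c1 = time.monotonic(), cpu_ticks(pid)
    return 100.0 * (c1 - c0) / CLK_TCK / (t1 - t0)
```

Comparing `cpu_percent(java_pid)` with `100 * os.cpu_count()` separates "Java at 1700%" from "the machine at 1700%"; the summary lines at the top of top(1) give the machine-wide view, including other processes and system time.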



How long does it take to execute when producing SRJ results?
What happens if you use a LIMIT of say 50% of the expected results?  10%?

4. I have 64 GB physical RAM on my machine


And it's not a VM?

        Andy


On Fri, Jan 29, 2016 at 7:32 PM, Andy Seaborne <a...@apache.org> wrote:

On 29/01/16 11:55, Rose Beck wrote:

1. Java version which I am using is:
java version "1.7.0_79"


So this is not Jena 3, which requires Java 8. Which version of Jena?

OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.12.04.1)

OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
2. I am using disk
3. Size of data=100GB
4. Frequency of pred1 = 1 million


How much physical RAM?

I am a novice, so could you please help a bit? How do I use
JSON results?


tdbquery --help

hence

... --results=json ... > myResults.srj

For text results, I would increase the heap size to see if that makes a
difference due to GC.

Nowadays, set the environment variable JVM_ARGS.
For some quite old versions of tdbquery, you may have to edit the script
itself.
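The same invocation can be assembled from a script. A minimal sketch, assuming the `tdbquery` script is on the PATH and honours `JVM_ARGS` as described above; `tdbquery_call` and its `limit` parameter are hypothetical helpers. It uses a 2G heap and JSON results, and can optionally append a LIMIT so you can see how run time scales with the number of results:

```python
import os
import shlex

def tdbquery_call(loc, query, heap="2G", limit=None):
    """Build (argv, env) for a tdbquery run with JSON results; nothing is executed here."""
    env = dict(os.environ)
    env["JVM_ARGS"] = f"-Xmx{heap}"  # modest heap: TDB's data lives in OS-managed memory
    if limit is not None:
        query = f"{query} LIMIT {limit}"  # assumes the query has no LIMIT already
    argv = ["tdbquery", "--time", f"--loc={loc}", "--results=json", query]
    return argv, env

QUERY = ("SELECT ?a ?b ?c WHERE { "
         "GRAPH ?g { ?a <http://dbpedia.org/ontology/pred1> ?b } "
         "GRAPH ?g1 { ?a <http://dbpedia.org/ontology/pred2> ?c } }")

argv, env = tdbquery_call("/home/Jena", QUERY)
# Equivalent shell line, for copy-and-paste:
print(f"JVM_ARGS={shlex.quote(env['JVM_ARGS'])} {shlex.join(argv)} > myResults.srj")
```

To actually run it, something like `subprocess.run(argv, env=env, stdout=open("myResults.srj", "wb"), check=True)` would do; repeating with `limit` set to half and then a tenth of the row count you expect gives the comparison runs.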

See also arq.rset, which reads a result set and writes it out again, changing
formats in the process. (It can't read the text format, only write it.)

     Andy

On Fri, Jan 29, 2016 at 5:18 PM, Andy Seaborne <a...@apache.org> wrote:

Can you describe the setup in more detail? (versions of Jena and Java; size of
data; disk/SSD; environment variables; frequency of <pred1>; that sort of thing).

As Rob indicated, the query below does not have any thread-parallel
execution. The query execution has a low footprint as well, but the
formatting of results (the default text format) causes buffering, and that
can induce GC pressure (there is a parallel GC). If the results are large,
an OutOfMemoryError (OOME) will occur eventually; use JSON results.
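Why the text format buffers while JSON can stream: an aligned table needs the widest value in each column before it can print the first row, whereas JSON output can be written one row at a time. An illustrative sketch of the two strategies, not Jena's code; rows are hypothetical dicts mapping variable names to strings, and every value is written as a plain literal for simplicity:

```python
import json

def write_text_table(rows, variables, out):
    """Aligned text table: column widths depend on every row, so all rows are buffered first."""
    rows = list(rows)  # the whole result set is held in memory here
    widths = [max([len(v)] + [len(r.get(v, "")) for r in rows]) for v in variables]
    out.write(" | ".join(v.ljust(w) for v, w in zip(variables, widths)) + "\n")
    for r in rows:
        out.write(" | ".join(r.get(v, "").ljust(w) for v, w in zip(variables, widths)) + "\n")

def write_json_stream(rows, variables, out):
    """SPARQL-JSON-shaped output written row by row; memory use does not grow with row count."""
    out.write('{"head": {"vars": %s}, "results": {"bindings": [' % json.dumps(variables))
    for i, r in enumerate(rows):
        binding = {v: {"type": "literal", "value": r[v]} for v in variables if v in r}
        out.write(("," if i else "") + json.dumps(binding))
    out.write("]}}\n")
```

With a generator of rows, `write_json_stream` never holds more than one row, while `write_text_table` must materialise them all before writing anything.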

What the OS is doing, and how it's accounted for, can be important because
this is a cold-start query (well, sort of: the OS file cache may be warm).
What else is happening on the machine is a significant factor.

      Andy


On 29/01/16 10:36, Rose Beck wrote:


OK, I see. I am using Ubuntu 12.04. Also, I am getting 1600% CPU usage.
So, is it possible to turn off this parallelization?

On Fri, Jan 29, 2016 at 4:00 PM, Andy Seaborne <a...@apache.org> wrote:


It's possibly an artifact of counting.

What's your OS?

Memory-mapped files are sometimes counted as "RAM" when the file is mapped.
In fact, only the working set is in RAM (managed by the OS via the file
system cache, not by TDB itself).

Also be careful about which tool you use to see the RAM usage: tools
present the information differently, even on the same OS.

On Linux, look at the resident memory; virtual can be way more than RAM.
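A small experiment shows the effect: map a 256 MiB file that is mostly a sparse hole, then read only its first 16 MiB. This is a sketch assuming Linux (it reads `/proc/self/status`); the sizes and the helper `vm_kb` are illustrative assumptions, and TDB's own files and access pattern will differ. Mapping alone should raise the virtual size by the whole mapping while the resident size barely moves; only the pages actually read are faulted into RAM from the file system cache.

```python
import mmap
import tempfile

def vm_kb(field):
    """Read VmSize or VmRSS (in kB) for the current process from /proc/self/status."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

MAPPED = 256 * 1024 * 1024   # size of the mapping (stand-in for large database files)
TOUCHED = 16 * 1024 * 1024   # the "working set" actually read

with tempfile.TemporaryFile() as f:
    f.write(b"x" * TOUCHED)  # real data for the part that will be read
    f.truncate(MAPPED)       # the rest of the file is a sparse hole
    f.flush()
    virt0, rss0 = vm_kb("VmSize"), vm_kb("VmRSS")
    with mmap.mmap(f.fileno(), MAPPED, access=mmap.ACCESS_READ) as m:
        virt1, rss1 = vm_kb("VmSize"), vm_kb("VmRSS")
        # Touch one byte per page of the working set; the OS faults those pages in.
        checksum = sum(m[i] for i in range(0, TOUCHED, mmap.PAGESIZE))
        virt2, rss2 = vm_kb("VmSize"), vm_kb("VmRSS")

print(f"after mmap:  virtual +{(virt1 - virt0) // 1024} MiB, resident +{(rss1 - rss0) // 1024} MiB")
print(f"after reads: virtual +{(virt2 - virt0) // 1024} MiB, resident +{(rss2 - rss0) // 1024} MiB")
```

This is also why a tool that adds mapped size to "memory used" can report far more than the machine's physical RAM.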

           Andy


On 29/01/16 10:02, Maria Jackson wrote:



Sorry Rob, I really apologize. But can you please help me with this a bit?

Here's the command:

./tdbquery --time --loc=/home/Jena \
  "select ?a ?b ?c where {
     graph ?g  { ?a <http://dbpedia.org/ontology/pred1> ?b }
     graph ?g1 { ?a <http://dbpedia.org/ontology/pred2> ?c } }"

On Fri, Jan 29, 2016 at 3:23 PM, Rob Vesse <rve...@dotnetrdf.org> wrote:

Maria


It looks like you hit Send too soon, as you haven't shown the command or
the query you are running.

Rob

On 29/01/2016 09:19, "Maria Jackson" <maria.jackson....@gmail.com> wrote:

Dear All,


I am using the following command to run queries on Jena-TDB-2.13.0. I
observe that Jena uses more than 1200% of RAM on my machine, which has 12
cores. Is it possible to turn off parallelization in Jena? If yes, then
how?
