Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

Stefano Parmesan Fri, 29 Mar 2013 03:28:06 -0700

A short update: after killing every query and restarting tomcat, even
without submitting any new query, owlim-se takes 17GB of memory and 100% CPU

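To put numbers on the slow-down, the gap between the two log lines mentioned in the quoted message below ("Query optimized in ..." and "Request for query ... is finished") could be measured with a small script. This is only a minimal sketch. The timestamp prefix, the names TIMESTAMP, TIME_FORMAT and execution_gaps, and the sample lines are assumptions, not the real format of the files in .aduna/openrdf-sesame/logs. It pairs each "finished" line with the most recent "optimized" line, so it is only reliable when queries do not overlap, and it does not handle logs that cross midnight.

```python
import re
from datetime import datetime

# ASSUMPTION: each log line starts with an "HH:MM:SS.mmm" timestamp; adjust
# TIMESTAMP and TIME_FORMAT to whatever the log files actually use.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})")
TIME_FORMAT = "%H:%M:%S.%f"
OPTIMIZED = re.compile(r"Query optimized in \d+ ms")
FINISHED = re.compile(r"Request for query (-?\d+) is finished")


def execution_gaps(lines):
    """Return (query_id, seconds) for each optimized -> finished pair."""
    gaps = []
    started = None  # timestamp of the most recent "Query optimized" line
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # skip lines without the assumed timestamp prefix
        when = datetime.strptime(stamp.group(1), TIME_FORMAT)
        if OPTIMIZED.search(line):
            started = when
            continue
        done = FINISHED.search(line)
        if done and started is not None:
            gaps.append((int(done.group(1)), (when - started).total_seconds()))
            started = None
    return gaps


# Made-up sample lines; only the two message texts come from the real log.
sample = [
    "10:00:00.000 Query optimized in 0 ms",
    "10:15:30.500 Request for query -717904301 is finished",
]
for query_id, seconds in execution_gaps(sample):
    print(f"query {query_id}: {seconds:.1f} s")
```

Feeding it the lines of a real log file (for example via open(path) with a hypothetical path) would give one entry per query, which makes it easier to see whether execution time grows from block to block.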


2013/3/29 Stefano Parmesan <parme...@spaziodati.eu>

> Hello everybody,
>
> Just to let you know a more specific use case that leads to the issue
> described above. Let me first say that after enabling debug level 4 in
> tomcat we noticed that owlim-se is actually responding, but very slowly,
> and every query submitted by the sesame-workbench is left pending
> without an answer.
>
> So: we are testing owlim in our infrastructure using 
> silk<http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/>, which
> performs simple but heavy queries (fetching all the resources of a given
> type, with all their attributes, in a multithreading fashion), and this
> seems to make the issue appear.
>
> More specific flow:
> - The repository is empty;
>
> - Silk fetches 0 resources;
> - Ingest 1937 resources;
>
> - Silk fetches 1937 resources;
> - Ingest 462 resources;
>
> - Silk fetches 2399 resources;
> - Ingest 495 resources;
>
> - Silk fetches 2894 resources;
> - Ingest 18312 resources;
>
> - Silk fetches 21206 resources;
> - Ingest 34539 resources;
>
> - Silk fetches 55745 resources;
> - Ingest 1793 resources;
>
> - Silk fetches 57538 resources;
> - Ingest 1385 resources;
>
> - Silk fetches 58923 resources;
> - Ingest 473 resources;
>
> - Silk fetches 59396 resources;
> - Ingest 9708 resources;
>
> - Silk fetches 69104 resources;
> - Ingest 79383 resources;
>
> - The repository contains 148487 resources, and owlim-se works without
> slow-downs;
>
> Now, if we run the same process again without emptying the repository, we
> notice that owlim-se becomes slower and slower in giving the 148487
> resources to Silk. In fact, the first time silk fetches the resources owlim
> responds in around 30 seconds, while at the sixth block (before ingesting
> the 1793 resources) it takes more than 15 minutes to respond. In
> .aduna/openrdf-sesame/logs we can see queries popping up, so owlim is
> still processing them.
>
> Differences between first run and second run:
> - in the first run the repository is empty, in the second it contains
> 148487 resources;
> - in the second run, some owl:sameAs tuples are stored (we can't say
> exactly how many 'cause every attempt to query the repository fails; they
> are supposedly around 300);
> - in the second run, before ingesting resources, we drop them from the
> repository, therefore the number of resources in the repository doesn't
> double;
>
> Let me add another thing: looking at the log file in
> .aduna/openrdf-sesame/logs we noticed that owlim spends most of the time
> between these two lines:
> Query optimized in 0 ms
> Request for query -717904301 is finished
>
> This probably means that the issue is related to query execution. Oh, and I
> just noticed that owlim-se is using 100% of the memory we gave it (17.2GB).
> Is that okay? Could it be a memory leak, or just a misconfiguration?
>
> Thanks and regards
>
>
>
> 2013/3/28 Ruslan Velkov <rus...@sirma.bg>
>
>> Hi Stefano,
>>
>> We are sorry to hear that you are experiencing problems!
>> We tried to reproduce this issue with synthetic data consisting of
>> 1M statements and 100 owl:sameAs links between random entities, performing
>> thousands of small updates in the background of heavy long-running queries
>> and killing the Owlim process from time to time and then restarting it, but
>> we couldn't get corrupted predicate statistics.
>>
>> Can you please send us your Owlim config file and the file 'predicates'
>> from your storage folder, if you still have the corrupted image? The storage
>> folder is defined in the config and contains files like 'entities',
>> 'pso.index', etc.; 'predicates' is a binary file which contains entity IDs
>> and counters.
>>
>>
>> Regards,
>> Ruslan
>>
>>
>>
>> On 03/28/2013 04:01 PM, Stefano Parmesan wrote:
>>
>>> Thank you Marek,
>>>
>>> I'll give it a try. I cleaned the repository without updating the conf a
>>> couple of hours ago and the issue hasn't appeared yet, but as you say this
>>> may lead to issues in the future, so why not.
>>>
>>> Thanks
>>>
>>>
>>> 2013/3/28 Marek Šurek <marek_su...@yahoo.co.uk>
>>>
>>>> Hi,
>>>> as far as I understand the error and the behaviour it causes (I have
>>>> experienced this error a few times before, so I'm familiar with it), it
>>>> can end in two scenarios, both of which are considered blocker/critical
>>>> bugs:
>>>> 1. You don't receive results which you should receive (all the data
>>>> indicate that the query is correct, but you still don't get all the
>>>> results you should get).
>>>> 2. As the statistics are broken, a query which should normally take 1
>>>> second now runs for e.g. 20 minutes.
>>>>
>>>> I think the second option fits your case. When the query is executed, it
>>>> runs normally and will eventually give results, but as it needs much more
>>>> time to return them it blocks the database (instead of taking database
>>>> resources for 1 second it uses them for 20 minutes, and therefore you
>>>> see such high CPU usage). The fact that you noticed this behaviour this
>>>> morning is just a lucky coincidence, and sooner or later you will
>>>> certainly run into trouble.
>>>> I think the broken statistics are always related to specific
>>>> predicates. As long as you didn't use the predicate which has broken
>>>> statistics, you didn't notice it.
>>>>
>>>> From my previous experience, disabling the context index and also setting
>>>> index compression to -1 could solve some issues (but you'll probably have
>>>> to reload the database). It is certainly not a cure, but it can help you
>>>> work with the application until the bug is fixed.
>>>> Hope this explains it a bit. Hope the fix will come soon.
>>>>
>>>> Best regards,
>>>> Marek
>>>>
>>>>
>>>>    ------------------------------
>>>> *From:* Stefano Parmesan <parme...@spaziodati.eu>
>>>> *To:* Marek <marek_su...@yahoo.co.uk>
>>>> *Cc:* "owlim-discussion@ontotext.com" <owlim-discussion@ontotext.com>
>>>> *Sent:* Thursday, 28 March 2013, 14:10
>>>> *Subject:* Re: [Owlim-discussion] Owlim-SE not responding with high CPU
>>>> load
>>>>
>>>> $ grep "ERROR IN PREDICATE STATISTICS" catalina.out | wc -l
>>>> 32368
>>>> (since the 25th)
>>>>
>>>> Apparently the last error is of yesterday afternoon, but we experienced
>>>> such problems this morning as well, I can't say if they are related.
>>>>
>>>> Thanks and regards
>>>>
>>>>
>>>> 2013/3/28 Marek <marek_su...@yahoo.co.uk>
>>>>
>>>> we experienced very similar behaviour with nearly the same use case. i
>>>> reported one bug which was in our case temporarily solved by turning off
>>>> the context index. recently i found another very similar issue, which is
>>>> not reported yet as i can't figure out the cause. please look into
>>>> catalina.out to see whether there is a log line "error in predicate
>>>> statistics", which appears in both mentioned issues. maybe we hit the
>>>> same problem.
>>>> best regards,
>>>> marek
>>>>   ------------------------------
>>>> From: Stefano Parmesan <parme...@spaziodati.eu>
>>>>
>>>> Sent: 28.3.2013 12:09
>>>> To: owlim-discussion@ontotext.com
>>>> Subject: [Owlim-discussion] Owlim-SE not responding with high CPU load
>>>>
>>>> Hi everybody,
>>>>
>>>> We are evaluating Owlim-SE 5.3.5849 but we are encountering some issues:
>>>>
>>>> Our test repository contains around 1 million triples (around 100 of
>>>> those are owl:sameAs) and we have concurrent applications both inserting
>>>> and querying (through sesame-console and the sparql endpoint provided by
>>>> the sesame-workbench). The machine is a 12-core, 64GB RAM debian machine.
>>>> Everything worked fine until today, when something happened while we were
>>>> submitting a high load to the sparql endpoint. Since then, tomcat7 uses
>>>> from 200% to 650% of cpu, and the sparql endpoint does not respond
>>>> even to simple queries.
>>>>
>>>> We tried restarting tomcat7 multiple times, but as soon as it comes back
>>>> the CPU usage increases again and there's no way to do anything (through
>>>> both sesame-workbench and sesame-console).
>>>>
>>>> Could this be due to some misconfiguration? Is this a known issue? How
>>>> can we know what's really happening (apart from
>>>> checking .aduna/openrdf-sesame/logs)?
>>>>
>>>> We could clear the repository and start from scratch, but as we are
>>>> evaluating Owlim for production usage we need to find out what the issue
>>>> is to better understand if it fits our needs.
>>>>
>>>> --
>>>> Dott. Stefano Parmesan
>>>> Web Developer ~ SpazioDati s.r.l.
>>>> Via del Brennero, 52 – 38122 Trento – Italy
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Owlim-discussion mailing list
>>> Owlim-discussion@ontotext.com
>>> http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
>>>
>>>
>>
>
>
>
> --
> Dott. Stefano Parmesan
> Web Developer ~ SpazioDati s.r.l.
> Via del Brennero, 52 – 38122 Trento – Italy
>



-- 
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy

_______________________________________________
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
