A short update: after killing every query and restarting Tomcat, even without submitting any new query, owlim-se takes 17GB of memory and 100% CPU.

2013/3/29 Stefano Parmesan <parme...@spaziodati.eu>

> Hello everybody,
>
> Just to let you know a more specific use case that leads to the issue
> described above. Let me first say that after enabling debug level 4 in
> Tomcat we noticed that owlim-se is actually responding, but very slowly,
> and every query submitted by the sesame-workbench is left pending and
> without answer.
>
> So: we are testing owlim in our infrastructure using
> silk <http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/>, which
> performs simple but heavy queries (fetching all the resources of a given
> type, with all their attributes, in a multithreaded fashion), and this
> seems to make the issue appear.
>
> More specific flow:
> - The repository is empty;
>
> - Silk fetches 0 resources;
> - Ingest 1937 resources;
>
> - Silk fetches 1937 resources;
> - Ingest 462 resources;
>
> - Silk fetches 2399 resources;
> - Ingest 495 resources;
>
> - Silk fetches 2894 resources;
> - Ingest 18312 resources;
>
> - Silk fetches 21206 resources;
> - Ingest 34539 resources;
>
> - Silk fetches 55745 resources;
> - Ingest 1793 resources;
>
> - Silk fetches 57538 resources;
> - Ingest 1385 resources;
>
> - Silk fetches 58923 resources;
> - Ingest 473 resources;
>
> - Silk fetches 59396 resources;
> - Ingest 9708 resources;
>
> - Silk fetches 69104 resources;
> - Ingest 79383 resources;
>
> - The repository contains 148487 resources, and owlim-se works without
> slow-downs;
>
> Now, if we run the same process again, without emptying the repository, we
> notice that owlim-se becomes slower and slower in giving the 148487
> resources to Silk. In fact, the first time Silk fetches the resources owlim
> responds in around 30 seconds, while at the sixth block (before ingesting
> the 1793 resources) it takes more than 15 minutes to respond. In
> .aduna/openrdf-sesame/logs we can see queries popping up, so owlim is
> working fine.
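
As a quick sanity check on the flow quoted above, here is a minimal sketch that replays the ingest batches and prints what Silk should fetch before each one. The batch sizes are copied from the list; the names `INGESTED` and `fetch_sizes` are made up for this illustration and are not part of Silk or OWLIM.

```python
from itertools import accumulate

# Batch sizes copied from the "Ingest N resources" steps in the flow above.
INGESTED = [1937, 462, 495, 18312, 34539, 1793, 1385, 473, 9708, 79383]


def fetch_sizes(batches):
    """Return the repository size before each ingest, plus the final size."""
    # accumulate(..., initial=0) needs Python 3.8 or newer.
    return list(accumulate(batches, initial=0))


sizes = fetch_sizes(INGESTED)
for fetched, ingested in zip(sizes, INGESTED):
    print(f"Silk fetches {fetched} resources; ingest {ingested} resources")
print(f"The repository contains {sizes[-1]} resources")
```

The running totals agree with the figures in the list, so the first-run numbers are internally consistent. In the second run the repository already holds the full set, so every fetch returns all of it regardless of the block.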
>
> Differences between first run and second run:
> - in the first run the repository is empty, in the second it contains
> 148487 resources;
> - in the second run, some owl:sameAs tuples are stored (we can't say
> exactly how many because every attempt to query the repository fails; they
> are supposedly around 300);
> - in the second run, before ingesting resources, we drop them from the
> repository, therefore the number of resources in the repository doesn't
> double;
>
> Let me add another thing: looking at the log file in
> .aduna/openrdf-sesame/logs we noticed that owlim takes most of the time
> between these two lines:
> Query optimized in 0 ms
> Request for query -717904301 is finished
>
> This probably means that the issue is query-execution-related. Oh, and I
> just noticed that owlim-se is using 100% of the memory we gave it (17.2GB);
> is that okay? Could it be some memory leak, or just a misconfiguration?
>
> Thanks and regards
>
>
>
> 2013/3/28 Ruslan Velkov <rus...@sirma.bg>
>
>> Hi Stefano,
>>
>> We are sorry to hear that you are experiencing problems!
>> We tried to reproduce this issue with synthetic data consisting of
>> 1M statements and 100 owl:sameAs links between random entities, performing
>> thousands of small updates in the background of heavy long-running queries
>> and killing the Owlim process from time to time and then restarting it, but
>> we couldn't get corrupted predicate statistics.
>>
>> Can you please send us your Owlim config file and the file 'predicates'
>> in your storage folder (as defined in the config; the storage folder
>> contains files like 'entities', 'pso.index', etc.; 'predicates' is a binary
>> file which contains entity IDs + counters), if you have kept the corrupted
>> image?
>>
>>
>> Regards,
>> Ruslan
>>
>>
>>
>> On 03/28/2013 04:01 PM, Stefano Parmesan wrote:
>>
>>> Thank you Marek,
>>>
>>> I'll give it a try. I cleaned the repository without updating the conf a
>>> couple of hours ago and the issue hasn't appeared yet, but as you say
>>> this may lead to issues in the future, so why not.
>>>
>>> Thanks
>>>
>>>
>>> 2013/3/28 Marek Šurek <marek_su...@yahoo.co.uk>
>>>
>>>> Hi,
>>>> as far as I understand the error and the behaviour it causes (I have
>>>> experienced this error a few times before, so I'm familiar with it), it
>>>> can end in two scenarios, both of which are considered blocker/critical
>>>> bugs:
>>>> 1. You don't receive results which you should receive (all data indicate
>>>> the query is correct, but you still don't get all the results you should
>>>> get)
>>>> 2. As the statistics are broken, a query which should normally take 1 sec
>>>> now runs for e.g. 20 minutes.
>>>>
>>>> I think the second option fits your case. When the query is executed, it
>>>> runs normally and will eventually give results, but as it needs much
>>>> more time to return them it blocks the database (instead of taking
>>>> database resources for 1 second it uses them for 20 minutes, and
>>>> therefore you see such high CPU usage). That you only noticed this
>>>> behaviour this morning is just a lucky coincidence, and sooner or later
>>>> you will certainly run into trouble.
>>>> I think the broken statistics are always related to specific
>>>> predicates. As you didn't use the predicate which has broken statistics,
>>>> you didn't notice it.
>>>>
>>>> From my previous experience, disabling the context index and also
>>>> setting index compression to -1 could solve some issues (but you'll
>>>> probably have to reload the database). It is certainly not a cure, but it
>>>> can help you work with the application until the bug is fixed.
>>>> Hope this explains it a bit. Hope the fix will come soon.
>>>>
>>>> Best regards,
>>>> Marek
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Stefano Parmesan <parme...@spaziodati.eu>
>>>> *To:* Marek <marek_su...@yahoo.co.uk>
>>>> *Cc:* "owlim-discussion@ontotext.com" <owlim-discussion@ontotext.com>
>>>> *Sent:* Thursday, 28 March 2013, 14:10
>>>> *Subject:* Re: [Owlim-discussion] Owlim-SE not responding with high CPU
>>>> load
>>>>
>>>> $ grep "ERROR IN PREDICATE STATISTICS" catalina.out | wc -l
>>>> 32368
>>>> (since the 25th)
>>>>
>>>> Apparently the last error is from yesterday afternoon, but we experienced
>>>> such problems this morning as well; I can't say if they are related.
>>>>
>>>> Thanks and regards
>>>>
>>>>
>>>> 2013/3/28 Marek <marek_su...@yahoo.co.uk>
>>>>
>>>> We experienced very similar behaviour with nearly the same use case. I
>>>> reported one bug which in our case was temporarily solved by turning off
>>>> the context index. Recently I found another very similar issue, which is
>>>> not reported yet as I can't figure out the cause. Please look into
>>>> catalina.out to see whether it contains the log message "error in
>>>> predicate statistics", which appears in both of the issues mentioned.
>>>> Maybe we hit the same problem.
>>>> Best regards,
>>>> Marek
>>>> ------------------------------
>>>> From: Stefano Parmesan <parme...@spaziodati.eu>
>>>> Sent: 28.3.2013 12:09
>>>> To: owlim-discussion@ontotext.com
>>>> Subject: [Owlim-discussion] Owlim-SE not responding with high CPU load
>>>>
>>>> Hi everybody,
>>>>
>>>> We are evaluating Owlim-SE 5.3.5849 but we are encountering some issues:
>>>>
>>>> Our test repository contains around 1 million triples (around 100 of
>>>> those are owl:sameAs) and has concurrent applications both inserting and
>>>> querying (through sesame-console and the SPARQL endpoint provided by the
>>>> sesame-workbench). The machine is a 12-core, 64GB RAM Debian machine.
>>>> Everything worked fine until today, when something happened while we
>>>> were submitting a high load to the SPARQL endpoint. Since then, tomcat7
>>>> uses from 200% to 650% of CPU, and the SPARQL endpoint does not respond
>>>> even to simple queries.
>>>>
>>>> We tried restarting tomcat7 multiple times, but as soon as it comes back
>>>> the CPU usage increases again and there's no way to do anything (through
>>>> either sesame-workbench or sesame-console).
>>>>
>>>> Could this be due to some misconfiguration? Is this a known issue? How
>>>> can we know what's really happening (apart from
>>>> checking .aduna/openrdf-sesame/logs)?
>>>>
>>>> We could clear the repository and start from scratch, but as we are
>>>> evaluating Owlim for production usage we need to find out what the
>>>> issue is to better understand if it fits our needs.
>>>>
>>>> --
>>>> Dott. Stefano Parmesan
>>>> Web Developer ~ SpazioDati s.r.l.
>>>> Via del Brennero, 52 – 38122 Trento – Italy
>>>>
>>>
>>> _______________________________________________
>>> Owlim-discussion mailing list
>>> Owlim-discussion@ontotext.com
>>> http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
>>>
>>
>

--
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy
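
For anyone wanting more than the single total that the `grep ... | wc -l` in the thread gives, here is a minimal sketch that counts the same marker per day. It rests on one assumption not confirmed anywhere in this thread: that each line of `catalina.out` starts with a `YYYY-MM-DD` date. Lines without such a prefix are grouped under "unknown". The function name and the regular expression are made up for this illustration. Unlike the `grep` above, the match is case-insensitive.

```python
import re
from collections import Counter

MARKER = "ERROR IN PREDICATE STATISTICS"

# Assumption: log lines begin with an ISO-style date (YYYY-MM-DD).
# Adjust this pattern to whatever your Tomcat logging layout really emits.
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def count_marker_by_day(lines, marker=MARKER):
    """Count lines containing `marker` (case-insensitive), grouped by day."""
    counts = Counter()
    needle = marker.lower()
    for line in lines:
        if needle not in line.lower():
            continue
        match = DATE_PREFIX.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

It takes any iterable of lines, so an open file handle works: `with open("catalina.out", errors="replace") as log: per_day = count_marker_by_day(log)`. A per-day breakdown would show whether the errors stopped on a given date or are still being logged.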