Hi Ilya, The patch has been applied to the night build 3.6.519, the data has been migrated with se_exp using complete export-import. The issue has not appear again for a week already. I'll keep monitoring Sedna.
However, it seems there is another issue with that night build: LOG 27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=-1) [sp.c:sp_send_msg:64]: Message send : type 250, len 0 LOG 27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=-1) [sp.c:sp_recv_msg:34]: Message recv : type 210, len 0 LOG 27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49) [sp.c:sp_send_msg:64]: Message send : type 230, len 0 LOG 27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49) [sp.c:sp_recv_msg:34]: Message recv : type 300, len 76 LOG 27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49) [tr_functions.cpp:on_user_statement_begin:102]: User query: --- DROP DOCUMENT "TreeExport.xml" IN COLLECTION "treeStructures/released" ERROR 27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=49) [btree.cpp:bt_delete_tmpl:273]: (SE1008) Sedna index subsystem error. Details: Cannot delete object which is not in the btree LOG 27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=-1) [tr_common_funcs.cpp:on_transaction_end:416]: Transaction has been ROLLED BACK LOG 27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=-1) [sp.c:sp_send_msg:64]: Message send : type 100, len 122 LOG 27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1) [sp.c:sp_recv_msg:34]: Message recv : type 500, len 0 LOG 27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1) [sp.c:sp_send_msg:64]: Message send : type 510, len 0 INFO 27/10/2014 01:21:25 (SM nxp pid=19377) [bm_core.cpp:bm_log_out_io_statistics:537]: IO block reads:0, writes:0 LOG 27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1) [tr.cpp:TRmain:456]: Session is closed It's persistent, I could not delete the document at all until I deleted the indexes on that collection: <index name="category_href" backend="btree" as_type="xs:string" on_path="descendant::element(Category)" by_path="attribute::attribute(href)" object_name="treeStructures/released" object_type="collection"/> <index name="category_id" backend="btree" as_type="xs:string" on_path="descendant::element(Category)" by_path="attribute::attribute(Identifier)" object_name="treeStructures/released" object_type="collection"/> Any ideas? Best regards, Ivan Lagunov On Fri, Oct 17, 2014 at 10:05 AM, Ilya Taranov <epsi...@socio.msu.ru> wrote: > Hi, here is the patch over the latest label that should add protocol > minimal debugging. > > On Wed, Oct 15, 2014 at 9:53 AM, Ivan Lagunov <lagi...@gmail.com> wrote: > >> Hi Ilya, >> >> As it's stated in the email topic, I use Sedna 3.5.615. It was built >> from sources with RelWithDebugInfo flag enabled on the RHEL linux >> server. If you want to add some additional debug info, I'd suggest adding >> it on top of the latest night build though. I'm planning to upgrade anyway >> as there were quite some nice features implemented. >> >> @Charles, okay, nice to know this. If only I could provide a consistently >> failing test case. Currently the issue is quite mysterious. >> >> Best regards, >> Ivan Lagunov >> >> On Wed, Oct 15, 2014 at 11:02 AM, Ivan Shcheklein <shchekl...@gmail.com> >> wrote: >> >>> Ilya, >>> >>> As far as I remember. It is almost the latest dev sources built with >>> relWithDebInfo. Internal data structures should be compatible. >>> >>> Ivan >>> >>> On Wed, Oct 15, 2014 at 2:08 AM, Ilya Taranov <epsi...@socio.msu.ru> >>> wrote: >>> >>>> I'm not sure, latest build will fix the problem. >>>> >>>> It's definitely some client-to-app communication problem. >>>> If I make a git patch for you with additional debug info, will you be >>>> able to apply it (assuming, the data file version will be the same)? What >>>> build do you currently use? >>>> >>>> On Sat, Oct 11, 2014 at 12:48 AM, Ivan Lagunov <lagi...@gmail.com> >>>> wrote: >>>> >>>>> Hi Ilya, >>>>> >>>>> I've analyzed logs for other issue occurrences and it has appeared >>>>> more complicated. >>>>> >>>>> First, could you please clarify about "Session is ready" statement, >>>>> when exactly is it dumped to the log file? I'm wondering because when the >>>>> database is hanging, those log entries still appear at some random moments >>>>> of time. >>>>> >>>>> Second, I'm going to provide the investigation details for several >>>>> more issue occurrences here. Below "fast XQuery" means that it uses index >>>>> and is expected to be very fast. >>>>> >>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0 >>>>> >>>>> se_trn 27173 - DROP DOCUMENT operation, COMMITED >>>>> se_gov 29197 >>>>> se_sm 29224 >>>>> se_trn 29617 - fast XQuery basictypes:getBasicType, not COMMITED, gdb >>>>> thread 1 in select >>>>> se_trn 29820 - Session is ready >>>>> se_trn 29976 - Session is ready >>>>> >>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0 >>>>> >>>>> se_gov 15890 >>>>> se_sm 15899 >>>>> se_trn 21611 - doc-available XQuery, not COMMITED, gbd thread 1 in >>>>> nanosleep >>>>> se_trn 21709 - doc('$version') XQuery, COMMITED, gdb thread 1 in select >>>>> se_trn 21713 - doc('$version') XQuery, COMMITED, gdb thread 1 in select >>>>> se_trn 21716 - doc('$version') XQuery, COMMITED, gdb thread 1 in select >>>>> se_trn 21719 - doc('$version') XQuery, COMMITED, gdb thread 1 in select >>>>> se_trn 21727 - LOAD STDIN operation, not COMMITED, gbd thread 1 in >>>>> nanosleep >>>>> se_trn 21996 - fast XQuery basictypes:getBasicType, not COMMITED, gdb >>>>> thread 1 in select >>>>> se_trn 22002 - document("$collections") XQuery, not COMMITED, gdb >>>>> thread 1 in nanosleep >>>>> se_trn 22010 - Session is ready >>>>> se_trn 22013 - Session is ready >>>>> se_trn 22152 - Session is ready >>>>> se_trn 22295 - Session is ready >>>>> se_trn 23615 - Session is ready >>>>> se_trn 24315 - Session is ready >>>>> >>>>> Questions: >>>>> 1. Does nanosleep state mean that the operation is waiting for another >>>>> one (here it's probably 21996 XQuery in select)? >>>>> 2. Why does doc-available wait for any other operation? I'd expect it >>>>> to be non-blocking and to run immediately on the latest data snapshot. >>>>> 3. Why does doc('$version') wait for any other operation? It's a >>>>> database metadata being fetched, so I'd also expect it to run immediately. >>>>> >>>>> https://www.dropbox.com/s/3ibo4dvzapfdirw/sedna_debug_info3.zip?dl=0 >>>>> >>>>> se_trn 641 - fast XQuery packages:getPackage, not COMMITED, gdb thread >>>>> 1 in select >>>>> se_trn 645 - Session is ready & COMMITED without operation >>>>> se_trn 648 - Session is ready >>>>> se_trn 823 - Session is ready >>>>> se_trn 898 - Session is ready >>>>> se_gov 8444 >>>>> se_sm 8453 >>>>> >>>>> Questions: >>>>> What is going on with 645? There is "Session in ready" statement >>>>> immediately followed by "COMMITED" statement in the event.log. What >>>>> operation was committed? Is it possible the order of statements in the >>>>> event.log is inconsistent? Even so there is no more statements with >>>>> pid=645, so no operation to be seen that could be committed. Very >>>>> suspicious. >>>>> >>>>> https://www.dropbox.com/s/38idxmpzjv6rvxf/sedna_debug_info4.zip?dl=0 >>>>> >>>>> se_gov 1532 >>>>> se_sm 1711 >>>>> se_trn 6837 - DROP DOCUMENT operation, not COMMITED, gdb thread 1 in >>>>> nanosleep >>>>> se_trn 11703 - LOAD STDIN operation, not COMMITED, gdb thread 1 in >>>>> nanosleep >>>>> se_trn 11944 - document("$collections") XQuery, not COMMITED, gdb >>>>> thread 1 in nanosleep >>>>> se_trn 11972 - document("$collections") XQuery, not COMMITED, gdb >>>>> thread 1 in nanosleep >>>>> se_trn 12139 - XQuery collection("groupingDefinitions/wip"), COMMITED, >>>>> gdb thread 1 in select >>>>> se_trn 12183 - fast XQuery basictypes:getBasicType, not COMMITED, gdb >>>>> thread 1 in select >>>>> se_trn 12186 - XQuery collection("groupingDefinitions/wip"), COMMITED, >>>>> gdb thread 1 in select >>>>> se_trn 12195 - XQuery collection("groupingDefinitions/wip"), COMMITED, >>>>> gdb thread 1 in select >>>>> se_trn 12222 - document("$collections") XQuery, not COMMITED, gdb >>>>> thread 1 in nanosleep >>>>> se_trn 12302 - fast XQuery documents:getFilesByBasicTypeId, not >>>>> COMMITED, gdb thread 1 in nanosleep >>>>> se_trn 12307 - document("$documents") XQuery, not COMMITED, gdb thread >>>>> 1 in nanosleep >>>>> se_trn 12310 - document("$collections") XQuery, not COMMITED, gdb >>>>> thread 1 in nanosleep >>>>> se_trn 12377 - Session is ready >>>>> se_trn 12382 - Session is ready >>>>> se_trn 12423 - Session is ready >>>>> se_trn 12848 - Session is ready >>>>> >>>>> Comment: >>>>> I understand many of those XQueries are not optimal, they are part of >>>>> an application developed by other devs. Now I'll take care of improving >>>>> those operations to replace document("$documents") and >>>>> document("$collections") with doc-available and col-available accordingly >>>>> and to use an index for collection("groupingDefinitions/wip"). >>>>> >>>>> Overall conclusion: >>>>> There is no single operation/XQuery causing the database hanging. >>>>> There were at least three different XQueries observed. Taking into account >>>>> the seemingly random occurrences, it looks like the protocol issue indeed. >>>>> I don't see how to apply tcpdump in this case as it's Production >>>>> environment. Any ideas? >>>>> >>>>> I'd try to rebuild the database from sources and, perhaps, to take the >>>>> latest night 3.5 (or 3.6?) build for this. Do you think it makes sense to >>>>> try? >>>>> >>>>> Best regards, >>>>> Ivan Lagunov >>>>> >>>>> On Fri, Oct 10, 2014 at 1:23 AM, Ilya Taranov <epsi...@socio.msu.ru> >>>>> wrote: >>>>> >>>>>> Hi, from the top of my head... >>>>>> >>>>>> 1. Queries are evaluated in a lazy fashion. Which means, that before >>>>>> client asks to fetch the next row, nothing is evaluated. >>>>>> >>>>>> So select() is a socket read, which is waiting for client to make a >>>>>> decision on whether to commit (or end RO transaction) or ask for next >>>>>> part >>>>>> of the result. All locks are only released when end transaction happens. >>>>>> That's why others are waiting for one single client. >>>>>> >>>>>> To my mind, it's a wrong approach, and that's not entirely how I >>>>>> would do it right now, But that days we had ACID as a primary goal in >>>>>> minds. >>>>>> >>>>>> 2. Session is ready, means that there is no ongoing transaction, but >>>>>> the connection is established. >>>>>> >>>>>> 3. No idea in this particular case, but there can be a situation when >>>>>> session is started without client connection. Recovery/checkpoint, maybe >>>>>> something else, I'm not sure. >>>>>> >>>>>> 4. That dfntly looks strange, but should not cause a lock. Once >>>>>> transaction is committed, it should release all the locks. So even should >>>>>> there be orphan sessions, they would not interfere the normal operation. >>>>>> >>>>>> 5. Yes that pretty much looks like a problem. Based on logs, it looks >>>>>> like 17338 is waiting for client to fetch a result. >>>>>> >>>>>> That may be a protocol issue, by the way, when as a result of >>>>>> malformed message both client and sedna are waiting for each other. >>>>>> >>>>>> I would guess, that's always happens when >>>>>> "categorytree:getCategoryTree()" is called, could you please try that? If >>>>>> it confirms, could you please make a wireshark/tcpdump dump of this >>>>>> communication? >>>>>> >>>>>> >>>>>> On Thu, Oct 9, 2014 at 6:42 AM, Ivan Lagunov <lagi...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Ilya, >>>>>>> >>>>>>> Please check my investigations below. >>>>>>> >>>>>>> I've collected all the debug information for another hanging today, >>>>>>> it's available here: >>>>>>> https://www.dropbox.com/s/g14aer9821fzx05/sedna_debug_info6.zip?dl=0 >>>>>>> Here are the statuses of all Sedna processes: >>>>>>> >>>>>>> se_trn 13436 - DROP DOCUMENT operation, COMMITED >>>>>>> se_trn 16245 - LOAD STDIN operation, COMMITED, no ESTABLISHED >>>>>>> connection on port 5050 found (only 5151) >>>>>>> se_trn 17335 - doc('$version') XQuery, COMMITED, gdb thread 1 in >>>>>>> select () from /lib64/libc.so.6 >>>>>>> se_trn 17338 - XQuery running about 10sec, not COMMITED, gdb thread >>>>>>> 1 in select () from /lib64/libc.so.6 >>>>>>> se_trn 17410 - XQuery running <1sec, COMMITED >>>>>>> se_trn 17462 - Session is ready >>>>>>> se_trn 17554 - Session is ready >>>>>>> se_trn 18892 - Session is ready >>>>>>> se_trn 19124 - Session is ready >>>>>>> se_gov 20441 - gdb threads 1 and 2 in select () from /lib64/libc.so.6 >>>>>>> se_sm 20449 >>>>>>> >>>>>>> Comments: >>>>>>> 1. Where gdb is not stated, all threads are in semop () from >>>>>>> /lib64/libc.so.6 except the last thread that is in >>>>>>> pthread_cond_timedwait@@GLIBC_2.3.2 >>>>>>> (). I believe semop is a good state, correct me if I'm wrong. >>>>>>> 2. All se_trn with "Session is ready" have no other entries in the >>>>>>> latest event.log. As I collect only the latest event.log, I cannot check >>>>>>> what was the last operation. >>>>>>> 3. All se_trn processes have two ESTABLISHED connections in netstat >>>>>>> for ports 5050 and 5151 if not stated otherwise. >>>>>>> 4. se_trn 17338, 17410, 17462 were created by the same application >>>>>>> that is under suspicion. The database unfreezed after the restart of >>>>>>> that >>>>>>> application. >>>>>>> >>>>>>> Issues/questions: >>>>>>> 1. What exactly does it mean when a thread is in select()? It seems >>>>>>> the database is waiting but for the client or for some internal >>>>>>> operation, >>>>>>> e.g. fetching the data. Please clarify. >>>>>>> 2. What exactly does "Session is ready" mean? When does it appear in >>>>>>> the event.log? I believe those connections are okay but I'm curious to >>>>>>> know. >>>>>>> 3. se_trn 16245 - how is it possible that there is no ESTABLISHED >>>>>>> connection on port 5050 while the process is still alive? An issue? >>>>>>> 4. se_trn 17335 - how is it possible that thread is in select() >>>>>>> while the transaction has been COMMITED in event.log? An issue? >>>>>>> 5. se_trn 17338 - looks like a root cause of Sedna hanging. >>>>>>> Currently I run it via XML:DB API implementation mentioned earlier, more >>>>>>> specifically I run the XQuery via XQueryService.query("xquery string") >>>>>>> call. Besides, I always try to close the connection in the Java finally >>>>>>> block, so it looks roughly like this: >>>>>>> >>>>>>> try { >>>>>>> rootCollection = DatabaseManager.getCollection(...); >>>>>>> ... do something ... >>>>>>> } finally { >>>>>>> rootCollection.close(); >>>>>>> } >>>>>>> >>>>>>> According to event.log the XQuery was started at 09/10/2014 12:00:01. >>>>>>> In the application log somewhere between 12:01:53 and 12:05:00 >>>>>>> appeared a message: >>>>>>> >>>>>>>> Warning: an open Sedna XML:DB Collection/Connection was just >>>>>>>> automatically closed due to inactivity, you SHOULD close these >>>>>>>> manually. >>>>>>> >>>>>>> It seems to mean that the connection was considered as abandoned and >>>>>>> removed by XML:DB API implementation (see above conn-log-abandoned=true >>>>>>> and conn-remove-abandoned-timeout=120000). >>>>>>> For me it looks like the client waited for 2 minutes but did not get >>>>>>> any response from the database. So there was an attempt to close the >>>>>>> connection. It's easy to understand as the database hanged at about >>>>>>> 12:00:02 when the last transaction was COMMITED. So it may be even that >>>>>>> the >>>>>>> root cause is hidden somewhere deeper. For example, this long XQuery >>>>>>> could >>>>>>> cause a deadlock. >>>>>>> >>>>>>> I understand it may be an issue in XML:DB API implementation but its >>>>>>> support was stopped by Charles Foster. Unfortunately, we cannot afford >>>>>>> to >>>>>>> migrate to XQJ API implementation yet. The most tricky part to >>>>>>> understand >>>>>>> is why we had not faced similar issues before recent occurrence when it >>>>>>> has >>>>>>> become a regular major issue immediately. There were no server updates >>>>>>> and >>>>>>> no changes that could affect the way transactions/connections are >>>>>>> handled. >>>>>>> >>>>>>> Best regards, >>>>>>> Ivan Lagunov >>>>>>> >>>>>>> On Thu, Oct 9, 2014 at 1:33 PM, Ivan Lagunov <lagi...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Sorry, copied the wrong link for the connection pool documentation. >>>>>>>> Here is the correct one: >>>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10 >>>>>>>> >>>>>>>> Besides, I use the following values for the connection pool >>>>>>>> parameters in the suspected application: >>>>>>>> conn-max-active=50 >>>>>>>> conn-max-wait=30000 >>>>>>>> conn-log-abandoned=true >>>>>>>> conn-remove-abandoned-timeout=120000 >>>>>>>> conn-abandoned-check-interval=5000 >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Ivan Lagunov >>>>>>>> >>>>>>>> On Thu, Oct 9, 2014 at 1:28 PM, Ivan Lagunov <lagi...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Ilya, >>>>>>>>> >>>>>>>>> I use Charles Foster implementation of XML:DB API, version 1.2.6 >>>>>>>>> without any customizations: >>>>>>>>> http://www.cfoster.net/sedna/xmldb/download/ >>>>>>>>> >>>>>>>>> Here is the link again to the connection pool documentation (it's >>>>>>>>> a part of the above implementation): >>>>>>>>> http://www.cfoster.net/sedna/xmldb/download/ >>>>>>>>> >>>>>>>>> Meanwhile, I've identified the application that makes Sedna wait. >>>>>>>>> At least restarting the application has unfreezed the database twice >>>>>>>>> already. I'll analyze the application log along with Sedna event.log >>>>>>>>> shortly. Hopefully, I'll find the root cause. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Ivan Lagunov >>>>>>>>> >>>>>>>>> On Thu, Oct 9, 2014 at 3:50 AM, Ilya Taranov <epsi...@socio.msu.ru >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> I can easily see the scenario, where one of the clients doesn't >>>>>>>>>> terminate the transaction properly. Unfortunately, current >>>>>>>>>> application >>>>>>>>>> timeout doesn't apply for client hangs. The issue can be easily >>>>>>>>>> fixed, >>>>>>>>>> however. >>>>>>>>>> >>>>>>>>>> Unfortunately, even fix still would not solve your problem, you >>>>>>>>>> would still have application timeout wait on locks. Probably the most >>>>>>>>>> straightforward way to find the cause for you would be to log >>>>>>>>>> connection >>>>>>>>>> checkins/checkouts at the client driver level. >>>>>>>>>> >>>>>>>>>> Could you please give me a link to the connection pool + driver >>>>>>>>>> you currently use (including all your changes if possible) so I >>>>>>>>>> could help >>>>>>>>>> you with that? >>>>>>>>>> >>>>>>>>>> On Tue, Oct 7, 2014 at 4:37 AM, Ivan Lagunov <lagi...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Ilya, >>>>>>>>>>> >>>>>>>>>>> Your suggestion seems to be correct. After I've restarted three >>>>>>>>>>> Java applications that had connections to Sedna, the database has >>>>>>>>>>> resumed >>>>>>>>>>> processing operations. Here is the event.log where Sedna has hanged >>>>>>>>>>> between >>>>>>>>>>> 12:35:03 and 13:06:05: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://www.dropbox.com/s/pzafuvzcvfchxwa/event-2014-10-07-13-15-04-018.zip?dl=0 >>>>>>>>>>> >>>>>>>>>>> I'm not sure which of the applications caused it though. It >>>>>>>>>>> still worries me as the queries before hanging look quite simple >>>>>>>>>>> and I >>>>>>>>>>> cannot understand which client operation could have hanged the whole >>>>>>>>>>> database. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Ivan Lagunov >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 6, 2014 at 12:29 PM, Ivan Lagunov <lagi...@gmail.com >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Ilya, >>>>>>>>>>>> >>>>>>>>>>>> Yes, I use a connection pool via Sedna XML:DB API >>>>>>>>>>>> implementation. It's documented here: >>>>>>>>>>>> >>>>>>>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10 >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately, there is no support for active connection >>>>>>>>>>>> timeout. This could be the cause if there was a query hanging for >>>>>>>>>>>> some >>>>>>>>>>>> reason. Still if some query hangs forever (that should not as it's >>>>>>>>>>>> in my >>>>>>>>>>>> case) it looks more like a database bug/deadlock situation. >>>>>>>>>>>> >>>>>>>>>>>> I'll try as you suggest next time. >>>>>>>>>>>> >>>>>>>>>>>> P.S. Sorry for the second email as I've forgotten to add CC >>>>>>>>>>>> first. >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Ivan Lagunov >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Oct 3, 2014 at 10:23 PM, Ilya Taranov < >>>>>>>>>>>> epsi...@socio.msu.ru> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Ivan, do you have a connection pool? I see one of the se_trn >>>>>>>>>>>>> processes waiting for socket read. >>>>>>>>>>>>> >>>>>>>>>>>>> If you don't have a connection pool, it just might be a client >>>>>>>>>>>>> problem. I'll take a deeper look anyway, just next time it >>>>>>>>>>>>> happens, could >>>>>>>>>>>>> you please try to shutdown the client thread, if it's even >>>>>>>>>>>>> possible. >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Oct 2, 2014 at 5:30 AM, Ivan Lagunov < >>>>>>>>>>>>> lagi...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've been facing major issues with Sedna hanging regularly >>>>>>>>>>>>>> for the last two weeks (after several months of stable work). I >>>>>>>>>>>>>> tried to do >>>>>>>>>>>>>> the complete export-import of data using se_exp but Sedna still >>>>>>>>>>>>>> hangs >>>>>>>>>>>>>> regularly. Looking into event.log does not help as queries >>>>>>>>>>>>>> differ every >>>>>>>>>>>>>> time. Sometimes it looks like Sedna hangs after some heavy query >>>>>>>>>>>>>> (with huge >>>>>>>>>>>>>> response) while in other cases there are no heavy queries to >>>>>>>>>>>>>> suspect. I've >>>>>>>>>>>>>> written a script to collect the following logs/dumps you will >>>>>>>>>>>>>> find in the >>>>>>>>>>>>>> archives: >>>>>>>>>>>>>> >>>>>>>>>>>>>> event.log - the last Sedna event log till the moment DB hangs >>>>>>>>>>>>>> and is stopped; >>>>>>>>>>>>>> sedna_procs.txt - list of all Sedna processes (ps -ef); >>>>>>>>>>>>>> netstat.txt - list of all ESTABLISHED connections; >>>>>>>>>>>>>> ipcs.txt - results of ipcs -s; >>>>>>>>>>>>>> <pid>.txt - results of gdb with bt for all threads of a Sedna >>>>>>>>>>>>>> process (available for all processes at the moment of hanging). >>>>>>>>>>>>>> This is >>>>>>>>>>>>>> collected automatically for threads 1-4 of each process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm providing two sets of logs/dumps for two separate hanging >>>>>>>>>>>>>> events: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 25th September - seemed to hang without any heavy queries: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2nd October - hanged after a heavy query: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0 >>>>>>>>>>>>>> The heavy query occurs at LOG 02/10/2014 11:10:20 >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you can find out the cause of the hangings, it'd be the >>>>>>>>>>>>>> best. Otherwise it'd be nice to hear some recommendations at >>>>>>>>>>>>>> least on how >>>>>>>>>>>>>> to avoid these issues/minimize an impact. May be I need to >>>>>>>>>>>>>> upgrade to some >>>>>>>>>>>>>> later version but I'm currently using 3.5.615 that is already a >>>>>>>>>>>>>> night build >>>>>>>>>>>>>> that we had to upgrade to after some older issue (reported back >>>>>>>>>>>>>> in >>>>>>>>>>>>>> March-April of 2013). I'd try some 3.5 or 3.6 version but >>>>>>>>>>>>>> without knowing >>>>>>>>>>>>>> the reason there is no guarantee the potential bug is fixed in >>>>>>>>>>>>>> the later >>>>>>>>>>>>>> night build. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also it'd be really nice if there was an official release for >>>>>>>>>>>>>> 3.5 with all the latest bug fixes (I see the last night build is >>>>>>>>>>>>>> 3.5.627 of >>>>>>>>>>>>>> 8th May 2013). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Ivan Lagunov >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>>>>>> Meet PCI DSS 3.0 Compliance Requirements with EventLog >>>>>>>>>>>>>> Analyzer >>>>>>>>>>>>>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI >>>>>>>>>>>>>> DSS Reports >>>>>>>>>>>>>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download >>>>>>>>>>>>>> White paper >>>>>>>>>>>>>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog >>>>>>>>>>>>>> Analyzer >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Sedna-discussion mailing list >>>>>>>>>>>>>> Sedna-discussion@lists.sourceforge.net >>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/sedna-discussion >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
------------------------------------------------------------------------------
_______________________________________________ Sedna-discussion mailing list Sedna-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sedna-discussion