Ilya,

As far as I remember. It is almost the latest dev sources built with
relWithDebInfo. Internal data structures should be compatible.

Ivan

On Wed, Oct 15, 2014 at 2:08 AM, Ilya Taranov <epsi...@socio.msu.ru> wrote:

> I'm not sure, latest build will fix the problem.
>
> It's definitely some client-to-app communication problem.
> If I make a git patch for you with additional debug info, will you be able
> to apply it (assuming, the data file version will be the same)? What build
> do you currently use?
>
> On Sat, Oct 11, 2014 at 12:48 AM, Ivan Lagunov <lagi...@gmail.com> wrote:
>
>> Hi Ilya,
>>
>> I've analyzed logs for other issue occurrences and it has appeared more
>> complicated.
>>
>> First, could you please clarify about "Session is ready" statement, when
>> exactly is it dumped to the log file? I'm wondering because when the
>> database is hanging, those log entries still appear at some random moments
>> of time.
>>
>> Second, I'm going to provide the investigation details for several more
>> issue occurrences here. Below "fast XQuery" means that it uses index and is
>> expected to be very fast.
>>
>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0
>>
>> se_trn 27173 - DROP DOCUMENT operation, COMMITED
>> se_gov 29197
>> se_sm 29224
>> se_trn 29617 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>> thread 1 in select
>> se_trn 29820 - Session is ready
>> se_trn 29976 - Session is ready
>>
>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0
>>
>> se_gov 15890
>> se_sm 15899
>> se_trn 21611 - doc-available XQuery, not COMMITED, gbd thread 1 in
>> nanosleep
>> se_trn 21709 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>> se_trn 21713 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>> se_trn 21716 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>> se_trn 21719 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>> se_trn 21727 - LOAD STDIN operation, not COMMITED, gbd thread 1 in
>> nanosleep
>> se_trn 21996 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>> thread 1 in select
>> se_trn 22002 - document("$collections") XQuery, not COMMITED, gdb thread
>> 1 in nanosleep
>> se_trn 22010 - Session is ready
>> se_trn 22013 - Session is ready
>> se_trn 22152 - Session is ready
>> se_trn 22295 - Session is ready
>> se_trn 23615 - Session is ready
>> se_trn 24315 - Session is ready
>>
>> Questions:
>> 1. Does nanosleep state mean that the operation is waiting for another
>> one (here it's probably 21996 XQuery in select)?
>> 2. Why does doc-available wait for any other operation? I'd expect it to
>> be non-blocking and to run immediately on the latest data snapshot.
>> 3. Why does doc('$version') wait for any other operation? It's a database
>> metadata being fetched, so I'd also expect it to run immediately.
>>
>> https://www.dropbox.com/s/3ibo4dvzapfdirw/sedna_debug_info3.zip?dl=0
>>
>> se_trn 641 - fast XQuery packages:getPackage, not COMMITED, gdb thread 1
>> in select
>> se_trn 645 - Session is ready & COMMITED without operation
>> se_trn 648 - Session is ready
>> se_trn 823 - Session is ready
>> se_trn 898 - Session is ready
>> se_gov 8444
>> se_sm 8453
>>
>> Questions:
>> What is going on with 645? There is "Session in ready" statement
>> immediately followed by "COMMITED" statement in the event.log. What
>> operation was committed? Is it possible the order of statements in the
>> event.log is inconsistent? Even so there is no more statements with
>> pid=645, so no operation to be seen that could be committed. Very
>> suspicious.
>>
>> https://www.dropbox.com/s/38idxmpzjv6rvxf/sedna_debug_info4.zip?dl=0
>>
>> se_gov 1532
>> se_sm 1711
>> se_trn 6837 - DROP DOCUMENT operation, not COMMITED, gdb thread 1 in
>> nanosleep
>> se_trn 11703 - LOAD STDIN operation, not COMMITED, gdb thread 1 in
>> nanosleep
>> se_trn 11944 - document("$collections") XQuery, not COMMITED, gdb thread
>> 1 in nanosleep
>> se_trn 11972 - document("$collections") XQuery, not COMMITED, gdb thread
>> 1 in nanosleep
>> se_trn 12139 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>> gdb thread 1 in select
>> se_trn 12183 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>> thread 1 in select
>> se_trn 12186 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>> gdb thread 1 in select
>> se_trn 12195 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>> gdb thread 1 in select
>> se_trn 12222 - document("$collections") XQuery, not COMMITED, gdb thread
>> 1 in nanosleep
>> se_trn 12302 - fast XQuery documents:getFilesByBasicTypeId, not COMMITED,
>> gdb thread 1 in nanosleep
>> se_trn 12307 - document("$documents") XQuery, not COMMITED, gdb thread 1
>> in nanosleep
>> se_trn 12310 - document("$collections") XQuery, not COMMITED, gdb thread
>> 1 in nanosleep
>> se_trn 12377 - Session is ready
>> se_trn 12382 - Session is ready
>> se_trn 12423 - Session is ready
>> se_trn 12848 - Session is ready
>>
>> Comment:
>> I understand many of those XQueries are not optimal, they are part of an
>> application developed by other devs. Now I'll take care of improving those
>> operations to replace document("$documents") and document("$collections")
>> with doc-available and col-available accordingly and to use an index for
>> collection("groupingDefinitions/wip").
>>
>> Overall conclusion:
>> There is no single operation/XQuery causing the database hanging. There
>> were at least three different XQueries observed. Taking into account the
>> seemingly random occurrences, it looks like the protocol issue indeed. I
>> don't see how to apply tcpdump in this case as it's Production environment.
>> Any ideas?
>>
>> I'd try to rebuild the database from sources and, perhaps, to take the
>> latest night 3.5 (or 3.6?) build for this. Do you think it makes sense to
>> try?
>>
>> Best regards,
>> Ivan Lagunov
>>
>> On Fri, Oct 10, 2014 at 1:23 AM, Ilya Taranov <epsi...@socio.msu.ru>
>> wrote:
>>
>>> Hi, from the top of my head...
>>>
>>> 1. Queries are evaluated in a lazy fashion. Which means, that before
>>> client asks to fetch the next row, nothing is evaluated.
>>>
>>> So select() is a socket read, which is waiting for client to make a
>>> decision on whether to commit (or end RO transaction) or ask for next part
>>> of the result. All locks are only released when end transaction happens.
>>> That's why others are waiting for one single client.
>>>
>>> To my mind, it's a wrong approach, and that's not entirely how I would
>>> do it right now, But that days we had ACID as a primary goal in minds.
>>>
>>> 2. Session is ready, means that there is no ongoing transaction, but the
>>> connection is established.
>>>
>>> 3. No idea in this particular case, but there can be a situation when
>>> session is started without client connection. Recovery/checkpoint, maybe
>>> something else, I'm not sure.
>>>
>>> 4. That dfntly looks strange, but should not cause a lock. Once
>>> transaction is committed, it should release all the locks. So even should
>>> there be orphan sessions, they would not interfere the normal operation.
>>>
>>> 5. Yes that pretty much looks like a problem. Based on logs, it looks
>>> like 17338 is waiting for client to fetch a result.
>>>
>>> That may be a protocol issue, by the way, when as a result of malformed
>>> message both client and sedna are waiting for each other.
>>>
>>> I would guess, that's always happens when
>>> "categorytree:getCategoryTree()" is called, could you please try that? If
>>> it confirms, could you please make a wireshark/tcpdump dump of this
>>> communication?
>>>
>>>
>>> On Thu, Oct 9, 2014 at 6:42 AM, Ivan Lagunov <lagi...@gmail.com> wrote:
>>>
>>>> Hi Ilya,
>>>>
>>>> Please check my investigations below.
>>>>
>>>> I've collected all the debug information for another hanging today,
>>>> it's available here:
>>>> https://www.dropbox.com/s/g14aer9821fzx05/sedna_debug_info6.zip?dl=0
>>>> Here are the statuses of all Sedna processes:
>>>>
>>>> se_trn 13436 - DROP DOCUMENT operation, COMMITED
>>>> se_trn 16245 - LOAD STDIN operation, COMMITED, no ESTABLISHED
>>>> connection on port 5050 found (only 5151)
>>>> se_trn 17335 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>>>> () from /lib64/libc.so.6
>>>> se_trn 17338 - XQuery running about 10sec, not COMMITED, gdb thread
>>>> 1 in select () from /lib64/libc.so.6
>>>> se_trn 17410 - XQuery running <1sec, COMMITED
>>>> se_trn 17462 - Session is ready
>>>> se_trn 17554 - Session is ready
>>>> se_trn 18892 - Session is ready
>>>> se_trn 19124 - Session is ready
>>>> se_gov 20441 - gdb threads 1 and 2 in select () from /lib64/libc.so.6
>>>> se_sm 20449
>>>>
>>>> Comments:
>>>> 1. Where gdb is not stated, all threads are in semop () from
>>>> /lib64/libc.so.6 except the last thread that is in 
>>>> pthread_cond_timedwait@@GLIBC_2.3.2
>>>> (). I believe semop is a good state, correct me if I'm wrong.
>>>> 2. All se_trn with "Session is ready" have no other entries in the
>>>> latest event.log. As I collect only the latest event.log, I cannot check
>>>> what was the last operation.
>>>> 3. All se_trn processes have two ESTABLISHED connections in netstat for
>>>> ports 5050 and 5151 if not stated otherwise.
>>>> 4. se_trn 17338, 17410, 17462 were created by the same application that
>>>> is under suspicion. The database unfreezed after the restart of that
>>>> application.
>>>>
>>>> Issues/questions:
>>>> 1. What exactly does it mean when a thread is in select()? It seems the
>>>> database is waiting but for the client or for some internal operation, e.g.
>>>> fetching the data. Please clarify.
>>>> 2. What exactly does "Session is ready" mean? When does it appear in
>>>> the event.log? I believe those connections are okay but I'm curious to 
>>>> know.
>>>> 3. se_trn 16245 - how is it possible that there is no ESTABLISHED
>>>> connection on port 5050 while the process is still alive? An issue?
>>>> 4. se_trn 17335 - how is it possible that thread is in select() while
>>>> the transaction has been COMMITED in event.log? An issue?
>>>> 5. se_trn 17338 - looks like a root cause of Sedna hanging. Currently I
>>>> run it via XML:DB API implementation mentioned earlier, more specifically I
>>>> run the XQuery via XQueryService.query("xquery string") call. Besides, I
>>>> always try to close the connection in the Java finally block, so it looks
>>>> roughly like this:
>>>>
>>>> try {
>>>>     rootCollection = DatabaseManager.getCollection(...);
>>>>     ... do something ...
>>>> } finally {
>>>>     rootCollection.close();
>>>> }
>>>>
>>>> According to event.log the XQuery was started at 09/10/2014 12:00:01.
>>>> In the application log somewhere between 12:01:53 and 12:05:00 appeared
>>>> a message:
>>>>
>>>>> Warning: an open Sedna XML:DB Collection/Connection was just
>>>>> automatically closed due to inactivity, you SHOULD close these manually.
>>>>
>>>> It seems to mean that the connection was considered as abandoned and
>>>> removed by XML:DB API implementation (see above conn-log-abandoned=true
>>>> and conn-remove-abandoned-timeout=120000).
>>>> For me it looks like the client waited for 2 minutes but did not get
>>>> any response from the database. So there was an attempt to close the
>>>> connection. It's easy to understand as the database hanged at about
>>>> 12:00:02 when the last transaction was COMMITED. So it may be even that the
>>>> root cause is hidden somewhere deeper. For example, this long XQuery could
>>>> cause a deadlock.
>>>>
>>>> I understand it may be an issue in XML:DB API implementation but its
>>>> support was stopped by Charles Foster. Unfortunately, we cannot afford to
>>>> migrate to XQJ API implementation yet. The most tricky part to understand
>>>> is why we had not faced similar issues before recent occurrence when it has
>>>> become a regular major issue immediately. There were no server updates and
>>>> no changes that could affect the way transactions/connections are handled.
>>>>
>>>> Best regards,
>>>> Ivan Lagunov
>>>>
>>>> On Thu, Oct 9, 2014 at 1:33 PM, Ivan Lagunov <lagi...@gmail.com> wrote:
>>>>
>>>>> Sorry, copied the wrong link for the connection pool documentation.
>>>>> Here is the correct one:
>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>>>>
>>>>> Besides, I use the following values for the connection pool parameters
>>>>> in the suspected application:
>>>>> conn-max-active=50
>>>>> conn-max-wait=30000
>>>>> conn-log-abandoned=true
>>>>> conn-remove-abandoned-timeout=120000
>>>>> conn-abandoned-check-interval=5000
>>>>>
>>>>> Best regards,
>>>>> Ivan Lagunov
>>>>>
>>>>> On Thu, Oct 9, 2014 at 1:28 PM, Ivan Lagunov <lagi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>> I use Charles Foster implementation of XML:DB API, version 1.2.6
>>>>>> without any customizations:
>>>>>> http://www.cfoster.net/sedna/xmldb/download/
>>>>>>
>>>>>> Here is the link again to the connection pool documentation (it's a
>>>>>> part of the above implementation):
>>>>>> http://www.cfoster.net/sedna/xmldb/download/
>>>>>>
>>>>>> Meanwhile, I've identified the application that makes Sedna wait. At
>>>>>> least restarting the application has unfreezed the database twice 
>>>>>> already.
>>>>>> I'll analyze the application log along with Sedna event.log shortly.
>>>>>> Hopefully, I'll find the root cause.
>>>>>>
>>>>>> Best regards,
>>>>>> Ivan Lagunov
>>>>>>
>>>>>> On Thu, Oct 9, 2014 at 3:50 AM, Ilya Taranov <epsi...@socio.msu.ru>
>>>>>> wrote:
>>>>>>
>>>>>>> I can easily see the scenario, where one of the clients doesn't
>>>>>>> terminate the transaction properly. Unfortunately, current application
>>>>>>> timeout doesn't apply for client hangs. The issue can be easily fixed,
>>>>>>> however.
>>>>>>>
>>>>>>> Unfortunately, even fix still would not solve your problem, you
>>>>>>> would still have application timeout wait on locks. Probably the most
>>>>>>> straightforward way to find the cause for you would be to log connection
>>>>>>> checkins/checkouts at the client driver level.
>>>>>>>
>>>>>>> Could you please give me a link to the connection pool + driver you
>>>>>>> currently use (including all your changes if possible) so I could help 
>>>>>>> you
>>>>>>> with that?
>>>>>>>
>>>>>>> On Tue, Oct 7, 2014 at 4:37 AM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ilya,
>>>>>>>>
>>>>>>>> Your suggestion seems to be correct. After I've restarted three
>>>>>>>> Java applications that had connections to Sedna, the database has 
>>>>>>>> resumed
>>>>>>>> processing operations. Here is the event.log where Sedna has hanged 
>>>>>>>> between
>>>>>>>> 12:35:03 and 13:06:05:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://www.dropbox.com/s/pzafuvzcvfchxwa/event-2014-10-07-13-15-04-018.zip?dl=0
>>>>>>>>
>>>>>>>> I'm not sure which of the applications caused it though. It still
>>>>>>>> worries me as the queries before hanging look quite simple and I cannot
>>>>>>>> understand which client operation could have hanged the whole database.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ivan Lagunov
>>>>>>>>
>>>>>>>> On Mon, Oct 6, 2014 at 12:29 PM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ilya,
>>>>>>>>>
>>>>>>>>> Yes, I use a connection pool via Sedna XML:DB API implementation.
>>>>>>>>> It's documented here:
>>>>>>>>>
>>>>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>>>>>>>>
>>>>>>>>> Unfortunately, there is no support for active connection timeout.
>>>>>>>>> This could be the cause if there was a query hanging for some reason. 
>>>>>>>>> Still
>>>>>>>>> if some query hangs forever (that should not as it's in my case) it 
>>>>>>>>> looks
>>>>>>>>> more like a database bug/deadlock situation.
>>>>>>>>>
>>>>>>>>> I'll try as you suggest next time.
>>>>>>>>>
>>>>>>>>> P.S. Sorry for the second email as I've forgotten to add CC first.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ivan Lagunov
>>>>>>>>>
>>>>>>>>> On Fri, Oct 3, 2014 at 10:23 PM, Ilya Taranov <
>>>>>>>>> epsi...@socio.msu.ru> wrote:
>>>>>>>>>
>>>>>>>>>> Ivan, do you have a connection pool? I see one of the se_trn
>>>>>>>>>> processes waiting for socket read.
>>>>>>>>>>
>>>>>>>>>> If you don't have a connection pool, it just might be a client
>>>>>>>>>> problem. I'll take a deeper look anyway, just next time it happens, 
>>>>>>>>>> could
>>>>>>>>>> you please try to shutdown the client thread, if it's even possible.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 2, 2014 at 5:30 AM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I've been facing major issues with Sedna hanging regularly for
>>>>>>>>>>> the last two weeks (after several months of stable work). I tried 
>>>>>>>>>>> to do the
>>>>>>>>>>> complete export-import of data using se_exp but Sedna still hangs
>>>>>>>>>>> regularly. Looking into event.log does not help as queries differ 
>>>>>>>>>>> every
>>>>>>>>>>> time. Sometimes it looks like Sedna hangs after some heavy query 
>>>>>>>>>>> (with huge
>>>>>>>>>>> response) while in other cases there are no heavy queries to 
>>>>>>>>>>> suspect. I've
>>>>>>>>>>> written a script to collect the following logs/dumps you will find 
>>>>>>>>>>> in the
>>>>>>>>>>> archives:
>>>>>>>>>>>
>>>>>>>>>>> event.log - the last Sedna event log till the moment DB hangs
>>>>>>>>>>> and is stopped;
>>>>>>>>>>> sedna_procs.txt - list of all Sedna processes (ps -ef);
>>>>>>>>>>> netstat.txt - list of all ESTABLISHED connections;
>>>>>>>>>>> ipcs.txt - results of ipcs -s;
>>>>>>>>>>> <pid>.txt - results of gdb with bt for all threads of a Sedna
>>>>>>>>>>> process (available for all processes at the moment of hanging). 
>>>>>>>>>>> This is
>>>>>>>>>>> collected automatically for threads 1-4 of each process.
>>>>>>>>>>>
>>>>>>>>>>> I'm providing two sets of logs/dumps for two separate hanging
>>>>>>>>>>> events:
>>>>>>>>>>>
>>>>>>>>>>> 25th September - seemed to hang without any heavy queries:
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0
>>>>>>>>>>>
>>>>>>>>>>> 2nd October - hanged after a heavy query:
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0
>>>>>>>>>>> The heavy query occurs at LOG   02/10/2014 11:10:20
>>>>>>>>>>>
>>>>>>>>>>> If you can find out the cause of the hangings, it'd be the best.
>>>>>>>>>>> Otherwise it'd be nice to hear some recommendations at least on how 
>>>>>>>>>>> to
>>>>>>>>>>> avoid these issues/minimize an impact. May be I need to upgrade to 
>>>>>>>>>>> some
>>>>>>>>>>> later version but I'm currently using 3.5.615 that is already a 
>>>>>>>>>>> night build
>>>>>>>>>>> that we had to upgrade to after some older issue (reported back in
>>>>>>>>>>> March-April of 2013). I'd try some 3.5 or 3.6 version but without 
>>>>>>>>>>> knowing
>>>>>>>>>>> the reason there is no guarantee the potential bug is fixed in the 
>>>>>>>>>>> later
>>>>>>>>>>> night build.
>>>>>>>>>>>
>>>>>>>>>>> Also it'd be really nice if there was an official release for
>>>>>>>>>>> 3.5 with all the latest bug fixes (I see the last night build is 
>>>>>>>>>>> 3.5.627 of
>>>>>>>>>>> 8th May 2013).
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Ivan Lagunov
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>>>>>>>>>>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS
>>>>>>>>>>> Reports
>>>>>>>>>>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White
>>>>>>>>>>> paper
>>>>>>>>>>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog
>>>>>>>>>>> Analyzer
>>>>>>>>>>>
>>>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Sedna-discussion mailing list
>>>>>>>>>>> Sedna-discussion@lists.sourceforge.net
>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Sedna-discussion mailing list
Sedna-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sedna-discussion

Reply via email to