Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Dario Lopez-Kästen
Heimo Laukkanen wrote:
> Hi,
> I am hitting my head against a wall - and witnessing strange behaviour,
> where after a time most of the load focuses on only one thread. I am not
> sure what the cause of this is, but so far we have seen different things
> happen before the load has concentrated on one single thread.

We also have this behaviour, but I have thought that it was our
overuse of DCOracle2 connections that was the culprit, along with our
bad TTW code.

Do you use any DA, and have long-running SQL queries?

We have no fix for this, but have resigned ourselves to doing restarts
every now and then. We use Zope 2.5.1, Redhat 7.3

/dario
--
-- ---
Dario Lopez-Kästen, IT Systems & Services Chalmers University of Tech.
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Heimo Laukkanen
On Wed, 26 May 2004 08:09:53 +0200, Dario Lopez-Kästen  
[EMAIL PROTECTED] wrote:

> Heimo Laukkanen wrote:
>> Hi,
>> I am hitting my head against a wall - and witnessing strange behaviour,
>> where after a time most of the load focuses on only one thread. I am not
>> sure what the cause of this is, but so far we have seen different things
>> happen before the load has concentrated on one single thread.
> We also have this behaviour, but I have thought that it was our
> overuse of DCOracle2 connections that was the culprit, along with our
> bad TTW code.
>
> Do you use any DA, and have long-running SQL queries?

Yes, extensively. We use DCOracle2, which had to be patched to work with
UTF-8. And we combine it with badly scripted code too. We started building
a prototype using our own Archetypes storage layer that stores data from
content objects in the Oracle database in a certain format - so every access
to an object actually creates a load of queries to the Oracle database.

Connections to the database are handled by the Oracle adapter in Zope, so
we have not made any connections of our own but rather just run queries
through the Zope adapter.

> We have no fix for this, but have resigned ourselves to doing restarts
> every now and then. We use Zope 2.5.1, Redhat 7.3

How did you debug or pinpoint the culprit to be the DCOracle connection?
Have you tried cx_Oracle?
--
-huima


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Dario Lopez-Kästen
Heimo Laukkanen wrote:
>> Do you use any DA, and have long-running SQL queries?
>
> Yes, extensively. We use DCOracle2, which had to be patched to work with
> UTF-8. And we combine it with badly scripted code too. We started building
> a prototype using our own Archetypes storage layer that stores data from
> content objects in the Oracle database in a certain format - so every
> access to an object actually creates a load of queries to the Oracle
> database.
>
> Connections to the database are handled by the Oracle adapter in Zope,
> so we have not made any connections of our own but rather just run queries
> through the Zope adapter.

We use a lot of queries (minimum 10-15) for each request, across around
7 DAs connected to various Oracle schemas. Most of the queries are
fairly simple, but there are a couple of frequently used queries
that are just plain bad; they are so bad that it is very
difficult to fix them.

These queries generate a lot of load on the DB server (they are *very*
bad SQL), so they tend to take a long time to respond.

>> We have no fix for this, but have resigned ourselves to doing restarts
>> every now and then. We use Zope 2.5.1, Redhat 7.3
>
> How did you debug or pinpoint the culprit to be the DCOracle connection?

Gut feeling, plus some basic optimisation of the not-so-complicated
cases. I did receive a few tips on how to do real debugging from Matt,
Dieter and others, but those tips assume a level of knowledge that I do
not possess yet - also, until recently I have not had any hardware to do
that kind of debugging on.

I just made some basic observations - sometimes Zope would stop
responding and angry users would call, and I would find threads that had
been running for more than 7 seconds, while at the same time I
could observe queries on the database that had been running for a similar
length of time, occasionally even blocking the database altogether.

Sometimes it seemed like Zope would lose the connection to the
database (we still get that randomly now and then), but I am not
100% sure that this is only Zope's fault - it may be that the database
was under so much load that it lost the connection to Zope, thus
triggering some kind of wait-for-reply loop on Zope's side.

Chris Withers has done some work on improving DCOracle2's connections
and general bug-fixing. If you haven't used it, grab the latest
DCOracle2 from CVS - it is much better.

We see this behaviour mostly under heavy load - many users accessing the
database all the time. Using the latest DCOracle and improving parts of
our code has removed a lot of the problems; however, I believe it is
still unclear what the problem is - in our setting, my stance is that
we have only cured the symptoms, not the real problems...

> Have you tried cx_Oracle?

No, I was not aware that they had a Zope adaptor.
/dario
--
-- ---
Dario Lopez-Kästen, IT Systems & Services Chalmers University of Tech.


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Heimo Laukkanen
On Wed, 26 May 2004 08:44:44 +0200, Dario Lopez-Kästen  
[EMAIL PROTECTED] wrote:

> We use a lot of queries (minimum 10-15) for each request, across around
> 7 DAs connected to various Oracle schemas. Most of the queries are
> fairly simple, but there are a couple of frequently used queries
> that are just plain bad; they are so bad that it is very
> difficult to fix them.

OK. In our case we use very simple queries, but this stupid code generates
a lot of them. One thing to do is to throw away the stupidity and the
Archetypes storage layer, which is a bad idea - and have the object flush
and read its data from Oracle only when actually necessary.

> Chris Withers has done some work on improving DCOracle2's connections
> and general bug-fixing. If you haven't used it, grab the latest
> DCOracle2 from CVS - it is much better.

We did take it from CVS. Has there been much work on it lately?

> We see this behaviour mostly under heavy load - many users accessing the
> database all the time. Using the latest DCOracle and improving parts of
> our code has removed a lot of the problems; however, I believe it is
> still unclear what the problem is - in our setting, my stance is that
> we have only cured the symptoms, not the real problems...

Sounds familiar.

>> Have you tried cx_Oracle?
> No, I was not aware that they had a Zope adaptor.

Well, Dieter and others suggested that it should be doable to write
an adaptor based on any Python DB-API compliant adaptor - either using
DCOracle, psycopg or similar as a base. However, since I've never done that
kind of work before, I am reluctant to take the step and be bitten, hence
the question whether you have had more courage.

--
-huima


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Dario Lopez-Kästen
Heimo Laukkanen wrote:
> On Wed, 26 May 2004 08:44:44 +0200, Dario Lopez-Kästen
> [EMAIL PROTECTED] wrote:
>
> OK. In our case we use very simple queries, but this stupid code
> generates a lot of them. One thing to do is to throw away the stupidity
> and the Archetypes storage layer, which is a bad idea - and have the
> object flush and read its data from Oracle only when actually necessary.

I agree. We cannot do it, however, because most of my code is TTW,
so we have no custom objects. However, this is a good strategy in general
- do not read from external sources unless it is necessary...
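
A minimal sketch of that "read only when necessary" strategy, in Python. Everything named here (LazyRecord, fake_fetch, the "title" field) is hypothetical and only stands in for a content object and its Oracle query; none of it comes from Archetypes or DCOracle2.

```python
class LazyRecord:
    """Defer the external read until a field is actually requested."""

    def __init__(self, key, fetch):
        self._key = key
        self._fetch = fetch    # hypothetical callable: key -> dict of fields
        self._data = None      # nothing has been loaded yet

    def get(self, field):
        if self._data is None:
            # The first access triggers the single external read;
            # later accesses reuse the cached result.
            self._data = self._fetch(self._key)
        return self._data[field]


calls = []

def fake_fetch(key):
    # Stand-in for a database query; records each call so reads can be counted.
    calls.append(key)
    return {"title": "Title of %s" % key}


record = LazyRecord("doc-1", fake_fetch)
reads_before_access = len(calls)    # creating the object costs no query
title = record.get("title")
title_again = record.get("title")
reads_after_access = len(calls)     # two accesses, one query
```

The point of the design is that creating or listing objects costs no queries at all; only touching a field does, and only once per object.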

>> Chris Withers has done some work on improving DCOracle2's connections
>> and general bug-fixing. If you haven't used it, grab the latest
>> DCOracle2 from CVS - it is much better.
>
> We did take it from CVS. Has there been much work on it lately?

Not that I am aware of. (I haven't checked :)

>>> Have you tried cx_Oracle?
>> No, I was not aware that they had a Zope adaptor.
>
> Well, Dieter and others suggested that it should be doable to write
> an adaptor based on any Python DB-API compliant adaptor - either using
> DCOracle, psycopg or similar as a base. However, since I've never done
> that kind of work before, I am reluctant to take the step and be bitten,
> hence the question whether you have had more courage.

No, sorry. For me, this too is beyond what I can do at the moment,
mostly due to time constraints. Hopefully this will change as more
people become involved with Zope at my work...

/dario
--
-- ---
Dario Lopez-Kästen, IT Systems & Services Chalmers University of Tech.


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Chris Withers
Dario Lopez-Kästen wrote:
> Chris Withers has done some work on improving DCOracle2's connections
> and general bug-fixing. If you haven't used it, grab the latest
> DCOracle2 from CVS - it is much better.

My work is still on a branch...

>> Have you tried cx_Oracle?
> No, I was not aware that they had a Zope adaptor.

I think they do. If it's significantly better than DCOracle, then I'll tell
people to switch instead of trying to make DCOracle work further ;-)

Chris
--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-26 Thread Dieter Maurer
Heimo Laukkanen wrote at 2004-5-25 22:46 +0300:
> ...
>> What does Control_Panel --> Debug information tell you
>> about the use of your connections (at the bottom of the page)?
>
> At the moment it said that there was only one opened connection and the
> others were none. I have no terminal access to the machines at the moment
> to check how the threads are behaving.

That looks strange...
Maybe the only running thread does not release the GIL?

> And then backtrace with gdb from those quiet threads

My most essential tools when analysing Python code in gdb
are the following two GDB command definitions.

def ps
x/s ({PyStringObject}$arg0)->ob_sval
end

def pfr
ps f->f_code->co_filename
ps f->f_code->co_name
p f->f_lineno
end

"ps" allows you to look at a Python string variable,
and "pfr" can be called in "eval_frame" frames.
It tells you what code this frame is executing -- identified
by module and function. You cannot trust the lineno in
Python 2.3 (it is the start of the function, not where you actually
are).


> ~# gdb program 14552
> backtrace
> ...
> #7  0x080a22f6 in eval_frame (f=0x9b5ea24) at Python/ceval.c:2116

Use "fr 7" to select this frame and then "pfr" to see where this is.

> ...
> Another thread ( 14548 )
>
> #0  0x401153c4 in read () from /lib/libc.so.6
> #1  0x40029ae0 in __DTOR_END__ () from /lib/libpthread.so.0
> #2  0x412474f0 in nttrd () from
> /opt/portaali/comp/oracle/lib/libclntsh.so.9.0
> #3  0x410fdcf8 in nsprecv () from

Obviously, this is a call to Oracle.
Maybe the GIL is not released?

> ...
> #14 0x40e13631 in Cursor_execute (self=0x4216fcd0, args=0x4017002c) at
> src/dco2.c:3740

The GIL should have been released in the function above.
Check it for Py_BEGIN_ALLOW_THREADS, Py_END_ALLOW_THREADS
around line 3740.
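
Why releasing the GIL matters can be sketched in pure Python. This is only an illustration under stated assumptions: blocking_query and run_workers are hypothetical names, and time.sleep stands in for a slow database call because it releases the GIL while it blocks, which is what a C extension does between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. It does not reproduce DCOracle2's behaviour.

```python
import threading
import time


def blocking_query(delay):
    # Stand-in for a slow database call. time.sleep releases the GIL
    # while it blocks, so other Python threads can run in the meantime.
    time.sleep(delay)


def run_workers(count, delay):
    # Start `count` threads that each block for `delay` seconds and
    # return the wall-clock time until all of them have finished.
    threads = [threading.Thread(target=blocking_query, args=(delay,))
               for _ in range(count)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


elapsed = run_workers(5, 0.2)
serial_estimate = 5 * 0.2          # what it would cost if the waits ran one by one
overlapped = elapsed < serial_estimate
```

If the blocking call held the GIL instead, the waits could not overlap and the elapsed time would approach the serial estimate, which is the kind of single-thread concentration discussed above.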


> PID: 14547
>
> #0  0x401153c4 in read () from /lib/libc.so.6
> #1  0x40029ae0 in __DTOR_END__ () from /lib/libpthread.so.0
> #2  0x412474f0 in nttrd () from
> /opt/portaali/comp/oracle/lib/libclntsh.so.9.0

This, too, is waiting for Oracle...

> PID: 14545
>
> #0  0x401153c4 in read () from /lib/libc.so.6
> #1  0x40029ae0 in __DTOR_END__ () from /lib/libpthread.so.0
> #2  0x412474f0 in nttrd () from
> /opt/portaali/comp/oracle/lib/libclntsh.so.9.0
> #3  0x410fdcf8 in nsprecv () from

This, too...

> PID: 14531
>
> #0  0x4011c7ee in select () from /lib/libc.so.6
> #1  0x40568cb4 in __DTOR_END__ () from
> /opt/portaali/ContentManagement-1.0/python-2.3.3/lib/python2.3/lib-dynload/select.so
> #2  0x080e0e3f in PyCFunction_Call (func=0x4049636c, arg=0x426476e4,
> kw=0x0) at Objects/methodobject.c:73
> #3  0x080a3beb in call_function (pp_stack=0xb11c, oparg=4) at
> Python/ceval.c:3439
> #4  0x080a22f6 in eval_frame (f=0x9033f64) at Python/ceval.c:2116

Check where you are here (--> pfr). It may be the medusa/asyncore main loop.


You have several threads waiting for responses from Oracle.
Are you sure that Control_Panel --> Debug information
tells you only about a single ZODB connection?

-- 
Dieter



[Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-25 Thread Heimo Laukkanen
Hi,
I am hitting my head against a wall - and witnessing strange behaviour,
where after a time most of the load focuses on only one thread. I am not
sure what the cause of this is, but so far we have seen different things
happen before the load has concentrated on one single thread.

After reading the thread:
http://www.gossamer-threads.com/lists/zope/dev/24230?page=last
I started to wonder whether that signalling behaviour is also a cause
of our problems - since we are running Debian Linux without NPTL (
Linux 2.4.26 #1 SMP Thu Apr 22 11:16:14 EEST 2004 i686 unknown ).

When Zope is started, it nicely starts multiple threads, and when tested
with ab, each thread gets its share of the work. We previously saw
exceptions in the event log that were never caught, and wondered whether
those were the source of the problem. Even though we added ugly try/catch
code around the problematic code, we still notice the load concentrating
on this one thread.

I will try to look at those no-longer-working threads with gdb - but
would like to know if anyone could give any pointers on what to look for
or what to check.

-huima


Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-25 Thread Dieter Maurer
Heimo Laukkanen wrote at 2004-5-25 14:54 +0300:
> ...
> witnessing strange behaviour,
> where after a time most of the load focuses on only one thread.
> ...
> After reading the thread:
> http://www.gossamer-threads.com/lists/zope/dev/24230?page=last
>
> I started to wonder whether that signalling behaviour is also a cause
> of our problems - since we are running Debian Linux without NPTL (
> Linux 2.4.26 #1 SMP Thu Apr 22 11:16:14 EEST 2004 i686 unknown ).

I would be surprised...

What does Control_Panel --> Debug information tell you
about the use of your connections (at the bottom of the page)?

-- 
Dieter



Re: [Zope-dev] Zope threads on 2.7.0, python 2.3.3

2004-05-25 Thread Heimo Laukkanen
On Tue, 25 May 2004 20:21:21 +0200, Dieter Maurer [EMAIL PROTECTED]  
wrote:

> I would be surprised...
> What does Control_Panel --> Debug information tell you
> about the use of your connections (at the bottom of the page)?

At the moment it said that there was only one opened connection and the
others were none. I have no terminal access to the machines at the moment
to check how the threads are behaving.

However, I did try to look at the processes with gdb - but I need to do
more reading of the gdb manual to provide more info beyond just a backtrace
of the process status. I'm not familiar with Python internals, so any
pointers on what to expect or what not to expect would be appreciated.

Below is first the output from ps -auxfww | grep portaali, where portaali
is the name of the user who owns the processes.

Below we will see that each thread was started at the same time - but
the thread with pid 14553 has done most of the work. And while putting on
more load and looking at top, that is also the only thread that gets any
percentage of processor time. I have no information on when this
happened, since the event log does not show anything peculiar.

portaali 14530  0.0  0.2   6116   4432 ? S 12:12   0:00 /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/zdaemon/zdrun.py -S /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/zopeschema.xml -b 10 -d -f -s /usr/local/portaali/zope_instance/var/zopectlsock -x 0,2 -z /usr/local/portaali/zope_instance /usr/local/portaali/zope_instance/bin/runzope
portaali 14531  0.2 16.6 365220 345168 ? S 12:12   0:46  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14532  0.0 16.6 365220 345168 ? S 12:12   0:00  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14553 13.0 16.6 365220 345168 ? S 12:12  34:40  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14552  0.1 16.6 365220 345168 ? S 12:12   0:18  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14548  0.8 16.6 365220 345168 ? S 12:12   2:16  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14545  0.1 16.6 365220 345168 ? S 12:12   0:17  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
portaali 14547  0.2 16.6 365220 345168 ? S 12:12   0:36  \_ /opt/portaali/ContentManagement-1.0/python-2.3.3/bin/python /opt/portaali/ContentManagement-1.0/zope-2.7.0/lib/python/Zope/Startup/run.py -C /usr/local/portaali/zope_instance/etc/zope.conf
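
To put a number on how lopsided the listing above is, the cumulative CPU times of the run.py processes can be totalled. This is a small Python sketch; the pids and times are copied from the listing, while the helper name to_seconds is just an illustration.

```python
# (pid, cumulative CPU time "MM:SS") copied from the ps listing above,
# run.py processes only.
cpu_times = [
    (14531, "0:46"), (14532, "0:00"), (14553, "34:40"), (14552, "0:18"),
    (14548, "2:16"), (14545, "0:17"), (14547, "0:36"),
]


def to_seconds(mmss):
    # Convert a "MM:SS" string from the TIME column into seconds.
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


seconds_by_pid = {pid: to_seconds(t) for pid, t in cpu_times}
total = sum(seconds_by_pid.values())
busiest = max(seconds_by_pid, key=seconds_by_pid.get)
share = seconds_by_pid[busiest] / total

print("pid %d used %.0f%% of the CPU time" % (busiest, share * 100))
# prints "pid 14553 used 89% of the CPU time"
```

So pid 14553 accounts for 2080 of the 2333 CPU seconds consumed by these processes since they started.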

And then backtrace with gdb from those quiet threads
~# gdb program 14552
backtrace
#0  0x4007b87e in sigsuspend () from /lib/libc.so.6
#1  0x4001e879 in __pthread_wait_for_restart_signal () from  
/lib/libpthread.so.0
#2  0x4001fee1 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
#3  0x080c4830 in PyThread_acquire_lock (lock=0x861a908, waitflag=1) at  
Python/thread_pthread.h:406
#4  0x080c76c8 in lock_PyThread_acquire_lock (self=0x417c4840,  
args=0x4017002c) at ./Modules/threadmodule.c:63
#5  0x080e0e3f in PyCFunction_Call (func=0x41c7c7cc, arg=0x4017002c,  
kw=0x0) at Objects/methodobject.c:73
#6  0x080a3beb in call_function (pp_stack=0xbe7fb904, oparg=0) at  
Python/ceval.c:3439
#7  0x080a22f6 in eval_frame (f=0x9b5ea24) at Python/ceval.c:2116
#8  0x080a31e2 in PyEval_EvalCodeEx (co=0x40442c20, globals=0x40445acc,  
locals=0x0, args=0x8bdb024, argcount=1,
kws=0x8bdb028, kwcount=1, defs=0x4047eb18, defcount=5, closure=0x0) at  
Python/ceval.c:2663
#9  0x080a55ec in fast_function (func=0x404a9924, pp_stack=0xbe7fba94,  
n=3, na=1, nk=1) at Python/ceval.c:3529
#10 0x080a3c71 in call_function (pp_stack=0xbe7fba94, oparg=256) at  
Python/ceval.c:3458

Another thread ( 14548 )
#0  0x401153c4 in read () from /lib/libc.so.6
#1  0x40029ae0 in __DTOR_END__ () from /lib/libpthread.so.0
#2  0x412474f0 in nttrd () from  
/opt/portaali/comp/oracle/lib/libclntsh.so.9.0
#3  0x410fdcf8 in nsprecv () from  
/opt/portaali/comp/oracle/lib/libclntsh.so.9.0
#4  0x41101ac5 in nsrdr () from  
/opt/portaali/comp/oracle/lib/libclntsh.so.9.0
#5