Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-11 Thread Rene

On 2014-05-10 17:46, Andy Bradford wrote:

> Thus said Gerald Gutierrez on Sat, 10 May 2014 01:53:56 -0700:
>
>> frame #8: 0x000105719ba2 fossil`ssl_receive(NotUsed=<unavailable>,
>> pContent=<unavailable>, N=<unavailable>) + 50 at http_ssl.c:399
>>    396   size_t got;
>>    397   size_t total = 0;
>>    398   while( N>0 ){
>> -> 399 got = BIO_read(iBio, pContent, N);
>>    400 if( got<=0 ) break;
>>    401 total += got;
>>    402 N -= got;

I'm not sure if it is the cause of the problem, but got is unsigned
(size_t). So when BIO_read returns -1, got becomes a very large number,
because by definition it cannot go below 0.

On my machine, if I declare

 int l = -1;
 size_t r = l;
 printf("l = %d  r = l = %zu\n", l, r);

the output is:

 l = -1  r = l = 18446744073709551615

N is unsigned as well, so it cannot go below 0 either. Subtracting got
in this case wraps around, and N will reach 0 only by coincidence.
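
To illustrate the point above, here is a minimal sketch of why the
`got<=0` test misses an error when got is a size_t. The function
fake_read_error() is a hypothetical stand-in for a failing BIO_read()
and is not part of Fossil or OpenSSL; the other names are made up for
the example too.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical stand-in for a failed BIO_read(): always reports an error. */
static int fake_read_error(void){
  return -1;
}

/* Mirrors the original check: the result is stored in an unsigned size_t. */
static int error_seen_unsigned(void){
  size_t got = fake_read_error();   /* -1 wraps around to SIZE_MAX */
  return got<=0;                    /* true only when got==0 */
}

/* Mirrors the patched check: the result is stored in a signed ssize_t. */
static int error_seen_signed(void){
  ssize_t got = fake_read_error();  /* stays -1 */
  return got<=0;
}

/* Self-check: the unsigned variant misses the error, the signed one sees it. */
static void check_sign_handling(void){
  assert( !error_seen_unsigned() );
  assert( error_seen_signed() );
}
```

Calling check_sign_handling() from main() passes silently. With the
unsigned variable the loop would not break on -1 and would go on to add
the wrapped value to total.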


The third argument to BIO_read is declared as an int. On my machine an
int is 4 bytes and size_t is 8 bytes, so depending on the input this can
go wrong when a sufficiently large value does not fit in an int.

e.g.

this fits in an int,  0x1000
this does not fit in an int, 0x1 i.e. the int will be 0
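
A small sketch of the clamping idea, assuming a hypothetical helper
named clamp_to_int() (not a Fossil or OpenSSL function): a size_t
length is limited to INT_MAX before it is handed to a function that
takes an int length, so it can never be truncated.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Hypothetical helper: largest chunk that is safe to pass as an int length. */
static int clamp_to_int(size_t n){
  return n<=(size_t)INT_MAX ? (int)n : INT_MAX;
}

/* Self-check: small values pass through, oversized values are capped. */
static void check_clamp(void){
  assert( clamp_to_int(0x1000)==0x1000 );
  assert( clamp_to_int((size_t)INT_MAX)==INT_MAX );
  assert( clamp_to_int((size_t)INT_MAX+1)==INT_MAX );
  assert( clamp_to_int(SIZE_MAX)==INT_MAX );
}
```

Because the caller loops until N reaches 0, a capped request only means
the data is transferred in more than one call.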

changing ssl_receive to
 /*
 ** Receive content back from the SSL connection.
 */
 size_t ssl_receive(void *NotUsed, void *pContent, size_t N){
   ssize_t got;
   size_t total = 0;
   while( N>0 ){
     got = BIO_read(iBio, pContent, N <= INT_MAX ? N : INT_MAX);
     if( got<=0 ) break;
     total += got;
     N -= got;
     pContent = (void*)&((char*)pContent)[got];
   }
   return total;
 }

will yield better results (I hope). Because I cannot test it, I attached
a unified patch. I patched http_socket.c and http_ssl.c against the
latest trunk. I wonder if it solves your problem?


--
Rene

--- http_socket.c
+++ http_socket.c
@@ -182,14 +182,14 @@
 
 /*
 ** Send content out over the open socket connection.
 */
 size_t socket_send(void *NotUsed, void *pContent, size_t N){
-  size_t sent;
+  ssize_t sent;
   size_t total = 0;
   while( N>0 ){
-    sent = send(iSocket, pContent, N, 0);
+    sent = send(iSocket, pContent, N>SSIZE_MAX ?SSIZE_MAX:N , 0);
     if( sent<=0 ) break;
     total += sent;
     N -= sent;
     pContent = (void*)&((char*)pContent)[sent];
   }

--- http_ssl.c
+++ http_ssl.c
@@ -444,19 +444,19 @@
   cert = PEM_read_bio_X509(mem, NULL, 0, NULL);
   free(zCert);
   BIO_free(mem);  
   return cert;
 }
-
+#include <limits.h>
 /*
 ** Send content out over the SSL connection.
 */
 size_t ssl_send(void *NotUsed, void *pContent, size_t N){
-  size_t sent;
+  ssize_t sent;
   size_t total = 0;
   while( N>0 ){
-    sent = BIO_write(iBio, pContent, N);
+    sent = BIO_write(iBio, pContent,N <= INT_MAX ? N : INT_MAX);
     if( sent<=0 ) break;
     total += sent;
     N -= sent;
     pContent = (void*)&((char*)pContent)[sent];
   }
@@ -465,18 +465,18 @@
 
 /*
 ** Receive content back from the SSL connection.
 */
 size_t ssl_receive(void *NotUsed, void *pContent, size_t N){
-  size_t got;
+  ssize_t got;
   size_t total = 0;
   while( N>0 ){
-    got = BIO_read(iBio, pContent, N);
+    got = BIO_read(iBio, pContent, N <= INT_MAX ? N : INT_MAX);
     if( got<=0 ) break;
     total += got;
     N -= got;
     pContent = (void*)&((char*)pContent)[got];
   }
   return total;
 }
 
 #endif /* FOSSIL_ENABLE_SSL */

___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-10 Thread Gerald Gutierrez
On Sat, May 10, 2014 at 1:03 AM, Stephan Beal <sgb...@googlemail.com> wrote:

> For me fossil builds with -g (debug) flags by default, so you shouldn't
> have to rebuild.


I ended up recompiling fossil just to be sure. On Mac OSX Mavericks, gdb
has been replaced by lldb.

I managed to get the process to hang again (it hadn't completed in over 10
minutes during a sync). I attached to it and went looking up the frames one
by one.

Here are the frames:

frame #0: 0x7fff8aef59f0 libsystem_kernel.dylib`read + 8
frame #1: 0x7fff8fbfd84c libcrypto.0.9.8.dylib`conn_read + 76
frame #2: 0x7fff8fbf7264 libcrypto.0.9.8.dylib`BIO_read + 100
frame #3: 0x7fff86b6513d libssl.0.9.8.dylib`ssl3_read_n + 365
frame #4: 0x7fff86b65b9f libssl.0.9.8.dylib`ssl3_read_bytes + 735
frame #5: 0x7fff86b63dec libssl.0.9.8.dylib`ssl3_read + 156
frame #6: 0x7fff86b4e4c9 libssl.0.9.8.dylib`ssl_read + 73
frame #7: 0x7fff8fbf7264 libcrypto.0.9.8.dylib`BIO_read + 100
frame #8: 0x000105719ba2 fossil`ssl_receive(NotUsed=<unavailable>,
pContent=<unavailable>, N=<unavailable>) + 50 at http_ssl.c:399
   396   size_t got;
   397   size_t total = 0;
   398   while( N>0 ){
-> 399 got = BIO_read(iBio, pContent, N);
   400 if( got<=0 ) break;
   401 total += got;
   402 N -= got;

So, it's hanging in the BIO_read function.

Global/static variables (command: ta v) are:

(SSL_CTX *) sslCtx = 0x7fa1c2e039b0
(char *) sslErrMsg = 0x
(SSL *) ssl = 0x7fa1c2e04430
(BIO *) iBio = 0x7fa1c2e043c0

Frame variables (fr v) are unfortunately:

(void *) NotUsed = <variable not available>
(void *) pContent = <variable not available>
(size_t) N = <variable not available>
(size_t) total = 0
(size_t) got = <variable not available>

Going up a couple more frames gives context to where in the code it is
stalling but I'm not sure whether it gives any insight to why. Perhaps if
someone could give some guidance on how I can investigate further I can
help diagnose. Here are the remaining frames all the way up to main:

frame #9: 0x00010571a3a6 fossil`transport_fetch(pUrlData=<unavailable>,
zBuf=0x7fa1c2e078b0, N=1000) + 102 at http_transport.c:311
   308 }
   309   }else if( pUrlData->isHttps ){
   310 #ifdef FOSSIL_ENABLE_SSL
-> 311 got = ssl_receive(0, zBuf, N);
   312 #else
   313 got = 0;
   314 #endif
(lldb) up
frame #10: 0x00010571a548 fossil`transport_receive_line [inlined]
transport_load_buffer(pUrlData=0x00010589f430) + 236 at
http_transport.c:393
   390 transport.pBuf = pNew;
   391   }
   392   if( N>0 ){
-> 393 i = transport_fetch(pUrlData, &transport.pBuf[transport.nUsed],
N);
   394 if( i>0 ){
   395   transport.nRcvd += i;
   396   transport.nUsed += i;
(lldb) up
frame #11: 0x00010571a45c
fossil`transport_receive_line(pUrlData=0x00010589f430) + 76 at
http_transport.c:416
   413   i = iStart = transport.iCursor;
   414   while(1){
   415 if( i >= transport.nUsed ){
-> 416   transport_load_buffer(pUrlData, pUrlData->isSsh ? 2 : 1000);
   417   i -= iStart;
   418   iStart = 0;
   419   if( i >= transport.nUsed ){
(lldb) up
frame #12: 0x000105718815
fossil`http_exchange(pSend=0x7fff5a518640, pReply=0x7fff5a518620,
useLogin=1, maxRedirect=20) + 1301 at http.c:206
   203   */
   204   closeConnection = 1;
   205   iLength = -1;
-> 206   while( (zLine = transport_receive_line(GLOBAL_URL()))!=0 &&
zLine[0]!=0 ){
   207 /* printf("[%s]\n", zLine); fflush(stdout); */
   208 if( fossil_strnicmp(zLine, "http/1.", 7)==0 ){
   209   if( sscanf(zLine, "HTTP/1.%d %d", &iHttpVersion, &rc)!=2 )
goto write_err;
(lldb) up
frame #13: 0x00010576b1e3 fossil`client_sync(syncFlags=<unavailable>,
configRcvMask=<unavailable>, configSendMask=<unavailable>) + 1955 at
xfer.c:1560
   1557 }
   1558 fflush(stdout);
   1559 /* Exchange messages with the server */
-> 1560 if( http_exchange(&send, &recv, (syncFlags & SYNC_CLONE)==0 ||
nCycle>0,
   1561 MAX_REDIRECTS) ){
   1562   nErr++;
   1563   break;
(lldb) up
frame #14: 0x00010574d2a9 fossil`autosync(flags=<unavailable>) + 313 at
sync.c:75
   72 if( find_option("verbose","v",0)!=0 ) flags |= SYNC_VERBOSE;
   73 fossil_print("Autosync:  %s\n", g.urlCanonical);
   74 url_enable_proxy("via proxy: ");
-> 75 rc = client_sync(flags, configSync, 0);
   76 if( rc ) fossil_warning("Autosync failed");
   77 return rc;
   78   }
(lldb) up
frame #15: 0x0001056f9f85 fossil`commit_cmd + 5365 at checkin.c:1927
   1924  db_end_transaction(0);
   1925
   1926  if( !g.markPrivate ){
-> 1927 autosync(SYNC_PUSH|SYNC_PULL);
   1928  }
   1929  if( count_nonbranch_children(vid)>1 ){
   1930 fossil_print(" warning: a fork has occurred *\n");
(lldb) up
frame #16: 0x000105726195 fossil`main(argc=<unavailable>,
argv=<unavailable>) + 2325 at main.c:701
   698 fossil_exit(1);
   699   }
   700   atexit( fossil_atexit );
-> 701   aCommand[idx].xFunc();
   702   fossil_exit(0);
   

Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-10 Thread Jan Danielsson
On 10/05/14 10:53, Gerald Gutierrez wrote:
[---]
> pContent=<unavailable>, N=<unavailable>) + 50 at http_ssl.c:399
>    396   size_t got;
>    397   size_t total = 0;
>    398   while( N>0 ){
> -> 399 got = BIO_read(iBio, pContent, N);
>    400 if( got<=0 ) break;
>    401 total += got;
>    402 N -= got;
>
> So, it's hanging in the BIO_read function.

   ...which is probably blocking and waiting for data.  Either it's
supposed to wait for data which the other side isn't sending (a problem
at the other side?), or it has gotten the idea that it needs more data
even though it doesn't (a local problem?).  I'd start by taking a look
at what the other side is doing.

   Is it possible for you to test without SSL?

-- 
Kind Regards,
Jan


Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-10 Thread Andy Bradford
Thus said Gerald Gutierrez on Sat, 10 May 2014 01:53:56 -0700:

> frame #8: 0x000105719ba2 fossil`ssl_receive(NotUsed=<unavailable>,
> pContent=<unavailable>, N=<unavailable>) + 50 at http_ssl.c:399
>    396   size_t got;
>    397   size_t total = 0;
>    398   while( N>0 ){
> -> 399 got = BIO_read(iBio, pContent, N);
>    400 if( got<=0 ) break;
>    401 total += got;
>    402 N -= got;

So, it's blocking in the BIO_read function. Hard to say why without more
data about both ends. netstat -na  (on both sides) will probably provide
some interesting  information (e.g. are  there blocks of data  queued in
either Recv-Q or Send-Q on either end of the ESTABLISHED connection).

gdb on the remote fossil would provide some other details.

I did find this particular comment about using BIO_read and non-blocking
I/O in OpenBSD's manpage (specifically the last sentence):

   One technique sometimes used with  blocking sockets is to use
   a system  call (such  as select(),  poll() or  equivalent) to
   determine when data is available and then call read() to read
   the data. The equivalent with  BIOs (that is call select() on
   the underlying I/O structure and then call BIO_read() to read
   the  data)  should not  be  used  because  a single  call  to
   BIO_read() can cause several reads (and writes in the case of
   SSL BIOs) on the underlying I/O  structure and may block as a
   result. Instead  select() (or equivalent) should  be combined
   with  non blocking  I/O so  successive reads  will request  a
   retry instead of blocking.

I'm  not sure  if using  this technique  would fare  any better  without
understanding  why  it's  blocking.  Is there  a  firewall  reacting  to
something it doesn't  like? Some other network problem  that exists like
lost packets? Does it only happen when using cron? If so, why?
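
As a rough illustration of the "select() combined with non-blocking
I/O" approach from the manpage excerpt, here is a sketch that uses a
plain socket and read() instead of a BIO. The helper
read_with_timeout() is hypothetical and is not part of Fossil. With
OpenSSL the retry decision would be made with BIO_should_retry()
rather than by checking errno, so treat this only as the general shape
of the technique.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
** Hypothetical helper: read up to n bytes from fd, waiting at most
** timeout_sec seconds for data on each attempt. Returns the byte count,
** 0 on end-of-file, -1 on error, or -2 on timeout.
*/
static ssize_t read_with_timeout(int fd, void *buf, size_t n, int timeout_sec){
  int flags = fcntl(fd, F_GETFL, 0);
  if( flags<0 || fcntl(fd, F_SETFL, flags|O_NONBLOCK)<0 ) return -1;
  for(;;){
    fd_set rd;
    struct timeval tv;
    int rc;
    ssize_t got;
    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    rc = select(fd+1, &rd, 0, 0, &tv);
    if( rc==0 ) return -2;              /* nothing arrived in time */
    if( rc<0 ){
      if( errno==EINTR ) continue;      /* interrupted: wait again */
      return -1;
    }
    got = read(fd, buf, n);
    if( got<0 && (errno==EAGAIN || errno==EWOULDBLOCK || errno==EINTR) ){
      continue;                         /* no data after all: retry */
    }
    return got;
  }
}

/* Self-check on a local socket pair: first a silent peer, then real data. */
static void check_read_with_timeout(void){
  int sv[2];
  char buf[8];
  ssize_t got;
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
  assert( rc==0 );
  got = read_with_timeout(sv[0], buf, sizeof(buf), 1);
  assert( got==-2 );                    /* times out after one second */
  got = write(sv[1], "ok", 2);
  assert( got==2 );
  got = read_with_timeout(sv[0], buf, sizeof(buf), 1);
  assert( got==2 && buf[0]=='o' && buf[1]=='k' );
  close(sv[0]);
  close(sv[1]);
}
```

The point of the timeout return value is that a stalled peer produces
an error the caller can report, instead of a process that waits in
read() indefinitely.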

Andy
-- 
TAI64 timestamp: 4000536e49ee




[fossil-users] Fossil process hanging on sync to remote server?

2014-05-09 Thread Gerald Gutierrez
On Fri, May 9, 2014 at 2:25 AM, Stephan Beal <sgb...@googlemail.com> wrote:

> Other than that, i can't comment: i've only seen such behaviour in 'ping'
> on Solaris, where it can cause a backlog of cronjobs, which causes all
> other jobs to queue up until you kill the pings, at which point _all_
> queued jobs, since the queue limit was reached (several days in my case),
> run in rapid succession!



(Changed subject line to reflect new topic)

Well, it's happened again.

This time I have the cron logging on. I get a mail message every time a
cronjob completes, and it's distinctly missing the one for the fossil sync
session that is hung. If I look at processes, I get this (notice it
executed at 9:05am, about 5 minutes after cronjob started and the entire
cronjob, if successful, only takes about 30 seconds):

$ uname -a
Darwin mycomp.local 13.1.0 Darwin Kernel Version 13.1.0: Thu Jan 16
19:40:37 PST 2014; root:xnu-2422.90.20~2/RELEASE_X86_64 x86_64

$ ps auxww | grep fossil
USER   PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
xxx   7619   0.0  0.2  2490036  29836   ??  S     9:05AM   0:13.80 /usr/local/bin/fossil commit --no-warnings -m Fri May  9 09:05:41 PDT 2014

The cronjob log for the cronjob executing AFTER the one that hung says this:

added 3 files, deleted 0 files
/usr/local/bin/fossil: database is locked: {REPLACE INTO
config(name,value,mtime)
VALUES('last-sync-url','my repo url',now())}
If you have recently updated your fossil executable, you might
need to run "fossil all rebuild" to bring the repository
schemas up to date.

So, there is definitely a problem here. It doesn't happen all the time, but
enough that it occurs at least once a day.


Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-09 Thread Stephan Beal
On Sat, May 10, 2014 at 12:40 AM, Gerald Gutierrez
<gerald.gutier...@gmail.com> wrote:

> So, there is definitely a problem here. It doesn't happen all the time,
> but enough that it occurs at least once a day.


i suspect it's OS specific. Richard syncs many repositories via cron on a
regular basis (hourly for the sqlite mirrors, IIRC) and has never
had/reported any problem with this, nor can i remember it coming up before
on the list. :/

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do. -- Bigby Wolf


Re: [fossil-users] Fossil process hanging on sync to remote server?

2014-05-09 Thread Andy Bradford
Thus said Gerald Gutierrez on Fri, 09 May 2014 15:40:13 -0700:

> $ ps auxww | grep fossil
> USER   PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
> xxx   7619   0.0  0.2  2490036  29836   ??  S     9:05AM   0:13.80 /usr/local/bin/fossil commit --no-warnings -m Fri May  9 09:05:41 PDT 2014

Any chance you could attach gdb to this process and see what it's doing?

Something like:

gdb fossil -p 7619
...
(gdb) bt

Thanks,

Andy
--
TAI64 timestamp: 4000536d6a2c