[jira] [Created] (PROTON-1170) closed links are never deleted

2016-04-06 Thread michael goulish (JIRA)
michael goulish created PROTON-1170:
---

 Summary: closed links are never deleted
 Key: PROTON-1170
 URL: https://issues.apache.org/jira/browse/PROTON-1170
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
 Environment: miserable
Reporter: michael goulish


I wrote a reactor-based application that makes a single connection, and then 
repeatedly makes-and-closes links (receivers) on that connection.

It makes and closes the links as fast as possible: as soon as it gets the 
on_receiver_close event, it makes a new one.  As soon as it gets the 
on_receiver_open event -- it closes that receiver.

This application talks to a dispatch router.

Problem:  Both the router and my application grow their memory (RSS) rapidly -- 
and the router's ability to respond to new link creations slows down rapidly.  
Looking at the router with   Valgrind/Callgrind, after about 15,000 links have 
been created and closed I see that 45% of all CPU time on the router is being 
consumed by pn_find_link().   Instrumenting that code, I see that the list it 
is looking at never decreases in size.
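For illustration, a hedged sketch (toy types and names, not Proton's actual data structures) of the pattern the instrumentation suggests: links are appended to a per-session list on open but never removed on close, so a pn_find_link()-style linear scan slows down with every link ever created.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch (assumed layout, not Proton's code): a session-wide link
 * list that is appended to on open but never pruned on close.  Lookup
 * is a linear scan, so its cost grows with every link ever created. */
typedef struct {
    int handle;
    int closed;              /* closed links stay in the list */
} toy_link;

typedef struct {
    toy_link *links;
    size_t count, cap;
} toy_session;

static toy_link *toy_find_link(toy_session *s, int handle) {
    for (size_t i = 0; i < s->count; i++)   /* O(count) scan */
        if (s->links[i].handle == handle)
            return &s->links[i];
    return NULL;
}

static void toy_open_link(toy_session *s, int handle) {
    if (s->count == s->cap) {
        s->cap = s->cap ? s->cap * 2 : 16;
        s->links = realloc(s->links, s->cap * sizeof(toy_link));
    }
    s->links[s->count++] = (toy_link){ handle, 0 };
}

static void toy_close_link(toy_session *s, int handle) {
    toy_link *l = toy_find_link(s, handle);
    if (l) l->closed = 1;    /* marked closed, but never freed/removed */
}
```

After N open/close cycles the list still holds N entries, which is consistent with the observed RSS growth and the slowdown in pn_find_link().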

I tried creating my links with the "lifetime_policy" set to DELETE_ON_CLOSE, 
but that had no effect.  Grepping for that symbol, I see that it does not occur 
in the proton C code except in its definition, and in a printing convenience 
function.

Major scalability bug.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Does anyone read this list?

2016-03-24 Thread Michael Goulish

I do think this list is meant to be mostly about the
mechanical details of development, and any topics 
that would be of interest to actual users are meant to
go on the users list.

The fact that there is no documentation that might 
have told you this is actually intentional. It's to
help get you ready to read the proton code.  :-)





- Original Message -
> Troy,
> 
> I monitor both this discussion (proton) list and the users.
> 
> It is true there is more going on in the users discussion list.  I tend to
> post to the users list because a lot of times discussion items posted here
> (proton) are just not picked up here as readily as on users.
> 
> I agree with your general assessment.  It does pose the question "When is it
> appropriate to post to this discussion list?"
> 
> Paul Flores
> 
> 
> 
> From: Troy Daniels [troy.dani...@stresearch.com]
> Sent: Wednesday, March 23, 2016 10:02 AM
> To: proton@qpid.apache.org
> Subject: Does anyone read this list?
> 
> It seems like there are two types of posts to this list: automated posts when
> there is a commit to version control, and initial questions from new users.
> There does not seem to be discussion or answers to questions.
> 
> It seems like I should unsubscribe and find a different forum for my
> questions.  Is that an accurate assessment?
> 
> Troy


Re: [VOTE] Release Qpid Proton 0.12.0

2016-02-04 Thread Michael Goulish
+1

Testing done:  I used it with all of my performance tests:

  * point-to-point communication with C and CPP clients.
  * many CPP senders and receivers intermediated by a router

The CPP clients exercise the proton::handler event interface.






- Original Message -
The artifacts proposed for release:

https://dist.apache.org/repos/dist/dev/qpid/proton/0.12.0-rc/

Please indicate your vote below.  If you favor releasing the 0.12.0 RC bits
as 0.12.0 GA, vote +1.  If you have reason to think the RC is not ready for
release, vote -1.

Thanks,
Justin


Re: PN_REACTOR_QUIESCED

2015-10-13 Thread Michael Goulish

But it's obvious how this constant was chosen.

With circular reasoning.





- Original Message -
> On Mon, 2015-10-12 at 16:05 -0400, aconway wrote:
> > ...
> > +1, that looks like the right fix. 3141 is an odd choice of default,
> > even for a mathematician.
> > 
> 
> At this point, I'm desperately trying to find an appropriate pi joke :
> -)
> 
> Andrew
> 
> 


[jira] [Created] (PROTON-1009) message.h does not have a set method for annotations

2015-09-28 Thread michael goulish (JIRA)
michael goulish created PROTON-1009:
---

 Summary: message.h does not have a set method for annotations
 Key: PROTON-1009
 URL: https://issues.apache.org/jira/browse/PROTON-1009
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


Comments above the method pn_message_annotations() indicate that it can both 
set and get annotations -- but in fact it has no way to set.

And it looks like there is no other way in the C API, either.






[jira] [Resolved] (PROTON-1009) message.h does not have a set method for annotations

2015-09-28 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved PROTON-1009.
-
Resolution: Not A Problem

Oops.
I didn't realize that the function is returning a pointer that can be used to 
change the annotations.  *That's* how you set them.  Sorry for the noise.
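The get-a-pointer-and-mutate pattern being described can be sketched like this (toy types; in the real API pn_message_annotations() returns a mutable pn_data_t* that you write into):

```c
#include <assert.h>

/* Toy sketch of the accessor pattern: the "getter" returns a pointer
 * into the message, and writing through that pointer *is* the setter.
 * Names here are illustrative only -- the real call is
 * pn_message_annotations(). */
typedef struct { int delivery_count; } toy_annotations;
typedef struct { toy_annotations ann; } toy_message;

static toy_annotations *toy_message_annotations(toy_message *m) {
    return &m->ann;          /* interior pointer: caller may mutate */
}
```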

> message.h does not have a set method for annotations
> 
>
> Key: PROTON-1009
> URL: https://issues.apache.org/jira/browse/PROTON-1009
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>
> Comments above the method pn_message_annotations() indicate that it can both 
> set and get annotations -- but in fact it has no way to set.
> And it looks like there is no other way in the C API, either.





[jira] [Closed] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2015-09-22 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-992.
--
Resolution: Duplicate

this is a duplicate of PROTON-862

> Proton's use of Cyrus SASL is not thread-safe.
> --
>
> Key: PROTON-992
> URL: https://issues.apache.org/jira/browse/PROTON-992
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Affects Versions: 0.10
>    Reporter: michael goulish
>Assignee: michael goulish
>Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
> * you supply mutex functions (see sasl_set_mutex())
> * you make no libsasl calls until sasl_client/server_init() completes
> * no libsasl calls are made after sasl_done() is begun
> * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that sasl_set* calls are not thread safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in :
>   pni_init_client()
>   pni_init_server(), and
>   pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).





[jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2015-09-22 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902731#comment-14902731
 ] 

michael goulish commented on PROTON-992:


oops.  this is a duplicate of PROTON-862

> Proton's use of Cyrus SASL is not thread-safe.
> --
>
> Key: PROTON-992
> URL: https://issues.apache.org/jira/browse/PROTON-992
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Affects Versions: 0.10
>    Reporter: michael goulish
>Assignee: michael goulish
>Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
> * you supply mutex functions (see sasl_set_mutex())
> * you make no libsasl calls until sasl_client/server_init() completes
> * no libsasl calls are made after sasl_done() is begun
> * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that sasl_set* calls are not thread safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in :
>   pni_init_client()
>   pni_init_server(), and
>   pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).





Re: [jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2015-09-17 Thread Michael Goulish
Thanks!

I wondered about that (briefly) but thought there was nothing to be done.
If you have a sketch, I would be happy to see it!




- Original Message -

[ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802937#comment-14802937
 ] 

Andrew Stitcher commented on PROTON-992:


Upon a few days reflection I've realised that you cannot fix this problem with 
a global init for proton:

This is because there are a couple of parameters of the Cyrus library that 
*must* be set before calling either sasl_server_init() or sasl_client_init(). 
These are the configuration file directory and the server name. Currently if 
you want to customise these you must set them before the first usage of SASL 
and this works as SASL is initialised lazily. However in the proposed API there 
is literally no place to set them:

Since the documented usage pattern requires initialising the library before 
using it, the Cyrus SASL library will already be initialised before you are 
allowed to use the APIs that set the path or name.

So I think the only workable solution is to keep an atomic count of uses of the 
library: initialise on going from 0->1, then finalise on going from 1->0, so 
that we finalise in the correct place. An important point here is to make the 
count atomic so that we can be sure to avoid any re-entrance into the 
initialisation or finalisation code (this is doable using gcc/clang builtins - 
we don't really support Cyrus on Win32 so Visual Studio isn't too important, 
but it has atomic primitives too)

I would note that really we should also be using atomic counts like this in the 
OpenSSL code too.
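A minimal sketch of the proposed scheme, using the gcc/clang atomic builtins mentioned above (names are illustrative; the counters stand in for sasl_client_init()/sasl_server_init() and sasl_done()):

```c
#include <assert.h>

/* Sketch of the proposed use-count scheme (assumed names, not Proton's
 * API): initialise the library on the 0->1 transition of an atomic use
 * count, finalise on 1->0.  Because fetch_add/fetch_sub return the
 * previous value atomically, exactly one caller observes each
 * transition, so init/fini code cannot be re-entered. */
static int use_count = 0;
static int inits = 0, finis = 0;   /* stand-ins for sasl init/done */

static void lib_acquire(void) {
    if (__atomic_fetch_add(&use_count, 1, __ATOMIC_SEQ_CST) == 0)
        inits++;                   /* exactly one caller sees 0->1 */
}

static void lib_release(void) {
    if (__atomic_fetch_sub(&use_count, 1, __ATOMIC_SEQ_CST) == 1)
        finis++;                   /* exactly one caller sees 1->0 */
}
```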

> Proton's use of Cyrus SASL is not thread-safe.
> --
>
> Key: PROTON-992
> URL: https://issues.apache.org/jira/browse/PROTON-992
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Affects Versions: 0.10
>    Reporter: michael goulish
>Assignee: michael goulish
>Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
> * you supply mutex functions (see sasl_set_mutex())
> * you make no libsasl calls until sasl_client/server_init() completes
> * no libsasl calls are made after sasl_done() is begun
> * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that sasl_set* calls are not thread safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in :
>   pni_init_client()
>   pni_init_server(), and
>   pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).





[jira] [Created] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2015-09-10 Thread michael goulish (JIRA)
michael goulish created PROTON-992:
--

 Summary: Proton's use of Cyrus SASL is not thread-safe.
 Key: PROTON-992
 URL: https://issues.apache.org/jira/browse/PROTON-992
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.10
Reporter: michael goulish
Priority: Critical


Documentation for the Cyrus SASL library says that the library is believed to 
be thread-safe only if the code that uses it meets several requirements.

The requirements are:
* you supply mutex functions (see sasl_set_mutex())
* you make no libsasl calls until sasl_client/server_init() completes
* no libsasl calls are made after sasl_done() is begun
* when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.

It says explicitly that sasl_set* calls are not thread safe, since they 
set global state.

The proton library makes calls to sasl_set* functions in :
  pni_init_client()
  pni_init_server(), and
  pni_process_init()

Since those are internal functions, there is no way for code that uses Proton 
to lock around those calls.

I think proton needs a new API call to let applications call sasl_set_mutex().  
Or something.

We probably also need other protections to meet the other requirements 
specified in the Cyrus documentation (and quoted above).
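For illustration, a hedged sketch of the mutex callbacks an application would hand to sasl_set_mutex(), implemented here with pthreads. The sasl_set_mutex() call itself is left as a comment so the sketch compiles without <sasl/sasl.h>; the callback shapes follow the Cyrus typedefs (alloc returns void*, lock/unlock take void* and return int, free takes void*).

```c
#include <pthread.h>
#include <stdlib.h>

/* Pthread-backed mutex callbacks matching the Cyrus sasl_set_mutex()
 * callback shapes.  This is a sketch of what an application would
 * register, not Proton code. */
static void *mutex_alloc(void) {
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m) pthread_mutex_init(m, NULL);
    return m;
}
static int mutex_lock(void *m)   { return pthread_mutex_lock((pthread_mutex_t *)m); }
static int mutex_unlock(void *m) { return pthread_mutex_unlock((pthread_mutex_t *)m); }
static void mutex_free(void *m) {
    if (m) { pthread_mutex_destroy((pthread_mutex_t *)m); free(m); }
}

/* In an application linking libsasl2, registration would look like:
 *   sasl_set_mutex(mutex_alloc, mutex_lock, mutex_unlock, mutex_free);
 * and per the Cyrus docs it must run before sasl_client_init() /
 * sasl_server_init() -- which is exactly what Proton's internal
 * pni_init_* calls currently make impossible. */
```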








[jira] [Closed] (PROTON-919) make C impl behave like java wrt channel_max error

2015-07-17 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-919.
--
   Resolution: Fixed
Fix Version/s: 0.10

commit 4ee726002804d7286a8c76b42e0a0717e0798822

please NOTE that this change also adds  #define PN_OK (0)  to the list of 
errors in error.h

 make C impl behave like java wrt channel_max error
 --

 Key: PROTON-919
 URL: https://issues.apache.org/jira/browse/PROTON-919
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c, python-binding
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor
 Fix For: 0.10


 In the Java impl, I made TransportImpl throw an exception if the application 
 tries to change the local channel_max setting after we have already sent the 
 OPEN frame to the remote peer.  ( Because at that point we communicate our 
 channel_max limit to the peer -- no fair changing it afterwards.)
 One reviewer suggested that it would be nice if the C impl worked the same 
 way.  That would mean that pn_set_channel_max() would have to return a result 
 code, which the Python binding would detect -- Python binding throws 
 exception, python tests detect it -- so it would work same way as Java.
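The behaviour being asked for can be sketched as follows (toy names, not the actual pn_set_channel_max() signature): the setter succeeds until the OPEN frame has been sent, then returns an error code the binding can turn into an exception.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only -- names and codes are assumptions, though
 * PN_OK (0) is the constant this change added to error.h. */
#define TOY_OK        (0)
#define TOY_STATE_ERR (-1)

typedef struct {
    uint16_t channel_max;
    int open_sent;           /* set once OPEN goes to the peer */
} toy_transport;

static int toy_set_channel_max(toy_transport *t, uint16_t max) {
    if (t->open_sent)
        return TOY_STATE_ERR;   /* too late: limit already advertised */
    t->channel_max = max;
    return TOY_OK;
}
```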





[jira] [Closed] (PROTON-864) don't crash when channel number goes high

2015-07-17 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-864.
--
   Resolution: Fixed
Fix Version/s: 0.10

This is a duplicate of PROTON-842 

 don't crash when channel number goes high
 -

 Key: PROTON-864
 URL: https://issues.apache.org/jira/browse/PROTON-864
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish
 Fix For: 0.10


 Code in transport.c, and a little in engine.c, looks at the topmost bit in 
 channel numbers to decide if the channels are in use.
 This causes crashes when the number of channels in a single connection goes 
 beyond 32767.





[jira] [Created] (PROTON-949) proton doesn't build with ccache swig

2015-07-14 Thread michael goulish (JIRA)
michael goulish created PROTON-949:
--

 Summary: proton doesn't build with ccache swig
 Key: PROTON-949
 URL: https://issues.apache.org/jira/browse/PROTON-949
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


Thanks to aconway for finding this and saving me a day of madness and horror.

On freshly-downloaded proton tree, if I use this swig:

   /usr/lib64/ccache/swig

the build fails this way:
  qpid-proton/build/proton-c/bindings/python/cprotonPYTHON_wrap.c:4993:25: 
error: 'PN_HANDLE' undeclared (first use in this function)
PNI_PYTRACER = *((PN_HANDLE *)(argp));

--

but if I delete that swig executable, and use the one in  /bin/swig ,
then everything works.

yikes.

aconway believes the bug is in ccache-swig, not in proton, but I want to put 
this here in case this bites someone else in Proton Land.








[jira] [Updated] (PROTON-946) remove generated data structure definitions from protocol.h

2015-07-13 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-946:
---
Description: 
Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of 
its output into protocol.h -- even the data structure definitions.

Those definitions are currently protected by  #ifdef DEFINE_FIELDS , which is 
defined only in codec.c -- so the definitions only show up in that file, while 
other .c files only see the declarations.

If DEFINE_FIELDS is #defined in any other file, compilation will fail with 
multiple definition errors.

The structure declarations should remain in the .h file , but the actual 
definitions should be moved into a generated .c file.


  was:
Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of 
its output into protocol.h -- evel the data structure definitions.

Those definitions are currently protected by  #ifdef DEFINE_FIELDS , which is 
defined only in codec.c -- so the definitions only show up in that file, while 
other .c files only see the declarations.

If DEFINE_FIELDS is #defined in any other file, compilation will fail with 
multiple definition errors.

The structure declarations should remain in the .h file , but the actual 
definitions should be moved into a generated .c file.


Summary: remove generated data structure definitions from protocol.h  
(was: remove generated data structure definitions from .protocol.h)

 remove generated data structure definitions from protocol.h
 ---

 Key: PROTON-946
 URL: https://issues.apache.org/jira/browse/PROTON-946
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Affects Versions: 0.10
Reporter: michael goulish
Assignee: michael goulish

 Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all 
 of its output into protocol.h -- even the data structure definitions.
 Those definitions are currently protected by  #ifdef DEFINE_FIELDS , which is 
 defined only in codec.c -- so the definitions only show up in that file, 
 while other .c files only see the declarations.
 If DEFINE_FIELDS is #defined in any other file, compilation will fail with 
 multiple definition errors.
 The structure declarations should remain in the .h file , but the actual 
 definitions should be moved into a generated .c file.
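The declaration/definition split being proposed looks roughly like this (toy field table; the real generated names differ):

```c
#include <assert.h>
#include <string.h>

/* The header-only pattern being replaced looks like:
 *
 *   // protocol.h (generated)
 *   #ifdef DEFINE_FIELDS
 *   const field_info FIELDS[] = { ... };   // definition: one TU only
 *   #else
 *   extern const field_info FIELDS[];      // declaration: everyone else
 *   #endif
 *
 * Defining DEFINE_FIELDS in a second .c file duplicates the definition
 * and the link fails.  The proposed fix keeps only the extern
 * declaration in the header and emits the definition into a generated
 * .c file, as sketched below with a toy table. */
typedef struct { const char *name; int code; } field_info;

/* declaration, as it would appear in the generated header */
extern const field_info FIELDS[];
extern const int FIELDS_COUNT;

/* definition, as it would appear in the generated .c file */
const field_info FIELDS[] = { { "open", 0x10 }, { "begin", 0x11 } };
const int FIELDS_COUNT = 2;
```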





[jira] [Created] (PROTON-946) remove generated data structure definitions from .protocol.h

2015-07-13 Thread michael goulish (JIRA)
michael goulish created PROTON-946:
--

 Summary: remove generated data structure definitions from 
.protocol.h
 Key: PROTON-946
 URL: https://issues.apache.org/jira/browse/PROTON-946
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Affects Versions: 0.10
Reporter: michael goulish
Assignee: michael goulish


Currently protocol.h.py reads the AMQP 1.0 spec xml files and generates all of 
its output into protocol.h -- evel the data structure definitions.

Those definitions are currently protected by  #ifdef DEFINE_FIELDS , which is 
defined only in codec.c -- so the definitions only show up in that file, while 
other .c files only see the declarations.

If DEFINE_FIELDS is #defined in any other file, compilation will fail with 
multiple definition errors.

The structure declarations should remain in the .h file , but the actual 
definitions should be moved into a generated .c file.






[jira] [Resolved] (PROTON-826) recent checkin causes frequent double-free or corruption crash

2015-07-08 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved PROTON-826.

Resolution: Fixed

I recreated my test from February, and cannot reproduce the bug using latest 
dispatch + proton code.



 recent checkin causes frequent double-free or corruption crash
 --

 Key: PROTON-826
 URL: https://issues.apache.org/jira/browse/PROTON-826
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish
Priority: Blocker

 In my dispatch testing I am seeing frequent crashes in the proton library that 
 began with proton checkin 01cb00c on 2015-02-15, "report read and write 
 errors through the transport".
 The output at crash-time says this:
 ---
 *** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or 
 corruption (fasttop): 0x020ee880 ***
 === Backtrace: =
 /lib64/libc.so.6[0x3e3d875a4f]
 /lib64/libc.so.6[0x3e3d87cd78]
 /lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
 /lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
 /lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
 /lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
 /lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
 /lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
 /home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]
 The backtrace from the core file looks like this:
 
 #0  0x003e3d835877 in raise () from /lib64/libc.so.6
 #1  0x003e3d836f68 in abort () from /lib64/libc.so.6
 #2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
 #3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
 #4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140)
     at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
 #5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, 
     code=code@entry=-2,
     text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable")
     at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
 #6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, 
     fmt=<optimized out>,
     ap=ap@entry=0x7fbf801a6de8) at 
     /home/mick/rh-qpid-proton/proton-c/src/error.c:81
 #7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, 
     code=<optimized out>,
     fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at 
     /home/mick/rh-qpid-proton/proton-c/src/error.c:89
 #8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140,
     msg=msg@entry=0x7fbf8a5bbe1a "recv")
     at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
 #9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, 
     buf=<optimized out>,
     size=<optimized out>) at 
     /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
 #10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)
 -
 And I can prevent the crash from happening, apparently forever, by commenting 
 out this line:
   free(error->text);
 in the function pn_error_clear
 in the file proton-c/src/error.c
 The error text that is being freed, which causes the crash, looks like this:
   $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root 
 = 0x0, code = -2}
 My dispatch test creates a router network and then repeatedly kills and 
 restarts a randomly-selected router.  After this proton checkin it almost 
 never gets through 5 iterations without this crash.  After I commented out 
 that line, it got through more than 500 iterations before I stopped it.
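A hedged sketch of the failure mode and the usual fix (simplified struct, not Proton's actual pn_error_t): freeing text without nulling the pointer lets a second clear free it again, whereas nulling after free makes clear idempotent. Commenting out the free, as in the workaround above, merely trades the crash for a leak.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Toy error struct standing in for the real one. */
typedef struct {
    char *text;
    int code;
} toy_error;

static char *dup_str(const char *s) {       /* portable strdup */
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

static void toy_error_clear(toy_error *e) {
    free(e->text);
    e->text = NULL;    /* the missing step: prevents the double free */
    e->code = 0;
}

static void toy_error_set(toy_error *e, int code, const char *text) {
    toy_error_clear(e);          /* safe now even if already cleared */
    e->text = dup_str(text);
    e->code = code;
}
```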





Re: 0.10 alpha1

2015-07-07 Thread Michael Goulish
I just took 826 -- I think I can still re-create the test that found it.
Let me see if I can repro...





- Original Message -
Yay Rafi!  Thanks!


A simple query of currently outstanding blocker JIRAs affecting 0.9+ shows only 
three:

https://issues.apache.org/jira/browse/PROTON-826  (unassigned)
https://issues.apache.org/jira/browse/PROTON-923  (asticher)
https://issues.apache.org/jira/browse/PROTON-934  (rschloming)


The remaining open bugs affecting 0.9+ are:

https://issues.apache.org/jira/browse/PROTON-826?jql=project%20%3D%20PROTON%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20affectedVersion%20in%20%280.9%2C%200.9.1%2C%200.10%29%20ORDER%20BY%20priority%20DESC




- Original Message -
 From: Rafael Schloming r...@alum.mit.edu
 To: proton@qpid.apache.org
 Sent: Tuesday, July 7, 2015 1:28:17 AM
 Subject: 0.10 alpha1
 
 As promised, here is the first alpha for 0.10. It's posted in the usual
 places:
 
 Source code is here:
 
 http://people.apache.org/~rhs/qpid-proton-0.10-alpha1/
 
 Java binaries are here:
 
 https://repository.apache.org/content/repositories/orgapacheqpid-1036
 
 Please check it out and follow up with any issues.
 
 --Rafael
 

-- 
-K

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (PROTON-826) recent checkin causes frequent double-free or corruption crash

2015-07-07 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned PROTON-826:
--

Assignee: michael goulish

 recent checkin causes frequent double-free or corruption crash
 --

 Key: PROTON-826
 URL: https://issues.apache.org/jira/browse/PROTON-826
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish
Priority: Blocker

 In my dispatch testing I am seeing frequent crashes in the proton library that 
 began with proton checkin 01cb00c on 2015-02-15, "report read and write 
 errors through the transport".
 The output at crash-time says this:
 ---
 *** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or 
 corruption (fasttop): 0x020ee880 ***
 === Backtrace: =
 /lib64/libc.so.6[0x3e3d875a4f]
 /lib64/libc.so.6[0x3e3d87cd78]
 /lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
 /lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
 /lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
 /lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
 /lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
 /lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
 /home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]
 The backtrace from the core file looks like this:
 
 #0  0x003e3d835877 in raise () from /lib64/libc.so.6
 #1  0x003e3d836f68 in abort () from /lib64/libc.so.6
 #2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
 #3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
 #4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140)
     at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
 #5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, 
     code=code@entry=-2,
     text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable")
     at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
 #6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, 
     fmt=<optimized out>,
     ap=ap@entry=0x7fbf801a6de8) at 
     /home/mick/rh-qpid-proton/proton-c/src/error.c:81
 #7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, 
     code=<optimized out>,
     fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at 
     /home/mick/rh-qpid-proton/proton-c/src/error.c:89
 #8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140,
     msg=msg@entry=0x7fbf8a5bbe1a "recv")
     at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
 #9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, 
     buf=<optimized out>,
     size=<optimized out>) at 
     /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
 #10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)
 -
 And I can prevent the crash from happening, apparently forever, by commenting 
 out this line:
   free(error->text);
 in the function pn_error_clear
 in the file proton-c/src/error.c
 The error text that is being freed, which causes the crash, looks like this:
   $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root 
 = 0x0, code = -2}
 My dispatch test creates a router network and then repeatedly kills and 
 restarts a randomly-selected router.  After this proton checkin it almost 
 never gets through 5 iterations without this crash.  After I commented out 
 that line, it got through more than 500 iterations before I stopped it.





[jira] [Commented] (PROTON-826) recent checkin causes frequent double-free or corruption crash

2015-07-07 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616749#comment-14616749
 ] 

michael goulish commented on PROTON-826:


I see why I didn't follow this up earlier.
Current dispatch will not compile against latest proton because of some SASL 
issues.
But I need to test against latest proton.
So ... now attempting to hack up dispatch so that it doesn't have SASL but will 
still build and run against latest proton.

 recent checkin causes frequent double-free or corruption crash
 --

 Key: PROTON-826
 URL: https://issues.apache.org/jira/browse/PROTON-826
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish
Priority: Blocker

 In my dispatch testing I am seeing frequent crashes in proton library that 
 began with proton checkin   01cb00c  on 2015-02-15   report read and write 
 errors through the transport
 The output at crash-time says this:
 ---
 *** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or 
 corruption (fasttop): 0x020ee880 ***
 === Backtrace: =
 /lib64/libc.so.6[0x3e3d875a4f]
 /lib64/libc.so.6[0x3e3d87cd78]
 /lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
 /lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
 /lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
 /lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
 /lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
 /lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
 /home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]
 The backtrace from the core file looks like this:
 
 #0  0x003e3d835877 in raise () from /lib64/libc.so.6
 #1  0x003e3d836f68 in abort () from /lib64/libc.so.6
 #2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
 #3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
 #4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140)
 at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
 #5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, 
 code=code@entry=-2,
 text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable")
 at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
 #6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, 
 fmt=<optimized out>,
 ap=ap@entry=0x7fbf801a6de8) at 
 /home/mick/rh-qpid-proton/proton-c/src/error.c:81
 #7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, 
 code=<optimized out>,
 fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at 
 /home/mick/rh-qpid-proton/proton-c/src/error.c:89
 #8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140,
 msg=msg@entry=0x7fbf8a5bbe1a "recv")
 at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
 #9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, 
 buf=<optimized out>,
 size=<optimized out>) at 
 /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
 #10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)
 -
 And I can prevent the crash from happening, apparently forever, by commenting 
 out this line:
   free(error->text);
 in the function  pn_error_clear
 in the file proton-c/src/error.c
 The error text that is being freed which causes the crash looks like this:
   $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root 
 = 0x0, code = -2}
 My dispatch test creates a router network and then repeatedly kills and 
 restarts a randomly-selected router.  After this proton checkin it almost 
 never gets through 5 iterations without this crash.  After I commented out 
 that line, it got through more than 500 iterations before I stopped it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PROTON-930) add explicit AMQP 1.0 constants

2015-07-02 Thread michael goulish (JIRA)
michael goulish created PROTON-930:
--

 Summary: add explicit AMQP 1.0 constants
 Key: PROTON-930
 URL: https://issues.apache.org/jira/browse/PROTON-930
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor
 Fix For: 0.10


Add an include file that has explicit defined constants for every numeric 
default value that is mandated by the AMQP 1.0 spec.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PROTON-925) proton-c seems to treat unspecified channel-max as implying 0

2015-07-02 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved PROTON-925.

Resolution: Fixed

commit fc38e86a6f5a1b265552708e674d3c8040c1985b

 proton-c seems to treat unspecified channel-max as implying 0
 -

 Key: PROTON-925
 URL: https://issues.apache.org/jira/browse/PROTON-925
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.10
Reporter: Gordon Sim
Assignee: michael goulish
Priority: Blocker
 Fix For: 0.10


 If max-channels is not specified in the open, it appears the latest proton-c 
 treats that as implying the maximum is 0 though the spec states the default 
 is 65535.
 This breaks compatibility with previous proton releases. E.g. the following 
 is the interaction between a sender using the latest 0.10 and a receiver 
 using proton 0.9.
 {noformat}
 [0x151c710]:  - AMQP
 [0x151c710]:0 - @open(16) 
 [container-id=65A6602D-5D24-4D39-9C6F-7403D98F5E15, hostname=localhost, 
 channel-max=32767]
 [0x151c710]:0 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, 
 outgoing-window=1]
 [0x151c710]:1 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, 
 outgoing-window=1]
 [0x151c710]:2 - @begin(17) [next-outgoing-id=0, incoming-window=2147483647, 
 outgoing-window=1]
 [0x151c710]:0 - @attach(18) [name=sender-xxx, handle=0, role=false, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_a, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_a, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:1 - @attach(18) [name=sender-xxx, handle=0, role=false, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_b, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_b, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:2 - @attach(18) [name=sender-xxx, handle=0, role=false, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_c, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_c, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:  - AMQP
 [0x151c710]:0 - @open(16) 
 [container-id=abab56b0-c25e-427b-9f4f-d63da48d1973]
 [0x151c710]:0 - @begin(17) [remote-channel=0, next-outgoing-id=0, 
 incoming-window=2147483647, outgoing-window=0]
 [0x151c710]:1 - @begin(17) [remote-channel=1, next-outgoing-id=0, 
 incoming-window=2147483647, outgoing-window=0]
 [0x151c710]:2 - @begin(17) [remote-channel=2, next-outgoing-id=0, 
 incoming-window=2147483647, outgoing-window=0]
 [0x151c710]:0 - @attach(18) [name=sender-xxx, handle=0, role=true, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_a, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_a, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:1 - @attach(18) [name=sender-xxx, handle=0, role=true, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_b, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_b, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:2 - @attach(18) [name=sender-xxx, handle=0, role=true, 
 snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [address=queue_c, 
 durable=0, timeout=0, dynamic=false], target=@target(41) [address=queue_c, 
 durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
 [0x151c710]:0 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, 
 next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, 
 link-credit=341, drain=false]
 [0x151c710]:1 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, 
 next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, 
 link-credit=341, drain=false]
 [0x151c710]:2 - @flow(19) [next-incoming-id=0, incoming-window=2147483647, 
 next-outgoing-id=0, outgoing-window=0, handle=0, delivery-count=0, 
 link-credit=341, drain=false]
 [0x151c710]:0 - @close(24) [error=@error(29) 
 [condition=:amqp:connection:framing-error, description="remote channel 1 is 
 above negotiated channel_max 0."]]
 [0x151c710]:  - EOS
 [0x151c710]:0 - @close(24) []
 [0x151c710]:  - EOS
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PROTON-842) proton-c should honor channel_max

2015-06-30 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved PROTON-842.

Resolution: Fixed

Last checkin fixed java tests.

 proton-c should honor channel_max
 -

 Key: PROTON-842
 URL: https://issues.apache.org/jira/browse/PROTON-842
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-j
Affects Versions: 0.9, 0.10
Reporter: michael goulish
Assignee: michael goulish

 proton-c code should use  transport->channel_max and 
 transport->remote_channel_max  to enforce a limit on the
 maximum number of simultaneously active sessions on a 
 connection.   
 I guess the limit should be the minimum of those
 two numbers, or, if neither side sets a limit, then 2^16.
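
 The rule described above can be sketched as follows. This is an
 illustrative sketch, not proton source; the function and macro names are
 hypothetical. (Per the AMQP 1.0 spec, an unspecified channel-max
 defaults to 65535, the full 16-bit range.)

 ```c
 #include <assert.h>
 #include <stdint.h>

 /* AMQP 1.0 default when a peer does not advertise a channel-max. */
 #define AMQP_DEFAULT_CHANNEL_MAX 65535

 /* The effective session limit is the smaller of the local and remote
  * channel_max values. */
 static uint16_t effective_channel_max(uint16_t local_max, uint16_t remote_max)
 {
     return local_max < remote_max ? local_max : remote_max;
 }
 ```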



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PROTON-919) make C impl behave like java wrt channel_max error

2015-06-23 Thread michael goulish (JIRA)
michael goulish created PROTON-919:
--

 Summary: make C impl behave like java wrt channel_max error
 Key: PROTON-919
 URL: https://issues.apache.org/jira/browse/PROTON-919
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c, python-binding
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor


In the Java impl, I made TransportImpl throw an exception if the application 
tries to change the local channel_max setting after we have already sent the 
OPEN frame to the remote peer.  ( Because at that point we communicate our 
channel_max limit to the peer -- no fair changing it afterwards.)

One reviewer suggested that it would be nice if the C impl worked the same way. 
 That would mean that pn_set_channel_max() would have to return a result code, 
which the Python binding would detect -- Python binding throws exception, 
python tests detect it -- so it would work same way as Java.
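
A minimal sketch of the proposed C behavior, with hypothetical names (the
struct, function, and error code below are illustrative, not the proton
API): once the OPEN frame carrying channel-max has been sent, changing
the limit returns an error instead of silently succeeding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATE_ERR (-2)  /* illustrative error code */

typedef struct {
    bool open_sent;        /* has the OPEN frame gone out yet? */
    uint16_t channel_max;
} tport_t;

static int tport_set_channel_max(tport_t *t, uint16_t max)
{
    if (t->open_sent)
        return SKETCH_STATE_ERR;  /* too late: the peer already saw our limit */
    t->channel_max = max;
    return 0;
}
```

A Python binding would then check the int result and raise an exception,
matching the Java behavior.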



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PROTON-919) make C impl behave like java wrt channel_max error

2015-06-23 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598095#comment-14598095
 ] 

michael goulish commented on PROTON-919:


~~~ NOTE ~~~

The proposed change alters the public API in that it changes 
pn_transport_set_channel_max() to return an int, rather than void.




 make C impl behave like java wrt channel_max error
 --

 Key: PROTON-919
 URL: https://issues.apache.org/jira/browse/PROTON-919
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c, python-binding
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor

 In the Java impl, I made TransportImpl throw an exception if the application 
 tries to change the local channel_max setting after we have already sent the 
 OPEN frame to the remote peer.  ( Because at that point we communicate our 
 channel_max limit to the peer -- no fair changing it afterwards.)
 One reviewer suggested that it would be nice if the C impl worked the same 
 way.  That would mean that pn_set_channel_max() would have to return a result 
 code, which the Python binding would detect -- Python binding throws 
 exception, python tests detect it -- so it would work same way as Java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PROTON-842) proton-c should honor channel_max

2015-06-18 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved PROTON-842.

Resolution: Fixed

commit e38957ae5115ec023993672ca5b7d5e3df414f7e

 proton-c should honor channel_max
 -

 Key: PROTON-842
 URL: https://issues.apache.org/jira/browse/PROTON-842
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish

 proton-c code should use  transport->channel_max and 
 transport->remote_channel_max  to enforce a limit on the
 maximum number of simultaneously active sessions on a 
 connection.   
 I guess the limit should be the minimum of those
 two numbers, or, if neither side sets a limit, then 2^16.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PROTON-842) proton-c should honor channel_max

2015-06-18 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591877#comment-14591877
 ] 

michael goulish commented on PROTON-842:


-- please note -- 

This fix changes API behavior in one way:   pn_session can now return NULL if 
an attempt is made to create more sessions than are allowed by the value of 
channel_max.

Previously, the limitation on the number of sessions was enforced by SEGV.



 proton-c should honor channel_max
 -

 Key: PROTON-842
 URL: https://issues.apache.org/jira/browse/PROTON-842
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish

 proton-c code should use  transport->channel_max and 
 transport->remote_channel_max  to enforce a limit on the
 maximum number of simultaneously active sessions on a 
 connection.   
 I guess the limit should be the minimum of those
 two numbers, or, if neither side sets a limit, then 2^16.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (PROTON-842) proton-c should honor channel_max

2015-06-18 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reopened PROTON-842:


My fix for proton-c is making trouble for proton-j

 proton-c should honor channel_max
 -

 Key: PROTON-842
 URL: https://issues.apache.org/jira/browse/PROTON-842
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish

 proton-c code should use  transport->channel_max and 
 transport->remote_channel_max  to enforce a limit on the
 maximum number of simultaneously active sessions on a 
 connection.   
 I guess the limit should be the minimum of those
 two numbers, or, if neither side sets a limit, then 2^16.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PROTON-896) change all static function names to begin with pni_

2015-06-08 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned PROTON-896:
--

Assignee: michael goulish

 change all static function names to begin with pni_
 ---

 Key: PROTON-896
 URL: https://issues.apache.org/jira/browse/PROTON-896
 Project: Qpid Proton
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor

 Change all the static function names to start with pni_ ,
 and declare all functions as static that ought to be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: question about proton error philosophy

2015-06-03 Thread Michael Goulish


 It is philosophically questionable but C is not a very philosophical
 language.


I beg to differ.

Here are some famous thoughts concerning the C language by one
early practitioner, that greatest of all Roman philosophers, 
the incomparable Ibid.


  Non teneas aurum totum quod splendet ut aurum, et scribe in C.
  ("Do not take all that glitters to be gold, and write in C.")

  Minima maxima sunt, sic haec scriberem C.
  ("The smallest things are the greatest; thus I would write this in C.")

  Mutantur omnia nos et mutamur in illis, nisi lingua C.
  ("All things change, and we change with them, except the C language.")




[jira] [Created] (PROTON-896) change all statis function names to begin with pni_

2015-05-29 Thread michael goulish (JIRA)
michael goulish created PROTON-896:
--

 Summary: change all statis function names to begin with pni_
 Key: PROTON-896
 URL: https://issues.apache.org/jira/browse/PROTON-896
 Project: Qpid Proton
  Issue Type: Improvement
Reporter: michael goulish
Priority: Minor


Change all the static function names to start with pni_ ,
and declare all functions as static that ought to be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PROTON-896) change all static function names to begin with pni_

2015-05-29 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-896:
---
Summary: change all static function names to begin with pni_  (was: change 
all statis function names to begin with pni_)

 change all static function names to begin with pni_
 ---

 Key: PROTON-896
 URL: https://issues.apache.org/jira/browse/PROTON-896
 Project: Qpid Proton
  Issue Type: Improvement
Reporter: michael goulish
Priority: Minor

 Change all the static function names to start with pni_ ,
 and declare all functions as static that ought to be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PROTON-864) don't crash when channel number goes high

2015-05-19 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-864:
---
Summary: don't crash when channel number goes high  (was: avoid crashes 
when channel number goes high.)

 don't crash when channel number goes high
 -

 Key: PROTON-864
 URL: https://issues.apache.org/jira/browse/PROTON-864
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish

 Code in transport.c, and a little in engine.c, looks at the topmost bit in 
 channel numbers to decide if the channels are in use.
 This causes crashes when the number of channels in a single connection goes 
 beyond 32767.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PROTON-864) avoid crashes when channel number goes high.

2015-05-19 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-864:
---
Summary: avoid crashes when channel number goes high.  (was: don't overload 
top bit of channel numbers )

 avoid crashes when channel number goes high.
 

 Key: PROTON-864
 URL: https://issues.apache.org/jira/browse/PROTON-864
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish

 Code in transport.c, and a little in engine.c, looks at the topmost bit in 
 channel numbers to decide if the channels are in use.
 This causes crashes when the number of channels in a single connection goes 
 beyond 32767.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PROTON-888) allocate_alias linear search becomes slow at scale

2015-05-18 Thread michael goulish (JIRA)
michael goulish created PROTON-888:
--

 Summary: allocate_alias linear search becomes slow at scale
 Key: PROTON-888
 URL: https://issues.apache.org/jira/browse/PROTON-888
 Project: Qpid Proton
  Issue Type: Improvement
Reporter: michael goulish


Testing that I have done recently goes to large scale on number of sessions per 
connection.  I noticed that the test was slowing down rapidly over time, in 
terms of how many sessions were being established per unit time.

The function allocate_alias in file transport.c uses a linear search through an 
array to find the next available channel number for a session  (or the next 
available handle number for a link).  In a usage scenario like mine in which 
many sessions will be established, this becomes very slow as the array fills up.

At the beginning of my test, this function is too fast to measure.  By the end, 
it is using more than 82 milliseconds per call.  Overall, this function alone 
is contributing more than 20 seconds to my 3-minute test.

This is not an unrealistic scenario -- we already have one potential customer 
who is interested in going to this kind of scale.  (Which is why I was doing 
this test.)

Maybe we can find an implementation that does not slow down the common case, 
and yet behaves better at the high end.
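
One possible direction, as a hedged sketch (simplified and hypothetical,
not the transport.c implementation): keep a rolling "next candidate"
hint so that allocation resumes where the last search left off, instead
of rescanning from index 0 past all the aliases that are already in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The problem pattern: scanning from 0 every time is O(n) once the
 * low-numbered aliases stay occupied. */
static int allocate_alias_linear(bool in_use[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!in_use[i]) { in_use[i] = true; return (int)i; }
    return -1;  /* exhausted */
}

/* Improvement: remember where the last search ended and resume from
 * there, wrapping around; amortized O(1) for churn-heavy workloads. */
static int allocate_alias_hint(bool in_use[], size_t n, size_t *hint)
{
    for (size_t step = 0; step < n; step++) {
        size_t i = (*hint + step) % n;
        if (!in_use[i]) {
            in_use[i] = true;
            *hint = (i + 1) % n;
            return (int)i;
        }
    }
    return -1;  /* exhausted */
}
```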



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PROTON-886) make proton enforce handle-max

2015-05-14 Thread michael goulish (JIRA)
michael goulish created PROTON-886:
--

 Summary: make proton enforce handle-max 
 Key: PROTON-886
 URL: https://issues.apache.org/jira/browse/PROTON-886
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish


Make the code enforce limits on handles (and links) from section 2.7.2 of the 
AMQP 1.0 spec.

"The handle-max value is the highest handle value that can be used on the 
session. A peer MUST NOT attempt to attach a link using a handle value outside 
the range that its partner can handle.  A peer that receives a handle outside 
the supported range MUST close the connection with the framing-error error-code."
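
The mandated check is simple; here is an illustrative sketch (hypothetical
function name, not proton source): an incoming handle above our advertised
handle-max must fail with the amqp:connection:framing-error condition.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Returns the AMQP error condition to close the connection with, or
 * NULL if the handle is within the range we advertised. */
static const char *check_incoming_handle(uint32_t handle, uint32_t handle_max)
{
    if (handle > handle_max)
        return "amqp:connection:framing-error";
    return NULL;
}
```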



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE]: Release Proton 0.9.1-rc1 as 0.9.1

2015-05-01 Thread Michael Goulish


 [ X ]: Yes, release Proton 0.9.1-rc1 as 0.9.1
 [  ]: No, ...


  tested with my 30,000-link-in-one-connection code.




[jira] [Created] (PROTON-864) don't overload top bit of channel numbers

2015-04-24 Thread michael goulish (JIRA)
michael goulish created PROTON-864:
--

 Summary: don't overload top bit of channel numbers 
 Key: PROTON-864
 URL: https://issues.apache.org/jira/browse/PROTON-864
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Assignee: michael goulish


Code in transport.c, and a little in engine.c, looks at the topmost bit in 
channel numbers to decide if the channels are in use.
This causes crashes when the number of channels in a single connection goes 
beyond 32767.
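
A simplified, hypothetical sketch of the bug pattern (not the actual
transport.c code): if the top bit of a 16-bit channel number doubles as
an "in use" flag, every channel at or above 0x8000 (32768) looks
permanently in use, so bookkeeping breaks once a connection crosses
32767 channels. Keeping the flag out of the number avoids the ambiguity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IN_USE_FLAG 0x8000u  /* top bit of a 16-bit channel number */

/* Buggy pattern: channel 32768 has the top bit set by its own value,
 * so it is indistinguishable from a marked-in-use low channel. */
static bool looks_in_use(uint16_t stored)
{
    return (stored & IN_USE_FLAG) != 0;
}

/* Fix pattern: store the flag separately from the channel number. */
typedef struct { uint16_t channel; bool in_use; } channel_slot_t;
```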




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: I think that's a blocker...

2015-02-25 Thread Michael Goulish

Good point!  I'm afraid it will take me the rest of my life
to reproduce under valgrind .. but ... I'll see what I can do

In the meantime -- I'm not sure what to do with a Jira if the
provenance is in doubt...


- Original Message -
 This isn't necessarily a proton bug. Nothing in the referenced checkin
 actually touches the logic around allocating/freeing error strings, it
 merely causes pn_send/pn_recv to make use of pn_io_t's pn_error_t where
 previously it threw away the error information. This would suggest that
 there is perhaps a pre-existing bug in dispatch where it is calling
 pn_send/pn_recv with a pn_io_t that has been freed, and it is only now
 triggering due to the additional asserts that are encountered due to not
 ignoring the error information.
 
 I could be mistaken, but I would try reproducing this under valgrind. That
 will tell you where the first free occurred and that should hopefully make
 it obvious whether this is indeed a proton bug or whether dispatch is
 somehow freeing the pn_io_t sooner than it should.
 
 (FWIW, if it is indeed a proton bug, then I would agree it is a blocker.)
 
 --Rafael
 
 On Wed, Feb 25, 2015 at 7:54 AM, Michael Goulish mgoul...@redhat.com
 wrote:
 
  ...but if not, somebody please feel free to correct me.
 
  The Jira that I just created -- PROTON-826 -- is for a
  bug I found with my topology testing of the Dispatch Router,
  in which I repeatedly kill and restart a router and make
  sure that the router network comes back to the same topology
  that it had before.
 
  As of checkin 01cb00c -- which had no Jira -- it is pretty
  easy for my test to blow core.  It looks like an error
  string is being double-freed (maybe) in the proton library.
 
  ( full info in the Jira.  https://issues.apache.org/jira/browse/PROTON-826
  )
 
 
 
 


[jira] [Commented] (PROTON-826) recent checkin causes frequent double-free or corruption crash

2015-02-25 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336824#comment-14336824
 ] 

michael goulish commented on PROTON-826:


It looks like the problem here is just that the error struct used in  
proton-c/src/error.c is not thread safe -- so I am opening a new Jira for 
Dispatch.

I am leaving this one open for now, however, because other applications using 
proton will encounter this.  Either something could be changed in proton to 
make this less thread-hostile, or ... it could be publicized better?

Please feel free to close when appropriate.



 recent checkin causes frequent double-free or corruption crash
 --

 Key: PROTON-826
 URL: https://issues.apache.org/jira/browse/PROTON-826
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Priority: Blocker

 In my dispatch testing I am seeing frequent crashes in the proton library that 
 began with proton checkin 01cb00c on 2015-02-15, "report read and write 
 errors through the transport".
 The output at crash-time says this:
 ---
 *** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or 
 corruption (fasttop): 0x020ee880 ***
 ======= Backtrace: =========
 /lib64/libc.so.6[0x3e3d875a4f]
 /lib64/libc.so.6[0x3e3d87cd78]
 /lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
 /lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
 /lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
 /lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
 /lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
 /lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
 /home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]
 The backtrace from the core file looks like this:
 
 #0  0x003e3d835877 in raise () from /lib64/libc.so.6
 #1  0x003e3d836f68 in abort () from /lib64/libc.so.6
 #2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
 #3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
 #4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140)
 at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
 #5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, 
 code=code@entry=-2,
 text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable")
 at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
 #6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, 
 fmt=<optimized out>,
 ap=ap@entry=0x7fbf801a6de8) at 
 /home/mick/rh-qpid-proton/proton-c/src/error.c:81
 #7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, 
 code=<optimized out>,
 fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at 
 /home/mick/rh-qpid-proton/proton-c/src/error.c:89
 #8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140,
 msg=msg@entry=0x7fbf8a5bbe1a "recv")
 at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
 #9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, 
 buf=<optimized out>,
 size=<optimized out>) at 
 /home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
 #10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)
 -
 And I can prevent the crash from happening, apparently forever, by commenting 
 out this line:
   free(error->text);
 in the function  pn_error_clear
 in the file proton-c/src/error.c
 The error text that is being freed which causes the crash looks like this:
   $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root 
 = 0x0, code = -2}
 My dispatch test creates a router network and then repeatedly kills and 
 restarts a randomly-selected router.  After this proton checkin it almost 
 never gets through 5 iterations without this crash.  After I commented out 
 that line, it got through more than 500 iterations before I stopped it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PROTON-826) recent checkin causes frequent double-free or corruption crash

2015-02-24 Thread michael goulish (JIRA)
michael goulish created PROTON-826:
--

 Summary: recent checkin causes frequent double-free or corruption 
crash
 Key: PROTON-826
 URL: https://issues.apache.org/jira/browse/PROTON-826
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.9
Reporter: michael goulish
Priority: Blocker


In my dispatch testing I am seeing frequent crashes in the proton library that 
began with proton checkin 01cb00c on 2015-02-15, "report read and write 
errors through the transport".



The output at crash-time says this:
---

*** Error in `/home/mick/dispatch/install/sbin/qdrouterd': double free or 
corruption (fasttop): 0x020ee880 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3e3d875a4f]
/lib64/libc.so.6[0x3e3d87cd78]
/lib64/libqpid-proton.so.2(pn_error_clear+0x18)[0x7f4f4f4e1f18]
/lib64/libqpid-proton.so.2(pn_error_set+0x11)[0x7f4f4f4e1f41]
/lib64/libqpid-proton.so.2(pn_error_vformat+0x3e)[0x7f4f4f4e1f9e]
/lib64/libqpid-proton.so.2(pn_error_format+0x82)[0x7f4f4f4e2032]
/lib64/libqpid-proton.so.2(pn_i_error_from_errno+0x67)[0x7f4f4f4fd737]
/lib64/libqpid-proton.so.2(pn_recv+0x5a)[0x7f4f4f4fd16a]
/home/mick/dispatch/install/lib64/libqpid-dispatch.so.0(qdpn_connector_process+0xd7)[0x7f4f4f759430]




The backtrace from the core file looks like this:


#0  0x003e3d835877 in raise () from /lib64/libc.so.6
#1  0x003e3d836f68 in abort () from /lib64/libc.so.6
#2  0x003e3d875a54 in __libc_message () from /lib64/libc.so.6
#3  0x003e3d87cd78 in _int_free () from /lib64/libc.so.6
#4  0x7fbf8a59b2e8 in pn_error_clear (error=error@entry=0x1501140)
at /home/mick/rh-qpid-proton/proton-c/src/error.c:56
#5  0x7fbf8a59b311 in pn_error_set (error=error@entry=0x1501140, 
code=code@entry=-2,
text=text@entry=0x7fbf801a69c0 "recv: Resource temporarily unavailable")
at /home/mick/rh-qpid-proton/proton-c/src/error.c:65
#6  0x7fbf8a59b36e in pn_error_vformat (error=0x1501140, code=-2, 
fmt=<optimized out>,
ap=ap@entry=0x7fbf801a6de8) at 
/home/mick/rh-qpid-proton/proton-c/src/error.c:81
#7  0x7fbf8a59b402 in pn_error_format (error=error@entry=0x1501140, 
code=<optimized out>,
fmt=fmt@entry=0x7fbf8a5bb21e "%s: %s") at 
/home/mick/rh-qpid-proton/proton-c/src/error.c:89
#8  0x7fbf8a5b6797 in pn_i_error_from_errno (error=0x1501140,
msg=msg@entry=0x7fbf8a5bbe1a "recv")
at /home/mick/rh-qpid-proton/proton-c/src/platform.c:119
#9  0x7fbf8a5b61ca in pn_recv (io=0x14e77b0, socket=<optimized out>, 
buf=<optimized out>,
size=<optimized out>) at 
/home/mick/rh-qpid-proton/proton-c/src/posix/io.c:271
#10 0x7fbf8a812430 in qdpn_connector_process (c=0x7fbf7801c7f0)

-

And I can prevent the crash from happening, apparently forever, by commenting 
out this line:
  free(error->text);
in the function  pn_error_clear
in the file proton-c/src/error.c

The error text that is being freed which causes the crash looks like this:
  $2 = {text = 0x7f66e8104e30 "recv: Resource temporarily unavailable", root = 
0x0, code = -2}


My dispatch test creates a router network and then repeatedly kills and 
restarts a randomly-selected router.  After this proton checkin it almost never 
gets through 5 iterations without this crash.  After I commented out that line, 
it got through more than 500 iterations before I stopped it.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: no slowdown in 10 gigamessage test

2014-12-08 Thread Michael Goulish
Dominic -- 

I'm trying to get these into proton, under examples/engine/c , but I am 
having some problems because I am brain-damaged.

Attempting to repair brain-damage now.

( And in the proton checkin, the licenses will be standard, not silly. )

( Although I *did* like the idea of a PL that is meant to be (partly) sung... )


--- Mick .



- Original Message -
Hi Michael,


Michael Goulish wrote
 After getting complete data for a 10 gigamessage interbox test, 
 ( proton-c, event interface, code here:
 https://github.com/mick-goulish/proton_c_clients.git ) 
 
 ...I see that there is no gradual speed change at all over the duration of
 the test.

I noticed that there wasn't a properly declared license in your repository,
I wonder if you'd be willing to add a LICENSE for
http://www.apache.org/licenses/LICENSE-2.0 and update the headers of psend.c
/ precv.c to match? Currently there seems to be a comedy copyright header
instead :)

This would allow other qpid-proton developers to contribute to these as
generic event-driven samples and/or performance harnesses that could end up
under either ./contrib/ or ./examples/ 




--
View this message in context: 
http://qpid.2158936.n2.nabble.com/no-slowdown-in-10-gigamessage-test-tp7616818p7617251.html
Sent from the Apache Qpid Proton mailing list archive at Nabble.com.


interbox test: 446,000 messages per second

2014-11-25 Thread Michael Goulish


proton-c event interface soak test
24-25 nov 2014
results
--

interbox test over 1 gig-e wire

1 connection, 1 session, 5 links  [1]

10 billion ( 1e10 ) messages sent,
  100 bytes payload per message,
  140 bytes total per message

messages per second average:  446,675  [2]
bandwidth consumed:  500,275,553 bits per second   [2]
 ( 50% of available bandwidth on 1 gig-e )   


credit scheme:  (per link) 400 initial credits, 
200 more every time total falls to 200.

behavior:  Very stable during test.
   Low variability for time-reports after each
   5 million messages delivered.
   No obvious changes in speed.

memory growth:  on receiver, from 4232 KB to 4264 KB





[1] I did an earlier test that determined that these
settings gave me the greatest number of messages 
per second on a single connection.   
However, these settings did not give the highest bandwidth 
consumption.  That happened with larger message sizes.
( i.e. larger payloads. )   
The test was able to saturate a 1 gig-e wire.
Looking for a faster wire to test with ...


[2] note: these data are from a new run that has only
  completed 2 billion messages so far.  
  I lost the data from the first run.   :-(
  I will do more analysis on the data when the current
  test completes. 

 


proton-c event test stable and fast for 5 billion messages

2014-11-20 Thread Michael Goulish

I recently finished switching over my proton-c programs psend and precv 
to the new event-based interface, and my first test of them was a 
5 billion message soak test. 

The programs survived this test with no memory growth, and no gradual 
slowdown. 

This test is meant to find the fastest possible speed of the proton-c 
code itself. (In future, we could make other similar tests designed 
to mimic realistic user scenarios.) In this test, I run both sender 
and receiver on one box, with the loopback interface. I have MTU == 
64K, I use a credit scheme of 600 initial credits, and 300 new credits 
whenever credit falls below 300. The messages are small: exactly 100 
bytes long. 

I am using two processors, both Intel Xeon E5420 @ 2.50GHz with 6144 
KB cache. (Letting the OS decide which processors to use for my two 
processes.) 

On that system, with the above credit scheme, the test is sustaining 
throughput of 408,500 messages per second . That's over a single link, 
between two singly-threaded processes. 

This is significantly faster than my previous, non-event-based code, 
and I find the code *much* easier to understand. 

This may still not be the maximum possible speed on my box. It looks 
like the limiting factor will be the receiver, and right now it is 
using only 74% of its CPU -- so if we could get it to use 100% we *might* 
see a performance gain to the neighborhood of 550,000 messages per second. 
But I have not been able to get closer to 100% just by fooling with the 
credit scheme. Hmm. 



If you'd like to take a look, the code is here: 
https://github.com/mick-goulish/proton_c_clients.git 







Re: [VOTE]: migrate the proton repo to use git

2014-10-31 Thread Michael Goulish

[ X ] Yes, migrate the proton repo over to git.
[   ] No, keep it in svn.
[   ] see if we can find a working copy of SCCS.



proton slowdown - I'm not doing it right

2014-10-29 Thread Michael Goulish

Bozzo -- I see you're right. The size of my delivery list -- i.e. the list at 
connection->work_head -- is slowly increasing, and that's the problem.  There is 
something I'm not handling, which gets a lot worse under heavy system load, and my 
sender never digs itself out.

But -- what I want here is a canonical example of how to get high throughput
with proton-c at the engine level -- and what I am hearing is that I should 
be using the event-collector interface which Dispatch uses.  That's what we 
want 
to steer new users toward.

So far the simple fixes I've tried have all resulted in zero speed...

So!  Rather than spending more time fixing these examples I will switch to the
event collector model and see if I can get a nice little example working 
that way.  (And from what I hear, it will be much nicer and much littler.)

At least I know better now what I am trying to do:


  Get a send/receive example at the lowest level,
  that we want to direct new users toward,
  that has throughput as high as possible,
  and that can run flat-out on a time scale of
  days or weeks without crashing, without growing,
  and without slowing down.



- Original Message -
On 28. 10. 14 20:18, Michael Goulish wrote:
 I have gotten callgrind call-graph pictures of my proton/engine sender 
 and receiver when the test is running fast and when it slows down.

 The difference is in the sender -- when running fast, it is spending 
 most of its time in the subtree of pn_connector_process() .. like 71%.

When it slows down it is instead spending 47% in pn_delivery_writable(),
 and only 17% in pn_connector_process().


 Since it is still not instantly obvious to me what has happened, 
 I thought I would share with you-all.


 Please see cool pictures at:


 http://people.apache.org/~mgoulish/protonics/performance/slowdown/2014_10_28/svg/psend_fast.svg


 http://people.apache.org/~mgoulish/protonics/performance/slowdown/2014_10_28/svg/psend_slow.svg




 To recap -- I can trigger this condition by getting the box busy while
 my proton/engine test is running.  I.e. by doing a build.
 Even though I stop the build, and all 6 other processors on
 the box go back to being idle -- the test never recovers.

 The receiver goes down to 50% CPU or worse -- but these pictures 
 show that the behavior change is in the sender.




look at call counts, for pn_connector_process() and pn_delivery_writable()

fast : ratio 1 : 5
slow : ratio 1 : 244.5  (!)

The iteration over connection work list gets really expensive, which
means the connection
thinks it has to work on other stuff than what psend.c wants to work on.

I still think that the call to pn_delivery() in psend.c is in a really
unfortunate spot.

btw, why do you iterate over connection work list at all, you could just
remember the delivery
when calling pn_delivery()?

Bozzo


proton slowdown - major clue

2014-10-28 Thread Michael Goulish

I have gotten callgrind call-graph pictures of my proton/engine sender 
and receiver when the test is running fast and when it slows down.

The difference is in the sender -- when running fast, it is spending 
most of its time in the subtree of pn_connector_process() .. like 71%.

When it slows down it is instead spending 47% in pn_delivery_writable(),
and only 17% in pn_connector_process().


Since it is still not instantly obvious to me what has happened, 
I thought I would share with you-all.


Please see cool pictures at:
   
   
http://people.apache.org/~mgoulish/protonics/performance/slowdown/2014_10_28/svg/psend_fast.svg
   
   
http://people.apache.org/~mgoulish/protonics/performance/slowdown/2014_10_28/svg/psend_slow.svg




To recap -- I can trigger this condition by getting the box busy while
my proton/engine test is running.  I.e. by doing a build.
Even though I stop the build, and all 6 other processors on
the box go back to being idle -- the test never recovers.

The receiver goes down to 50% CPU or worse -- but these pictures 
show that the behavior change is in the sender.





proton gradual slowdown -- I know how to cause it

2014-10-27 Thread Michael Goulish

Earlier I reported a very gradual slowdown in the performance 
of my simple 1-sender 1-receiver test, on RHEL 7.0 and Fedora 20
but not on RHEL 6.3 .

The slowdown caused the test to end up running at half speed after 
a billion or two billion messages.  ( Which took hours to run. )


I now know how to cause this slowdown to happen any time, and
it works just as well on RHEL 6.3 as it does on RHEL 7.0 .

All I have to do is make the machine busy.  Even though I do not
swamp all processors -- in fact, I leave a couple processors idle --
my receiver program slows down when the machine becomes busy --
***and it never recovers***.


I have been doing qpid builds, for example.  Even after I interrupt 
the build -- many minutes later, long after the box has become
idle but for my sender and receiver, the receiver's CPU usage 
is still depressed, and performance has been cut to 1/2 or 1/3
of what it was at the beginning of the test.


It never comes back.

In fact, I can make it ratchet down again by running another build.


This is nothing magic about builds -- that was just a convenient 
way of making the box busy.


I will be making a Jira for this later today.
I was able to make a callgrind picture for the receiver from when 
the test was fast and from when it was slow.  I will attach all 
my info to the Jira.





Re: proton gradual slowdown -- I know how to cause it

2014-10-27 Thread Michael Goulish

You know, I thought of something along those lines, but I can't
see how it makes the receiver actually use less CPU permanently.
It seems like it ought to simply get a backlog, but go back 
to normal CPU usage.

Can you think of any way that a backlog would cause receiver to 
stay at low CPU?

Now that I can make this happen easily instead of waiting forever-and-a-day,
I will get callgrind snapshots of both programs when the
test is fast and slow. It seems like that just must show me 
something.







- Original Message -
 On 27. 10. 14 09:10, Michael Goulish wrote:
  Earlier I reported a very gradual slowdown in the performance
  of my simple 1-sender 1-receiver test, on RHEL 7.0 and Fedora 20
  but not on RHEL 6.3 .
 
  The slowdown caused the test to end up running at half speed after
  a billion or two billion messages.  ( Which took hours to run. )
 
 
  I now know how to cause this slowdown to happen any time, and
  it works just as well on RHEL 6.3 as it does on RHEL 7.0 .
 
  All I have to do is make the machine busy.  Even though I do not
  swamp all processors -- in fact, I leave a couple processors idle --
  my receiver program slows down when the machine becomes busy --
  ***and it never recovers***.
 
 
 Michael,
 
 this is totally a wild guess, but looking at your psend.c the only thing
 that jumps out is that you couple 1:1 number of deliveries created and
 number of
 calls to pn_driver_wait(). So if anything (which I cannot explain
 /what/) happens where
 sender starts to lag in talking to receiver it may not be able to dig
 itself out.
 
 Maybe try to create first delivery before the loop and create the next
 delivery after
 pn_link_advance()
 
 Bozzo
 


proton engine perfectly stable on RHEL 6 after 1.5 billion messages

2014-10-20 Thread Michael Goulish

unlike my recent experience on Fedora,
I have just seen my psend and precv clients 
( written against proton engine/driver interface )
survive a 1.5 billion message test completely unscathed.

On the machine I am using, that is about 4.5 hours of
sending messages as fast as they will fly.

Memory use is absolutely stable -- no increase at all 
in RSS as measured by 'top'.

Time per 5 million messages has always been between 56 and 57 
seconds.

This is exactly the same code (for the send/recv clients) 
that I used on Fedora when I saw the gradual slowdown.
( I downloaded new proton code in the wee hours today, but
it sure doesn't look like anything that got checked in in the
last 3 days is at all relevant to a gradual slowdown.)

SO !

please give me your opinion but ... I think that we DO NOT CARE
about behavior on Fedora.  The reason I am doing these 
soak tests is to assure potential users that the code is 
stable enough for prolonged use in a production environment.
Which Fedora is not.

Does that make sense to everybody ?





Re: proton engine perfectly stable on RHEL 6 after 1.5 billion messages

2014-10-20 Thread Michael Goulish
Yes, you're right.
I hadn't thought of the Fedora-as-the-future angle.
sigh.

But then at least I must see whether it happens on another Fedora box,
and a more up to date one -- i.e. F20, which I now have.
My test that showed the slowdown ran on F17.

I will start that today.

I did not get the callgrind data yet, because I was unable to install
callgrind-devel on my F17 box, which is why I finally installed F20
on another box.

so, I will report on what happens here.



- Original Message -
 Hi Mick,
 
 That's a real head-scratcher - I'm at a loss to explain what you are seeing.
 
 On a lark I thought that perhaps generating new random uuids for each message
 send may be involved - perhaps over time the entropy pool would shrink and
 slow down the allocation of new uuids.  I even wrote a little python loop
 that did nothing but allocate uuids.  Ran overnight, no change in allocation
 rate on my Fedora 19 laptop.
 
 Yeah, I really, really need to get a larger tinfoil hat.
 
 Otherwise, while I personally agree with your opinion regarding Fedora
 support, I'm hesitant to dismiss the problem without root causing it since
 what is in Fedora today often ends up in RHEL tomorrow.
 
 Did your callgrind tracing show anything of interest?
 
 -K
 
 
 - Original Message -
  From: Michael Goulish mgoul...@redhat.com
  To: proton@qpid.apache.org
  Sent: Monday, October 20, 2014 9:53:05 AM
  Subject: proton engine perfectly stable on RHEL 6 after 1.5 billion
  messages
  
  
  unlike my recent experience on Fedora,
  I have just seen my psend and precv clients
  ( written against proton engine/driver interface )
  survive a 1.5 billion message test completely unscathed.
  
  On the machine I am using, that is about 4.5 hours of
  sending messages as fast as they will fly.
  
  Memory use is absolutely stable -- no increase at all
  in RSS as measured by 'top'.
  
  Time per 5 million messages has always been between 56 and 57
  seconds.
  
  This is exactly the same code (for the send/recv clients)
  that I used on Fedora when I saw the gradual slowdown.
  ( I downloaded new proton code in the wee hours today, but
  it sure doesn't look like anything that got checked in in the
  last 3 days is at all relevant to a gradual slowdown.)
  
  SO !
  
  please give me your opinion but ... I think that we DO NOT CARE
  about behavior on Fedora.  The reason I am doing these
  soak tests is to assure potential users that the code is
  stable enough for prolonged use in a production environment.
  Which Fedora is not.
  
  Does that make sense to everybody ?
  
  
  
  
 
 --
 -K
 


very weird proton soak-test result

2014-10-17 Thread Michael Goulish

I just want to mention this to the list in case
anyone has an immediate brilliant idea.

Something spooky is happening.

I have a 2-process test, one sender, one receiver,
written at the proton engine level.
This is what I've been using for my nightly performance
measurements lately, which are here:
http://people.apache.org/~mgoulish/protonics/performance/results/nightly.svg


Recently changed this test to be perpetual, and the
receiver now only reports the timing for the most 
recent 5 million messages.

So ... the weird result is that, over many messages,
the test is slowing down ... and it is doing this 
without the RSS memory of either process growing!  Arg!
The virtual mem of the sender increased, but only slightly.
It might very well have fallen again if I let the test
keep running.   ( That happened in the receiver. )


The effect is very gradual, but after 500 million
messages it is taking about 50% more time to get 
each batch of 5 million messages received !!
And it looks like the effect is accelerating.


Also -- the receiver CPU usage is slowly going down.
CPU usage on the sender is constant.  For the receiver, 
CPU usage started out around 77% at the beginning of the 
test, and after 500 million msgs has fallen to 64% or so.


My plan is to use callgrind to get a snapshot of
sender behavior (I suspect sender is culprit) 
at the beginning of a run, and a separate snapshot
later after the slowdown has started.


But i wanted to just mention it here, just in case
anybody has a Great Idea, or in case I get hit by a 
truck or something.



Re: nightly graphical proton-c performance results

2014-10-14 Thread Michael Goulish
excellent idea! 

so, just a horizontal line for each release something like this? : 

How about: the lines do not extend indefinitely rightward -- but only until the 
next release? 

- Original Message -

 Michael Goulish wrote
  From now on until ebola gets me (and maybe long after that!)
  new proton-c code will be downloaded, built, performance-tested, and the
  results posted in tasteful and attractive graphical form here:
 
 
 
  http://people.apache.org/~mgoulish/protonics/performance/results/nightly.svg
 
 
  The testing is done with my proton-C engine-level clients.
  Each test consists of 50 trials, 5 million small messages each trial.

 Michael, nice work. Would it also be possible to add baselines for some of
 the stable releases? 0.5, 0.6, 0.7, 0.8rc1 ? Either on a separate Releases
 graph, or if it can elegantly be inlined at the beginning of the nightly
 graph that might work too.

 Cheers,
 Dom

 --
 View this message in context:
 http://qpid.2158936.n2.nabble.com/nightly-graphical-proton-c-performance-results-tp7615180p7615181.html
 Sent from the Apache Qpid Proton mailing list archive at Nabble.com.

Re: nightly graphical proton-c performance results

2014-10-14 Thread Michael Goulish
B-but ... that's me!

I am that process.

I guess you could say that I am automated...




- Original Message -
On Mon, 2014-10-13 at 20:09 -0400, Michael Goulish wrote:
 From now on until ebola gets me (and maybe long after that!)
 new proton-c code will be downloaded, built, performance-tested, and the 
 results posted in tasteful and attractive graphical form here:
 
 
  
 http://people.apache.org/~mgoulish/protonics/performance/results/nightly.svg
 
 
 The testing is done with my proton-C engine-level clients.
 Each test consists of 50 trials, 5 million small messages each trial.
 
 the graphics show you at a glance the mean value of the 50 tests for that
 day, as well as the plus-one sigma and minus-one sigma range.  (That is the 
 range in which about two-thirds of the test results will fall.)
 
 The standard deviation (sigma) is important, because if you see that suddenly
 increasing -- even if the mean value remains relatively constant -- that 
 means 
 we have a problem.
 
 A while ago we had a significant performance issue in qpidd that went 
 undetected
 for several months.   The goal here is to make sure that any significant 
 proton
 performance regression will become obvious within 24 hours.  (8.64e19 
 femtoseconds)
 
 As always, I would be happy to hear any thoughts, questions, criticisms, 
 ideas,
 proposals, desires, hopes, dreams, or schemes relating to this system or 
 anything
 else.
 

Excellent stuff. Now all we need is an automated process to watch for
significant performance regressions and send email listing all those
commits that might have been responsible.



nightly graphical proton-c performance results

2014-10-13 Thread Michael Goulish

From now on until ebola gets me (and maybe long after that!)
new proton-c code will be downloaded, built, performance-tested, and the 
results posted in tasteful and attractive graphical form here:


 
http://people.apache.org/~mgoulish/protonics/performance/results/nightly.svg


The testing is done with my proton-C engine-level clients.
Each test consists of 50 trials, 5 million small messages each trial.

The graphics show you at a glance the mean value of the 50 tests for that
day, as well as the plus-one sigma and minus-one sigma range.  (That is the 
range in which about two-thirds of the test results will fall.)

The standard deviation (sigma) is important, because if you see that suddenly
increasing -- even if the mean value remains relatively constant -- that means 
we have a problem.

A while ago we had a significant performance issue in qpidd that went undetected
for several months.   The goal here is to make sure that any significant proton
performance regression will become obvious within 24 hours.  (8.64e19 
femtoseconds)

As always, I would be happy to hear any thoughts, questions, criticisms, ideas,
proposals, desires, hopes, dreams, or schemes relating to this system or 
anything
else.



proton callgrind pictures online

2014-10-01 Thread Michael Goulish

...together with the callgrind data that generated the pictures,
scripts, licenses, witty and illuminating commentary, advice to 
the lovelorn -- Everything Necessary to the Enjoyment and Success
of the Aspiring Proton Programmer.


http://people.apache.org/~mgoulish/protonics/performance/results/2014_09_05_perf88/






LTO: link-time optimization effect on proton performance

2014-10-01 Thread Michael Goulish

Link-time optimization can be turned on by adding the
 -flto flag to the proton library build, in both compilation
and linking steps.  It offers the possibility of optimizations
using deeper knowledge of the whole program than is available 
before linking.

I have also been trying to get some extra performance by hand-
inlining functions that I select based on valgrind/callgrind
profiling data.

My test procedure has been to run 50 trials, where each trial 
is a run of two programs: my psend and precv proton-C clients
written at the Engine level.  Each trial involves sending and 
receiving 5 million small messages.  

The result from each trial is a single high-resolution timing 
number.  (From just before the sender sends the first message, 
to just after the receiver receives the last message.)
The result of each test is a list of 50 of those numbers.

I compare tests using an online Student's T-test calculator.
(Student was the pen-name of the guy who invented it.
His real name was Gosset, and he was working at the Guinness 
Brewery in Dublin when he invented it. I am not making this up.)  
The t-test gives a number that indicates the likelihood 
that the difference between two tests could have happened randomly.

A small t-test result indicates that the difference between 
two tests is unlikely to have happened randomly.  For example
a t-test result of 0.01 means that the difference between your 
two tests should only happen 1 time out of 100 times due to 
random chance.  Smaller results are better.

With 50 sample-points in each test, you can get nice high 
certainty as to whether you are seeing real or random results.
All of the results below are hyper-significant.  The *worst*
t-test result was 2.9e-8, i.e. 3 chances out of 100 million 
that the difference between the two tests could happen randomly.



So .. here are the results.   (in seconds)


( builds used throughout are normal release-with-debug-info,
with -O2 optimization. )


1. Proton code as of 0800 EDT yesterday, with no changes. 
   mean 41.267825   sigma 0.834826

2. LTO build
   mean 40.073661   sigma 1.108513improvement: 2.9%

3. manual inlining changes
   mean 39.011794   sigma 1.056831improvement: 5.5%

4. LTO build plus my changes
   mean 39.211283   sigma 1.041303improvement: 5.0%




So!  The LTO technology really works, but it's not as
good as manual inlining based on profiling.  In fact
it slows that down a little, probably because it is choosing 
some inlining candidates that don't help enough to offset
cache thrash due to code size increase.


so there you go.




[jira] [Created] (PROTON-703) inlining performance improvements

2014-09-29 Thread michael goulish (JIRA)
michael goulish created PROTON-703:
--

 Summary: inlining performance improvements
 Key: PROTON-703
 URL: https://issues.apache.org/jira/browse/PROTON-703
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor


omnibus jira for any other inlining performance improvements i may find.

notes to self:
  * don't affect public APIs.
  * don't forget to test Debug build.





[jira] [Created] (PROTON-700) small performance improvement from inlining one fn.

2014-09-25 Thread michael goulish (JIRA)
michael goulish created PROTON-700:
--

 Summary: small performance improvement from inlining one fn.
 Key: PROTON-700
 URL: https://issues.apache.org/jira/browse/PROTON-700
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor


inlining the internal function pn_data_node()  improves speed somewhere between 
2.6% and 6%, depending on architecture.

This is based on testing I did with two C-based clients written at the engine 
interface level.

The higher 6% figure was seen on a more modern machine with recent Intel 
processors, the lower figure was seen on an older box with AMD processors.

But the effect is real: after 50 repetitions before the change and 50 after, a 
T-test indicates the odds of this happening by chance are 2.0e-18 .





proton performance: Release vs. RelWithDebInfo

2014-09-04 Thread Michael Goulish

A little confirmation -- 

in my testing of proton engine send/receive clients
I just now tested the performance of those clients built against Proton 
Release version, versus Proton RelWithDebInfo version.

I did indeed find that both versions were the same at running my tests, 
after averaging the speed of 5 tests with each version.

In fact, RelWithDebInfo came in a mite faster -- a couple percent -- 
but probably still within the variability of the observations.



Proton Performance Pictures (1 of 2)

2014-09-03 Thread Michael Goulish

[ resend :  I am attaching only 1 image here, so hopefully the apache 
mail gadget will not become upset.  Next one in next email. ]



Attached, please find two cool pictures of the valgrind/callgrind data
I got with a test run of the psend and precv clients I mentioned before.


( Sorry, I keep saying 'clients'.  These are pure Peer-to-Peer. )
( Hey -- if we ever sell this technology to maritime transport companies,
could we call it Pier-to-Pier ? )



This was from a run of 100,000 messages, using credit strategy of
200, 100, 100.
i.e. start at 200, every time you get down to 100, add 100.

That point is where I seem to find the best performance on my
system: 123,500 messages per second received.  ( i.e. 247,000
transfers per second ) using about 180% CPU ( i.e. 90% each of
2 processors. )

By the way, I actually got repeatably better performance (maybe
1.5% better  (which resulted in the 123,500 number)) by using processors
1 and 3 on my laptop, rather than 1 and 2.   Looking at /proc/cpuinfo,
I see that processors 1 and 3 have different core IDs.  OK, whatever.
( And it's an Intel system... )


I think there are no shockers here:
  psend uses its time in pn_post_transfer_frame  (44%)
  precv uses its time in pn_dispatch_frame   (67%)


The code is at https://github.com/mick-goulish/proton_c_clients.git

I will put all this performance info in there too, shortly.


proton performance pic 1 of 2

2014-09-03 Thread Michael Goulish

I hope these are still legible.

I would like to have a Little Talk with the Apache mail server...



I will put full-resolution versions in my git repo...

proton performance: OK, so I can't send you pictures.

2014-09-03 Thread Michael Goulish
Fabulous.

OK, I put them in my git repo:

  https://github.com/mick-goulish/proton_c_clients.git

And there, they are full-res.


sigh.




Re: Proton Performance Pictures (1 of 2)

2014-09-03 Thread Michael Goulish




- Original Message -
 On 09/03/2014 08:51 AM, Michael Goulish wrote:
  That point is where I seem to find the best performance on my
  system: 123,500 messages per second received.  ( i.e. 247,000
  transfers per second ) using about 180% CPU ( i.e. 90% each of
  2 processors. )
 
 If you are sending direct between the sender and receiver process (i.e.
 no intermediary process), then why are you doubling the number of
 messages sent to get 'transfers per second'? One transfer is the sending
 of a message from one process to another, which in this case is the same
 as messages sent or received.
 

Yes, this is interesting.

I need a way to make a fair comparison between something like this setup 
(simple peer-to-peer) and the Dispatch Router numbers I was getting
earlier.


For the router, the analogous topology is

writer -- router -- reader

in which case I counted each message twice.



But it does not seem right to count a single message in
   writer -- router -- reader 
as 2 transfers, while counting a single message in
   writer -- reader
as only 1 transfer.

Because -- from the application point of view, those two topologies 
are doing the same work.



Also I think that I *need* to count    writer--router--reader   
as 2, because in *this* case:


 writer --  router -- reader_1
  \
   \-- reader_2


...I need to count that as 3 .



? Thoughts ?




Re: Proton Performance Pictures (1 of 2)

2014-09-03 Thread Michael Goulish

OK -- I just had a quick talk with Ted, and this makes sense
to me now:

  count *receives* per second.

I had it turned around and was worried about *sends* per second,
and then got confused by issues of fanout.

If you only count *receives* per second, and assume no discards,
it seems to me that you can indeed make a fair speed comparison 
between

   sender -- receiver

   sender -- intermediary -- receiver

and

   sender -- intermediary -- {receiver_1 ... receiver_n}

and even

   sender -- {arbitrary network of intermediaries} -- {receiver_1 ... 
receiver_n}

phew.


So I will do it that way.

This is from the application perspective, asking how fast is your 
messaging system.  It doesn't care about how fancy the intermediation 
is, it only cares about results.  This seems like the right way to judge that.









- Original Message -


On 09/03/2014 11:35 AM, Michael Goulish wrote:
 
 
 
 
 - Original Message -
 On 09/03/2014 08:51 AM, Michael Goulish wrote:
 That point is where I seem to find the best performance on my
 system: 123,500 messages per second received.  ( i.e. 247,000
 transfers per second ) using about 180% CPU ( i.e. 90% each of
 2 processors. )

 If you are sending direct between the sender and receiver process (i.e.
 no intermediary process), then why are you doubling the number of
 messages sent to get 'transfers per second'? One transfer is the sending
 of a message from one process to another, which in this case is the same
 as messages sent or received.

 
 Yes, this is interesting.
 
 I need a way to make a fair comparison between something like this setup 
 (simple peer-to-peer) and the Dispatch Router numbers I was getting
 earlier.
 
 
 For the router, the analogous topology is
 
 writer -- router -- reader
 
 in which case I counted each message twice.
 
 
 
 But it does not seem right to count a single message in
writer -- router -- reader 
 as 2 transfers, while counting a single message in
writer -- reader
 as only 1 transfer.
 
 Because -- from the application point of view, those two topologies 
 are doing the same work.

You should probably be using throughput and not transfers in this case.

 
 
 
 Also I think that I *need* to count    writer--router--reader   
 as 2, because in *this* case:
 
 
  writer --  router -- reader_1
   \
\-- reader_2
 
 
 ...I need to count that as 3 .
 
 
 
 ? Thoughts ?
 
 


Re: proton performance: OK, so I can't send you pictures.

2014-09-03 Thread Michael Goulish




- Original Message -
 On Wed, 2014-09-03 at 04:18 -0400, Michael Goulish wrote:
  Fabulous.
  
  OK, I put them in my git repo:
  
https://github.com/mick-goulish/proton_c_clients.git
  
  And there, they are full-res.
  
  
  sigh.
  
  
 
 pn_data_node and pn_data_add look interesting. Might be worth inlining
 pn_data_node if the compiler isn't doing that already (did you build
 with -O3?) Also shaving a few instructions off pn_data_add might pay
 off.
 
 
 

I just went into ccmake and told it to do a Release build.
Is there any way I can turn it up to 11 ?


proton engine performance: two strong credit management effects

2014-08-27 Thread Michael Goulish

conclusion 
= 

Using the proton engine interface (in C) I am seeing two 
aspects of credit management that can strongly affect 
throughput. 

The first is credit stall. If you frequently allow the credit 
available to the sender to drop to zero, so that he has to cool 
his heels waiting for more, that can have a very strong effect 
even in my simple test case, in which the receiver is granting 
new credit as quickly as possible, and is serving only 1 sender. 

The second effect is credit replenishment amortization. 
It looks like the granting of new credit is kind of an expensive 
operation. If you do it too frequently, that will also have a 
noticeable effect on throughput. 



These tests use the C clients, written against the engine/driver level, 
that I recently put at 
https://github.com/mick-goulish/proton_c_clients.git 




test setup 
========== 

* single sender, single receiver, on laptop. 

* sender only sends, receiver only receives. 

* one link 

* sender locked onto CPU core 1, receiver locked onto 
CPU core 2 

* system is otherwise quiet - only OS and XFCE running. 
no browser, no internet. 

* sender sends as fast as possible, receiver receives as 
fast as possible. 

* each test consists of 5,000,000 messages, about 50 
bytes of payload each. 

* each test is repeated 3 times, and the results averaged 
to make the number that is graphed. 





stall test result 
================= 

scenario: start out with 200 credits. Every time we 
get down to X, add 100 to credit level. 

X axis: point at which credit gets refilled. 
Y axis: messages received per second. 


Note: When we let credit go to 10 or less before replenishment, throughput 
falls off a cliff. 



amortization test result 
======================== 

scenario: start out with 200 credits. Every time we 
get down to 200-X, add X to credit level. 

X axis: credit increment 
Y axis: messages received per second. 

Note: In this case, if X has a value of 5, that means we add 5 new 
units of credit every time we see that it has fallen by 5. 
The smaller the X value, the more frequently we replenish credit. 
If we replenish too frequently, throughput is affected. 



[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-03 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051826#comment-14051826
 ] 

michael goulish commented on PROTON-625:


Here's what happens, and a fix.


  1. pni_map_entry() calls pni_map_ensure() to make sure map
 has enough capacity.

  2. The capacity-increasing loop in pni_map_ensure() has two
 conditions on it:  increase the capacity if map->capacity
 is too small, or if map 'load' is greater than map->load_factor.
 ( Map load is ... meaning not obvious to me. )

  3. If pni_map_ensure() returns true, then pni_map_entry() will
 call itself recursively, and keep doing that until
 pni_map_ensure() returns false.
 'False' means 'I made no change.'

  4. But it is possible for pni_map_ensure() to make no change,
 and yet return true.
 Here is how it happened in my most recent test:
 map->capacity 512
 capacity  331
 pni_map_load(map) 0.75
 map->load_factor  0.75

   5. Those values made *both* conditions on the capacity-
  increasing loop in pni_map_ensure() false.
  So it didn't do anything to change the map.
  But it returned true.
  So pni_map_entry() called itself.
  But nothing had changed.
  And away we go.

   FIX 

 Make the test on the if at the top of pni_map_ensure
 say this:

   if (capacity <= map->capacity && load <= map->load_factor) {

 ( Added '=' to the load test. )

 After that, I ran twenty tests with no failure.
 Previously, failure probability on my system
 was 0.3.  So odds of 20 in a row happening
 by chance is a little less than 1 in 1000.


 Biggest Backtrace Ever!
 ---

 Key: PROTON-625
 URL: https://issues.apache.org/jira/browse/PROTON-625
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.8
Reporter: michael goulish

 I am saving all my stuff so I can repro on demand.
 It doesn't happen every time, but it's about 50%.
 --
 On one box, I have a dispatch router.
 On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 
 qpid-messaging-based senders.
 Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... &c.
 100 messages will be sent to each address.
 I start the 5 receivers first.  They start OK.  Dispatch router happy & 
 stable.
 Wait a few seconds.
 I start the 5 senders, from a bash script.
 The first sender is already sending when the 2nd, 3rd, 4th start.
 After a few of them start, but before all have finished starting, a few 
 seconds into the script, the crash occurs.  ( If they all start up 
 successfully, no crash. )
 The crash occurs in the dispatch router.
 Here is the biggest backtrace ever:
 #0  0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at 
 malloc.c:4383
 #1  0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664
 #2  0x0039c6c1650a in pni_map_allocate () from 
 /usr/lib64/libqpid-proton.so.2
 #3  0x0039c6c16a3a in pni_map_ensure () from 
 /usr/lib64/libqpid-proton.so.2
 #4  0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #5  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #6  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #7  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #8  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #9  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #13 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 #14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
 .
 .
 .
 .
 #93549 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93550 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93551 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93552 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93553 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93554 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93555 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93556 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93557 0x0039c6c16c64 in pni_map_entry () from 
 /usr/lib64/libqpid-proton.so.2
 #93558

[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-02 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049897#comment-14049897
 ] 

michael goulish commented on PROTON-625:


I had some confusion about what libraries were being picked up.  
Sorry!

This bug is *not* present on 0.7 !

I was able to run 0.7-based dispatch-router 10 times with no failure.
Then, switching to latest proton trunk code as of today -- 2 out of first 3 
tests resulted in this failure.




[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-02 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051007#comment-14051007
 ] 

michael goulish commented on PROTON-625:


Here is a hack that fixes it.
A little new code in pni_map_ensure().

Tested this on latest protonics, version 1607485.

Without hack:  3 failures out of 10 tests. (similar to what I have been seeing 
on other versions.)

With hack:  0 failures out of 13 tests.  ( probability this happened by chance: 
less than 1% )


So, now I'm trying to see how it should *really* be fixed...


--- code --- code --- code --- code --- code --- code --- code --- code --- 
code ---


  size_t oldcap = map->capacity;  /* remember the capacity we started with */

  // This loop is what is already there, in pni_map_ensure.  No change.
  while (map->capacity < capacity || pni_map_load(map) > map->load_factor) {
    map->capacity *= 2;
    map->addressable = (size_t) (0.86 * map->capacity);
  }

  /*---
    If ever we get past the above while-loop without
    actually having changed map->capacity, we are doomed
    to eternal torment.  So, force it.
  ---*/
  if ( oldcap == map->capacity )
  {
    fprintf ( stderr, "Fiery the angels fell; deep thunder rolled around their shores, burning with the fires of Orc!\n" );
    map->capacity *= 2;
    map->addressable = (size_t) (0.86 * map->capacity);
  }



[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-01 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048569#comment-14048569
 ] 

michael goulish commented on PROTON-625:


BTW -- I kill and restart the router after each test.


[jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-01 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048573#comment-14048573
 ] 

michael goulish commented on PROTON-625:


When I put usleep(1000) after each message sent, I have zero failures in 10 
tries.


Re: [jira] [Commented] (PROTON-625) Biggest Backtrace Ever!

2014-07-01 Thread Michael Goulish
Yes!
Great idea -- 
I will attempt.



- Original Message -

[ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048701#comment-14048701
 ] 

Rafael H. Schloming commented on PROTON-625:


I think the easiest way to track down this bug would be to put some sort of 
detection inside of pni_map_entry and if it recurses more than some limit, e.g. 
32 times or something, then print out a representation of the maps internal 
structure. It might also help to use a debug build so you have line numbers. Is 
that something you feel comfortable trying? You should be able to find the 
relevant code around line 551 of object.c.
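The suggested detection can be sketched as a recursion-depth guard. The names here are invented stand-ins (the real routine is pni_map_entry, around line 551 of proton's object.c); 'stuck' simulates an ensure() that keeps returning true without making progress.

```c
#include <stdio.h>

#define DEPTH_LIMIT 32
static int g_depth = 0;

/* entry_sim() stands in for pni_map_entry: count how deep we have
   recursed, and past DEPTH_LIMIT emit a diagnostic and stop instead
   of blowing the stack. */
static int entry_sim(int stuck)
{
  if (g_depth++ > DEPTH_LIMIT) {
    fprintf(stderr, "map entry recursed past %d: dump map internals here\n",
            DEPTH_LIMIT);
    g_depth = 0;
    return -1;               /* real instrumentation would print the map */
  }
  int r = stuck ? entry_sim(stuck) : 0;
  if (!stuck) g_depth = 0;   /* normal return: clear the counter */
  return r;
}
```

A runaway call like entry_sim(1) trips the limit and returns the error sentinel, while a well-behaved entry_sim(0) completes normally, which is exactly the signal needed to dump the map's internal structure at the moment of failure.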


[jira] [Created] (PROTON-625) Biggest Backtrace Ever!

2014-06-30 Thread michael goulish (JIRA)
michael goulish created PROTON-625:
--

 Summary: Biggest Backtrace Ever!
 Key: PROTON-625
 URL: https://issues.apache.org/jira/browse/PROTON-625
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


I am saving all my stuff so I can repro on demand.
It doesn't happen every time, but it's about 50%.

--

On one box, I have a dispatch router.
On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 
qpid-messaging-based senders.

Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... &c.

100 messages will be sent to each address.

I start the 5 receivers first.  They start OK.  Dispatch router happy & stable.

Wait a few seconds.

I start the 5 senders, from a bash script.
The first sender is already sending when the 2nd, 3rd, 4th start.

After a few of them start, but before all have finished starting, a few seconds 
into the script, the crash occurs.  ( If they all start up successfully, no 
crash. )

Here is the biggest backtrace ever:

#0  0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at 
malloc.c:4383
#1  0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664
#2  0x0039c6c1650a in pni_map_allocate () from 
/usr/lib64/libqpid-proton.so.2
#3  0x0039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2
#4  0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#5  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#6  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#7  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#8  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#9  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#13 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
.
.
.
.
#93549 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93550 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93551 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93552 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93553 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93554 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93555 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93556 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93557 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93558 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93559 0x0039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2
#93560 0x0039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2
#93561 0x0039c6c2a643 in pn_delivery_map_push () from 
/usr/lib64/libqpid-proton.so.2
#93562 0x0039c6c2c44b in pn_do_transfer () from 
/usr/lib64/libqpid-proton.so.2
#93563 0x0039c6c24385 in pn_dispatch_frame () from 
/usr/lib64/libqpid-proton.so.2
#93564 0x0039c6c2448f in pn_dispatcher_input () from 
/usr/lib64/libqpid-proton.so.2
#93565 0x0039c6c2d68b in pn_input_read_amqp () from 
/usr/lib64/libqpid-proton.so.2
#93566 0x0039c6c3011a in pn_io_layer_input_passthru () from 
/usr/lib64/libqpid-proton.so.2
#93567 0x0039c6c3011a in pn_io_layer_input_passthru () from 
/usr/lib64/libqpid-proton.so.2
#93568 0x0039c6c2d275 in transport_consume () from 
/usr/lib64/libqpid-proton.so.2
#93569 0x0039c6c304cd in pn_transport_process () from 
/usr/lib64/libqpid-proton.so.2
#93570 0x0039c6c3e40c in pn_connector_process () from 
/usr/lib64/libqpid-proton.so.2
#93571 0x7f1060c60460 in process_connector () from 
/home/mick/dispatch/build/libqpid-dispatch.so.0
#93572 0x7f1060c61017 in thread_run () from 
/home/mick/dispatch/build/libqpid-dispatch.so.0
#93573 0x003cf9c07851 in start_thread (arg=0x7f1052bfd700) at 
pthread_create.c:301
#93574 0x003cf98e890d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
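The tens of thousands of repeated pni_map_entry frames above suggest collision probing implemented via recursion, so a long collision cluster in the map translates directly into call-stack depth. A minimal sketch of that failure mode (illustration only, with invented names; this is not proton's actual map code):

```python
# Minimal sketch (not proton's actual code): a recursive open-addressing
# lookup, as the repeated pni_map_entry frames in the backtrace suggest.
# Each colliding slot adds one stack frame, so a long collision cluster
# produces a proportionally deep call stack.

def probe(table, key, index, depth=0):
    """Recursively probe for `key` starting at `index`; return (slot, depth)."""
    if table[index] is None or table[index] == key:
        return index, depth
    return probe(table, key, (index + 1) % len(table), depth + 1)

table = [None] * 1000
# Build a collision cluster: 500 keys that all "hash" to slot 0.
for i in range(500):
    slot, _ = probe(table, f"key{i}", 0)
    table[slot] = f"key{i}"

_, depth = probe(table, "key499", 0)
print(depth)  # 499 recursive frames just to find one key in the cluster
```

Scaled up to a real workload, the same shape would produce a backtrace like the one above, with the recursion finally bottoming out in an allocation call.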




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PROTON-625) Biggest Backtrace Ever!

2014-06-30 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-625:
---

  Description: 
I am saving all my stuff so I can repro on demand.
It doesn't happen every time, but it's about 50%.

--

On one box, I have a dispatch router.
On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 
qpid-messaging-based senders.

Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... &c.

100 messages will be sent to each address.

I start the 5 receivers first.  They start OK.  Dispatch router happy & stable.

Wait a few seconds.

I start the 5 senders, from a bash script.
The first sender is already sending when the 2nd, 3rd, 4th start.

After a few of them start, but before all have finished starting, a few seconds 
into the script, the crash occurs.  ( If they all start up successfully, no 
crash. )

The crash occurs in the dispatch router.

Here is the biggest backtrace ever:

#0  0x003cf9879ad1 in _int_malloc (av=0x7f101c20, bytes=16384) at 
malloc.c:4383
#1  0x003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664
#2  0x0039c6c1650a in pni_map_allocate () from 
/usr/lib64/libqpid-proton.so.2
#3  0x0039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2
#4  0x0039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#5  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#6  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#7  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#8  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#9  0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#10 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#11 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#12 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#13 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
#14 0x0039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
.
.
.
.
#93549 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93550 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93551 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93552 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93553 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93554 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93555 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93556 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93557 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93558 0x0039c6c16c64 in pni_map_entry () from 
/usr/lib64/libqpid-proton.so.2
#93559 0x0039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2
#93560 0x0039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2
#93561 0x0039c6c2a643 in pn_delivery_map_push () from 
/usr/lib64/libqpid-proton.so.2
#93562 0x0039c6c2c44b in pn_do_transfer () from 
/usr/lib64/libqpid-proton.so.2
#93563 0x0039c6c24385 in pn_dispatch_frame () from 
/usr/lib64/libqpid-proton.so.2
#93564 0x0039c6c2448f in pn_dispatcher_input () from 
/usr/lib64/libqpid-proton.so.2
#93565 0x0039c6c2d68b in pn_input_read_amqp () from 
/usr/lib64/libqpid-proton.so.2
#93566 0x0039c6c3011a in pn_io_layer_input_passthru () from 
/usr/lib64/libqpid-proton.so.2
#93567 0x0039c6c3011a in pn_io_layer_input_passthru () from 
/usr/lib64/libqpid-proton.so.2
#93568 0x0039c6c2d275 in transport_consume () from 
/usr/lib64/libqpid-proton.so.2
#93569 0x0039c6c304cd in pn_transport_process () from 
/usr/lib64/libqpid-proton.so.2
#93570 0x0039c6c3e40c in pn_connector_process () from 
/usr/lib64/libqpid-proton.so.2
#93571 0x7f1060c60460 in process_connector () from 
/home/mick/dispatch/build/libqpid-dispatch.so.0
#93572 0x7f1060c61017 in thread_run () from 
/home/mick/dispatch/build/libqpid-dispatch.so.0
#93573 0x003cf9c07851 in start_thread (arg=0x7f1052bfd700) at 
pthread_create.c:301
#93574 0x003cf98e890d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115


  was:
I am saving all my stuff so I can repro on demand.
It doesn't happen every time, but it's about 50%.

--

On one box, I have a dispatch router.
On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 
qpid-messaging-based senders.

Each client will handle 100 addresses, of the form mick/0 ... mick/1 ... &c.

100 messages will be sent to each address.

I start the 5

big improvement in memory usage for proton address scale-up

2014-05-13 Thread Michael Goulish
In my original testing for address scale-up
with a Proton Messenger based client, I was
measuring a memory cost of 115 KB per subscribed
address in the client.

Now, after Rafi's recent changes, I am seeing a 
better than 7x improvement, to just under 16 KB 
per subscribed address.

The downside, of course, is that this will
make it about 7x harder for me to persuade 
my boss to buy me a Really Big Box.
( But I'll think of something... )

Thanks for the memory!




[jira] [Commented] (PROTON-577) CollectorImpl creates a lot of unnecessary garbage

2014-05-05 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989829#comment-13989829
 ] 

michael goulish commented on PROTON-577:


What the engineer *means* to say is "superfluous paraphernalia."

 CollectorImpl creates a lot of unnecessary garbage
 --

 Key: PROTON-577
 URL: https://issues.apache.org/jira/browse/PROTON-577
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-j
Affects Versions: 0.7
Reporter: Rafael H. Schloming
Assignee: Rafael H. Schloming







[jira] [Created] (PROTON-566) crash in pn_transport_set_max_frame

2014-04-16 Thread michael goulish (JIRA)
michael goulish created PROTON-566:
--

 Summary: crash in pn_transport_set_max_frame
 Key: PROTON-566
 URL: https://issues.apache.org/jira/browse/PROTON-566
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.7
 Environment: 3 boxes.  1 with senders, 1 with receivers, and 1 in the 
middle with a single router.
Reporter: michael goulish


Here's what I do:

( I have saved all relevant software so I can repro this. )

  1. On router box, start 1 router.
  2. On receiver box, start 1000 receivers.  With delays in between each group 
of 50, so as to avoid backlog problem.
  3. After receivers are all started, start 1000 senders also with delays.
 Senders start up but do not yet begin sending until I manually signal 
them by touching a file.
  4. A short time after the senders start sending, qdrouter crashes in proton code, 
with this traceback:

  Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config 
./config_1/X.conf'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  1915	    transport->local_max_frame = size;
  

  #0  0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  #1  0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) 
at /home/mick/dispatch/src/server.c:100
  #2  0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at 
/home/mick/dispatch/src/server.c:416
  #3  0x003638c07de3 in start_thread () from /lib64/libpthread.so.0
  #4  0x0036388f616d in clone () from /lib64/libc.so.6
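The segfault above is a NULL transport (transport=0x0) being dereferenced at transport->local_max_frame. For illustration only (hypothetical names; the real fix would live in the C code of dispatch or proton), here is the guard-at-the-API-boundary pattern that turns such a crash into a reportable error:

```python
# Sketch of a defensive boundary check (illustration in Python; not the
# actual proton or dispatch code).  Instead of dereferencing a missing
# transport and segfaulting, the setter reports failure to the caller.

class Transport:
    def __init__(self):
        self.local_max_frame = 0

def set_max_frame(transport, size):
    """Return False instead of crashing when handed a missing transport."""
    if transport is None:
        return False  # caller failed to create or bind a transport
    transport.local_max_frame = size
    return True

assert set_max_frame(None, 65536) is False      # no crash on NULL
t = Transport()
assert set_max_frame(t, 65536) and t.local_max_frame == 65536
```

As the later comments on this issue note, the missing check turned out to be on the dispatch side (thread_process_listeners), not in proton itself.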






[jira] [Updated] (PROTON-566) crash in pn_transport_set_max_frame

2014-04-16 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-566:
---

Description: 
Here's what I do:

( I have saved all relevant software so I can repro this. )

  1. On router box, start 1 router.
  2. On receiver box, start 1000 receivers.  With delays in between each group 
of 50, so as to avoid backlog problem.
  3. After receivers are all started, start 1000 senders also with delays.
 Senders start up but do not yet begin sending until I manually signal 
them by touching a file.
  4. A short time after the senders start sending, qdrouter crashes in proton code, 
with this traceback:

  Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config 
./config_1/X.conf'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  1915	    transport->local_max_frame = size;
  

  #0  0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  #1  0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) 
at /home/mick/dispatch/src/server.c:100
  #2  0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at 
/home/mick/dispatch/src/server.c:416
  #3  0x003638c07de3 in start_thread () from /lib64/libpthread.so.0
  #4  0x0036388f616d in clone () from /lib64/libc.so.6



Looks like this is not a proton problem, but something in dispatch.
I'm closing this and moving it.



  was:
Here's what I do:

( I have saved all relevant software so I can repro this. )

  1. On router box, start 1 router.
  2. On receiver box, start 1000 receivers.  With delays in between each group 
of 50, so as to avoid backlog problem.
  3. After receivers are all started, start 1000 senders also with delays.
 Senders start up but do not yet begin sending until I manually signal 
them by touching a file.
  4. Short time after sender start sending, qdrouter crashes in proton code, 
with this traceback:

  Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config 
./config_1/X.conf'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  1915	    transport->local_max_frame = size;
  

  #0  0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, 
size=65536)
  at /home/mick/proton/proton-c/src/transport/transport.c:1915
  #1  0x7f8ad5cdd4bd in thread_process_listeners (qd_server=0x14f8e10) 
at /home/mick/dispatch/src/server.c:100
  #2  0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at 
/home/mick/dispatch/src/server.c:416
  #3  0x003638c07de3 in start_thread () from /lib64/libpthread.so.0
  #4  0x0036388f616d in clone () from /lib64/libc.so.6



 crash in pn_transport_set_max_frame
 ---

 Key: PROTON-566
 URL: https://issues.apache.org/jira/browse/PROTON-566
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.7
 Environment: 3 boxes.  1 with senders, 1 with receivers, and 1 in the 
 middle with a single router.
Reporter: michael goulish

 Here's what I do:
 ( I have saved all relevant software so I can repro this. )
   1. On router box, start 1 router.
   2. On receiver box, start 1000 receivers.  With delays in between each 
 group of 50, so as to avoid backlog problem.
   3. After receivers are all started, start 1000 senders also with delays.
  Senders start up but do not yet begin sending until I manually signal 
 them by touching a file.
   4. Short time after sender start sending, qdrouter crashes in proton code, 
 with this traceback:
   Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config 
 ./config_1/X.conf'.
   Program terminated with signal 11, Segmentation fault.
   #0  0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, 
 size=65536)
   at /home/mick/proton/proton-c/src/transport/transport.c:1915
   1915	    transport->local_max_frame = size;
   
   #0  0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, 
 size=65536)
   at /home/mick/proton/proton-c/src/transport/transport.c:1915
   #1  0x7f8ad5cdd4bd in thread_process_listeners 
 (qd_server=0x14f8e10) at /home/mick/dispatch/src/server.c:100
   #2  0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at 
 /home/mick/dispatch/src/server.c:416
   #3  0x003638c07de3 in start_thread () from /lib64/libpthread.so.0
   #4  0x0036388f616d in clone () from /lib64/libc.so.6
 Looks like

[jira] [Closed] (PROTON-566) crash in pn_transport_set_max_frame

2014-04-16 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-566.
--

Resolution: Fixed

It looks like this is not a proton issue, but a dispatch issue.
I'm closing this and moving it.

 crash in pn_transport_set_max_frame
 ---

 Key: PROTON-566
 URL: https://issues.apache.org/jira/browse/PROTON-566
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.7
 Environment: 3 boxes.  1 with senders, 1 with receivers, and 1 in the 
 middle with a single router.
Reporter: michael goulish

 Here's what I do:
 ( I have saved all relevant software so I can repro this. )
   1. On router box, start 1 router.
   2. On receiver box, start 1000 receivers.  With delays in between each 
 group of 50, so as to avoid backlog problem.
   3. After receivers are all started, start 1000 senders also with delays.
  Senders start up but do not yet begin sending until I manually signal 
 them by touching a file.
   4. Short time after sender start sending, qdrouter crashes in proton code, 
 with this traceback:
   Core was generated by `/home/mick/dispatch/build/router/qdrouterd --config 
 ./config_1/X.conf'.
   Program terminated with signal 11, Segmentation fault.
   #0  0x7f29d3c0f3c0 in pn_transport_set_max_frame (transport=0x0, 
 size=65536)
   at /home/mick/proton/proton-c/src/transport/transport.c:1915
   1915	    transport->local_max_frame = size;
   
   #0  0x7f8ad5a613c0 in pn_transport_set_max_frame (transport=0x0, 
 size=65536)
   at /home/mick/proton/proton-c/src/transport/transport.c:1915
   #1  0x7f8ad5cdd4bd in thread_process_listeners 
 (qd_server=0x14f8e10) at /home/mick/dispatch/src/server.c:100
   #2  0x7f8ad5cddedb in thread_run (arg=0x1490bf0) at 
 /home/mick/dispatch/src/server.c:416
   #3  0x003638c07de3 in start_thread () from /lib64/libpthread.so.0
   #4  0x0036388f616d in clone () from /lib64/libc.so.6
 Looks like this is not a proton problem, but something in dispatch.
 I'm closing this and moving it





Plan for Improving Engine API Documentation, and Request for Suggestions, Etc.

2014-01-24 Thread Michael Goulish
Hello, Proton List!

I'd like to take a whack at adding to the documentation
of the Proton Engine API, and I'd be happy to hear any
relevant suggestions, advice, opinions, speculation, calumny,
slander, warnings, desires, requests, demands, tall tales,
theories, predictions, personal experiences, or amusing
anecdotes that anyone might wish to share.  
In that order.


My plan is to attempt documentation of the functions that
are mentioned in Rafi's excellent UML diagram:

   https://cwiki.apache.org/confluence/display/qpid/Proton+Architecture

 that are not already documented, & that are in the
public interface ( engine.h )


As I work on those functions, I will write little examples,
minimal and focused on one function at a time wherever
humanly possible.  I will probably start with some kind of
Absolutely Minimal Hello World, then use that as a template
to show off other functions.  These examples are for myself,
but if they seem useful I will show them to you and see if
they ought to be published somewhere.


Here is the initial list of functions.  After this I will
go after whatever other functions are in the engine.h file
that are not totally glaringly obvious, like pn_return_char_ptr().
( I made that one up... )


pn_condition_clear
pn_condition_is_set
pn_connection_reset

pn_delivery_buffered
pn_delivery_clear
pn_delivery_readable
pn_delivery_settled
pn_delivery_updated
pn_delivery_writable

pn_error_clear
pn_error_set

pn_link_available
pn_link_credit
pn_link_get_drain
pn_link_is_receiver
pn_link_is_sender
pn_link_queued
pn_link_drain
pn_link_draining
pn_link_flow
pn_link_recv
pn_link_set_drain
pn_link_drained
pn_link_offered
pn_link_send

pn_session_incoming_bytes
pn_session_outgoing_bytes

pn_terminus_copy

pn_transport_unbind




please review: Ruby Messenger Doc

2013-11-19 Thread Michael Goulish


I wanted to show you this in lovely HTML, but all my attempts thus far
(outside of the usual Ruby framework) have created only travesties of
the proper format: diseased and horrible things, lurching through the
stygian depths of my browser like ...  like...

Ah.  Sorry.  Anyway, so --- so, I'm not doing that.
Just settling for practical ASCII.
(It will show up the usual way in the rdoc-generated HTML, once I
check this stuff in.)

This Ruby text is almost identical to the Python text I sent out
a while ago, just a few tweaks attempting to increase perceived
Rubiosity.

There are a few places where the Ruby API lacked some of the other
bindings' (and C code's) interfaces -- those are noted here, but I
left the text in this doc, anticipating that those APIs will show
up shortly.  (Thanks to mcpierce.)

Those places are noted, with the word NOTE.


=





-
class comments
-
{
  The Messenger class defines a high level interface for
  sending and receiving Messages. Every Messenger contains
  a single logical queue of incoming messages and a single
  logical queue of outgoing messages. The messages in these
  queues may be destined for, or originate from, a variety of
  addresses.
  The messenger interface is single-threaded.  All methods
  except one (interrupt) are intended to be used from within
  the messenger thread.

Sending & Receiving Messages
{
  The Messenger class works in conjunction with the Message class. The
  Message class is a mutable holder of message content.

  The put method copies its Message to the outgoing queue, and may
  send queued messages if it can do so without blocking.  The send
  method blocks until it has sent the requested number of messages,
  or until a timeout interrupts the attempt.

  Similarly, the recv method receives messages into the incoming
  queue, and may block as it attempts to receive the requested number
  of messages,  or until timeout is reached. It may receive fewer
  than the requested number.  The get method pops the
  eldest Message off the incoming queue and copies it into the Message
  object that you supply.  It will not block.

  The blocking attribute allows you to turn off blocking behavior entirely,
  in which case send and recv will do whatever they can without
  blocking, and then return.  You can then look at the number
  of incoming and outgoing messages to see how much outstanding work
  still remains.
}
}





-
method details
-
{
  __init__
  {
Construct a new Messenger with the given name. The name has
global scope. If a NULL name is supplied, a unique
name will be chosen.
  }


  __del__
  {
Destroy the Messenger.  This will close all connections that
are managed by the Messenger.  Call the stop method before
destroying the Messenger.
  }


  start
  {
Currently a no-op placeholder.
For future compatibility, do not send or recv messages
before starting the Messenger.
  }


  stop
  {
Transitions the Messenger to an inactive state. An inactive
Messenger will not send or receive messages from its internal
queues. A Messenger should be stopped before being discarded to
ensure a clean shutdown handshake occurs on any internally managed
connections.
  }


  subscribe
  {
Subscribes the Messenger to messages originating from the
specified source. The source is an address as specified in the
Messenger introduction with the following addition. If the
domain portion of the address begins with the '~' character, the
Messenger will interpret the domain as host/port, bind to it,
and listen for incoming messages. For example ~0.0.0.0,
amqp://~0.0.0.0, and amqps://~0.0.0.0 will all bind to any
local interface and listen for incoming messages with the last
variant only permitting incoming SSL connections.
  }
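The '~' rule described above can be sketched as a small parser (a simplified assumption about the address grammar, not proton's implementation; function names are invented):

```python
from urllib.parse import urlparse

# Sketch (assumption: simplified parsing, not proton's code) of the
# subscribe() address rule above: a '~' at the start of the domain
# portion means "bind and listen here" rather than "connect there".

def parse_source(address):
    # Allow bare "~host" as well as "scheme://~host" forms.
    if "://" not in address:
        address = "amqp://" + address
    parsed = urlparse(address)
    host = parsed.netloc
    listen = host.startswith("~")
    return parsed.scheme, host.lstrip("~"), listen

print(parse_source("~0.0.0.0"))            # ('amqp', '0.0.0.0', True)
print(parse_source("amqps://~0.0.0.0"))    # ('amqps', '0.0.0.0', True)
print(parse_source("amqp://example.com"))  # ('amqp', 'example.com', False)
```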


  put
  {
Places the content contained in the message onto the outgoing
queue of the Messenger. This method will never block; however,
it will send any unblocked Messages in the outgoing
queue immediately and leave any blocked Messages
remaining in the outgoing queue. The send call may be used to
block until the outgoing queue is empty. The outgoing property
may be used to check the depth of the outgoing queue.

When the content in a given Message object is copied to the outgoing
message queue, you may then modify or discard the Message object
without having any impact on the content in the outgoing queue.

This method returns an outgoing tracker for the Message.  The tracker
can be used to determine the delivery status of the 

[jira] [Created] (PROTON-452) Ruby API doesn't have pn_messenger_interrupt()

2013-11-13 Thread michael goulish (JIRA)
michael goulish created PROTON-452:
--

 Summary: Ruby API doesn't have pn_messenger_interrupt()
 Key: PROTON-452
 URL: https://issues.apache.org/jira/browse/PROTON-452
 Project: Qpid Proton
  Issue Type: Bug
Affects Versions: 0.5
Reporter: michael goulish


It looks like the Ruby binding doesn't cover the new-ish C function  
pn_messenger_interrupt().







proposed Python API doc changes -- will check in on All Hallow's Eve

2013-10-24 Thread Michael Goulish

  
  Dear Proton Proponents -- 


Here is my proposed text for Python Messenger API documentation.

If you'd like to comment, please do so within the next week.
I will incorporate feedback and check in the resulting
changes to the codebase at the stroke of midnight, on 
All Hallows Eve.  ( Samhain. )


I have given you the current text for each method and property,
and then my changes.  My changes are either proposed replacements
( NEW_TEXT ) or proposed additions ( ADD_TEXT ).

Mostly, this is highly similar to the C API text, but with
minor changes for Pythonification.


  -- Mick .




Class Comments
{
  CURRENT_TEXT
  {
The Messenger class defines a high level interface for
sending and receiving Messages. Every Messenger contains
a single logical queue of incoming messages and a single
logical queue of outgoing messages. These messages in these
queues may be destined for, or originate from, a variety of
addresses.
  }

  ADD_TEXT
  {
The messenger interface is single-threaded.  All methods
except one ( interrupt ) are intended to be used from within
the messenger thread.
  }
}




Sending & Receiving Messages
{
  CURRENT_TEXT
  {
The L{Messenger} class works in conjuction with the L{Message}
class. The L{Message} class is a mutable holder of message content.
The L{put} method will encode the content in a given L{Message}
object into the outgoing message queue leaving that L{Message}
object free to be modified or discarded without having any impact on
the content in the outgoing queue.

Similarly, the L{get} method will decode the content in the incoming
message queue into the supplied L{Message} object.
  }



  NEW_TEXT
  {
The Messenger class works in conjunction with the Message class. The
Message class is a mutable holder of message content.

The put method copies its message to the outgoing queue, and may
send queued messages if it can do so without blocking.  The send
method blocks until it has sent the requested number of messages,
or until a timeout interrupts the attempt.

Similarly, the recv() method receives messages into the incoming
queue, and may block until it has received the requested number of
messages, or until timeout is reached.  The get method pops the
eldest message off the incoming queue and copies it into the message
object that you supply.  It will not block.
  }


  NOTE
  {
I thought it would be better in this comment to only emphasize
the blocking and non-blocking differences between get/put and
recv/send.  Details about how the arg message is handled are moved
to the comments for specific methods.
  }

}
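The put/get versus send/recv split described above can be modeled with two plain queues (a toy model with invented names, not the real Messenger):

```python
from collections import deque

# Toy model (not the real proton Messenger) of the queue semantics the
# proposed text describes: put() copies into the outgoing queue without
# blocking; get() pops the eldest incoming message without blocking.
# send()/recv() would be the blocking counterparts and are omitted here.

class ToyMessenger:
    def __init__(self):
        self.outgoing = deque()
        self.incoming = deque()

    def put(self, message):
        # Copy, so the caller may modify its object afterwards.
        self.outgoing.append(dict(message))
        return len(self.outgoing)  # stand-in for an outgoing tracker

    def get(self):
        if not self.incoming:
            return None  # stand-in for PN_EOS: nothing to get
        return self.incoming.popleft()  # eldest message first

m = ToyMessenger()
msg = {"body": "hello"}
tracker = m.put(msg)
msg["body"] = "changed"  # does not affect the queued copy
m.incoming.extend([{"body": "a"}, {"body": "b"}])
print(m.outgoing[0]["body"], m.get()["body"])  # hello a
```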




Method Details
{
  __init__
  {
CURRENT_TEXT
{
  Construct a new L{Messenger} with the given name. The name has
  global scope. If a NULL name is supplied, a L{uuid.UUID} based
  name will be chosen.
}

NEW_TEXT
{
  // no change
}
  }


  __del__
  {
CURRENT_TEXT
{
  // none
}

NEW_TEXT
{
  Destroy the messenger.  This will close all connections that
  are managed by the messenger.  Call the stop method before
  destroying the messenger.
}
  }


  start
  {
CURRENT_TEXT
{
  Transitions the L{Messenger} to an active state. A L{Messenger} is
  initially created in an inactive state. When inactive a
  L{Messenger} will not send or receive messages from its internal
  queues. A L{Messenger} must be started before calling L{send} or
  L{recv}.
}

NEW_TEXT
{
  Currently a no-op placeholder.
  For future compatibility, do not send or receive messages
  before starting the messenger.
}
  }


  stop
  {
CURRENT_TEXT
{
  Transitions the L{Messenger} to an inactive state. An inactive
  L{Messenger} will not send or receive messages from its internal
  queues. A L{Messenger} should be stopped before being discarded to
  ensure a clean shutdown handshake occurs on any internally managed
  connections.
}

NEW_TEXT
{
  // no change
}
  }


  subscribe
  {
CURRENT_TEXT
{
  Subscribes the L{Messenger} to messages originating from the
  specified source. The source is an address as specified in the
  L{Messenger} introduction with the following addition. If the
  domain portion of the address begins with the '~' 

[jira] [Comment Edited] (PROTON-260) Messenger Documentation

2013-10-16 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797027#comment-13797027
 ] 

michael goulish edited comment on PROTON-260 at 10/16/13 5:30 PM:
--

rev 152 -- checked in new C API doxygen comments in messenger.h



was (Author: mgoulish):
rev r152 -- checked in new C API doxygen comments in messenger.h


 Messenger Documentation
 ---

 Key: PROTON-260
 URL: https://issues.apache.org/jira/browse/PROTON-260
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Affects Versions: 0.5
Reporter: michael goulish
Assignee: michael goulish

 Write documentation for the Proton Messenger interface, to include:
   introduction
   API explanations
   theory of operation
   example programs
   programming idioms
   tutorials
   quickstarts
   troubleshooting
 Documents should use MarkDown markup language.





[jira] [Commented] (PROTON-260) Messenger Documentation

2013-10-16 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797027#comment-13797027
 ] 

michael goulish commented on PROTON-260:


rev r152 -- checked in new C API doxygen comments in messenger.h


 Messenger Documentation
 ---

 Key: PROTON-260
 URL: https://issues.apache.org/jira/browse/PROTON-260
 Project: Qpid Proton
  Issue Type: Improvement
  Components: proton-c
Affects Versions: 0.5
Reporter: michael goulish
Assignee: michael goulish

 Write documentation for the Proton Messenger interface, to include:
   introduction
   API explanations
   theory of operation
   example programs
   programming idioms
   tutorials
   quickstarts
   troubleshooting
 Documents should use MarkDown markup language.





please take a look at new C API descriptions

2013-09-23 Thread Michael Goulish
These are expanded descriptions that I'd like to add to the C API
documentation.  ( These are the descriptions only -- where the
current info already explains the parameters and returns values
I will just leave those in place. )



Please take a look to see

  1. whether the description matches your understanding
 of what the functions do, and how they fit together.


  2. whether you, as a developer using this code, would
 find the description useful, sufficient, understandable,
 etc.


Question 2 is still very valuable even if you have no
idea about Question 1.



This is not yet a complete list.  Some of the functions are
clear already, and some I have no clue about as yet.



Here they are:




pn_messenger_accept
{
  Signal the sender that you have received and have acted on the message
  pointed to by the tracker.  If the PN_CUMULATIVE flag is set, all
  messages prior to the tracker will also be accepted, back to the
  beginning of your incoming window.
}
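The cumulative-accept behavior can be sketched like this (an illustrative model of the incoming window only; trackers here are just integers, and the real C API works on opaque tracker handles):

```python
# Sketch of the cumulative-accept semantics described above (illustration
# only; pn_messenger_accept itself is C, and this models just the window).
# With the cumulative flag, accepting one tracker settles it and every
# earlier unsettled message back to the start of the incoming window.

def accept(window, tracker, cumulative=False):
    """window: list of (tracker, state) pairs, oldest first."""
    for i, (t, state) in enumerate(window):
        if (cumulative and t <= tracker) or t == tracker:
            if state == "pending":
                window[i] = (t, "accepted")

window = [(1, "pending"), (2, "pending"), (3, "pending")]
accept(window, 2, cumulative=True)
print(window)  # [(1, 'accepted'), (2, 'accepted'), (3, 'pending')]
```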



pn_messenger_errno
{
  Return the code for the most recent error.
  Initialized to zero at messenger creation.
  Error numbers are sticky, i.e. they are not reset to 0
  at the end of successful API calls.

  (NOTE! This is the only description that is intentionally false.
   There *is* one API call that resets errno to 0 -- but I think
   it shouldn't, and I will complain about it Real Soon Now.)
}



pn_messenger_error
{
  Return a text description of the most recent error.
  Initialized to null at messenger creation.
  Error text is sticky, i.e. it is not reset to null
  at the end of successful API calls.
}



pn_messenger_get
{
  Pop the oldest message off your incoming message queue,
  and copy it into the given message structure.
  If the given pointer to a message structure is NULL,
  the popped message is discarded.
  Returns PN_EOS if there are no messages to get.
  Returns an error code only if there is a problem in
  decoding the message.
}



pn_messenger_incoming_subscription
{
  Returns a pointer to the subscription of the message returned by the
  most recent call to pn_messenger_get(), or NULL if pn_messenger_get()
  has never been called.
}



pn_messenger_incoming_tracker
{
  Returns a tracker for the message most recently fetched by
  pn_messenger_get().  The tracker allows you to accept or reject its
  message, or its message plus all prior messages that are still within
  your incoming window.
}



pn_messenger_outgoing_tracker
{
  Returns a tracker for the outgoing message most recently given
  to pn_messenger_put.  Use this tracker with pn_messenger_status
  to determine the delivery status of the message, as long as the
  message is still within your outgoing window.
}



pn_messenger_put
{
  Puts the message onto the messenger's outgoing queue.
  The message may also be sent if transmission would not cause
  blocking.  This call will not block.
}



pn_messenger_reject
{
  Rejects the message indicated by the tracker.  If the PN_CUMULATIVE
  flag is used this call will also reject all prior messages that
  have not already been settled.  The semantics of message rejection
  are application-specific.  If messages represent work requests,
  then rejection would leave the sender free to try another receiver,
  without fear of having the same task done twice.
}



pn_messenger_rewrite
{
  Similar to pn_messenger_route(), except that the destination of
  the message is determined before the message address is rewritten.
  If a message has an outgoing address of amqp://0.0.0.0:5678, and a
  rewriting rule that changes its outgoing address to foo, it will still
  arrive at the peer that is listening on amqp://0.0.0.0:5678, but when
  it arrives there, its outgoing address will have been changed to foo.
}



pn_messenger_send
{
  If blocking has been set with pn_messenger_set_blocking, this call
  will block until n messages have been sent.  A value of -1 for n means
  all messages in the outgoing queue.

  In addition, if a nonzero size has been set for the outgoing window,
  this call will block until all messages within that window have
  been received.  Any blocking will end upon timeout, if one has been
  set by pn_messenger_timeout.

  If blocking has not been set, this call will stop transmitting
  messages when further transmission would require blocking, or when
  the outgoing queue is empty, or when n messages have been sent.
}



pn_messenger_set_blocking
{
  Enable or disable blocking behavior during calls to
  pn_messenger_send and pn_messenger_recv.
}



pn_messenger_set_incoming_window
{
  The size of your incoming window limits the number of messages
  that can be accepted or rejected using trackers.  Messages do
  not enter this window when they have been received (pn_messenger_recv)
  onto your incoming queue.  Messages enter this window only when you
  take them into your application using pn_messenger_get.
  If your incoming window size is N, and you get N+1 messages without
  explicitly accepting or 

another proton error question

2013-09-17 Thread Michael Goulish

OK, so now I understand that we are using the standard errno philosophy in 
which:

  errno always contains the most recent error from anywhere within messenger 
code.
  ( and is initialized to 0 on creation of the messenger struct. )



But another part of this philosophy is:

  No system call (in our case, translate that to 'Messenger API call' or maybe 
'Messenger function') ever sets
errno to zero.


Yet, I am seeing it get set to zero, sometimes.  ( The only fn I have actually 
observed
doing this so far is  pn_output_write_amqp(), but there may be others )

Is there a reason why we depart from the standard errno philosophy here?
Or is it just an oversight?


If it's an oversight, I'd like to put something like 

  assert(code);

as the first line in pn_error_set().



If it's not an oversight, I'd like to know the reasoning so I can document it.




question about proton error philosophy

2013-09-16 Thread Michael Goulish

I was expecting errno inside the messenger to be reset to 0 at the end of any 
successful API call.

It isn't: instead it looks like the idea is that errno preserves the most 
recent error that happened, regardless of how long ago that might be.

Is this intentional?

I am having a hard time understanding why we would not want errno to always 
represent the messenger state as of the completion of the most recent API call.


I would be happy to submit a patch to make it work this way, and see what 
people think - but not if I am merely exhibiting my own philosophical 
ignorance here.




Re: question about proton error philosophy

2013-09-16 Thread Michael Goulish

No, you're right.

"errno is never set to zero by any system call or library function"
( That's from the Linux documentation. )

OK, I was just philosophically challenged.
I think what confused me was the line in the current Proton C doc (about errno) 
that says "an error code or zero if there is no error."
I'll just remove that line.

OK, I withdraw the question.


( I still don't like this philosophy, but the whole world is using it, and the 
whole world is bigger than I am... )


  

- Original Message -
Do other APIs reset the errno?  I could have sworn they didn't.

On Mon, Sep 16, 2013 at 12:01 PM, Michael Goulish mgoul...@redhat.com wrote:

 I was expecting errno inside the messenger to be reset to 0 at the end of any 
 successful API call.

 It isn't: instead it looks like the idea is that errno preserves the most 
 recent error that happened, regardless of how long ago that might be.

 Is this intentional?

 I am having a hard time understanding why we would not want errno to always 
 represent the messenger state as of the completion of the most recent API 
 call.


 I would be happy to submit a patch to make it work this way, and see what 
 people think - but not if I am merely exhibiting my own philosophical 
 ignorance here.





-- 
Hiram Chirino

Engineering | Red Hat, Inc.

hchir...@redhat.com | fusesource.com | redhat.com

skype: hiramchirino | twitter: @hiramchirino

blog: Hiram Chirino's Bit Mojo


[MESSENGER] multilingual docs - reviews welcome

2013-06-14 Thread Michael Goulish
  What I'm doing with Messenger Tutorial Docs -- Reviews Welcome
  ==
  {

1. How it works
--
{
  There will be a custom cmake target, probably called 'docs'.
  When you 'make' this target, it runs a bunch of simple Messenger
  tests in all supported languages.

  The purpose of these tests is not the same as the other tests
  that already exist; they are not trying to prove correctness
  of all the Messenger features.  Their purpose is to provide
  the documents with good code snippets -- that do the same thing
  in each language.

  If any of these tests fail, we stop right there and the docs
  do not get made.

  After the tests run successfully, the code snippets are all
  extracted.

  The docs I write are kind of ... templates for docs.  Each doc
  gets expanded into L distinct docs, where L is the number of
  languages that Messenger supports.  So you get the cross-product
  of docs and languages.

  Each doc has little markers in it, like
 pmdocproc 13
  that tell the processor which snippet to put there.

  Each program has little snippet-markers like:

/* pmdocproc 13 c */
code ( goes, here );
/* pmdocproc 13 end */

  to tell the processor where to get the snippet from, and what
  language it's in.

  The markers that are in the code are always comments -- however that
  language makes its comments -- and must be on a line by themselves.
}





2. Why it works that way

{
  We get two cool benefits out of this:

* You can see the whole doc tree in your favorite language,
  and compare languages.

* The code snippets will never go out of date.  If they
  quit working, the docs don't get built.
}



3. Where I am Now
--
{
  * the python version of the doc-maker is working

  * I know how to integrate with cmake, and am doing that now.

  * I'll be away next week, but I would like to check stuff in
shortly after returning.
}



4. Feedback I would like
-
{
  Anything you are inspired to volunteer, about any aspect of this.
  I am attaching the Python doc maker below just in case anyone wants
  to look at it.   ( I was so uncomfortable with Python that I prototyped
  the project in C, so I don't know how *pythonic* my code is )

  Thanks in advance!

  And please send any comments to the list.
}



5. The Python doc processor
--
{
  #! /usr/bin/python

  import os
  import subprocess
  import shutil





  #
  #  First, run all the tests for each language.
  #  We will not create any documents if any of
  #  these tests fail.
  #
  def run_examples(languages):
      saved_dir = os.getcwd()

      for language in languages:
          test_dir = './doc_examples/' + language
          os.chdir ( test_dir )
          subprocess.check_call ( "./run_all" )
          print "-------------------------------------------------"
          print "Tests in", test_dir, "were successful."
          print "-------------------------------------------------"
          os.chdir ( saved_dir )

      print "\n================================================="
      print "  All language example tests were successful."
      print "=================================================\n"





  #
  # Make new output dirs for each language.
  # These will hold final docs.
  #
  def make_output_dirs ( output_dir, html_dir, languages ):
      if os.path.exists ( output_dir ):
          shutil.rmtree ( output_dir )
      os.mkdir ( output_dir )

      if os.path.exists ( html_dir ):
          shutil.rmtree ( html_dir )
      os.mkdir ( html_dir )

      for language in languages:
          os.mkdir ( output_dir + '/' + language )
          os.mkdir ( html_dir   + '/' + language )





  #
  #  For each example program name, there
  #  should be an instance of it in each
  #  example/language directory.
  #  I.e., example program foo should exist as
  # doc_examples/c/foo.c
  # doc_examples/rb/foo.rb
  # doc_examples/py/foo.py
  #
  def make_example_file_names ( example_dir, languages, example_names, example_file_names ):
      for language in languages:

Re: message disposition question

2013-04-23 Thread Michael Goulish
Oh!  Oh!  Let me try!   (see inline)



- Original Message -
 On 04/18/2013 06:21 AM, Rafael Schloming wrote:
  I spoke a bit too soon in my first reply. The tracking windows are
  *supposed* to be measured from the point where the tracker is first
  assigned, so from when you call put or get. This means that it shouldn't
  matter how many times you call recv or how much credit recv gives out, the
  only thing that matters is whether you've called get() more than WINDOW
  times. That should be fine as calling get() is very much in your control.
  Now the reason I was confused yesterday is that from looking at the code it
  appears that due to a recent commit, incoming trackers are actually
  assigned earlier than they should be. This has not been the case for any
  released code, however, only for a short time quite recently on trunk.
 
  --Rafael
 
 
  On Wed, Apr 17, 2013 at 2:26 PM, Rafael Schloming r...@alum.mit.edu wrote:
 
  That's a good question and now that you mention it nothing prevents it.
  That was an intentional choice when the feature was added, and it wasn't a
  problem at the time because we didn't have recv(-1). This meant that you
  were always asking for an explicit amount and if you asked for more than
  your window, you were (hopefully knowingly) asking for trouble. With
  recv(-1), however, you are no longer explicitly controlling the credit
  window so this could be a problem. One possibility might be to define
  incoming/outgoing window sizes of -1 to allow for unlimited sizes.
 
  --Rafael
 
 
 
  On Wed, Apr 17, 2013 at 1:32 PM, Michael Goulish
  mgoul...@redhat.comwrote:
 
 
  ( a question inspired by a question from a reviewer of
  one of my docs... )
 
  If you set an incoming message window of a certain size,
  and if Messenger can receive messages even when you, i.e.
  call send() - - -  what's to stop some messages from
  falling off the edge of that window, and thus getting
  accepted-by-default, before your app code ever gets a
  chance to make a real decision about the message's disposition ?
 
 
 
 
 
 I'm still not clear on how this answer the original question: If messages can
 be
 received in the background when I call send() or other functions, and that
 can
 cause messages to fall out of the received window, then how do I ensure that
 I
 get a chance to see and ack/reject every message? I have no control over the
 background message delivery.


First, just to be clear, it's not in the background in the sense of
a separate thread, it's just ... 'unexpectedly'.

But the main thing is .. .that is not a *received* window.
If you set a window size of N, that window exists only
relative to the position of the first message for which you 
create a tracker.

Creating a tracker is done not with recv() but with get(),
and the window only exists in the get-space.  Not in the 
recv space.

So the only way to make a message fall off the window is to:

  0. define incoming window size N

  1. call get()

  2. make a tracker  ( will track starting with most recent message got() )

  3. call get() N more times, but don't bother to dispose of the messages
     one way or the other.

  4. You just had 1 message fall off the edge of your window, and
 get accepted by default.
   
  5. If you keep calling get() now, you will have 1 new message fall 
 over the edge for each call.


 
 A related question: how can I flow control if I'm getting too many messages?
 The
 naive answer is stop calling recv() but if messages can also be received
 when
 I call send() then I have no way to limit the messages that pile up, or
 worse:
 that are dropped off my receive window and into oblivion.


Don't know.  Brain tired.
I think you can't  -- but at least they won't fall
out of the window.


 
 Cheers,
 Alan.
 


[jira] [Created] (PROTON-300) qpidd --help should show sasl config path default

2013-04-19 Thread michael goulish (JIRA)
michael goulish created PROTON-300:
--

 Summary: qpidd --help should show sasl config path default
 Key: PROTON-300
 URL: https://issues.apache.org/jira/browse/PROTON-300
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish
Priority: Minor


qpidd --help does not show the sasl config path default, which is /etc/sasl2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PROTON-300) qpidd --help should show sasl config path default

2013-04-19 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-300:
---

Assignee: michael goulish

 qpidd --help should show sasl config path default
 -

 Key: PROTON-300
 URL: https://issues.apache.org/jira/browse/PROTON-300
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish
Assignee: michael goulish
Priority: Minor

 qpidd --help does not show the sasl config path default, which is /etc/sasl2  
   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


message disposition question

2013-04-17 Thread Michael Goulish

( a question inspired by a question from a reviewer of 
one of my docs... )

If you set an incoming message window of a certain size,
and if Messenger can receive messages even when you, i.e.
call send() - - -  what's to stop some messages from 
falling off the edge of that window, and thus getting 
accepted-by-default, before your app code ever gets a
chance to make a real decision about the message's disposition ?




[jira] [Created] (PROTON-295) recv(-1) + incoming_window == bad

2013-04-17 Thread michael goulish (JIRA)
michael goulish created PROTON-295:
--

 Summary: recv(-1) + incoming_window == bad
 Key: PROTON-295
 URL: https://issues.apache.org/jira/browse/PROTON-295
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Affects Versions: 0.4
Reporter: michael goulish


Use of recv(-1) could receive enough messages that some would exceed the 
incoming window size and be automatically accepted -- with app logic never 
getting a say in the matter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


messenger routing suggestion

2013-04-09 Thread Michael Goulish
The idea I put forward for a change to pn_messenger_route()
may have seemed not very well motivated.

Here is a more complete example, and a more complete
suggestion.


Example
===

All of these nodes are Messenger nodes.


1.  Sender
You have a node that's a sender.
It is sending to only abstract addresses.
It uses its routing table to map all its abstract
addresses to the same receiver -- because that receiver
is where the system centralizes knowledge about
changing network conditions.

Sender's routing table
--

  COLOSSUS -- 1.2.3.4:
  GUARDIAN -- 1.2.3.4:
  HAL9000  -- 1.2.3.4:
  SKYNET   -- 1.2.3.4:



2. Router
   You have a router node that is listening on 1.2.3.4: .
   It receives and then forwards messages.
   It can do this because the messages it receives from
   Sender still have their untranslated addresses.

   It has this routing table
   ---
  COLOSSUS -- 5.6.7.8:1234
  COLOSSUS -- 5.6.7.9:1234
  GUARDIAN -- 5.6.7.23:3456
  HAL9000  -- null
  SKYNET   -- 5.6.7.99:6969



3. Adapting to changing conditions
   The 'router' node can change its address translation
   table based on messages that it receives from other parts
   of the network.  For example, maybe it does load-balancing
   this way.

   But in general -- it needs to change its translation table
   at run time because conditions in the network are changing.
   It is the node that encapsulates knowledge about those changes
   so that the rest of our nodes do not need to worry about it.
   They just send to COLOSSUS or whatever.

   This implies that we should be able to change routing
   dynamically.



4. Fanout
   Note that there are two translations for COLOSSUS.
   We send to both of them.  ( This is a change. )
   This is how we implement fanout with the address translation.



5. Load and Store
   Since the translation table can change due to changing
   network conditions, the Router node should be able to
   store its table to a file, and load from that file.
   The information that it has learned during operation
   is not lost.  Or it can use this facility to fork off
   another copy of itself.



6. API changes
   It seems to me that the address translation functionality
   is potentially very powerful, with a few teensy changes.

   Here they are:



   /*=
 At send time, the messenger examines its translation
 table, and sends a copy of the message to each matching
 address.  ( this is a change )

 The address stored in the message is not changed.
 ( this is true now. )
   =*/

   /*---
 Append the given translation to the list.
   ---*/
   pn_messenger_route_add ( pn_messenger_t *messenger,
const char *pattern,
const char *address );



   /*-
 If the given pattern already exists in the list,
 replace its first occurrence with this translation.
 Otherwise add this translation to the list.
   -*/
   pn_messenger_route_replace ( pn_messenger_t *messenger,
const char *pattern,
const char *address );


   /*-
 Delete the given translation from the list.
 ( Else, NOOP. )
   -*/
   pn_messenger_route_delete ( pn_messenger_t *messenger,
   const char *pattern,
   const char *address );


   /*-
 Delete from the list all translations with this
 pattern.
   -*/
   pn_messenger_route_delete_pattern ( pn_messenger_t *messenger,
   const char *pattern );


   /*-
 Clear the translation table.
   -*/
   pn_messenger_route_clear_table ( pn_messenger_t *messenger );



   /*-
 Load from the given fp
   -*/
   pn_messenger_route_load_table ( pn_messenger_t *messenger,
   FILE * fp );



   /*-
 Store to the given fp
   -*/
   pn_messenger_route_store_table ( pn_messenger_t *messenger,
   FILE * fp );


possible cool change to pn_messenger_route()

2013-04-08 Thread Michael Goulish

While working on docs, I was getting excited about something cool that I could 
do with pn_messenger_route() 
And then I realized that I couldn't.

What would you think about allowing replacement of an old route by a new route 
with the same pattern ?

It seems like this would allow some cool, adaptive behavior in Proton networks.

I just coded and tested a simple case successfully -- sending 2 messages to an 
abstract address, and having them go to 2 different receivers because a new 
route got set in between.

Would this behavior cause any problems that would outweigh the coolness?


Code seems pretty straightforward - - - -


int pn_messenger_route(pn_messenger_t *messenger, const char *pattern, const char *address)
{
  if (strlen(pattern) > PN_MAX_PATTERN || strlen(address) > PN_MAX_ROUTE) {
    return PN_ERR;
  }
  pn_route_t *new_route = (pn_route_t *) malloc(sizeof(pn_route_t));
  if (!new_route) return PN_ERR;

  strcpy(new_route->pattern, pattern);
  strcpy(new_route->address, address);
  new_route->next = NULL;

  /* The list is empty. */
  if (! messenger->routes) {
    messenger->routes = new_route;
    return 0;
  }

  pn_route_t *old;

  /* The route to be replaced is first on the list. */
  if (! strcmp(messenger->routes->pattern, new_route->pattern)) {
    old = messenger->routes;
    new_route->next = old->next;
    messenger->routes = new_route;
    free((char *) old);
    return 0;
  }

  pn_route_t *route = messenger->routes;

  /* The route to be replaced is somewhere down the list, or not there. */
  while (1) {
    /* No route in list had same pattern. */
    if (! route->next) {
      route->next = new_route;
      return 0;
    }
    /* Bingo! */
    if (! strcmp(route->next->pattern, new_route->pattern)) {
      old = route->next;
      new_route->next = old->next;
      route->next = new_route;
      free((char *) old);
      return 0;
    }

    route = route->next;
  }

  return 0;
}


problem with multiple senders

2013-04-04 Thread Michael Goulish


  Is this a bug, or am I  Doing  Something  Wrong ?



Scenario
{
  My sender sends a single message, and hopes to see
  that the receiver has accepted it.

  I launch 3 copies of the sender very close together-- 
  they all talk to the same address.   

  My receiver receives in a loop, accepts every message
  that it receives.
}




Result
{
  Sometimes my receiver gets 1 of the 3 messages.
  Usually it gets 2.
  It never gets all 3.

  The 3rd sender hangs in pn_messenger_send().

  While the 3rd sender is hanging in send(), the receiver
  is patiently waiting in recv().
}






Sender Code 

/*
  Launch 3 of these from a script like so:
  ./sender  
  ./sender  
  ./sender  
*/


#include <proton/message.h>
#include <proton/messenger.h>

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>    /* for getpid() */


char *
status_2_str ( pn_status_t status )
{
  switch ( status )
  {
    case PN_STATUS_UNKNOWN:
      return "unknown";

    case PN_STATUS_PENDING:
      return "pending";

    case PN_STATUS_ACCEPTED:
      return "accepted";

    case PN_STATUS_REJECTED:
      return "rejected";

    default:
      return "bad value";
  }
}



pid_t my_pid = 0;


void
check ( char * label, int result )
{
  fprintf ( stderr, "%d %s result: %d\n", my_pid, label, result );
}



int
main(int argc, char** argv)
{
  int c;
  char addr [ 1000 ];
  char msgtext [ 100 ];
  pn_message_t   * message;
  pn_messenger_t * messenger;
  pn_data_t  * body;
  pn_tracker_t tracker;
  pn_status_t  status;
  int  result;

  my_pid = getpid();

  sprintf ( addr, "amqp://0.0.0.0:%s", argv[1] );


  message = pn_message ( );
  messenger = pn_messenger ( NULL );
  pn_messenger_start ( messenger ) ;
  pn_messenger_set_outgoing_window ( messenger, 1 );


  pn_message_set_address ( message, addr );
  body = pn_message_body ( message );


  sprintf ( msgtext, "Message from %d", getpid() );
  pn_data_put_string ( body, pn_bytes ( strlen ( msgtext ), msgtext ));
  pn_messenger_put ( messenger, message );
  tracker = pn_messenger_outgoing_tracker ( messenger );
  pn_messenger_send ( messenger );


  status = pn_messenger_status ( messenger, tracker );
  fprintf ( stderr, "status : %s\n", status_2_str(status) );


  pn_messenger_stop ( messenger );
  pn_messenger_free ( messenger );
  pn_message_free ( message );

  return 0;
}




Receiver Code 

/*

  Launch like this:
  ./receiver 
*/

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#include <proton/message.h>
#include <proton/messenger.h>



#define BUFSIZE 1024



int
main(int argc, char** argv)
{
  size_t bufsize = BUFSIZE;
  char buffer [ BUFSIZE ];
  char addr [ 1000 ];
  pn_message_t   * message;
  pn_messenger_t * messenger;
  pn_data_t  * body;
  pn_tracker_t tracker;


  sprintf ( addr, "amqp://~0.0.0.0:%s", argv[1] );

  message = pn_message();
  messenger = pn_messenger ( NULL );

  pn_messenger_start(messenger);
  pn_messenger_subscribe ( messenger, addr );
  pn_messenger_set_incoming_window ( messenger, 5 );

  /*-
Receive and accept the message.
  -*/
  while ( 1 )
  {
    fprintf ( stderr, "receiving...\n" );
    pn_messenger_recv ( messenger, 3 );

    while ( pn_messenger_incoming ( messenger ) > 0 )
    {
      fprintf ( stderr, "getting message...\n" );
      pn_messenger_get ( messenger, message );
      tracker = pn_messenger_incoming_tracker ( messenger );
      pn_messenger_accept ( messenger, tracker, 0 );
      body = pn_message_body ( message );
      pn_data_format ( body, buffer, &bufsize );
      fprintf ( stdout, "Address: %s\n", pn_message_get_address ( message ) );
      fprintf ( stdout, "Content: %s\n", buffer );
}
  }

  pn_messenger_stop(messenger);
  pn_messenger_free(messenger);

  return 0;
}




Re: problem with multiple senders

2013-04-04 Thread Michael Goulish
] - CLOSE @24 [null]
[0x24fae10:0] - EOS
Closed localhost:42468
[0x2538e40:1] - DETACH @22 [1, true, null]
[0x2538e40:0] - CLOSE @24 [null]
[0x2538e40:0] - EOS
[0x2538e40:1] - DETACH @22 [1, true, null]
[0x2538e40:0] - CLOSE @24 [null]
[0x2538e40:0] - EOS
Closed localhost:42469

- end trace -










- Original Message -
Any clues from a trace of the receiver?

$ PN_TRACE_FRM=1 ./receiver 

-Ted

On 04/04/2013 02:09 PM, Michael Goulish wrote:

Is this a bug, or am I  Doing  Something  Wrong ?



 Scenario
 {
My sender sends a single message, and hopes to see
that the receiver has accepted it.

I launch 3 copies of the sender very close together--
they all talk to the same address.

My receiver receives in a loop, accepts every message
that it receives.
 }




 Result
 {
Sometimes my receiver gets 1 of the 3 messages.
Usually it gets 2.
It never gets all 3.

The 3rd sender hangs in pn_messenger_send().

While the 3rd sender is hanging in send(), the receiver
is patiently waiting in recv().
 }











Re: problem with multiple senders

2013-04-04 Thread Michael Goulish
Yes!   -1 did it.  Thanks!



- Original Message -
 I think this is the same bug we've seen before with passing fixed
 (positive) credit limits to recv. The implementation isn't smart enough to
 pay attention to who actually is offering messages when it allocates
 credit, and so it ends up giving out all of its credit to a sender that has
 no use for it instead of to the senders that are blocked. I suspect if you
 replace your 3 with -1 in your call to pn_messenger_recv, then you will see
 the hang go away.
 
 --Rafael
 
 
 On Thu, Apr 4, 2013 at 3:06 PM, Michael Goulish mgoul...@redhat.com wrote:
 
  OK, I'm looking at trace from receiver, and I thought
  I would post it here so I can't be accused of hogging
  all the fun for myself.
 
  ( Remember, three senders all send to same receiver address,
only two get 'accepted' replies.  Last sender ends up hanging in send(),
while receiver (in infinite loop) blocks on recv(). )
 
  I have marked the lines of application output with APPLICATION OUTPUT:
 
 
  Note:
 
  I see these 3 lines:
Accepted from localhost:42468
Accepted from localhost:42469
Accepted from localhost:42470
 
  But only two get closed:
Closed localhost:42468
Closed localhost:42469
 
 
 
 
  - begin trace ---
  Listening on 0.0.0.0:
  APPLICATION OUTPUT:   receiving...
  Accepted from localhost:42468
  Accepted from localhost:42469
  - SASL
  [0x25013c0:0] - SASL-INIT @65 [:ANONYMOUS, b]
  [0x25013c0:0] - SASL-MECHANISMS @64 [@PN_SYMBOL[:ANONYMOUS]]
  [0x25013c0:0] - SASL-OUTCOME @68 [0]
  - SASL
  - AMQP
  [0x24fae10:0] - OPEN @16 [a03b1f27-5053-47f0-ae85-c543782480b5, null,
  null, null, null, null, null, null, null]
  Accepted from localhost:42470
  - SASL
  [0x253f490:0] - SASL-INIT @65 [:ANONYMOUS, b]
  [0x253f490:0] - SASL-MECHANISMS @64 [@PN_SYMBOL[:ANONYMOUS]]
  [0x253f490:0] - SASL-OUTCOME @68 [0]
  - SASL
  - AMQP
  [0x2538e40:0] - OPEN @16 [a03b1f27-5053-47f0-ae85-c543782480b5, null,
  null, null, null, null, null, null, null]
  - AMQP
  [0x24fae10:0] - OPEN @16 [1425753e-bda0-48af-a60f-b8a23c0933d3,
  0.0.0.0, null, null, null, null, null, null, null]
  [0x24fae10:1] - BEGIN @17 [null, 0, 1024, 1024]
  [0x24fae10:1] - ATTACH @18 [sender-xxx, 1, false, null, null, @40
  [null, 0, null, 0, false, null, null, null, null, null, null], @41 [null,
  0, null, 0, false, null, null], null, null, 0]
  [0x24fae10:1] - BEGIN @17 [1, 0, 1024, 1024]
  [0x24fae10:1] - ATTACH @18 [sender-xxx, 1, true, null, null, null,
  null, null, null, 0]
  [0x24fae10:1] - FLOW @19 [0, 1024, 0, 1024, 1, 0, 3, null, false]
  - SASL
  [0x2563350:0] - SASL-INIT @65 [:ANONYMOUS, b]
  [0x2563350:0] - SASL-MECHANISMS @64 [@PN_SYMBOL[:ANONYMOUS]]
  [0x2563350:0] - SASL-OUTCOME @68 [0]
  - SASL
  - AMQP
  [0x255cd00:0] - OPEN @16 [a03b1f27-5053-47f0-ae85-c543782480b5, null,
  null, null, null, null, null, null, null]
  - AMQP
  [0x2538e40:0] - OPEN @16 [35806640-4a26-47a2-a6e2-7fe7505938cf,
  0.0.0.0, null, null, null, null, null, null, null]
  [0x2538e40:1] - BEGIN @17 [null, 0, 1024, 1024]
  [0x2538e40:1] - ATTACH @18 [sender-xxx, 1, false, null, null, @40
  [null, 0, null, 0, false, null, null, null, null, null, null], @41 [null,
  0, null, 0, false, null, null], null, null, 0]
  [0x2538e40:1] - BEGIN @17 [1, 0, 1024, 1024]
  [0x2538e40:1] - ATTACH @18 [sender-xxx, 1, true, null, null, null,
  null, null, null, 0]
  [0x2538e40:1] - FLOW @19 [0, 1024, 0, 1024, 1, 0, 0, null, false]
  - AMQP
  [0x255cd00:0] - OPEN @16 [c8b87edf-6971-4d73-9790-e6f44772cebb,
  0.0.0.0, null, null, null, null, null, null, null]
  [0x255cd00:1] - BEGIN @17 [null, 0, 1024, 1024]
  [0x255cd00:1] - ATTACH @18 [sender-xxx, 1, false, null, null, @40
  [null, 0, null, 0, false, null, null, null, null, null, null], @41 [null,
  0, null, 0, false, null, null], null, null, 0]
  [0x255cd00:1] - BEGIN @17 [1, 0, 1024, 1024]
  [0x255cd00:1] - ATTACH @18 [sender-xxx, 1, true, null, null, null,
  null, null, null, 0]
  [0x255cd00:1] - FLOW @19 [0, 1024, 0, 1024, 1, 0, 0, null, false]
  [0x24fae10:1] - TRANSFER @20 [1, 0, b\x00\x00\x00\x00\x00\x00\x00\x00,
  0, false, false] (148) \x00Sp\xd0\x00\x00\x00\x0b\x00\x00\x00\x05BP\x04@BR
  \x00\x00Ss\xd0\x00\x00\x00b\x00\x00\x00\x0d@@\xa1\x13amqp://0.0.0.0:
  @\xa1+amqp://1425753e-bda0-48af-a60f-b8a23c0933d3@
  @@\x83\x00\x00\x00\x00\x00\x00\x00\x00\x83\x00\x00\x00\x00\x00\x00\x00\x00@R
  \x00@\x00Sw\xa1\x12Message from 22470
  APPLICATION OUTPUT:   getting message...
  APPLICATION OUTPUT:   Address: amqp://0.0.0.0:
  APPLICATION OUTPUT:   Content: Message from 22470
  APPLICATION OUTPUT:   receiving...
  [0x24fae10:1] - DISPOSITION @21 [true, 0, 0, false, @36 []]
  [0x2538e40:1] - FLOW @19 [0, 1024, 0, 1024, 1, 0, 1, null, false]
  [0x24fae10:1] - DISPOSITION @21 [false, 0, 0, false, @36 []]
  [0x2538e40:1] - TRANSFER @20 [1, 0, b\x00\x00\x00\x00\x00\x00\x00\x00,
  0, false
