Re: topology: all tests pass!

2013-02-19 Thread Rafael Schloming
Cool, can you rerun this with the updated version of the proton-200 -1
patch? (https://reviews.apache.org/r/9503/)

Also, would it make sense to set these tests up to run as part of the new
ctest stuff?

--Rafael

On Tue, Feb 19, 2013 at 2:05 PM, Michael Goulish wrote:

>
> Green across the board.
>
> Same program running in 1, 2, or 3 instances.
> 1 messenger in each process.  No recompile
> needed between tests -- only the command line args
> change.
>
> All possible combinations of 3 or fewer nodes,
> with 0, 1, or 2 (bi-directional) links between each
> pair.  Including single node, with self-loop.
>
> To pass a test, I have to see that all nodes are receiving
> messages simultaneously, and when one node is receiving from
> 2 senders, its incoming messages should be interleaved ( i.e.
> *not* all the messages from node A, followed by all the messages
> from node B ) and it should be getting them in similar proportions
> from both sources.
>
> This is using Proton 0.4 RC1 code, with the "infinite credit"
> patch -- but none of the messengers are actually asking for
> infinite credit.
>
> The tests are easy to re-run for future versions, and I will
> do that.
>
> ... pretty picture attached ...
>
>
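The pass criterion described above for a node with two senders (interleaved arrival, similar proportions from both sources) could be checked mechanically. This is a hypothetical sketch, not part of Michael's test program. The function name, the 'A'/'B' encoding of each message's source, and the max_ratio tolerance for "similar proportions" are all assumptions. "Interleaved" is taken to mean that the source changes at least twice, since "all of A, then all of B" changes only once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass check for one receiving node with two senders.
 * senders[i] is 'A' or 'B': the source of the i-th received message.
 * max_ratio is an assumed tolerance for "similar proportions". */
bool interleaved_and_balanced(const char *senders, size_t n, double max_ratio)
{
    assert(senders != NULL || n == 0);
    size_t count_a = 0, count_b = 0, switches = 0;

    for (size_t i = 0; i < n; i++) {
        if (senders[i] == 'A')
            count_a++;
        else if (senders[i] == 'B')
            count_b++;
        /* Count every change of source between consecutive messages. */
        if (i > 0 && senders[i] != senders[i - 1])
            switches++;
    }

    /* Both senders must have delivered something. */
    if (count_a == 0 || count_b == 0)
        return false;

    /* "All of A, then all of B" (or the reverse) has exactly one switch;
     * real interleaving needs at least two. */
    if (switches < 2)
        return false;

    /* The larger count may exceed the smaller one by at most max_ratio. */
    double larger  = (double)(count_a > count_b ? count_a : count_b);
    double smaller = (double)(count_a > count_b ? count_b : count_a);
    return larger / smaller <= max_ratio;
}
```

For example, "AABBAB" with a max_ratio of 1.5 passes (three source changes, a 3:3 split), while "AAABBB" fails because the source changes only once.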


[jira] [Resolved] (PROTON-232) described arrays seem to force the descriptor to be of the same type as the array

2013-02-19 Thread Alan Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Conway resolved PROTON-232.


Resolution: Fixed

Review: https://reviews.apache.org/r/9516/

r1447943 | aconway | 2013-02-19 17:22:05 -0500 (Tue, 19 Feb 2013) | 5 lines

PROTON-232: described arrays seem to force the descriptor to be of the same 
type as the array

Fixed bug in code.c pn_data_encode_node: was always using the parent->type
for everything inside an array, including the descriptor.
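To make the bug concrete: in an AMQP 1.0 described array, the descriptor (often a symbol or ulong) carries its own type, while the elements share the array's element type. The following is a hypothetical, simplified model of the type choice before and after the fix. The node struct, the enum and both function names are illustrative assumptions, not proton's actual pn_data_encode_node() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a data-tree node. It is NOT proton's
 * real node structure; it only carries what the type choice needs. */
typedef enum { T_INT, T_ULONG, T_SYMBOL, T_ARRAY } node_type;

typedef struct node {
    node_type type;            /* the node's own type */
    const struct node *parent; /* NULL at the top level */
    size_t index;              /* position among the parent's children */
    bool described;            /* arrays only: child 0 is the descriptor */
    node_type elem_type;       /* arrays only: type shared by the elements */
} node;

/* Behaviour reported in PROTON-232: every child of an array, descriptor
 * included, is encoded with the array's element type. */
node_type encoding_type_buggy(const node *n)
{
    assert(n != NULL);
    if (n->parent != NULL && n->parent->type == T_ARRAY)
        return n->parent->elem_type;
    return n->type;
}

/* Behaviour after the fix: the descriptor of a described array keeps its
 * own type; only the array elements share the element type. */
node_type encoding_type_fixed(const node *n)
{
    assert(n != NULL);
    const node *p = n->parent;
    if (p != NULL && p->type == T_ARRAY) {
        bool is_descriptor = p->described && n->index == 0;
        if (!is_descriptor)
            return p->elem_type;
    }
    return n->type;
}
```

With a described array of ints whose descriptor is a symbol, encoding_type_buggy() returns T_INT for the descriptor, while encoding_type_fixed() returns T_SYMBOL; both return T_INT for the elements.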


> described arrays seem to force the descriptor to be of the same type as the 
> array
> -
>
> Key: PROTON-232
> URL: https://issues.apache.org/jira/browse/PROTON-232
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: Rafael H. Schloming
>Assignee: Alan Conway
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PROTON-232) described arrays seem to force the descriptor to be of the same type as the array

2013-02-19 Thread Alan Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Conway reassigned PROTON-232:
--

Assignee: Alan Conway  (was: Rafael H. Schloming)

> described arrays seem to force the descriptor to be of the same type as the 
> array
> -
>
> Key: PROTON-232
> URL: https://issues.apache.org/jira/browse/PROTON-232
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: Rafael H. Schloming
>Assignee: Alan Conway
>




Re: the killer node

2013-02-19 Thread Michael Goulish
Oh, it has to work then. Testing... and it does.


But I do get this unusual compiler warning:


  warning: ISO C90 forbids fools and madmen to program in this language.  Go learn Haskell and leave me alone.


huh.





- Original Message -
From: "Darryl L. Pierce" 
To: proton@qpid.apache.org
Sent: Tuesday, February 19, 2013 1:37:19 PM
Subject: Re: the killer node

On Tue, Feb 19, 2013 at 11:43:52AM -0500, Michael Goulish wrote:
> 
>   This just in.
> 
>   It's a linking issue.
> 
>   When I changed my two fn names from send() to my_send() 
>   and from recv() to my_recv() ... no more problem.
> 
>   Different behavior on Fedora 17 and Fedora 18.
> 
>   Gulp.
> 
>   I will post more if I learn something useful.

Just for grins, what happens if you set the name back and make it
static?

-- 
Darryl L. Pierce, Sr. Software Engineer @ Red Hat, Inc.
Delivering value year after year.
Red Hat ranks #1 in value among software vendors.
http://www.redhat.com/promo/vendor/



Re: the killer node

2013-02-19 Thread Darryl L. Pierce
On Tue, Feb 19, 2013 at 11:43:52AM -0500, Michael Goulish wrote:
> 
>   This just in.
> 
>   It's a linking issue.
> 
>   When I changed my two fn names from send() to my_send() 
>   and from recv() to my_recv() ... no more problem.
> 
>   Different behavior on Fedora 17 and Fedora 18.
> 
>   Gulp.
> 
>   I will post more if I learn something useful.

Just for grins, what happens if you set the name back and make it
static?

-- 
Darryl L. Pierce, Sr. Software Engineer @ Red Hat, Inc.





Re: the killer node

2013-02-19 Thread Michael Goulish
Sorry for scaring you!

Final update is -- don't use global names in your C app that 
look like libc names!  Or make them static.

Duh.

It's a little bit of a mystery as to why other testers did not
see the same issue, but -- probably nothing Earth-shattering 
here.
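A minimal sketch of the fix, assuming node.c's helpers look roughly like the send()/recv() frames in the node A stack trace posted in this thread. Judging from that trace, the application's global send() was picked up by proton's pn_send() in place of libc's send(). The names node_send/node_recv and their simplified parameters are hypothetical. Giving them static (internal) linkage keeps them out of the dynamic symbol table, and the prefix avoids any clash with the declarations in <sys/socket.h>.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical replacements for node.c's global send()/recv() helpers.
 * 'static' gives each function internal linkage, so it is not exported
 * and cannot stand in for libc's send()/recv(), which libqpid-proton
 * calls internally. The node_ prefix also avoids clashing with the
 * extern declarations in <sys/socket.h>. */
static int node_send(const char *name, const char *addr)
{
    assert(name != NULL && addr != NULL);
    printf("%s: sending to %s\n", name, addr);
    /* Hypothetical: a real helper would call pn_messenger_put() and
     * pn_messenger_send() here. */
    return 0;
}

static int node_recv(const char *name, const char *addr)
{
    assert(name != NULL && addr != NULL);
    printf("%s: receiving on %s\n", name, addr);
    /* Hypothetical: a real helper would call pn_messenger_recv() here. */
    return 0;
}
```

Keeping the old names and only adding static, as Darryl suggested, should also avoid the clash at link time. However, if the file includes <sys/socket.h>, GCC rejects a static send() that follows libc's extern declaration, so renaming is the more robust fix.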




- Original Message -
From: "Rafael Schloming" 
To: proton@qpid.apache.org
Sent: Tuesday, February 19, 2013 12:15:55 PM
Subject: Re: the killer node

Doh!

You had me scared there for a while.

--Rafael

On Tue, Feb 19, 2013 at 8:43 AM, Michael Goulish wrote:

>
>   This just in.
>
>   It's a linking issue.
>
>   When I changed my two fn names from send() to my_send()
>   and from recv() to my_recv() ... no more problem.
>
>   Different behavior on Fedora 17 and Fedora 18.
>
>   Gulp.
>
>   I will post more if I learn something useful.
>
>
>
>
>
>
>
> - Original Message -
> From: "Michael Goulish" 
> To: proton@qpid.apache.org
> Sent: Tuesday, February 19, 2013 7:40:16 AM
> Subject: the killer node
>
>
> Well, it looks like one of my nodes can kill the other one by doing a put.
> No errors reported by either messenger before the fatality.
>
> I'd like to see if someone else can confirm this result,
> and maybe see something that I am not seeing.
>
> Compile and run scripts are provided in the directory, called "node".
>
>
> I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same
> with Ken's recent patch for infinite credit. )
>
>
>   1. Two instances of one program are used.  Node A only receives,
>  Node B only sends to it.
>
>   2. Start node A first, with the script "r1".
>  It will go through its main loop, trying to receive
>  and timing out, for as long as you like.
>
>
>   3. Start node B, with script r2.
>  It will pause after formatting its first message, and will
>  then do a dramatic 5-second countdown.  Then it calls
>  put  ( not send! )  and node *A* dies horribly, its core
>  file spattering the hard disk.
>
>  Node B is unaware of the carnage it has caused, sedated
>  by a sleep loop, tragically still expecting to call send
>  and start talking to its partner, node A.
>
>
> ( see attached -- if you dare. )
>
>
>
>
>


Re: the killer node

2013-02-19 Thread Rafael Schloming
Doh!

You had me scared there for a while.

--Rafael

On Tue, Feb 19, 2013 at 8:43 AM, Michael Goulish wrote:

>
>   This just in.
>
>   It's a linking issue.
>
>   When I changed my two fn names from send() to my_send()
>   and from recv() to my_recv() ... no more problem.
>
>   Different behavior on Fedora 17 and Fedora 18.
>
>   Gulp.
>
>   I will post more if I learn something useful.
>
>
>
>
>
>
>
> - Original Message -
> From: "Michael Goulish" 
> To: proton@qpid.apache.org
> Sent: Tuesday, February 19, 2013 7:40:16 AM
> Subject: the killer node
>
>
> Well, it looks like one of my nodes can kill the other one by doing a put.
> No errors reported by either messenger before the fatality.
>
> I'd like to see if someone else can confirm this result,
> and maybe see something that I am not seeing.
>
> Compile and run scripts are provided in the directory, called "node".
>
>
> I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same
> with Ken's recent patch for infinite credit. )
>
>
>   1. Two instances of one program are used.  Node A only receives,
>  Node B only sends to it.
>
>   2. Start node A first, with the script "r1".
>  It will go through its main loop, trying to receive
>  and timing out, for as long as you like.
>
>
>   3. Start node B, with script r2.
>  It will pause after formatting its first message, and will
>  then do a dramatic 5-second countdown.  Then it calls
>  put  ( not send! )  and node *A* dies horribly, its core
>  file spattering the hard disk.
>
>  Node B is unaware of the carnage it has caused, sedated
>  by a sleep loop, tragically still expecting to call send
>  and start talking to its partner, node A.
>
>
> ( see attached -- if you dare. )
>
>
>
>
>


Re: the killer node

2013-02-19 Thread Michael Goulish

  This just in.

  It's a linking issue.

  When I changed my two fn names from send() to my_send() 
  and from recv() to my_recv() ... no more problem.

  Different behavior on Fedora 17 and Fedora 18.

  Gulp.

  I will post more if I learn something useful.







- Original Message -
From: "Michael Goulish" 
To: proton@qpid.apache.org
Sent: Tuesday, February 19, 2013 7:40:16 AM
Subject: the killer node


Well, it looks like one of my nodes can kill the other one by doing a put.
No errors reported by either messenger before the fatality.

I'd like to see if someone else can confirm this result,
and maybe see something that I am not seeing.

Compile and run scripts are provided in the directory, called "node".


I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same with 
Ken's recent patch for infinite credit. )


  1. Two instances of one program are used.  Node A only receives, 
 Node B only sends to it.

  2. Start node A first, with the script "r1".  
 It will go through its main loop, trying to receive
 and timing out, for as long as you like.


  3. Start node B, with script r2.
 It will pause after formatting its first message, and will
 then do a dramatic 5-second countdown.  Then it calls 
 put  ( not send! )  and node *A* dies horribly, its core
 file spattering the hard disk.

 Node B is unaware of the carnage it has caused, sedated
 by a sleep loop, tragically still expecting to call send
 and start talking to its partner, node A.


( see attached -- if you dare. )



  


Re: the killer node

2013-02-19 Thread Rafael Schloming
That's almost the same stack trace I see with send when I comment out the
while (1). The only difference is that it's all under pn_messenger_send
rather than pn_messenger_recv.

This looks to me like the stack is getting corrupted, since send is actually
your code, yet the trace appears to be claiming that proton is calling into
it, which it couldn't possibly do. I'm guessing the whole stack underneath
pn_connector_process (or above it in the trace below) is garbage. Can you
try running under valgrind and see if it spots where the corruption is
happening?

As an aside, you should probably also build with debug on, as that will make
it a little clearer what is going on.

--Rafael

On Tue, Feb 19, 2013 at 7:08 AM, Michael Goulish wrote:

> Sorry, I meant to include that.
>
> Here is the stack trace from node A :
>
>
> #0  0x7fbb74173de8 in vfprintf () from /lib64/libc.so.6
> #1  0x7fbb74177abf in buffered_vfprintf () from /lib64/libc.so.6
> #2  0x7fbb74172c1e in vfprintf () from /lib64/libc.so.6
> #3  0x7fbb7417cd87 in fprintf () from /lib64/libc.so.6
> #4  0x00400f40 in send (name=0x6 ,
> messenger=0x149a150, message=0x51,
> addr=0x4000 ) at node.c:44
> #5  0x7fbb7450f524 in pn_send () from /lib/libqpid-proton.so.1
> #6  0x7fbb74510883 in pn_connector_process () from /lib/libqpid-proton.so.1
> #7  0x7fbb7450d85a in pn_messenger_tsync () from /lib/libqpid-proton.so.1
> #8  0x7fbb7450d961 in pn_messenger_sync () from /lib/libqpid-proton.so.1
> #9  0x7fbb7450ef6d in pn_messenger_recv () from /lib/libqpid-proton.so.1
> #10 0x00401079 in recv (name=0x7fff2f9a5363 "A", messenger=0x1493970,
> message=0x148e010, addr=0x7fff2f9a4360 "amqp://~0.0.0.0:") at node.c:88
> #11 0x004014e2 in main (argc=3, argv=0x7fff2f9a4888) at node.c:194
>
>
>
>
> If you like, I can give you access to my machine.
>
>
>
>
>
>
> - Original Message -
> From: "Rafael Schloming" 
> To: proton@qpid.apache.org
> Sent: Tuesday, February 19, 2013 9:33:29 AM
> Subject: Re: the killer node
>
> This doesn't happen for me. I see node B loop forever and never send
> anything, which is what I would expect given the while (1) { sleep(...); }
> you have in there. What does your debugger say about where node A crashes?
>
> --Rafael
>
> On Tue, Feb 19, 2013 at 4:40 AM, Michael Goulish wrote:
>
> >
> > Well, it looks like one of my nodes can kill the other one by doing a put.
> > No errors reported by either messenger before the fatality.
> >
> > I'd like to see if someone else can confirm this result,
> > and maybe see something that I am not seeing.
> >
> > Compile and run scripts are provided in the directory, called "node".
> >
> >
> > I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same
> > with Ken's recent patch for infinite credit. )
> >
> >
> >   1. Two instances of one program are used.  Node A only receives,
> >  Node B only sends to it.
> >
> >   2. Start node A first, with the script "r1".
> >  It will go through its main loop, trying to receive
> >  and timing out, for as long as you like.
> >
> >
> >   3. Start node B, with script r2.
> >  It will pause after formatting it first message, and will
> >  then do a dramatic 5-second countdown.  Then it calls
> >  put  ( not send! )  and node *A* dies horribly, its core
> >  file spattering the hard disk.
> >
> >  Node B is unaware of the carnage it has caused, sedated
> >  by a sleep loop, tragically still expecting to call send
> >  and start talking to its partner, node A.
> >
> >
> > ( see attached -- if you dare. )
> >
> >
> >
> >
>


Re: the killer node

2013-02-19 Thread Michael Goulish
Sorry, I meant to include that.

Here is the stack trace from node A :


#0  0x7fbb74173de8 in vfprintf () from /lib64/libc.so.6
#1  0x7fbb74177abf in buffered_vfprintf () from /lib64/libc.so.6
#2  0x7fbb74172c1e in vfprintf () from /lib64/libc.so.6
#3  0x7fbb7417cd87 in fprintf () from /lib64/libc.so.6
#4  0x00400f40 in send (name=0x6 , 
messenger=0x149a150, message=0x51, 
addr=0x4000 ) at node.c:44
#5  0x7fbb7450f524 in pn_send () from /lib/libqpid-proton.so.1
#6  0x7fbb74510883 in pn_connector_process () from /lib/libqpid-proton.so.1
#7  0x7fbb7450d85a in pn_messenger_tsync () from /lib/libqpid-proton.so.1
#8  0x7fbb7450d961 in pn_messenger_sync () from /lib/libqpid-proton.so.1
#9  0x7fbb7450ef6d in pn_messenger_recv () from /lib/libqpid-proton.so.1
#10 0x00401079 in recv (name=0x7fff2f9a5363 "A", messenger=0x1493970, 
message=0x148e010, addr=0x7fff2f9a4360 "amqp://~0.0.0.0:") at node.c:88
#11 0x004014e2 in main (argc=3, argv=0x7fff2f9a4888) at node.c:194




If you like, I can give you access to my machine.






- Original Message -
From: "Rafael Schloming" 
To: proton@qpid.apache.org
Sent: Tuesday, February 19, 2013 9:33:29 AM
Subject: Re: the killer node

This doesn't happen for me. I see node B loop forever and never send
anything, which is what I would expect given the while (1) { sleep(...); }
you have in there. What does your debugger say about where node A crashes?

--Rafael

On Tue, Feb 19, 2013 at 4:40 AM, Michael Goulish wrote:

>
> Well, it looks like one of my nodes can kill the other one by doing a put.
> No errors reported by either messenger before the fatality.
>
> I'd like to see if someone else can confirm this result,
> and maybe see something that I am not seeing.
>
> Compile and run scripts are provided in the directory, called "node".
>
>
> I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same
> with Ken's recent patch for infinite credit. )
>
>
>   1. Two instances of one program are used.  Node A only receives,
>  Node B only sends to it.
>
>   2. Start node A first, with the script "r1".
>  It will go through its main loop, trying to receive
>  and timing out, for as long as you like.
>
>
>   3. Start node B, with script r2.
>  It will pause after formatting it first message, and will
>  then do a dramatic 5-second countdown.  Then it calls
>  put  ( not send! )  and node *A* dies horribly, its core
>  file spattering the hard disk.
>
>  Node B is unaware of the carnage it has caused, sedated
>  by a sleep loop, tragically still expecting to call send
>  and start talking to its partner, node A.
>
>
> ( see attached -- if you dare. )
>
>
>
>


Re: the killer node

2013-02-19 Thread Rafael Schloming
This doesn't happen for me. I see node B loop forever and never send
anything, which is what I would expect given the while (1) { sleep(...); }
you have in there. What does your debugger say about where node A crashes?

--Rafael

On Tue, Feb 19, 2013 at 4:40 AM, Michael Goulish wrote:

>
> Well, it looks like one of my nodes can kill the other one by doing a put.
> No errors reported by either messenger before the fatality.
>
> I'd like to see if someone else can confirm this result,
> and maybe see something that I am not seeing.
>
> Compile and run scripts are provided in the directory, called "node".
>
>
> I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same
> with Ken's recent patch for infinite credit. )
>
>
>   1. Two instances of one program are used.  Node A only receives,
>  Node B only sends to it.
>
>   2. Start node A first, with the script "r1".
>  It will go through its main loop, trying to receive
>  and timing out, for as long as you like.
>
>
>   3. Start node B, with script r2.
>  It will pause after formatting it first message, and will
>  then do a dramatic 5-second countdown.  Then it calls
>  put  ( not send! )  and node *A* dies horribly, its core
>  file spattering the hard disk.
>
>  Node B is unaware of the carnage it has caused, sedated
>  by a sleep loop, tragically still expecting to call send
>  and start talking to its partner, node A.
>
>
> ( see attached -- if you dare. )
>
>
>
>


the killer node

2013-02-19 Thread Michael Goulish

Well, it looks like one of my nodes can kill the other one by doing a put.
No errors reported by either messenger before the fatality.

I'd like to see if someone else can confirm this result,
and maybe see something that I am not seeing.

Compile and run scripts are provided in the directory, called "node".


I am testing this against unpatched 0.4 RC1 code.  ( But the result was the same with 
Ken's recent patch for infinite credit. )


  1. Two instances of one program are used.  Node A only receives, 
 Node B only sends to it.

  2. Start node A first, with the script "r1".  
 It will go through its main loop, trying to receive
 and timing out, for as long as you like.


  3. Start node B, with script r2.
 It will pause after formatting its first message, and will
 then do a dramatic 5-second countdown.  Then it calls 
 put  ( not send! )  and node *A* dies horribly, its core
 file spattering the hard disk.

 Node B is unaware of the carnage it has caused, sedated
 by a sleep loop, tragically still expecting to call send
 and start talking to its partner, node A.


( see attached -- if you dare. )



  

node.tar.gz
Description: application/compressed-tar


[jira] [Updated] (PROTON-225) Redesign Transport interface such that Transport owns the in/out buffers rather than its client

2013-02-19 Thread Philip Harvey (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Harvey updated PROTON-225:
-

Description: 
This issue is intended to cover the Transport API redesign proposed on the 
mailing list 
(http://qpid.2158936.n2.nabble.com/transport-interface-changes-td7588099.html) 
as part of discussions around PROTON-222.  The redesign is being tracked under 
this new issue because we probably want to implement it on a different timescale to 
the PROTON-222 bug fix.

When refactoring the Java implementation, we should consider whether the point at 
which the sent/received protocol logging is done should be changed.

  was:This issue is intended to cover the Transport API redesign proposed on 
the mailing list 
(http://qpid.2158936.n2.nabble.com/transport-interface-changes-td7588099.html) 
as part of discussions around PROTON-222.  The redesign is being tracked under 
this new issue because we probably want to implement it on a different timescale to 
the PROTON-222 bug fix.


> Redesign Transport interface such that Transport owns the in/out buffers 
> rather than its client
> ---
>
> Key: PROTON-225
> URL: https://issues.apache.org/jira/browse/PROTON-225
> Project: Qpid Proton
>  Issue Type: Improvement
>Affects Versions: 0.3
>Reporter: Philip Harvey
>Assignee: Ken Giusti
> Fix For: 0.5
>
>
> This issue is intended to cover the Transport API redesign proposed on the 
> mailing list 
> (http://qpid.2158936.n2.nabble.com/transport-interface-changes-td7588099.html)
>  as part of discussions around PROTON-222.  The redesign is being tracked 
> under this new issue because we probably want to implement it on a different 
> timescale to the PROTON-222 bug fix.
> When refactoring the Java implementation, we should consider whether the point 
> at which the sent/received protocol logging is done should be changed.



Re: How to observe connection loss via the messenger API

2013-02-19 Thread Rafael Schloming
There isn't currently a way to get any notification of connection loss
per se. You might be able to accomplish what you want by checking the
status of the incoming message transfers. What is it you would do based on
the notification?

--Rafael

On Sun, Feb 17, 2013 at 6:43 PM, Bozo Dragojevic  wrote:

> If I kill send.c while it's sending the messages, then the recv output might
> look like this:
>
> 
> $ ./recv
> .
> 1361144306.723531
> Address: amqp://0.0.0.0
> Subject: Greetings from send 24136
> Content: "Hello World!"
> ### engine.c:1395 pn_do_error ERROR ### transport-5 ERROR
> amqp:connection:framing-error connection aborted
>
> [0x192a910:0] ERROR[-2] connection aborted
> CONNECTION ERROR connection aborted
>
> 
>
> So proton internally does detect connection error.
> Is there a way to get these notifications via pn_messenger_* ?
>
> thanks,
> Bozzo
>
>
>
>