Re: QPid-proton cpp 0.21 - Crash

2018-04-17 Thread Chris Richardson
A couple of suggestions spring to mind - I've experienced problems with
timers in other libraries where a timer fires after (or indeed during) its
callback or associated data has been deleted, resulting in a segfault.
Could this be relevant? Probably capturing a core dump and inspecting with
gdb would be enlightening and would be my first port of call. Another
approach might be some code introspection to verify that timers are
cancelled and that their handlers have completed before relevant garbage
collection takes place. Rather general comments I'm afraid but I thought it
might be worth consideration.
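
A minimal sketch of that teardown ordering, in Java purely for illustration (the crash in question is in proton's C proactor, so this is the general pattern, not proton code): stop the timer source and wait for any in-flight callbacks to drain *before* releasing the state they touch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SafeTimerTeardown {
    // Stands in for the resource the timer callback touches.
    public static final AtomicBoolean resourceFreed = new AtomicBoolean(false);
    public static volatile boolean useAfterFree = false;

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timers = Executors.newSingleThreadScheduledExecutor();
        timers.scheduleAtFixedRate(() -> {
            // Callback touches shared state; flag any access after "free".
            if (resourceFreed.get()) {
                useAfterFree = true;
            }
        }, 0, 1, TimeUnit.MILLISECONDS);

        Thread.sleep(20);

        // Teardown: stop scheduling new callbacks, then wait for any
        // running callback to complete...
        timers.shutdown();
        timers.awaitTermination(5, TimeUnit.SECONDS);

        // ...and only then release the state the callbacks used.
        resourceFreed.set(true);

        System.out.println("use-after-free observed: " + useAfterFree);
    }
}
```

Freeing the resource before the `shutdown`/`awaitTermination` pair is the analogue of the segfault scenario described above: a late-firing timer dereferences memory that is already gone.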

Chris



On 17 April 2018 at 16:36, Baptiste Robert wrote:

> Hello,
>
> When I create a proton::container and use it, I have a crash when I delete
> the proton object:
>
> void pn_proactor_free(pn_proactor_t *p) {
> ->  DeleteTimerQueueEx(p->timer_queue, INVALID_HANDLE_VALUE);
>
> I'm using proton 0.21 compiled in CXX03 mode.
>
> Does anyone have an idea?
>
> Thank you,
>
> Baptiste
>


Re: [VOTE] Release Apache Qpid Proton-J 0.27.0

2018-04-17 Thread Timothy Bish


I've added a fix in ea46607e776062c8555ef24a74c48a9b12bf2ca9 that I 
think should resolve this.  Sadly we need to -1 this release candidate 
and spin a new one.


changing vote to -1

On 04/17/2018 06:49 PM, Keith W wrote:

With the 0.27.0 RC (and the Qpid JMS Client 0.31.0) I am noticing a
java.lang.ArrayIndexOutOfBoundsException when PN_TRACE_FRM=true, which
did not occur with 0.26.0.

My JMS code fragment (based on HelloWorld).

Context context = new InitialContext();
ConnectionFactory factory = (ConnectionFactory) context.lookup("myFactoryLookup");
Destination queue = (Destination) context.lookup("myQueueLookup");
Connection connection = factory.createConnection(System.getProperty("USER"), System.getProperty("PASSWORD"));
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer messageProducer = session.createProducer(queue);
BytesMessage message = session.createBytesMessage();
message.writeBytes(new byte[261955 /* 261954 won't produce the AIOOBE */]);
messageProducer.send(message);

and the protocol trace/stack trace:

/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/bin/java
-DUSER=admin -DPASSWORD=admin "-javaagent:/Applications/IntelliJ
IDEA.app/Contents/lib/idea_rt.jar=53139:/Applications/IntelliJ
IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath

Re: [VOTE] Release Apache Qpid Proton-J 0.27.0

2018-04-17 Thread Keith W
With the 0.27.0 RC (and the Qpid JMS Client 0.31.0) I am noticing a
java.lang.ArrayIndexOutOfBoundsException when PN_TRACE_FRM=true, which
did not occur with 0.26.0.

My JMS code fragment (based on HelloWorld).

Context context = new InitialContext();
ConnectionFactory factory = (ConnectionFactory) context.lookup("myFactoryLookup");
Destination queue = (Destination) context.lookup("myQueueLookup");
Connection connection = factory.createConnection(System.getProperty("USER"), System.getProperty("PASSWORD"));
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer messageProducer = session.createProducer(queue);
BytesMessage message = session.createBytesMessage();
message.writeBytes(new byte[261955 /* 261954 won't produce the AIOOBE */]);
messageProducer.send(message);

and the protocol trace/stack trace:

/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/bin/java
-DUSER=admin -DPASSWORD=admin "-javaagent:/Applications/IntelliJ
IDEA.app/Contents/lib/idea_rt.jar=53139:/Applications/IntelliJ
IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath

Re: Broker-J BDB JE High Availability Time Sync issue

2018-04-17 Thread Keith W
Hi Bryan

I haven't seen problem reports raised against Broker-J regarding this error.

The error actually originates from the HA feature within Berkeley BDB JE.
There's a useful reply within the following thread that explains some
background about how the maximum clock delta is used.

https://community.oracle.com/thread/1027853

For this problem to appear, I see three possibilities:

1) The NTP synchronisation between the nodes was somehow ineffective,
causing a genuine skew between the nodes.
2) The processing of the BDB JE internal
com.sleepycat.je.rep.stream.Protocol.SNTPResponse was somehow delayed,
causing a spurious exception to be reported on the master.  I wonder
whether an unfortunately timed lengthy GC pause could cause this.
3) There is some kind of previously unknown defect in BDB JE HA itself.

To help you narrow in, I'd suggest changing the logging configuration so
that you retain logs for longer; then, if you see a recurrence, you have
more to investigate.  I also suggest turning on GC logging, with
timestamps, at the JVM level.
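
For example, on a JDK 8 JVM like the 1.8.0_162 mentioned in this thread, GC logging with timestamps could be enabled with flags along these lines (an illustrative flag set, not a Broker-J-specific recommendation; the log path is a placeholder):

```
-Xloggc:/path/to/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M
```

The `PrintGCApplicationStoppedTime` flag is the one that would surface a lengthy stop-the-world pause of the kind speculated about in possibility 2.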

You could also consider raising je.rep.maxClockDelta.  Broker-J doesn't
actually change the replica consistency policy settings from their
defaults, which include the permitted clock delta setting.  A BDB HA node
never actually reads from the replicated environment until the node becomes
master, which means the concerns described by
https://docs.oracle.com/cd/E17277_02/html/ReplicationGuide/consistency.html
don't actually apply.  (As an aside, I note that
NoConsistencyRequiredPolicy, rather than the default TimeConsistencyPolicy,
should serve Broker-J's use case just as well, but this is not something I
have tried or investigated completely.)
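
For reference, the underlying BDB JE parameter and its duration syntax look roughly like this (the default of "2 s" corresponds to the "max permissible delta: 2000 ms" in the stack trace; how best to pass such an override through Broker-J's configuration is something to verify against its documentation, so treat this as a sketch):

```
# BDB JE HA replication parameter; the default is "2 s"
je.rep.maxClockDelta=10 s
```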

Hope this helps.

Keith Wall.


On Mon, 16 Apr 2018 at 18:54, Bryan Dixon wrote:

> We are using Broker-J 7.0.2 and just ran into a Berkeley HA Time Sync issue
> that I'm wondering if anyone else has run into.  The stackTrace is at the
> end of this post.   We are running on Windows Server 2012 R2 6.3 amd64 and
> our JDK is Oracle Corporation 1.8.0_162-b12.  We have 3 servers as part of
> our HA setup.
>
> This error occurred in our production environment which has been live for
> just a couple of weeks.  We never ran into this in our Test or Dev
> environments that have been running for a few months.   When one of our
> admins checked the clock times of all 3 servers they were completely in
> sync.  Another admin stated that the server clock times are synced with
> NTP.
> Unfortunately our log files rolled off and I don't know exactly when this
> error first occurred because the older log files are gone.
>
> 2018-04-16 04:10:57,039 ERROR [Group-Change-Learner:prodbroker:prodbroker2]
> (o.a.q.s.u.ServerScopedRuntimeException) - Exception on master check
> com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be
> closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment
> invalid because of previous exception: (JE 7.4.5)
> prodbroker2(2):D:\qpidwork\prodbroker2\config Clock delta: 8859 ms. between
> Feeder: prodbroker1 and this Replica exceeds max permissible delta: 2000 ms.
> HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity
> or compatibility check failed, preventing further communication between the
> nodes. Environment is invalid and must be closed. Originally thrown by HA
> thread: UNKNOWN prodbroker2(2) Originally thrown by HA thread: UNKNOWN
> prodbroker2(2)
> at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
> at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
> at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1775)
> at com.sleepycat.je.Environment.checkOpen(Environment.java:2473)
> at com.sleepycat.je.DbInternal.checkOpen(DbInternal.java:105)
> at com.sleepycat.je.rep.ReplicatedEnvironment.checkOpen(ReplicatedEnvironment.java:1052)
> at com.sleepycat.je.rep.ReplicatedEnvironment.getState(ReplicatedEnvironment.java:764)
> at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.executeDatabasePingerOnNodeChangesIfMaster(ReplicatedEnvironmentFacade.java:2276)
> at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:2042)
> at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:2012)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
> 

Re: [VOTE] Release Apache Qpid Proton-J 0.27.0

2018-04-17 Thread Gordon Sim

On 13/04/18 20:03, Robbie Gemmell wrote:

Hi folks,

I have put together a spin for a Qpid Proton-J 0.27.0 release, please
test it and vote accordingly.


+1, built from source including all tests.

-
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org



Re: Proposed Feature Removal from Dispatch Router

2018-04-17 Thread Gordon Sim

On 17/04/18 14:24, Ken Giusti wrote:

After thinking about the comments on this thread and
spending some time researching various "reliable multicast" solutions
I've come to the conclusion that we *should* allow multicasting
unsettled messages by having the ingress router provide the settlement.

So I vote for Ted's proposal to back out the feature.


I certainly don't think the router should be trying to implement 
reliable multicast.


The original goal for the feature in question, as described in Ted's 
post, was to /inform users/ that a naive assumption about settlement was 
not accurate (and could not be accurate).


I think Ted is right that the feature did not reach that goal. It is not 
sufficiently clear what is going on, and on balance it does not, I think, 
justify the potential annoyance.


However I do think that some other solution to the problem of informing 
users would be worthwhile. This could be a warning in the log the first 
time a link sends an unsettled message to a multicast address (along 
with a section in the doc that clarifies the way things work).


(As a separate issue, I do also think that the way in which multicast 
messages are dropped when necessary could be improved upon, e.g. by looking 
at ttl, priority, settled state etc. when deciding what to drop.)






QPid-proton cpp 0.21 - Crash

2018-04-17 Thread Baptiste Robert
Hello,

When I create a proton::container and use it, I have a crash when I delete
the proton object:

void pn_proactor_free(pn_proactor_t *p) {
->  DeleteTimerQueueEx(p->timer_queue, INVALID_HANDLE_VALUE);

I'm using proton 0.21 compiled in CXX03 mode.

Does anyone have an idea?

Thank you,

Baptiste


Re: Proposed Feature Removal from Dispatch Router

2018-04-17 Thread Ken Giusti
Apologies for thread-jacking Ted's original email thread.

After thinking about the comments on this thread and
spending some time researching various "reliable multicast" solutions
I've come to the conclusion that we *should* allow multicasting
unsettled messages by having the ingress router provide the settlement.

So I vote for Ted's proposal to back out the feature.

Reasoning:
1) From my brief research on existing 'reliable multicast' solutions in the
IP space, I think there's way too much complexity and state involved [0].

2) Having the ingress router settle the delivery locally is equivalent
to the way a traditional broker-based messaging bus handles multicast -
minus the store functionality of course.

In the broker's case a settlement merely indicates that the message bus has
taken ownership of the message.  Local settlement by the ingress router means
the same thing.

[0] 
https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-19/reliable-multicast.html

On Thu, Apr 12, 2018 at 6:20 PM, Ted Ross wrote:
> We added a feature back in 1.0.0 to reject unsettled deliveries to
> multicast addresses by default.  This can be disabled through
> configuration but is on by default.
>
> The rationale was that the router would accept and settle unsettled
> multicasts even though it might not have delivered the messages to any
> consumer.  The rejection with error code was intended to inform users
> that they should pre-settle deliveries to multicast addresses in
> keeping with the best-effort nature of multicast routing.
>
> In practice, this is more of an annoyance because none of the example
> clients (and apparently the users' clients) actually do anything with
> the error code in the rejected delivery.  The router appears to
> silently drop such messages for no good reason and good will is wasted
> in chasing down the issue to "oh, you should turn off this handy
> feature".
>
> The recently raised https://issues.apache.org/jira/browse/DISPATCH-966
> is caused by this feature as well.  This is because the router can
> stream large messages in multiple transfers.  The first transfer is
> used for routing and the last transfer should be used to determine the
> settlement status of the delivery.  It is not a trivial fix to make
> this work correctly.
>
> For the above two reasons, I propose that we back out this feature and
> allow multicasting with unsettled deliveries.  We should add a clear
> note in the documentation that states that multicast is best-effort,
> regardless of the settlement status of the deliveries.
>
> Any objections from the users?
>
> -Ted
>
> -
> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> For additional commands, e-mail: users-h...@qpid.apache.org
>



-- 
-K
