Re: [infinispan-dev] Alert from a failing test

2011-07-04 Thread Galder Zamarreño

On Jun 28, 2011, at 6:58 PM, Dan Berindei wrote:

 On Tue, Jun 28, 2011 at 2:05 PM, Galder Zamarreño gal...@redhat.com wrote:
 Some comments below:
 
 On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
 
 On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero sa...@infinispan.org 
 wrote:
 2011/6/20 Manik Surtani ma...@jboss.org:
 Oddly enough, I don't see any other tests exhibiting this behaviour.  Let 
 me know if you see it in more recent CI runs, and we'll investigate in 
 detail.
 
 In fact there are not many tests in core which verify that a full stream is
 received; but, as I mentioned in another thread, I was seeing the
 following exception relatively often (it never caught my attention
 some months ago):
 
 Caused by: java.io.EOFException: The stream ended unexpectedly.
 Please check whether the source of the stream encountered any issues
 generating the stream.
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
        at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
        ... 27 more
 Caused by: java.io.EOFException: Read past end of file
        at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
        at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
        at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
        ... 30 more
 
 
 The line at 
 org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
 suggests EOF has been reached while reading the lead byte of the
 object, not a partial object. This is consistent with
 StateTransferManagerImpl.generateTransactionLog() getting a timeout
 while trying to acquire the processing lock (at
 StateTransferManagerImpl.java:192) and closing the stream in the
 middle of the transaction log, not a transmission error. We could
 probably get rid of the exception in the logs by inserting another
 delimiter here.
 
 You mean inserting a particular delimiter in case of a timeout acquiring the
 processing lock? That sounds like a good idea. I'm just back from holidays and
 can't remember well, but were the previous EOFs related to processing lock
 acquisition? If so, I think your idea makes even more sense because it would
 fall into a known problem, and it could give the receiver side that bit more
 information.
 
 
 I'm pretty sure all the EOFs we got are related to processing lock
 acquisition, yes.

https://issues.jboss.org/browse/ISPN-1216
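
To illustrate the idea (with hypothetical names; this is a sketch, not the actual ISPN-1216 implementation): instead of just closing the stream when the processing lock cannot be acquired, the sender would write an explicit end marker, so the receiver can tell a deliberate abort apart from a truncated stream.

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

final class TransactionLogMarkers {

   // Hypothetical sentinel values appended to the transaction log stream.
   static final Byte LOG_COMPLETE = (byte) 0;
   static final Byte LOG_ABORTED_ON_LOCK_TIMEOUT = (byte) 1;

   // Sender side: called when transaction log generation finishes, whether or
   // not the processing lock was acquired in time.
   static void finishLog(ObjectOutput out, boolean lockAcquired) throws IOException {
      out.writeObject(lockAcquired ? LOG_COMPLETE : LOG_ABORTED_ON_LOCK_TIMEOUT);
      out.flush();
   }

   // Receiver side: instead of hitting an EOFException, the receiver reads the
   // marker and can report "sender timed out on the processing lock" rather
   // than "Read past end of file".
   static boolean logWasComplete(ObjectInput in) throws IOException, ClassNotFoundException {
      return LOG_COMPLETE.equals(in.readObject());
   }
}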

 
 
 Back to the original problem: if this were a stream corruption issue,
 I'd expect many more deserialization errors, because the length prefix of
 a buffer would be smaller or larger than the number of bytes that follow
 it, and the next object deserialized from the stream would find garbage
 instead.
 
 This looks to me more like an index segment was created with size
 x on both node A and node B, then it was updated with size y > x on
 node A but only the metadata got to node B, while the segment's byte array
 remained the same.
 
 I don't know anything about the Lucene directory implementation yet,
 so I have no idea if/how this could happen and I haven't been able to
 reproduce it on my machine. Is there a way to see the Jenkins test
 logs?
 
 There's the console log in http://goo.gl/JFh5R but if this happens
 relatively often in the Lucene dir impl, we could create a small Jenkins run
 and pass a -Dlog4j.configuration that enables TRACE on the console.
 The testsuite is small and would not generate a lot of logging. There's always
 a chance of not encountering the issue when TRACE is enabled, particularly
 if it's a race condition, but I think it's worth doing. IOW, I can set it up.
 
 
 It would be great if you could set it up. It's not happening on every
 run though, so is it possible to configure Jenkins to repeat it until
 it gets a failure?
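
 (One low-tech way to get that, either locally or as a shell build step on the
 CI box, is simply to rerun the module's testsuite in a loop until it fails; a
 rough sketch, assuming the usual Maven module layout:)

 # Keep rerunning the lucene-directory testsuite until a run fails.
 # The module path is an assumption; adjust to the actual layout.
 while mvn -pl lucene-directory test; do
   echo "Run passed, retrying..."
 done
 echo "Got a failure; see the surefire reports for details."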

https://infinispan.ci.cloudbees.com/job/Infinispan-lucene-master-TRACE/4/

That job runs the Lucene directory testsuite with a log4j configuration
file that enables TRACE on org.infinispan.

So far I've been able to direct that to the console, but I'm checking with the
CloudBees guys to see whether it can be directed to a separate file.
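
For reference, the kind of file passed via -Dlog4j.configuration would look roughly like this (a sketch, not necessarily the exact configuration used by the job):

# log4j-trace.properties: everything at INFO on the console, org.infinispan at TRACE.
# Passed to the run with -Dlog4j.configuration=file:/path/to/log4j-trace.properties
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c{1}] (%t) %m%n
log4j.logger.org.infinispan=TRACE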


Re: [infinispan-dev] Alert from a failing test

2011-06-28 Thread Galder Zamarreño
Some comments below:

On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:

 On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero sa...@infinispan.org 
 wrote:
 2011/6/20 Manik Surtani ma...@jboss.org:
 Oddly enough, I don't see any other tests exhibiting this behaviour.  Let 
 me know if you see it in more recent CI runs, and we'll investigate in 
 detail.
 
 In fact there are not many tests in core which verify that a full stream is
 received; but, as I mentioned in another thread, I was seeing the
 following exception relatively often (it never caught my attention
 some months ago):
 
 Caused by: java.io.EOFException: The stream ended unexpectedly.
 Please check whether the source of the stream encountered any issues
 generating the stream.
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
        at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
        ... 27 more
 Caused by: java.io.EOFException: Read past end of file
        at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
        at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
        at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
        ... 30 more
 
 
 The line at 
 org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
 suggests EOF has been reached while reading the lead byte of the
 object, not a partial object. This is consistent with
 StateTransferManagerImpl.generateTransactionLog() getting a timeout
 while trying to acquire the processing lock (at
 StateTransferManagerImpl.java:192) and closing the stream in the
 middle of the transaction log, not a transmission error. We could
 probably get rid of the exception in the logs by inserting another
 delimiter here.

You mean inserting a particular delimiter in case of a timeout acquiring the
processing lock? That sounds like a good idea. I'm just back from holidays and
can't remember well, but were the previous EOFs related to processing lock
acquisition? If so, I think your idea makes even more sense because it would
fall into a known problem, and it could give the receiver side that bit more
information.

 
 Back to the original problem: if this were a stream corruption issue,
 I'd expect many more deserialization errors, because the length prefix of
 a buffer would be smaller or larger than the number of bytes that follow
 it, and the next object deserialized from the stream would find garbage
 instead.
 
 This looks to me more like an index segment was created with size
 x on both node A and node B, then it was updated with size y > x on
 node A but only the metadata got to node B, while the segment's byte array
 remained the same.
 
 I don't know anything about the Lucene directory implementation yet,
 so I have no idea if/how this could happen and I haven't been able to
 reproduce it on my machine. Is there a way to see the Jenkins test
 logs?

There's the console log in http://goo.gl/JFh5R but if this happens relatively
often in the Lucene dir impl, we could create a small Jenkins run and pass a
-Dlog4j.configuration that enables TRACE on the console. The testsuite
is small and would not generate a lot of logging. There's always a chance of not
encountering the issue when TRACE is enabled, particularly if it's a race
condition, but I think it's worth doing. IOW, I can set it up.

 
 Dan
 
 
 This looks like a suspicious correlation to me, as I think the
 reported errors are similar in nature.
 
 Cheers,
 Sanne
 
 
 
 
 On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:
 
 Hello all,
 I'm not in a state to fully debug the issue this week, but even though
 this failure happens in the Lucene Directory it looks like it's
 reporting an issue with Infinispan core:
 
 https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/
 
 In this test we're writing to the index, and then asserting on the
 expected state on both nodes, but while it is successful on the same
 node as the writer, it fails with
 java.io.IOException: Read past EOF on the second node.
 

Re: [infinispan-dev] Alert from a failing test

2011-06-21 Thread Dan Berindei
On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero sa...@infinispan.org wrote:
 2011/6/20 Manik Surtani ma...@jboss.org:
 Oddly enough, I don't see any other tests exhibiting this behaviour.  Let me 
 know if you see it in more recent CI runs, and we'll investigate in detail.

 In fact there are not many tests in core which verify that a full stream is
 received; but, as I mentioned in another thread, I was seeing the
 following exception relatively often (it never caught my attention
 some months ago):

 Caused by: java.io.EOFException: The stream ended unexpectedly.
 Please check whether the source of the stream encountered any issues
 generating the stream.
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
        at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
        ... 27 more
 Caused by: java.io.EOFException: Read past end of file
        at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
        at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
        at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
        at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
        ... 30 more


The line at 
org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
suggests EOF has been reached while reading the lead byte of the
object, not a partial object. This is consistent with
StateTransferManagerImpl.generateTransactionLog() getting a timeout
while trying to acquire the processing lock (at
StateTransferManagerImpl.java:192) and closing the stream in the
middle of the transaction log, not a transmission error. We could
probably get rid of the exception in the logs by inserting another
delimiter here.

Back to the original problem: if this were a stream corruption issue,
I'd expect many more deserialization errors, because the length prefix of
a buffer would be smaller or larger than the number of bytes that follow
it, and the next object deserialized from the stream would find garbage
instead.
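
To make that concrete, a mismatched length prefix would typically misalign the stream and break the very next read, roughly like this (a standalone illustration using plain java.io streams, not the Infinispan marshaller):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixMismatch {
   public static void main(String[] args) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);

      byte[] payload = "segment-bytes".getBytes("UTF-8");
      out.writeInt(payload.length + 4);  // corrupted prefix: claims more bytes than actually follow
      out.write(payload);
      out.writeUTF("next-entry");        // the next object in the stream
      out.flush();

      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      byte[] buffer = new byte[in.readInt()];
      in.readFully(buffer);              // swallows the start of "next-entry" as well
      in.readUTF();                      // fails (garbage length / EOF) because the stream is misaligned
   }
}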

This looks to me more like an index segment was created with size
x on both node A and node B, then it was updated with size y > x on
node A but only the metadata got to node B, while the segment's byte array
remained the same.
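
As a toy model of that scenario (hypothetical classes, not the real org.infinispan.lucene types): if file metadata and file content are separate cache entries and only the metadata update reaches node B, a reader on B that trusts the metadata will try to read past the bytes it actually has.

import java.util.HashMap;
import java.util.Map;

public class SegmentSizeMismatch {

   static final class FileMetadata {
      final long size;
      FileMetadata(long size) { this.size = size; }
   }

   public static void main(String[] args) {
      Map<String, FileMetadata> metadataOnB = new HashMap<String, FileMetadata>();
      Map<String, byte[]> contentOnB = new HashMap<String, byte[]>();

      // Initial state replicated to node B: segment of size x.
      contentOnB.put("_0.cfs", new byte[100]);
      metadataOnB.put("_0.cfs", new FileMetadata(100));

      // Node A rewrites the segment with size y > x, but only the metadata
      // update makes it to B; the byte array stays at the old size.
      metadataOnB.put("_0.cfs", new FileMetadata(150));

      long declared = metadataOnB.get("_0.cfs").size;
      byte[] available = contentOnB.get("_0.cfs");
      if (declared > available.length) {
         // A reader trusting the metadata reads past the end of the available
         // bytes -- the "Read past EOF" seen on the second node.
         System.out.println("Read past EOF: metadata declares " + declared
               + " bytes, only " + available.length + " present");
      }
   }
}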

I don't know anything about the Lucene directory implementation yet,
so I have no idea if/how this could happen and I haven't been able to
reproduce it on my machine. Is there a way to see the Jenkins test
logs?

Dan


 This looks like a suspicious correlation to me, as I think the
 reported errors are similar in nature.

 Cheers,
 Sanne




 On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:

 Hello all,
 I'm not in a state to fully debug the issue this week, but even though
 this failure happens in the Lucene Directory it looks like it's
 reporting an issue with Infinispan core:

 https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/

 In this test we're writing to the index, and then asserting on the
 expected state on both nodes, but while it is successful on the same
 node as the writer, it fails with
 java.io.IOException: Read past EOF on the second node.

 This exception can mean only one thing: the value, which is a
 byte[] buffer, was not completely transferred to the second node, which
 seems quite critical as the caches are using sync.
 I can't reproduce the error locally, but it's not the first time it has been
 reported by CI: builds 60, 62 and 65, for example (and more), show the same
 testcase failing in the same manner.

 Cheers,
 Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Lead, Infinispan
 http://www.infinispan.org




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev


 ___
 infinispan-dev mailing list
 

[infinispan-dev] Alert from a failing test

2011-06-18 Thread Sanne Grinovero
Hello all,
I'm not in a state to fully debug the issue this week, but even though
this failure happens in the Lucene Directory it looks like it's
reporting an issue with Infinispan core:

https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/

In this test we're writing to the index, and then asserting on the
expected state on both nodes, but while it is successful on the same
node as the writer, it fails with
java.io.IOException: Read past EOF on the second node.

This exception can mean only one thing: the value, which is a
byte[] buffer, was not completely transferred to the second node, which
seems quite critical as the caches are using sync.
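
For readers who don't know the test, its shape is roughly the following (a simplified sketch; the InfinispanDirectory constructor and the Lucene 3.x calls are assumptions based on the APIs of the time, not a copy of SimpleLuceneTest):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;
import org.infinispan.Cache;
import org.infinispan.lucene.InfinispanDirectory;

// Simplified sketch of the failing scenario: write through a Directory backed
// by node A's cache, then query through a Directory backed by node B's cache.
// The second step is where "Read past EOF" shows up on CI.
public class ReplicatedIndexSketch {

   void writeAndVerify(Cache<Object, Object> cacheOnNodeA, Cache<Object, Object> cacheOnNodeB) throws Exception {
      InfinispanDirectory dirA = new InfinispanDirectory(cacheOnNodeA, "indexName");
      IndexWriter writer = new IndexWriter(dirA, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
      Document doc = new Document();
      doc.add(new Field("filename", "readme.txt", Field.Store.YES, Field.Index.NOT_ANALYZED));
      writer.addDocument(doc);
      writer.close(); // commits; the caches are sync, so node B should already hold the full segment

      // The same check against node B's cache is what fails with "Read past EOF".
      InfinispanDirectory dirB = new InfinispanDirectory(cacheOnNodeB, "indexName");
      IndexReader reader = IndexReader.open(dirB);
      IndexSearcher searcher = new IndexSearcher(reader);
      int hits = searcher.search(new TermQuery(new Term("filename", "readme.txt")), 10).totalHits;
      assert hits == 1;
      searcher.close();
      reader.close();
   }
}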
I can't reproduce the error locally, but it's not the first time it has been
reported by CI: builds 60, 62 and 65, for example (and more), show the same
testcase failing in the same manner.

Cheers,
Sanne
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev