[jira] [Commented] (KAFKA-2861) system tests: grep logs for errors as part of validation

2015-11-19 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013227#comment-15013227
 ] 

Ewen Cheslack-Postava commented on KAFKA-2861:
--

[~geoffra] Of course, the problem with this is that if you intentionally 
trigger an error you'll also fail the test...

I've thought about this before, but it's really difficult to detect these 
issues generically -- log levels aren't good enough, searching for stack traces 
(i.e. logged exceptions) doesn't work, etc.

> system tests: grep logs for errors as part of validation
> 
>
> Key: KAFKA-2861
> URL: https://issues.apache.org/jira/browse/KAFKA-2861
> Project: Kafka
> Issue Type: Bug
> Reporter: Geoff Anderson
>
> There may be errors going on under the hood that validation steps do not 
> detect, but which are logged at the ERROR level by brokers or clients. We are 
> more likely to catch subtle issues if we pattern match the server log for 
> ERROR as part of validation, and fail the test in this case.
> For example, in https://issues.apache.org/jira/browse/KAFKA-2813, the error 
> is transient, so our test may pass; however, we still want this issue to be 
> visible.
> To avoid spurious failures, we would probably want to be able to have a 
> whitelist of acceptable errors.
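
The proposal in the description (match the log for ERROR, fail on anything not whitelisted) could be sketched roughly as follows. This is only an illustration, not code from the Kafka system tests: the function name `find_unexpected_errors`, the whitelist patterns, and the sample log lines are all made up for the example, and a real check would read the collected log files instead of an in-memory list.

```python
import re

# Hypothetical helper, not part of the Kafka system tests.
ERROR_MARKER = re.compile(r"\bERROR\b")


def find_unexpected_errors(log_lines, whitelist_patterns=()):
    """Return ERROR lines that do not match any whitelisted regex."""
    allowed = [re.compile(p) for p in whitelist_patterns]
    unexpected = []
    for line in log_lines:
        if not ERROR_MARKER.search(line):
            continue  # not an ERROR-level line
        if any(p.search(line) for p in allowed):
            continue  # an error the test expects to see
        unexpected.append(line.rstrip("\n"))
    return unexpected


if __name__ == "__main__":
    # Invented sample lines, not real broker output.
    sample_log = [
        "INFO service started",
        "ERROR fault injected by test harness",
        "ERROR unexpected state in handler",
    ]
    leftover = find_unexpected_errors(sample_log, [r"fault injected"])
    # A validation step would fail the test when this list is non-empty.
    print(leftover)
```

The whitelist is passed in per call so that each test can declare the errors it triggers on purpose.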



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2861) system tests: grep logs for errors as part of validation

2015-11-19 Thread Geoff Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014665#comment-15014665
 ] 

Geoff Anderson commented on KAFKA-2861:
---

[~ewencp] Good point, I'm not sure if this is workable generically.

This falls into the category of: how do we surface events that seem bad, but 
are not bad in some expected way? How do we increase the probability that we'll 
catch anomalous behavior without creating false failures?





[jira] [Commented] (KAFKA-2861) system tests: grep logs for errors as part of validation

2015-11-19 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015036#comment-15015036
 ] 

Ewen Cheslack-Postava commented on KAFKA-2861:
--

[~geoffra] Right. The whitelist could work, as long as you always have some 
escape hatch to get out of this mode entirely if matching all the errors that 
*could* happen becomes too onerous. The question is whether you can make the 
filtering of expected errors low cost enough that people don't immediately jump 
to the escape hatch as soon as they see one error.
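
One way to picture the trade-off is a check object where adding an expected error is a one-liner and the escape hatch is a single flag. This is a sketch under assumptions, not an existing API: the class `LogErrorCheck`, its `enabled` flag, and the `expect` method are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LogErrorCheck:
    """Hypothetical per-test log check with a cheap whitelist and an escape hatch."""
    enabled: bool = True                           # escape hatch: False skips the check
    expected: list = field(default_factory=list)   # compiled whitelist patterns

    def expect(self, pattern):
        """Whitelist one error pattern; returns self so calls can be chained."""
        self.expected.append(re.compile(pattern))
        return self

    def failures(self, log_lines):
        """Return ERROR lines that no whitelisted pattern accounts for."""
        if not self.enabled:
            return []
        return [
            line for line in log_lines
            if "ERROR" in line and not any(p.search(line) for p in self.expected)
        ]


if __name__ == "__main__":
    # Invented log lines for the example.
    log = ["ERROR fault injected by test", "ERROR something else went wrong"]
    check = LogErrorCheck().expect(r"fault injected")
    print(check.failures(log))
    print(LogErrorCheck(enabled=False).failures(log))
```

If whitelisting stays this cheap, a test author who sees one expected error has less reason to reach for `enabled=False`.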
