[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896189#comment-15896189 ]

Yuval Lifshitz commented on FLUME-3050:
---

Hi Tristan,
Yes you are right. If these are not already available then they will have to be added first. I guess that actually exposing them to the API would be the easy part.
Regarding comparing attempts with successes, this might be tricky due to the nature of sampling at high rates - the numbers might not match even if there are no errors. So, triggering alarms based on that might have false positives.
Will change the title.
Thanks,
Yuval

> add error stats to monitor URL
> --
>
> Key: FLUME-3050
> URL: https://issues.apache.org/jira/browse/FLUME-3050
> Project: Flume
> Issue Type: Improvement
> Components: Channel, Shell, Sinks+Sources
> Affects Versions: v1.7.0
> Reporter: Yuval Lifshitz
> Labels: features
>
> currently error counters are not present when getting stats. for example:
> {code}
> > curl http://my-flume-host:4/metrics
> {"SINK.k1":{"ConnectionCreatedCount":"1","ConnectionClosedCount":"0","Type":"SINK","BatchCompleteCount":"0","BatchEmptyCount":"4","EventDrainAttemptCount":"10","StartTime":"1485348138992","EventDrainSuccessCount":"10","BatchUnderflowCount":"1","StopTime":"0","ConnectionFailedCount":"0"},"CHANNEL.c1":{"ChannelCapacity":"100","ChannelFillPercentage":"0.0","Type":"CHANNEL","ChannelSize":"0","EventTakeSuccessCount":"10","EventTakeAttemptCount":"15","StartTime":"1485348138990","EventPutAttemptCount":"10","EventPutSuccessCount":"10","StopTime":"0"},"SOURCE.r1":{"EventReceivedCount":"10","AppendBatchAcceptedCount":"0","Type":"SOURCE","AppendReceivedCount":"0","EventAcceptedCount":"10","StartTime":"1485348138993","AppendAcceptedCount":"0","OpenConnectionCount":"0","AppendBatchReceivedCount":"0","StopTime":"0"}}
> {code}
> return only "good" stats for source, channel and sink.
> to get error you need to look into the log file. this makes it hard to
> integrate flume into automatic monitoring systems, NMS etc.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
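
One possible way to dampen the false positives described in this comment is to alarm only when the attempt/success gap stays above a tolerance for several consecutive samples. The sketch below is an illustration in Python and is not part of Flume; the class name GapAlarm and the tolerance and window parameters are hypothetical, and it assumes that a transient mismatch caused by in-flight events is bounded by the chosen tolerance.

```python
from collections import deque


class GapAlarm:
    """Alarm only when (attempts - successes) exceeds `tolerance`
    in `window` consecutive samples. Hypothetical helper, not a Flume API."""

    def __init__(self, tolerance: int = 0, window: int = 3):
        self.tolerance = tolerance
        # Keeps only the most recent `window` over-tolerance flags.
        self.recent = deque(maxlen=window)

    def observe(self, attempts: int, successes: int) -> bool:
        """Record one sample and return True if the alarm should fire."""
        self.recent.append(attempts - successes > self.tolerance)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(self.recent)


if __name__ == "__main__":
    alarm = GapAlarm(tolerance=5, window=3)
    # Cumulative (attempts, successes) pairs as they might be sampled.
    for sample in [(100, 98), (200, 190), (300, 280), (400, 370)]:
        print(sample, alarm.observe(*sample))
```

A single sample with a large gap does not fire the alarm; only a gap that persists for the whole window does. Whether this is good enough depends on the event rate and the sampling interval, so the tolerance would need tuning per deployment.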
[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894321#comment-15894321 ]

Tristan Stevens commented on FLUME-3050:

Hi [~yuvalif]
I think what you are asking for here is new counters to be added, rather than existing counters that are missing from the monitoring URL. At the moment, most counters are defined for Source, Sink and Channel as supertypes, and there are only a few specific overrides for counters (KafkaSink, KafkaChannel and KafkaSource are the only ones that I can see).
You can deduce the failure rate by comparing the attempts with the successes, but otherwise you'd be in the realm of adding specific counters all over the sources and sinks that you've specified. It's certainly possible, but it would require a fair amount of reworking to get right. What do you think?
By the way, at the very least we should rename the JIRA to something like "Add Counters for error conditions"
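
The comparison of attempts with successes suggested in this comment could be sketched roughly as follows. This is an illustration in Python, not Flume code: the function name attempt_success_gaps is hypothetical, the counter names are taken from the sample output in the issue description, and the sketch assumes every "...AttemptCount" counter has a matching "...SuccessCount" counter in the same component.

```python
import json


def attempt_success_gaps(metrics_json: str) -> dict:
    """Return {component: {counter_prefix: attempts - successes}}.

    Hypothetical helper: pairs each "<X>AttemptCount" with "<X>SuccessCount".
    """
    metrics = json.loads(metrics_json)
    gaps = {}
    for component, counters in metrics.items():
        for name, value in counters.items():
            if not name.endswith("AttemptCount"):
                continue
            prefix = name[: -len("AttemptCount")]
            success = counters.get(prefix + "SuccessCount")
            if success is None:
                continue  # no matching success counter, nothing to compare
            # The monitoring URL reports all values as strings.
            gaps.setdefault(component, {})[prefix] = int(value) - int(success)
    return gaps


# Trimmed copy of the sample output quoted in the issue description.
SAMPLE = (
    '{"SINK.k1":{"EventDrainAttemptCount":"10","EventDrainSuccessCount":"10"},'
    '"CHANNEL.c1":{"EventTakeSuccessCount":"10","EventTakeAttemptCount":"15",'
    '"EventPutAttemptCount":"10","EventPutSuccessCount":"10"}}'
)

if __name__ == "__main__":
    print(attempt_success_gaps(SAMPLE))
```

On the sample from the issue, the EventTake gap on CHANNEL.c1 is 5 even though no error is reported, possibly because takes on an empty channel are counted as attempts. A non-zero gap alone is therefore not proof of a failure.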
[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892039#comment-15892039 ]

Attila Simon commented on FLUME-3050:
-

Hi [~yuvalif],
Thanks for your comments. Most of these metrics indeed look useful. I hope this will get picked up soon. I think whoever picks it up can discuss further which ones are possible and make sense, and do the implementation based on that agreement.
[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888208#comment-15888208 ]

Yuval Lifshitz commented on FLUME-3050:
---

some more error counters to consider:
* hdfs sink: error in file rotation
* file channel: error in reading/writing to file
[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885378#comment-15885378 ]

Yuval Lifshitz commented on FLUME-3050:
---

Hi Attila,
Thanks for looking into that. Some error counters we thought about:
* taildir source: fail to read file
* spooldir source: fail to read file; fail to delete file; file changed while reading
* tcp source: I assume we don't drop messages, but apply pushback on the socket if the channel is full. But do we handle malformed messages? message too long? connection lost in the middle of a message?
* hdfs sink: fail to write file; connectivity error; failovers
* avro sink: fail to write event; connection error
* kafka sink: not sure about specific errors but there could be some as well
* avro interceptor: conversion failure. Since this is based on kite sdk, we may need an interface that allows third parties to publish stats as well?

Having the above as counters, and not as one-time indicators in the log file, is very helpful when integrating with NMS and reporting systems.
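
To illustrate the difference between a counter and a one-time log line, here is a minimal sketch of named error counters that can be incremented at each failure site and read back in the same string-valued JSON shape as the monitoring URL. It is written in Python for brevity and does not reflect Flume's actual (Java) counter classes; the ErrorCounters class, the to_metrics_json function and the counter names are all hypothetical.

```python
import json
import threading
from collections import defaultdict


class ErrorCounters:
    """Thread-safe named counters for one component (hypothetical sketch)."""

    def __init__(self, component: str):
        self.component = component
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, name: str, delta: int = 1) -> int:
        """Add `delta` to the named counter and return its new value."""
        with self._lock:
            self._counts[name] += delta
            return self._counts[name]

    def snapshot(self) -> dict:
        """Return a copy of the counters with string values, as in /metrics."""
        with self._lock:
            return {name: str(value) for name, value in self._counts.items()}


def to_metrics_json(groups) -> str:
    """Serialize several counter groups keyed by component name."""
    return json.dumps({g.component: g.snapshot() for g in groups}, sort_keys=True)


if __name__ == "__main__":
    source = ErrorCounters("SOURCE.r1")
    source.increment("FileReadFailCount")    # hypothetical counter name
    source.increment("FileReadFailCount")
    source.increment("FileDeleteFailCount")  # hypothetical counter name
    print(to_metrics_json([source]))
```

Because the values only ever grow, a monitoring system can poll them at any interval and compute rates itself, which a single log entry does not allow.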
[jira] [Commented] (FLUME-3050) add error stats to monitor URL
[ https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878576#comment-15878576 ]

Attila Simon commented on FLUME-3050:
-

Hi [~yuvalif],
Could you please mention some example counters (or the full list would be even more helpful) you expect to see?