[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-03-05 Thread Yuval Lifshitz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896189#comment-15896189
 ] 

Yuval Lifshitz commented on FLUME-3050:
---

Hi Tristan,
Yes, you are right. If these are not already available, they will have to be 
added first. I guess that actually exposing them through the API would be the 
easy part.
Regarding comparing attempts with successes, this might be tricky due to the 
nature of sampling at high rates: the numbers might not match even if there 
are no errors. So triggering alarms based on that might produce false positives.
Will change the title.
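
To illustrate the sampling concern: one possible way to reduce false positives is to alarm only when the gap between attempts and successes keeps growing over several consecutive polls, rather than on any single mismatch. This is only a sketch of that heuristic; the function name, the `min_intervals` threshold and the sample values are assumptions made for illustration and are not part of Flume.

```python
def gap_keeps_growing(samples, min_intervals=3):
    """Return True if attempts - successes grew in each of the last
    `min_intervals` polling intervals.

    `samples` is a list of (attempts, successes) tuples, oldest first.
    A single mismatch caused by sampling between an attempt and its
    success does not trigger; only a steadily widening gap does.
    """
    gaps = [attempts - successes for attempts, successes in samples]
    if len(gaps) < min_intervals + 1:
        # Not enough history to judge a trend.
        return False
    recent = gaps[-(min_intervals + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Hypothetical poll results: mismatches that come and go.
    noisy = [(100, 98), (200, 200), (300, 297), (400, 400)]
    # Hypothetical poll results: successes fall further behind each poll.
    failing = [(100, 100), (200, 190), (300, 270), (400, 340)]
    print(gap_keeps_growing(noisy))
    print(gap_keeps_growing(failing))
```

Even with such a check, a gap that grows for legitimate reasons would still raise an alarm, so dedicated error counters remain the cleaner signal.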

Thanks,

Yuval

> add error stats to monitor URL
> --
>
> Key: FLUME-3050
> URL: https://issues.apache.org/jira/browse/FLUME-3050
> Project: Flume
>  Issue Type: Improvement
>  Components: Channel, Shell, Sinks+Sources
>Affects Versions: v1.7.0
>Reporter: Yuval Lifshitz
>  Labels: features
>
> Currently, error counters are not present when getting stats. For example:
> {code}
>  > curl http://my-flume-host:4/metrics
> {"SINK.k1":{"ConnectionCreatedCount":"1","ConnectionClosedCount":"0","Type":"SINK","BatchCompleteCount":"0","BatchEmptyCount":"4","EventDrainAttemptCount":"10","StartTime":"1485348138992","EventDrainSuccessCount":"10","BatchUnderflowCount":"1","StopTime":"0","ConnectionFailedCount":"0"},"CHANNEL.c1":{"ChannelCapacity":"100","ChannelFillPercentage":"0.0","Type":"CHANNEL","ChannelSize":"0","EventTakeSuccessCount":"10","EventTakeAttemptCount":"15","StartTime":"1485348138990","EventPutAttemptCount":"10","EventPutSuccessCount":"10","StopTime":"0"},"SOURCE.r1":{"EventReceivedCount":"10","AppendBatchAcceptedCount":"0","Type":"SOURCE","AppendReceivedCount":"0","EventAcceptedCount":"10","StartTime":"1485348138993","AppendAcceptedCount":"0","OpenConnectionCount":"0","AppendBatchReceivedCount":"0","StopTime":"0"}}
> {code}
> This returns only "good" stats for source, channel and sink.
> To get errors you need to look into the log file. This makes it hard to 
> integrate Flume into automatic monitoring systems, NMS etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-03-03 Thread Tristan Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894321#comment-15894321
 ] 

Tristan Stevens commented on FLUME-3050:


Hi [~yuvalif]
I think what you are asking for here is that new counters be added, rather 
than reporting that existing counters are missing from the monitoring URL.

At the moment, most counters are defined for Source, Sink and Channel as 
supertypes, and there are only a few specific overrides for counters 
(KafkaSink, KafkaChannel and KafkaSource are the only ones that I can see).

You can deduce the failure rate by comparing the attempts with the successes, 
but otherwise you'd be in the realms of adding specific counters all over the 
sources and sinks that you've specified. It's certainly possible, but it would 
require a fair amount of re-working to get right.
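
As a rough sketch of the comparison described above, the snippet below parses a trimmed copy of the `/metrics` output quoted in this issue and reports attempts minus successes for each counter pair. The function name and the choice of pairs are assumptions for illustration; only the counter names themselves come from the quoted output.

```python
import json

# Trimmed copy of the /metrics response quoted in this issue.
SAMPLE = (
    '{"SINK.k1":{"Type":"SINK","EventDrainAttemptCount":"10",'
    '"EventDrainSuccessCount":"10"},'
    '"CHANNEL.c1":{"Type":"CHANNEL","EventTakeSuccessCount":"10",'
    '"EventTakeAttemptCount":"15","EventPutAttemptCount":"10",'
    '"EventPutSuccessCount":"10"}}'
)

# (attempt counter, success counter) pairs that appear in the quoted output.
PAIRS = [
    ("EventDrainAttemptCount", "EventDrainSuccessCount"),
    ("EventPutAttemptCount", "EventPutSuccessCount"),
    ("EventTakeAttemptCount", "EventTakeSuccessCount"),
]


def attempt_success_gaps(metrics_json):
    """Return {component: {attempt_counter: attempts - successes}}."""
    metrics = json.loads(metrics_json)
    gaps = {}
    for component, counters in metrics.items():
        for attempt_key, success_key in PAIRS:
            if attempt_key in counters and success_key in counters:
                # Counter values are reported as strings, so convert first.
                gap = int(counters[attempt_key]) - int(counters[success_key])
                gaps.setdefault(component, {})[attempt_key] = gap
    return gaps


if __name__ == "__main__":
    print(attempt_success_gaps(SAMPLE))
```

In the quoted sample the take counters differ by 5 while the put and drain counters match. A non-zero gap like that may simply reflect takes from an empty channel rather than failures, so the difference is at best a rough indicator.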

What do you think?

By the way, at the very least we should rename the JIRA to something like "Add 
Counters for error conditions".



[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-03-02 Thread Attila Simon (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892039#comment-15892039
 ] 

Attila Simon commented on FLUME-3050:
-

Hi [~yuvalif], 

Thanks for your comments. Most of these metrics indeed look useful. I hope it 
will get picked up soon. I think whoever picks it up can discuss further which 
ones are possible/make sense and do the implementation based on that agreement.



[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-02-28 Thread Yuval Lifshitz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888208#comment-15888208
 ] 

Yuval Lifshitz commented on FLUME-3050:
---

some more error counters to consider:
* hdfs sink: error in file rotation
* file channel: error in reading/writing to file



[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-02-27 Thread Yuval Lifshitz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885378#comment-15885378
 ] 

Yuval Lifshitz commented on FLUME-3050:
---

Hi Attila,
Thanks for looking into that. Some error counters we thought about:
* taildir source: fail to read file
* spooldir source: fail to read file; fail to delete file; file changed while 
reading
* tcp source: I assume we don't drop messages, but apply pushback on the socket 
if the channel is full. But do we handle malformed messages? Message too long? 
Connection lost in the middle of a message?
* hdfs sink: fail to write file; connectivity error; failovers
* avro sink: fail to write event; connection error
* kafka sink: not sure about specific errors but there could be some as well
* avro interceptor: conversion failure; since this is based on the Kite SDK, we 
may need an interface that allows third parties to publish stats as well?

Having the above as counters, and not as one-time indicators in the log file, is 
very helpful when integrating with NMS and reporting systems.
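
To show why counters are easier to integrate than log lines, here is a minimal sketch of how a monitoring system could compare two polls of the metrics and report which error counters increased. Note that the counter name `FileReadErrorCount` and the `ErrorCount` suffix convention are hypothetical and used only for illustration; no such counters exist in the output quoted in this issue.

```python
def error_deltas(previous, current,
                 is_error=lambda name: name.endswith("ErrorCount")):
    """Return {(component, counter): increase} for error counters that grew.

    `previous` and `current` are parsed metrics documents of the form
    {component: {counter_name: value_as_string}}.
    The default `is_error` naming rule is an assumption, not a Flume convention.
    """
    deltas = {}
    for component, counters in current.items():
        before = previous.get(component, {})
        for name, value in counters.items():
            if not is_error(name):
                continue
            # A counter missing from the earlier poll is treated as 0.
            delta = int(value) - int(before.get(name, 0))
            if delta > 0:
                deltas[(component, name)] = delta
    return deltas


if __name__ == "__main__":
    # Hypothetical polls; FileReadErrorCount is an invented counter name.
    earlier = {"SOURCE.r1": {"EventReceivedCount": "10",
                             "FileReadErrorCount": "2"}}
    later = {"SOURCE.r1": {"EventReceivedCount": "20",
                           "FileReadErrorCount": "5"}}
    print(error_deltas(earlier, later))
```

A counter that went down between polls, for example after an agent restart, yields a negative delta and is ignored by this sketch.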




[jira] [Commented] (FLUME-3050) add error stats to monitor URL

2017-02-22 Thread Attila Simon (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878576#comment-15878576
 ] 

Attila Simon commented on FLUME-3050:
-

Hi [~yuvalif],

Could you please mention some example counters (or the full list would be even 
more helpful) you expect to see?
