Re: Aggregator in CEP

2018-08-24 Thread antonio saldivar
Hello, thank you very much.
I took a look at the link, but now how can I check the conditions to get the
aggregator results?
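
Until FLINK-9507 (linked below) lands, a minimal sketch of checking the
cumulative amount yourself inside the condition; the Txn type, its
getAmount() accessor, the "txns" stage name, and the threshold are all
assumptions, not from this thread:

import org.apache.flink.cep.pattern.conditions.IterativeCondition;

IterativeCondition<Txn> cumulativeAmount = new IterativeCondition<Txn>() {
    @Override
    public boolean filter(Txn value, Context<Txn> ctx) throws Exception {
        // sum the current event with the events already matched for "txns"
        double sum = value.getAmount();
        for (Txn previous : ctx.getEventsForPattern("txns")) {
            sum += previous.getAmount();
        }
        return sum >= 200.0; // fire the stage once the threshold is reached
    }
};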

On Fri., Aug. 24, 2018 at 5:27, aitozi () wrote:

> Hi,
>
> Right now the aggregator function is still not supported in the CEP
> IterativeCondition, so for now you may need to check the condition yourself
> to get the aggregated result. I will work on this in the coming days; you
> can take a look at this issue:
>
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-9507
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Aggregator in CEP

2018-08-24 Thread antonio saldivar
Hello

I am developing an application where I use Flink (v1.4.2) CEP. Is there
any aggregation function to match cumulative amounts or counts in an
IterativeCondition within a period of time for keyed elements?

If a cumulative amount reaches the thresholds, fire a result.

Thank you
Regards


Re: processWindowFunction

2018-08-20 Thread antonio saldivar
Maybe the usage of that function changed; now I have to use it as shown in [1].


[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#the-keyedprocessfunction
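
For reference, a minimal sketch of the KeyedProcessFunction shape from [1];
KeyedProcessFunction only appeared in a release after 1.4, and the Txn type
and getAmount() accessor are assumptions:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TxnKeyedFn extends KeyedProcessFunction<String, Txn, Txn> {

    private transient ValueState<Double> sumState;

    @Override
    public void open(Configuration parameters) {
        sumState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("sum", Double.class));
    }

    @Override
    public void processElement(Txn txn, Context ctx, Collector<Txn> out) throws Exception {
        // keep a running sum per key; ctx also exposes timers and the current key
        Double sum = sumState.value();
        sumState.update((sum == null ? 0.0 : sum) + txn.getAmount());
        out.collect(txn);
    }
}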

On Mon., Aug. 20, 2018 at 5:56, vino yang () wrote:

> Hi antonio,
>
> Oh, if you can't use KeyedProcessFunction, then that is a pity.
> In that case you can use MapState, where the map key is used to store the
> key of your partition.
> But I am not sure whether this will achieve the effect you want.
>
> Thanks, vino.
>
> antonio saldivar  wrote on Mon., Aug. 20, 2018 at 4:32 PM:
>
>> Hello
>>
>> Thank you for the information. For some reason KeyedProcessFunction is
>> not found in my Flink version 1.4.2; I can only find ProcessFunction, and
>> it works like this:
>>
>> // Txn below is a placeholder for the actual event POJO; the generics
>> // were stripped by the archive
>> public class TxnProcessFn extends ProcessFunction<Txn, Txn> {
>>
>>     private transient ValueState<Txn> state1;
>>     private transient ValueState<Txn> state2;
>>     private transient ValueState<Txn> state3;
>>
>>     @Override
>>     public void open(Configuration parameters) throws Exception {
>>         state1 = getRuntimeContext().getState(
>>                 new ValueStateDescriptor<>("objState1", Txn.class));
>>         state2 = getRuntimeContext().getState(
>>                 new ValueStateDescriptor<>("objState2", Txn.class));
>>         state3 = getRuntimeContext().getState(
>>                 new ValueStateDescriptor<>("objState3", Txn.class));
>>     }
>>
>>     @Override
>>     public void processElement(Txn obj, Context ctx, Collector<Txn> out) throws Exception {
>>         Txn current = state1.value(); // was "state.value()", which did not compile
>>         if (current == null) {
>>             current = new Txn();
>>             current.setId(obj.getId());
>>             state1.update(current);
>>         }
>>     }
>> }
>>
>> On Mon., Aug. 20, 2018 at 2:24, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> First, I suggest you use KeyedProcessFunction if you have an operation
>>> similar to keyBy.
>>> The implementation is similar to a fixed window.
>>> You can create three state collections and determine which state
>>> collection the time of each element belongs to.
>>> When the trigger fires, the elements in the collection are evaluated.
>>>
>>> Thanks, vino.
>>>
>>> antonio saldivar  wrote on Mon., Aug. 20, 2018 at 11:54 AM:
>>>
>>>> Thank you for the references.
>>>>
>>>> I now have my ProcessFunction and I am getting the state, but how can I
>>>> handle the threshold times to group the elements? Also, since this is a
>>>> global window, how do I purge it? Otherwise it is going to keep growing.
>>>>
>>>> On Sun., Aug. 19, 2018 at 8:57, vino yang () wrote:
>>>>
>>>>> Hi antonio,
>>>>>
>>>>> Regarding your scenario, I think maybe you can consider using the
>>>>> ProcessFunction (or keyed ProcessFunction) function directly on the 
>>>>> Stream.
>>>>> [1]
>>>>> It can handle each of your elements with a Timer, and you can combine
>>>>> Flink's state API[2] to store your data.
>>>>>
>>>>> [1]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
>>>>> [2]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> antonio saldivar  wrote on Sun., Aug. 19, 2018 at 10:18 AM:
>>>>>
>>>>>> Hi Vino,
>>>>>>
>>>>>> Is it possible to use a global window, and then have the trigger
>>>>>> onElement compare the element that has arrived against, for example,
>>>>>> 10, 20, and 60 minutes of data?
>>>>>>
>>>>>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes
>>>>>> for the same keyed element. If the same id sums to, say, $200 total
>>>>>> within one of those thresholds, with a count greater than or equal to
>>>>>> 3, I need to set some values on the object; if the object does not
>>>>>> reach those thresholds, I do not set the values and keep sending the
>>>>>> output, with or without those values.
>>>>>>
>>>>>> Just processing the object on the fly and sending the output.
>>>>>>
>>>>>>
>>>>>>
>>>>>>

Re: processWindowFunction

2018-08-20 Thread antonio saldivar
Hello

Thank you for the information. For some reason KeyedProcessFunction is
not found in my Flink version 1.4.2; I can only find ProcessFunction, and
it works like this:

// Txn below is a placeholder for the actual event POJO; the generics
// were stripped by the archive
public class TxnProcessFn extends ProcessFunction<Txn, Txn> {

    private transient ValueState<Txn> state1;
    private transient ValueState<Txn> state2;
    private transient ValueState<Txn> state3;

    @Override
    public void open(Configuration parameters) throws Exception {
        state1 = getRuntimeContext().getState(
                new ValueStateDescriptor<>("objState1", Txn.class));
        state2 = getRuntimeContext().getState(
                new ValueStateDescriptor<>("objState2", Txn.class));
        state3 = getRuntimeContext().getState(
                new ValueStateDescriptor<>("objState3", Txn.class));
    }

    @Override
    public void processElement(Txn obj, Context ctx, Collector<Txn> out) throws Exception {
        Txn current = state1.value(); // was "state.value()", which did not compile
        if (current == null) {
            current = new Txn();
            current.setId(obj.getId());
            state1.update(current);
        }
    }
}
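
For the purge question raised in this thread, a minimal sketch of a
1.4-style ProcessFunction on a keyed stream that registers a timer and
clears its state; the Txn type, getAmount(), and the 10-minute retention
are assumptions:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class TxnProcessFnWithPurge extends ProcessFunction<Txn, Txn> {

    private transient ValueState<Double> sumState;

    @Override
    public void open(Configuration parameters) {
        sumState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("sum", Double.class));
    }

    @Override
    public void processElement(Txn txn, Context ctx, Collector<Txn> out) throws Exception {
        Double sum = sumState.value();
        sumState.update((sum == null ? 0.0 : sum) + txn.getAmount());
        // purge this key's state 10 minutes after the element arrives
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 10 * 60 * 1000L);
        out.collect(txn);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Txn> out) throws Exception {
        sumState.clear(); // the state no longer grows without bound
    }
}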

On Mon., Aug. 20, 2018 at 2:24, vino yang () wrote:

> Hi antonio,
>
> First, I suggest you use KeyedProcessFunction if you have an operation
> similar to keyBy.
> The implementation is similar to a fixed window.
> You can create three state collections and determine which state collection
> the time of each element belongs to.
> When the trigger fires, the elements in the collection are evaluated.
>
> Thanks, vino.
>
> antonio saldivar  wrote on Mon., Aug. 20, 2018 at 11:54 AM:
>
>> Thank you for the references.
>>
>> I now have my ProcessFunction and I am getting the state, but how can I
>> handle the threshold times to group the elements? Also, since this is a
>> global window, how do I purge it? Otherwise it is going to keep growing.
>>
>> On Sun., Aug. 19, 2018 at 8:57, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> Regarding your scenario, I think maybe you can consider using the
>>> ProcessFunction (or keyed ProcessFunction) function directly on the Stream.
>>> [1]
>>> It can handle each of your elements with a Timer, and you can combine
>>> Flink's state API[2] to store your data.
>>>
>>> [1]:
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
>>> [2]:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>>>
>>> Thanks, vino.
>>>
>>> antonio saldivar  wrote on Sun., Aug. 19, 2018 at 10:18 AM:
>>>
>>>> Hi Vino,
>>>>
>>>> Is it possible to use a global window, and then have the trigger
>>>> onElement compare the element that has arrived against, for example, 10,
>>>> 20, and 60 minutes of data?
>>>>
>>>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes
>>>> for the same keyed element. If the same id sums to, say, $200 total
>>>> within one of those thresholds, with a count greater than or equal to 3,
>>>> I need to set some values on the object; if the object does not reach
>>>> those thresholds, I do not set the values and keep sending the output,
>>>> with or without those values.
>>>>
>>>> Just processing the object on the fly and sending the output.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri., Aug. 17, 2018 at 22:14, vino yang () wrote:
>>>>
>>>>> Hi antonio,
>>>>>
>>>>> Yes, ProcessWindowFunction is a very low-level window function.
>>>>> It lets you access the data in the window and customize the window's
>>>>> output.
>>>>> So while it gives you flexibility, you also need to think about other
>>>>> things, which may require you to write more processing logic.
>>>>>
>>>>> Generally speaking, sliding windows contain some repeated data, but a
>>>>> common pattern is to apply a reduce function on the window to get your
>>>>> calculation results.
>>>>> If you simply emit the raw data, there will definitely be some duplication.
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 12:01 PM:
>>>>>
>>>>>> Hi Vino,
>>>>>> thank you for the information. Actually, I am using a trigger alert
>>>>>> and a ProcessWindowFunction to send my results, but when my window
>>>>>> slides or ends it sends the objects again and I am getting duplicated
>>>>>> data.
>>>>>>
>>>>>

Re: processWindowFunction

2018-08-19 Thread antonio saldivar
Thank you for the references.

I now have my ProcessFunction and I am getting the state, but how can I
handle the threshold times to group the elements? Also, since this is a
global window, how do I purge it? Otherwise it is going to keep growing.

On Sun., Aug. 19, 2018 at 8:57, vino yang () wrote:

> Hi antonio,
>
> Regarding your scenario, I think maybe you can consider using the
> ProcessFunction (or keyed ProcessFunction) function directly on the Stream.
> [1]
> It can handle each of your elements with a Timer, and you can combine
> Flink's state API[2] to store your data.
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
> [2]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>
> Thanks, vino.
>
> antonio saldivar  wrote on Sun., Aug. 19, 2018 at 10:18 AM:
>
>> Hi Vino,
>>
>> Is it possible to use a global window, and then have the trigger onElement
>> compare the element that has arrived against, for example, 10, 20, and 60
>> minutes of data?
>>
>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes for
>> the same keyed element. If the same id sums to, say, $200 total within one
>> of those thresholds, with a count greater than or equal to 3, I need to
>> set some values on the object; if the object does not reach those
>> thresholds, I do not set the values and keep sending the output, with or
>> without those values.
>>
>> Just processing the object on the fly and sending the output.
>>
>>
>>
>>
>>
>>
>>
>> On Fri., Aug. 17, 2018 at 22:14, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> Yes, ProcessWindowFunction is a very low-level window function.
>>> It lets you access the data in the window and customize the window's
>>> output.
>>> So while it gives you flexibility, you also need to think about other
>>> things, which may require you to write more processing logic.
>>>
>>> Generally speaking, sliding windows contain some repeated data, but a
>>> common pattern is to apply a reduce function on the window to get your
>>> calculation results.
>>> If you simply emit the raw data, there will definitely be some duplication.
>>>
>>> Thanks, vino.
>>>
>>> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 12:01 PM:
>>>
>>>> Hi Vino,
>>>> thank you for the information. Actually, I am using a trigger alert and
>>>> a ProcessWindowFunction to send my results, but when my window slides or
>>>> ends it sends the objects again and I am getting duplicated data.
>>>>
>>>> On Thu., Aug. 16, 2018 at 22:05, vino yang () wrote:
>>>>
>>>>> Hi Antonio,
>>>>>
>>>>> Which results do you not want to get when each window is created?
>>>>> Examples of the use of ProcessWindowFunction are included in many test
>>>>> files in Flink's project, such as SideOutputITCase.scala or
>>>>> WindowTranslationTest.scala.
>>>>>
>>>>> For more information on ProcessWindowFunction, you can refer to the
>>>>> official website.[1]
>>>>>
>>>>> [1]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 6:24 AM:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I am implementing a data stream where I use sliding windows, but I am
>>>>>> stuck because I need to set values on my object based on some if
>>>>>> statements in my process function and send the object to the next
>>>>>> step, but I don't want results every time a window is created.
>>>>>>
>>>>>> If anyone has a good example of this, it would help me.
>>>>>>
>>>>>


Re: processWindowFunction

2018-08-18 Thread antonio saldivar
Hi Vino,

Is it possible to use a global window, and then have the trigger onElement
compare the element that has arrived against, for example, 10, 20, and 60
minutes of data?

I have rules evaluating the sum of amounts over 10, 20, or 60 minutes for
the same keyed element. If the same id sums to, say, $200 total within one
of those thresholds, with a count greater than or equal to 3, I need to set
some values on the object; if the object does not reach those thresholds, I
do not set the values and keep sending the output, with or without those
values.

Just processing the object on the fly and sending the output.
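
A minimal sketch of such a trigger on a global window, checking only the
amount rule (the count rule would be analogous); the Txn type, getAmount(),
and everything beyond the $200 threshold mentioned above are assumptions:

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

public class ThresholdTrigger extends Trigger<Txn, GlobalWindow> {

    private final ReducingStateDescriptor<Double> sumDesc =
            new ReducingStateDescriptor<>("sum", new Sum(), Double.class);

    @Override
    public TriggerResult onElement(Txn txn, long ts, GlobalWindow w, TriggerContext ctx) throws Exception {
        ReducingState<Double> sum = ctx.getPartitionedState(sumDesc);
        sum.add(txn.getAmount());
        // FIRE once the keyed cumulative amount reaches the threshold
        return sum.get() >= 200.0 ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow w, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow w, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(sumDesc).clear();
    }

    private static class Sum implements ReduceFunction<Double> {
        @Override
        public Double reduce(Double a, Double b) {
            return a + b;
        }
    }
}

Note that with .window(GlobalWindows.create()) the window itself never
purges, so an evictor or explicit state cleanup is still needed to keep it
from growing.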







On Fri., Aug. 17, 2018 at 22:14, vino yang () wrote:

> Hi antonio,
>
> Yes, ProcessWindowFunction is a very low-level window function.
> It lets you access the data in the window and customize the window's
> output.
> So while it gives you flexibility, you also need to think about other
> things, which may require you to write more processing logic.
>
> Generally speaking, sliding windows contain some repeated data, but a
> common pattern is to apply a reduce function on the window to get your
> calculation results.
> If you simply emit the raw data, there will definitely be some duplication.
>
> Thanks, vino.
>
> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 12:01 PM:
>
>> Hi Vino,
>> thank you for the information. Actually, I am using a trigger alert and a
>> ProcessWindowFunction to send my results, but when my window slides or
>> ends it sends the objects again and I am getting duplicated data.
>>
>> On Thu., Aug. 16, 2018 at 22:05, vino yang () wrote:
>>
>>> Hi Antonio,
>>>
>>> Which results do you not want to get when each window is created?
>>> Examples of the use of ProcessWindowFunction are included in many test
>>> files in Flink's project, such as SideOutputITCase.scala or
>>> WindowTranslationTest.scala.
>>>
>>> For more information on ProcessWindowFunction, you can refer to the
>>> official website.[1]
>>>
>>> [1]:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>>>
>>> Thanks, vino.
>>>
>>> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 6:24 AM:
>>>
>>>> Hello
>>>>
>>>> I am implementing a data stream where I use sliding windows, but I am
>>>> stuck because I need to set values on my object based on some if
>>>> statements in my process function and send the object to the next step,
>>>> but I don't want results every time a window is created.
>>>>
>>>> If anyone has a good example of this, it would help me.
>>>>
>>>


Re: processWindowFunction

2018-08-16 Thread antonio saldivar
Hi Vino,
thank you for the information. Actually, I am using a trigger alert and a
ProcessWindowFunction to send my results, but when my window slides or ends
it sends the objects again and I am getting duplicated data.

On Thu., Aug. 16, 2018 at 22:05, vino yang () wrote:

> Hi Antonio,
>
> Which results do you not want to get when each window is created?
> Examples of the use of ProcessWindowFunction are included in many test
> files in Flink's project, such as SideOutputITCase.scala or
> WindowTranslationTest.scala.
>
> For more information on ProcessWindowFunction, you can refer to the
> official website.[1]
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>
> Thanks, vino.
>
> antonio saldivar  wrote on Fri., Aug. 17, 2018 at 6:24 AM:
>
>> Hello
>>
>> I am implementing a data stream where I use sliding windows, but I am
>> stuck because I need to set values on my object based on some if
>> statements in my process function and send the object to the next step,
>> but I don't want results every time a window is created.
>>
>> If anyone has a good example of this, it would help me.
>>
>


processWindowFunction

2018-08-16 Thread antonio saldivar
Hello

I am implementing a data stream where I use sliding windows, but I am stuck
because I need to set values on my object based on some if statements in my
process function and send the object to the next step, but I don't want
results every time a window is created.

If anyone has a good example of this, it would help me.
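
A minimal sketch of emitting conditionally from the ProcessWindowFunction,
so that window firings below the thresholds produce no output; the Txn type,
getAmount(), and the concrete thresholds are assumptions:

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class ThresholdWindowFn extends ProcessWindowFunction<Txn, Txn, Tuple, TimeWindow> {

    @Override
    public void process(Tuple key, Context ctx, Iterable<Txn> elements, Collector<Txn> out) {
        double sum = 0.0;
        int count = 0;
        Txn last = null;
        for (Txn t : elements) {
            sum += t.getAmount();
            count++;
            last = t;
        }
        if (sum >= 200.0 && count >= 3) {
            // set the flag values on the object here, then emit it;
            // otherwise this window firing produces no output
            out.collect(last);
        }
    }
}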


Re: Flink Rebalance

2018-08-10 Thread antonio saldivar
Hi Fabian

Thank you. Yes, there are just map functions; I will do it that way, with
method calls, to get it faster.

On Fri, Aug 10, 2018, 5:58 AM Fabian Hueske  wrote:

> Hi,
>
> Elias and Paul have good points.
> I think the performance degradation is mostly due to the lack of function
> chaining in the rebalance case.
>
> If all steps are just map functions, they can be chained in the
> no-rebalance case.
> That means, records are passed via function calls.
> If you add rebalancing, records will be passed between map functions via
> serialization, network transfer, and deserialization.
> This is of course much more expensive than calling a method.
>
> Best, Fabian
>
> 2018-08-10 4:25 GMT+02:00 Paul Lam :
>
>> Hi Antonio,
>>
>> AFAIK, there are two reasons for this:
>>
>> 1. Rebalancing itself brings latency because it takes time to
>> redistribute the elements.
>> 2. Rebalancing also messes up the order from the Kafka topic partitions,
>> and often makes an event-time window wait longer to trigger in case you're
>> using the event-time characteristic.
>>
>> Best Regards,
>> Paul Lam
>>
>>
>>
>> On Aug. 10, 2018 at 05:49, antonio saldivar  wrote:
>>
>> Hello
>>
>> Sending ~450 elements per second (the values are in milliseconds, start
>> to end), I went from this with rebalance:
>>
>> +------------+
>> | AVGWINDOW  |
>> +------------+
>> | 32131.0853 |
>> +------------+
>>
>> to this without rebalance:
>>
>> +------------+
>> | AVGWINDOW  |
>> +------------+
>> | 70.2077    |
>> +------------+
>>
>> On Thu., Aug. 9, 2018 at 17:42, Elias Levy (<fearsome.lucid...@gmail.com>) wrote:
>>
>>> What do you consider a lot of latency?  The rebalance will require
>>> serializing / deserializing the data as it gets distributed.  Depending on
>>> the complexity of your records and the efficiency of your serializers, that
>>> could have a significant impact on your performance.
>>>
>>> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar 
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> Does anyone know why adding "rebalance()" to my .map steps introduces a
>>>> lot of latency compared to not having rebalance?
>>>>
>>>>
>>>> I have 44 Kafka partitions in my topic and 44 Flink task managers.
>>>>
>>>> The execution plan looks like this when I add rebalance, but it adds a
>>>> lot of latency:
>>>>
>>>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
>>>> kafka-sink
>>>>
>>>> Thank you
>>>> regards
>>>>
>>>>
>>
>


Re: Flink Rebalance

2018-08-09 Thread antonio saldivar
Hello

Sending ~450 elements per second (the values are in milliseconds, start to
end), I went from this with rebalance:

+------------+
| AVGWINDOW  |
+------------+
| 32131.0853 |
+------------+

to this without rebalance:

+------------+
| AVGWINDOW  |
+------------+
| 70.2077    |
+------------+

On Thu., Aug. 9, 2018 at 17:42, Elias Levy () wrote:

> What do you consider a lot of latency?  The rebalance will require
> serializing / deserializing the data as it gets distributed.  Depending on
> the complexity of your records and the efficiency of your serializers, that
> could have a significant impact on your performance.
>
> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar 
> wrote:
>
>> Hello
>>
>> Does anyone know why adding "rebalance()" to my .map steps introduces a
>> lot of latency compared to not having rebalance?
>>
>>
>> I have 44 Kafka partitions in my topic and 44 Flink task managers.
>>
>> The execution plan looks like this when I add rebalance, but it adds a
>> lot of latency:
>>
>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
>> kafka-sink
>>
>> Thank you
>> regards
>>
>>


Flink Rebalance

2018-08-09 Thread antonio saldivar
Hello

Does anyone know why adding "rebalance()" to my .map steps introduces a lot
of latency compared to not having rebalance?


I have 44 Kafka partitions in my topic and 44 Flink task managers.

The execution plan looks like this when I add rebalance, but it adds a lot
of latency:

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
kafka-sink

Thank you
regards
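
For reference, a sketch of the two topologies being compared in the replies
above; Step1 and Step2 stand in for the actual map functions:

// chained: the maps run in the same task, and records are passed
// between them by plain method calls
stream.map(new Step1()).map(new Step2()).addSink(kafkaSink);

// rebalanced: every hop crosses the network, so each record pays
// serialization, transfer, and deserialization between steps
stream.rebalance().map(new Step1())
      .rebalance().map(new Step2())
      .rebalance().addSink(kafkaSink);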


Re: Dynamical Windows

2018-08-01 Thread antonio saldivar
Awesome, thank you very much. I will try to do it with a key selector to send
the key from the front end.
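
A minimal sketch of the KeySelector idea vino suggests below; the Txn type,
its getField() accessor, and the RuleConfig lookup are hypothetical:

import org.apache.flink.api.java.functions.KeySelector;

KeySelector<Txn, String> dynamicKey = new KeySelector<Txn, String>() {
    @Override
    public String getKey(Txn txn) throws Exception {
        // resolve the key field at runtime, e.g. from the front end via
        // job parameters or an external store such as ZooKeeper
        String keyField = RuleConfig.currentKeyField();
        return txn.getField(keyField);
    }
};

txns.keyBy(dynamicKey)
    .timeWindow(Time.minutes(10), Time.minutes(1));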

On Wed., Aug. 1, 2018 at 11:57, vino yang () wrote:

> Sorry, the KeySelector Javadoc is here:
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/functions/KeySelector.html
>
> 2018-08-01 23:57 GMT+08:00 vino yang :
>
>> Hi antonio,
>>
>> The keyBy API can accept a KeySelector [1], which is an interface you can
>> implement to specify the key for your business.
>>
>> I think you can use it and implement its getKey method. In that method,
>> you can access an outer system (such as ZooKeeper) to get a dynamic key.
>>
>> It's just an idea, you can try it.
>>
>> Thanks, vino.
>>
>>
>> 2018-08-01 23:46 GMT+08:00 antonio saldivar :
>>
>>> Hello
>>>
>>> I am developing a Flink 1.4.2 application, currently with sliding windows
>>> (example below).
>>> I want to ask if there is a way to create the window time dynamically;
>>> also, the key has to change in some use cases, and we don't want to
>>> create a specific window for each UC.
>>>
>>> I want to send those values from the front end
>>>
>>> // objectDtoStream (hypothetical name) is the upstream DataStream<ObjectDTO>;
>>> // the generics were stripped by the archive
>>> SingleOutputStreamOperator<ObjectDTO> windowed = objectDtoStream
>>>     .keyBy("Key")
>>>     .timeWindow(Time.minutes(10), Time.minutes(1))
>>>     .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
>>>     .aggregate(new TxnAggregator(), new TxnWindowFn())
>>>     .name("TEN_MINUTES_WINDOW");
>>>
>>>
>>> Thank you
>>> Best Regards
>>>
>>
>>
>


Dynamical Windows

2018-08-01 Thread antonio saldivar
Hello

I am developing a Flink 1.4.2 application, currently with sliding windows
(example below).
I want to ask if there is a way to create the window time dynamically; also,
the key has to change in some use cases, and we don't want to create a
specific window for each UC.

I want to send those values from the front end

// objectDtoStream (hypothetical name) is the upstream DataStream<ObjectDTO>;
// the generics were stripped by the archive
SingleOutputStreamOperator<ObjectDTO> windowed = objectDtoStream
    .keyBy("Key")
    .timeWindow(Time.minutes(10), Time.minutes(1))
    .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
    .aggregate(new TxnAggregator(), new TxnWindowFn())
    .name("TEN_MINUTES_WINDOW");


Thank you
Best Regards


Re: working with flink Session Windows

2018-07-20 Thread antonio saldivar
Hello

Actually, I evaluate my WindowFunction with a trigger alert, using something
like the code below (testing with 2 different windows), and expecting 5K
elements per second to arrive:


// Element is a placeholder for the event type; the generics were stripped
// by the archive, and a closing parenthesis was missing on the trigger line
SingleOutputStreamOperator<Element> windowedElem = element
    .keyBy("id")
    .timeWindow(Time.seconds(120))
    // .window(EventTimeSessionWindows.withGap(Time.minutes(20)))
    .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
    .aggregate(new EleAggregator(), new EleWindowFn());


On Fri., Jul. 20, 2018 at 21:23, Hequn Cheng () wrote:

> Hi antonio,
>
> I think it is worth a try to test the performance in your scenario, since
> job performance can be affected by a number of factors (say, your
> WindowFunction).
>
> Best, Hequn
>
> On Sat, Jul 21, 2018 at 2:59 AM, antonio saldivar 
> wrote:
>
>> Hello
>>
>> I am building an app, but for this use case I want to test with session
>> windows, and I am not sure whether this will be expensive in compute
>> resources, because the gap will be 10, 20, or 60 minutes; I want to
>> trigger an alert if the element reaches some thresholds within those
>> periods of time.
>>
>> flink version 1.4.2
>>
>> Thank you
>> Best Regards
>> Antonio Saldivar
>>
>
>


working with flink Session Windows

2018-07-20 Thread antonio saldivar
Hello

I am building an app, but for this use case I want to test with session
windows, and I am not sure whether this will be expensive in compute
resources, because the gap will be 10, 20, or 60 minutes; I want to trigger
an alert if the element reaches some thresholds within those periods of
time.

flink version 1.4.2

Thank you
Best Regards
Antonio Saldivar
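
A minimal sketch of the session-window variant, reusing the aggregator and
window-function names from the reply thread above; the Element type and the
10-minute gap are assumptions:

import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

SingleOutputStreamOperator<Element> alerts = elements
        .keyBy("id")
        // a session closes after 10 minutes without activity for that key
        .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
        .aggregate(new EleAggregator(), new EleWindowFn());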


Re: flink javax.xml.parser Error

2018-07-17 Thread antonio saldivar
If somebody is facing this issue: I solved it by adding the exclusion to my
POM.xml, and I am also using javax.xml.




<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>1.4.2</version>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>javax.xml</groupId>
    <artifactId>jaxb-api</artifactId>
    <version>2.1</version>
</dependency>

On Mon., Jul. 16, 2018 at 18:26, antonio saldivar () wrote:

> Hello
>
>
> I am getting this error at runtime when I run my application on an Ambari
> local cluster.
>
> Flink 1.4.2
>
> phoenix
>
> hbase
>
>
> Does anyone have a recommendation to solve this issue?
>
>
>
>
> javax.xml.parsers.FactoryConfigurationError: Provider for class 
> javax.xml.parsers.DocumentBuilderFactory cannot be created
>   at 
> javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
>   at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
>   at 
> javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2208)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
>   at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:870)
>   at 
> org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1268)
>   at 
> org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>   at 
> org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:82)
>   at 
> org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:97)
>   at 
> org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
>   at 
> org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
>   at 
> org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
>   at 
> org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:93)
>   at 
> org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
>   at 
> org.apache.phoenix.query.QueryServicesOptions.withDefaults(QueryServicesOptions.java:285)
>   at 
> org.apache.phoenix.query.QueryServicesImpl.(QueryServicesImpl.java:36)
>   at 
> org.apache.phoenix.jdbc.PhoenixDriver.getQueryServices(PhoenixDriver.java:178)
>   at 
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:214)
>   at 
> org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
>   at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at 
> net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:49)
>   at 
> net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:1)
>   at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:611)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:572)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
>   at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
>   at 
> org.apache.flink.streaming.runtime.tas

flink javax.xml.parser Error

2018-07-16 Thread antonio saldivar
Hello


I am getting this error at runtime when I run my application on an Ambari
local cluster.

Flink 1.4.2

phoenix

hbase


Does anyone have a recommendation to solve this issue?




javax.xml.parsers.FactoryConfigurationError: Provider for class
javax.xml.parsers.DocumentBuilderFactory cannot be created
at 
javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
at 
javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2208)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
at 
org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:870)
at 
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1268)
at 
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
at 
org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:82)
at 
org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:97)
at 
org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
at 
org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
at 
org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at 
org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:93)
at 
org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
at 
org.apache.phoenix.query.QueryServicesOptions.withDefaults(QueryServicesOptions.java:285)
at 
org.apache.phoenix.query.QueryServicesImpl.(QueryServicesImpl.java:36)
at 
org.apache.phoenix.jdbc.PhoenixDriver.getQueryServices(PhoenixDriver.java:178)
at 
org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:214)
at 
org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at 
net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:49)
at 
net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:1)
at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:611)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:572)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:384)
   

flink 1.4.2 Ambari

2018-07-13 Thread antonio saldivar
Hello

I am trying to find a way to add the Flink 1.4.2 service to Ambari, because
it is not listed in the stack. Does anyone have the steps to add this
service manually?

Thank you
Best regards


Re: Measure Latency from source to sink

2018-06-26 Thread antonio saldivar
Hello, thank you.

I was also trying the Flink UI metrics on version 1.4.2 with
env.getConfig().setLatencyTrackingInterval(1000L),
but it looks like it is not displaying anything.

On Tue., Jun. 26, 2018 at 10:45, zhangminglei (<18717838...@163.com>) wrote:

> Hi, you can do that, but it does not make much sense in general. You can
> do it with Flink, Storm, Spark Streaming, or Structured Streaming, and
> compare the latency under the different frameworks.
>
> Cheers
> Minglei
>
> On Jun. 26, 2018 at 9:36 PM, antonio saldivar  wrote:
>
> Hello, thank you for the feedback.
>
> Well, for now I just want to measure the time each transaction takes from
> source to sink, adding the start and end times in millis.
>
>
>
> On Tue., Jun. 26, 2018 at 5:19, zhangminglei (<18717838...@163.com>) wrote:
>
>> Hi Antonio,
>>
>> Usually, the measurement of delay is tied to a specific business case; I
>> think that is more reasonable. What I understand by latency, from my
>> experience, is data preparation time plus query calculation time; it is
>> like an end-to-end latency test. Hope this helps you. It is not about the
>> latency of Flink itself.
>>
>> Cheers
>> Minglei
>>
>>
>> > On Jun. 26, 2018 at 5:23 AM, antonio saldivar  wrote:
>> >
>> > Hello
>> >
>> > I am trying to measure the latency of each transaction traveling across
>> > the system. As a data source I have a Kafka consumer, and I would like
>> > to measure the time it takes from source to sink. Does anyone have an
>> > example?
>> >
>> > Thank you
>> > Best Regards
>>
>>
>>
>


Re: Measure Latency from source to sink

2018-06-26 Thread antonio saldivar
Hello, thank you for the feedback.

Well, for now I just want to measure the time each transaction takes from
source to sink, adding the start and end times in millis.



On Tue., Jun. 26, 2018 at 5:19, zhangminglei (<18717838...@163.com>) wrote:

> Hi Antonio,
>
> Usually, the measurement of delay is tied to a specific business case; I
> think that is more reasonable. What I understand by latency, from my
> experience, is data preparation time plus query calculation time; it is
> like an end-to-end latency test. Hope this helps you. It is not about the
> latency of Flink itself.
>
> Cheers
> Minglei
>
>
> > On Jun. 26, 2018 at 5:23 AM, antonio saldivar  wrote:
> >
> > Hello
> >
> > I am trying to measure the latency of each transaction traveling across
> > the system. As a data source I have a Kafka consumer, and I would like
> > to measure the time it takes from source to sink. Does anyone have an
> > example?
> >
> > Thank you
> > Best Regards
>
>
>


Re: Measure Latency from source to sink

2018-06-25 Thread antonio saldivar
Thank you very much.

I already did #2, but at the moment I print the output (I am using a trigger
alert and evaluating the window) it replaces the toString values with null
or 0 and only prints the ones saved in my accumulator and the keyBy value.

On Mon, Jun 25, 2018, 9:22 PM Hequn Cheng  wrote:

> Hi antonio,
>
> I see two options to solve your problem.
> 1. Enable the latency tracking[1]. But you have to pay attention to its
> mechanism, for example, a) the sources only *periodically* emit a special
> record and b) the latency markers are not accounting for the time user
> records spend in operators as they are bypassing them.
> 2. Add a time field to each of your record. Each time a record comes in
> from the source, write down the time(t1), so that we can get the latency at
> sink(t2) by  t2 - t1.
>
> Hope this helps.
> Hequn
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#latency-tracking
>
> On Tue, Jun 26, 2018 at 5:23 AM, antonio saldivar 
> wrote:
>
>> Hello
>>
>> I am trying to measure the latency of each transaction traveling across
>> the system. As a data source I have a Kafka consumer, and I would like to
>> measure the time it takes from source to sink. Does anyone have an
>> example?
>>
>> Thank you
>> Best Regards
>>
>
>


Measure Latency from source to sink

2018-06-25 Thread antonio saldivar
Hello

I am trying to measure the latency of each transaction traveling across the
system. As a data source I have a Kafka consumer, and I would like to
measure the time it takes from source to sink. Does anyone have an example?

Thank you
Best Regards
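
A minimal sketch of the usual approach (Hequn's option #2 in the replies
above): stamp each record right after the source and compute the delta at
the sink; the Event type and its accessors are hypothetical:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

DataStream<Event> stamped = kafkaSource.map(new MapFunction<Event, Event>() {
    @Override
    public Event map(Event e) {
        e.setIngestMillis(System.currentTimeMillis()); // t1
        return e;
    }
});

// ... the rest of the pipeline ...

stamped.addSink(new SinkFunction<Event>() {
    @Override
    public void invoke(Event e) {
        long latencyMillis = System.currentTimeMillis() - e.getIngestMillis(); // t2 - t1
        // report latencyMillis to your metrics system
    }
});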


Re: Flink multiple windows

2018-06-10 Thread antonio saldivar
Thank you very much Fabian. I found that solution in the link below, and it
is the best fit for my use case:
https://stackoverflow.com/questions/47458497/apache-flink-how-to-apply-multiple-counting-window-functions?rq=1

I am still testing how to count (for example, numTransactions >= 3) and then
sum. I am also using tumbling windows.

Best Regards

2018-06-10 6:04 GMT-04:00 Fabian Hueske :

> Hi Antonio,
>
> Cascading window aggregations as done in your example is a good idea and
> is preferable if the aggregation function is combinable, which is true for
> sum (count can be done as sum of 1s).
>
> Best, Fabian
>
> 2018-06-09 4:00 GMT+02:00 antonio saldivar :
>
>> Hello
>>
>> Has anyone worked this way? I am asking because I have to get the
>> aggregations (sum and count) for multiple window sizes (10 mins, 20 mins,
>> 30 mins). Please let me know if this works properly, or whether there is
>> another good solution.
>>
>>
>> DataStream<String> data = ...
>> // append a Long 1 to each record to count it
>> // (generics restored; "1minCnts"/"5minCnts" renamed, since Java
>> // identifiers cannot start with a digit)
>> DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne());
>>
>> DataStream<Tuple2<String, Long>> oneMinCnts = withOnes
>>   // key by the String field
>>   .keyBy(0)
>>   // define time window
>>   .timeWindow(Time.minutes(1))
>>   // sum the ones in the Long field
>>   // in practice you want to use an incrementally aggregating ReduceFunction and
>>   // a WindowFunction to extract the start/end timestamp of the window
>>   .sum(1);
>>
>> // emit 1-min counts to wherever you need them
>> oneMinCnts.addSink(new YourSink());
>>
>> // compute 5-min counts based on the 1-min counts
>> DataStream<Tuple2<String, Long>> fiveMinCnts = oneMinCnts
>>   // key by the String field
>>   .keyBy(0)
>>   // define time window of 5 minutes
>>   .timeWindow(Time.minutes(5))
>>   // sum the 1-minute counts in the Long field
>>   .sum(1);
>>
>> // emit 5-min counts to wherever you need them
>> fiveMinCnts.addSink(new YourSink());
>>
>> // continue with 1-day and 1-week windows
>>
>> Thank you Regards
>>
>>
>>
>


Flink multiple windows

2018-06-08 Thread antonio saldivar
Hello

Has anyone worked this way? I am asking because I have to get the
aggregations (sum and count) for multiple window sizes (10 mins, 20 mins,
30 mins). Please let me know if this works properly, or whether there is
another good solution.


DataStream<String> data = ...
// append a Long 1 to each record to count it
// (generics restored; "1minCnts"/"5minCnts" renamed, since Java identifiers
// cannot start with a digit)
DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne());

DataStream<Tuple2<String, Long>> oneMinCnts = withOnes
  // key by the String field
  .keyBy(0)
  // define time window
  .timeWindow(Time.minutes(1))
  // sum the ones in the Long field
  // in practice you want to use an incrementally aggregating
  // ReduceFunction and
  // a WindowFunction to extract the start/end timestamp of the window
  .sum(1);

// emit 1-min counts to wherever you need them
oneMinCnts.addSink(new YourSink());

// compute 5-min counts based on the 1-min counts
DataStream<Tuple2<String, Long>> fiveMinCnts = oneMinCnts
  // key by the String field
  .keyBy(0)
  // define time window of 5 minutes
  .timeWindow(Time.minutes(5))
  // sum the 1-minute counts in the Long field
  .sum(1);

// emit 5-min counts to wherever you need them
fiveMinCnts.addSink(new YourSink());

// continue with 1-day and 1-week windows

Thank you Regards


Take elements from window

2018-06-08 Thread Antonio Saldivar Lezama
Hello


I am wondering if it is possible to process the following scenario: store
all events by event time in a general window, and process elements from a
smaller time frame.

1.- Store elements in a general sliding window (60 mins, 10 mins)
- Rule 1 -> gets the 10-min elements from the general window and computes aggregations
- Rule 2 -> gets the 20-min elements from the general window and computes aggregations
- Rule 3 -> gets the 30-min elements from the general window and computes aggregations
2.- Send the results

Thank you
Regards
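
A sketch of one way to do this: a single 60-minute sliding window whose
ProcessWindowFunction computes each rule's aggregate over its own sub-range;
the Txn type with getTimestamp() and getAmount() is an assumption:

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class MultiRangeWindowFn extends ProcessWindowFunction<Txn, String, Tuple, TimeWindow> {

    @Override
    public void process(Tuple key, Context ctx, Iterable<Txn> elements, Collector<String> out) {
        long end = ctx.window().getEnd();
        double sum10 = 0, sum20 = 0, sum30 = 0;
        for (Txn t : elements) {
            long age = end - t.getTimestamp();
            if (age <= 10 * 60_000L) sum10 += t.getAmount(); // Rule 1: last 10 mins
            if (age <= 20 * 60_000L) sum20 += t.getAmount(); // Rule 2: last 20 mins
            if (age <= 30 * 60_000L) sum30 += t.getAmount(); // Rule 3: last 30 mins
        }
        out.collect(key + ": 10m=" + sum10 + ", 20m=" + sum20 + ", 30m=" + sum30);
    }
}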