Re: Aggregator in CEP
Hello, thank you very much. I took a look at the link, but how can I check the conditions myself to get the aggregated results?

On Fri, Aug 24, 2018 at 5:27 AM, aitozi () wrote:

> Hi,
>
> CEP does not yet support aggregation functions in the IterativeCondition, so
> for now you may need to check the condition yourself to get the aggregated
> result. I will work on this in the coming days; you can take a look at this
> issue:
>
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-9507
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
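Until FLINK-9507 lands, the "check the condition yourself" approach can be sketched as plain Java outside the CEP API (the Txn type, field names, and thresholds below are hypothetical, not Flink classes). Inside an IterativeCondition's filter() one would run the same loop over the events returned by ctx.getEventsForPattern(...):

```java
import java.util.List;

// Sketch of the manual aggregation one would otherwise do inside an
// IterativeCondition: sum the amounts of previously matched events that
// fall within a time window and compare against a threshold.
public class CumulativeCheck {

    // Placeholder event type (hypothetical): a timestamped amount.
    public static final class Txn {
        final long timestampMs;
        final double amount;
        public Txn(long timestampMs, double amount) {
            this.timestampMs = timestampMs;
            this.amount = amount;
        }
    }

    /** True if the amounts within [nowMs - windowMs, nowMs] reach the threshold. */
    public static boolean reachesThreshold(List<Txn> matchedSoFar, long nowMs,
                                           long windowMs, double threshold) {
        double sum = 0.0;
        for (Txn t : matchedSoFar) {
            if (t.timestampMs >= nowMs - windowMs && t.timestampMs <= nowMs) {
                sum += t.amount;
            }
        }
        return sum >= threshold;
    }
}
```

This is only the aggregation step; wiring it into a pattern still requires the CEP condition context, which FLINK-9507 aims to make unnecessary.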
Aggregator in CEP
Hello, I am developing an application with Flink (v1.4.2) CEP. Is there any aggregation function to match cumulative amounts or counts in an IterativeCondition, within a period of time, for keyed elements? If a cumulative amount reaches a threshold, it should fire a result.

Thank you
Regards
Re: processWindowFunction
Maybe the usage of that function has changed; now I have to use it as in [1].

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#the-keyedprocessfunction

On Mon, Aug 20, 2018 at 5:56 AM, vino yang () wrote:

> Hi antonio,
>
> Oh, if you can't use KeyedProcessFunction, then this is a pity.
> You could use MapState instead, where the map key stores the key of your
> partition. But I am not sure if this will achieve the effect you want.
>
> Thanks, vino.
>
> On Mon, Aug 20, 2018 at 4:32 PM, antonio saldivar wrote:
>
>> Hello
>>
>> Thank you for the information. For some reason KeyedProcessFunction is not
>> found in my Flink version 1.4.2; I can only find ProcessFunction, and it
>> works like this (Txn stands for my event type):
>>
>> public class TxnProcessFn extends ProcessFunction<Txn, Txn> {
>>
>>   private transient ValueState<Txn> state1;
>>   private transient ValueState<Txn> state2;
>>   private transient ValueState<Txn> state3;
>>
>>   public void open(Configuration parameters) throws Exception {
>>     state1 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState1", Txn.class));
>>     state2 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState2", Txn.class));
>>     state3 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState3", Txn.class));
>>   }
>>
>>   @Override
>>   public void processElement(Txn obj, Context ctx, Collector<Txn> out) throws Exception {
>>     Txn current = state1.value();
>>     if (current == null) {
>>       current = new Txn();
>>       current.id = obj.id;
>>       state1.update(current);
>>     }
>>   }
>> }
>>
>> On Mon, Aug 20, 2018 at 2:24 AM, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> First, I suggest you use KeyedProcessFunction if you have an operation
>>> similar to keyBy. The implementation is similar to a fixed window.
>>> You can create three state collections and determine which collection the
>>> time of each element belongs to. When the trigger fires, the elements in
>>> the collection are evaluated.
>>>
>>> Thanks, vino.
>>> On Mon, Aug 20, 2018 at 11:54 AM, antonio saldivar wrote:
>>>
>>>> Thank you for the references.
>>>>
>>>> I now have my processFunction and am getting the state, but how can I
>>>> group the elements by the threshold times? Also, as this is a global
>>>> window, how do I purge it? Otherwise it is going to keep growing.
>>>>
>>>> On Sun, Aug 19, 2018 at 8:57 AM, vino yang () wrote:
>>>>
>>>>> Hi antonio,
>>>>>
>>>>> Regarding your scenario, I think you can consider using the
>>>>> ProcessFunction (or keyed ProcessFunction) directly on the stream. [1]
>>>>> It can handle each of your elements with a Timer, and you can combine
>>>>> Flink's state API [2] to store your data.
>>>>>
>>>>> [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
>>>>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> On Sun, Aug 19, 2018 at 10:18 AM, antonio saldivar wrote:
>>>>>
>>>>>> hi Vino
>>>>>>
>>>>>> Is it possible to use a global window, then have the trigger's
>>>>>> onElement compare the element that has arrived against, for example,
>>>>>> 10, 20, and 60 minutes of data?
>>>>>>
>>>>>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes
>>>>>> for the same keyed element. If the same id sums to $200 total within
>>>>>> one of those thresholds, with a count of 3 or more, I need to set some
>>>>>> values on the object; if it does not reach those thresholds, I leave
>>>>>> the values unset and keep sending the output either way.
>>>>>>
>>>>>> Just processing the object on the fly and sending the output.
Re: processWindowFunction
Hello

Thank you for the information. For some reason KeyedProcessFunction is not found in my Flink version 1.4.2; I can only find ProcessFunction, and it works like this (Txn stands for my event type):

public class TxnProcessFn extends ProcessFunction<Txn, Txn> {

  private transient ValueState<Txn> state1;
  private transient ValueState<Txn> state2;
  private transient ValueState<Txn> state3;

  public void open(Configuration parameters) throws Exception {
    state1 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState1", Txn.class));
    state2 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState2", Txn.class));
    state3 = getRuntimeContext().getState(new ValueStateDescriptor<>("objState3", Txn.class));
  }

  @Override
  public void processElement(Txn obj, Context ctx, Collector<Txn> out) throws Exception {
    Txn current = state1.value();
    if (current == null) {
      current = new Txn();
      current.id = obj.id;
      state1.update(current);
    }
  }
}

On Mon, Aug 20, 2018 at 2:24 AM, vino yang () wrote:

> Hi antonio,
>
> First, I suggest you use KeyedProcessFunction if you have an operation
> similar to keyBy. The implementation is similar to a fixed window.
> You can create three state collections and determine which collection the
> time of each element belongs to. When the trigger fires, the elements in
> the collection are evaluated.
>
> Thanks, vino.
>
> On Mon, Aug 20, 2018 at 11:54 AM, antonio saldivar wrote:
>
>> Thank you for the references.
>>
>> I now have my processFunction and am getting the state, but how can I
>> group the elements by the threshold times? Also, as this is a global
>> window, how do I purge it? Otherwise it is going to keep growing.
>>
>> On Sun, Aug 19, 2018 at 8:57 AM, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> Regarding your scenario, I think you can consider using the
>>> ProcessFunction (or keyed ProcessFunction) directly on the stream. [1]
>>> It can handle each of your elements with a Timer, and you can combine
>>> Flink's state API [2] to store your data.
>>> [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
>>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>>>
>>> Thanks, vino.
>>>
>>> On Sun, Aug 19, 2018 at 10:18 AM, antonio saldivar wrote:
>>>
>>>> hi Vino
>>>>
>>>> Is it possible to use a global window, then have the trigger's onElement
>>>> compare the element that has arrived against, for example, 10, 20, and
>>>> 60 minutes of data?
>>>>
>>>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes
>>>> for the same keyed element. If the same id sums to $200 total within one
>>>> of those thresholds, with a count of 3 or more, I need to set some
>>>> values on the object; if it does not reach those thresholds, I leave the
>>>> values unset and keep sending the output either way.
>>>>
>>>> Just processing the object on the fly and sending the output.
>>>>
>>>> On Fri, Aug 17, 2018 at 10:14 PM, vino yang () wrote:
>>>>
>>>>> Hi antonio,
>>>>>
>>>>> Yes, ProcessWindowFunction is a very low-level window function.
>>>>> It allows you to access the data in the window and to customize the
>>>>> window's output. So while it gives you flexibility, you need to take
>>>>> care of other things yourself, which may require writing more
>>>>> processing logic.
>>>>>
>>>>> Generally speaking, sliding windows contain some repeated data, but a
>>>>> common pattern is to apply a reduce function on them to get your
>>>>> calculation results. If you only forward the raw data, there will
>>>>> definitely be some duplication.
>>>>>
>>>>> Thanks, vino.
>>>>> On Fri, Aug 17, 2018 at 12:01 PM, antonio saldivar wrote:
>>>>>
>>>>>> Hi Vino
>>>>>>
>>>>>> Thank you for the information. Actually, I am using a trigger alert
>>>>>> and a processWindowFunction to send my results, but when my window
>>>>>> slides or ends it sends the objects again and I am getting duplicated
>>>>>> data.
Re: processWindowFunction
Thank you for the references.

I now have my processFunction and am getting the state, but how can I group the elements by the threshold times? Also, as this is a global window, how do I purge it? Otherwise it is going to keep growing.

On Sun, Aug 19, 2018 at 8:57 AM, vino yang () wrote:

> Hi antonio,
>
> Regarding your scenario, I think you can consider using the ProcessFunction
> (or keyed ProcessFunction) directly on the stream. [1]
> It can handle each of your elements with a Timer, and you can combine
> Flink's state API [2] to store your data.
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#process-function-low-level-operations
> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#working-with-state
>
> Thanks, vino.
>
> On Sun, Aug 19, 2018 at 10:18 AM, antonio saldivar wrote:
>
>> hi Vino
>>
>> Is it possible to use a global window, then have the trigger's onElement
>> compare the element that has arrived against, for example, 10, 20, and 60
>> minutes of data?
>>
>> I have rules evaluating the sum of amounts over 10, 20, or 60 minutes for
>> the same keyed element. If the same id sums to $200 total within one of
>> those thresholds, with a count of 3 or more, I need to set some values on
>> the object; if it does not reach those thresholds, I leave the values
>> unset and keep sending the output either way.
>>
>> Just processing the object on the fly and sending the output.
>>
>> On Fri, Aug 17, 2018 at 10:14 PM, vino yang () wrote:
>>
>>> Hi antonio,
>>>
>>> Yes, ProcessWindowFunction is a very low-level window function.
>>> It allows you to access the data in the window and to customize the
>>> window's output. So while it gives you flexibility, you need to take care
>>> of other things yourself, which may require writing more processing
>>> logic.
>>> Generally speaking, sliding windows contain some repeated data, but a
>>> common pattern is to apply a reduce function on them to get your
>>> calculation results. If you only forward the raw data, there will
>>> definitely be some duplication.
>>>
>>> Thanks, vino.
>>>
>>> On Fri, Aug 17, 2018 at 12:01 PM, antonio saldivar wrote:
>>>
>>>> Hi Vino
>>>> Thank you for the information. Actually, I am using a trigger alert and
>>>> a processWindowFunction to send my results, but when my window slides or
>>>> ends it sends the objects again and I am getting duplicated data.
>>>>
>>>> On Thu, Aug 16, 2018 at 10:05 PM, vino yang () wrote:
>>>>
>>>>> Hi Antonio,
>>>>>
>>>>> Which results do you not want to get when each window is created?
>>>>> Examples of ProcessWindowFunction usage are included in many test files
>>>>> in the Flink project, such as SideOutputITCase.scala or
>>>>> WindowTranslationTest.scala.
>>>>>
>>>>> For more information on ProcessWindowFunction, you can refer to the
>>>>> official website. [1]
>>>>>
>>>>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> On Fri, Aug 17, 2018 at 6:24 AM, antonio saldivar wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I am implementing a data stream where I use sliding windows, but I am
>>>>>> stuck: I need to set values on my object based on some if statements
>>>>>> in my process function and send the object to the next step, but I
>>>>>> don't want results every time a window is created.
>>>>>>
>>>>>> If anyone has a good example of this, it would help me.
Re: processWindowFunction
hi Vino

Is it possible to use a global window, then have the trigger's onElement compare the element that has arrived against, for example, 10, 20, and 60 minutes of data?

I have rules evaluating the sum of amounts over 10, 20, or 60 minutes for the same keyed element. If the same id sums to $200 total within one of those thresholds, with a count of 3 or more, I need to set some values on the object; if it does not reach those thresholds, I leave the values unset and keep sending the output either way.

Just processing the object on the fly and sending the output.

On Fri, Aug 17, 2018 at 10:14 PM, vino yang () wrote:

> Hi antonio,
>
> Yes, ProcessWindowFunction is a very low-level window function.
> It allows you to access the data in the window and to customize the
> window's output. So while it gives you flexibility, you need to take care
> of other things yourself, which may require writing more processing logic.
>
> Generally speaking, sliding windows contain some repeated data, but a
> common pattern is to apply a reduce function on them to get your
> calculation results. If you only forward the raw data, there will
> definitely be some duplication.
>
> Thanks, vino.
>
> On Fri, Aug 17, 2018 at 12:01 PM, antonio saldivar wrote:
>
>> Hi Vino
>> Thank you for the information. Actually, I am using a trigger alert and a
>> processWindowFunction to send my results, but when my window slides or
>> ends it sends the objects again and I am getting duplicated data.
>>
>> On Thu, Aug 16, 2018 at 10:05 PM, vino yang () wrote:
>>
>>> Hi Antonio,
>>>
>>> Which results do you not want to get when each window is created?
>>> Examples of ProcessWindowFunction usage are included in many test files
>>> in the Flink project, such as SideOutputITCase.scala or
>>> WindowTranslationTest.scala.
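The 10/20/60-minute rule described in this message can be sketched as plain Java, independent of Flink's window and trigger machinery (the $200 sum and count-of-3 come from the thread; the event layout and class name are hypothetical). The purge step is exactly what keeps a global window's state from growing forever:

```java
import java.util.ArrayDeque;

// Sketch of the threshold rule for one keyed id: keep recent
// {timestampMs, amountCents} pairs, purge anything older than the largest
// window (60 min), and flag when the sum within a given window reaches
// $200 with at least 3 events.
public class RollingRule {
    // The three threshold windows mentioned in the thread, in minutes.
    public static final long[] WINDOWS_MIN = {10, 20, 60};

    public static boolean fires(ArrayDeque<long[]> events, long nowMs, long windowMin) {
        // Purge entries older than the largest window so state does not grow forever.
        while (!events.isEmpty() && events.peekFirst()[0] < nowMs - 60 * 60_000L) {
            events.pollFirst();
        }
        long windowMs = windowMin * 60_000L;
        long sumCents = 0;
        int count = 0;
        for (long[] e : events) {
            if (e[0] >= nowMs - windowMs) {
                sumCents += e[1];  // e = {timestampMs, amountCents}
                count++;
            }
        }
        return sumCents >= 200_00 && count >= 3;
    }
}
```

In a Flink job this deque would live in keyed state (e.g. a ListState) and fires() would run in the trigger's onElement or in a ProcessFunction, as vino suggests elsewhere in the thread.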
>>> For more information on ProcessWindowFunction, you can refer to the
>>> official website. [1]
>>>
>>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>>>
>>> Thanks, vino.
>>>
>>> On Fri, Aug 17, 2018 at 6:24 AM, antonio saldivar wrote:
>>>
>>>> Hello
>>>>
>>>> I am implementing a data stream where I use sliding windows, but I am
>>>> stuck: I need to set values on my object based on some if statements in
>>>> my process function and send the object to the next step, but I don't
>>>> want results every time a window is created.
>>>>
>>>> If anyone has a good example of this, it would help me.
Re: processWindowFunction
Hi Vino

Thank you for the information. Actually, I am using a trigger alert and a processWindowFunction to send my results, but when my window slides or ends it sends the objects again and I am getting duplicated data.

On Thu, Aug 16, 2018 at 10:05 PM, vino yang () wrote:

> Hi Antonio,
>
> Which results do you not want to get when each window is created?
> Examples of ProcessWindowFunction usage are included in many test files in
> the Flink project, such as SideOutputITCase.scala or
> WindowTranslationTest.scala.
>
> For more information on ProcessWindowFunction, you can refer to the
> official website. [1]
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/windows.html#processwindowfunction
>
> Thanks, vino.
>
> On Fri, Aug 17, 2018 at 6:24 AM, antonio saldivar wrote:
>
>> Hello
>>
>> I am implementing a data stream where I use sliding windows, but I am
>> stuck: I need to set values on my object based on some if statements in my
>> process function and send the object to the next step, but I don't want
>> results every time a window is created.
>>
>> If anyone has a good example of this, it would help me.
processWindowFunction
Hello

I am implementing a data stream where I use sliding windows, but I am stuck: I need to set values on my object based on some if statements in my process function and send the object to the next step, but I don't want results every time a window is created.

If anyone has a good example of this, it would help me.
Re: Flink Rebalance
Hi Fabian

Thank you. Yes, they are just map functions; I will do it that way, with chained method calls, to keep it fast.

On Fri, Aug 10, 2018, 5:58 AM Fabian Hueske wrote:

> Hi,
>
> Elias and Paul have good points.
> I think the performance degradation is mostly due to the lack of function
> chaining in the rebalance case.
>
> If all steps are just map functions, they can be chained in the
> no-rebalance case. That means records are passed via function calls.
> If you add rebalancing, records will be passed between map functions via
> serialization, network transfer, and deserialization.
> This is of course much more expensive than calling a method.
>
> Best, Fabian
>
> 2018-08-10 4:25 GMT+02:00 Paul Lam :
>
>> Hi Antonio,
>>
>> AFAIK, there are two reasons for this:
>>
>> 1. Rebalancing itself adds latency because it takes time to redistribute
>> the elements.
>> 2. Rebalancing also disturbs the order of the Kafka topic partitions, and
>> often makes an event-time window wait longer to trigger in case you're
>> using the event-time characteristic.
>>
>> Best Regards,
>> Paul Lam
>>
>> On Aug 10, 2018, at 05:49, antonio saldivar wrote:
>>
>> Hello
>>
>> Sending ~450 elements per second (the values are in milliseconds, start
>> to end), I went from this with rebalance:
>>
>> +------------+
>> | AVGWINDOW  |
>> +------------+
>> | 32131.0853 |
>> +------------+
>>
>> to this without rebalance:
>>
>> +-----------+
>> | AVGWINDOW |
>> +-----------+
>> | 70.2077   |
>> +-----------+
>>
>> On Thu, Aug 9, 2018 at 5:42 PM, Elias Levy (<fearsome.lucid...@gmail.com>) wrote:
>>
>>> What do you consider a lot of latency? The rebalance will require
>>> serializing / deserializing the data as it gets distributed. Depending on
>>> the complexity of your records and the efficiency of your serializers,
>>> that could have a significant impact on your performance.
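Fabian's point can be illustrated outside Flink with a standalone sketch (the class and method names are hypothetical, and Java serialization stands in for Flink's serializers): a chained operator hand-off is a plain method call, while a rebalance forces a serialize/deserialize round trip per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustration of chaining vs. rebalancing (not Flink code): the chained
// path hands the record over by reference; the rebalance path must encode
// the record to bytes and decode it again on the receiving side.
public class ChainingCost {

    public static String viaMethodCall(String record) {
        return record; // chained operators: record passed by a function call
    }

    public static String viaSerialization(String record) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(record); // what a shuffle must do per record...
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (String) ois.readObject(); // ...and undo on the other side
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Both paths produce the same record, but the second one pays for encoding, buffer copies, and decoding on every element, which is consistent with the ~32131 ms vs. ~70 ms AVGWINDOW numbers reported above.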
>>> On Thu, Aug 9, 2018 at 2:14 PM, antonio saldivar wrote:
>>>
>>>> Hello
>>>>
>>>> Does anyone know why, when I add "rebalance()" to my .map steps, it adds
>>>> a lot of latency compared to not having rebalance?
>>>>
>>>> I have 44 Kafka partitions in my topic and 44 Flink task managers.
>>>>
>>>> The execution plan looks like this when I add rebalance, but it is
>>>> adding a lot of latency:
>>>>
>>>> kafka-src -> rebalance -> step1 -> rebalance -> step2 -> rebalance ->
>>>> kafka-sink
>>>>
>>>> Thank you
>>>> regards
Re: Flink Rebalance
Hello

Sending ~450 elements per second (the values are in milliseconds, start to end), I went from this with rebalance:

+------------+
| AVGWINDOW  |
+------------+
| 32131.0853 |
+------------+

to this without rebalance:

+-----------+
| AVGWINDOW |
+-----------+
| 70.2077   |
+-----------+

On Thu, Aug 9, 2018 at 5:42 PM, Elias Levy () wrote:

> What do you consider a lot of latency? The rebalance will require
> serializing / deserializing the data as it gets distributed. Depending on
> the complexity of your records and the efficiency of your serializers, that
> could have a significant impact on your performance.
>
> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar wrote:
>
>> Hello
>>
>> Does anyone know why, when I add "rebalance()" to my .map steps, it adds a
>> lot of latency compared to not having rebalance?
>>
>> I have 44 Kafka partitions in my topic and 44 Flink task managers.
>>
>> The execution plan looks like this when I add rebalance, but it is adding
>> a lot of latency:
>>
>> kafka-src -> rebalance -> step1 -> rebalance -> step2 -> rebalance ->
>> kafka-sink
>>
>> Thank you
>> regards
Flink Rebalance
Hello

Does anyone know why, when I add "rebalance()" to my .map steps, it adds a lot of latency compared to not having rebalance?

I have 44 Kafka partitions in my topic and 44 Flink task managers.

The execution plan looks like this when I add rebalance, but it is adding a lot of latency:

kafka-src -> rebalance -> step1 -> rebalance -> step2 -> rebalance -> kafka-sink

Thank you
regards
Re: Dynamical Windows
Awesome, thank you very much. I will try to do it with a KeySelector to send the key from the front end.

On Wed, Aug 1, 2018 at 11:57 AM, vino yang () wrote:

> Sorry, the KeySelector's Javadoc is here:
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/functions/KeySelector.html
>
> 2018-08-01 23:57 GMT+08:00 vino yang :
>
>> Hi antonio,
>>
>> The keyBy API can accept a KeySelector [1], which is an interface you can
>> implement to specify the key for your business logic.
>>
>> I think you can use it and implement its getKey method. In that method,
>> you can access an outer system (such as ZooKeeper) to get a dynamic key.
>>
>> It's just an idea; you can try it.
>>
>> Thanks, vino.
>>
>> 2018-08-01 23:46 GMT+08:00 antonio saldivar :
>>
>>> Hello
>>>
>>> I am developing a Flink 1.4.2 application, currently with sliding
>>> windows (example below). I want to ask if there is a way to set the
>>> window time dynamically. The key also has to change in some use cases,
>>> and we don't want to create a specific window for each UC.
>>>
>>> I want to send those values from the front end.
>>>
>>> SingleOutputStreamOperator windowed = ObjectDTO
>>>     .keyBy("Key")
>>>     .timeWindow(Time.minutes(10), Time.minutes(1))
>>>     .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
>>>     .aggregate(new TxnAggregator(), new TxnWindowFn())
>>>     .name("TEN_MINUTES_WINDOW");
>>>
>>> Thank you
>>> Best Regards
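vino's suggestion can be sketched standalone (the KeySelector interface below mirrors Flink's org.apache.flink.api.java.functions.KeySelector; the field names and the Map standing in for a front-end/ZooKeeper config source are hypothetical):

```java
import java.util.Map;

// Sketch of a KeySelector whose key field is chosen at runtime, so the
// same keyBy can serve different use cases without a dedicated window each.
public class DynamicKeying {

    // Mirrors org.apache.flink.api.java.functions.KeySelector<IN, KEY>
    // (the real interface declares "KEY getKey(IN value) throws Exception").
    interface KeySelector<IN, KEY> {
        KEY getKey(IN value);
    }

    /** Builds a selector that keys records by a field name supplied at runtime. */
    public static KeySelector<Map<String, String>, String> byConfiguredField(String fieldName) {
        return r -> r.getOrDefault(fieldName, "unknown");
    }
}
```

In a real job, getKey would read the configured field name from wherever the front end publishes it; note that changing the key function of a running job still requires care, since keyed state is partitioned by the old key.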
Dynamical Windows
Hello

I am developing a Flink 1.4.2 application, currently with sliding windows (example below). I want to ask if there is a way to set the window time dynamically. The key also has to change in some use cases, and we don't want to create a specific window for each UC.

I want to send those values from the front end.

SingleOutputStreamOperator windowed = ObjectDTO
    .keyBy("Key")
    .timeWindow(Time.minutes(10), Time.minutes(1))
    .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
    .aggregate(new TxnAggregator(), new TxnWindowFn())
    .name("TEN_MINUTES_WINDOW");

Thank you
Best Regards
Re: working with flink Session Windows
Hello

Actually, I evaluate my WindowFunction with a trigger alert; I have something like the code below (testing with two different windows), expecting 5K elements per second arriving:

SingleOutputStreamOperator windowedElem = element
    .keyBy("id")
    .timeWindow(Time.seconds(120))
    // .window(EventTimeSessionWindows.withGap(Time.minutes(20)))
    .trigger(new AlertTrigger(env.getStreamTimeCharacteristic()))
    .aggregate(new EleAggregator(), new EleWindowFn());

On Fri, Jul 20, 2018 at 9:23 PM, Hequn Cheng () wrote:

> Hi antonio,
>
> I think it is worth a try to test the performance in your scenario, since
> job performance can be affected by a number of factors (say, your
> WindowFunction).
>
> Best, Hequn
>
> On Sat, Jul 21, 2018 at 2:59 AM, antonio saldivar wrote:
>
>> Hello
>>
>> I am building an app, and for this UC I want to test with session
>> windows. I am not sure if this will be expensive for the compute
>> resources, because the gap will be 10, 20, or 60 minutes; I want to
>> trigger an alert if the element reaches some thresholds within those
>> periods of time.
>>
>> Flink version 1.4.2
>>
>> Thank you
>> Best Regards
>> Antonio Saldivar
working with flink Session Windows
Hello

I am building an app, and for this UC I want to test with session windows. I am not sure if this will be expensive for the compute resources, because the gap will be 10, 20, or 60 minutes; I want to trigger an alert if the element reaches some thresholds within those periods of time.

Flink version 1.4.2

Thank you
Best Regards
Antonio Saldivar
Re: flink javax.xml.parser Error
If somebody is facing this issue: I solved it by adding the exclusion below to my POM.xml (I am also using javax.xml):

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-core</artifactId>
  <version>1.4.2</version>
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>javax.xml</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.1</version>
</dependency>

On Mon, Jul 16, 2018 at 6:26 PM, antonio saldivar () wrote:

> Hello
>
> I am getting this error at runtime when I run my application in an Ambari
> local cluster.
>
> Flink 1.4.2
> phoenix
> hbase
>
> Does anyone have any recommendation to solve this issue?
>
> javax.xml.parsers.FactoryConfigurationError: Provider for class
> javax.xml.parsers.DocumentBuilderFactory cannot be created
>   at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
>   at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
>   at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
>   at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2208)
>   at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
>   at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:870)
>   at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1268)
>   at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
>   at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:82)
>   at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:97)
>   at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
>   at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
>   at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
>   at org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:93)
>   at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
>   at org.apache.phoenix.query.QueryServicesOptions.withDefaults(QueryServicesOptions.java:285)
>   at org.apache.phoenix.query.QueryServicesImpl.<init>(QueryServicesImpl.java:36)
>   at org.apache.phoenix.jdbc.PhoenixDriver.getQueryServices(PhoenixDriver.java:178)
>   at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:214)
>   at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
>   at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:49)
>   at net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:1)
>   at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:611)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:572)
>   at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
>   at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
>   at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
>   at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
>   at org.apache.flink.streaming.runtime.tas
flink javax.xml.parser Error
Hello

I am getting this error at runtime when I run my application in an Ambari local cluster.

Flink 1.4.2
phoenix
hbase

Does anyone have any recommendation to solve this issue?

javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
  at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
  at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
  at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
  at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2208)
  at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
  at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:870)
  at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1268)
  at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:68)
  at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:82)
  at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:97)
  at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
  at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
  at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
  at org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:93)
  at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
  at org.apache.phoenix.query.QueryServicesOptions.withDefaults(QueryServicesOptions.java:285)
  at org.apache.phoenix.query.QueryServicesImpl.<init>(QueryServicesImpl.java:36)
  at org.apache.phoenix.jdbc.PhoenixDriver.getQueryServices(PhoenixDriver.java:178)
  at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:214)
  at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
  at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
  at java.sql.DriverManager.getConnection(DriverManager.java:664)
  at java.sql.DriverManager.getConnection(DriverManager.java:270)
  at net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:49)
  at net.paladion.npci.dbsteps.FrmRawDataInsertStep$1.map(FrmRawDataInsertStep.java:1)
  at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:611)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:572)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
  at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:549)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:524)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:504)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:830)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:808)
  at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
  at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
  at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:384)
Flink 1.4.2 Ambari
Hello, I am trying to find a way to add the Flink 1.4.2 service to Ambari because it is not listed in the stack. Does anyone have the steps to add this service manually? Thank you. Best regards
Re: Measure Latency from source to sink
Hello, thank you. I was also trying to use the Flink UI metrics on version 1.4.2 with *env.getConfig().setLatencyTrackingInterval(1000L)*, but it looks like nothing is being displayed.

On Tue, Jun 26, 2018 at 10:45, zhangminglei (<18717838...@163.com>) wrote:
> Hi, You can do that but it does not makes sense in general. But you can do
> that by flink, storm, spark streaming or structured streaming. And make a
> compare the latency under different framework.
>
> Cheers
> Minglei
>
> On Jun 26, 2018, at 9:36 PM, antonio saldivar wrote:
>
> Hello Thank you for the feedback,
>
> Well for now I just Want to measure the time that takes form Source to
> Sink each transaction add the start and end time in mills
>
> On Tue, Jun 26, 2018 at 5:19, zhangminglei (<18717838...@163.com>) wrote:
>
>> Hi, Antonio
>>
>> Usually, the measurement of delay is for specific business I think it is
>> more reasonable. What I understand of latency from my experience is data
>> preparation time plus query calculation time. It is like an end to end
>> latency test. Hopes this can help you. Not point to the latency of flink
>>
>> Cheers
>> Minglei
>>
>> > On Jun 26, 2018, at 5:23 AM, antonio saldivar wrote:
>> >
>> > Hello
>> >
>> > I am trying to measure the latency of each transaction traveling across
>> the system as a DataSource I have a Kafka consumer and I would like to
>> measure the time that takes from the Source to Sink. Does any one has an
>> example?.
>> >
>> > Thank you
>> > Best Regards
Re: Measure Latency from source to sink
Hello, thank you for the feedback. For now I just want to measure the time each transaction takes from source to sink, adding the start and end times in millis.

On Tue, Jun 26, 2018 at 5:19, zhangminglei (<18717838...@163.com>) wrote:
> Hi, Antonio
>
> Usually, the measurement of delay is for specific business I think it is
> more reasonable. What I understand of latency from my experience is data
> preparation time plus query calculation time. It is like an end to end
> latency test. Hopes this can help you. Not point to the latency of flink
>
> Cheers
> Minglei
>
> > On Jun 26, 2018, at 5:23 AM, antonio saldivar wrote:
> >
> > Hello
> >
> > I am trying to measure the latency of each transaction traveling across
> the system as a DataSource I have a Kafka consumer and I would like to
> measure the time that takes from the Source to Sink. Does any one has an
> example?.
> >
> > Thank you
> > Best Regards
Re: Measure Latency from source to sink
Thank you very much. I already did #2, but at the moment I print the output, since I am using a trigger alert and evaluating the window, the toString values come out as null or 0 and only the ones saved in my accumulator and the keyBy value are printed.

On Mon, Jun 25, 2018, 9:22 PM Hequn Cheng wrote:
> Hi antonio,
>
> I see two options to solve your problem.
> 1. Enable the latency tracking[1]. But you have to pay attention to it's
> mechanism, for example, a) the sources only *periodically* emit a special
> record and b) the latency markers are not accounting for the time user
> records spend in operators as they are bypassing them.
> 2. Add a time field to each of your record. Each time a record comes in
> from the source, write down the time(t1), so that we can get the latency at
> sink(t2) by t2 - t1.
>
> Hope this helps.
> Hequn
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#latency-tracking
>
> On Tue, Jun 26, 2018 at 5:23 AM, antonio saldivar wrote:
>
>> Hello
>>
>> I am trying to measure the latency of each transaction traveling across
>> the system as a DataSource I have a Kafka consumer and I would like to
>> measure the time that takes from the Source to Sink. Does any one has an
>> example?.
>>
>> Thank you
>> Best Regards
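For reference, option #2 from the reply above (stamp each record at the source, subtract at the sink) can be sketched in plain Java with no Flink dependencies. The Txn class and its field names are hypothetical, chosen only for illustration:

```java
// Sketch of measuring source-to-sink latency by stamping each record.
// Txn and its fields are hypothetical names, not from this thread.
public class Txn {
    final String id;
    final long sourceTimeMillis; // t1: stamped when the record enters the pipeline

    Txn(String id, long sourceTimeMillis) {
        this.id = id;
        this.sourceTimeMillis = sourceTimeMillis;
    }

    // At the sink, latency = arrival time (t2) minus the stamped source time (t1).
    long latencyMillis(long sinkTimeMillis) {
        return sinkTimeMillis - sourceTimeMillis;
    }
}
```

In a real job the source map would call `System.currentTimeMillis()` when creating the record, and the sink would compute `latencyMillis(System.currentTimeMillis())` before writing the result out.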
Measure Latency from source to sink
Hello, I am trying to measure the latency of each transaction traveling across the system. As a data source I have a Kafka consumer, and I would like to measure the time it takes from source to sink. Does anyone have an example? Thank you. Best regards
Re: Flink multiple windows
Thank you very much, Fabian. I found that solution in the link below and it is the best fit for my use case:

https://stackoverflow.com/questions/47458497/apache-flink-how-to-apply-multiple-counting-window-functions?rq=1

I am still testing how to count (for example, numTransactions >= 3) and then sum. I am also using *tumbling windows*.

*Best Regards*

2018-06-10 6:04 GMT-04:00 Fabian Hueske:
> Hi Antonio,
>
> Cascading window aggregations as done in your example is a good idea and
> is preferable if the aggregation function is combinable, which is true for
> sum (count can be done as sum of 1s).
>
> Best, Fabian
>
> 2018-06-09 4:00 GMT+02:00 antonio saldivar:
>
>> Hello
>>
>> Has anyone work this way? I am asking because I have to get the
>> aggregation ( Sum and Count) for multiple windows size (10 mins, 20 mins,
>> 30 mins) please let me know if this works properly or is there other good
>> solution.
>>
>> DataStream<String> data = ...
>> // append a Long 1 to each record to count it.
>> DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne());
>>
>> DataStream<Tuple2<String, Long>> oneMinCnts = withOnes
>>   // key by String field
>>   .keyBy(0)
>>   // define time window
>>   .timeWindow(Time.of(1, MINUTES))
>>   // sum ones of the Long field
>>   // in practice you want to use an incrementally aggregating ReduceFunction and
>>   // a WindowFunction to extract the start/end timestamp of the window
>>   .sum(1);
>>
>> // emit 1-min counts to wherever you need it
>> oneMinCnts.addSink(new YourSink());
>>
>> // compute 5-min counts based on 1-min counts
>> DataStream<Tuple2<String, Long>> fiveMinCnts = oneMinCnts
>>   // key by String field
>>   .keyBy(0)
>>   // define time window of 5 minutes
>>   .timeWindow(Time.of(5, MINUTES))
>>   // sum the 1-minute counts in the Long field
>>   .sum(1);
>>
>> // emit 5-min counts to wherever you need it
>> fiveMinCnts.addSink(new YourSink());
>>
>> // continue with 1 day window and 1 week window
>>
>> Thank you Regards
Flink multiple windows
Hello, has anyone worked this way? I am asking because I have to get aggregations (sum and count) for multiple window sizes (10 mins, 20 mins, 30 mins). Please let me know if this works properly or whether there is another good solution.

DataStream<String> data = ...
// append a Long 1 to each record to count it.
DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne());

DataStream<Tuple2<String, Long>> oneMinCnts = withOnes
  // key by String field
  .keyBy(0)
  // define time window
  .timeWindow(Time.of(1, MINUTES))
  // sum ones of the Long field
  // in practice you want to use an incrementally aggregating ReduceFunction and
  // a WindowFunction to extract the start/end timestamp of the window
  .sum(1);

// emit 1-min counts to wherever you need it
oneMinCnts.addSink(new YourSink());

// compute 5-min counts based on 1-min counts
DataStream<Tuple2<String, Long>> fiveMinCnts = oneMinCnts
  // key by String field
  .keyBy(0)
  // define time window of 5 minutes
  .timeWindow(Time.of(5, MINUTES))
  // sum the 1-minute counts in the Long field
  .sum(1);

// emit 5-min counts to wherever you need it
fiveMinCnts.addSink(new YourSink());

// continue with 1 day window and 1 week window

Thank you Regards
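The cascading approach above works because sum is a combinable aggregation: re-aggregating the 1-minute partial sums gives exactly the same result as summing the raw events. A minimal plain-Java illustration of that property (no Flink dependencies, hypothetical per-minute counts):

```java
import java.util.Arrays;

// Sketch: sum is combinable, so 1-minute partial sums can be
// re-aggregated into a 5-minute sum with no loss of accuracy.
public class CascadedSums {
    // re-aggregate partial sums (what the 5-minute window does
    // with the output of the 1-minute window)
    static long combine(long[] partialSums) {
        return Arrays.stream(partialSums).sum();
    }

    public static void main(String[] args) {
        // hypothetical 1-minute counts for one key over five minutes
        long[] oneMinCounts = {2, 1, 3, 0, 2};
        // the 5-minute count equals the sum of the 1-minute counts
        System.out.println(combine(oneMinCounts)); // prints 8
    }
}
```

The same reasoning is why count works here (count is a sum of 1s); a non-combinable aggregate such as a median could not be cascaded this way.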
Take elements from window
Hello, I am wondering whether it is possible to handle the following scenario: store all events by event time in a general window and process elements from smaller time frames.

1.- Store elements in a general SlidingWindow (60 mins, 10 mins)
  - Rule 1 -> gets the 10-min elements from the general window and computes aggregations
  - Rule 2 -> gets the 20-min elements from the general window and computes aggregations
  - Rule 3 -> gets the 30-min elements from the general window and computes aggregations
2.- Send results

Thank you. Regards
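The per-rule step above (evaluating 10/20/30-minute slices out of the events held by the 60-minute window) can be expressed as filtering the buffered events by timestamp before aggregating. A minimal plain-Java sketch of that idea, outside of Flink; the Event class and field names are hypothetical:

```java
import java.util.List;

// Sketch: compute an aggregate over only the events that fall inside
// a smaller time frame at the end of a larger buffered window.
// Event and its fields are hypothetical names for illustration.
public class SubWindowAggregator {
    static class Event {
        final long timestampMillis;
        final double amount;
        Event(long timestampMillis, double amount) {
            this.timestampMillis = timestampMillis;
            this.amount = amount;
        }
    }

    // Sum the amounts of buffered events whose timestamps fall inside
    // the last `minutes` of the window ending at `windowEndMillis`.
    static double sumLastMinutes(List<Event> buffered, long windowEndMillis, int minutes) {
        long cutoff = windowEndMillis - minutes * 60_000L;
        return buffered.stream()
            .filter(e -> e.timestampMillis >= cutoff && e.timestampMillis < windowEndMillis)
            .mapToDouble(e -> e.amount)
            .sum();
    }
}
```

In a Flink window function the `buffered` list would be the iterable of elements handed to the 60-minute window, and each rule would call this with 10, 20, or 30 minutes.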