Re: Flink Cluster and Pipeline Version Compatibility?

2018-07-27 Thread vino yang
Hi Jlist9: Most of Flink's APIs try to ensure backward compatibility, but there is no documentation listing all the APIs that do so. As Flink has developed, some features have changed dramatically, such as State, so Flink's official website provides a migration guide [1]. So, my personal

Re: How to connect more than 2 hetrogenous Streams!!

2018-07-27 Thread vino yang
Hi Puneet, Hequn gave you two good solutions. If you have a good knowledge of the Flink DataStream API, you can also build your own way to connect more than two streams; you must know: - the DataStream#connect API - ConnectedStreams - CoMapFunction, CoFlatMapFunction... Referring to them, you can
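
As a rough illustration of that idea (stream contents and types are invented), connect() can be chained: the first CoMapFunction normalizes two streams into a common type, and the result is connected to the third stream. A sketch only, not the canonical way to do this:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;

public class ThreeStreamConnect {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> a = env.fromElements("a1", "a2");
        DataStream<Integer> b = env.fromElements(1, 2);
        DataStream<Long> c = env.fromElements(10L, 20L);

        // First connect: normalize streams a and b into a common type.
        DataStream<String> ab = a.connect(b)
                .map(new CoMapFunction<String, Integer, String>() {
                    @Override
                    public String map1(String value) { return "A:" + value; }
                    @Override
                    public String map2(Integer value) { return "B:" + value; }
                });

        // Second connect: combine the intermediate result with the third stream.
        DataStream<String> abc = ab.connect(c)
                .map(new CoMapFunction<String, Long, String>() {
                    @Override
                    public String map1(String value) { return value; }
                    @Override
                    public String map2(Long value) { return "C:" + value; }
                });

        abc.print();
        env.execute("connect three streams");
    }
}
```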

Re: Does Flink release Hadoop(R) 2.8 work with Hadoop 3?

2018-07-28 Thread vino yang
Hi Mich, I think this depends on the backward compatibility of the Hadoop client API. In theory, there is no problem. Hadoop 2.8 to Hadoop 3.0 is a very large upgrade, and I personally recommend using a client version that is consistent with the Hadoop cluster. By compiling and packaging from

Re: Old job resurrected during HA failover

2018-08-01 Thread vino yang
Hi Elias, If a job is explicitly canceled, its jobgraph node on ZK will be deleted. However, it is worth noting here that Flink enables a background thread to asynchronously delete the jobGraph node, so there may be cases where it cannot be deleted. On the other hand, the jobgraph node on ZK is

Re: Dynamical Windows

2018-08-01 Thread vino yang
Hi antonio, The keyBy API can accept a KeySelector [1], which is an interface you can implement to specify the key for your business. I think you can use it and implement its getKey method. In the method, you can access an external system (such as ZooKeeper) to get a dynamic key. It's just an idea, you
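
A minimal sketch of that idea in the Java DataStream API. The keying rule below is a placeholder; a real implementation that consults ZooKeeper inside getKey() should cache the lookup, because getKey() is called once per record:

```java
import org.apache.flink.api.java.functions.KeySelector;

// A KeySelector derives the key from each record; any external lookup
// (e.g. ZooKeeper) would happen inside getKey(), ideally cached.
public class DynamicKeySelector implements KeySelector<String, String> {

    private static final long serialVersionUID = 1L;

    @Override
    public String getKey(String value) {
        // Hypothetical rule: key by the first character of the record.
        return value.isEmpty() ? "empty" : value.substring(0, 1);
    }
}

// Usage (assuming a DataStream<String> named "input"):
// KeyedStream<String, String> keyed = input.keyBy(new DynamicKeySelector());
```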

Re: Dynamical Windows

2018-08-01 Thread vino yang
Sorry, the KeySelector's Java doc is here : https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/functions/KeySelector.html 2018-08-01 23:57 GMT+08:00 vino yang : > Hi antonio, > > The keyBy API can accept a KeySelector [1] which is a interfac

Re: Delay in REST/UI readiness during JM recovery

2018-08-02 Thread vino yang
Hi Joey, Good question! I will copy it to Till and Chesnay who know this part of the implementation. Thanks, vino. 2018-08-03 11:09 GMT+08:00 Joey Echeverria : > I don’t have logs available yet, but I do have some information from ZK. > > The culprit appears to be the

Re: Converting a DataStream into a Table throws error

2018-08-02 Thread vino yang
Hi Mich, I have reviewed your code in the github you provided. I copied your code to org.apache.flink.table.examples.scala under flink-examples-table. It passed the compilation and didn't report the exception you provided, although there are other exceptions (it's about hdfs, this is because of

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread vino yang
Hi Averell, As far as I know, the custom partitioner will inevitably lead to shuffle of data. Even if it is bundled in the logic of the source function, isn't the behavior different? Thanks, vino. 2018-07-30 20:32 GMT+08:00 Averell : > Thanks Vino. > > Yes, I can do that after the source

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread vino yang
Hi Averell, Yes, you can not do it in the source function. I think you can call keyBy with a partitioner (based on NodeID) after source. Why do you have to use the customized partitioner in the source function? Thanks, vino. 2018-07-30 19:56 GMT+08:00 Averell : > Thank you Vino. > > Yes, I

Re: Detect late data in processing time

2018-07-30 Thread vino yang
Hi Soheil, Watermark indicates the progress of the Event time. The reason it exists is because there is a Time skew between Event time and Processing time. Hequn is correct and Watermark cannot be used for processing time. The processing time will be based on the TM local system clock. Usually,

Re: flink command line with save point job is not being submitted

2018-07-30 Thread vino yang
Hi Darshan, This is a known problem with Flink, and no specific exception information is given, making diagnosis more difficult. I personally guess that you are using a local file system, which may be the cause of the problem. Can you specify a HDFS with access permission for Savepoint? Thanks,

Re: Committing Kafka Transactions during Savepoint

2018-07-30 Thread vino yang
Hi Scott, For EXACTLY_ONCE on the sink end with the Kafka 0.11+ producer, the answer is YES. There is official documentation that covers this topic well [1]. [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 Thanks, vino. 2018-07-27
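
A rough sketch of an exactly-once Kafka 0.11 sink along the lines of that documentation. The broker address and topic name are placeholders; checkpointing must be enabled, since the transactional producer commits on checkpoint completion:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceKafkaSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once sinks rely on checkpointing being enabled.
        env.enableCheckpointing(60_000);

        DataStream<String> stream = env.fromElements("a", "b", "c");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address

        stream.addSink(new FlinkKafkaProducer011<>(
                "output-topic",                                            // assumed topic name
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));             // two-phase-commit transactions

        env.execute("exactly-once kafka sink");
    }
}
```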

Re: Logs are not easy to read through webUI

2018-07-30 Thread vino yang
bmanager.log.2018-07-29) will > never change, maybe the blob store can cache some of them to improve > performance. > > BTW, please consider developing an API for reading logs. I think many > flink users meet this problem. > > Thanks! > > Xinyu Zhang > > > July 30, 2018

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread vino yang
Hi Averell, Did you know Flink allows you to customize a partitioner? Some resources: official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/#physical-partitioning discussion in the mailing list:

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread vino yang
Hi Averell, The keyBy transformation triggers a key partition, which is one of the various partition types supported by Flink and causes the data to be shuffled. It routes records with the same key hash to the same node, based on the hash of the key you passed (or generated by the custom

Re: splitting DataStream throws error

2018-07-30 Thread vino yang
Answered Mich privately, copied here: Hi Mich, Using split directly on the stream object is wrong. It is used to split the records in the stream, not the format of the record data itself. In this scenario, if you want to parse the data, use the map function only after the
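
A rough illustration of that ordering in the Java DataStream API (the CSV layout is invented, and split/select is the stream-routing API of the Flink 1.5/1.6 era): parse with map() first, then route the parsed records with split():

```java
import java.util.Collections;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParseThenSplit {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> raw = env.fromElements("IBM,100.5", "MSFT,99.1");

        // First parse each record with map(); split() only routes whole records, it does not parse them.
        DataStream<Double> prices = raw
                .map(line -> Double.parseDouble(line.split(",")[1]))
                .returns(Double.class);

        // Then route the parsed records onto named side channels.
        SplitStream<Double> splitStream = prices.split(price ->
                Collections.singletonList(price >= 100 ? "high" : "low"));

        splitStream.select("high").print();
        env.execute("parse with map, then split");
    }
}
```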

Re: scala IT

2018-07-30 Thread vino yang
Hi Nicos, The thrown exception has given you a clear solution hint: The return type of function 'apply(MultiplyByTwoTest.scala:43)' could not be determined automatically, due to type erasure. You can give type information hints by using the returns(...) method on the result of the
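
For reference, the equivalent hint in the Java DataStream API looks roughly like this (the mapping itself is a made-up example); the lambda's output type is erased, so returns(...) supplies it explicitly:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReturnsHintExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> doubled = env.fromElements(1L, 2L, 3L)
                // With a lambda the output type may be erased, so declare it explicitly.
                .map(value -> value * 2)
                .returns(Types.LONG);

        doubled.print();
        env.execute("returns() type hint");
    }
}
```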

Re: not found: type CustomWatermarkEmitter

2018-07-29 Thread vino yang
Hi Mich, These two mistakes are obvious. 1): The compiler cannot find the definition of CustomWatermarkEmitter. Did you define it? Or import the dependency if it is defined elsewhere? 2): The type of the variable "myCustomer" is "DataStreamSource", but the env.addSource method receives a source

Re: "Futures timed out" when trying to cancel a job with savepoint

2018-07-29 Thread vino yang
Hi Julio, We also encountered this problem on YARN, Savepoint has been completed, and JM has been successfully stopped, but the client is still trying to access the original JM port, which caused a timeout. It seems that this is a problem with Flink itself. I can't give you the answer to this

Re: Checkpointing not happening in Standalone HA mode

2018-07-27 Thread vino yang
Job Manager logs where it says Collection Source is > not being executed at the moment. Aborting checkpoint. In the pipeline I > have a stream initialized using "fromCollection". I think I will have to > get rid of this. > > What do you suggest > > Regards, > Vinay Pa

Re: Custom Window example (data-based)

2018-07-27 Thread vino yang
Hi Chris, I just found some resources you can have a look at, listed below: - Flink official documentation : https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#built-in-and-custom-triggers - Custom, Complex Windows at Scale using Apache

Re: 答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread vino yang
id notice there is webMonitorRetrievalService member in Flink > 1.5. > > > > I wonder if I can use RestClusterClient@v1.5 on my client side, to > retrieve the leader JM of Flink v1.4 Cluster. > > > > Thanks > > Youjun > > > > From: vino yang > Sent: Wednesday, July 25, 2

Re: Checkpointing not happening in Standalone HA mode

2018-07-26 Thread vino yang
Hi Vinay: Did you call the specific config API referred to in this documentation [1]? Can you share your job program and JM log? Or does the JM log contain a message like this pattern: "Triggering checkpoint {} @ {} for job {}."? [1]:

Re: override jvm params

2018-07-26 Thread vino yang
Hi Cussac, Flink on Yarn support dynamic properties. Can you try this : -yD=? The implementation is here[1][2]. [1]: https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L151 [2]:

Re: checkpoint always fails

2018-07-25 Thread vino yang
Hi Marvin, Thanks for reporting this issue. Can you share more details about the failed checkpoint, such as the log, exception stack trace, which statebackend is used, and the HA configuration? This information can help to trace the issue. Thanks, vino. 2018-07-26 10:12 GMT+08:00 Marvin777 : > Hi, all: >

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread vino yang
Pourbafrani : > Hi vino, > > Could you please show markAsTemporary usage by a simple example? > Thanks > > On Tue, Jul 31, 2018 at 2:10 PM, vino yang wrote: > >> Hi Soheil, >> >> The documentation of markAsTemporarilyIdle method is here : >> https://ci

Re: Converting a DataStream into a Table throws error

2018-07-31 Thread vino yang
Hi Mich, The fields specified in the fromDataStream API must match the number of fields contained in the DataStream object; your DataStream's type is just a String. An example is here [1]. [1]:

Re: python vs java api

2018-07-31 Thread vino yang
ies in terms of the transformations is clear but > what about timestamps and state? > > On Tue, Jul 31, 2018 at 6:47 PM vino yang wrote: > >> Hi Nicos, >> >> You can read the official documentation of latest Python API about >> DataStream transformation[1] and latest Java

Re: python vs java api

2018-07-31 Thread vino yang
Hi Nicos, You can read the official documentation of the latest Python API about DataStream transformations [1] and the latest Java API transformations [2]. However, the latest documentation may not reflect new features, especially for the Python API, so you can also compare the implementation of

Re: Rest API calls

2018-07-31 Thread vino yang
Hi yuvraj, The documentation of Flink REST API is here : https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/rest_api.html#monitoring-rest-api Thanks, vino. 2018-08-01 3:29 GMT+08:00 yuvraj singh <19yuvrajsing...@gmail.com>: > Hi I have a use case where I need to call rest

Re: scala IT

2018-07-31 Thread vino yang
Hi Nicos, The returns API is not deprecated; it is just part of the DataStream Java API. Thanks, vino. 2018-07-31 15:15 GMT+08:00 Nicos Maris : > Isn't the returns function deprecated? > > On Tue, Jul 31, 2018, 5:32 AM vino yang wrote: > >> Hi Nicos, >>

Re: Multiple output operations in a job vs multiple jobs

2018-07-31 Thread vino yang
Hi anna, 1. The srcstream is a very high volume stream and the window size is 2 weeks and 4 weeks. Is the window size a problem? In this case, I think it is not a problem because I am using reduce, which stores only 1 value per window. Is that right? >> The window size is based on your business
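
For reference, a hedged sketch of the pattern being discussed (names, types, and window length are invented): a long tumbling window whose per-window state stays small because reduce() keeps a single aggregate per key and window:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LongWindowReduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (key, count) records standing in for the high-volume source stream.
        DataStream<Tuple2<String, Long>> srcstream =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));

        // A 14-day tumbling window; reduce keeps one aggregated value per key and window,
        // so the window state stays small even though the window itself is long.
        srcstream
                .keyBy(0)
                .timeWindow(Time.days(14))
                .reduce((v1, v2) -> Tuple2.of(v1.f0, v1.f1 + v2.f1))
                .print();

        env.execute("long tumbling window with reduce");
    }
}
```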

Re: Rest API calls

2018-08-01 Thread vino yang
Singh > > On Wed, Aug 1, 2018, 8:32 AM vino yang wrote: > >> Hi yuvraj, >> >> The documentation of Flink REST API is here : https://ci.apache.org/ >> projects/flink/flink-docs-release-1.5/monitoring/rest_ >> api.html#monitoring-rest-api >> >&g

Re: Flink log and out files

2018-08-01 Thread vino yang
Hi Alexander: .log and .out are different. Usually, the .log file stores the log information output by the log framework. Flink uses slf4j as the log interface and supports log4j and logback configurations. The .out file stores the STDOUT information. This information is usually output by you

Re: Flink log and out files

2018-08-01 Thread vino yang
exander, >> >> there is also a doc link where log configuration is described: >> https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/ >> logging.html >> You can modify log configuration in conf directory according to logging >> framework docs.

Re: Old job resurrected during HA failover

2018-08-01 Thread vino yang
Hi Elias, Your analysis is correct. Yes, in theory the old jobgraph should be deleted, but Flink currently locks and asynchronously deletes the path, so it cannot give you an acknowledgment of the deletion; this is a risk point. cc Till, there have been users who have

Re: Converting a DataStream into a Table throws error

2018-08-01 Thread vino yang
Hi Mich, It seems that the type of your DataStream is still wrong. If you want to specify four fields, the DataStream type should usually be something like DataStream[(Type1, Type2, Type3, Type4)], not DataStream[String]; you can try it. Thanks, vino 2018-08-02 6:44 GMT+08:00 Mich Talebzadeh
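
A small illustration of that point in Java, assuming the flink-table dependency is on the classpath (field names and values are made up): a Tuple4 element type lets four columns be declared, whereas a DataStream<String> only carries a single field:

```java
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class StreamToTable {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // The element type carries four fields, so four column names can be mapped onto it.
        DataStream<Tuple4<String, String, Double, Integer>> stream =
                env.fromElements(Tuple4.of("k1", "2018-08-01", 1.5, 10));

        // One column name per Tuple4 field; a DataStream<String> would only allow one column.
        Table table = tableEnv.fromDataStream(stream, "id, ts, price, volume");
        table.printSchema();

        tableEnv.toAppendStream(table, Row.class).print();
        env.execute("datastream to table");
    }
}
```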

Re: Delay in REST/UI readiness during JM recovery

2018-08-01 Thread vino yang
Hi Joey, Currently the REST endpoints are hosted in the JM. Your scenario is a JM failover, and your cluster is running many jobs. Here, it takes a certain amount of time for ZK to conduct the leader election. Then the JM needs to wait for the TM registrations. So many jobs need to be restored and start

Re: Multiple output operations in a job vs multiple jobs

2018-08-02 Thread vino yang
this. Thanks, vino. 2018-08-02 16:11 GMT+08:00 Paul Lam : > > > > On July 31, 2018 at 15:47, vino yang wrote: > > > > Hi anna, > > > > 1. The srcstream is a very high volume stream and the window size is 2 > weeks and 4 weeks. Is the window size a problem? In this case, I t

Re: Could not retrieve the redirect address - No REST endpoint has been started

2018-08-02 Thread vino yang
Hi Pedro, It sounds like a bug in Flink itself. You can create an issue in JIRA and give enough information, such as logs, complete exceptions, the Flink version, and your usage environment. Thanks, vino. 2018-08-02 16:45 GMT+08:00 PedroMrChaves : > Hello, > > It happens whether the WEB UI is

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread vino yang
Hi Averell, Actually, performing a key partition inside the source function is the same as DataStream[Source].keyBy(custom partitioner), because keyBy is not a real operator, but a virtual node in the DAG, which does not correspond to a physical operator. Thanks, vino. 2018-07-31 10:52 GMT+08:00

Re: Detect late data in processing time

2018-07-30 Thread vino yang
Hi Averell, I personally don't recommend this. In fact, Processing Time uses the local physical clock of the node where the specific task is located, rather than setting it upstream in advance. This is a bit like another time concept provided by Flink - Ingestion Time. So, If you do not specify

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread vino yang
Hi Soheil, The documentation of markAsTemporarilyIdle method is here : https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/functions/source/SourceFunction.SourceContext.html#markAsTemporarilyIdle-- Thanks, vino. 2018-07-31 17:14 GMT+08:00 Hequn
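
A minimal sketch of a source that uses this method; the external poll below is a placeholder. Marking the source idle tells downstream watermark handling to stop waiting for this (temporarily empty) source:

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// A source that marks itself idle while it has nothing to emit, so that
// downstream event-time watermarks can still advance past it.
public class IdleAwareSource implements SourceFunction<String> {

    private static final long serialVersionUID = 1L;
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String record = pollExternalSystem(); // hypothetical fetch; returns null when no data
            if (record != null) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(record);
                }
            } else {
                // Tell the watermark mechanism to ignore this source until it emits again.
                ctx.markAsTemporarilyIdle();
                Thread.sleep(1000);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String pollExternalSystem() {
        return null; // placeholder: no external system in this sketch
    }
}
```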

Re: Logs are not easy to read through webUI

2018-07-30 Thread vino yang
Hi Xinyu, This is indeed a problem. Especially when the amount of logs is large, it may even cause the UI to stall for a long time. The same is true for YARN, and there is really no good way to do it at the moment. Thank you for your suggestion; do you mean "periodic reading" refers to full or

Re: watermark VS window trigger

2018-07-30 Thread vino yang
Hi Soheil, I feel that some of your understanding is a bit problematic. *"After that according to the current watermark, data with the timestamp between the last watermark and current watermark will be released and go to the next steps"* The main role of Watermark here is to define the progress

Re: Logs are not easy to read through webUI

2018-07-30 Thread vino yang
nformation will lessen the > burden of the jobmanager, especially when there are many taskManagers in the > flink cluster. > > > On Monday, July 30, 2018, vino yang wrote: > >> Hi Xinyu, >> >> This is indeed a problem. Especially when the amount of logs is large, it >

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-09 Thread vino yang
arm if > I keep retrying cancel with savepoint until the job stops – I expect that > an overlapping cancel request is ignored if the job is already creating a > savepoint. Please correct if my assumption is wrong. > > On Thu, Aug 9, 2018 at 5:04 AM vino yang wrote: > >> Hi

Re: Using sensitive configuration/credentials

2018-08-09 Thread vino yang
idden from the logs and UI. > > On 09.08.2018 04:28, vino yang wrote: > > Hi Matt, > > Flink is currently enhancing its security, such as the current data > transmission can be configured with SSL mode[1]. > However, some problems involving configuration and web ui display do > exist,

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-08 Thread vino yang
Hi Juho, This problem does exist, I suggest you separate these two steps to temporarily deal with this problem: 1) Trigger Savepoint separately; 2) execute the cancel command; Hi Till, Chesnay: Our internal environment and multiple users on the mailing list have encountered similar problems.

Re: Using sensitive configuration/credentials

2018-08-08 Thread vino yang
Hi Matt, Flink is currently enhancing its security, such as the current data transmission can be configured with SSL mode[1]. However, some problems involving configuration and web ui display do exist, and they are still displayed in plain text. I think a temporary way to do this is to keep your

Re: [ANNOUNCE] Apache Flink 1.6.0 released

2018-08-09 Thread vino yang
Congratulations! Great work! Till, thank you for driving the smooth release of Flink 1.6. Vino. Till Rohrmann wrote on Thursday, August 9, 2018 at 7:21 PM: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.6.0. > > Apache Flink® is an open-source stream processing framework

Re: Yahoo Streaming Benchmark on a Flink 1.5 cluster

2018-08-10 Thread vino yang
Hi Namu, I don't think you need to pay attention to the internals of the Flink API. Its interface is backward compatible. If you update the dependent version of the API and the corresponding version of the Flink system so that their versions are consistent, there should be no problems. Please

Re: Delay in REST/UI readiness during JM recovery

2018-08-06 Thread vino yang
es.apache.org/jira/browse/FLINK-10078 > > Regarding (3) we’re doing some testing with different options for the > state storage. I’ll report back if we find anything significant there. > > -Joey > > > On Aug 6, 2018, at 8:47 AM, vino yang wrote: > > Hi Joey, >

Re: Running SQL to print to Std Out

2018-08-06 Thread vino yang
Hi Mich, Hequn is correct, click on the “Task Managers” menu located on the left side of the Flink web UI, select a TM from the TM list, and then click on the “Stdout” option in the TM Details tab. Thanks, vino. 2018-08-07 9:42 GMT+08:00 Hequn Cheng : > Hi Mich, > > When you print to stdout on

Re: Could not build the program from JAR file.

2018-08-07 Thread vino yang
Hi Florian, The error message is because of a FileNotFoundException, see here [1]. Is there any more information about the exception? Can you make sure the jar exists? [1]: https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L209

Re: Filter-Join Ordering Issue

2018-08-07 Thread vino yang
Hi Dylan, I roughly looked at your job program and the DAG of the job. It seems that the optimizer chose the wrong optimized execution plan. cc Till. Thanks, vino. Dylan Adams wrote on Wednesday, August 8, 2018 at 2:26 AM: > I'm trying to use the Flink DataSet API to validate some records and have > run into an

Re: Passing the individual table coilumn values to the local variables

2018-08-07 Thread vino yang
Hi Mich, Here you need to understand that the print call does not print the value of a field; it is actually a call to a sink that outputs to STDOUT. So, what you get here is not the value of a variable; please refer to Hequn's recommendation. Thanks, vino. Hequn Cheng wrote on Wednesday, August 8, 2018 at 9:11 AM: >

Re: checkpoint recovery behavior when kafka source is set to start from timestamp

2018-08-07 Thread vino yang
Hi Yan Zhou: I think the Javadoc of the setStartFromTimestamp method explains this very clearly; posted here: /** Specify the consumer to start reading partitions from a specified timestamp. The specified timestamp must be before the current timestamp. This lets the consumer
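
A hedged sketch of how the method is typically wired up (broker address, consumer group, and topic are placeholders). Note that when the job is restored from a checkpoint, the offsets stored in the checkpoint take precedence over this configured start position:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class KafkaStartFromTimestamp {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "demo-group");               // assumed consumer group

        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props);

        // Start reading from records whose timestamp is at or after this point;
        // the timestamp must lie in the past relative to the current wall clock.
        consumer.setStartFromTimestamp(System.currentTimeMillis() - 60 * 60 * 1000L);

        env.addSource(consumer).print();
        env.execute("kafka consumer starting from a timestamp");
    }
}
```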

Re: VerifyError when running Python streaming job

2018-08-08 Thread vino yang
Hi Joe, Did you try the word_count example from the flink codebase? [1] Recently, I tried this example and it works fine for me. An example in the official documentation may not guarantee your success due to maintenance issues. cc @Chesnay [1]:

Re: Small-files source - partitioning based on prefix of file

2018-08-08 Thread vino yang
Hi Averell, You need to understand that Flink performs recovery of the state, not recovery of the records. Of course, sometimes your record is the state, but sometimes the intermediate result of your records is the state. It depends on your business logic and your operators. Thanks, vino.

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread vino yang
Hi Averell, What I mean is that if you store stream_c data in an RDBMS, you can access the RDBMS directly in the CoFlatMapFunction instead of using the Table API. This is somewhat similar to stream and dimension table joins. Of course, the premise of adopting this option is that the amount of

Re: Flink CLI does not return after submitting yarn job in detached mode

2018-08-16 Thread vino yang
Hi Marvin777, That is not right; it uses the Flink on YARN single-job mode and should use the "-yd" parameter. Hi Madhav, I seem to have found the problem; the source code for your log is here. [1] It is based on a judgment method, "isUsingInteractiveMode". The source code for this method is

Re: cannot find TwoInputStreamOperatorTestHarness after upgrade to Flink 1.6.0

2018-08-13 Thread vino yang
t; flink-streaming-java_${scala.binary.version} > ${flink.version} > test-jar > > > ``` > > Once I included it, the issue got resolved. > > Thank you, > Dmitry > > On Sat, Aug 11, 2018 at 11:08 PM vino yang wrote: > >> Hi Dmitry, >> >> I c

Re: Limit on number of files to read for Dataset

2018-08-13 Thread vino yang
Hi Darshan, In a distributed scenario, there is no limit in theory, but there are still some real-world conditions that will cause some constraints, such as the size of your individual files, the memory size of your TM configuration, and so on. In addition, your "single" here is logical or

Re: ClassCastException when writing to a Kafka stream with Flink + Python

2018-08-14 Thread vino yang
Hi Joe, pinging Chesnay for you, please wait for the reply. Thanks, vino. Joe Malt wrote on Wednesday, August 15, 2018 at 7:16 AM: > Hi, > > I'm trying to write to a Kafka stream in a Flink job using the new Python > streaming API. > > My program looks like this: > > def main(factory): > > props = Properties() >

Re: watermark does not progress

2018-08-14 Thread vino yang
Hi Johe, In local mode, it should also work. When you debug, you can set a breakpoint in the getCurrentWatermark method to see if you can enter the method and if the behavior is what you expect. What is your source? If you post your code, it might be easier to locate. In addition, for positioning

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread vino yang
Hi Averell, As far as these two solutions are concerned, I think you can only choose option 2, because as you have stated, the current Flink DataStream API does not support the replacement of one of the input stream types of CoFlatMapFunction. Another choice: 1. Split it into two separate jobs.

Re: Flink CLI does not return after submitting yarn job in detached mode

2018-08-16 Thread vino yang
Hi Madhav, Can you set the log level to DEBUG in the log4j-client configuration file? Then post the log; I can try to locate the problem through it. Thanks, vino. makelkar wrote on Friday, August 17, 2018 at 1:27 AM: > Hi Vino, >We should not have to specify class name using -c option to run > job in

Re: processWindowFunction

2018-08-16 Thread vino yang
Hi Antonio, What results do you want to get when creating each window? Examples of the use of ProcessWindowFunction are included in many test files in Flink's project, such as SideOutputITCase.scala or WindowTranslationTest.scala. For more information on ProcessWindowFunction, you can refer
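
For reference, a small self-contained sketch of a ProcessWindowFunction that emits one record per key and window (the input data and window size are made up):

```java
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CountPerWindow {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> input =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

        input.keyBy(0)
             .timeWindow(Time.seconds(10))
             .process(new ProcessWindowFunction<Tuple2<String, Long>, String, Tuple, TimeWindow>() {
                 @Override
                 public void process(Tuple key,
                                     Context ctx,
                                     Iterable<Tuple2<String, Long>> elements,
                                     Collector<String> out) {
                     long count = 0;
                     for (Tuple2<String, Long> ignored : elements) {
                         count++;
                     }
                     // Emit one result per key and window, with the window end timestamp.
                     out.collect(key + " had " + count + " elements, window ends at "
                             + ctx.window().getEnd());
                 }
             })
             .print();

        env.execute("process window function example");
    }
}
```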

Re: Could not build the program from JAR file.

2018-08-07 Thread vino yang
o.FileNotFoundException: JAR file does not exist: -yn > at org.apache.flink.client.cli.CliFrontend.buildProgram( > CliFrontend.java:828) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend. > java:205) > ... 4 more > > > >

Re: Delay in REST/UI readiness during JM recovery

2018-08-06 Thread vino yang
to decrease the number of jobs which need to be > recovered in case of a failure. > > Thanks a lot for reporting and analysing this problem. This is definitely > something we should improve! > > Cheers, > Till > > On Fri, Aug 3, 2018 at 5:48 AM vino yang wrote: > >> Hi

Re: connection failed when running flink in a cluster

2018-08-06 Thread vino yang
>>> at org.apache.flink.streaming.api.operators.StreamSource. >>> run(StreamSource.java:56) >>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run( >>> SourceStreamTask.java:99) >>> at org.apache.flink.streaming.runtime.tasks.StreamTask. >

Re: connection failed when running flink in a cluster

2018-08-06 Thread vino yang
Hi Felipe, From the exception information, it seems that you did not start the socket server; the socket source needs to connect to the socket server. Please make sure the socket server has started and is available. Thanks, vino. 2018-08-06 18:45 GMT+08:00 Felipe Gutierrez : > yes. > > when

Re: How to do test in Flink?

2018-08-12 Thread vino yang
Hi Chang, Regarding the return value type: Scala allows a method to omit the return type, which can then be inferred by the compiler. If you specify a non-Unit type, the compiler will report an error; if you do not specify an explicit type, the result of the method may be an error, test pass or

Re: flink on kubernetes

2018-08-13 Thread vino yang
Hi mingliang, Yes, you are right, the information that Flink on Kubernetes' current documentation can provide is not very detailed. However, considering that Kubernetes is so popular, the Flink community is currently refining it, this work is mainly done by Till, and you can follow this issue [1]

Re: Select node for a Flink DataStream execution

2018-08-12 Thread vino yang
Hi Rafael, For Standalone clusters, it seems that Flink does not provide such a feature. In general, at the execution level, we don't talk about DataStream, but we talk about Job. If your Flink is running on YARN, you can use YARN's Node Label feature to assign a Label to some Nodes. Earlier this

Re: Flink keyed stream windows

2018-08-13 Thread vino yang
Hi Garvit, Please refer to the Flink official documentation for the window description. [1] In this scenario, you should use Tumbling Windows. [2] If you want to call your own API to handle the window, you can extend the process window function to achieve your needs.[3] [1]:

Re: Select node for a Flink DataStream execution

2018-08-13 Thread vino yang
o select "job-steps" over "defined-nodes"? > Thanks again Vino and anyone who helps, > > Rafael. > > El lun, 13-08-2018 a las 10:22 +0800, vino yang escribió: > > Hi Rafael, > > For Standalone clusters, it seems that Flink does not provide such a

Re: Select node for a Flink DataStream execution

2018-08-13 Thread vino yang
nk. (Excluding > custom solutions like sockets or third parties like kafka) > > Thanks for your help!. > Rafael. > > El lun, 13-08-2018 a las 19:34 +0800, vino yang escribió: > > Hi Rafael, > > Flink does not support the interaction of DataStream in two Jobs. >

Re: Small-files source - partitioning based on prefix of file

2018-08-09 Thread vino yang
Hi Averell, In this case, I think you may need to extend Flink's existing source. First, read your large tar.gz file; when it has been decompressed, use multi-threading to read the records in the source, and then parse the data format (map / flatmap might be a suitable operator, you can

Re: cannot find TwoInputStreamOperatorTestHarness after upgrade to Flink 1.6.0

2018-08-12 Thread vino yang
Hi Dmitry, I confirmed that this class is included in the source code of Flink-1.6 [1]. I just downloaded the source code of Flink-1.6 [2], and then run *mvn package -DskipTests* directly in the flink-streaming-java directory. The package is successful and decompiled to see that this class

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread vino yang
so it's not a class version conflict >>> there. >>> >>> I'm using default child-first loading. It might be a further transitive >>> dependency, though it's not clear by stack trace or stepping through the >>> process. When I get a chance I'll look

Re: Flink checkpointing to Google Cloud Storage

2018-08-20 Thread vino yang
Hi Oleksandr, From the exception log, you seem to lack the relevant dependencies. You can check again which dependency the related class belongs to. Thanks, vino. Oleksandr Serdiukov wrote on Tuesday, August 21, 2018 at 12:04 AM: > Hello All! > > I am trying to configure checkpoints for flink jobs in GCS. > Now I

Re: classloading strangeness with Avro in Flink

2018-08-21 Thread vino yang
ek >> wrote: >> >>> Hi Cliff, >>> >>> Do you actually need the flink-shaded-hadoop2-uber.jar in lib. If you're >>> running on YARN, you should be able to just remove them because with YARN >>> you will have Hadoop in the classpath anyw

Re: getting error while running flink job on yarn cluster

2018-08-21 Thread vino yang
Hi Yubraj, The solution to a similar problem from StackOverflow is to explicitly define the serialVersionUID in your class. For more information, please visit here.[1] [1]: https://stackoverflow.com/questions/27647992/how-resolve-java-io-invalidclassexception-local-class-incompatible-stream-clas
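
A minimal illustration of the fix suggested above, pinning serialVersionUID on a user function (the function itself is a made-up example):

```java
import org.apache.flink.api.common.functions.MapFunction;

// Pinning serialVersionUID keeps the serialized form of the user function stable
// across recompiles, avoiding "local class incompatible" InvalidClassException
// when a job is resumed with a slightly different build of the same class.
public class NormalizeFunction implements MapFunction<String, String> {

    private static final long serialVersionUID = 1L;

    @Override
    public String map(String value) {
        return value.trim().toLowerCase();
    }
}
```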

Re: kafka consumer can not auto commit

2018-08-22 Thread vino yang
Hi zhao, Can you explain which version of the Kafka connector you are using? Thanks, vino. 远远 wrote on Wednesday, August 22, 2018 at 6:37 PM: > I find the kafka consumer can not auto commit when I test the kudu async client > with flink async io today. > - i do not enable checkpoint, and with process time. > - the consumer

Re: question about setting different time window for operators

2018-08-23 Thread vino yang
Hi Stephen, Yes, of course. You can reuse upstream stream objects to build different windows. Thanks, vino. Stephen wrote on Friday, August 24, 2018 at 2:40 AM: > Hi, > Is it possible to set different operators with different time windows in > a pipeline? For example, for the wordcount example, could I set
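
A tiny sketch of that reuse (names and window sizes are invented): the same keyed stream feeds two windows of different lengths:

```java
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TwoWindowSizes {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> counts =
                env.fromElements(Tuple2.of("word", 1L), Tuple2.of("count", 1L), Tuple2.of("word", 1L));

        KeyedStream<Tuple2<String, Long>, Tuple> keyed = counts.keyBy(0);

        // The same upstream stream feeds two windows of different sizes.
        keyed.timeWindow(Time.seconds(5)).sum(1).print();   // short window
        keyed.timeWindow(Time.minutes(1)).sum(1).print();   // longer window, same input

        env.execute("two window sizes over one stream");
    }
}
```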

Re: jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-23 Thread vino yang
Hi Youjun, How long has your job been running? As far as I know, if it has only run for a short time, the jobmanager will not generate so many connections to HDFS for checkpoints. What is your Flink cluster environment? Standalone or Flink on YARN? In addition, does the JM's log show any timeout

Re: jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-24 Thread vino yang
re, how to avoid jobmanager holding so many, apparently not > necessary, TCP connections? > > > > Thanks > > Youjun > > > > From: vino yang > Sent: Friday, August 24, 2018 10:26 AM > To: Yuan,Youjun > Cc: user > Subject: Re: jobmanager holds too m

Re: Using a ProcessFunction as a "Source"

2018-08-24 Thread vino yang
Hi Addison, There are a lot of things I don't understand. Does your source generate messages itself? Why can't the source receive input? If the source cannot receive input, then why is it called a source? Isn't the kafka-connector an input acting as a source? If you mean that under normal circumstances it can't receive

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread vino yang
Hi Averell, The checkpoint is automatically triggered periodically according to the checkpoint interval set by the user. I believe that you should have no doubt about this. There are many reasons for the Job failure. The technical definition is that the Job does not normally enter the final
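
For reference, a hedged sketch of the periodic-checkpoint configuration described above, together with externalized (retained) checkpoints, which are what make "checkpoints instead of savepoints" practical for manual restores; interval and mode are example values:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds with exactly-once guarantees.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Keep the latest completed checkpoint when the job is cancelled, so it can be
        // used for a manual restore (otherwise only savepoints survive a cancel).
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        env.fromElements(1, 2, 3).print();
        env.execute("checkpoint configuration example");
    }
}
```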

Re: anybody can start flink with job mode?

2018-08-24 Thread vino yang
Hi Hao Sun, From the error log, it seems that the jar package for the job was not found. You must make sure your jar is in the classpath. The related documentation may not be up to date, and there is a discussion of this issue on this mailing list. [1] I see that the status of FLINK-10001 [2] is

Re: [DISCUSS] Remove the slides under "Community & Project Info"

2018-08-26 Thread vino yang
+1 The reason is the same as Hequn's, because we have given a link to SlideShare under the "Flink Forward" section. Thanks, vino. Hequn Cheng wrote on Monday, August 27, 2018 at 9:31 AM: > Hi Stephan, > > Thanks for bringing up this discussion. > I think we can just remove it, because the slides have already been provided

Re: Queryable state and state TTL

2018-08-28 Thread vino yang
Hi Elias, From the source code, the reason for throwing this exception is that StateTtlConfig is set to StateTtlConfig.DISABLED. Please refer to the usage and description in the official Flink documentation for details. [1] And there is a note you should pay attention to: Only TTLs in reference
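
A minimal sketch of how a TTL is attached to state in Flink 1.6 (the state name and TTL value are made up); note the processing-time restriction quoted above, and that this says nothing about whether queryable state accepts the resulting descriptor:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlStateExample {
    public static void main(String[] args) {
        // TTL based on processing time: entries expire one hour after they were last written.
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.hours(1))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<String> descriptor =
                new ValueStateDescriptor<>("last-value", String.class);
        // The descriptor would normally be used inside a RichFunction's open()
        // via getRuntimeContext(); here we only show attaching the TTL config.
        descriptor.enableTimeToLive(ttlConfig);
    }
}
```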

Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread vino yang
The next thing to check is the session >> timeout of your ZooKeeper cluster. If you stop the job within the session >> timeout, then it is also not guaranteed that ZooKeeper has detected that >> the ephemeral nodes of the old JM must be deleted. In order to understand >> this b

Re: Why don't operations on KeyedStream return KeyedStream?

2018-08-28 Thread vino yang
Hi Elias, Can you express this matter more clearly? The reason the KeyedStream object exists is that it needs to provide some different transform methods than the DataStream object. These transform methods are limited to keyBy. Why do you need to execute keyBy twice to get a KeyedStream object?

Re: checkpoint failed due to s3 exception: request timeout

2018-08-28 Thread vino yang
Hi Tony, A while ago, I have answered a similar question.[1] You can try to increase this value appropriately. You can't put this configuration in flink-conf.yaml, you can put it in the submit command of the job[2], or in the configuration file you specify. [1]:

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang
ting for s3 > filesystem and I thought it might have a simple way to support this > setting like other s3.xxx config. > > Very much appreciate for your answer and help. > > Best, > Tony Wei > > 2018-08-29 11:51 GMT+08:00 vino yang : > >> Hi Tony, >> >> A whil

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang
Hi Vino, > > I thought this config is for aws s3 client, but this client is inner > flink-s3-fs-presto. > So, I guessed I should find a way to pass this config to this library. > > Best, > Tony Wei > > 2018-08-29 14:13 GMT+08:00 vino yang : > >> Hi Tony, >>
