Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, > I see that CsvTableSource allows defining csv fields. Then, will it check if columns actually exist in the file and throw an Exception if not? Currently, CsvTableSource doesn't support Avro. CsvTableSource uses fieldDelim and rowDelim to parse data. But there is a workaround: read

Re: Slide Window Compute Optimization

2018-07-06 Thread Rong Rong
+1. Yes, your use case would probably fit best in the OVER aggregate use case. I actually created for myself a complementary note for some of the complex aggregate components on top of Flink

Re: Description of Flink event time processing

2018-07-06 Thread Elias Levy
Apologies. Comments are now enabled. On Thu, Jul 5, 2018 at 6:09 PM Rong Rong wrote: > Hi Elias, > > Thanks for putting together the document. This is actually a very good, > well-rounded document. > I think you did not enable comment access for the link. Would you > mind enabling

StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-06 Thread Elias Levy
During some refactoring we changed a job using managed state from: ListStateDescriptor("config", TypeInformation.of(new TypeHint[ConfigState]() {})) to ListStateDescriptor("config", createTypeInformation[ConfigState]) After this change, Flink refused to start the new job from a savepoint or

Re: flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
It seems it is the UI refresh that forces the loop on the job server. From the Flink CLI it does it once, so this might be a false alarm. On Fri, Jul 6, 2018 at 4:55 PM, Vishal Santoshi wrote: > The UI shows the following and the JM goes into convulsions trying to > retrieve a job ID as above.

Re: flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
The UI shows the following and the JM goes into convulsions trying to retrieve a job ID as above. org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. On Fri, Jul 6, 2018 at 4:53 PM, Vishal Santoshi wrote: > If we submit a job through CLI and it has

flink1.5 web UI

2018-07-06 Thread Vishal Santoshi
If we submit a job through CLI and it has an error (missing args and so on), the JM goes into convulsions. It seems it submits a job without first validating and then goes into a loop trying to figure out the job. Jul 06 16:51:26 flink-9edd15d7.bf2.tumblr.net docker[31171]: at

Re: Filter columns of a csv file with Flink

2018-07-06 Thread françois lacombe
Hi Hequn, The Table-API is really great. I will use it, and will certainly love it, to solve the issues I mentioned before. One subsequent question regarding the Table-API: I've got my csv files and avro schemas that describe them. As my users can send erroneous files, inconsistent with schemas, I want to

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-07-06 Thread Rinat
Hi Mingey! I’ve implemented a group of tests that shows that the problem exists only when a part suffix is specified and a file in pending state exists. Here is an exception

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, If I understand correctly, you can use SQL or the Table API to solve your problem. As you want to project a subset of columns from the source, a columnar storage format like parquet/orc would be efficient. Currently, an ORC table source is supported in Flink; you can find more details here[1]. Also, there

Re: Limiting in flight data

2018-07-06 Thread Vishal Santoshi
Further, if there are metrics that allow us to chart delays per pipe on n/w buffers, that would be immensely helpful. On Fri, Jul 6, 2018 at 10:02 AM, Vishal Santoshi wrote: > Awesome, thank you for pointing that out. We have seen stability on pipes > where previously throttling the source (

Re: Limiting in flight data

2018-07-06 Thread Vishal Santoshi
Awesome, thank you for pointing that out. We have seen stability on pipes where previously throttling the source ( rateLimiter ) was the only way out. https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L291 This though

flink rocksdb where to configure mount point

2018-07-06 Thread Siew Wai Yow
Hi, We configure rocksdb as the state backend, and the checkpoint dir persists to hdfs. When the job runs, rocksdb automatically mounts to tmpfs /tmp, which consumes memory. RocksDBStateBackend rocksdb = new RocksDBStateBackend(new FsStateBackend(hdfs://), true);

Filter columns of a csv file with Flink

2018-07-06 Thread françois lacombe
Hi all, I'm a new user in the Flink community. This tool sounds great for loading files with millions of rows into a pgsql db for a new project. As I read the docs and examples, I can't find a proper use case of csv loading into pgsql. The file I want to load isn't following the same

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Yep, perfect, that we do. Can you confirm though that jobs will restart in the case of a failover? That is what we see and that is fine. On Fri, Jul 6, 2018, 8:24 AM Chesnay Schepler wrote: > If I remember correctly the masters file is only used by the > [start|stop]-cluster.sh scripts to

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Chesnay Schepler
If I remember correctly the masters file is only used by the [start|stop]-cluster.sh scripts to determine how many JobManagers should be started / stopped and which port they should use. It's not necessarily /required/, but without it you have to manually start/stop all JobManagers. On

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Though I must admit that the jobs do restart, they restart successfully with the new JM. On Fri, Jul 6, 2018, 8:08 AM Vishal Santoshi wrote: > Hello Chesnay, I have used an HA setup without the masters file and have > seen failover happen based on alerts from a leader election

Re: is there a config to ask taskmanager to keep retrying connect to jobmanager after Disassociated?

2018-07-06 Thread Vishal Santoshi
Hello Chesnay, I have used an HA setup without the masters file and have seen failover happen based on alerts from a leader election routine. Is it actually required that there be a masters file when there is a central arbiter (ZK) that has the alive JMs and a callback to force TMs to switch

Re: A use-case for Flink and reactive systems

2018-07-06 Thread Fabian Hueske
Hi Yersinia, let me reply to some of your questions. I think these answers should also address most of Mich's questions as well. > What you are saying is I can either use Flink and forget database layer, or make a java microservice with a database. Mixing Flink with a Database doesn't make any

Re: Slide Window Compute Optimization

2018-07-06 Thread Fabian Hueske
Hi Yennie, You might want to have a look at the OVER windows of Flink's Table API or SQL [1]. An OVER window computes an aggregate (such as a count) for each incoming record over a range of previous events. For example the query: SELECT ip, successful, COUNT(*) OVER (PARTITION BY ip, successful

Re: Dynamic Rule Evaluation in Flink

2018-07-06 Thread Puneet Kinra
Hi Fabian, I know you can connect 2 streams with heterogeneous schemas using the connect function. That has only one port, or one parameter. Can you send more than one heterogeneous stream to connect? On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske wrote: > Hi, > > > Flink doesn't support connecting

Re: Slide Window Compute Optimization

2018-07-06 Thread YennieChen88
Hi Kostas and Rong, Thank you for your replies. As both of you asked for more info about my use case, I will reply to both together. My case is counting the number of successful and failed logins within one hour, keyed by other login-related attributes (e.g. ip, device, login type ...).
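
For readers following the Slide Window Compute Optimization thread, the per-key one-hour count discussed there can be illustrated outside Flink. The following is a minimal sketch in plain Python, not Flink code and not any poster's implementation: the class name `SlidingLoginCounter` and the sample events are hypothetical, and it assumes events arrive in non-decreasing timestamp order (no out-of-order handling, no watermarks). For each incoming event it returns the number of events with the same (ip, successful) key in the preceding hour, which is roughly what an OVER aggregate with a one-hour RANGE computes per row.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # one hour, matching the use case in the thread


class SlidingLoginCounter:
    """Hypothetical helper: per-key count of events in the trailing hour."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # One deque of event timestamps per (ip, successful) key.
        self.timestamps = defaultdict(deque)

    def add(self, ts, ip, successful):
        """Record one login event and return the current count for its key.

        Assumption: timestamps are in seconds and never decrease.
        """
        window = self.timestamps[(ip, successful)]
        window.append(ts)
        # Evict events that are more than one hour older than this event.
        while window[0] < ts - self.window_seconds:
            window.popleft()
        return len(window)


# Made-up sample events: (timestamp in seconds, ip, successful).
events = [
    (0, "10.0.0.1", False),
    (1800, "10.0.0.1", False),
    (3600, "10.0.0.1", False),
    (3601, "10.0.0.1", False),
    (3601, "10.0.0.1", True),
]
counter = SlidingLoginCounter()
counts = [counter.add(ts, ip, ok) for ts, ip, ok in events]
print(counts)
```

Each event costs amortized constant time because every timestamp is appended once and evicted at most once, which is the property that makes a per-record aggregate cheaper than maintaining many overlapping sliding windows.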