Re: Data Type of timestamp in Streaming SQL Result? Long instead of timestamp?

2018-06-28 Thread Hequn Cheng
Hi chrisr, > *Is there a way to retrieve this from the query as a long instead?* You have to convert the timestamp type to long type. It seems there is no built-in UDF to convert a timestamp to a Unix timestamp; however, you can write one and use it in your SQL. > *Question: how do I express that I
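The post suggests writing a scalar UDF for the conversion. As a minimal sketch (plain JDK only, no Flink dependency shown here), the conversion logic such a UDF's eval() method might wrap is just reading the epoch milliseconds off a java.sql.Timestamp:

```java
import java.sql.Timestamp;

public class TimestampToLong {
    // java.sql.Timestamp already carries epoch milliseconds; a scalar
    // UDF converting SQL TIMESTAMP to BIGINT could simply return them.
    public static long toUnixMillis(Timestamp ts) {
        return ts.getTime();
    }
}
```

Registered as a scalar function, this would let the query select the event time as a BIGINT column instead of a TIMESTAMP.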

Re: How to deploy Flink in a geo-distributed environment

2018-06-28 Thread Tzu-Li (Gordon) Tai
Hi, It should be possible to deploy a single Flink cluster across geo-distributed nodes, but Flink currently offers no optimization for such a specific use case. AFAIK, the general pattern for dealing with geographically distributed data sources right now would be to replicate data across

Re: Flink kafka consumers don't honor group.id

2018-06-28 Thread Tzu-Li (Gordon) Tai
Hi Giriraj, The fact that the Flink Kafka Consumer doesn't use the group.id property is expected behavior. Since the partition-to-subtask assignment of the Flink Kafka Consumer needs to be deterministic in Flink, the consumer uses static assignment instead of the more high-level consumer
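The key point above is that each partition must always land on the same subtask, independent of when consumers join. A simplified, hypothetical sketch of such a static assignment (not Flink's actual formula) makes the determinism visible:

```java
public class StaticAssignment {
    // Hypothetical static partition-to-subtask assignment: the result
    // depends only on the partition index and the parallelism, never on
    // which consumers happen to be alive, so it is fully deterministic.
    public static int assign(int partition, int numSubtasks) {
        return partition % numSubtasks;
    }
}
```

With dynamic (group-coordinated) assignment, by contrast, the broker rebalances partitions among live group members, which is exactly what Flink avoids so that restored state keeps matching the partitions each subtask reads.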

Re: Let BucketingSink roll file on each checkpoint

2018-06-28 Thread XilangYan
Hi Fabian, Finally I have time to read the code, and it is brilliant; it does provide an exactly-once guarantee. However, I still suggest adding a function that can close a file when a checkpoint is made. I noticed that there is an enhancement https://issues.apache.org/jira/browse/FLINK-9138 which can

Data Type of timestamp in Streaming SQL Result? Long instead of timestamp?

2018-06-28 Thread chrisr123
I am trying to understand how to use streaming SQL, very similar to the example from the documentation: count the number of page clicks in a certain period of time for each user. I'm trying to solve the problem using both the SQL API and the Table API. My sample input stream looks like this:

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-28 Thread Vishal Santoshi
Thanks! I did not see the znode and thus did not paste the ls... anyway, will get you the full JM log ASAP. On Thu, Jun 28, 2018, 5:35 PM Gary Yao wrote: > Hi Vishal, > > The znode /flink_test/da_15/leader/rest_server_lock should exist as long > as your > Flink 1.5 cluster is running. In 1.4

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-28 Thread Gary Yao
Hi Vishal, The znode /flink_test/da_15/leader/rest_server_lock should exist as long as your Flink 1.5 cluster is running. In 1.4 this znode will not be created. Are you sure that the znode does not exist? Unfortunately you only attached the output of "ls /flink_test/da_15". Can you share the

Re: String Interning

2018-06-28 Thread Elias Levy
Am I the only one that feels the config should be renamed or the docs on it expanded? Turning on object reuse doesn't really reuse objects, not in the sense that an object can be reused for different values / messages / records. Instead, it stops Flink from making copies of a record, by

Re: How to partition within same physical node in Flink

2018-06-28 Thread ashish pok
Fabian, All, Along this same line, we have a data source where we have a parent key and a child key. We need to first keyBy parent and then by child. If we want to have physical partitioning in a way where partitioning happens first by the parent key and localized grouping by the child key, is

Re: Over Window Not Processing Messages

2018-06-28 Thread Gregory Fee
I'm writing a custom S3 source in order to work around some issues with back pressure and checkpointing at scale in my bootstrap logic. I moved around the logic to assign timestamps and watermarks. As part of that I ended up generating watermarks earlier in the pipeline but having another operator

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-28 Thread Vishal Santoshi
I am not seeing rest_server_lock. Is it transient (an ephemeral znode) for the duration of the CLI command? [zk: localhost:2181(CONNECTED) 2] ls /flink_test/da_15 [jobgraphs, leader, checkpoints, leaderlatch, checkpoint-counter] The logs say 2018-06-28 14:02:56 INFO

Re: How to partition within same physical node in Flink

2018-06-28 Thread Fabian Hueske
Hi Vijay, Flink does not provide fine-grained control to place keys to certain slots or machines. When specifying a key, it is up to Flink (i.e., its internal hash function) where the data is processed. This works well for large key spaces, but can be difficult if you have only a few keys. So,
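Fabian's point about the internal hash function can be sketched roughly. The following is a simplified illustration (the constants and hash are assumptions, not Flink's exact implementation): a key is hashed into a key group, and the key group is mapped onto one of the parallel subtasks, so the user never chooses the target slot directly:

```java
public class KeyPlacement {
    // Simplified sketch: hash the key into a key-group range
    // [0, maxParallelism), then map the key group onto an operator
    // subtask index in [0, parallelism). With few distinct keys,
    // several keys can collide on the same subtask while others stay idle.
    public static int keyGroup(Object key, int maxParallelism) {
        return (key.hashCode() & 0x7fffffff) % maxParallelism;
    }

    public static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```

This is why a small key space is problematic: the mapping is fixed by the hash, so an unlucky handful of keys may all land on one subtask.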

Re: DataSet with Multiple reduce Actions

2018-06-28 Thread Fabian Hueske
Hi Osh, You can certainly apply multiple reduce functions on a DataSet, however, you should make sure that the data is only partitioned and sorted once. Moreover, you would end up with multiple data sets that you need to join afterwards. I think the easier approach is to wrap your functions in a
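The wrapping idea above can be sketched in plain Java (names and the choice of sum/max as the two aggregates are illustrative): instead of running two separate reduce passes and joining the results, fold both aggregates into one accumulator so a single pass computes everything:

```java
public class CombinedReduce {
    // Accumulator layout: acc[0] = running sum, acc[1] = running max.
    // One reduce function over this pair replaces two separate reduces.
    static long[] reduce(long[] a, long[] b) {
        return new long[] { a[0] + b[0], Math.max(a[1], b[1]) };
    }

    static long[] reduceAll(long[] values) {
        long[] acc = new long[] { 0L, Long.MIN_VALUE };
        for (long v : values) {
            acc = reduce(acc, new long[] { v, v });
        }
        return acc;
    }
}
```

In a DataSet program the same pair-valued reduce function would be passed to a single reduce() call, avoiding the second shuffle and the join.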

Re: Over Window Not Processing Messages

2018-06-28 Thread Fabian Hueske
In a nutshell the Over operator works as follows: - When a row arrives it is put into a MapState keyed on its timestamp and a timer is registered to process it when the watermark passes that timestamp. - All the heavy computation is done in the onTimer() method. For each unique timestamp, the Over
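The buffering behavior described above can be modeled with a small sketch (a TreeMap standing in for the timestamp-keyed MapState, and an explicit onWatermark method standing in for the registered timers; both are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class OverOperatorSketch {
    // Rows are buffered per timestamp; nothing is emitted until the
    // watermark passes a row's timestamp, mirroring how the real
    // operator defers the heavy work to onTimer().
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();

    public void onElement(long timestamp, String row) {
        buffer.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(row);
    }

    public List<String> onWatermark(long watermark) {
        List<String> emitted = new ArrayList<>();
        // Emit all buffered rows with timestamp <= watermark, in time order.
        while (!buffer.isEmpty() && buffer.firstKey() <= watermark) {
            emitted.addAll(buffer.pollFirstEntry().getValue());
        }
        return emitted;
    }
}
```

This also shows why no output appears until a watermark advances past the rows' timestamps: elements sit in state until then.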

Flink kafka consumers don't honor group.id

2018-06-28 Thread Giriraj
It seems that Flink Kafka consumers don't honor group.id while consumers are added dynamically. Let's say I have some Flink Kafka consumers reading from a topic and I dynamically add some new Flink Kafka consumers with the same group.id; Kafka messages are getting duplicated to existing as well as new

Re: Streaming

2018-06-28 Thread Hequn Cheng
Hi aitozi, 1> how will sql translated into a datastream job? The Table API and SQL leverage Apache Calcite for parsing, validation, and query optimization. After optimization, the logical plan of the job will be translated into a datastream job. The logical plan contains many different logical

Using Google Cloud Storage for checkpointing

2018-06-28 Thread Rohil Surana
Hi I was trying to setup checkpointing on Google Cloud Storage with Flink on Kubernetes, but was facing issues with Google Cloud Storage Connector classes not loading, even though in the logs I can see it being included in the classpath. Logs showing classpath - *https://pastebin.com/R1P7Eepz

Web history limit in flink 1.5

2018-06-28 Thread eSKa
Hello, we were playing around with Flink 1.5 - so far so good. The only thing we are missing is the web history setup. In Flink 1.4 and before we were using the *web.history* config to hold 100 jobs. With Flink 1.5 we can see that history is limited to 1 hour only. Is it possible to somehow

Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-28 Thread Christophe Jolif
Chesnay, Do you have a rough idea of the 1.5.1 timeline? Thanks, -- Christophe On Mon, Jun 25, 2018 at 4:22 PM, Chesnay Schepler wrote: > The watermark issue is known and will be fixed in 1.5.1 > > > On 25.06.2018 15:03, Vishal Santoshi wrote: > > Thank you > > One addition > > I do not see

Re: Streaming

2018-06-28 Thread aitozi
Hi all, Thanks for your reply. 1. Can I ask how the SQL like below transforms to a low-level datastream job? 2. If I implement a distinct in a datastream job, and there is no keyBy needed in advance, and we just calculate the global distinct count, can I just use the AllWindowedStream or
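For the second question, a global distinct count reduces to de-duplicating elements in an accumulator and reporting its size. A minimal sketch of the accumulator an all-window aggregate might maintain (the class and method names are illustrative, not a Flink API):

```java
import java.util.HashSet;
import java.util.Set;

public class GlobalDistinctCount {
    // A set de-duplicates the observed values; the distinct count
    // is simply the set's size. An all-window aggregate could keep
    // this as its accumulator state.
    private final Set<String> seen = new HashSet<>();

    public void add(String value) {
        seen.add(value);
    }

    public long getCount() {
        return seen.size();
    }
}
```

Note that without a keyBy this state lives on a single non-parallel operator, which is the usual trade-off of an AllWindowedStream.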