Re: Blob server not working with 1.4.0.RC2

2017-12-05 Thread Bernd.Winterstein
Hi Nico, I think there were changes to the default port for the BLOB server. I missed the fact that the Kubernetes configuration was still exposing 6124 for the JobManager BLOB server. Thanks, Bernd -Original Message- From: Nico Kruber [mailto:n...@data-artisans.com] Sent:

Re: aggregate does not allow RichAggregateFunction ?

2017-12-05 Thread Vishal Santoshi
It seems that this has to do with session windows that are mergeable? I tried the RichWindowFunction and that seems to suggest that one cannot use state? Any ideas, folks... On Dec 1, 2017 10:38 AM, "Vishal Santoshi" wrote: > I have a simple Aggregation with one

Re: How to perform efficient DataSet reuse between iterations

2017-12-05 Thread Miguel Coimbra
Hello Fabian, Thanks for the help. I am interested in the duration of specific operators, so the fact that parts of the execution are pipelined is not a problem for me. From my understanding, the automated way to approach this is to run the Flink job with the web interface active and then make

Re: CPU Cores of JobManager

2017-12-05 Thread Yuta Morisawa
Hi Timo, I run a streaming job without checkpointing and I don't configure any state backend, so it may be "MemoryStateBackend". Actually, my streaming app just reads data from Kafka and writes it to an external DB. It's not so complicated. Regards, Yuta On 2017/12/05 19:55, Timo Walther

Re: Flink Batch Performance degradation at scale

2017-12-05 Thread Fabian Hueske
Hi, Flink's operators are designed to work in memory as long as possible and spill to disk once the memory budget is exceeded. Moreover, Flink aims to run programs in a pipelined fashion, such that multiple operators can process data at the same time. This behavior can make it a bit tricky to
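
A minimal sketch of the "work in memory, then spill to disk" behavior described above. This is not Flink code: `sort_with_spill` and `memory_budget` are hypothetical names, the budget is counted in records rather than bytes, and it is assumed to be at least 1. The buffer is sorted in memory until it is full, written out as a sorted run, and the runs are merged at the end.

```python
import heapq
import os
import tempfile


def sort_with_spill(records, memory_budget):
    """Sort integers while holding at most `memory_budget` of them in memory.

    Returns the sorted list and the number of runs spilled to disk.
    Hypothetical illustration only; Flink's real sorter works on managed
    memory segments, not on Python lists.
    """
    run_paths = []  # files holding sorted runs that were spilled
    buffer = []     # in-memory working set

    def spill():
        # Sort the in-memory buffer and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{value}\n" for value in buffer)
        run_paths.append(path)
        buffer.clear()

    for record in records:
        if len(buffer) == memory_budget:
            # Adding one more record would exceed the budget: spill first.
            spill()
        buffer.append(record)

    if not run_paths:
        # Everything fit into the budget, so no disk I/O was needed.
        return sorted(buffer), 0

    if buffer:
        spill()

    files = [open(path) for path in run_paths]
    try:
        # Merge the sorted runs, reading each file lazily line by line.
        runs = ((int(line) for line in f) for f in files)
        merged = list(heapq.merge(*runs))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return merged, len(run_paths)
```

With a budget of 3, the input `[5, 3, 9, 1, 7, 2, 8]` is written as three runs and merged, while `[3, 1, 2]` is sorted without touching disk. The jump from the second case to the first is the kind of cliff that can show up when a job is scaled.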

Flink Batch Performance degradation at scale

2017-12-05 Thread Garrett Barton
I have been moving some old MR and Hive workflows into Flink because I'm enjoying the APIs and the ease of development is wonderful. Things have largely worked great until I tried to really scale some of the jobs recently. I have for example one ETL job that reads in about 12B records at a time

Re: S3 Write Exception

2017-12-05 Thread vinay patil
Hi Stephan, I am facing an S3 consistency-related issue with the exception pasted at the end: We were able to solve the S3 sync issue by adding System.currentTime to the inprogressPrefix, inprogressSuffix, s3PendingPrefix and s3PendingSuffix properties of BucketingSink. I tried another approach by

Re: slot group indication per operator

2017-12-05 Thread Timo Walther
Hi Tovi, you are right, it is difficult to check the correct behavior. @Chesnay: Do you know if we can get this information? If not through the Web UI, maybe via REST? Do we have access to the full ExecutionGraph somewhere? Otherwise it might make sense to open an issue for this. Regards,

Re: Share state across operators

2017-12-05 Thread Shailesh Jain
Thanks, Timo. Either works for me. On Tue, Dec 5, 2017 at 4:55 PM, Timo Walther wrote: > Hi Shailesh, > > sharing state across operators is not possible. However, you could emit > the state (or parts of it) as a stream element to downstream operators by > having a

slot group indication per operator

2017-12-05 Thread Sofer, Tovi
Hi all, I am trying to use the slot group feature, by having a 'default' group and an additional 'market' group. The purpose is to divide the resources equally between two sources and their following operators. I've set the slotGroup on the source of the market data. Can I assume that all following

Re: Share state across operators

2017-12-05 Thread Timo Walther
Hi Shailesh, sharing state across operators is not possible. However, you could emit the state (or parts of it) as a stream element to downstream operators by having a function that emits a type like "Either". Another option would be to use side outputs to send state to
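
A minimal sketch of the first option above, emitting state as a stream element through an "Either"-style type. This is plain Python rather than the Flink API: `Record`, `StateSnapshot`, `upstream`, `downstream` and `snapshot_every` are hypothetical names chosen for illustration, and generators stand in for operators.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union


@dataclass
class Record:
    """A regular data element."""
    value: int


@dataclass
class StateSnapshot:
    """A copy of the upstream operator's list state, sent as an element."""
    seen: List[int]


# The "Either" type: every stream element is one of the two variants.
Element = Union[Record, StateSnapshot]


def upstream(values: Iterable[int], snapshot_every: int) -> Iterator[Element]:
    """Keeps list state and periodically emits a snapshot of it downstream."""
    seen: List[int] = []
    for value in values:
        seen.append(value)
        yield Record(value)
        if len(seen) % snapshot_every == 0:
            # Emit a copy so the downstream side never shares the live list.
            yield StateSnapshot(list(seen))


def downstream(elements: Iterable[Element]) -> List[Tuple[int, int]]:
    """Pairs each record with the size of the most recently received state."""
    latest_state: List[int] = []
    out: List[Tuple[int, int]] = []
    for element in elements:
        if isinstance(element, StateSnapshot):
            latest_state = element.seen
        else:
            out.append((element.value, len(latest_state)))
    return out
```

The downstream side never reads the upstream state directly. It only sees the copies that arrive in the stream, so its view can lag behind by up to `snapshot_every` records.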

Re: Flink with Ceph as the persistent storage

2017-12-05 Thread Gyula Fóra
It would be the same as with any other form of async checkpointing. No direct blocking of processing but the network traffic might indirectly affect it to some extent :) Jayant Ameta wrote (on Tue, Dec 5, 2017, 12:15): > If the checkpointing to Ceph happens

Re: Flink with Ceph as the persistent storage

2017-12-05 Thread Jayant Ameta
If the checkpointing to Ceph happens asynchronously, does it still have any impact on the stream processing? Jayant Ameta On Tue, Dec 5, 2017 at 4:34 PM, Gyula Fóra wrote: > Hi, > > To my understanding Ceph as in http://ceph.com/ceph-storage/ is a block > based object

Re: Share state across operators

2017-12-05 Thread Shailesh Jain
Missed one point - I'm using Managed Operator state (and not Keyed state - as my data streams are not keyed). On Tue, Dec 5, 2017 at 4:28 PM, Shailesh Jain wrote: > Hi, > > Is it possible to share state across operators in Flink? > > I have CoFlatMap operator which

Re: Flink with Ceph as the persistent storage

2017-12-05 Thread Gyula Fóra
Hi, To my understanding Ceph as in http://ceph.com/ceph-storage/ is a block based object storage system. You can mount it on your server and it will behave like a local file system for the most part, but it will be shared across the cluster. The performance might not be as good as with HDFS to our

Flink with Ceph as the persistent storage

2017-12-05 Thread Jayant Ameta
Hi, The Flink documentation suggests that Ceph can be used as persistent storage for state. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html Considering that Ceph is a transactional database, wouldn't it have an adverse effect on Flink's performance?

Share state across operators

2017-12-05 Thread Shailesh Jain
Hi, Is it possible to share state across operators in Flink? I have a CoFlatMap operator which maintains a ListState and returns a DataStream. And downstream there is a KafkaSink operator for the same DataStream which needs to access the ListState. Thanks, Shailesh

Re: CPU Cores of JobManager

2017-12-05 Thread Timo Walther
I had some profiling tool like jvisualvm in mind. Are you executing streaming or batch jobs? If streaming, is checkpointing enabled and which type of state backend? @Chesnay do you have experience with slow behavior of the Web UI? Regards, Timo On 12/5/17 at 10:37 AM, Yuta Morisawa wrote:

Re: Maintain heavy hitters in Flink application

2017-12-05 Thread Fabian Hueske
Hi, I haven't done that before either. The query API will change with the next version (Flink 1.4.0) which is currently being prepared for releasing. Kostas (in CC) might be able to help you. Best, Fabian 2017-12-05 9:52 GMT+01:00 m@xi : > Hi Fabian, > > Thanks for your

Re: Window function support on SQL

2017-12-05 Thread Fabian Hueske
Hi Tao, computing a group window requires that the event-time timestamp of the DataStream is exposed as a time attribute (in your case as an event time attribute). If you register a DataStream with the TableEnvironment, this has to be done in two steps: 1) assign timestamps and watermarks to the
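
A minimal sketch of why a group window needs an event-time attribute: each row is assigned to a window purely from its own timestamp. This is plain Python, not the Table API, and it does not model watermarks or the registration steps mentioned above. `tumbling_window_counts` and `window_size_ms` are hypothetical names.

```python
from collections import defaultdict


def tumbling_window_counts(rows, window_size_ms):
    """Count rows per (window start, key) for tumbling event-time windows.

    `rows` is an iterable of (timestamp_ms, key) pairs. The window a row
    belongs to depends only on its timestamp, not on its arrival order.
    """
    counts = defaultdict(int)
    for timestamp_ms, key in rows:
        # Align the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_size_ms)
        counts[(window_start, key)] += 1
    return dict(counts)
```

For example, with a window size of 1000 ms, rows stamped 1000 and 1500 fall into the window starting at 1000, and a row stamped 2100 falls into the window starting at 2000. Without a timestamp on each row there is nothing to group by.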

Re: CPU Cores of JobManager

2017-12-05 Thread Timo Walther
Hi Yuta, as far as I know you cannot assign more cores to a JobManager. Can you tell us a bit more about your environment? How many jobs does the JobManager have to manage? How much heap memory is assigned to the JobManager? Maybe you can use a profiler and find out which component consumes

Re: subscribe

2017-12-05 Thread Chesnay Schepler
To subscribe to this mailing list, please send a mail to user-subscr...@flink.apache.org. On 05.12.2017 08:21, Xin Wang wrote: hello flink -- Thanks, Xin