date:20171026

Capacity Planning For Large State in YARN Cluster

2017-10-26 Thread Ashish Pokharel

Hi Everyone, We have hit a roadblock moving an app to production scale and were hoping to get some guidance. The application is a pretty common use case in stream processing but does require maintaining a large number of keyed states. We are processing 2 streams - one of which is a daily burst of stream

Data sources and slices

2017-10-26 Thread David Dreyfus

Hello, If I am on a cluster with 2 task managers with 64 CPUs each, I can configure 128 slots in accordance with the documentation. If I set parallelism to 128 and read a 64 MB file (one datasource with a single file), will Flink really create 500K slices? Or, will it check the default blocksize o
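
The numbers in this question can be checked with a little arithmetic. The sketch below is a plain-Java illustration, not a Flink API: `SliceSizeSketch` and `bytesPerSlice` are hypothetical names, and it assumes the file is divided evenly across all parallel readers. Reading "500K" as the approximate size of each slice is also an assumption, since the snippet is cut off.

```java
// Hypothetical back-of-the-envelope helper; not part of Flink.
class SliceSizeSketch {
    /** Bytes each parallel reader would get if the file were divided evenly. */
    static long bytesPerSlice(long fileBytes, int parallelism) {
        // Round up so the slices together cover the whole file.
        return (fileBytes + parallelism - 1) / parallelism;
    }

    public static void main(String[] args) {
        int taskManagers = 2;
        int slotsPerTaskManager = 64;        // one slot per CPU, as in the question
        int parallelism = taskManagers * slotsPerTaskManager;
        long fileBytes = 64L * 1024 * 1024;  // the 64 MB input file

        System.out.println("slots = " + parallelism);
        System.out.println("bytes per slice = " + bytesPerSlice(fileBytes, parallelism));
    }
}
```

Under these assumptions, 128 slots reading a 64 MB file would each receive 524,288 bytes (512 KiB), which is roughly the "500K" figure in the question.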

Re: Not enough free slots to run the job

2017-10-26 Thread David Dreyfus

Hello, I know this is an older thread, but ... If some slots are left empty it doesn't necessarily mean that machine resources are wasted. Some managed memory might be unavailable, but CPU, heap memory, network, and disk are shared across slots. To the extent there are multiple operators executin

Execute multiple jobs in parallel (threading): java.io.OptionalDataException

2017-10-26 Thread David Dreyfus

Hello, I am trying to submit multiple jobs to Flink from within my Java program. I am running into an exception that may be related: java.io.OptionalDataException. Should I be able to create multiple plans/jobs within a single session and execute them concurrently? If so, is there a working examp

Re: Tasks, slots, and partitioned joins

2017-10-26 Thread Fabian Hueske

Hi David, please find my answers below: 1. For high utilization, all slots should be filled. Each slot will process a slice of the program on a slice of the data. In case of partitioning or changed parallelism, the data is shuffled accordingly. 2. That's a good question. I think the default log

Re: Passing Configuration & State

2017-10-26 Thread Fabian Hueske

Hi Navneeth, configuring a user function using a Configuration object and setting the parameters in the open() method of a RichFunction is no longer recommended. In fact, that only works for the DataSet API and has not been added for the DataStream API. The open() method with the Configuration p
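
The snippet is cut off before it names an alternative. One common option, offered here as an assumption rather than as the reply's actual recommendation, is to pass parameters through the function's constructor so that they travel with the serialized function object. The plain-Java sketch below uses a hypothetical `PrefixMapper` class and imitates shipping the function to a worker with a serialization round trip. A real Flink function would implement one of Flink's function interfaces instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical user function; a real one would implement a Flink function interface.
class PrefixMapper implements Serializable {
    private static final long serialVersionUID = 1L;

    // Configuration captured at construction time and serialized with the object.
    private final String prefix;

    PrefixMapper(String prefix) {
        this.prefix = prefix;
    }

    String map(String value) {
        return prefix + value;
    }

    /** Serialize and deserialize, imitating how a function object is shipped to workers. */
    static PrefixMapper roundTrip(PrefixMapper fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PrefixMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The prefix would normally come from program arguments at startup.
        PrefixMapper shipped = roundTrip(new PrefixMapper("prod-"));
        System.out.println(shipped.map("event"));
    }
}
```

The design point is that any field set in the constructor must itself be serializable, because the whole object is copied to each parallel instance.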

Re: PrometheusReporter error

2017-10-26 Thread cslotterback

Hello 김동원, We are experiencing the same issue you were when trying to use the 1.4 prometheus reporter with 1.3: [...] Error while registering metric. java.lang.IllegalArgumentException: Collector already registered that provides name: flink_taskmanager_Status_JVM_CPU_Load [...] - The ji

Passing Configuration & State

2017-10-26 Thread Navneeth Krishnan

Hi All, I have developed a streaming pipeline in Java and I need to pass some of the configuration parameters that are passed during program startup to user functions. I used the below link as reference. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html I have t

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Le Xu

Thanks for the help! I’ll try out the ProcessFunction then. Le > On Oct 26, 2017, at 8:03 AM, Kien Truong wrote: > > Hi, > For Streaming API, use a ProcessFunction as Fabian suggested. > You can pretty much do anything with a ProcessFunction :) > > Best regards, > > Kien > > > On 10/26

Re: Tasks, slots, and partitioned joins

2017-10-26 Thread David Dreyfus

Hi Fabian, Thank you for the great, detailed answers. 1. So, each parallel slice of the DAG is placed into one slot. The key to high utilization is many slices of the source data (or the various methods of repartitioning it). Yes? 2. In batch processing, are slots filled round-robin on task manag

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Kien Truong

Hi, For Streaming API, use a ProcessFunction as Fabian suggested. You can pretty much do anything with a ProcessFunction :) Best regards, Kien On 10/26/2017 8:01 PM, Le Xu wrote: Hi Kien: Is there a similar API for DataStream as well? Thanks! Le On Oct 26, 2017, at 7:58 AM, Kien Tr

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Le Xu

Hi Kien: Is there a similar API for DataStream as well? Thanks! Le > On Oct 26, 2017, at 7:58 AM, Kien Truong wrote: > > Hi, > > For batch API, you can use GroupReduceFunction, which gives you the same > benefit as a MapReduce combiner. > https://ci.apache.org/projects/flink/flink-docs-rele

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Kien Truong

Hi, For batch API, you can use GroupReduceFunction, which gives you the same benefit as a MapReduce combiner. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/dataset_transformations.html#combinable-groupreducefunctions Regards, Kien On 10/26/2017 7:37 PM, Le Xu wrote:

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Le Xu

Thanks guys! That makes more sense now. So does it mean once I start using a window operator, all operations on my WindowedStream would be global (across all partitions)? In that case, WindowedStream.aggregate (or sum) would apply to all data after shuffling instead of each partition. If I und

Re: Checkpoint was declined (tasks not ready)

2017-10-26 Thread bartektartanus

I think we could try option number one, as it seems to be easier to implement. Currently I'm cloning the Flink repo to fix this and test that solution with our code that currently does not work. Unfortunately, it takes forever to download all the dependencies. Anyway, I hope that I will eventually manage t

Re: StreamTransformation object

2017-10-26 Thread Tony Wei

Hi Andrea, The `learn` operator is defined in this method [1]. If you need to set its slotSharing group, you should add `slotSharingGroup(...)` behind line 97 [2] or a new API to get the result from `inferenceStreamBuilder.build()`. Best Regards, Tony Wei [1] https://github.com/htm-community/fli

Re: StreamTransformation object

2017-10-26 Thread AndreaKinn

Can you be clearer about this part? I really appreciate your help. Tony Wei wrote > you need to refactor `HTMStream` to expose > `InferenceStreamBuilder.build()`. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: State snapshotting when source is finite

2017-10-26 Thread Flavio Pompermaier

Done: https://issues.apache.org/jira/browse/FLINK-7930 Best, Flavio On Thu, Oct 26, 2017 at 10:52 AM, Till Rohrmann wrote: > Hi Flavio, > > this kind of feature is indeed useful and currently not supported by > Flink. I think, however, that this feature is a bit trickier to implement, > because

Re: StreamTransformation object

2017-10-26 Thread Tony Wei

Hi Andrea, In this way, you will only set a slotSharing group on the select operator, and the learn operator will remain in the default group. If you want to set it on the learn operator as well, I am afraid that you need to refactor `HTMStream` to expose `InferenceStreamBuilder.build()`. Best Regards, Tony Wei 201

Re: StreamTransformation object

2017-10-26 Thread AndreaKinn

Mmm looks good. This solution would be great. In this way am I setting a slotSharing group for both the learn and select methods and not only on select? I believed I needed to call slotSharingGroup exactly on the return type of learn. -- Sent from: http://apache-flink-user-mailing-list-archive.233605

Re: Tasks, slots, and partitioned joins

2017-10-26 Thread Fabian Hueske

Hi David, Flink's DataSet API schedules one slice of a program to a task slot. A program slice is one parallel instance of each operator of a program. When all operators of your program run with a parallelism of 1, you end up with only 1 slice that runs on a single slot. Flink's DataSet API leverag

Re: State snapshotting when source is finite

2017-10-26 Thread Till Rohrmann

Hi Flavio, this kind of feature is indeed useful and currently not supported by Flink. I think, however, that this feature is a bit trickier to implement, because Tasks cannot currently initiate checkpoints/savepoints on their own. This would entail some changes to the lifecycle of a Task and an e

Re: StreamTransformation object

2017-10-26 Thread Tony Wei

Hi Andrea, I skimmed that external library [1], and I think the return object of the "select" function could be cast to the `SingleOutputStreamOperator` type [2]. How about trying the following code? DataStream> LCxAccResult = HTM.learn(LCxAccStream, new Harness.AnomalyNetwork()) .select(new

Re: HBase config settings go missing within Yarn.

2017-10-26 Thread Niels Basjes

I have an idea how we can reduce the impact of this class of problem. If we can detect that we are running in a distributed environment then in order to use HBase you MUST have an hbase-site.xml. I'll see if I can make a proof of concept. Niels On Wed, Oct 25, 2017 at 11:27 AM, Till Rohrmann wrote:

Re: StreamTransformation object

2017-10-26 Thread AndreaKinn

Sorry Tony, it is my fault, I was wrong in the first post. Actually now my situation is the following: DataStream> LCxAccResult = HTM.learn(LCxAccStream, new Harness.AnomalyNetwork()) .select(new InferenceSelectFunction>() {...} so actually the return value of "Lear

Re: Use a round-robin kafka partitioner

2017-10-26 Thread kla

Thanks for your comment. If I write the KafkaPartitioner anyway I have to somehow pass the *kafka.producer.Partitioner*, which is not so easy. So I have found the easiest solution for this: you just have to pass a /null/ value: outputStream.addSink(new FlinkKafkaProducer010<>(producerProperties.getProp

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Fabian Hueske

Hi, in a MapReduce context, combiners are used to reduce the amount of data 1) to shuffle and fully sort (to group the data by key) and 2) to reduce the impact of skewed data. The question is, why do you need a combiner in your use case. - To reduce the data to shuffle: You should not use a windo
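
The first purpose mentioned above, reducing the data that has to be shuffled, can be illustrated without any framework. The sketch below is plain Java, not Flink or MapReduce code. `LocalCombinerSketch`, `combine`, and `reduce` are hypothetical names, and the two in-memory lists stand in for two mapper partitions. Each partition is pre-aggregated into one partial count per key before the partials are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Framework-free illustration of a local combiner followed by a final reduce.
class LocalCombinerSketch {
    /** Pre-aggregate one partition: collapse its records into one partial count per key. */
    static Map<String, Integer> combine(List<String> partition) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String key : partition) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }

    /** Final reduce: merge the partial counts received from every partition. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions holding 7 records in total.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "a", "b", "a"),
                Arrays.asList("b", "b", "a"));

        List<Map<String, Integer>> partials = new ArrayList<>();
        int shuffled = 0;
        for (List<String> partition : partitions) {
            Map<String, Integer> partial = combine(partition);
            shuffled += partial.size();  // only the partial counts cross the network
            partials.add(partial);
        }

        System.out.println("records shuffled: " + shuffled + " instead of 7");
        System.out.println("totals: " + reduce(partials));
    }
}
```

In this toy input, 4 partial counts are shuffled instead of 7 raw records, and the final totals are the same as counting the raw records directly. The saving grows with the number of repeated keys per partition.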

Re: State snapshotting when source is finite

2017-10-26 Thread Fabian Hueske

Hi Flavio, Thanks for bringing up this topic. I think running periodic jobs with state that gets restored and persisted in a savepoint is a very valid use case and would fit the "stream is a superset of batch" story quite well. I'm not sure if this behavior is already supported, but I think this would

28 matches
