Hi Everyone,
We have hit a roadblock moving an app to production scale and were hoping to get
some guidance. The application is a pretty common stream processing use case, but
it does require maintaining a large number of keyed states. We are processing 2
streams - one of which is a daily burst of stream
Hello,
If I am on a cluster with 2 task managers with 64 CPUs each, I can configure
128 slots in accordance with the documentation. If I set parallelism to 128
and read a 64 MB file (one datasource with a single file), will Flink really
create 500K slices? Or will it check the default blocksize o
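For intuition, here is a rough sketch of FileInputFormat-style split arithmetic (this is a simplified model, not Flink's exact implementation): the split size is derived from the block size and the requested minimum number of splits, so a small file never produces hundreds of thousands of slices.

```java
public class SplitEstimate {
    // Simplified model of input-split computation (not Flink's exact code):
    // the target split size is capped by the block size, and at least
    // minNumSplits splits are requested, so a 64 MB file with parallelism 128
    // yields on the order of 128 splits, not 500K.
    static long estimateSplits(long fileSizeBytes, long blockSizeBytes, int minNumSplits) {
        long splitSize = Math.min(blockSizeBytes, Math.max(1, fileSizeBytes / minNumSplits));
        long splits = (fileSizeBytes + splitSize - 1) / splitSize; // ceiling division
        return Math.max(minNumSplits, splits);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 64 MB file, 128 MB block size, parallelism 128 as the minimum split count
        System.out.println(estimateSplits(64 * mb, 128 * mb, 128)); // prints 128
    }
}
```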
Hello,
I know this is an older thread, but ...
If some slots are left empty, it doesn't necessarily mean that machine
resources are wasted. Some managed memory might be unavailable, but CPU,
heap memory, network, and disk are shared across slots. To the extent there
are multiple operators executin
Hello,
I am trying to submit multiple jobs to Flink from within my Java program.
I am running into an exception that may be related:
java.io.OptionalDataException.
Should I be able to create multiple plans/jobs within a single session and
execute them concurrently?
If so, is there a working examp
Hi David,
please find my answers below:
1. For high utilization, all slots should be filled. Each slot
processes a slice of the program on a slice of the data. In case of
partitioning or changed parallelism, the data is shuffled accordingly.
2. That's a good question. I think the default log
Hi Navneeth,
configuring user functions using a Configuration object and setting the
parameters in the open() method of a RichFunction is no longer recommended.
In fact, that only works for the DataSet API and has not been added for the
DataStream API. The open() method with the Configuration p
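The alternative usually suggested is to pass parameters through the function's constructor: Flink ships function objects to the task managers via Java serialization, so fields set at job-assembly time travel with the function. Here is a minimal, self-contained sketch of that pattern (`MyFilter` and its `threshold` parameter are made up for illustration; the serialization round-trip stands in for Flink shipping the function to a worker):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConstructorParams {
    // A function-like class that captures its configuration in the constructor.
    static class MyFilter implements Serializable {
        final int threshold; // set once at job-assembly time

        MyFilter(int threshold) { this.threshold = threshold; }

        boolean filter(int value) { return value > threshold; }
    }

    // Simulate shipping the function to a worker: serialize and deserialize it.
    static MyFilter roundTrip(MyFilter f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(f);
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (MyFilter) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        MyFilter shipped = roundTrip(new MyFilter(10));
        // The threshold survived serialization, so no Configuration plumbing is needed.
        System.out.println(shipped.filter(42)); // prints true
    }
}
```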
Hello 김동원,
We are experiencing the same issue you were when trying to use the 1.4
prometheus reporter with 1.3:
[...]
Error while registering metric.
java.lang.IllegalArgumentException: Collector already registered that
provides name: flink_taskmanager_Status_JVM_CPU_Load
[...]
-
The ji
Hi All,
I have developed a streaming pipeline in java and I need to pass some of
the configuration parameters that are passed during program startup to user
functions. I used the below link as reference.
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html
I have t
Thanks for the help! I’ll try out the ProcessFunction then.
Le
> On Oct 26, 2017, at 8:03 AM, Kien Truong wrote:
>
> Hi,
> For Streaming API, use a ProcessFunction as Fabian's suggestion.
> You can pretty much do anything with a ProcessFunction :)
>
> Best regards,
>
> Kien
>
>
> On 10/26
Hi Fabian,
Thank you for the great, detailed answers.
1. So, each parallel slice of the DAG is placed into one slot. The key to
high utilization is many slices of the source data (or the various methods
of repartitioning it). Yes?
2. In batch processing, are slots filled round-robin on task manag
Hi,
For Streaming API, use a ProcessFunction as Fabian's suggestion.
You can pretty much do anything with a ProcessFunction :)
Best regards,
Kien
On 10/26/2017 8:01 PM, Le Xu wrote:
Hi Kien:
Is there a similar API for DataStream as well?
Thanks!
Le
On Oct 26, 2017, at 7:58 AM, Kien Tr
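To make the "you can pretty much do anything with a ProcessFunction" remark concrete, here is a plain-Java toy that mimics the keyed-state part of a keyed ProcessFunction: per-element processing with access to state scoped to the current key. (Flink's real ProcessFunction additionally offers timers and side outputs; the class and method names below are illustrative only.)

```java
import java.util.HashMap;
import java.util.Map;

public class ProcessFunctionSketch {
    // Running count per key, analogous to a ValueState<Long> that a keyed
    // ProcessFunction would read and update inside processElement().
    private final Map<String, Long> keyedState = new HashMap<>();

    // Process one element for the given key and emit the updated count.
    long processElement(String key) {
        long count = keyedState.getOrDefault(key, 0L) + 1;
        keyedState.put(key, count);
        return count;
    }

    public static void main(String[] args) {
        ProcessFunctionSketch fn = new ProcessFunctionSketch();
        for (String k : new String[]{"a", "b", "a"}) {
            System.out.println(k + " -> " + fn.processElement(k));
        }
        // a -> 1, b -> 1, a -> 2: state is independent per key
    }
}
```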
Hi Kien:
Is there a similar API for DataStream as well?
Thanks!
Le
> On Oct 26, 2017, at 7:58 AM, Kien Truong wrote:
>
> Hi,
>
> For batch API, you can use GroupReduceFunction, which give you the same
> benefit as a MapReduce combiner.
> https://ci.apache.org/projects/flink/flink-docs-rele
Hi,
For the batch API, you can use GroupReduceFunction, which gives you the same
benefit as a MapReduce combiner.
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/dataset_transformations.html#combinable-groupreducefunctions
Regards,
Kien
On 10/26/2017 7:37 PM, Le Xu wrote:
Thanks guys! That makes more sense now.
So does it mean that once I start using a window operator, all operations on my
WindowedStream would be global (across all partitions)? In that case,
WindowedStream.aggregate (or sum) would apply to all data after shuffling
instead of to each partition.
If I und
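For what it's worth, a window applied after keyBy() stays per-key (results are computed independently for each key, in parallel); only windowAll() aggregates across all records. The plain-Java toy below illustrates the difference in result shape (it stands in for Flink's WindowedStream vs. AllWindowedStream; the names are illustrative, not Flink API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowScope {
    // Like sum() on a keyed window: one aggregate per key.
    static Map<String, Integer> keyedSum(List<Map.Entry<String, Integer>> window) {
        Map<String, Integer> perKey = new HashMap<>();
        for (Map.Entry<String, Integer> e : window) {
            perKey.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return perKey;
    }

    // Like sum() on windowAll(): a single aggregate over everything
    // (which is why windowAll() runs with parallelism 1 in Flink).
    static int globalSum(List<Map.Entry<String, Integer>> window) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : window) sum += e.getValue();
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> window = List.of(
                Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(keyedSum(window));  // one sum per key: a=4, b=2
        System.out.println(globalSum(window)); // one global sum: 6
    }
}
```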
I think we could try option number one, as it seems to be easier to
implement. Currently I'm cloning the Flink repo to fix this and test that
solution with our currently non-working code. Unfortunately, it takes
forever to download all the dependencies. Anyway, I hope that eventually
will manage t
Hi Andrea,
The `learn` operator is defined in this method [1]. If you need to set its
slot sharing group, you should add `slotSharingGroup(...)` after line 97
[2], or add a new API to get the result from `inferenceStreamBuilder.build()`.
Best Regards,
Tony Wei
[1]
https://github.com/htm-community/fli
Can you be clearer about this part?
I really appreciate your help.
Tony Wei wrote
> you need to refactor `HTMStream` to expose
> `InferenceStreamBuilder.build()`.
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Done: https://issues.apache.org/jira/browse/FLINK-7930
Best,
Flavio
On Thu, Oct 26, 2017 at 10:52 AM, Till Rohrmann
wrote:
> Hi Flavio,
>
> this kind of feature is indeed useful and currently not supported by
> Flink. I think, however, that this feature is a bit trickier to implement,
> because
Hi Andrea,
In this way, you will only set a slot sharing group on the select operator,
and the learn operator will remain in the default group.
If you want to set the learn operator as well, I am afraid that you need to
refactor `HTMStream` to expose `InferenceStreamBuilder.build()`.
Best Regards,
Tony Wei
201
Mmm, looks good. This solution would be great.
In this way, am I setting a slot sharing group for both the learn and select
methods, and not only on select?
I believed I needed to call slotSharingGroup exactly on the return type of
learn.
Hi David,
Flink's DataSet API schedules one slice of a program to a task slot. A
program slice is one parallel instance of each operator of a program.
When all operators of your program run with a parallelism of 1, you end up
with only 1 slice that runs on a single slot.
Flink's DataSet API leverag
Hi Flavio,
this kind of feature is indeed useful and currently not supported by Flink.
I think, however, that this feature is a bit trickier to implement, because
Tasks cannot currently initiate checkpoints/savepoints on their own. This
would entail some changes to the lifecycle of a Task and an e
Hi Andrea,
I had a quick read of that external library [1], and I think the return object
of the "select" function could be cast to the `SingleOutputStreamOperator` type [2].
How about trying the following code?
DataStream>
LCxAccResult = HTM.learn(LCxAccStream, new Harness.AnomalyNetwork())
.select(new
I have an idea how we can reduce the impact of this class of problem.
If we can detect that we are running in a distributed environment then in
order to use HBase you MUST have an hbase-site.xml
I'll see if I can make a proof of concept.
Niels
On Wed, Oct 25, 2017 at 11:27 AM, Till Rohrmann
wrote:
Sorry Tony, it is my fault, I was wrong in the first post. Actually, now my
situation is the following:
DataStream>
LCxAccResult = HTM.learn(LCxAccStream, new Harness.AnomalyNetwork())
.select(new
InferenceSelectFunction>() {...}
so actually the return value of "Lear
Thanks for your comment.
If I write the KafkaPartitioner anyway, I have to somehow pass the
*kafka.producer.Partitioner*, which is not so easy.
So I have found the easiest solution for this: you just have to pass a /null/
value:
outputStream.addSink(new
FlinkKafkaProducer010<>(producerProperties.getProp
Hi,
in a MapReduce context, combiners are used to reduce the amount of data 1)
to shuffle and fully sort (to group the data by key) and 2) to reduce the
impact of skewed data.
The question is: why do you need a combiner in your use case?
- To reduce the data to shuffle: You should not use a windo
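The shuffle-reduction point can be sketched in plain Java (standing in for Flink's GroupCombineFunction; the class and method names are illustrative): each parallel instance pre-aggregates its local records per key before the shuffle, so at most one record per key per instance crosses the network, and a final reduce merges the partial results.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Combine step: local per-key pre-aggregation within one parallel instance.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> partition) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> e : partition) {
            local.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return local; // at most one record per key leaves this instance
    }

    // Reduce step: merge the pre-aggregated partials after the shuffle.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> result.merge(k, v, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> p1 = combine(List.of(Map.entry("a", 1), Map.entry("a", 2)));
        Map<String, Integer> p2 = combine(List.of(Map.entry("a", 3), Map.entry("b", 4)));
        System.out.println(reduce(List.of(p1, p2))); // a=6, b=4
    }
}
```

The same shape also blunts data skew: a hot key is collapsed to one partial per instance before everything for that key lands on a single reducer.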
Hi Flavio,
Thanks for bringing up this topic.
I think running periodic jobs with state that gets restored and persisted
in a savepoint is a very valid use case and would fit the "streams are a
superset of batch" story quite well.
I'm not sure if this behavior is already supported, but think this would