Hi Sung
How about using FileProcessingMode.PROCESS_CONTINUOUSLY [1] as the watch type
when reading data from HDFS? FileProcessingMode.PROCESS_CONTINUOUSLY
periodically monitors the source, while the default FileProcessingMode.PROCESS_ONCE
processes the data only once and then exits.
[1]
https://ci.
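A hedged sketch of what that could look like (the HDFS path and monitoring interval are hypothetical; this assumes the readFile variant that takes a FileProcessingMode and an interval in milliseconds):

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousFileRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        String path = "hdfs:///data/input"; // hypothetical path

        // Re-scan the directory every 10 seconds instead of reading once.
        DataStream<String> lines = env.readFile(
            new TextInputFormat(new Path(path)),
            path,
            FileProcessingMode.PROCESS_CONTINUOUSLY,
            10_000L); // monitoring interval in milliseconds

        lines.print();
        env.execute("continuous-file-read");
    }
}
```

With PROCESS_CONTINUOUSLY the source stays RUNNING and re-reads files on modification, which also keeps checkpointing alive; with PROCESS_ONCE it finishes after one pass.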
Hello,
I work on joining two streams, one is from Kafka and another is from a file
(small size).
Stream processing works well, but checkpointing fails with the following
message.
The file has fewer than 100 lines, and the file-reading part of the pipeline
reaches the "FINISHED" state as soon as
Ah, sorry for misunderstanding.
So what you are asking is why we need "--classpath"? I'm not sure what
the original author thought of it. I guess the reasons listed below might have
been considered.
1. Avoid duplicate deployment. If some common jars are deployed in advance
to each node of the cluster, the jobs dep
Hi Biao,
I am aware of it - that's not my question.
On Mon, Jun 17, 2019 at 7:42 PM Biao Liu wrote:
> Hi Abdul, "--classpath " can be used for classes that are not included in
> the user jar. If all your classes are included in the jar passed to Flink, you
> don't need "--classpath".
>
> Abdul Qadee
Hi Abdul, "--classpath " can be used for classes that are not included in
the user jar. If all your classes are included in the jar passed to Flink, you
don't need "--classpath".
Abdul Qadeer wrote on Tue, Jun 18, 2019 at 3:08 AM:
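For illustration, a hedged example of attaching an external jar with the CLI (the paths and the entry class name are hypothetical):

```shell
# The URL given to --classpath must be reachable from the JobManager
# and every TaskManager, e.g. a shared mount or an HTTP server.
flink run \
  --classpath file:///opt/flink/shared/common-utils.jar \
  --class com.example.MyJob \
  ./my-job.jar
```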
> Hi!
>
> I was going through submission of a Flink program through CLI. I see that
>
Hi!
I was going through submission of a Flink program through CLI. I see that
"--classpath " needs to be accessible from all nodes in the cluster as
per documentation. As I understand the jar files are already part of the
blob uploaded to JobManager from the CLI. The TaskManagers can download
this
Hi Vijay,
When using windows, you can use 'trigger()' to set a custom Trigger, which
would fire your *ProcessWindowFunction* accordingly.
In your case, you would probably use:
> *.trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))*
>
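A hedged sketch of how that could be wired up, assuming processing time; MyEvent, its fields, and the summation are hypothetical placeholders for your own types and logic:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class EarlyFiringWindowJob {

    // Hypothetical event type.
    public static class MyEvent {
        public String key;
        public long value;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<MyEvent> events = env.fromElements(new MyEvent());

        events
            .keyBy(e -> e.key)
            // One 4-hour window per key...
            .window(TumblingProcessingTimeWindows.of(Time.hours(4)))
            // ...but fire (emit current results) every 5 minutes.
            .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))
            .process(new ProcessWindowFunction<MyEvent, Long, String, TimeWindow>() {
                @Override
                public void process(String key, Context ctx,
                                    Iterable<MyEvent> elements,
                                    Collector<Long> out) {
                    long sum = 0;
                    for (MyEvent e : elements) {
                        sum += e.value;
                    }
                    out.collect(sum); // running sum so far in this window
                }
            })
            .print();

        env.execute("early-firing-window");
    }
}
```

Each 5-minute firing emits the aggregate accumulated so far; the window state is kept until the 4-hour window actually ends.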
Thanks,
Rafi
On Mon, Jun 17, 2019 at 9:01 PM
I am also implementing the ProcessWindowFunction and accessing the
windowState to get data, but how do I push data out every 5 mins during a
4-hour time window? I am adding a globalState to handle the 4-hour window.
Or should I still use the context.windowState even for the 4-hour window?
public c
Hi All,
I am trying a simple example which looks like this:
StreamExecutionEnvironment see =
StreamExecutionEnvironment.createLocalEnvironment();
PojoCsvInputFormat pcif = new PojoCsvInputFormat<>(inPath,
(PojoTypeInfo) TypeExtractor.createTypeInfo(TestPojo.class));
DataStreamSource stream = see.re
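For reference, a hedged, completed version of that sketch (TestPojo, its fields, and the input path are hypothetical; this assumes PojoCsvInputFormat's (Path, PojoTypeInfo) constructor):

```java
import org.apache.flink.api.java.io.PojoCsvInputFormat;
import org.apache.flink.api.java.typeutils.PojoTypeInfo;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PojoCsvJob {

    // Hypothetical POJO matching the CSV columns.
    public static class TestPojo {
        public long id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment see =
            StreamExecutionEnvironment.createLocalEnvironment();

        Path inPath = new Path("file:///tmp/test.csv"); // hypothetical path

        PojoCsvInputFormat<TestPojo> pcif = new PojoCsvInputFormat<>(inPath,
            (PojoTypeInfo<TestPojo>) TypeExtractor.createTypeInfo(TestPojo.class));

        DataStreamSource<TestPojo> stream = see.createInput(pcif);
        stream.print();
        see.execute("pojo-csv-read");
    }
}
```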
Hi all,
The scheduled meetup is only about a week away. Please note that RSVP at
meetup.com is required. In order for us to get the actual headcount to
prepare for the event, please sign up as soon as possible if you plan to
join. Thank you very much for your cooperation.
Regards,
Xuefu
On Thu,
Thanks, Fabian.
I got around the issue by moving the logic for the
DropwizardHistogramWrapper (a non-serializable class) into the
ProcessWindowFunction's open() method.
On Fri, Jun 7, 2019 at 12:33 AM Fabian Hueske wrote:
> Hi,
>
> There are two ways:
>
> 1. make the non-serializable member va
Hi,
I need to calculate a 4-hour time window for count and sum, with the current
calculated results being output every 5 mins.
How do I do that?
Currently, I calculate results for 5-sec and 5-min time windows fine on the
KeyedStream.
Time timeWindow = getTimeWindowFromInterval(interval);//eg: timeWindow =
Hi Mans,
I am not sure if you intended to name them like this, but if you were to
access them you need to escape them like `EXPR$0` [1].
Also, I think Flink defaults unnamed fields in a row to `f0`, `f1`, ... [2],
so you might be able to access them like that.
--
Rong
[1] https://calcite.apache.o
Well, some reasons: machine reboots/maintenance, etc. Host/VM crashes and
restarts. And the same goes for the JobManager. I don't want/need to have to
document/remember some start process for sysadmins/devops.
So far I have looked at ./start-cluster.sh and all it seems to do is SSH
into all the spec
Hi,
I used this example of KeyedProcessFunction from the Flink website [1] and
I have implemented my own KeyedProcessFunction to process some
approximation counting [2]. This worked very well. Then I switched the data
source to consume strings from Twitter [3]. The data source is consuming
the str
Hi John,
I don't have much experience with setting Flink up via systemd services. Why
do you want to do it like that?
1. In standalone mode, Flink won't automatically restart TaskManagers. This
only works on Yarn and Mesos atm.
2. In case of a lost TaskManager, you should run `taskmanager.sh start`.
Hi,
Those are good questions.
> A datastream to connect to a table is available? I
What table, what database system do you mean? You can check the list of
existing connectors provided by Flink in the documentation. About reading from a
relational DB (for example via JDBC), you can read a little
Hi,
We have a Hadoop/YARN Cluster with Kafka and Flink/YARN running on the
same machines.
In Spark (Streaming), there is a PreferBrokers location strategy, so that
the executors consume those Kafka partitions which are served by the
Kafka broker on the same machine. (
https://spark.apache.org/
Hi,
I don't know what the reason is (there are also no open issues with this test
in our Jira). This test seems to be working on Travis CI and it works for me
when I've just tried running it locally. There might be some bug in the
test/production code that is triggered only in some specific condition
Hi,
As far as I know, this is currently impossible.
You could work around this by implementing your own custom post-processing
operator/flatMap function that would:
- track the output of the window operator
- register a processing-time timer with some desired timeout
- every time the process
Hi,
If you use RocksDBStateBackend, one state per member will give better
performance, because RocksDBStateBackend needs to de/serialize the
key/value on every put/get; with one POJO value, you have to de/serialize the
whole POJO value on each put/get.
Best,
Congxian
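A hedged sketch contrasting the two approaches inside a KeyedProcessFunction (MyPojo and Foo are hypothetical placeholders; with RocksDB, option B avoids rewriting the whole POJO when only one field changes):

```java
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class PerFieldStateFn
        extends KeyedProcessFunction<String, PerFieldStateFn.MyPojo, Void> {

    // Hypothetical types for illustration.
    public static class Foo { }
    public static class MyPojo {
        public long id;
        public List<Foo> foos;
    }

    // Option A: one POJO-valued state — RocksDB de/serializes the
    // whole object on every access.
    private transient ValueState<MyPojo> pojoState;

    // Option B: one state per member — only the touched field is
    // de/serialized, which is usually cheaper with RocksDB.
    private transient ValueState<Long> idState;
    private transient ListState<Foo> foosState;

    @Override
    public void open(Configuration parameters) {
        pojoState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pojo", MyPojo.class));
        idState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("id", Long.class));
        foosState = getRuntimeContext().getListState(
            new ListStateDescriptor<>("foos", Foo.class));
    }

    @Override
    public void processElement(MyPojo value, Context ctx, Collector<Void> out)
            throws Exception {
        // Updating only the id touches a single small value (option B)
        // instead of rewriting the whole POJO (option A).
        idState.update(value.id);
        foosState.update(value.foos);
    }
}
```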
Timothy Victor wrote on Mon, Jun 17, 2019 at 8:0
I would choose encapsulation if the fields are indeed related and make
sense for your model. In general, I feel it is not a good idea to let
Flink's (or any other framework's) internal mechanics dictate your data model.
Tim
On Mon, Jun 17, 2019, 4:59 AM Frank Wilson wrote:
> Hi,
>
> Is it be
Hi,
Thanks for reporting the issue. I think this might be caused by
System.currentTimeMillis() not being monotonic [1] and the fact that Flink
accesses this function multiple times per element (at least twice: first for
creating a window, second to perform the check that has failed in your case).
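To see why this matters: System.currentTimeMillis() can jump backwards (e.g. after NTP adjustments), so two calls made for the same element may disagree. A minimal, Flink-independent sketch of a clamped clock whose successive reads never decrease (class and method names are made up):

```java
// A wall clock that is clamped so successive reads never decrease,
// even if System.currentTimeMillis() jumps backwards between calls.
public class MonotonicMillis {
    private long last = Long.MIN_VALUE;

    public synchronized long current() {
        long now = System.currentTimeMillis();
        if (now < last) {
            now = last; // clamp instead of going backwards
        }
        last = now;
        return now;
    }
}
```

Note that System.nanoTime() is monotonic by contract, but it measures elapsed time, not wall-clock time, so it cannot directly replace currentTimeMillis() for timestamping.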
Hi all,
I'm experiencing some unexpected behavior using an interval join in Flink.
I'm dealing with two data sets, let's call them X and Y. They are finite
(10k elements), but I interpret them as DataStreams. The data needs to be
joined for enrichment purposes. I use event time and I know (because
For the case of a single iteration of an iterative loop where the feedback type
differs from the input stream type, in what order are events processed in the
forward flow? So, for example, if we have:
* the input stream contains input1 followed by input2
* a ConnectedIterativeStream at th
Hi,
Is it better to have one POJO value state with a collection inside, or an
explicit state declaration for each member? E.g.
MyPojo {
  long id;
  List<Foo> foos;
  // getters / setters omitted
}
Or
Two managed state declarations in my process function (a value for the long
and a list fo
Hi Zili,
thank you for adding these threads :) I would have otherwise picked them up
next week, just couldn't put everything into one email.
Cheers,
Konstantin
On Sun, Jun 16, 2019 at 11:07 PM Zili Chen wrote:
> Hi Konstantin and all,
>
> Thank Konstantin very much for reviving this tradition
Hi mates, I decided to enable persisting the state of our Flink jobs that write
data into HDFS, but ran into some trouble with that.
I'm trying to use StreamingFileSink with Cloudera Hadoop 2.6.5, which
doesn't contain the truncate method.
So the job fails immediately when it's trying to
Hi Congxian,
I don't use Flink at the moment; I am trying to evaluate its suitability
for my company's purposes by re-writing one of our apps with Flink. We have
apps with similar business logic but different code, even though they do
essentially the same thing. I am new to the streaming paradigms an