Yes, that does make sense! Thank you for explaining. Have you made the
change yet? I couldn't find it on the master.
On Wed, Nov 18, 2015 at 5:16 PM, Stephan Ewen wrote:
> That makes sense...
>
> On Mon, Oct 26, 2015 at 12:31 PM, Márton Balassi
>
Nevermind,
Looking at the logs I saw that it was having issues trying to connect to ZK.
To make it short, it had the wrong port.
It is now starting.
Tomorrow I’ll try to kill some JobManagers *evil*.
Another question: if I have multiple HA Flink jobs, are there some points to
check in order to
Is there a succinct description of the distinction between these transforms?
Ron
—
Ron Crocker
Principal Engineer & Architect
( ( •)) New Relic
rcroc...@newrelic.com
M: +1 630 363 8835
Hi Ron,
Have you checked:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
?
Fold is like reduce, except that you define a start element (of a different
type than the input type) and the result type is the type of the initial
value. In
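The distinction can be sketched with plain Java streams (a hedged analogy, not the Flink API): two-argument reduce keeps the element type, while the fold-style three-argument reduce lets the identity value fix a different result type.

```java
import java.util.List;

public class FoldVsReduce {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4);

        // reduce: the result type equals the element type
        int sum = nums.stream().reduce(0, Integer::sum);

        // fold-style reduce: the identity value ("") fixes a
        // different result type (String) than the input (Integer)
        String asString = nums.stream()
                .reduce("", (acc, n) -> acc + n, (a, b) -> a + b);

        System.out.println(sum);      // 10
        System.out.println(asString); // 1234
    }
}
```

The same shape carries over: reduce needs an associative function on one type, fold additionally takes an initial value that determines the output type.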
Okay, I think I misunderstood your problem.
Usually you can simply execute tests one after another by waiting until
"env.execute()" returns. The streaming jobs terminate by themselves once
the sources reach the end of the stream (finite streams are supported that
way), but make sure all records flow
Sorry Stephan but I don't follow how global order applies in my case. I'm
merely checking the size of the sink results. I assume all tuples from a
given test invocation have sunk before the next test begins, which is
clearly not the case. Is there a way I can place a barrier in my tests to
ensure
Hi Guido!
If you use Scala, I would use an Option to represent nullable fields. That
is a very explicit solution that marks which fields can be null, and also
forces the program to handle this carefully.
We are looking to add support for Java 8's Optional type as well for
exactly that purpose.
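The idea can be illustrated with Java 8's Optional outside of Flink (the record and field names here are hypothetical, purely for illustration): the type signature itself documents which field may be absent, and the caller is forced to handle both cases.

```java
import java.util.Optional;

public class NullableFields {
    // Hypothetical record: the middle name may be absent,
    // and the Optional type makes that explicit.
    static String fullName(String first, Optional<String> middle, String last) {
        return middle.map(m -> first + " " + m + " " + last)
                     .orElse(first + " " + last);
    }

    public static void main(String[] args) {
        System.out.println(fullName("Ada", Optional.empty(), "Lovelace"));
        System.out.println(fullName("Ada", Optional.of("King"), "Lovelace"));
    }
}
```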
Hi!
If you go with the Batch API, then any failed task (like a sink trying to
insert into the database) will be completely re-executed. That makes sure
no data is lost in any way, no extra effort needed.
It may insert a lot of duplicates, though, if the task is re-started after
half the data was
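A common mitigation for such duplicates (a sketch outside Flink, with a plain map standing in for a keyed database table) is to make the sink idempotent: write by primary key, so a re-executed task overwrites rows instead of appending them.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentSink {
    // Stand-in for a database table keyed by a primary key.
    static final Map<Integer, String> table = new HashMap<>();

    static void upsert(int id, String value) {
        table.put(id, value); // replay overwrites instead of appending
    }

    public static void main(String[] args) {
        // First execution of the sink task
        upsert(1, "a"); upsert(2, "b"); upsert(3, "c");
        // Simulated re-execution after a failure: replaying the same
        // records leaves the table unchanged, with no duplicates.
        upsert(1, "a"); upsert(2, "b"); upsert(3, "c");
        System.out.println(table.size()); // 3
    }
}
```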
That makes sense...
On Mon, Oct 26, 2015 at 12:31 PM, Márton Balassi
wrote:
> Hey Max,
>
> The solution I am proposing is not flushing on every record, but it makes
> sure to forward the flushing from the SinkFunction to the OutputFormat
> whenever it is triggered.
The JobManager does not read all files, but it has to query HDFS for
all file metadata (size, blocks, block locations), which can take a bit.
There is a separate call to the HDFS Namenode for each file. The more
files, the more metadata has to be collected.
On Wed, Nov 18, 2015 at 7:15 PM,
Okay, let me take a step back and make sure I understand this right...
With many small files it takes longer to start the job, correct? How much
time did it actually take and how many files did you have?
On Wed, Nov 18, 2015 at 7:18 PM, Flavio Pompermaier
wrote:
> in my
So why does it take so long to start the job? Because in any case the job
manager has to read all the lines of the input files before generating the
splits?
On 18 Nov 2015 17:52, "Stephan Ewen" wrote:
> Late answer, sorry:
>
> The splits are created in the JobManager, so the sub
in my test I was using the local fs (ext4)
On 18 Nov 2015 19:17, "Stephan Ewen" wrote:
> The JobManager does not read all files, but it has to query HDFS for
> all file metadata (size, blocks, block locations), which can take a bit.
> There is a separate call to the HDFS
Please see the above gist: my test makes no assertions until after the
env.execute() call. Adding setParallelism(1) to my sink appears to
stabilize my test. Indeed, very helpful. Thanks a lot!
-n
On Wed, Nov 18, 2015 at 9:15 AM, Stephan Ewen wrote:
> Okay, I think I
The new KafkaConsumer for Kafka 0.9 should be able to handle this, as the
Kafka Client Code itself has support for this then.
For 0.8.x, we would need to implement support for recovery inside the
consumer ourselves, which is why we decided to initially let the Job
Recovery take care of that.
If
It was long ago... but if I remember correctly they were about 50k.
On 18 Nov 2015 19:22, "Stephan Ewen" wrote:
> Okay, let me take a step back and make sure I understand this right...
>
> With many small files it takes longer to start the job, correct? How much
> time did it
Granted, both are presented with the same example in the docs. They are
modeled after reduce and fold in functional programming. Perhaps we should
have some more enlightening examples.
On Wed, Nov 18, 2015 at 6:39 PM, Fabian Hueske wrote:
> Hi Ron,
>
> Have you checked:
>
Hi,
I'm hitting Compiler Exception with some of my data set, but not all of
them.
Exception in thread "main" org.apache.flink.optimizer.CompilerException: No
plan meeting the requirements could be created @ Bulk Iteration (Bulk
Iteration) (1:null). Most likely reason: Too restrictive plan hints.
Hey Vasia,
I think a very common workload would be an event stream from web servers of
an online shop. Usually, these shops have multiple servers, so events
arrive out of order.
I think there are plenty of different use cases that you can build around
that data:
- Users perform different actions
Hi Konstantin,
you are right, if the stream is keyed by the session-id then it works.
I was referring to the case where you have, for example, some interactions with
timestamps and you want to derive the sessions from this. In that case, it can
happen that events that should belong to one
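The idea of deriving sessions from raw interaction timestamps can be sketched in plain Java (not the Flink API; the gap threshold and method names here are illustrative assumptions): sort by timestamp, then start a new session whenever the gap to the previous event exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Sessionize {
    // Group timestamps into sessions, splitting wherever the gap
    // between consecutive (sorted) events exceeds the threshold.
    static List<List<Long>> sessions(long[] timestamps, long gap) {
        long[] ts = timestamps.clone();
        Arrays.sort(ts);
        List<List<Long>> result = new ArrayList<>();
        for (long t : ts) {
            if (result.isEmpty() || t - lastEvent(result) > gap) {
                result.add(new ArrayList<>(List.of(t))); // new session
            } else {
                result.get(result.size() - 1).add(t);    // extend current
            }
        }
        return result;
    }

    private static long lastEvent(List<List<Long>> result) {
        List<Long> current = result.get(result.size() - 1);
        return current.get(current.size() - 1);
    }

    public static void main(String[] args) {
        // Out-of-order events; sessions split where the gap exceeds 30.
        System.out.println(sessions(new long[]{100, 10, 0, 200}, 30));
        // [[0, 10], [100], [200]]
    }
}
```

Sorting first is what makes the intermingled arrival order irrelevant, which is essentially what keying plus session windows achieves in the streaming setting.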
Hi Aljoscha,
thanks, that's what I thought. Just wanted to verify, that keyBy +
SessionWindow() works with intermingled events.
Cheers,
Konstantin
On 18.11.2015 11:14, Aljoscha Krettek wrote:
> Hi Konstantin,
> you are right, if the stream is keyed by the session-id then it works.
>
> I was
We were also trying to address session windowing but took a slightly different
approach as to which window we place the event into.
We did not want the "triggering event" to be purged as part of the window it
triggered, but instead to create a new window for it and have the old window
fire and
Hi,
I wrote a little example that could be what you are looking for:
https://github.com/dataArtisans/query-window-example
It basically implements a window operator with a modifiable window size that
also allows querying the current accumulated window contents using a second
input stream.
There is no global order in parallel streams; it is something that
applications need to work with. We are thinking about adding operations to
introduce event-time order (at the cost of some delay), but those are only
plans at this point.
What I do in my tests is run the test streams in parallel,
Agree,
and the stateful streaming operator instances in Flink look natural compared
to Apache Spark.
On Thu, Nov 19, 2015 at 10:54 AM, Liang Chen
wrote:
> Two aspects are attracting them:
> 1. Flink is using Java, it is easy for most of them to start Flink, and be
> more