?
Because I think you will need to do the former.
On Mon, Jun 10, 2019 at 8:41 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hm, that would probably work, thanks!
But, should the timers behave like that? I'm trying to fix tris by
introducing a sequence of watermarks
), which results in state being cleared
before it gets flushed, which means data loss.
Jan
On 6/10/19 5:08 PM, Reuven Lax wrote:
Do you mean for a single key or across keys?
On Mon, Jun 10, 2019, 5:11 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi,
I have come across
Hi,
I have come across issue [1], where I'm not sure how to solve this in
most elegant way.
Any suggestions?
Thanks,
Jan
[1] https://issues.apache.org/jira/browse/BEAM-7520
the class in other places.
Cheers
Reza
On Fri, 7 Jun 2019 at 14:50, Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi Reza, interesting suggestions, thanks.
When you mentioned join, I recalled an older issue (which
apparently was not yet transfered to Beam's JIRA) [1].
Hi,
that sounds interesting, but it seems to be computationally intensive
and might not be well scalable, if I understand it correctly. It looks
like it needs a transitive closure, am I right?
Jan
On 6/7/19 11:17 AM, i.am.moai wrote:
Hello everyone, nice to meet you
I am Naoki Hyu(日宇尚記).
, but that is a compromise.
/4) more tests (for batch and validatesRunner) are needed/
I just posted a question on the best way to make use of
the @ValidateRunner annotation on this list, sounds like it might be
useful to you as well :-)
On Thu, 6 Jun 2019 at 23:03, Jan Lukavský <mailto
be (relatively) cheaply done
in streaming and batch, it's done in very different ways, and also
that it's hard to do via composition).
On Thu, May 23, 2019 at 4:10 PM Jan Lukavský wrote:
Hi,
I have written a very brief draft of how it might be possible to
implement @RequireTimeSortedInput discussed in [1
parallelism in
streaming sources and why batch sources are generally better parallelised.
Jan
On 5/30/19 1:35 PM, Reuven Lax wrote:
Files can grow (depending on the filesystem), and tailing growing
files is a valid use case.
On Wed, May 29, 2019 at 3:23 PM Jan Lukavský <mailto:je...@seznam
ime, which makes them useless for comparison.
On 5/29/19 2:44 PM, Robert Bradshaw wrote:
On Tue, May 28, 2019 at 12:18 PM Jan Lukavský wrote:
As I understood it, Kenn was supporting the idea that sequence metadata
is preferable over FIFO. I was trying to point out, that it even should
provide the s
eam provides this). It also gets
awkward with Flatten - the sequence number is no longer enough, you
must also encode which side of the flatten each element came from.
On Tue, May 28, 2019 at 3:18 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
As I understood it, Kenn was support
to sort by.
I still think your proposal is very useful by the way. I'm merely pointing out
that to solve the state-machine problem we probably need something more.
Reuven
On Thu, May 23, 2019 at 9:50 AM Jan Lukavský wrote:
Hi,
yes. It seems that ordering by user supplied UDF makes sens
be (relatively) cheaply done
in streaming and batch, it's done in very different ways, and also
that it's hard to do via composition).
On Thu, May 23, 2019 at 4:10 PM Jan Lukavský wrote:
Hi,
I have written a very brief draft of how it might be possible to
implement @RequireTimeSortedInput discussed
Hi, absolutely +1 to add this to the model, but does this imply that
MapState can be dropped (or backed by this)? It can have different insert or
delete time complexity (O(1)) instead of O(logn).
Jan
-- Původní e-mail --
Od: Aljoscha Krettek
Komu: dev@beam.apache.org
Datum: 24.
chines you usually need some sort of FIFO ordering along with an
ordered sources, such as Kafka, not timestamp ordering.
>
> Reuven
>
> On Thu, May 23, 2019 at 12:32 AM Jan Lukavský mailto:je...@seznam.cz)> wrote:
>>
>> Hi all,
>>
>> thanks everyone for this discussion.
rdering. Especially given Beam's
decision to have milliseconds timestamps this is possible, but even at
microsecond or nanosecond precision this can happen at scale. To handle
state machines you usually need some sort of FIFO ordering along with an
ordered sources, such as Kafka, not timestamp ordering
Hi,
I have written a very brief draft of how it might be possible to
implement @RequireTimeSortedInput discussed in [1]. I see the document
[2] a starting point for a discussion. There are several open questions,
which I believe can be resolved by this great community. :-)
Jan
[1]
(and checkpoint) processed data and only emit it
once a Flink checkpoint has completed.
Cheers,
Max
On 21.05.19 16:49, Jan Lukavský wrote:
Hi,
> Actually, I think it is a larger (open) question whether exactly
once is guaranteed by the model or whether runners are allowed to
relax that. I would th
Looks like something is wrong with triggers on github side. Triggers are not
triggered on other projects, too.
Jan
-- Původní e-mail --
Od: Valentyn Tymofieiev
Komu: dev
Datum: 22. 5. 2019 21:23:19
Předmět: Jenkins not triggering test runs? Be careful with merges.
"
Is
order to
optimize. (That and, in practice, our annotation-style process methods
don't lend themselves to easy composition.) I think it could work in
specific cases though.
More inline below.
On Tue, May 21, 2019 at 11:38 AM Jan Lukavský wrote:
Hi Robert,
> Beam has an exactly-once model. If
tually need sorting.
Kenn
On Mon, May 20, 2019 at 2:59 PM Jan Lukavský mailto:je...@seznam.cz>> wrote:
Yes, the problem will arise probably mostly when you have
not well distributed keys (or too few keys). I'm really
not sure if a pure
M, Robert Bradshaw wrote:
On Mon, May 20, 2019 at 5:24 PM Jan Lukavský wrote:
> I don't see batch vs. streaming as part of the model. One can have
microbatch, or even a runner that alternates between different modes.
Although I understand motivation of this statement, this project name is
need sorting.
Kenn
On Mon, May 20, 2019 at 2:59 PM Jan Lukavský mailto:je...@seznam.cz>> wrote:
Yes, the problem will arise probably mostly when you have not
well distributed keys (or too few keys). I'm really not sure
if a pure GBK with a trigger c
is, only that
it meets the spec. And other cases like user funnels are monotonic
enough that you also don't actually need sorting.
Kenn
On Mon, May 20, 2019 at 2:59 PM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Yes, the problem will arise probably mostly when you have not w
/trigger that is missing that you feel would remove the need for
you to use StatefulParDo?
On Mon, May 20, 2019 at 12:54 PM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi Lukasz,
> Today, if you must have a strict order, you must guarantee that
your StatefulParD
cannot first read everything
from distributed storage and then buffer it all into memory, just to
read it again, but sorted. That will not work. And even if it would, it
would be a terrible waste of resources.
Jan
On 5/20/19 8:39 PM, Lukasz Cwik wrote:
On Mon, May 20, 2019 at 8:24 AM Jan Lu
tch case (and do it for *all* batch stateful pardos
without annotation), or introduce annotation, but then make the same
guarantees for streaming case as well.
Jan
On 5/20/19 4:41 PM, Robert Bradshaw wrote:
On Mon, May 20, 2019 at 1:19 PM Jan Lukavský wrote:
Hi Robert,
yes, I think you rephrased my po
On 5/20/19 1:39 PM, Reuven Lax wrote:
On Mon, May 20, 2019 at 4:19 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi Robert,
yes, I think you rephrased my point - although no *explicit*
guarantees
of ordering are given in either mode, there is *implicit*
2:41 PM, Robert Bradshaw wrote:
On Fri, May 17, 2019 at 4:48 PM Jan Lukavský wrote:
Hi Reuven,
How so? AFAIK stateful DoFns work just fine in batch runners.
Stateful ParDo works in batch as far, as the logic inside the state works for absolutely unbounded
out-of-orderness of elements. That ba
a/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java#L64
On Thu, May 16, 2019 at 5:09 PM Jan Lukavský mailto:je...@seznam.cz>> wrote:
Hi Max,
answers inline.
-- Původní e-mail --
Od: Maximilian Michels mailto:m...@apac
p in the future (but you sometimes might)
>
> Those seem quite “poor”, but I think you can’t get better guarantees for
general cases for the reasons mentioned above. Also, this is just of the top
of my head and I might be wrong in my understanding of the Beam model. :-O
>
> Best,
>
p
of my head and I might be wrong in my understanding of the Beam model. :-O
>
> Best,
> Aljoscha
>
>> On 16. May 2019, at 13:53, Jan Lukavský wrote:
>>
>> Hi,
>>
>> this is starting to be really exciting. It seems to me that there is
either some
of the Beam model. :-O
Best,
Aljoscha
On 16. May 2019, at 13:53, Jan Lukavský wrote:
Hi,
this is starting to be really exciting. It seems to me that there is either something
wrong with my definition of "Unified model" or with how it is implemented
inside (at least) Direct and Fli
ially global sorting.
If you want to have sorting by key, you could do a GroupByKey and then sort the
groups in memory. This only works, of course, if your groups are not too large.
On 15. May 2019, at 21:01, Jan Lukavský wrote:
Hmmm, looking into the code of FlinkRunner (and also by ob
places where this might matter (which therefore enables
some optimizations). The order would be relevant only in the stateful
ParDo, I'd say.
Jan
On 5/15/19 8:34 PM, Jan Lukavský wrote:
Just to clarify, I understand, that changing semantics of the
PCollection.isBounded, is probably impossible
such a need.
Jan
On 5/15/19 5:20 PM, Jan Lukavský wrote:
Hi,
I have come across unexpected (at least for me) behavior of some
apparent inconsistency of how a PCollection is processed in
DirectRunner and what PCollection.isBounded signals. Let me explain:
- I have a stateful ParDo, which needs
Hi,
I have come across unexpected (at least for me) behavior of some
apparent inconsistency of how a PCollection is processed in DirectRunner
and what PCollection.isBounded signals. Let me explain:
- I have a stateful ParDo, which needs to make sure that elements
arrive in order - it
as a first pass for understanding the implications.
On Fri, May 10, 2019 at 9:31 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hm, yes, the fix might be also in fixing hashCode and equals of
SimpleStateTag, so that it doesn't hash and compare the StateSpec,
but only the St
id and should only be using that id for comparisons/lookups.
On Fri, May 10, 2019 at 1:07 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:
I'm not sure. Generally it affects any runner that uses
HashMap to store StateSpec.
Jan
On 5/9/19 6:32 PM, Reuv
I'm not sure. Generally it affects any runner that uses HashMap to store
StateSpec.
Jan
On 5/9/19 6:32 PM, Reuven Lax wrote:
Is this specific to the DirectRunner, or does it affect other runners?
On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
but wasn't able to figure it out
yet:
https://lists.apache.org/thread.html/dae8b605a218532c085a0eea4e71338eae51922c26820f37b24875c0@%3Cdev.beam.apache.org%3E
Regards,
Anton
*From: *Jan Lukavský mailto:je...@seznam.cz>>
*Date: *Thu, May 9, 2019 at 8:13 AM
*To: * mailto:dev@beam.apac
, 2019 at 7:42 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi,
I have spent several hour digging into strange issue with
DirectRunner,
that manifested as non-deterministic run of pipeline. The pipeline
contains basically only single stateful ParDo, which adds
Hi,
I have spent several hour digging into strange issue with DirectRunner,
that manifested as non-deterministic run of pipeline. The pipeline
contains basically only single stateful ParDo, which adds elements into
state and after some timeout flushes these elements into output. The
issues
Hi,
I have come across an issue using PubsubIO with flink runner. The
problem is described at [1]. I also created PR for this: [2], but there
are some doubts described in comment in the JIRA issue. Would someone
have time to walk through it and/or provide some insights?
Thanks,
Jan
[1]
Hi,
I'm working on a project for which I need to supply URL from where
runner should load classes (at least when it cannot find them locally).
I'd say, that most runners already use URLClassLoader, so that should be
definitely possible, but there seems to be no API for this (definitely
not
a string). To avoid that overhead, I would imagine that SDKs should
keep SQL queries and wait for a later but shared processing (I don't
know if Portability should handle SQL or if it could).
-Rui
On Tue, Dec 4, 2018 at 2:04 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
win, but
don't build any policy here or do big refactors right now.
Kenn
On Mon, Dec 3, 2018 at 9:31 AM Jan Lukavský <mailto:je...@seznam.cz>> wrote:
Hi Robert,
currently there is no actual proposal, I was just trying to gather
feedback from the community. But my
of abstraction. This
makes sense to me.
(2) Nesting the directories (but not the gradle targets or module
names?). Here I'm not so sure about the benefit, especially vs. the
cost.
On Sat, Dec 1, 2018 at 8:38 AM Jan Lukavský wrote:
I think that the fact that SQL uses some other internal dependency
should
relationship
does not prohibit the layered approach to implementation that sounds
like it makes sense.
(As for merging Euphoria into core, my initial impression is that's
probably a good idea, and something we should consider for 3.0 at the
very least.)
On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský wrote
'll be probably adding sketching
extension dependency soon.
>
> D.
>
> On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský mailto:je...@seznam.cz)> wrote:
>>
>> Hi Anton,
>> reactions inline.
>>
>> -- Původní e-mail --
>> Od: Anton Kedi
ll be probably adding sketching
extension dependency soon.
>
> D.
>
> On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský wrote:
>>
>> Hi Anton,
>> reactions inline.
>>
>> -- Původní e-mail --
>> Od: Anton Kedin
>> Komu: dev@beam.apach
ationRel.java#L179)
On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský mailto:je...@seznam.cz)> wrote:
"Hi community,
I'm part of Euphoria DSL team, and on behalf of this team, I'd like to
discuss possible development of Java based DSLs currently present in
Beam. In my know
Hi community,
I'm part of Euphoria DSL team, and on behalf of this team, I'd like to
discuss possible development of Java based DSLs currently present in
Beam. In my knowledge, there are currently two DSLs based on Java SDK -
Euphoria and SQL. These DSLs currently share only the SDK itself,
Hi all,
the are actually some steps taken in this direction - a few emails
already went to this channel about donation of Euphoria API
(https://github.com/seznam/euphoria) to Apache Beam. SGA has already
been signed, currently there is work in progress for porting all
Euphoria's features to
, Euphoria seems like a programming
model/SDK "close" to Beam more than a DSL on top of an existing Beam SDK.
Am I wrong ?
Regards
JB
On 12/18/2017 03:44 PM, Jan Lukavský wrote:
Hi Ismael,
basically we adopted the Beam's design regarding partitioning
(https://github.com/seznam
Hi Ismael,
basically we adopted the Beam's design regarding partitioning
(https://github.com/seznam/euphoria/issues/160) and implemented the
sorting manually (https://github.com/seznam/euphoria/issues/158). I'm
not aware of the time model differences (Euphoria supports ingestion and
event
501 - 555 of 555 matches
Mail list logo