job.
What I am wondering is essentially whether this means that I am losing data,
or whether this will be retried by the sink? Also, if this means losing the
record, what is the best way to configure the KafkaIO sink to be less
aggressive?
I am still on Beam 2.8 in this pipeline.
Regards,
Vilhelm von Ehrenheim
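[A hedged, untested sketch of one way to make the sink less aggressive: tune the Kafka producer's own retry settings so transient broker errors are retried inside the client before the sink ever sees a failure. The producer keys are standard Kafka client configs; `updateProducerProperties` is the 2.x-era KafkaIO pass-through hook, and the broker/topic names here are invented.]

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

// Standard Kafka producer configs: retry transient send failures in the
// client, with a backoff between attempts.
Map<String, Object> producerConfig = new HashMap<>();
producerConfig.put(ProducerConfig.RETRIES_CONFIG, 5);
producerConfig.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);

records.apply(KafkaIO.<String, String>write()
    .withBootstrapServers("broker:9092")      // hypothetical broker
    .withTopic("my-topic")                    // hypothetical topic
    .withKeySerializer(StringSerializer.class)
    .withValueSerializer(StringSerializer.class)
    .updateProducerProperties(producerConfig));
```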
> particular key and find the last. It is not ideal, either in clarity or
> performance, but it can work for some cases until we have retraction
> support.
>
> Apologies for typos or broken code here, as I am just typing it in email
> without checking its compilation or behavior.
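[The "read all values for a key and take the last" workaround quoted above could look roughly like this; `Latest.perKey()` ships with the Java SDK and picks the value with the latest element timestamp. The `Record` type and variable names are illustrative.]

```java
import org.apache.beam.sdk.transforms.Latest;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Keep only the most recent value per key, judged by element timestamp.
// Later records shadow earlier ones for the same key, which approximates
// retraction until real retraction support exists.
PCollection<KV<String, Record>> latestPerKey =
    keyedRecords.apply(Latest.perKey());
```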
from
the latest trigger firing. This is particularly useful if you use a side
input with a single global window and specify a trigger.
Thanks!
// Vilhelm von Ehrenheim
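[For reference, the single-global-window-plus-trigger side input mentioned above might look like this sketch. The input `counts`, the five-minute refresh, and the sum are all invented for illustration; each trigger firing recomputes the running total that the view exposes.]

```java
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

// A side input in one global window; every firing replaces the view's
// value with the latest accumulated result.
PCollectionView<Long> totalView = counts
    .apply(Window.<Long>into(new GlobalWindows())
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(5))))
        .withAllowedLateness(Duration.ZERO)
        .accumulatingFiredPanes())
    .apply(Combine.globally(Sum.ofLongs()).asSingletonView());
```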
for the `Count.globally()` transform. I can
implement this using my own CombineFn or the `Sum` transform instead, of
course, but I still thought it was a bit strange.
Is this a bug, or why is this happening?
Regards,
Vilhelm von Ehrenheim
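[The CombineFn/Sum workaround mentioned above would be something like this sketch; the input element type `String` is assumed for illustration.]

```java
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Equivalent of Count.globally(): map every element to 1 and sum.
PCollection<Long> count = input
    .apply(MapElements.into(TypeDescriptors.longs())
        .via((String line) -> 1L))
    .apply(Sum.longsGlobally());
```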
uses FileSystems.match() to list
> the files and computes file timestamps and watermark in the way appropriate
> to your use case.
>
> On Wed, Feb 21, 2018 at 7:29 AM Vilhelm von Ehrenheim <
> vonehrenh...@gmail.com> wrote:
>
>> Hi!
>> I have some problems with w
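[For the FileSystems.match() suggestion quoted above, a bare-bones sketch of the listing step; the bucket and glob are hypothetical, and the timestamp/watermark logic would be layered on top in whatever way fits the use case.]

```java
import java.io.IOException;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;

// List files matching a glob via the Beam filesystem abstraction.
// Both match() and metadata() can throw IOException.
MatchResult result = FileSystems.match("gs://my-bucket/input/*.csv");
for (MatchResult.Metadata m : result.metadata()) {
  System.out.println(m.resourceId() + " (" + m.sizeBytes() + " bytes)");
}
```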
in KafkaIO are being updated in
> https://github.com/apache/beam/pull/4680.
> - TextIO.watchForNewFiles() - I am not sure how the watermark is handled
> by TextIO. I didn't notice any mention of it in the implementation.
>
> On Tue, Feb 20, 2018 at 10:13 AM, Vilhelm von Ehrenheim <
>
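[For anyone following along, TextIO.watchForNewFiles() usage looks roughly like this; the poll interval, termination condition, and path are illustrative.]

```java
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Watch;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Continuously poll the glob for new files; stop watching once no new
// output has appeared for an hour.
PCollection<String> lines = pipeline.apply(TextIO.read()
    .from("gs://my-bucket/input/*.csv")
    .watchForNewFiles(
        Duration.standardMinutes(1),
        Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1))));
```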
PM, Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:
> No, the order is not so important as long as it is correct and doesn't emit
> sums for late values.
>
> {"id": "2", "parent_id": "a", "timestamp": 2, "amount"
gt;> {"id": "1", parent_id: "a", "timestamp": 3, "amount": 2}
>>> ```
>>> would the output be 3 and then 5 or would you still want 1, 4, and then
>>> 5?
>>>
>>
>> My own guess here would
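[To make the accumulation question above concrete: with `accumulatingFiredPanes()` each trigger firing re-emits the updated running sum, while `discardingFiredPanes()` emits only each pane's increment. A sketch, with the window and trigger choices invented here:]

```java
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Each firing re-emits the running sum for the key so far (accumulating),
// rather than just the delta since the last firing (discarding).
PCollection<KV<String, Long>> sums = amounts
    .apply(Window.<KV<String, Long>>into(
            FixedWindows.of(Duration.standardMinutes(10)))
        .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
        .withAllowedLateness(Duration.standardHours(1))
        .accumulatingFiredPanes())
    .apply(Sum.longsPerKey());
```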
I'm super excited about this release! Great work, everyone involved!
On Mon, Dec 4, 2017 at 10:58 AM, Jean-Baptiste Onofré
wrote:
> Just an important note that we forgot to mention.
>
> !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7
> !!
>
> Starting
I am interested! I need to do this soon but haven't started yet. Would love
to see an example of how you do it. :)
Br,
Vilhelm von Ehrenheim
On 20 Oct 2017 12:40, "Csaba Kassai" <csaba.kas...@doctusoft.com> wrote:
> Hi Jacob,
>
> we are going in the opposite direction
+1
On Tue, Oct 17, 2017 at 8:22 PM, Thomas Groh wrote:
> I'm pretty strongly in favor of phasing out Java7 support, especially
> given that it was EoL'd more than two years ago. However, I'm not sure how
> this interacts with the repository's backwards-compatibility guarantees
Hi Steve!
I have several pipelines that successfully use both numpy and scikit models
without any problems. I don't think I use Pandas atm but I'm sure that is
fine too.
However, you might have to do some special stuff if you encounter
serializability problems. I also have tensorflow models in
interesting to know what the
best approach would be.
Thanks!
Vilhelm von Ehrenheim
Thank you so much, and sorry for my confusion!
Still, it's a bit interesting that I didn't need to do this explicitly when
running in Dataflow, but it was needed in the testing case.
Br,
Vilhelm
On Wed, Sep 20, 2017 at 11:50 PM, Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:
> I don't hav
different encodings or processing, it gets
> harder to reason about the coder decoding the original inputs with full
> fidelity - splitting these into multiple PCollections is preferable if they
> need to be processed as their original type.
>
> On Wed, Sep 20, 2017 at 2:24 PM, Vilhelm v
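[The "split into multiple PCollections" advice quoted above is typically done with a multi-output ParDo and TupleTags; a sketch, where the record types and the `isOrder`/`parseOrder`/`parseRefund` helpers are all invented for illustration.]

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Route raw records to one of two typed outputs instead of forcing both
// shapes through a single coder.
final TupleTag<OrderRecord> orders = new TupleTag<OrderRecord>() {};
final TupleTag<RefundRecord> refunds = new TupleTag<RefundRecord>() {};

PCollectionTuple split = rawRecords.apply(ParDo
    .of(new DoFn<String, OrderRecord>() {
      @ProcessElement
      public void process(ProcessContext c) {
        if (isOrder(c.element())) {              // hypothetical predicate
          c.output(parseOrder(c.element()));     // main output
        } else {
          c.output(refunds, parseRefund(c.element()));  // side output
        }
      }
    })
    .withOutputTags(orders, TupleTagList.of(refunds)));

PCollection<OrderRecord> orderRecords = split.get(orders);
PCollection<RefundRecord> refundRecords = split.get(refunds);
```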
>>
>> List records = … // create test records
>>
>> PCollection inputRecords = pipeline.apply(Create.of(records));
>>
>> PCollection output =
>> inputRecords.apply(ParDo.of(myTagByConsumptionTypeFunc));
>>
>> PAssert.that(output).satisfies(new TestSatisfies
ice/dataflow-service-desc#autoscaling
Does anyone know if this is true? I know this is not the forum for Dataflow
questions in general, but I thought someone else here might have experience
that supports or contradicts this.
Thanks,
Vilhelm von Ehrenheim
as you add to the set of
previous records. However, if the index of previous records can fit into
memory on the nodes, I would recommend using a side input instead and doing
the check against it in the DoFn. That should both be fast and work well in
streaming.
Hope it helps.
Br,
Vilhelm von Ehrenheim
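[A hedged sketch of that side-input check: the index of previously seen ids is broadcast as a map-shaped side input and probed per element. The `Record` type, its `getId()`, and `previousIds` (a `PCollection<KV<String, Boolean>>`) are invented for illustration.]

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Materialize previously seen ids as an in-memory map on each worker.
PCollectionView<Map<String, Boolean>> seenIds =
    previousIds.apply(View.asMap());

PCollection<Record> fresh = incoming.apply(ParDo
    .of(new DoFn<Record, Record>() {
      @ProcessElement
      public void process(ProcessContext c) {
        if (!c.sideInput(seenIds).containsKey(c.element().getId())) {
          c.output(c.element());  // not seen before: keep it
        }
      }
    })
    .withSideInputs(seenIds));
```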
can loop over in a nice way? If I read the whole set I'll
most likely run out of memory.
I've found that stateful processing exists in the Java SDK, but it
seems to be missing in Python still.
Any help/ideas are greatly appreciated.
Thanks,
Vilhelm von Ehrenheim
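[For comparison, the Java-side stateful API mentioned above looks roughly like this: per-key state lets you deduplicate without ever materializing the whole set on one worker. The key/value types are illustrative.]

```java
import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Remember per key whether we have seen it; emit only first occurrences.
class DedupFn extends DoFn<KV<String, String>, String> {
  @StateId("seen")
  private final StateSpec<ValueState<Boolean>> seenSpec =
      StateSpecs.value(BooleanCoder.of());

  @ProcessElement
  public void process(
      ProcessContext c, @StateId("seen") ValueState<Boolean> seen) {
    if (seen.read() == null) {   // first time this key appears
      seen.write(true);
      c.output(c.element().getValue());
    }
  }
}
```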
>> loads and caches the ML model from a side input or returns the singleton if
>> it has been loaded.
>> You'll want to use some form of locking to ensure that you really only
>> load the ML model once.
>>
>> On Wed, May 24, 2017 at 6:18 AM, Vilhelm von Ehrenheim <
>>
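[The load-once pattern described above, sketched with a hypothetical `Model` type; the `synchronized` accessor is the "form of locking" mentioned, and `deserialize`/`predict` are invented method names.]

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.PCollectionView;

// Cache the deserialized model once per JVM, loading it lazily from the
// side input the first time any bundle needs it.
class PredictFn extends DoFn<String, String> {
  private final PCollectionView<byte[]> modelBytesView;
  private static Model model;  // Model is a hypothetical type

  PredictFn(PCollectionView<byte[]> modelBytesView) {
    this.modelBytesView = modelBytesView;
  }

  private static synchronized Model getOrLoad(byte[] bytes) {
    if (model == null) {
      model = Model.deserialize(bytes);  // hypothetical loader
    }
    return model;
  }

  @ProcessElement
  public void process(ProcessContext c) {
    Model m = getOrLoad(c.sideInput(modelBytesView));
    c.output(m.predict(c.element()));    // hypothetical inference call
  }
}
```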
,
Vilhelm von Ehrenheim
On Wed, May 24, 2017 at 8:37 AM, Prabeesh K. <prabsma...@gmail.com> wrote:
>
> How can we take sample data in Python?
>