The same holds true in Python: Read the files with TextIO and follow with a
Map operation that splits the lines into records.
This, of course, only works if you don't have newlines within your records.
If you do, you may need to use a DoFn that takes as input each
filename and reads the
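A minimal sketch of the difference, using only the standard library and assuming CSV-style records (the format, the sample data, and the helper names `split_line` and `read_records` are illustrative assumptions, not part of the original message). In a pipeline, `split_line` would be what the Map step applies to each line, and `read_records` would be the body of a DoFn that opens each file itself.

```python
import csv
import io


def split_line(line, delimiter=","):
    """Split one line into fields; valid only if records never contain newlines."""
    return next(csv.reader([line], delimiter=delimiter))


def read_records(file_obj, delimiter=","):
    """Yield records from a whole file, keeping quoted newlines inside a field."""
    # csv.reader consumes the file object itself, so a quoted field may span lines.
    for record in csv.reader(file_obj, delimiter=delimiter):
        if record:  # skip blank lines
            yield record


# Hypothetical sample data: the second record has a newline inside a quoted field.
text = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Splitting line by line cuts that record in two.
per_line = [split_line(line) for line in text.splitlines()]

# Reading the file as a unit keeps the record whole.
per_file = list(read_records(io.StringIO(text, newline="")))
```

The per-line approach parallelizes within a file but cannot see record boundaries that span lines; the per-file approach handles them at the cost of reading each file in a single worker.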
It's "lgajowy". Sorry, I incorrectly assumed it's somehow connected to Jira.
On Fri, Nov 23, 2018 at 6:37 PM Thomas Weise wrote:
> Alexey, you have been added.
>
> Łukasz, I could not find you. Did you create an account? What's the user
> ID?
>
> On Thu, Nov 22, 2018 at 7:47 AM Alexey Romanenko
+1 for introducing the new interface now and deprecating the old one. The
major version change then provides the opportunity to remove deprecated
code.
On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik wrote:
> Before 3.0 we will still want to introduce this giving time for people to
> migrate,
Do you need it to change based on the timestamps of the records being
processed, or based on actual current time?
On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid
wrote:
> Hello,
>
>
>
> I have an interesting issue that I can’t seem to find a reliable
> resolution to.
>
>
>
> I have a standard
Hello,
I have an interesting issue that I can’t seem to find a reliable resolution to.
I have a standard TextIO output that looks like the following:
TextIO.write().to("gs://+ new DateTime().toString("HH-mm-ss") +
"/Test-")
The above works, and writes to GCS, as I expect it to.
However, it
Modifying an existing coder is a non-starter until we have a versioning
story. Creating an entirely new coder should definitely be possible, and
using it either opt-in or, if a good enough case can be made, possibly even
opt-out could get this unblocked.
On Mon, Nov 26, 2018 at 3:05 PM Jeff
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27
Picking up this thread again. Based on the feedback from Kenn, Reuven, and
Romain, it sounds like there's no objection to the idea of SimpleFunction
and SerializableFunction declaring that they throw Exception. So the
discussion at this point is about whether there's an acceptable way to
introduce
Reuven was one of the people I reached out to on this matter and he replied
on this thread.
On Mon, Nov 26, 2018 at 7:07 AM Robert Bradshaw wrote:
> Modifying an existing coder is a non-starter until we have a versioning
> story. Creating an entirely new coder should definitely be possible, and
Thanks Łukasz.
Should the solution be documented (in the Beam testing guide?) so that other
performance tests can support manual triggering without affecting benchmark
results in a similar manner?
- Cham
On Thu, Nov 22, 2018 at 4:03 AM Łukasz Gajowy
wrote:
> Hi all,
>
> BEAM-6011 is now
Lukasz - Were you able to get any more context on the possibility of
versioning coders from other folks at Google?
It sounds like adding versioning for coders and/or schemas is potentially a
large change. At this point, should I just write up some highlights from
this thread in a JIRA issue for
Reuven - How is the work on constructor support for ByteBuddy codegen
going? Does it still look like that's going to be a feasible way forward
for generating schemas/coders for AutoValue classes?
On Thu, Nov 15, 2018 at 4:37 PM Reuven Lax wrote:
> I would hope so if possible.
>
> On Fri, Nov
I'm working on BEAM-6102 and after 12 hours on the issue I have not made
much real progress. I initially suspected it's a shading issue with the
Dataflow worker jar but can't reproduce the issue without running a full
Dataflow pipeline. Any help would be appreciated; context of what I have
tried is
Examples are good for showing users how to use certain concepts but we
should stick with ValidatesRunner tests for ensuring that runners / SDKs
implement concepts correctly. We have several ValidatesRunner side input
tests in ParDoTest.java[1], ViewTest.java[2], and sideinputs_test.py[3]
that
PR for this: https://github.com/apache/beam/pull/7129
On Tue, Oct 16, 2018 at 11:40 AM Robert Bradshaw
wrote:
> Thanks for bringing this to a conclusion.
>
> On Mon, Oct 15, 2018 at 6:18 PM Thomas Weise wrote:
> >
> > Here is my attempt to summarize the discussion, please see the TBDs.
> >
>
Before 3.0 we will still want to introduce this giving time for people to
migrate, would it make sense to do that now and deprecate the alternatives
that it replaces?
On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas wrote:
> Picking up this thread again. Based on the feedback from Kenn, Reuven, and
Thanks Maximilian, let me know if you need any help. Usually I debug this
sort of thing by pausing the IntelliJ debugger to see all the different
threads which are waiting on various conditions. If you find any insights
from that, please post them here and we can try to figure out the source of
On Mon, Nov 26, 2018 at 9:09 AM Ismaël Mejía wrote:
> > Bundle finalization is unrelated to backlogs but is needed since there
> is a class of data stores which need acknowledgement that says I have
> successfully received your data and am now responsible for it such as
> acking a message from a
Hi All,
Currently there are two blockers for the 2.9.0 release.
* Dataflow cannot deserialize DoFns -
https://issues.apache.org/jira/browse/BEAM-6102
* [SQL] Nexmark 5, 7 time out -
https://issues.apache.org/jira/browse/BEAM-6082
We'll postpone cutting the release candidate till these issues
Thanks Kenneth. Didn't look into subfolders, let me read a bit more. And
will look into the tests Luke pointed out as well.
To make sure I understand your comments of "Side inputs _are_ different in
streaming as *you* have to ...", are you saying either: 1) a user needs to
use/treat SideInput
On Mon, Nov 26, 2018 at 1:32 PM Ruoyun Huang wrote:
> Thanks Kenneth. Didn't look into subfolders, let me read a bit more. And
> will look into the tests Luke pointed out as well.
>
> To make sure I understand your comments of "Side inputs _are_ different in
> streaming as *you* have to ...",
Hi Alex,
Thanks for your help! I'm quite used to debugging concurrent/distributed
problems. But this one is quite tricky, especially with regard to gRPC
threads. I'll try to provide more information in the following.
There are two observations:
1) The problem is specifically related to how