Moving to u...@beam.apache.org, the best mailing list for questions like
this.
Yes, this kind of workload is a core use case for Beam. If you have a
problem, please write to this user list with details.
Kenn
On Wed, Oct 30, 2019 at 4:07 AM Taher Koitawala wrote:
> Hi All,
> My
Very good points. We definitely ship a lot of code/features in very early
stages, and there seems to be no problem.
I intend mostly to leave this judgment to people like you who know better
about Spark users.
But I do think 1 or 2 jars are better than 3. I really don't like "3 jars"
and I did
On Wed, Oct 30, 2019 at 3:34 PM Maximilian Michels wrote:
>
> > One thing I don't understand is what it means for "CLI or REST API
> > context [to be] present." Where does this context come from? A config
> > file in a standard location on the user's machine? Or is this
> > something that is only
On Tue, Oct 29, 2019 at 7:01 PM Aaron Dixon wrote:
>
> Thank you, Luke and Robert. Sorry for hitting dev@, I criss-crossed and meant
> to hit user@, but as we're here could you clarify your two points, however--
No problem. This is veering into dev@ territory anyway :).
> 1) I am under the
One thing I don't understand is what it means for "CLI or REST API
context [to be] present." Where does this context come from? A config
file in a standard location on the user's machine? Or is this
something that is only present when a user uploads a jar and then
Flink runs it in a specific
I am still a bit lost as to why we are discussing options without giving any
arguments or reasons for them. Why are 2 modules better than 3, or 3 better
than 2, or, even better, what forces us to have anything other than a single
module?
What are the reasons for wanting to have separate
On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova wrote:
>
>> Do you believe that a future mypy plugin could replace pipeline type checks
>> in Beam, or are there limits to what it can do?
>
> mypy will get us quite far on its own once we completely annotate the beam
> code. That said, my PR does
The term "herculean" is perfect to describe this impressive achievement, Chad.
Congratulations and thanks for the effort to make this happen. This will give
Beam users not only improved functionality but, as Robert mentioned, will also
help others understand the internals of the Python SDK more quickly.
On Wed, Oct 30, 2019 at 2:00 AM Jan Lukavský wrote:
>
> TL;DR - can we solve this by representing aggregations as not point-wise
> events in time, but time ranges? Explanation below.
>
> Hi,
>
> this is pretty interesting from a theoretical point of view. The
> question generally seems to be -
> Do you believe that a future mypy plugin could replace pipeline type
> checks in Beam, or are there limits to what it can do?
>
mypy will get us quite far on its own once we completely annotate the beam
code. That said, my PR does not include my efforts to turn PTransforms
into Generics, which
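The idea of turning PTransforms into Generics can be sketched outside of Beam with plain `typing` machinery. Everything below (the `PTransformLike` class and its `expand` method) is a hypothetical stand-in, not the actual Beam API; it only illustrates how parameterizing a transform on its input and output element types would let mypy infer and check element types through a pipeline.

```python
from typing import Callable, Generic, List, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class PTransformLike(Generic[InT, OutT]):
    """Toy stand-in for a Beam PTransform, parameterized on element types.

    With this shape, mypy can check that the output element type of one
    transform matches the input element type of the next.
    """

    def __init__(self, fn: Callable[[InT], OutT]) -> None:
        self.fn = fn

    def expand(self, pcoll: List[InT]) -> List[OutT]:
        # A real PTransform expands into a pipeline graph; here we just map.
        return [self.fn(x) for x in pcoll]


# mypy infers PTransformLike[str, int] here, so feeding it a List[int]
# would be flagged statically rather than failing at pipeline runtime.
to_len: PTransformLike[str, int] = PTransformLike(len)
result = to_len.expand(["a", "bb"])
```

Under this scheme, much of what Beam's runtime pipeline type checking catches today could in principle be caught by mypy before the pipeline is ever constructed.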
One more question: https://issues.apache.org/jira/browse/BEAM-8396
still seems valuable, but with [auto] as the default, how should we
detect whether LOOPBACK is safe to enable from Python?
On Wed, Oct 30, 2019 at 11:53 AM Robert Bradshaw wrote:
>
> Sounds good to me.
>
> One thing I don't
Sounds good to me.
One thing I don't understand is what it means for "CLI or REST API
context [to be] present." Where does this context come from? A config
file in a standard location on the user's machine? Or is this
something that is only present when a user uploads a jar and then
Flink runs it
Yes, agreed, two jars included in an uber jar will work in a similar way. Though
having 3 jars still looks quite confusing to me.
> On 29 Oct 2019, at 23:54, Kenneth Knowles wrote:
>
> Is it just as easy to have two jars and build an uber jar with both included?
> Then the runner can still be
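Since a jar is just a zip archive, the two-jars-into-one-uber-jar idea Kenn describes can be sketched in a few lines. This is a toy illustration, not a real shading tool (shade/shadow plugins also handle service-file merging, signature stripping, and class relocation); `make_uber` is a hypothetical name.

```python
import io
import zipfile
from typing import List, Set


def make_uber(jars: List[bytes]) -> bytes:
    """Merge the entries of several jar (zip) archives into one uber jar.

    On a duplicate entry name, the first occurrence wins, mirroring a
    common merge strategy of shading tools.
    """
    out = io.BytesIO()
    seen: Set[str] = set()
    with zipfile.ZipFile(out, "w") as uber:
        for jar_bytes in jars:
            with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
                for name in jar.namelist():
                    if name in seen:
                        continue
                    seen.add(name)
                    uber.writestr(name, jar.read(name))
    return out.getvalue()
```

The point of the sketch: from the classpath's perspective, two component jars and one uber jar built from both are interchangeable, so the runner can still be distributed either way.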
A lot of the logic is around handling various error scenarios.
You should notice that the majority of that graph is about passing around
metadata about what files were written and what errors occurred. That
metadata is tiny in comparison and should only be a blip compared to
writing the
>
> As Beam devs will be gaining more first-hand experience with the tooling,
> we may need to add a style guide/best practices/FAQ to our contributor
> guide to clarify known issues.
>
I'm happy to help out with that, just let me know.
-chad
+1 for type annotations.
On Mon, Oct 28, 2019 at 7:41 PM Robert Burke wrote:
> As someone who cribs from the Python SDK to make changes in the Go SDK,
> this will make things much easier to follow! Thank you.
>
> On Mon, Oct 28, 2019, 6:52 PM Chad Dombrova wrote:
>
>>
>> Wow, that is an
Hi All,
My current use-case is to write data from Pub/Sub to Spanner using
a streaming pipeline. I do see that Beam has a SpannerIO for writing.
However, with Pub/Sub being streaming and Spanner being RDBMS-like, it
would be helpful if you could tell me whether this will be
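Setting the Beam API aside, the tension the question raises (a firehose-style stream feeding an RDBMS-like store) usually comes down to batching writes so that each commit amortizes its cost. The sketch below is plain Python, not SpannerIO; `batch` and `max_batch` are illustrative names for the kind of grouping a streaming sink typically performs internally.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def batch(events: Iterable[T], max_batch: int) -> List[List[T]]:
    """Group incoming events into bounded batches.

    A writer targeting an RDBMS-like store (such as Spanner) would commit
    each batch as one transaction instead of one commit per event.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    for e in events:
        current.append(e)
        if len(current) == max_batch:
            batches.append(current)
            current = []
    if current:  # flush the trailing partial batch
        batches.append(current)
    return batches
```

In a real streaming pipeline the flush would also be triggered by time (so a slow stream does not hold a partial batch forever), which is the sort of detail a built-in sink handles for you.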
TL;DR - can we solve this by representing aggregations as not point-wise
events in time, but time ranges? Explanation below.
Hi,
this is pretty interesting from a theoretical point of view. The
question generally seems to be - having two events, can I reliably order
them? One event might be
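Jan's suggestion can be illustrated with a toy ordering function over time ranges: two aggregations are reliably orderable only when their ranges do not overlap. This is a sketch of the idea, not anything in Beam; `Interval` and `order` are hypothetical names.

```python
from typing import Optional, Tuple

# An event-time range [start, end), rather than a single point timestamp.
Interval = Tuple[int, int]


def order(a: Interval, b: Interval) -> Optional[int]:
    """Return -1 if a is wholly before b, 1 if wholly after,
    and None if the ranges overlap, i.e. no reliable ordering exists."""
    if a[1] <= b[0]:
        return -1
    if b[1] <= a[0]:
        return 1
    return None
```

The `None` case is the interesting one: representing aggregations as ranges makes the "cannot be ordered" situation explicit, instead of forcing an arbitrary order between point-wise timestamps.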