Re: Testing of Metrics in context of DoFnTester

2017-05-09 Thread Pablo Estrada
Hi Michael, For the Metrics API, the way to programmatically query the value of a metric is by using the MetricResults.queryMetrics method. You get the MetricResults object from the PipelineResult object, and query it like so: PipelineResult res = p.run(); MetricQueryResults metricResult =

Re: Testing of Metrics in context of DoFnTester

2017-05-09 Thread Pablo Estrada
do not like the idea to much starting to do > some mocking of the metrics api within my test implementation. > > Regards, > > michel > > > On Wed, May 10, 2017 at 1:10 AM, Pablo Estrada <pabl...@google.com> wrote: > >> Hi Michael, >> For the Metrics API, the way t

Re: [FYI] New Apache Beam Swag Store!

2018-06-08 Thread Pablo Estrada
Nice : D On Fri, Jun 8, 2018, 3:43 PM Raghu Angadi wrote: > Woo-hoo! This is terrific. > > If we are increasing color choices I would like black or charcoal... Beam > logo would really pop on a dark background. > > On Fri, Jun 8, 2018 at 3:32 PM Griselda Cuevas wrote: > >> Hi Beam Community,

Re: Apache Beam June Newsletter

2018-06-13 Thread Pablo Estrada
Thanks Gris! Lots of interesting things. Best -P. On Wed, Jun 13, 2018 at 4:40 PM Griselda Cuevas wrote: > Hi Beam Community! > > Here > > [1] > is the June Edition of our Apache Beam

Re: Testing an updating side input on global window

2018-05-29 Thread Pablo Estrada
As far as I know, that behavior is not specified. I do not think that Dataflow streaming supports this sort of updating to side inputs, though I've added Slava who might have more to add. If updating side inputs is really not supported in Dataflow, you may be able to use a LoadingCache, like so:
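
A LoadingCache (Guava, Java) loads a value on first access and can be configured to drop it after a time limit so that the next access loads it again. As a hedged sketch of the same idea in Python, assuming the side-input data lives in some external store that a hypothetical `loader` function can read, the class below caches one value per key and reloads it once it is older than `ttl_secs`. In a Beam pipeline one would typically create such a cache in a DoFn's `setup` method; that wiring, the class name and the TTL are assumptions, not part of the original reply.

```python
import time


class ExpiringLoadingCache:
    """Caches loader(key) results and reloads them after ttl_secs (expire-after-write).

    ExpiringLoadingCache and `loader` are hypothetical names used for illustration;
    this is a small analog of the idea, not Guava's LoadingCache.
    """

    def __init__(self, loader, ttl_secs, clock=time.monotonic):
        self._loader = loader    # callable: key -> freshly loaded value
        self._ttl = ttl_secs     # how long a loaded value is considered fresh
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (value, time it was loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._ttl:
            # Missing or stale: call the loader and remember when we did.
            value = self._loader(key)
            self._entries[key] = (value, now)
            return value
        # Still fresh: return the cached value without calling the loader.
        return entry[0]
```

Each worker would hold its own cache, so different workers may briefly see different versions of the data until their entries expire.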

Re: Initial contributor experience

2018-06-05 Thread Pablo Estrada
Thanks Austin for taking the time to go through this! We came up with a few JIRAs to improve the documentation (see doc), and we'll keep iterating on this. Hopefully we can hear about more experiences from other people who start to approach Beam. Best -P. On Tue, Jun 5, 2018 at 1:49 PM

Re: Data guarantees PubSub to GCS

2018-01-04 Thread Pablo Estrada
I am not a streaming expert, but I will answer according to how I understand the system, and others can correct me if I get something wrong. *Regarding elements coming from PubSub into your pipeline:* Once the data enters your pipeline, it is 'acknowledged' on your PubSub subscription, and you

Re: Data guarantees PubSub to GCS

2018-01-04 Thread Pablo Estrada
Fusion optimization would >> fuse multiple stages together and if any of these stages throw an >> exception, the Pub/Sub message won't be acknowledged. I've also verified >> this behavior. >> >> Let me know if my understanding is correct. :) >> >> Thanks,

Re: [Call for items] Beam August Newsletter

2018-08-02 Thread Pablo Estrada
Thanks for doing this Rose! I'll add a couple of things. -P. On Thu, Aug 2, 2018, 4:18 PM Rose Nguyen wrote: > Hi all: > > Here's > > [1] > the template for the August Beam Newsletter! > > *Add the

[ANNOUNCE] Apache Beam 2.6.0 released!

2018-08-08 Thread Pablo Estrada
Beam 2.6.0. -- Pablo Estrada, on behalf of The Apache Beam team

Re: Beam Java newbie.

2018-08-11 Thread Pablo Estrada
Can you elaborate on your setup? Are you using a Maven archetype? Did you hand-write a pom.xml? Are you installing from code with gradle? Best -P. On Sat, Aug 11, 2018, 3:43 PM Mahesh Vangala wrote: > Hello folks - > > I am enthusiastic about learning beam using java sdk. > I set up maven using

Any Beamers at OSCON?

2018-07-15 Thread Pablo Estrada
Hello everyone, I am reaching out to people who will be attending this year's OSCON in Portland[1], starting tomorrow, 7/16. I am aware of a few Beam people who will be attending along with me, and I'm sure we'll be glad to see others. I am also aware of one talk involving Beam, by Holden

Re: Slides from OSCON

2018-09-04 Thread Pablo Estrada
I found this video: https://www.safaribooksonline.com/videos/oscon-2018/9781492026075/9781492026075-video321550, though it requires a subscription to Safari Books. Best -P. On Tue, Sep 4, 2018 at 8:21 AM Matthias Baetens wrote: > Hey Holden, just checking if you were able to find something? :) >

Re: BigQuery streaming insert errors

2018-04-05 Thread Pablo Estrada
I'm adding Cham, as he might be knowledgeable about BQ IO or be able to redirect you to someone who is. Cham, do you have guidance for Carlos here? Thanks -P. On Mon, Apr 2, 2018 at 11:08 AM Carlos Alonso wrote: > And... where could I catch that exception? > > Thanks! >

Re: [Please take it] Apache Beam Summit - Exit Survey

2018-03-28 Thread Pablo Estrada
Including user list. -- Forwarded message - From: Pablo Estrada <pabl...@google.com> Date: Wed, Mar 28, 2018 at 1:15 PM Subject: [Please take it] Apache Beam Summit - Exit Survey To: d...@beam.apache.org <d...@beam.apache.org> Hello all, For those who attende

Re: Modular IO presentation at Apachecon

2018-09-27 Thread Pablo Estrada
I'll take this chance to plug in my little directory of Beam tools/materials: https://github.com/pabloem/awesome-beam Please feel free to send PRs : ) On Wed, Sep 26, 2018 at 10:29 PM Ankur Goenka wrote: > Thanks for sharing. Great slides and looking for the recorded session. > > Do we have a

Re: Agenda for the Beam Summit London 2018

2018-09-27 Thread Pablo Estrada
Very exciting. I will have to miss it, but I'm excited to see what comes out of it:) Thanks to Gris, Matthias and other organizers. Best -P. On Thu, Sep 27, 2018, 4:26 PM Jean-Baptiste Onofré wrote: > Great !! Thanks Gris. > > Looking forward to see you all next Monday in London. > > Regards >

Re: 2019 Beam Events

2018-12-04 Thread Pablo Estrada
FWIW, for some of these places that have interest (e.g. Brazil, Israel), it's possible to create a group in meetup.com, and start gauging interest, and looking for organizers. Once a group of people with interest exists, it's easier to get interest / sponsorship to bring speakers. So if you are

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
ConsumeFileDescriptors()); -P. On Mon, Jan 7, 2019 at 5:23 PM Pablo Estrada wrote: > Hi Matt, > > I am much more familiar with Python, so I usually answer questions using > that SDK. Also, it's quicker to type a fully detailed pipeline on an email > and the SDKs are similar enough

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
gmail.com> > Senior Solution Architect, Kettle Project Founder > > > > On Mon, Jan 7, 2019 at 23:09, Pablo Estrada wrote: > >> Hi Matt, >> is this computation running as part of a larger pipeline that does run >> some parallel processing? Otherwise, it's odd th

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
Hi Matt, is this computation running as part of a larger pipeline that does run some parallel processing? Otherwise, it's odd that it needs to run on Beam. Nonetheless, you can certainly do this with a pipeline that has a single element. Here's what that looks like in python: p |

Re: Implementation an S3 file system for python SDK

2019-04-03 Thread Pablo Estrada
Hi Pasan! Thanks for the proposal. I'll try to take a look in the next few hours and give some feedback. Best --P. On Wed, Apr 3, 2019, 8:53 AM Ahmet Altay wrote: > +Pablo Estrada > > On Wed, Apr 3, 2019 at 8:46 AM Lukasz Cwik wrote: > >> +dev >> >> On Wed

Re: Is there an integration test available for filesystem checking

2019-04-08 Thread Pablo Estrada
I recommend you send these questions to the dev@ list Pasan. Have you looked at the *_test.py files corresponding to each one of the file systems? Are they all mocking their access to GCS? Best -P. On Sun, Apr 7, 2019 at 11:12 PM Pasan Kamburugamuwa < pasankamburugamu...@gmail.com> wrote: >

Re: Beam Summit Europe 2019: CfP

2019-03-04 Thread Pablo Estrada
Thanks to everyone involved organizing this. This is exciting : ) Best -P. On Mon, Mar 4, 2019 at 1:27 PM Matthias Baetens wrote: > Hi everyone, > > As you might already know, the *Beam Summit Europe 2019* will take place > in *Berlin* this year on *19-20 June*! > > Of course, we would love to

Re: GSOC - Apache Beam Python SDK

2019-03-12 Thread Pablo Estrada
Hi Pasan! Welcome to Apache Beam. Happy to have your interest. Can you share what your specific questions about the topic are? My initial advice would be to study the filesystems[1] packages of Beam, and the GCS filesystem[2]. Also, you can find us in the ASF Slack:

Re: GSOC - Apache Beam Python SDK

2019-03-12 Thread Pablo Estrada
Oh, if you are not yet subscribed to the ASF slack, you can do so here: https://s.apache.org/slack-invite On Tue, Mar 12, 2019 at 10:30 AM Pablo Estrada wrote: > Hi Pasan! > Welcome to Apache Beam. Happy to have your interest. Can you share what > are your specific questions about the

PipelineOptions at execution time from DirectRunner

2019-03-21 Thread Pablo Estrada
Hi all, The DirectRunner does not seem to support RuntimeValueProvider. Is there a suggestion for DirectRunner pipelines to access arguments passed in as pipeline options (but not necessarily passed explicitly by users) at pipeline execution time? Getting it as pcoll.pipeline.options in the

Re: PipelineOptions at execution time from DirectRunner

2019-03-21 Thread Pablo Estrada
Thanks Ahmet! These are illustrative explanations. I still wonder about one question: > >> Getting it as pcoll.pipeline.options in the expand(self, pcoll) call is a >> possiblity, but it seems like that's not ideal. Any other suggestions? >> > Is this an appropriate way of obtaining an option

[Meetup] Apache Flink+Beam+others in Seattle. Feb 21.

2019-02-15 Thread Pablo Estrada
Hello everyone, There is an upcoming meetup happening in the Google Seattle office, on February 21st, starting at 5:30pm: https://www.meetup.com/seattle-apache-flink/events/258723322/ People will be chatting about Beam, Flink, Hive, and AthenaX. Anyone who is

Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Pablo Estrada
On Mon, Apr 8, 2019 at 10:00 AM Alex Amato wrote: > +Lukasz Cwik , +Boyuan Zhang , +Lara > Schmidt > > Should splittable DoFn be considered in this design? In order to split and > scale the source step properly? > > On Mon, Apr 8, 2019 at 9:11 AM Ahmet Altay wrote: >

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Pablo Estrada
Hello Henrique, I am not aware of existing Beam transforms specifically used for reading in XLSX data. Can you share what you mean by "examples related with Cs extension"? I am aware of some Python libraries for this sort of thing[1]. You could use the FileIO transforms in the Python SDK to

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-16 Thread Pablo Estrada
t is more ellegant... > > If your have some Idea ! let me know . it will be welcome!! > > > On Mon, Apr 15, 2019 at 6:01 PM Pablo Estrada wrote: > >> Hello Henrique, >> >> I am not aware of existing Beam transforms specifically used for reading >> in XLSX data.

Re: GSOC - Implement an S3 filesystem for Python SDK

2019-03-13 Thread Pablo Estrada
Hi Pasan! I answered with some links and tips in your previous post. You can find them here: https://lists.apache.org/thread.html/c6637178a0fa5e4f0b2f1b3fe8991b79863f384d2573b6f22cb5f3b2@%3Cuser.beam.apache.org%3E Best -P. On Tue, Mar 12, 2019 at 8:26 PM Pasan Kamburugamuwa <

[ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Pablo Estrada
Hello everyone, A group of community members has been working on gathering and providing common pipeline patterns for pipelines in Beam. These are examples on how to perform certain operations, and useful ways of using Beam in your pipelines. Some of them relate to processing of files, use of side

Re: Streaming inserts BQ with Java SDK Beam

2019-05-07 Thread Pablo Estrada
Hi Andres! You can definitely do streaming inserts using the Java SDK. This is available with BigQueryIO.write(). Specifically, you can use the `withMethod`[1] call to specify whether you want batch loads or streaming inserts. If you specify streaming inserts, Beam should insert rows as they come

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Pablo Estrada
An example that I can think of as a feature that Beam could provide to other runners is SQL. Beam SQL expands into Beam transforms, and it can run on other runners. Flink and Spark do have SQL support because they've invested in it, but think of smaller runners such as Nemo. Of course, not all of

Re: Python WriteToBigQuery with FILE_LOAD & additional_bq_parameters not working

2019-09-05 Thread Pablo Estrada
Not sure why with > WriteToBigQuery doesn't work, since it's using BigQueryBatchFileLoads under > the hood... > > Thanks for the help. > Zdenko > ___ > http://www.the-swamp.info > > > > On Wed, Sep 4, 2019 at 6:55 PM Chamikara Jayalath > wrote:

Re: Is there any job board for Beam with Java?

2019-09-05 Thread Pablo Estrada
Hi Deepak, this doesn't count as spamming, so don't feel bad for posting the question : ). Your question is an appropriate one for this mailing list. That being said, I do not know of a Beam job board. You can probably find jobs with Beam or Apache Beam as a keyword on LinkedIn. Have you tried

[Java] Accessing state from FinishBundle method

2019-07-29 Thread Pablo Estrada
Hello all, I am working on a pipeline where I'd like to write a value to state at the end of processing a bundle. As it turns out, I don't think this is possible, as FinishBundleContext does not provide a method for it; and doing something like so also errors out: == @FinishBundle

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-20 Thread Pablo Estrada
Hopefully, it will also be recorded >> as well. >> >> On Wed, Jul 17, 2019 at 2:50 PM Pablo Estrada wrote: >> >>> Yes! So I will be working on a small feature request for Java's >>> BigQueryIO: https://issues.apache.org/jira/browse/BEAM-7607 >>

Choosing a coder for a class that contains a Row?

2019-07-22 Thread Pablo Estrada
Hello all, I am writing a utility to push data to PubSub. My data class looks something like so: == class MyData { String someId; Row someRow; Row someOtherRow; } == The schema for the Rows is not known a priori. It is contained in the Row. I am then pushing this data to

Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Pablo Estrada
Hello all, I'll be having a session where I live-fix a Beam bug for 1 hour next week. Everyone is invited. It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I will finish a full change in that time frame, but we'll see. I have not yet decided if I will do this via hangouts, or via

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-17 Thread Pablo Estrada
uld be really helpful > newbies like me. > > Is it possible to list out what are the things that you are planning to > cover? > > > > > On Tue, Jul 16, 2019 at 11:19 AM Yichi Zhang wrote: > >> Thanks for organizing this Pablo, it'll be very helpful! >> >

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-19 Thread Pablo Estrada
Beam 2.14.0 will include support for writing files in the fileio module (the support will include GCS, local files, HDFS). It will also support streaming. The transform is still marked as experimental, and is likely to receive improvements - but you can check it out for your pipelines, and see if

Re: Choosing a coder for a class that contains a Row?

2019-07-23 Thread Pablo Estrada
ks like an exciting candidate for per-job schema > registry story... > > I'm super eager to see if there are other ideas or a contribution we > can make in this area that's "Beam Row" oriented! > > Ryan > > [1] > https://github.com/Talend/components/blob/master/core/component

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-23 Thread Pablo Estrada
, Jul 20, 2019 at 2:05 PM Pablo Estrada wrote: > Hello all, > > This will be streamed on youtube on this link: > https://www.youtube.com/watch?v=xpIpEO4PUDo > > I think there will be a live chat, so I will hopefully be available to > answer questions. To be honest, my workflow i

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread Pablo Estrada
wrote: > >> Pablo, >> >> Assigned https://issues.apache.org/jira/browse/BEAM-7607 to you, to >> make even more likely that it is still around on the 25th :-) >> >> Cheers, >> Austin >> >> On Tue, Jul 23, 2019 at 11:24 AM Pablo Estrada >>

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread Pablo Estrada
On Thu, Jul 25, 2019 at 4:56 PM Pablo Estrada wrote: > >> The link is here: https://www.youtube.com/watch?v=xpIpEO4PUDo >> This is still happening. >> >> On Thu, Jul 25, 2019 at 2:55 PM Innocent Djiofack >> wrote: >> >>> Did I miss the link or th

Beam meetup Seattle!! September 26th, 6pm

2019-09-23 Thread Pablo Estrada
Hello everyone! If you are in the Seattle area, please come to the Beam meetup this Thursday, September 26th, at 6pm in the Google office in Fremont. There will be interesting talks, and there should be a number of Beam contributors and users around. Also pizza and drinks. The page with all the info:

Re: How to store offset with kafkaio

2019-12-04 Thread Pablo Estrada
Hi! What do you mean by offset? Is 'offset' a field in a database table? Or maybe it's an offset in the database binlog? Best -P. On Wed, Nov 27, 2019 at 7:32 PM 郑 洁锋 wrote: > Hi, >I want to store the offset in Oracle/Mysql, and then every time I > start the real-time streaming task, I

Re: beam.io.BigQuerySource does not accept value providers

2019-10-18 Thread Pablo Estrada
Hi Theodore! Display data is what's throwing the error, but even setting that issue aside, BigQuerySource does not support value providers, because it's a Dataflow native source. Unfortunately, this is not currently possible. For now, you could do this by executing a BQ export job (using a DoFn), and

Re: real real-time beam

2019-11-25 Thread Pablo Estrada
If I understand correctly - your pipeline has some kind of windowing, and on every trigger downstream of the combiner, the pipeline updates a cache with a single, non-windowed value. Is that correct? What are your keys for this pipeline? You could work this out with, as you noted, a timer that

Re: Need Help | SpannerIO

2019-12-18 Thread Pablo Estrada
Or perhaps you have a PCollection<String> or something like that, and you want to use those strings to issue queries to Spanner? PCollection<String> myStrings = p.apply(...) PCollection<Struct> rows = myStrings.apply( SpannerIO.read() .withInstanceId(instanceId) .withDatabaseId(dbId)

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Pablo Estrada
Elapsed time 5 min 11 sec > > On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada wrote: > >> Hi Alan, >> could it be that you're picking up the new Apache Beam 2.19.0 release? >> Could you try depending on beam 2.18.0 to see if the issue surfaces when >> using the new r

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Pablo Estrada
Hi Alan, could it be that you're picking up the new Apache Beam 2.19.0 release? Could you try depending on beam 2.18.0 to see if the issue surfaces when using the new release? If something was working and no longer works, it sounds like a bug. This may have to do with how we pickle (dill /

Re: 2nd Apache Beam Meetup in Warsaw

2020-01-30 Thread Pablo Estrada
Adding user@ :) On Thu, Jan 30, 2020, 5:05 AM Michał Walenia wrote: > Hi there, > we're organizing the second Apache Beam Meetup in Warsaw! It's going to > take place on 20th February in our office. We're looking for speakers, so > if anyone is interested in sharing some knowledge - let me know

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Pablo Estrada
Thanks Udi! On Tue, Jan 28, 2020 at 11:08 AM Rui Wang wrote: > Thank you Udi for taking care of Beam 2.18.0 release! > > > > -Rui > > On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: > >> The Apache Beam team is pleased to announce the release of version 2.18.0. >> >> Apache Beam is an open

Re: Beam Summit North America 2019 - recordings

2020-01-13 Thread Pablo Estrada
Thanks Matthias! On Sun, Jan 12, 2020 at 7:51 AM Matthias Baetens wrote: > Hi everyone, > > It's our pleasure to share the recordings from the Beam Summit North > America 2019. > Please find them in the YouTube playlist >

[FYI] Rephrasing the 'lull'/processing stuck logs

2020-01-09 Thread Pablo Estrada
Hello Beam users and community, The Beam Python SDK and Java workers have a utility that prints a log message whenever an execution thread goes for over five minutes without any state transitions. These messages are common in two scenarios: 1. A deadlock happening in the worker
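
The mechanism described above amounts to a per-thread watchdog. A minimal sketch, assuming a hypothetical `LullDetector` that the worker notifies on every state transition and polls periodically, could look like the following; the class, its methods and the message wording are made up for illustration and are not the SDK's actual code or log line.

```python
import time

# Five minutes, matching the threshold described in the announcement.
LULL_TIMEOUT_SECS = 5 * 60


class LullDetector:
    """Tracks the last state transition of one execution thread (hypothetical helper)."""

    def __init__(self, timeout_secs=LULL_TIMEOUT_SECS, clock=time.monotonic):
        self._timeout = timeout_secs
        self._clock = clock                  # injectable so tests can control time
        self._state = None
        self._last_transition = clock()

    def transition(self, new_state):
        """Record that the thread moved to a new state (e.g. started another step)."""
        self._state = new_state
        self._last_transition = self._clock()

    def check(self):
        """Return a lull message if the thread has been stuck too long, else None."""
        elapsed = self._clock() - self._last_transition
        if elapsed > self._timeout:
            return (f"Operation ongoing in state {self._state!r} for at least "
                    f"{elapsed:.0f} seconds without a state transition")
        return None
```

Returning the message instead of logging it directly keeps the sketch easy to test; a real worker would pass it to its logger. Note that a long-running but healthy step (for example, a slow external call) trips the same check, which is why such a message does not by itself prove a deadlock.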

Re: Big Query source and Dataflow template doubt

2020-03-13 Thread Pablo Estrada
Hello Andre! Unfortunately, the BigQuerySource does not support value providers. There is a different transform to read from BigQuery in apache_beam.io.gcp.bigquery._ReadFromBigQuery. This one will soon support it (hopefully by 2.21). Unfortunately at the moment, it is not possible to change the

Re: Beam Katas YouTube

2020-03-27 Thread Pablo Estrada
Nice : D On Fri, Mar 27, 2020 at 12:44 AM Alex Van Boxel wrote: > That's nicely done! Congrats, going to share this immediately. > > And I actually didn't know where the name Beam came from, now I know :-) > > _/ > _/ Alex Van Boxel > > > On Fri, Mar 27, 2020 at 4:32 AM Henry Suryawirawan < >

Re: Try Beam Katas Today

2020-05-13 Thread Pablo Estrada
Sharing Damon's email with the user@ list as well. Thanks Damon! On Tue, May 12, 2020 at 9:02 PM Damon Douglas wrote: > Hello Everyone, > > If you don't already know, there are helpful instructional tools for > learning the Apache Beam SDKs called Beam Katas hosted on > https://stepik.org.

Re: Provide credentials for s3 writes

2020-09-29 Thread Pablo Estrada
Hi Ross, it seems that this feature is missing (e.g. passing a pipeline option with authentication information for AWS). I'm sorry about that - that's pretty annoying. I wonder if you can use the setup.py file to add the default configuration yourself until we have appropriate support for a

Re: PaneInfo showing UNKOWN State

2020-05-26 Thread Pablo Estrada
Hi Jayadeep, Unfortunately, it seems that PaneInfo is not well supported yet on the local runners: https://issues.apache.org/jira/browse/BEAM-3759 Can you share more about your use case, and what you'd like to do with the PaneInfo? On Sat, May 23, 2020 at 10:03 AM Jay wrote: > Hi All - > >

Re: [ANNOUNCE] Beam 2.25.0 Released

2020-10-26 Thread Pablo Estrada
Thanks Robin! On Mon, Oct 26, 2020 at 11:06 AM Robin Qiu wrote: > The Apache Beam team is pleased to announce the release of version 2.25.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines, including ETL, batch and stream >

Re: ValueProviderOptions and templates

2020-06-17 Thread Pablo Estrada
I believe you don't need to provide it at template construction time, but at invocation time. Are you having trouble with providing the parameters at invocation time? Best -P. On Tue, Jun 16, 2020 at 2:22 PM Marco Mistroni wrote: > HI all > i am creating dataflow jobs using python API by

Re: [VOTE] Release 2.27.0, release candidate #4

2021-01-06 Thread Pablo Estrada
+1 (binding) I've built and unit tested existing Dataflow Templates with the new version. Best -P. On Tue, Jan 5, 2021 at 11:17 PM Pablo Estrada wrote: > Hi everyone, > Please review and vote on the release candidate #4 for the version 2.27.0, > as follows: > [ ] +1, Approve

Re: FileIO Azure Storage problems

2020-12-02 Thread Pablo Estrada
le to assign it: > https://issues.apache.org/jira/browse/BEAM-11313 > > Thank you so much for taking a look at this! > > Best Regards > Thomas Li Fredriksen > > On Fri, Nov 20, 2020 at 5:43 PM Pablo Estrada wrote: > >> Follow up: What is the size of the file you're consuming? &

Re: [VOTE] Release 2.27.0, release candidate #1

2020-12-24 Thread Pablo Estrada
> > >>>> I validated python quickstarts. Thank you Pablo. > >>>> > >>>> On Tue, Dec 22, 2020 at 10:04 PM Jean-Baptiste Onofre < > j...@nanthrax.net> wrote: > >>>>> > >>>>> +1 (binding) > >>>>

[VOTE] Release 2.27.0, release candidate #1

2020-12-22 Thread Pablo Estrada
Hi everyone, Please review and vote on the release candidate #1 for the version 2.27.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) Reviewers are encouraged to test their own use cases with the release candidate, and vote +1 if

Re: Unit Testing Custom Coder

2020-11-19 Thread Pablo Estrada
Hi Dave! I don't have a lot of experience with coders, but I would include the Beam user@ list (added just now) to see if someone else has done this. Best -P. On Wed, Nov 18, 2020 at 7:22 AM Dave Anderson wrote: > Pablo, > > Also, for now I've created tests that exercise the encode() and

Re: FileIO Azure Storage problems

2020-11-20 Thread Pablo Estrada
Follow up: What is the size of the file you're consuming? -P. On Fri, Nov 20, 2020 at 8:40 AM Pablo Estrada wrote: > Hi Thomas! > This looks like it may be a bug with the azfs implementation. If you > notice the code, you're hitting this issue when the byte channel needs to > se

Re: FileIO Azure Storage problems

2020-11-20 Thread Pablo Estrada
Hi Thomas! This looks like it may be a bug with the azfs implementation. If you look at the code, you're hitting this issue when the byte channel needs to seek backwards. I may take a stab at fixing it. I believe we have to catch a mark expiration, and just reopen the file if that happens. Do you
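
The fix suggested in this reply (notice that the buffered mark has expired and reopen the file instead of failing) can be sketched in Python against an in-memory stream. `ReopeningReader`, `opener` and `max_buffer` are hypothetical names, and this only illustrates the approach; it is not the azfs code. Backward seeks that land inside the last `max_buffer` bytes are served from memory, and anything older triggers a reopen followed by a forward skip.

```python
import io


class ReopeningReader:
    """Forward-only stream wrapper that reopens the source for long backward seeks."""

    def __init__(self, opener, max_buffer=4):
        self._opener = opener          # zero-arg callable returning a fresh stream at offset 0
        self._max_buffer = max_buffer  # like a mark's read limit: bytes kept for rewinding
        self.reopens = 0
        self._open_at_start()

    def _open_at_start(self):
        self._stream = self._opener()
        self._stream_pos = 0  # bytes consumed from the underlying stream
        self._buffer = b""    # most recently consumed bytes, ending at _stream_pos
        self._pos = 0         # caller's logical position

    def read(self, n):
        out = b""
        if self._pos < self._stream_pos:
            # Serve bytes still held in the buffer after a short backward seek.
            start = self._pos - (self._stream_pos - len(self._buffer))
            out = self._buffer[start:start + n]
            self._pos += len(out)
            n -= len(out)
        while n > 0:
            chunk = self._stream.read(n)
            if not chunk:
                break  # end of stream
            self._stream_pos += len(chunk)
            self._pos += len(chunk)
            n -= len(chunk)
            out += chunk
            # Keep only the most recent max_buffer bytes for future rewinds.
            combined = self._buffer + chunk
            self._buffer = combined[max(0, len(combined) - self._max_buffer):]
        return out

    def seek(self, offset):
        if offset < self._stream_pos - len(self._buffer):
            # The bytes we need are no longer buffered (the "mark" expired):
            # reopen from the beginning instead of failing.
            self.reopens += 1
            self._open_at_start()
        if offset > self._stream_pos:
            # Forward seek: read and discard up to the target offset.
            self._pos = self._stream_pos
            self.read(offset - self._stream_pos)
        self._pos = offset


if __name__ == "__main__":
    reader = ReopeningReader(lambda: io.BytesIO(b"0123456789"))
    reader.read(6)
    reader.seek(1)  # older than the 4-byte buffer, so the source is reopened
    print(reader.read(3), reader.reopens)
```

Reopening costs an extra request plus re-reading the skipped bytes, so it is a fallback for the rare long rewind rather than a replacement for the buffer.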

Re: [ANNOUNCEMENT] Support for Hadoop 3 confirmed

2020-11-18 Thread Pablo Estrada
Very nice. Thanks Piotr! On Wed, Nov 18, 2020 at 7:42 AM Alexey Romanenko wrote: > This is very good improvement, thank you for working on this! > > PS: What’s about Hive/HBase modules? > > > On 18 Nov 2020, at 11:54, Piotr Szuberski > wrote: > > > > Starting with [1] all of the Beam modules

Re: [REMOTE WORKSHOPS] Introduction to Apache Beam - remote workshops Dec 3rd and Dec 10th

2020-11-17 Thread Pablo Estrada
+dev so everyone will know. This is cool. Thanks Karolina! Will these be an introduction to basic Beam concepts? Thanks! -P. On Mon, Nov 16, 2020 at 11:52 AM Karolina Rosół wrote: > Hello everyone, > > You may not know me but I'm Karolina Rosół, Head of Cloud & OSS at Polidea > and I'm working

Re: October 2020, Beam Community Update

2020-11-03 Thread Pablo Estrada
Hi Alexey, Do you have any other place in mind? I don't think Brittany has current plans to publish this elsewhere, but if you have any good ideas, I imagine she could consider them : ) Best -P. On Tue, Nov 3, 2020 at 8:23 AM Alexey Romanenko wrote: > Thanks for doing this! > > Is it going to

Re: [VOTE] Release 2.27.0, release candidate #1

2020-12-28 Thread Pablo Estrada
tringsToPubSub and I just sent [2] to replace WriteStringsToPubSub > with WriteToPubSub in example code. Issue is tracked in [3]. > > > > [1] https://github.com/apache/beam/pull/13614 > > [2] https://github.com/apache/beam/pull/13615 > > [3] https://issues.apache.org/jira/browse

[ANNOUNCE] Beam 2.27.0 Released

2021-01-08 Thread Pablo Estrada
The Apache Beam team is pleased to announce the release of version 2.27.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org You can download the release

[VOTE] Release 2.27.0, release candidate #4

2021-01-05 Thread Pablo Estrada
Hi everyone, Please review and vote on the release candidate #4 for the version 2.27.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) *NOTE*. What happened to RC #2? I started building RC2 before completing all the cherry-picks, so

Re: Dataflow v2 runner scaling behaviour

2021-03-24 Thread Pablo Estrada
Hi David, Thanks for sharing. I've been investigating something like this recently. What's the size of your data? Best -P. On Wed, Mar 24, 2021, 7:52 AM David Sánchez wrote: > Hi folks! > > I'm testing the dataflow v2 runner in a batch pipeline (Apache Beam Python > 3.7 SDK 2.27.0) that reads many

Re: Python SDK with S3IO on Flink

2021-02-25 Thread Pablo Estrada
Hi Nir! Was this fixed by the PR you submitted? On Wed, Feb 24, 2021 at 8:55 AM Nir Gazit wrote: > Hey, > When trying to read a file from S3 with a combine action, the pipeline > seems to be stuck. When replacing it with a GCP source it works fine. > Furthermore - if I comment out the

Re: Rate Limiting in Beam

2021-04-15 Thread Pablo Estrada
You could implement a Splittable DoFn that generates a limited number of splits. We do something like this for GenerateSequence.from(X).withRate(...) via UnboundedCountingSource[1]. It keeps track of its local EPS, and generates new splits if more EPSs are wanted. This should help you scale up to
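
The pacing idea in this reply, emitting only as many elements as the target rate allows for the time elapsed and adding splits when more throughput is wanted, can be illustrated with a small stdlib sketch. `PacedCounter` and `splits_needed` are hypothetical helpers; this is not the code of UnboundedCountingSource or of a Splittable DoFn, only a simplified version of the arithmetic behind the approach.

```python
import math
import time


class PacedCounter:
    """Emits consecutive integers no faster than rate_per_sec (hypothetical helper).

    Instead of sleeping, each poll returns only the elements that are "due"
    given the time elapsed since the counter started.
    """

    def __init__(self, rate_per_sec, clock=time.monotonic):
        self._rate = rate_per_sec
        self._clock = clock      # injectable so tests can control time
        self._start = clock()
        self._emitted = 0        # number of elements emitted so far

    def poll(self):
        """Return the elements that may be emitted at the current time."""
        elapsed = self._clock() - self._start
        due = math.floor(elapsed * self._rate)
        batch = list(range(self._emitted, due))
        self._emitted = max(self._emitted, due)
        return batch


def splits_needed(target_rate, per_split_rate):
    """Number of parallel splits needed if each split sustains per_split_rate."""
    return math.ceil(target_rate / per_split_rate)
```

For example, a target of 1000 elements per second with splits that each sustain 300 needs `splits_needed(1000, 300)`, i.e. four splits. Pacing by "elements due so far" rather than sleeping between elements lets a split catch up after a slow poll without exceeding its overall rate.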

[ANNOUNCE] Apache Beam 2.40.0 Released

2022-06-27 Thread Pablo Estrada
The Apache Beam team is pleased to announce the release of version 2.40.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org You can download the release

Re: [ANNOUNCE] Apache Beam 2.41.0 Released

2022-08-25 Thread Pablo Estrada via user
Thank you Kiley! On Thu, Aug 25, 2022 at 10:55 AM Kiley Sok wrote: > The Apache Beam team is pleased to announce the release of version 2.41.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines, including ETL, batch and stream >

Re:  Join us in NYC at Beam Summit 2023

2023-05-25 Thread Pablo Estrada via user
let's goo On Thu, May 25, 2023 at 12:49 PM Carolina Escobar wrote: > *Get to know our speakers!* > > *Take a quick peek at our program:* > > >- > >*Beam IO: CDAP and SparkReceiver IO Connectors Overview * >Alex Kosolapov and Elizaveta Lomteva give an overview of a Beam IO >

Re: Where to specify trust.jks

2023-05-18 Thread Pablo Estrada via user
Hi Utkarsh, you can pass a path in GCS (or a filesystem), and the workers should be able to download it onto themselves. You'd pass `gs://my-bucket-name/path/to/trust.jks`. Can you try that? Best -P. On Wed, May 10, 2023 at 1:58 PM Utkarsh Parekh wrote: > Hi, > > I'm testing a streaming app