Re: apache beam bigquery IO connector support for bigquery external tables

2023-04-21 Thread Brian Hulette via user
Hi Nirav, BQ external tables are read-only, so you won't be able to write this way. I also don't think reading a standard external table will work since the Read API and tabledata.list are not supported for external tables [1]. BigLake tables [2] on the other hand, may "just work". I haven't check

Re: DeferredDataFrame not saved as feather file. Problem in both DirectRunner and DataFlowRunner

2022-11-10 Thread Brian Hulette via user
context, when saving a large parquet file, I wish we could > control that it saves in 10 files instead of 1000 for example. This would > make it easier to load back. > > Thanks for the help!! > Of course! > > On 10 Nov 2022 at 18.36.43, Brian Hulette wrote: > >> +user

Re: DeferredDataFrame not saved as feather file. Problem in both DirectRunner and DataFlowRunner

2022-11-10 Thread Brian Hulette via user
+user (adding back the user list) On Thu, Nov 10, 2022 at 9:34 AM Brian Hulette wrote: > Thanks, glad to hear that worked! > > The feather error looks like a bug, I filed an issue [1] to track it. I > think using parquet instead of feather is the best workaround for now. &

Re: DeferredDataFrame not saved as feather file. Problem in both DirectRunner and DataFlowRunner

2022-11-09 Thread Brian Hulette via user
Hi Duarte, I commented on the Stack Overflow question. It looks like your to_dataframe and to_feather calls are outside of the Pipeline context, so they are being created _after_ the pipeline has already run. Hopefully moving them inside the Pipeline context will resolve the issue. Brian On Wed,

Re: [Question] Beam SQL failed with NPE

2022-08-30 Thread Brian Hulette via user
Hi Zheng, Could you share a minimal example that reproduces the issue? Also, have you tried using Beam >2.35.0? I'm curious if this happens in the 2.41.0 release as well. Brian On Tue, Aug 30, 2022 at 11:10 AM Zheng Ni wrote: > Hi There, > > I am using beam 2.35.0 to build a simple sql based pi

Re: Using a non-AutoValue member with AutoValueSchema

2022-08-04 Thread Brian Hulette via user
In some places (e.g. in AutoValueSchema) we assume that nested schema-inferred types are of the same "class". I filed [1] to track this a while back - I think we should support mixing and matching SchemaProviders for nested types. [1] https://github.com/apache/beam/issues/20359 On Thu, Aug 4, 202

Re: [Question] Apache Beam v2.30 breaking change to BigQuery nested arrays of Maps

2022-07-18 Thread Brian Hulette via user
Yeah I think a minimal example would be really helpful. Then at the very least we should be able to bisect to identify the breaking change. For creating a test BigQuery database you might look at using the TestBigQuery rule [1]. There are several usage examples in the Beam repo [2]. [1] https://gi

[ANNOUNCE] Apache Beam 2.37.0 Released

2022-03-09 Thread Brian Hulette
here: https://beam.apache.org/get-started/downloads/ This release includes bug fixes, features, and improvements detailed on the Beam blog: https://beam.apache.org/blog/beam-2.37.0/ Thanks to everyone who contributed to this release, and we hope you enjoy using Beam 2.37.0. -- Brian Hulette

Re: Building a Schema from a file

2021-06-18 Thread Brian Hulette
I have files that define field names and types. > > On Fri, Jun 18, 2021 at 12:12 PM Brian Hulette > wrote: > >> Could you clarify what you mean? I could interpret this two different >> ways: >> 1) Have a separate file that defines the literal schema (field names and

Re: Building a Schema from a file

2021-06-18 Thread Brian Hulette
Could you clarify what you mean? I could interpret this two different ways: 1) Have a separate file that defines the literal schema (field names and types). 2) Infer a schema from data stored in some file in a structurerd format (e.g csv or parquet). For (1) Reuven's suggestion would work. You cou

Re: SqlTransform on windows using direct runner

2021-06-16 Thread Brian Hulette
Git\apache_beam\venv\lib\site-packages\apache_beam\utils\subprocess_server.py", > line 141, in stop_process > self._process.send_signal(signal.SIGINT) > File > "c:\users\XXX\appdata\local\programs\python\python37\lib\subprocess.py", > line 1306, in send_sig

Re: SqlTransform on windows using direct runner

2021-06-15 Thread Brian Hulette
Hi Igor, "Universal Local Runner" is a term we've used in the past for a runner that executes your pipeline locally. It's similar to each SDK's DirectRunner, except that by leveraging portability we should only need one implementation, making it "universal". I don't think we've been using that ter

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
Guava's EqualityTester. Then it can run through all the > properties without a user setting up test suites. Downside is that the test > failure signal gets aggregated. > > Kenn > > On Wed, Jun 2, 2021 at 12:09 PM Brian Hulette wrote: > >> Could the DirectRunner jus

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
object >>> that is inconsistent with its encoded form, which could happen to any >>> transform. >>> >>> This does seem to be a gap in DirectRunner testing though. It also makes >>> it hard to test using PAssert, as I believe that puts everything in

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
ad. >>>>> - I recommend capturing the RenameFields PCollection into a local >>>>> variable of type PCollection and printing out the schema (which you >>>>> can get using the PCollection.getSchema method) to ensure that the output >>>>

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Brian Hulette
Hi Matthew, > The unit tests also seem to be disabled for this as well and so I don’t know if the PTransform behaves as expected. The exclusion for NeedsRunner tests is just a quirk in our testing framework. NeedsRunner indicates that a test suite can't be executed with the SDK alone, it needs a

Re: A problem with nexmark build

2021-05-17 Thread Brian Hulette
Hm it looks like there may be a bug in our gradle config, it doesn't seem to make a shaded jar for use with Spark (see this comment on the PR that added this to the website [1]). Maybe we need to add a shadowJar configuration to :sdks:java:testing:nexmark? +dev does anyone have context on this?

Re: Is there a way (seetings) to limit the number of element per worker machine

2021-05-17 Thread Brian Hulette
What type of files are you reading? If they can be split and read by multiple workers this might be a good candidate for a Splittable DoFn (SDF). Brian On Wed, May 12, 2021 at 6:18 AM Eila Oriel Research wrote: > Hi, > I am running out of resources on the workers machines. > The reasons are: >

Re: DirectRunner, Fusion, and Triggers

2021-05-17 Thread Brian Hulette
> P.S. I need this pipeline to work both on a distributed runner and also on a local machine with many cores. That's why the performance of DirectRunner is important to me. IIUC the DirectRunner has intentionally made some trade-offs to make it less performant, so that it better verifies pipelines

Re: Unsubscribe

2021-05-17 Thread Brian Hulette
Hi Tarek, Pasan, You can unsubscribe by writing to user-unsubscr...@beam.apache.org [1] [1] https://apache.org/foundation/mailinglists.html#request-addresses-for- unsubscribing On Sun, May 16, 2021 at 6:04 AM Pasan Kamburugamuwa < pasankamburugamu...@gmail.com> wrote: > > On Sun, May 16, 2021,

Re: unsubscribe

2021-05-06 Thread Brian Hulette
Hey Simon, you can unsubscribe by writing to user-unsubscr...@beam.apache.org [1] [1] https://apache.org/foundation/mailinglists.html#request-addresses-for-unsubscribing On Thu, May 6, 2021 at 2:24 PM Simon Gauld wrote: > >

Re: New Beam Glossary

2021-05-05 Thread Brian Hulette
+user@beam for visibility Thanks David! -- Forwarded message - From: David Huntsperger Date: Wed, May 5, 2021 at 10:25 AM Subject: New Beam Glossary To: Hey all, We published a new Apache Beam glossary , to help new users learn

Re: Query regarding support for ROLLUP

2021-05-05 Thread Brian Hulette
+Andrew Pilloud do you know if this is a bug? On Tue, May 4, 2021 at 7:38 AM D, Anup (Nokia - IN/Bangalore) < anu...@nokia.com> wrote: > Hi All, > > > > I was trying to use “GROUP BY WITH ROLLUP” (2.29.0 version) which I saw > here - > https://beam.apache.org/documentation/dsls/sql/calcite/query

Re: Question on printing out a PCollection

2021-04-30 Thread Brian Hulette
+Ning Kang +Sam Rohde On Thu, Apr 29, 2021 at 6:13 PM Tao Li wrote: > Hi Beam community, > > > > The notebook console from Google Cloud defines a show() API to display a > PCollection which is very neat: > https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development > > > > I

Re: Any easy way to extract values from PCollection?

2021-04-22 Thread Brian Hulette
I don't think there's an easy answer to this question, in general all you can do with a PCollection is indicate you'd like to write it out to an IO. There has been some work in the Python SDK on "Interactive Beam" which is designed for using the Python SDK interactively in a notebook environment. I

Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-04-20 Thread Brian Hulette
but it is not working. I tried > pure pandas, it does work, so I am not sure if anything wrong with Beam. > > Wenbing > > On Wed, Apr 7, 2021 at 2:56 PM Brian Hulette wrote: > >> Hm, to_parquet does have a `partition_cols` argument [1] which we pass >> through [2].

Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-04-07 Thread Brian Hulette
d Brian. >> >> I'd like to try this out. I am trying to distribute my dataset to nodes, >> sort each partition by some key and then store each partition to its own >> file. >> >> Wenbing >> >> On Fri, Apr 2, 2021 at 9:23 AM Brian Hulette wrote:

Re: Beam Dataframe - sort and grouping

2021-04-02 Thread Brian Hulette
Note groupby.apply [1] in particular should be able to do what you want, something like: df.groupby('key1').apply(lambda df: df.sort_values('key2')) But as Robert noted we don't make any guarantees about preserving this ordering later in the pipeline. For this reason I actually just sent a PR t

Re: How to Parallelize Google Cloud Storage Blob Downloads with Grouping

2021-03-31 Thread Brian Hulette
I'm not very familiar with SDF so I can't comment on that approach. Maybe +Boyuan Zhang would be helpful there. What if FileIO could yield the glob that was matched along with each file? Then you could use that as a grouping key later on. Brian On Wed, Mar 24, 2021 at 7:08 AM Evan Galpin wrote

Re: Dataflow - Kafka error

2021-03-30 Thread Brian Hulette
+Chamikara Jayalath Could you try with beam 2.27.0 or 2.28.0? I think that this PR [1] may have addressed the issue. It avoids the problematic code when the pipeline is multi-language [2]. [1] https://github.com/apache/beam/pull/13536 [2] https://github.com/apache/beam/blob/7eff49fae34e8d3c50716

Re: [Question] Need to write a pipeline in Go consuming events from Kafka

2021-03-29 Thread Brian Hulette
+Robert Burke any advice here? On Wed, Mar 24, 2021 at 4:24 AM Đức Trần Tiến wrote: > Hi, > > I am very very new to Go and Apache Beam too! This is my situation: > - I have a kafka running > - I want to write an etl pipeline that consuming data from the kafka in Go > > Because there is no Kaf

Re: JdbcIO SQL best practice

2021-03-23 Thread Brian Hulette
FYI the schemas option has been pursued a little bit in JdbcSchemaIOProvider [1], which naively generates SELECT and INSERT statements for reads and writes. Practically, this code is only usable from SQL, and multi-language pipelines (e.g. it's accessible from the python SDK [2]). We could consider

Re: Ref needed for apache beam SQL

2021-03-08 Thread Brian Hulette
wrote: > Dear Apache Beam experts, > > Kindly help me to use SQL in apache beam with python SDK. > > Best Regards > Thirusenthilkumar P > > > > On Saturday, 6 March 2021, 01:04:14 GMT+5:30, Brian Hulette < > bhule...@apache.org> wrote: > > > Hi Thirusent

Re: A problem with ZetaSQL

2021-03-04 Thread Brian Hulette
the schema is really simple. Just 3 primitive type columns: > > > > root > > |-- column_1: integer (nullable = true) > > |-- column_2: integer (nullable = true) > > |-- column_3: string (nullable = true) > > > > > > *From: *Brian Hulette > *Date: *Th

Re: A problem with ZetaSQL

2021-03-04 Thread Brian Hulette
.apply(MapElements > > .into(TypeDescriptors.rows()) > > > .via(AvroUtils.getGenericRecordToRowFunction(AvroUtils.toBeamSchema(avroSchema > > .setCoder(RowCoder.of(AvroUtils.toBeamSchema(avroSchema))); > > > > > > *Fr

Re: A problem with ZetaSQL

2021-03-02 Thread Brian Hulette
Thanks for reporting this Tao - could you share what the type of your input PCollection is? On Tue, Mar 2, 2021 at 9:33 AM Tao Li wrote: > Hi all, > > > > I was following the instructions from this doc to play with ZetaSQL > https://beam.apache.org/documentation/dsls/sql/overview/ > > > > The qu

Re: Potential bug with BEAM-11460?

2021-03-01 Thread Brian Hulette
hod, which was inadvertently mentioned in your error message due to a copy-paste error. Brian On Sun, Feb 28, 2021 at 5:28 AM Anant Damle wrote: > Thanks Tao! > @Brian Hulette & @Tao Li: I would be curious to > learn how can one test these kinds of PRs where one is only providing some

Re: Potential bug with BEAM-11460?

2021-02-25 Thread Brian Hulette
at 6:49 PM Anant Damle wrote: > Hi Brian, > I think you are right. Create BEAM-11861 > <https://issues.apache.org/jira/browse/BEAM-11861>, will send a PR today. > Present workaround is to provide .setCoder directly on the Output > PCollection. > > On Thu, Feb 25,

Re: Potential bug with BEAM-11460?

2021-02-24 Thread Brian Hulette
+Anant Damle is this an oversight in https://github.com/apache/beam/pull/13616? What would be the right way to fix this? On Tue, Feb 23, 2021 at 5:24 PM Tao Li wrote: > Hi Beam community, > > > > I cannot log into Beam jira so I am asking this question here. I am > testing this new feature from

Re: Unit Testing Kafka in Apache Beam

2021-02-09 Thread Brian Hulette
pplied/messages > dropped accordingly > - assert against the output of the windows > > Thanks so much for the response, I do appreciate it. > > On Feb 9, 2021, at 11:08 AM, Brian Hulette wrote: > >  > Hi Rion, > > Can you run the pipeline asynchronously and in

Re: Unit Testing Kafka in Apache Beam

2021-02-09 Thread Brian Hulette
Hi Rion, Can you run the pipeline asynchronously and inject messages after it has started? We use this approach for some tests against Cloud PubSub. Note if using the DirectRunner you need to set the blockOnRun pipeline option to False to do this. Brian On Mon, Feb 8, 2021 at 2:10 PM Rion Willia

Re: Using the Beam Python SDK and PortableRunner with Flink to connect to Kafka with SSL

2021-02-02 Thread Brian Hulette
Hm I would expect that to work. Can you tell what container Flink is using if it's not using the one you specified? +Chamikara Jayalath may have some insight here Brian On Tue, Feb 2, 2021 at 3:27 AM Paul Nimusiima wrote: > Hello Beam community, > > I am wondering whether it is possible to con

Re: Regarding the field ordering after Select.Flattened transform

2021-01-20 Thread Brian Hulette
f may be convenient in > some use cases if we can just keep the order (roughly) consistent with the > order of the parent fields from the original schema. > > > > *From: *Brian Hulette > *Reply-To: *"user@beam.apache.org" > *Date: *Wednesday, January 20, 2021 at

Re: Regarding the field ordering after Select.Flattened transform

2021-01-20 Thread Brian Hulette
This does seem like an odd choice, I suspect this was just a matter of convenience of implementation since the javadoc makes no claims about field order. In general schema transforms don't take care to maintain a particular field order and I'd recommend against relying on it. Instead fields should

Re: Beam with Confluent Schema Registry and protobuf

2021-01-11 Thread Brian Hulette
Thanks Cristian! +user in case there are interested parties that monitor user@ and not dev@ On Fri, Jan 8, 2021 at 8:17 PM Cristian Constantinescu wrote: > Hi everyone, > > Beam currently has a dependency on older versions of the Confluent libs. > It makes it difficult to use Protobufs with th

Re: Quick question regarding ParquetIO

2021-01-07 Thread Brian Hulette
On Wed, Jan 6, 2021 at 11:07 AM Tao Li wrote: > Hi Brian, > > > > Please see my answers inline. > > > > *From: *Brian Hulette > *Reply-To: *"user@beam.apache.org" > *Date: *Wednesday, January 6, 2021 at 10:43 AM > *To: *user > *Subject: *Re: Q

Re: Quick question regarding ParquetIO

2021-01-06 Thread Brian Hulette
Hey Tao, It does look like BEAM-11460 could work for you. Note that relies on a dynamic object which won't work with schema-aware transforms and SqlTransform. It's likely this isn't a problem for you, I just wanted to point it out. Out of curiosity, for your use-case would it be acceptable if Bea

Re: [VOTE] Release 2.27.0, release candidate #1

2020-12-24 Thread Brian Hulette
t leads to buggy behavior, so I vote -1. >> > >> > The fix to release branch is in flight: >> https://github.com/apache/beam/pull/13613. >> > >> > >> > >> > On Wed, Dec 23, 2020 at 3:38 PM Brian Hulette >> wrote: >> >> >> &g

Re: [VOTE] Release 2.27.0, release candidate #1

2020-12-23 Thread Brian Hulette
-1 (non-binding) Good news: I validated a dataframe pipeline on Dataflow which looked good (with expected performance improvements!) Bad news: I also tried to run the sql_taxi example pipeline (streaming SQL in python) on Dataflow and ran into PubSub IO related issues. The example fails in the same

Re: About Beam SQL Schema Changes and Code generation

2020-12-08 Thread Brian Hulette
Reuven, could you clarify what you have in mind? I know multiple times we've discussed the possibility of adding update compatibility support to SchemaCoder, including support for certain schema changes (field additions/deletions) - I think the most recent discussion was here [1]. But it sounds li

Re: snowflake io in python

2020-11-20 Thread Brian Hulette
nsform.from_runner_api(proto, > context) 1219 if uses_python_sideinput_tags: 1220 # Ordering is > important here. > ~/opt/anaconda3/lib/python3.8/site-packages/apache_beam/transforms/ptransform.py > in from_runner_api(cls, proto, context)688 if proto is None or

Re: snowflake io in python

2020-11-20 Thread Brian Hulette
kages/apache_beam/transforms/ptransform.py > in from_runner_api(cls, proto, context)688 if proto is None or > proto.spec is None or not proto.spec.urn:689 return None--> 690 > parameter_type, constructor = cls._known_urns[proto.spec.urn]691 692 >tr

Re: snowflake io in python

2020-11-19 Thread Brian Hulette
nguage-sdks-for-building-cloud-pipelines On Thu, Nov 19, 2020 at 11:10 AM Alan Krumholz wrote: > DataFlow runner > > On Thu, Nov 19, 2020 at 2:00 PM Brian Hulette wrote: > >> Hm what runner are you using? It looks like we're trying to encode and >> decode the pipe

Re: snowflake io in python

2020-11-19 Thread Brian Hulette
t; 1216 is_python_side_input(side_input_tags[0]) if side_input_tags else >> False) >> 1217 >> -> 1218 transform = ptransform.PTransform.from_runner_api(proto, context) >> 1219 if uses_python_sideinput_tags: >> 1220 # Ordering is important here. >> ~/opt/anaconda3/

Re: snowflake io in python

2020-11-19 Thread Brian Hulette
Hi Alan, Sorry this error message is so verbose. What are you passing for the server_name argument [1]? It looks like that's what the Java stacktrace is complaining about: java.lang.IllegalArgumentException: serverName must be in format .snowflakecomputing.com [1] https://github.com/apache/beam/b

Re: beam rebuilds numpy on pipeline run

2020-10-09 Thread Brian Hulette
+Valentyn Tymofieiev This sounds like it's related to ARROW-8983 (pyarrow takes a long time to download after 0.16.0), discussed on the arrow dev list [2]. I'm not sure what would've triggered this to start happening for you today though. [1] https://issues.apache.org/jira/browse/ARROW-8983 [2]

Re: Ability to link to "latest" of python docs

2020-09-10 Thread Brian Hulette
There's https://beam.apache.org/releases/pydoc/current (and https://beam.apache.org/releases/javadoc/current) which redirect to the most recent release. These get updated somewhere in the release process. They're not very discoverable since it's a redirect and the URL changes when you click on it,

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-03 Thread Brian Hulette
ate a new >>>> buffer, and then the buffer may also need to be resized as the String >>>> grows, so you could be creating a lot of orphaned buffers very quickly. I'm >>>> not that familiar with StringBuilder, is there a way to reset it and re-use >>>&g

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Brian Hulette
That error isn't exactly an OOM, it indicates the JVM is spending a significant amount of time in garbage collection. It looks like `writer.setLength(0)` may actually allocate a new buffer, and then the buffer may also need to be resized as the String grows, so you could be creating a lot of orpha

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-08-19 Thread Brian Hulette
(); > > PCollection> joinedCollection = > KeyedPCollectionTuple > .of(articleTag, articles).and(assetTag, > assets).apply(CoGroupByKey.create()); > > PCollection output = joinedCollection > .apply(ParDo.of(new ArticlesKafkaToBigQuery.EnrichFn(articleTag, &

Re: Registering Protobuf schema

2020-08-19 Thread Brian Hulette
; >> I did subsequently discover some other issues with protoBuf-derived >> schemas (essentially they don’t seem to be properly supported by >> BigQueryIO.Write or allow for optional fields) but I posted a separate >> message on the dev channel covering this. >> >> &g

Re: Registering Protobuf schema

2020-08-18 Thread Brian Hulette
Hi Robert, Sorry for the late reply on this. I think you should be able to do this by registering it in your pipeline's SchemaRegistry manually, like so: Pipeline p; p.getSchemaRegistry().registerSchemaProvider(Fx.class, ProtoMessageSchema.class); Of course this isn't quite as nice as just ad

Re: Accumulator with Map field in CombineFn not serializing correctly

2020-08-07 Thread Brian Hulette
Interesting, thanks for following up with the fix. Were you able to find a way to reproduce this locally, or did it only occur on Dataflow? Did you have to make a similar change for the HashMap in Accum, or just the ExpiringLinkHashMap? Brian On Fri, Aug 7, 2020 at 9:58 AM Josh wrote: > I have

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
e.org/jira/browse/BEAM-10265 On Mon, Jun 29, 2020 at 11:03 AM Brian Hulette wrote: > Hm it looks like the error is from trying to call the zero-arg constructor > for the ArticleEnvelope proto class. Do you have a schema registered for > ArticleEnvelope? > > I think maybe what'

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
Inference.java:162) >> [...] >> >> 2.1 When I make the fields public, the pipeline executes, but the >> PCollection does not have a schema associated with it, which causes the >> next pipeline step (BigQueryIO) to fail. >> >> I want to try AutoValue as well

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-26 Thread Brian Hulette
Hi Tobias, You should be able to annotate the EnrichedArticle class with an4 @DefaultSchema annotation and Beam will infer a schema for it. You would need to make some tweaks to the class though to be compatible with the built-in schema providers: you could make the members public and use JavaFiel

Re: Making RPCs in Beam

2020-06-19 Thread Brian Hulette
Kenn wrote a blog post showing how to do batched RPCs with the state and timer APIs: https://beam.apache.org/blog/timely-processing/ Is that helpful? Brian On Thu, Jun 18, 2020 at 5:29 PM Praveen K Viswanathan < harish.prav...@gmail.com> wrote: > Hello Everyone, > > In my pipeline I have to mak

Re: BEAM-2217 NotImplementedError - DataflowRunner parsing Protos from PubSub (Python SDK)

2020-06-10 Thread Brian Hulette
elpful, > just thought I'd mention it). > > After adding beam.typehints.disable_type_annotations() it still throws > the same error. > > Another thing I forgot to mention in my first email is that I registered a > ProtoCoder as suggested at the bottom of this page ( > https:

[ANNOUNCE] Beam 2.22.0 Released

2020-06-10 Thread Brian Hulette
here: https://beam.apache.org/get-started/downloads/ This release includes bug fixes, features, and improvements detailed on the Beam blog: https://beam.apache.org/blog/beam-2.22.0/ Thanks to everyone who contributed to this release, and we hope you enjoy using Beam 2.22.0. -- Brian Hulette, on

Re: BEAM-2217 NotImplementedError - DataflowRunner parsing Protos from PubSub (Python SDK)

2020-06-08 Thread Brian Hulette
Hi Lien, > First time writing the email list, so please tell me if I'm doing this all wrong. Not at all! This is exactly the kind of question this list is for I have a couple of questions that may help us debug: - Can you share the full stacktrace? - What version of Beam are you using? There wer

Re: Best approach for Sql queries in Beam

2020-05-21 Thread Brian Hulette
It might help if you share what exactly you are hoping to improve upon. Do you want to avoid the separate sessionWindow transforms? Do it all with one SQL query? Or just have it be generally more concise/readable? I'd think it would be possible to do this with a single SQL query but I'm not sure,

Re: GoogleCloudOptions.worker_machine_type = 'n1-highcpu-96'

2020-05-12 Thread Brian Hulette
Hi Eila, It looks like you're attempting to set the option on the GoogleCloudOptions class directly, I think you want to set it on an instance of PipelineOptions that you've viewed as GoogleCloudOptions. Like this example from https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#co

Re: Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-07 Thread Brian Hulette
> messages would likely help as well. >> >> On Tue, May 5, 2020 at 8:45 AM Brian Hulette wrote: >> >>> In both SDKs this is an unsafe trigger because it will only fire once >>> for the first window (per key), and any subsequent data on the same key >>

Re: Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-05 Thread Brian Hulette
In both SDKs this is an unsafe trigger because it will only fire once for the first window (per key), and any subsequent data on the same key will be dropped. In 2.18, we closed BEAM-3288 with PR https://github.com/apache/beam/pull/9960, which detects these cases and fails early. Probably the fix i