Re: [EXTERNAL] Re: Vulnerabilities in Transitive dependencies

2023-05-02 Thread Brule, Joshua L. (Josh), CISSP via user
The SnakeYAML analysis is exactly what I was looking for. The library of concern is org.codehaus.jackson jackson-mapper-asl 1.9.13. Our scanner is reporting ~20 CVEs with a CVSS of >= 7 and ~60 CVEs total. Thank you, Josh From: Bruno Volpato Date: Monday, May 1, 2023 at 9:04 PM To: u

Vulnerabilities in Transitive dependencies

2023-05-01 Thread Brule, Joshua L. (Josh), CISSP via user
? Thank you for your time, Josh Joshua Brule | Sr Information Security Engineer

Accumulator with Map field in CombineFn not serializing correctly

2020-08-06 Thread Josh
have pasted an outline of my CombineFn below. Thanks for any help with this! Josh private static class MyCombineFn extends CombineFn { private static class ExpiringLinkedHashMap extends LinkedHashMap { @Override protected boolean removeEldestEntry(Map.Entry eldest
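The map type in the outline above is cut off, so its actual eviction rule is not visible. As a rough sketch only, a `LinkedHashMap` subclass that overrides `removeEldestEntry` might look like the following; the class name `BoundedLinkedHashMap` and the size limit `maxEntries` are assumptions for illustration, not the poster's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded LinkedHashMap. The eviction rule in the
// original post is truncated, so the size limit used here is an assumption.
class BoundedLinkedHashMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLinkedHashMap(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put/putAll after each insertion; returning true evicts
        // the eldest (first-inserted) entry.
        return size() > maxEntries;
    }
}
```

The eviction behaviour lives entirely in the subclass, so it only survives a serialization round trip if the concrete class itself is what gets restored, rather than a plain map holding the same entries.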

Re: Python Development Environments for Apache Beam

2018-06-20 Thread Josh McGinley
Great idea! Here is a link to the post in a tweet. https://twitter.com/jmcginley/status/1009517852892770309 On Wed, Jun 20, 2018 at 12:04 PM Holden Karau wrote: > Do you happen to have a tweet we should RT for reach? > > On Wed, Jun 20, 2018, 11:26 AM Josh McGinley wrote: > >

Python Development Environments for Apache Beam

2018-06-20 Thread Josh McGinley
rticle with this community. If you have any feedback let me know. Otherwise keep up the great work on Beam! -- Josh McGinley

Scio 0.5.3 released

2018-05-01 Thread Josh Baer
Hi all, We just released Scio 0.5.3 with a few enhancements and bug fixes. Cheers, Josh https://github.com/spotify/scio/releases/tag/v0.5.3 *"Lasiorhinus latifrons"* Features - Add enabled-parameter to SCollection#debug #1107 <https://github.com/spotify/scio/pull/1107>

Advice on parallelizing network calls in DoFn

2018-03-09 Thread Josh Ferge
Hello all: Our team has pipelines that make external network calls. These pipelines are currently super slow, and the hypothesis is that they are slow because we are not threading for our network calls. The GitHub issue below provides some discussion around this:
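To illustrate the threading hypothesis in the post above, here is a minimal plain-Java sketch (no Beam dependency) that issues a batch of blocking calls on a fixed-size thread pool instead of one at a time. The names `ParallelCalls`, `callAll`, and the `call` function are hypothetical; how such a helper would be wired into a DoFn is not covered by the snippet and is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: runs a blocking call for each input concurrently.
class ParallelCalls {
    // Applies `call` to every input on a pool of `threads` workers and
    // returns the results in the same order as the inputs.
    static <I, O> List<O> callAll(List<I> inputs, Function<I, O> call, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> call.apply(input), pool));
            }
            List<O> results = new ArrayList<>();
            for (CompletableFuture<O> future : futures) {
                // Blocks until this call finishes; a failed call surfaces
                // here as a CompletionException.
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the total wait is roughly that of the slowest call in each group of `threads` calls, rather than the sum of all calls.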

Re: BigQueryIO streaming inserts - poor performance with multiple tables

2018-03-01 Thread Josh
Hi Cham, Thanks, I have emailed the dataflow-feedback email address with the details. Best regards, Josh On Thu, Mar 1, 2018 at 12:26 AM, Chamikara Jayalath <chamik...@google.com> wrote: > Could be a DataflowRunner specific issue. Would you mind reporting this > with corresponding

Re: Partitioning a stream randomly and writing to files with TextIO

2018-02-23 Thread Josh
> distribute elements well. > > 2) This is runner dependent but most runners don't require storing > everything in memory. For example if you were using Dataflow, you would > only need to store a couple of elements in memory not the entire > PCollection. > > On Thu, Feb 22, 2018

Partitioning a stream randomly and writing to files with TextIO

2018-02-22 Thread Josh
PCollectionList and use TextIO to write each partition to a GCS file. For this, would I need all data for the largest partition to fit into the memory of a single worker? Thanks for any advice, Josh

Re: PubSubIO withTimestampAttribute - what are the implications?

2017-08-04 Thread Josh
s correct - the data watermark will only matter for >> windowing. It will not affect auto-scaling. If the pipeline is not doing >> any windowing, triggering, etc then there is no need to pay for the cost of >> the second subscription. >> >> On Thu, Aug 3, 2017 at 8:17 AM,

PubSubIO withTimestampAttribute - what are the implications?

2017-08-03 Thread Josh
ince we pay per subscription)! So I want to remove `withTimestampAttribute` from jobs where possible, but want to first understand the implications. Thanks for any advice, Josh

Re: What state is buffered when using Combine.perKey with an accumulator?

2017-06-20 Thread Josh
Hi Kenn, Thanks for the reply, that makes sense. As far as I can tell, the DirectPipelineRunner doesn't do this optimisation (when I test the pipeline locally) but I guess the DataflowRunner will. Josh On Tue, Jun 20, 2017 at 4:26 PM, Kenneth Knowles <k...@google.com> wrote: >

What state is buffered when using Combine.perKey with an accumulator?

2017-06-20 Thread Josh
ored across panes? Thanks for any advice, Josh

Re: How to partition a stream by key before writing with FileBasedSink?

2017-06-06 Thread Josh
ur elements > into 4 logical elements (each containing some proportion of your original > data). > > On Tue, Jun 6, 2017 at 9:37 AM, Josh <jof...@gmail.com> wrote: > >> Thanks for the reply, Lukasz. >> >> >> What I meant was that I want to shard

Re: How to decrease latency when using PubsubIO.Read?

2017-05-24 Thread Josh
Hi Raghu, My job ID is 2017-05-24_02_46_42-11524480684503077480 - thanks for taking a look! Yes I'm using BigtableIO for the sink and I am measuring the end-to-end latency. It seems to take 3-6 seconds typically, I would like to get it down to ~1s. Thanks, Josh On Wed, May 24, 2017 at 6:50 PM

Re: How to partition a stream by key before writing with FileBasedSink?

2017-05-24 Thread Josh
y 24, 2017 at 9:14 AM, Josh <jof...@gmail.com> wrote: > >> Hi Lukasz, >> >> Thanks for the example. That sounds like a nice solution - >> I am running on Dataflow though, which dynamically sets numShards - so if >> I set numShards to 1 on each of those Avr

Re: How to decrease latency when using PubsubIO.Read?

2017-05-24 Thread Josh
s "Wrote 0 records" in the logs. Probably about 50% of the "Wrote n records" messages are zero. While the other 50% are quite high (e.g. "Wrote 80 records"). Not sure if that could indicate a bad setting? Josh On Wed, May 24, 2017 at 5:22 PM, Ankur Chauhan <an...

Re: How to partition a stream by key before writing with FileBasedSink?

2017-05-24 Thread Josh
fine as long as I partition my stream into a large enough number of partitions so that Dataflow won't override numShards. Josh On Wed, May 24, 2017 at 4:10 PM, Lukasz Cwik <lc...@google.com> wrote: > Since you're using a small number of shards, add a Partition transform which > uses a d

Re: How to decrease latency when using PubsubIO.Read?

2017-05-24 Thread Josh
nner are you using? If you are using google cloud dataflow then the >> PubsubIO class is not the one doing the reading from the pubsub topic. They >> provide a custom implementation at run time. >> >> Ankur Chauhan >> Sent from my iPhone >> >> On May 24, 20

Re: How to decrease latency when using PubsubIO.Read?

2017-05-24 Thread Josh
/io/gcp/pubsub/PubsubIO.java Thanks, Josh On Wed, May 24, 2017 at 3:36 PM, Ankur Chauhan <an...@malloc64.com> wrote: > What runner are you using? Google cloud dataflow uses a closed source > version of the pubsub reader as noted in a comment on Read class. > > Ankur Chau

How to decrease latency when using PubsubIO.Read?

2017-05-24 Thread Josh
CPU. Could forcing a higher number of nodes help improve latency? Thanks for any advice, Josh

How to partition a stream by key before writing with FileBasedSink?

2017-05-24 Thread Josh
to the same file. Is there a way to do this? Note that in my stream the number of keys is very large (most elements have a unique key, while a few elements share a key). Thanks, Josh
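One common way to guarantee that elements sharing a key land in the same output is to map each key deterministically onto a fixed number of shards. The sketch below is plain Java and only shows the mapping; the class name `KeySharding` and the use of `String` keys are assumptions, and it does not show how the shard index would be fed to a sink.

```java
// Hypothetical helper: deterministic key-to-shard assignment.
class KeySharding {
    // Returns a shard index in [0, numShards). Equal keys always map to the
    // same shard, so many distinct keys can share a small number of files.
    static int shardFor(String key, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numShards);
    }
}
```

Because the shard count is fixed up front, this works even when almost every key is unique: the number of files is bounded by `numShards`, not by the number of keys.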

Re: Using PubSubIO.read with windowing

2017-05-09 Thread Josh
... Best, Josh On Tue, May 9, 2017 at 10:30 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi Josh, > What is this running on? I suspect the Dataflow service? In that case I’m > afraid I can’t help because I know too little about it. > > Best, > Aljoscha > > On 8.

Re: Using PubSubIO.read with windowing

2017-05-08 Thread Josh
2-21T19:59:05.225Z last reported watermark On Mon, May 8, 2017 at 9:56 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > One suspicion I have is that the watermark could be lagging behind a bit. > Have you looked at that? > > On 7. May 2017, at 22:44, Josh <jof...@gmail.com>

Re: Using PubSubIO.read with windowing

2017-05-07 Thread Josh
been sent, rather than immediately after each window. Any ideas what's going on here? Thanks, Josh On Sun, May 7, 2017 at 12:18 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > First, a bit of clarification (or refinement): a windowing strategy is > used in all subseq

Re: Fwd: Slack Invite

2017-05-05 Thread Josh
Could someone add me too please? at j...@permutive.com On Fri, May 5, 2017 at 9:08 AM, Jean-Baptiste Onofré wrote: > Done > > Regards > JB > > > On 05/05/2017 10:02 AM, Edward Bosher wrote: > >> i, >> >> Whenever you have time I'd love to get an invite to slack on this email

Http sink - Does it make sense?

2017-05-04 Thread Josh
with Beam? I was unable to find any examples of an Http sink online. If I write my own custom sink to do this, is there anything to be wary of? Thanks for any advice, Josh

Slack channel invite

2017-05-02 Thread Josh Di Fabio
Please will someone kindly invite joshdifa...@gmail.com to the Beam slack channel?

Re: How to skip processing on failure at BigQueryIO sink?

2017-04-12 Thread Josh
prefiltering out any records in a preceding DoFn instead of relying on >> BigQuery telling you that the schema doesn't match? >> >> Otherwise you are correct in believing that you will need to update >> BigQueryIO to have the retry/error semantics that you want.

Re: How to skip processing on failure at BigQueryIO sink?

2017-04-11 Thread Josh
this at the moment? Will I need to make some custom changes to BigQueryIO? On Mon, Apr 10, 2017 at 7:11 PM, Josh <jof...@gmail.com> wrote: > Hi, > > I'm using BigQueryIO to write the output of an unbounded streaming job to > BigQuery. > > In the case that an element in the stream canno

How to skip processing on failure at BigQueryIO sink?

2017-04-10 Thread Josh
, it seems to cause the whole pipeline to halt. How can I configure beam so that if writing an element fails a few times, it simply gives up on writing that element and moves on without affecting the pipeline? Thanks for any advice, Josh
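The behaviour asked for above (try a write a few times, then give up on that element and carry on) can be sketched in plain Java as follows. This is not BigQueryIO's API; `SkippingWriter`, `tryWrite`, and the `write` callback are hypothetical names, and the sketch assumes failures are reported as unchecked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper: retries a write a bounded number of times, then skips.
class SkippingWriter<T> {
    private final Consumer<T> write;
    private final int maxAttempts;
    private final List<T> skipped = new ArrayList<>();

    SkippingWriter(Consumer<T> write, int maxAttempts) {
        this.write = write;
        this.maxAttempts = maxAttempts;
    }

    // Returns true if the element was written, false if it was given up on.
    boolean tryWrite(T element) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.accept(element);
                return true;
            } catch (RuntimeException e) {
                // Swallow the failure and retry until attempts are exhausted.
            }
        }
        // Record the element instead of failing, so later elements still proceed.
        skipped.add(element);
        return false;
    }

    // Elements that could not be written after maxAttempts tries.
    List<T> skipped() {
        return skipped;
    }
}
```

Keeping the skipped elements, rather than dropping them silently, leaves room to log them or route them elsewhere for inspection.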

Re: BigQueryIO - Why is CREATE_NEVER not supported when using a tablespec?

2017-04-07 Thread Josh
Hi Dan, Ok great thanks for confirming. I will create a JIRA and submit a PR to remove this check then. Thanks, Josh On Fri, Apr 7, 2017 at 6:09 PM, Dan Halperin <dhalp...@apache.org> wrote: > Hi Josh, > You raise a good point. I think we had put this check in (long before > p

BigQueryIO - Why is CREATE_NEVER not supported when using a tablespec?

2017-04-07 Thread Josh
CreateDisposition.CREATE_IF_NEEDED. I can't use CreateDisposition.CREATE_IF_NEEDED because it requires me to provide a table schema and my BigQuery schema isn't available at compile time. Is there any good reason why CREATE_NEVER is not allowed when using a tablespec? Thanks, Josh

Re: Having a local cache (per JVM) to use in DoFns

2017-04-06 Thread Josh
ronized (MyDoFn.class) { > if (cachedService == null) { > cachedService = ... > } > } > } > } > > [1]: https://github.com/apache/beam/blob/master/sdks/ > java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L496 > > On T
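The synchronized lazy-initialization fragment quoted above can be written out as a small standalone holder class. This is a sketch under assumptions: `SharedService` and its `factory` parameter are hypothetical names, and `Object` stands in for whatever service type is actually cached. Putting the static field in its own class, rather than inside one DoFn, is one way two different DoFns in the same JVM could share a single instance.

```java
import java.util.function.Supplier;

// Hypothetical holder: one lazily created instance per JVM
// (strictly, per class loader that loads this class).
final class SharedService {
    private static Object cachedService; // guarded by SharedService.class

    private SharedService() {}

    // Returns the shared instance, creating it with `factory` on first use.
    static Object get(Supplier<Object> factory) {
        synchronized (SharedService.class) {
            if (cachedService == null) {
                cachedService = factory.get();
            }
            return cachedService;
        }
    }
}
```

Every caller locks on the same class object, so the factory runs at most once no matter how many threads or callers request the service.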

Having a local cache (per JVM) to use in DoFns

2017-04-06 Thread Josh
in there? What if I want my cache to be used in two separate DoFns (which sometimes run in the same JVM) - how can I ensure one cache per JVM rather than one cache per DoFn? Thanks for any advice, Josh