The SnakeYAML analysis is exactly what I was looking for. The library of
concern is org.codehaus.jackson:jackson-mapper-asl:1.9.13. Our scanner is
reporting ~20 CVEs with a CVSS score of 7.0 or higher, and ~60 CVEs in total.
Thank you,
Josh
From: Bruno Volpato
Date: Monday, May 1, 2023 at 9:04 PM
To: u
Thank you for your time,
Josh
Joshua Brule | Sr Information Security Engineer
I have pasted an outline of my
CombineFn below.
Thanks for any help with this!
Josh
private static class MyCombineFn extends CombineFn<InputT, AccumT, OutputT> {
    private static class ExpiringLinkedHashMap<K, V> extends LinkedHashMap<K, V> {
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
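A self-contained version of the eviction idea in the outline above can be exercised directly with the plain JDK (BoundedCache and maxEntries are hypothetical names standing in for the real ExpiringLinkedHashMap; the eviction policy is one assumption of how it might work):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch: a LinkedHashMap that evicts its oldest
// entry once the map grows past a fixed capacity.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Returning true tells LinkedHashMap to drop the eldest entry
        // after each insertion that pushes the size over the cap.
        return size() > maxEntries;
    }
}
```

With insertion-order iteration (the LinkedHashMap default), the first key inserted is the first evicted once the cap is exceeded.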
Great idea! Here is a link to the post in a tweet.
https://twitter.com/jmcginley/status/1009517852892770309
On Wed, Jun 20, 2018 at 12:04 PM Holden Karau
wrote:
> Do you happen to have a tweet we should RT for reach?
>
> On Wed, Jun 20, 2018, 11:26 AM Josh McGinley wrote:
>
article with this community. If you have any
feedback let me know. Otherwise keep up the great work on Beam!
--
Josh McGinley
Hi all,
We just released Scio 0.5.3 with a few enhancements and bug fixes.
Cheers,
Josh
https://github.com/spotify/scio/releases/tag/v0.5.3
*"Lasiorhinus latifrons"*
Features
- Add enabled-parameter to SCollection#debug #1107
<https://github.com/spotify/scio/pull/1107>
Hello all:
Our team has a pipeline that makes external network calls. These pipelines
are currently very slow, and the hypothesis is that this is because
we are not threading our network calls. The GitHub issue below provides
some discussion around this:
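For the threading hypothesis, a minimal plain-JDK sketch (no Beam; ParallelFetcher, fetchAll, and the Callable tasks are hypothetical stand-ins for the real network calls) of issuing the calls from a fixed thread pool instead of sequentially:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetcher {
    // Run the given "network calls" (here arbitrary Callables) concurrently
    // on a small thread pool, and collect the results in input order.
    static <T> List<T> fetchAll(List<Callable<T>> calls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task completes and returns the
            // futures in the same order as the input list.
            for (Future<T> f : pool.invokeAll(calls)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real DoFn the pool size and lifecycle would need care (e.g. sizing per worker and shutting down in teardown); this only illustrates the concurrency idea.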
Hi Cham,
Thanks, I have emailed the dataflow-feedback email address with the details.
Best regards,
Josh
On Thu, Mar 1, 2018 at 12:26 AM, Chamikara Jayalath <chamik...@google.com>
wrote:
> Could be a DataflowRunner specific issue. Would you mind reporting this
> with corresponding
> distribute elements well.
>
> 2) This is runner dependent but most runners don't require storing
> everything in memory. For example if you were using Dataflow, you would
> only need to store a couple of elements in memory not the entire
> PCollection.
>
> On Thu, Feb 22, 2018
PCollectionList and use TextIO to write
each partition to a GCS file. For this, would I need all data for the
largest partition to fit into the memory of a single worker?
Thanks for any advice,
Josh
>> That's correct - the data watermark will only matter for
>> windowing. It will not affect auto-scaling. If the pipeline is not doing
>> any windowing, triggering, etc then there is no need to pay for the cost of
>> the second subscription.
>>
>> On Thu, Aug 3, 2017 at 8:17 AM,
since we pay per subscription)! So I want to
remove `withTimestampAttribute` from jobs where possible, but want to first
understand the implications.
Thanks for any advice,
Josh
Hi Kenn,
Thanks for the reply, that makes sense.
As far as I can tell, the DirectPipelineRunner doesn't do this optimisation
(when I test the pipeline locally) but I guess the DataflowRunner will.
Josh
On Tue, Jun 20, 2017 at 4:26 PM, Kenneth Knowles <k...@google.com> wrote:
>
ored across panes?
Thanks for any advice,
Josh
> your elements
> into 4 logical elements (each containing some proportion of your original
> data).
>
> On Tue, Jun 6, 2017 at 9:37 AM, Josh <jof...@gmail.com> wrote:
>
>> Thanks for the reply, Lukasz.
>>
>>
>> What I meant was that I want to shard
Hi Raghu,
My job ID is 2017-05-24_02_46_42-11524480684503077480 - thanks for taking a
look!
Yes I'm using BigtableIO for the sink and I am measuring the end-to-end
latency. It seems to take 3-6 seconds typically, I would like to get it
down to ~1s.
Thanks,
Josh
On Wed, May 24, 2017 at 6:50 PM
On Wed, May 24, 2017 at 9:14 AM, Josh <jof...@gmail.com> wrote:
>
>> Hi Lukasz,
>>
>> Thanks for the example. That sounds like a nice solution -
>> I am running on Dataflow though, which dynamically sets numShards - so if
>> I set numShards to 1 on each of those Avr
s "Wrote 0 records" in the
logs. Probably
about 50% of the "Wrote n records" messages are zero. While the other 50%
are quite high (e.g. "Wrote 80 records"). Not sure if that could indicate a
bad setting?
Josh
On Wed, May 24, 2017 at 5:22 PM, Ankur Chauhan <an...
fine as long as I partition my stream into a large enough number of
partitions so that Dataflow won't override numShards.
Josh
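The partitioning being discussed can be illustrated outside Beam with a deterministic key-to-bucket function (plain Java; DeterministicPartitioner and partitionFor are hypothetical names, not a Beam API):

```java
// Hypothetical helper: map a key deterministically to one of numPartitions
// buckets, so the same key always lands in the same partition.
class DeterministicPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even when the
        // key's hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

A function like this is the kind of logic a Beam Partition transform would wrap to split a stream into a fixed number of shards.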
On Wed, May 24, 2017 at 4:10 PM, Lukasz Cwik <lc...@google.com> wrote:
> Since you're using a small number of shards, add a Partition transform which
> uses a d
>> What runner are you using? If you are using Google Cloud Dataflow then the
>> PubsubIO class is not the one doing the reading from the pubsub topic. They
>> provide a custom implementation at run time.
>>
>> Ankur Chauhan
>> Sent from my iPhone
>>
>> On May 24, 20
/io/gcp/pubsub/PubsubIO.java
Thanks,
Josh
On Wed, May 24, 2017 at 3:36 PM, Ankur Chauhan <an...@malloc64.com> wrote:
> What runner are you using? Google Cloud Dataflow uses a closed source
> version of the pubsub reader as noted in a comment on Read class.
>
> Ankur Chau
CPU. Could forcing a higher number of nodes help improve latency?
Thanks for any advice,
Josh
to the same file. Is there a way to do this? Note
that in my stream the number of keys is very large (most elements have a
unique key, while a few elements share a key).
Thanks,
Josh
...
Best,
Josh
On Tue, May 9, 2017 at 10:30 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:
> Hi Josh,
> What is this running on? I suspect the Dataflow service? In that case I’m
> afraid I can’t help because I know too little about it.
>
> Best,
> Aljoscha
>
> On 8.
2-21T19:59:05.225Z last reported watermark
On Mon, May 8, 2017 at 9:56 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:
> One suspicion I have is that the watermark could be lagging behind a bit.
> Have you looked at that?
>
> On 7. May 2017, at 22:44, Josh <jof...@gmail.com>
been sent, rather than immediately after
each window.
Any ideas what's going on here?
Thanks,
Josh
On Sun, May 7, 2017 at 12:18 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:
> Hi,
> First, a bit of clarification (or refinement): a windowing strategy is
> used in all subseq
Could someone add me too please? at j...@permutive.com
On Fri, May 5, 2017 at 9:08 AM, Jean-Baptiste Onofré
wrote:
> Done
>
> Regards
> JB
>
>
> On 05/05/2017 10:02 AM, Edward Bosher wrote:
>
>> Hi,
>>
>> Whenever you have time I'd love to get an invite to slack on this email
with Beam? I was unable to
find any examples of an Http sink online. If I write my own custom sink to
do this, is there anything to be wary of?
Thanks for any advice,
Josh
Please will someone kindly invite joshdifa...@gmail.com to the Beam slack
channel?
>> prefiltering out any records in a preceding DoFn instead of relying on
>> BigQuery telling you that the schema doesn't match?
>>
>> Otherwise you are correct in believing that you will need to update
>> BigQueryIO to have the retry/error semantics that you want.
this at the moment? Will I need to make some custom
changes to BigQueryIO?
On Mon, Apr 10, 2017 at 7:11 PM, Josh <jof...@gmail.com> wrote:
> Hi,
>
> I'm using BigQueryIO to write the output of an unbounded streaming job to
> BigQuery.
>
> In the case that an element in the stream canno
, it seems to
cause the whole pipeline to halt.
How can I configure beam so that if writing an element fails a few times,
it simply gives up on writing that element and moves on without affecting
the pipeline?
Thanks for any advice,
Josh
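The retry-then-skip behaviour being asked about can be sketched outside Beam (plain JDK; RetryThenSkip and attempt are hypothetical names, and per the thread BigQueryIO did not expose this directly at the time):

```java
import java.util.Optional;
import java.util.function.Supplier;

class RetryThenSkip {
    // Try the write up to maxAttempts times; if every attempt fails, give
    // up on this element and return empty instead of failing the pipeline.
    static <T> Optional<T> attempt(Supplier<T> write, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return Optional.of(write.get());
            } catch (RuntimeException e) {
                // Swallow and retry; a real pipeline would log the failure
                // and likely back off between attempts.
            }
        }
        return Optional.empty();
    }
}
```

In practice the skipped elements would usually be routed to a dead-letter output rather than silently dropped.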
Hi Dan,
Ok great thanks for confirming. I will create a JIRA and submit a PR to
remove this check then.
Thanks,
Josh
On Fri, Apr 7, 2017 at 6:09 PM, Dan Halperin <dhalp...@apache.org> wrote:
> Hi Josh,
> You raise a good point. I think we had put this check in (long before
> p
CreateDisposition.CREATE_IF_NEEDED. I can't
use CreateDisposition.CREATE_IF_NEEDED because it requires me to provide a
table schema and my BigQuery schema isn't available at compile time.
Is there any good reason why CREATE_NEVER is not allowed when using a
tablespec?
Thanks,
Josh
>     synchronized (MyDoFn.class) {
>       if (cachedService == null) {
>         cachedService = ...;
>       }
>     }
>   }
> }
>
> [1]: https://github.com/apache/beam/blob/master/sdks/
> java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L496
>
> On T
in there? What if I want my cache to be used in two separate DoFns
(which sometimes run in the same JVM) - how can I ensure one cache per JVM
rather than one cache per DoFn?
Thanks for any advice,
Josh
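One common answer to the one-cache-per-JVM question is to hang the cache off a dedicated holder class instead of each DoFn, so every DoFn loaded in the same JVM sees the same instance. A minimal sketch (JvmWideCache is a hypothetical name, not a Beam API):

```java
import java.util.concurrent.ConcurrentHashMap;

// One cache per JVM, not per DoFn: a static field lives on the class, and a
// class is loaded once per JVM (per classloader), so two different DoFns
// referencing JvmWideCache.INSTANCE share the same map regardless of how
// many DoFn instances exist.
final class JvmWideCache {
    static final ConcurrentHashMap<String, Object> INSTANCE = new ConcurrentHashMap<>();

    private JvmWideCache() {} // no instances; access only via the static field
}
```

computeIfAbsent gives atomic get-or-create semantics, which avoids the explicit double-checked locking shown earlier in the thread.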