I have some side inputs that I would like to add to my pipeline. Some of
them are based on a file pattern, so I found that I can collect the
contents of those files using a pattern like the following:
val genotypes = p
  .apply(FileIO.`match`().filepattern(opts.getGenotypesFilePattern()))
  .apply(FileIO.readMatches())   // match metadata -> readable files
  .apply(TextIO.readFiles())     // readable files -> lines of content
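In the Java SDK (which these threads use), the continuation into an actual
side input might look like the sketch below; `mainInput` and the loop body
are illustrative, not from the original message:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionView;

    // genotypes: the PCollection<String> of file contents from above.
    PCollectionView<Iterable<String>> genotypesView =
        genotypes.apply(View.asIterable());

    mainInput.apply(ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void process(ProcessContext c) {
        // The matched files' lines are available to every element here.
        for (String genotype : c.sideInput(genotypesView)) {
          c.output(c.element() + "\t" + genotype);  // illustrative join
        }
      }
    }).withSideInputs(genotypesView));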
> ...csv) to the output files and, as a consequence, GZIP files were
> automatically decompressed when downloading them (as explained in the
> previous link).
>
> Best,
>
>
> On Fri., Oct. 12, 2018, 23:40 Randal Moore wrote:
>
>> Using Beam Java SDK 2.6.
>>
>> I have a batch pipeline that has run successfully in its current form
>> several times.
Using Beam Java SDK 2.6.
I have a batch pipeline that has run successfully in its current form several
times. Suddenly I am getting strange errors complaining about the format of
the input. As far as I know, the pipeline didn't change at all since the
last successful run. The error:
I'm using Dataflow. I've found what seems to me to be a "usage problem" with
the available APIs.
When I submit a job to the Dataflow runner, I get back a DataflowPipelineJob,
or the PipelineResult interface it implements, which provides me the status
of the job as an enumerated type. But if I use a DataflowClient to
+1
On Tue, Oct 17, 2017 at 5:21 PM Raghu Angadi wrote:
> +1.
>
> On Tue, Oct 17, 2017 at 2:11 PM, David McNeill
> wrote:
>
>> The final version of Beam that supports Java 7 should be clearly stated
>> in the docs, so those stuck on old production
I have a batch pipeline that runs well with small inputs but fails with a
larger dataset.
Looking at Stackdriver, I find a fair number of the following:
Request failed with code 400, will NOT retry:
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
> [3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore <rdmoor...@gmail.com> wrote:
Is this part of the Beam API, or something I should look to the Google docs
for? Assume a job is running in Dataflow - how can an interested
third-party app query the status if it knows the job ID?
rdm
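A sketch of both routes in Java: PipelineResult for the process that
submitted the job, and the Dataflow v1b3 API client for a third-party app
that only knows the job ID. Project, region, and job-id values are
placeholders:

    import java.util.Collections;
    import org.apache.beam.sdk.PipelineResult;
    import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
    import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
    import com.google.api.client.json.jackson2.JacksonFactory;
    import com.google.api.services.dataflow.Dataflow;
    import com.google.api.services.dataflow.model.Job;

    // 1) In the process that submitted the job:
    PipelineResult result = pipeline.run();
    PipelineResult.State state = result.getState();  // RUNNING, DONE, FAILED, ...

    // 2) In a separate app that only has the job id:
    GoogleCredential credential = GoogleCredential.getApplicationDefault()
        .createScoped(Collections.singleton(
            "https://www.googleapis.com/auth/cloud-platform"));
    Dataflow dataflow = new Dataflow.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            JacksonFactory.getDefaultInstance(),
            credential)
        .setApplicationName("job-status-check")  // illustrative name
        .build();

    Job job = dataflow.projects().locations().jobs()
        .get("my-project", "us-central1", "my-job-id")
        .execute();
    System.out.println(job.getCurrentState());  // e.g. "JOB_STATE_RUNNING"

The JOB_STATE_* strings are the service-level names; PipelineResult.State
is the SDK enum mapped from them, which is why the two APIs report status
differently.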
>> A DoFn instance can be shared across multiple bundles
>> and can contain methods marked with @Setup/@Teardown which only get invoked
>> once per DoFn instance (which is relatively infrequently) and you could
>> store an instance per DoFn instead of a singleton if the REST library was
>> not thread safe.
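A minimal sketch of that advice (RestClient stands in for whatever
non-serializable client is in use; it is not a real library):

    import org.apache.beam.sdk.transforms.DoFn;

    public class EnrichViaRest extends DoFn<String, String> {
      // transient: the client is never serialized with the DoFn;
      // it is rebuilt on the worker in @Setup.
      private transient RestClient client;

      @Setup
      public void setup() {
        // Invoked once per DoFn instance, which may then serve many bundles.
        client = new RestClient();
      }

      @ProcessElement
      public void process(ProcessContext c) {
        // A DoFn instance is never invoked concurrently, so a
        // non-thread-safe client is fine when stored per instance.
        c.output(client.fetch(c.element()));
      }

      @Teardown
      public void teardown() {
        if (client != null) {
          client.close();
        }
      }
    }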
I have a step in my Beam pipeline that needs some data from a REST service.
The data acquired from the REST service is dependent on the context of the
data being processed and relatively large. The REST client I am using isn't
serializable - nor is it likely possible to make it so (background
Just started looking at Beam this week as a candidate for executing some
fairly CPU-intensive work. I am curious if the stream-oriented features of
Beam are a match for my use case. My user will submit a large number of
computations to the system (as a "job"). Those computations can be
expressed