Hi Jesse,
Yes, within a PCollection the lists keep their internal order
- they are "just values" from the perspective of Beam. So the output from
Top is sorted and will remain sorted; there is just no ordering between the
lists. If you want to assemble sorted output by joining together such
Understanding these ordering guarantees is fundamental. Is my understanding
of the ordering guarantees for Top and List correct?
On Fri, May 20, 2016, 6:48 PM Jesse Anderson wrote:
> Here's the output I'm looking for (and getting):
> 2016-01-11T23:59:59.998Z low 682
>
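The reply above is cut off before it says how the sorted lists would be combined, so what follows is only one plausible approach and not necessarily what the author had in mind. It is a plain-Java sketch, using no Beam API, of a k-way merge: each input list is assumed to be sorted in descending order already, and the merge produces a single descending list. The class and method names (SortedListMerge, mergeDescending) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class SortedListMerge {
    // Cursor into one source list: the list itself and the next index to read.
    private static final class Cursor {
        final List<Integer> list;
        int index;
        Cursor(List<Integer> list) { this.list = list; }
        int head() { return list.get(index); }
    }

    /** Merges lists that are each sorted in descending order into one descending list. */
    static List<Integer> mergeDescending(List<List<Integer>> sortedLists) {
        // Max-heap ordered by each cursor's current head element.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b.head(), a.head()));
        for (List<Integer> list : sortedLists) {
            if (!list.isEmpty()) {
                heap.add(new Cursor(list));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head());
            c.index++;
            // Re-insert the cursor only if its list still has elements left.
            if (c.index < c.list.size()) {
                heap.add(c);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
            Arrays.asList(5, 1), Arrays.asList(9, 8, 2), Arrays.asList(7));
        System.out.println(mergeDescending(lists)); // prints [9, 8, 7, 5, 2, 1]
    }
}
```

The merge relies only on each list being internally sorted, which is the guarantee described above; the order in which the lists themselves arrive does not matter.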
I second Thomas: thanks for the detailed explanation (I forgot to
mention the "unique" JVM ;)).
Regards
JB
On 05/24/2016 07:28 PM, Thomas Groh wrote:
> More specifically, the InProcessPipelineRunner (soon to be renamed to
> the DirectRunner) will run on a single machine, with a number of threads
More specifically, the InProcessPipelineRunner (soon to be renamed to the
DirectRunner) will run on a single machine, with a number of threads based
on the number of available processors in the JVM, fanning out work to these
threads as appropriate; it will not perform any cross-process (including
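As a rough illustration of the threading model described above (a single JVM, a thread count derived from the available processors, work fanned out to those threads), here is a plain-Java sketch. It does not use or reproduce the runner's actual implementation; LocalFanOut and squareAll are hypothetical names, and squaring integers stands in for whatever work the pipeline would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LocalFanOut {
    /** Squares each input on a pool sized to the JVM's available processors. */
    static List<Integer> squareAll(List<Integer> inputs) {
        // Thread count is based on what the JVM reports, all within one process.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Fan out: submit one task per input element.
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                futures.add(pool.submit(() -> n * n));
            }
            // Collect results in submission order.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("local execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Everything here stays inside one process, so parallelism is bounded by the cores of a single machine.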
Hi Ryan, perhaps this is https://issues.apache.org/jira/browse/BEAM-197 ?
On Mon, May 23, 2016 at 6:47 PM, Ryan Madsen wrote:
> Hi all,
>
> I'm looking to solve a problem related to performing a join on two
> streaming datasets, and am having a hard time figuring out if
Hi David,
if you use the InProcessPipelineRunner (the "new" DirectPipelineRunner),
then it can create several threads.
Regards
JB
On 05/24/2016 04:38 PM, David Olsen wrote:
> A naive question about DirectPipelineRunner: Is it possible to
> execute DirectPipelineRunner with multiple threads/
A naive question about DirectPipelineRunner: Is it possible to
execute DirectPipelineRunner with multiple threads/instances (across
machines), or is parallelism only supported by runners such as
SparkPipelineRunner?
My requirement is to run the pipeline in parallel, either threading or multiple
Yes -- the MinimalWordCount example currently defaults to the
DataflowPipelineRunner, which runs pipelines on the Google Cloud Dataflow
service. (We'll be changing this.) In general, Cloud-based runners don't
have access to your local machine, hence the exception you saw.
DirectPipelineRunner can
Just found out what went wrong. Changing to use
org.apache.beam.sdk.options.DirectPipelineOptions
org.apache.beam.sdk.runners.DirectPipelineRunner
fixes the problem.
Thanks
On Tue, May 24, 2016 at 6:24 PM, Robertson Williams
wrote:
> I tried with the latest version
I tried with the latest version 0.1.0-SNAPSHOT cloned from git, but when
testing with MinimalWordCount, it throws
expected a valid 'gs://' path but was given '/tmp/tmpLocation'
Can I run the MinimalWordCount example locally (by supplying a tmp location on
the local file system, e.g. file://) or is it